resolve partial gitbook link in SyncPrim

Dongliang Mu
2024-05-03 17:31:52 +08:00
parent e6cd5b1fd6
commit 811d4d091b
5 changed files with 38 additions and 38 deletions


@@ -3,7 +3,7 @@ Synchronization primitives in the Linux kernel. Part 1.
Introduction
--------------------------------------------------------------------------------
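This part opens a new chapter of the [linux-insides](https://xinqiu.gitbooks.io/linux-insides-cn/content/) book. Timers and time management concepts were described in the previous [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html). Now it is time to move on. As you may have gathered from the title of this part, this chapter describes the [synchronization](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29) primitives in the Linux kernel.
This part opens a new chapter of the [linux-insides](/SUMMARY.md) book. Timers and time management concepts were described in the previous [chapter](/Timers/). Now it is time to move on. As you may have gathered from the title of this part, this chapter describes the [synchronization](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29) primitives in the Linux kernel.
As always, before we consider anything synchronization-related, we will try to get a general idea of what a `synchronization primitive` is. A synchronization primitive is a software mechanism which ensures that two or more [parallel](https://en.wikipedia.org/wiki/Parallel_computing) processes or threads do not execute the same segment of code at the same time. For example, consider the following code snippet: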
@@ -20,7 +20,7 @@ clocksource_select();
...
mutex_unlock(&clocksource_mutex);
```
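It comes from the [kernel/time/clocksource.c](https://github.com/torvalds/linux/master/kernel/time/clocksource.c) source file. This code is from the `__clocksource_register_scale` function, which adds the given [clocksource](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-2.html) to the clock source list. This function performs several operations on the list of registered clock sources. For example, the `clocksource_enqueue` function adds the given clock source to the list of registered clock sources, `clocksource_list`. Note that these lines of code are wrapped by two functions, `mutex_lock` and `mutex_unlock`, each of which takes one parameter, `clocksource_mutex` in our case.
It comes from the [kernel/time/clocksource.c](https://github.com/torvalds/linux/master/kernel/time/clocksource.c) source file. This code is from the `__clocksource_register_scale` function, which adds the given [clocksource](/Timers/linux-timers-2.md) to the clock source list. This function performs several operations on the list of registered clock sources. For example, the `clocksource_enqueue` function adds the given clock source to the list of registered clock sources, `clocksource_list`. Note that these lines of code are wrapped by two functions, `mutex_lock` and `mutex_unlock`, each of which takes one parameter, `clocksource_mutex` in our case.
These functions represent locking and unlocking based on the [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) synchronization primitive. Once `mutex_lock` has been executed, it prevents two or more threads from executing this code until `mutex_unlock` is executed by the owner of the mutex. In other words, it prevents parallel operations on `clocksource_list`. Why do we need a `mutex` here? What would happen if two parallel processes tried to register a clock source? As we already know, the `clocksource_enqueue` function adds the given clock source to the `clocksource_list` list right after the clock source in the list which has the highest rating (the registered clock source with the highest frequency in the system):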
@@ -39,7 +39,7 @@ static void clocksource_enqueue(struct clocksource *cs)
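If two parallel processes try to execute this function simultaneously, both may find the same `entry`, and a [race condition](https://en.wikipedia.org/wiki/Race_condition) may occur. In other words, the second process to execute `list_add` will overwrite the clock source written by the first thread.
Besides this simple example, synchronization primitives are ubiquitous in the Linux kernel. If you go through the previous [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) or other chapters again, or if you look at the Linux kernel source code in general, you will find many places where synchronization primitives are used. We will not consider how `mutex` is implemented in the Linux kernel for now. In fact, the Linux kernel provides a set of different synchronization primitives:
Besides this simple example, synchronization primitives are ubiquitous in the Linux kernel. If you go through the previous [chapter](/Timers/) or other chapters again, or if you look at the Linux kernel source code in general, you will find many places where synchronization primitives are used. We will not consider how `mutex` is implemented in the Linux kernel for now. In fact, the Linux kernel provides a set of different synchronization primitives: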
* `mutex`;
* `semaphores`;
@@ -236,7 +236,7 @@ static inline void __raw_spin_lock(raw_spinlock_t *lock)
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}
```
As you may understand, first of all we disable [preemption](https://en.wikipedia.org/wiki/Preemption_%28computing%29) with a call to `preempt_disable` from [include/linux/preempt.h](https://github.com/torvalds/linux/blob/master/include/linux/preempt.h) (you can read more about preemption in the ninth [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-9.html) of the Linux kernel initialization process chapter). When we unlock the given `spinlock`, preemption is enabled again:
As you may understand, first of all we disable [preemption](https://en.wikipedia.org/wiki/Preemption_%28computing%29) with a call to `preempt_disable` from [include/linux/preempt.h](https://github.com/torvalds/linux/blob/master/include/linux/preempt.h) (you can read more about preemption in the ninth [part](/Initialization/linux-initialization-9.md) of the Linux kernel initialization process chapter). When we unlock the given `spinlock`, preemption is enabled again:
```C
static inline void __raw_spin_unlock(raw_spinlock_t *lock)
@@ -409,7 +409,7 @@ head | 7 | - - - | 7 | tail
* [Concurrent computing](https://en.wikipedia.org/wiki/Concurrent_computing)
* [Synchronization](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29)
* [Clocksource framework](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-2.html)
* [Clocksource framework](/Timers/linux-timers-2.md)
* [Mutex](https://en.wikipedia.org/wiki/Mutual_exclusion)
* [Race condition](https://en.wikipedia.org/wiki/Race_condition)
* [Atomic operations](https://en.wikipedia.org/wiki/Linearizability)
@@ -422,4 +422,4 @@ head | 7 | - - - | 7 | tail
* [xadd instruction](http://x86.renejeschke.de/html/file_module_x86_id_327.html)
* [NOP](https://en.wikipedia.org/wiki/NOP)
* [Memory barriers](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
* [Previous chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html)
* [Previous chapter](/Timers/)


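@@ -4,9 +4,9 @@ Synchronization primitives in the Linux kernel. Part 2.
Queued Spinlocks
--------------------------------------------------------------------------------
This is the second part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/index.html) which describes synchronization primitives in the Linux kernel. In the first [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-1.html) of this chapter we met the first of them, the [spinlock](https://en.wikipedia.org/wiki/Spinlock). We will continue to learn about this synchronization primitive in this part. If you have read the previous part, you may remember that besides normal spinlocks the Linux kernel provides a special type of `spinlock`: `queued spinlocks`. In this part we will try to understand what this concept represents.
This is the second part of the [chapter](/SyncPrim/) which describes synchronization primitives in the Linux kernel. In the first [part](/SyncPrim/linux-sync-1.md) of this chapter we met the first of them, the [spinlock](https://en.wikipedia.org/wiki/Spinlock). We will continue to learn about this synchronization primitive in this part. If you have read the previous part, you may remember that besides normal spinlocks the Linux kernel provides a special type of `spinlock`: `queued spinlocks`. In this part we will try to understand what this concept represents.
In the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-1.html) we saw the [API](https://en.wikipedia.org/wiki/Application_programming_interface) of `spinlock`:
In the previous [part](/SyncPrim/linux-sync-1.md) we saw the [API](https://en.wikipedia.org/wiki/Application_programming_interface) of `spinlock`:
* `spin_lock_init` - initializes the given `spinlock`;
* `spin_lock` - acquires the given `spinlock`;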
@@ -95,12 +95,12 @@ int unlock(lock)
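The first thread will execute the `test_and_set` instruction, which sets `lock` to `1`. When the second thread calls the `lock` function, it will spin in the `while` loop until the first thread calls the `unlock` function and `lock` becomes equal to `0`. This implementation is not very good for performance because it has at least two problems. The first problem is that this implementation may be unfair: a thread on one processor may wait a long time even if it called `lock` before other threads which are also waiting for the lock to be released. The second problem is that all threads which want to acquire the lock must execute many `atomic` operations like `test_and_set` on a variable in shared memory. This leads to cache invalidation, because a processor's cache will store `lock=1`, but the value of `lock` in memory may no longer be `1` after a thread releases the lock.
In the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-1.html) we saw the second kind of spinlock implementation:
In the previous [part](/SyncPrim/linux-sync-1.md) we saw the second kind of spinlock implementation:
the `ticket spinlock`. This approach solves the first problem and is able to guarantee the order of the threads which want to acquire the lock, but the second problem still remains.
The topic of this part is `queued spinlocks`. This approach may help to solve both of the problems described above. `Queued spinlocks` allow each processor to spin on its own memory location. The basic principle of a queue-based spinlock can best be understood by studying a classic queue-based spinlock implementation called the [MCS](http://www.cs.rochester.edu/~scott/papers/1991_TOCS_synch.pdf) lock. Before we look at the implementation of `queued spinlocks`, let's try to understand what the `MCS` lock is.
The basic idea of the `MCS` lock is, as noted in the previous paragraph, that a thread spins on a local variable and each processor in the system has its own copy of these variables. In other words, this concept is built on top of the [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) variables concept in the Linux kernel.
The basic idea of the `MCS` lock is, as noted in the previous paragraph, that a thread spins on a local variable and each processor in the system has its own copy of these variables. In other words, this concept is built on top of the [per-cpu](/Concepts/linux-cpu-1.md) variables concept in the Linux kernel.
When the first thread wants to acquire the lock, it registers itself in the `queue`; in other words, it is added to the special `queue` and acquires the lock because the lock is currently free. When the second thread wants to acquire the same lock before the first thread releases it, this thread adds its own copy of the lock variable to this special `queue`. In this case the first thread will contain a `next` field which points to the second thread. From this moment, the second thread will wait until the first thread releases its lock and notifies the `next` thread about this event. The first thread is then removed from the `queue` and the second thread becomes the owner of the lock.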
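The queueing behaviour described above can be sketched in user space with C11 atomics. This is only an illustration of the MCS idea, not the kernel's `qspinlock` code: the names `mcs_node`, `mcs_lock`, `mcs_lock_init`, `mcs_acquire` and `mcs_release` are hypothetical, and default sequentially consistent atomics are assumed instead of the kernel's barriers and per-cpu nodes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter (hypothetical type; the kernel uses per-cpu nodes). */
struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the queue, if any */
    atomic_bool locked;              /* true while this waiter must keep spinning */
};

/* The lock itself only remembers the tail of the queue. */
struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_init(struct mcs_lock *lock)
{
    atomic_init(&lock->tail, NULL);
}

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *node)
{
    atomic_store(&node->next, NULL);
    atomic_store(&node->locked, true);

    /* Register ourselves as the new tail of the queue. */
    struct mcs_node *prev = atomic_exchange(&lock->tail, node);
    if (prev == NULL)
        return; /* the queue was empty, so the lock is ours */

    /* Link behind the previous waiter and spin on our own node only. */
    atomic_store(&prev->next, node);
    while (atomic_load(&node->locked))
        ;
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *node)
{
    struct mcs_node *next = atomic_load(&node->next);

    if (next == NULL) {
        /* No visible successor: try to mark the queue as empty. */
        struct mcs_node *expected = node;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor is in the middle of linking itself; wait for it. */
        while ((next = atomic_load(&node->next)) == NULL)
            ;
    }
    /* Notify the next waiter that it now owns the lock. */
    atomic_store(&next->locked, false);
}
```

Each waiter spins only on the `locked` flag of its own node, which is what removes the shared-variable contention of the `test_and_set` version.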
@@ -458,7 +458,7 @@ smp_cond_acquire(!((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK));
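Conclusion
--------------------------------------------------------------------------------
This is the end of the second part of the [synchronization primitives](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29) chapter in the Linux kernel. In the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-1.html) we already met the first synchronization primitive, the `spinlock`, implemented by the Linux kernel as a `ticket spinlock`. In this part we saw another implementation of the `spinlock` mechanism: the `queued spinlock`. In the next part we will continue to dive into synchronization primitives in the Linux kernel.
This is the end of the second part of the [synchronization primitives](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29) chapter in the Linux kernel. In the previous [part](/SyncPrim/linux-sync-1.md) we already met the first synchronization primitive, the `spinlock`, implemented by the Linux kernel as a `ticket spinlock`. In this part we saw another implementation of the `spinlock` mechanism: the `queued spinlock`. In the next part we will continue to dive into synchronization primitives in the Linux kernel.
If you have questions or suggestions, feel free to contact me on twitter [0xAX](https://twitter.com/0xAX), by [email](anotherworldofworld@gmail.com), or just create an [issue](https://github.com/0xAX/linux-insides/issues/new).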
@@ -473,11 +473,11 @@ smp_cond_acquire(!((val = atomic_read(&lock->val)) & _Q_LOCKED_PENDING_MASK));
* [API](https://en.wikipedia.org/wiki/Application_programming_interface)
* [Test and Set](https://en.wikipedia.org/wiki/Test-and-set)
* [MCS](http://www.cs.rochester.edu/~scott/papers/1991_TOCS_synch.pdf)
* [per-cpu variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html)
* [per-cpu variables](/Concepts/linux-cpu-1.md)
* [atomic instruction](https://en.wikipedia.org/wiki/Linearizability)
* [CMPXCHG instruction](http://x86.renejeschke.de/html/file_module_x86_id_41.html)
* [LOCK instruction](http://x86.renejeschke.de/html/file_module_x86_id_159.html)
* [NOP instruction](https://en.wikipedia.org/wiki/NOP)
* [PREFETCHW instruction](http://www.felixcloutier.com/x86/PREFETCHW.html)
* [x86_64](https://en.wikipedia.org/wiki/X86-64)
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-1.html)
* [Previous part](/SyncPrim/linux-sync-1.md)


@@ -5,7 +5,7 @@
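Semaphores
--------------------------------------------------------------------------------
This is the third part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/index.html) which describes synchronization primitives in the Linux kernel. In the previous part we saw a special type of [spinlock](https://en.wikipedia.org/wiki/Spinlock): the `queued spinlock`. That earlier [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-2.html) covered `spinlock`-related material. Now we will describe more synchronization primitives.
This is the third part of the [chapter](/SyncPrim/) which describes synchronization primitives in the Linux kernel. In the previous part we saw a special type of [spinlock](https://en.wikipedia.org/wiki/Spinlock): the `queued spinlock`. That earlier [part](/SyncPrim/linux-sync-2.md) covered `spinlock`-related material. Now we will describe more synchronization primitives.
The next [synchronization primitive](https://en.wikipedia.org/wiki/Synchronization_%28computer_science%29) after the `spinlock` which we will look at is the [semaphore](https://en.wikipedia.org/wiki/Semaphore_%28programming%29). We will start with the theory of what a `semaphore` is, and then, as in the previous parts, we will see how it is implemented in the Linux kernel.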
@@ -70,13 +70,13 @@ struct semaphore {
}
```
The `__SEMAPHORE_INITIALIZER` macro takes the name of the `semaphore` structure and initializes its fields. First of all we initialize the `spinlock` of the given `semaphore` with the `__RAW_SPIN_LOCK_UNLOCKED` macro. As you may remember from the [previous](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-1.html) parts, `__RAW_SPIN_LOCK_UNLOCKED` is defined in the [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_types.h) header file and expands to the `__ARCH_SPIN_LOCK_UNLOCKED` macro, which in turn expands to zero, or the unlocked state:
The `__SEMAPHORE_INITIALIZER` macro takes the name of the `semaphore` structure and initializes its fields. First of all we initialize the `spinlock` of the given `semaphore` with the `__RAW_SPIN_LOCK_UNLOCKED` macro. As you may remember from the [previous](/SyncPrim/linux-sync-1.md) parts, `__RAW_SPIN_LOCK_UNLOCKED` is defined in the [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_types.h) header file and expands to the `__ARCH_SPIN_LOCK_UNLOCKED` macro, which in turn expands to zero, or the unlocked state:
```C
#define __ARCH_SPIN_LOCK_UNLOCKED { { 0 } }
```
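The last two fields of the `semaphore`, `count` and `wait_list`, are initialized with the given number of available resources and an empty [list](https://xinqiu.gitbooks.io/linux-insides-cn/content/DataStructures/linux-datastructures-1.html).
The last two fields of the `semaphore`, `count` and `wait_list`, are initialized with the given number of available resources and an empty [list](/DataStructures/linux-datastructures-1.md).
The second way to initialize a `semaphore` is to pass the `semaphore` and the number of available resources to the `sema_init` function. This function is defined in the [include/linux/semaphore.h](https://github.com/torvalds/linux/blob/master/include/linux/semaphore.h) header file: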
```C
@@ -88,7 +88,7 @@ static inline void sema_init(struct semaphore *sem, int val)
}
```
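Let's look at the implementation of this function. It looks pretty simple. The function just initializes the given `semaphore` with the `__SEMAPHORE_INITIALIZER` macro which we just saw. As noted in the previous [parts](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/index.html) of this chapter, we will skip the Linux kernel [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt) related code.
Let's look at the implementation of this function. It looks pretty simple. The function just initializes the given `semaphore` with the `__SEMAPHORE_INITIALIZER` macro which we just saw. As noted in the previous [parts](/SyncPrim/) of this chapter, we will skip the Linux kernel [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt) related code.
Now that we know how to initialize a `semaphore`, let's look at how to lock and unlock it. The Linux kernel provides the following [API](https://en.wikipedia.org/wiki/Application_programming_interface) to manipulate `semaphores`: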
```
@@ -104,7 +104,7 @@ int down_timeout(struct semaphore *sem, long jiffies);
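The `down_killable` function does the same as the `down_interruptible` function, but it sets the `TASK_KILLABLE` flag for the current process. This means that the waiting process may be interrupted by the kill signal.
The `down_trylock` function is similar to the `spin_trylock` function. This function tries to acquire a lock and exits if the operation was unsuccessful. In this case the process which wants to acquire the lock will not wait. The last function, `down_timeout`, also tries to acquire a lock; the wait is interrupted when the given timeout expires. Additionally, you may notice that the timeout is measured in [jiffies](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html).
The `down_trylock` function is similar to the `spin_trylock` function. This function tries to acquire a lock and exits if the operation was unsuccessful. In this case the process which wants to acquire the lock will not wait. The last function, `down_timeout`, also tries to acquire a lock; the wait is interrupted when the given timeout expires. Additionally, you may notice that the timeout is measured in [jiffies](/Timers/linux-timers-1.md).
We just saw the definitions of the `semaphore` [API](https://en.wikipedia.org/wiki/Application_programming_interface). We will start with the `down` function. This function is defined in the [kernel/locking/semaphore.c](https://github.com/torvalds/linux/blob/master/kernel/locking/semaphore.c) source code file. Let's look at the implementation of this function: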
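Before the kernel code, the non-blocking behaviour of `down_trylock` described above can be illustrated with a small user-space sketch. Everything here is an assumption for illustration: `sketch_sem`, `sketch_sema_init`, `sketch_down_trylock` and `sketch_up` are hypothetical names, and a single atomic counter stands in for the kernel's `count` field, which in the real implementation is protected by the semaphore's spinlock and paired with a wait list.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's struct semaphore. */
struct sketch_sem {
    atomic_int count; /* number of resources still available */
};

static void sketch_sema_init(struct sketch_sem *sem, int val)
{
    atomic_init(&sem->count, val);
}

/*
 * Try to take one resource without waiting.
 * Mirrors the down_trylock() convention: 0 on success, 1 on failure.
 */
static int sketch_down_trylock(struct sketch_sem *sem)
{
    int c = atomic_load(&sem->count);

    while (c > 0) {
        /* On failure the current value is reloaded into c and we retry. */
        if (atomic_compare_exchange_weak(&sem->count, &c, c - 1))
            return 0;
    }
    return 1; /* nothing available: give up immediately */
}

/* Return one resource to the semaphore. */
static void sketch_up(struct sketch_sem *sem)
{
    atomic_fetch_add(&sem->count, 1);
}
```

With a count of `1` the sketch behaves like a binary semaphore: the first `sketch_down_trylock` call succeeds, and further calls fail until `sketch_up` is called.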
@@ -180,7 +180,7 @@ struct semaphore_waiter waiter;
#define current get_current()
```
The `get_current` function returns the value of the `current_task` [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) variable:
The `get_current` function returns the value of the `current_task` [per-cpu](/Concepts/linux-cpu-1.md) variable:
```C
@@ -339,14 +339,14 @@ static noinline void __sched __up(struct semaphore *sem)
* [preemption](https://en.wikipedia.org/wiki/Preemption_%28computing%29)
* [deadlocks](https://en.wikipedia.org/wiki/Deadlock)
* [scheduler](https://en.wikipedia.org/wiki/Scheduling_%28computing%29)
* [Doubly linked list in the Linux kernel](https://xinqiu.gitbooks.io/linux-insides-cn/content/DataStructures/linux-datastructures-1.html)
* [jiffies](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html)
* [Doubly linked list in the Linux kernel](/DataStructures/linux-datastructures-1.md)
* [jiffies](/Timers/linux-timers-1.md)
* [interrupts](https://en.wikipedia.org/wiki/Interrupt)
* [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html)
* [per-cpu](/Concepts/linux-cpu-1.md)
* [bitmask](https://en.wikipedia.org/wiki/Mask_%28computing%29)
* [SIGKILL](https://en.wikipedia.org/wiki/Unix_signal#SIGKILL)
* [errno](https://en.wikipedia.org/wiki/Errno.h)
* [API](https://en.wikipedia.org/wiki/Application_programming_interface)
* [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion)
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SyncPrim/linux-sync-2.html)
* [Previous part](/SyncPrim/linux-sync-2.md)


@@ -430,4 +430,4 @@ Links
* [Inline assembly](https://0xax.gitbooks.io/linux-insides/content/Theory/asm.html)
* [XADD instruction](http://x86.renejeschke.de/html/file_module_x86_id_327.html)
* [LOCK instruction](http://x86.renejeschke.de/html/file_module_x86_id_159.html)
* [Previous part](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-4.html)
* [Previous part](/SyncPrim/linux-sync-4.md)


@@ -6,20 +6,20 @@ Introduction
This is the sixth part of the chapter which describes [synchronization primitives](https://en.wikipedia.org/wiki/Synchronization_(computer_science)) in the Linux kernel. In the previous parts we finished considering different [readers-writer lock](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock) synchronization primitives. In this part we continue with a similar synchronization primitive which can be used to avoid the `writer starvation` problem. Its name is `seqlock`, or `sequential lock`.
We know from the previous [part](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-5.html) that a [readers-writer lock](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock) is a special lock mechanism which allows concurrent access for read-only operations, but requires an exclusive lock for writing or modifying data. As we may guess, this can lead to a problem called `writer starvation`. In other words, a writer process can't acquire a lock as long as at least one reader process which acquired the lock holds it. So when contention is high, a writer process which wants to acquire the lock may wait for it for a long time.
We know from the previous [part](/SyncPrim/sync-5.md) that a [readers-writer lock](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock) is a special lock mechanism which allows concurrent access for read-only operations, but requires an exclusive lock for writing or modifying data. As we may guess, this can lead to a problem called `writer starvation`. In other words, a writer process can't acquire a lock as long as at least one reader process which acquired the lock holds it. So when contention is high, a writer process which wants to acquire the lock may wait for it for a long time.
The `seqlock` synchronization primitive can help solve this problem.
As in all previous parts of this [book](https://0xax.gitbooks.io/linux-insides/content), we will first consider this synchronization primitive from the theoretical side and only then consider the [API](https://en.wikipedia.org/wiki/Application_programming_interface) provided by the Linux kernel to manipulate `seqlocks`.
As in all previous parts of this [book](/SUMMARY.md), we will first consider this synchronization primitive from the theoretical side and only then consider the [API](https://en.wikipedia.org/wiki/Application_programming_interface) provided by the Linux kernel to manipulate `seqlocks`.
So, let's start.
Sequential lock
--------------------------------------------------------------------------------
So, what is a `seqlock` synchronization primitive and how does it work? Let's try to answer these questions in this paragraph. `Sequential locks` were introduced in the Linux kernel 2.6.x. The main point of this synchronization primitive is to provide fast and lock-free access to shared resources. Since the heart of the `sequential lock` synchronization primitive is the [spinlock](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-1.html) synchronization primitive, `sequential locks` work in situations where the protected resources are small and simple. Additionally, write access must be rare and should also be fast.
So, what is a `seqlock` synchronization primitive and how does it work? Let's try to answer these questions in this paragraph. `Sequential locks` were introduced in the Linux kernel 2.6.x. The main point of this synchronization primitive is to provide fast and lock-free access to shared resources. Since the heart of the `sequential lock` synchronization primitive is the [spinlock](/SyncPrim/linux-sync-1.md) synchronization primitive, `sequential locks` work in situations where the protected resources are small and simple. Additionally, write access must be rare and should also be fast.
This synchronization primitive is based on a sequence-of-events counter. A `sequential lock` allows free access to a resource for readers, but each reader must check for conflicts with a writer. For this purpose the primitive introduces a special counter. The main algorithm of `sequential locks` is simple: each writer which acquires a sequential lock increments this counter and additionally acquires a [spinlock](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-1.html). When this writer finishes, it increments the counter of the sequential lock again and releases the acquired spinlock to give access to other writers.
This synchronization primitive is based on a sequence-of-events counter. A `sequential lock` allows free access to a resource for readers, but each reader must check for conflicts with a writer. For this purpose the primitive introduces a special counter. The main algorithm of `sequential locks` is simple: each writer which acquires a sequential lock increments this counter and additionally acquires a [spinlock](/SyncPrim/linux-sync-1.md). When this writer finishes, it increments the counter of the sequential lock again and releases the acquired spinlock to give access to other writers.
Read-only access works on the following principle: a reader gets the value of the `sequential lock` counter before it enters the [critical section](https://en.wikipedia.org/wiki/Critical_section) and compares it with the value of the same counter at the exit of the critical section. If the values are equal, there were no writers during this period. If they are not equal, a writer has incremented the counter during the [critical section](https://en.wikipedia.org/wiki/Critical_section). This conflict means that the read of the protected data must be repeated.
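The counter protocol described above can be sketched in user space with C11 atomics. This is a simplified illustration rather than the kernel's `seqlock_t`: the names `sketch_seq`, `sketch_seq_init`, `sketch_write` and `sketch_read` are hypothetical, a single writer is assumed so the writer-side spinlock is omitted, and the protected fields are atomics with default ordering instead of plain data guarded by memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical protected pair plus its sequence counter. */
struct sketch_seq {
    atomic_uint seq; /* even: no write in progress, odd: writer is active */
    atomic_int a;
    atomic_int b;
};

static void sketch_seq_init(struct sketch_seq *s)
{
    atomic_init(&s->seq, 0);
    atomic_init(&s->a, 0);
    atomic_init(&s->b, 0);
}

/* Writer side: bump the counter before and after the update. */
static void sketch_write(struct sketch_seq *s, int a, int b)
{
    atomic_fetch_add(&s->seq, 1); /* counter becomes odd */
    atomic_store(&s->a, a);
    atomic_store(&s->b, b);
    atomic_fetch_add(&s->seq, 1); /* counter becomes even again */
}

/* Reader side: retry until the counter is even and unchanged. */
static void sketch_read(struct sketch_seq *s, int *a, int *b)
{
    unsigned start;

    do {
        /* Wait while a writer is in the middle of an update. */
        while ((start = atomic_load(&s->seq)) & 1)
            ;
        *a = atomic_load(&s->a);
        *b = atomic_load(&s->b);
        /* A changed counter means a writer interfered: read again. */
    } while (atomic_load(&s->seq) != start);
}
```

Readers never block the writer here; they only pay with a retry when a write overlaps their read, which is why the approach suits small data that is written rarely.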
@@ -54,7 +54,7 @@ typedef struct {
} seqlock_t;
```
As we may see, the `seqlock_t` provides two fields. These fields represent a sequential lock counter, which we described above, and a [spinlock](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-1.html) which protects data from other writers. Note that the `seqcount` counter is represented by the `seqcount` type. The `seqcount` is a structure:
As we may see, the `seqlock_t` provides two fields. These fields represent a sequential lock counter, which we described above, and a [spinlock](/SyncPrim/linux-sync-1.md) which protects data from other writers. Note that the `seqcount` counter is represented by the `seqcount` type. The `seqcount` is a structure:
```C
typedef struct seqcount {
@@ -67,7 +67,7 @@ typedef struct seqcount {
which holds the counter of a sequential lock and a [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt) related field.
As always in previous parts of this [chapter](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/), before we consider the [API](https://en.wikipedia.org/wiki/Application_programming_interface) of the `sequential lock` mechanism in the Linux kernel, we need to know how to initialize an instance of `seqlock_t`.
As always in previous parts of this [chapter](/SyncPrim/), before we consider the [API](https://en.wikipedia.org/wiki/Application_programming_interface) of the `sequential lock` mechanism in the Linux kernel, we need to know how to initialize an instance of `seqlock_t`.
We saw in the previous parts that the Linux kernel often provides two approaches to initialize a given synchronization primitive. The same is true for the `seqlock_t` structure. A `seqlock_t` can be initialized in the following two ways:
@@ -114,7 +114,7 @@ So we just initialize counter of the given sequential lock to zero and additiona
#endif
```
As I already wrote in previous parts of this [chapter](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/), we will not consider [debugging](https://en.wikipedia.org/wiki/Debugging) and [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt) related stuff in this part. So for now we just skip the `SEQCOUNT_DEP_MAP_INIT` macro. The second field of the given `seqlock_t` is `lock`, initialized with the `__SPIN_LOCK_UNLOCKED` macro which is defined in the [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_types.h) header file. We will not consider the implementation of this macro here as it just initializes a [rawspinlock](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-1.html) with architecture-specific methods (you may read more about spinlocks in the first parts of this [chapter](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/)).
As I already wrote in previous parts of this [chapter](/SyncPrim/), we will not consider [debugging](https://en.wikipedia.org/wiki/Debugging) and [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt) related stuff in this part. So for now we just skip the `SEQCOUNT_DEP_MAP_INIT` macro. The second field of the given `seqlock_t` is `lock`, initialized with the `__SPIN_LOCK_UNLOCKED` macro which is defined in the [include/linux/spinlock_types.h](https://github.com/torvalds/linux/blob/master/include/linux/spinlock_types.h) header file. We will not consider the implementation of this macro here as it just initializes a [rawspinlock](/SyncPrim/linux-sync-1.md) with architecture-specific methods (you may read more about spinlocks in the first parts of this [chapter](/SyncPrim/)).
We have considered the first way to initialize a sequential lock. Let's consider the second way to do the same, but dynamically. We can initialize a sequential lock with the `seqlock_init` macro which is defined in the same [include/linux/seqlock.h](https://github.com/torvalds/linux/blob/master/include/linux/seqlock.h) header file.
@@ -149,7 +149,7 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
}
```
just initializes the counter of the given `seqcount_t` with zero. The second call from the `seqlock_init` macro is the call of the `spin_lock_init` macro which we saw in the [first part](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-1.html) of this chapter.
just initializes the counter of the given `seqcount_t` with zero. The second call from the `seqlock_init` macro is the call of the `spin_lock_init` macro which we saw in the [first part](/SyncPrim/linux-sync-1.md) of this chapter.
So, now we know how to initialize a `sequential lock`; let's look at how to use it. The Linux kernel provides the following [API](https://en.wikipedia.org/wiki/Application_programming_interface) to manipulate `sequential locks`:
@@ -223,7 +223,7 @@ static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
which just compares the value of the counter of the given `sequential lock` with the initial value of this counter. If the initial value of the counter obtained from the `read_seqbegin()` function is odd, this means that a writer was in the middle of updating the data when our reader began to act. In this case the data can be in an inconsistent state, so we need to try to read it again.
This is a common pattern in the Linux kernel. For example, you may remember the `jiffies` concept from the [first part](https://0xax.gitbooks.io/linux-insides/content/Timers/timers-1.html) of the [timers and time management in the Linux kernel](https://0xax.gitbooks.io/linux-insides/content/Timers/) chapter. The sequential lock is used to obtain the value of `jiffies` on the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture:
This is a common pattern in the Linux kernel. For example, you may remember the `jiffies` concept from the [first part](/Timers/linux-timers-1.md) of the [timers and time management in the Linux kernel](/Timers/) chapter. The sequential lock is used to obtain the value of `jiffies` on the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture:
```C
u64 get_jiffies_64(void)
@@ -303,7 +303,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
and in the end we just call the `spin_unlock` macro to give access to other readers or writers.
That's all about the `sequential lock` mechanism in the Linux kernel. Of course we did not consider the full [API](https://en.wikipedia.org/wiki/Application_programming_interface) of this mechanism in this part, but all other functions are based on the ones described here. For example, the Linux kernel also provides some safe macros/functions to use the `sequential lock` mechanism in [interrupt handlers](https://en.wikipedia.org/wiki/Interrupt_handler) of [softirq](https://0xax.gitbooks.io/linux-insides/content/Interrupts/linux-interrupts-9.html): `write_seqlock_irq` and `write_sequnlock_irq`:
That's all about the `sequential lock` mechanism in the Linux kernel. Of course we did not consider the full [API](https://en.wikipedia.org/wiki/Application_programming_interface) of this mechanism in this part, but all other functions are based on the ones described here. For example, the Linux kernel also provides some safe macros/functions to use the `sequential lock` mechanism in [interrupt handlers](https://en.wikipedia.org/wiki/Interrupt_handler) of [softirq](/Interrupts/linux-interrupts-9.md): `write_seqlock_irq` and `write_sequnlock_irq`:
```C
static inline void write_seqlock_irq(seqlock_t *sl)
@@ -339,14 +339,14 @@ Links
* [synchronization primitives](https://en.wikipedia.org/wiki/Synchronization_(computer_science))
* [readers-writer lock](https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock)
* [spinlock](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-1.html)
* [spinlock](/SyncPrim/linux-sync-1.md)
* [critical section](https://en.wikipedia.org/wiki/Critical_section)
* [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt)
* [debugging](https://en.wikipedia.org/wiki/Debugging)
* [API](https://en.wikipedia.org/wiki/Application_programming_interface)
* [x86_64](https://en.wikipedia.org/wiki/X86-64)
* [Timers and time management in the Linux kernel](https://0xax.gitbooks.io/linux-insides/content/Timers/)
* [Timers and time management in the Linux kernel](/Timers/)
* [interrupt handlers](https://en.wikipedia.org/wiki/Interrupt_handler)
* [softirq](https://0xax.gitbooks.io/linux-insides/content/Interrupts/linux-interrupts-9.html)
* [softirq](/Interrupts/linux-interrupts-9.md)
* [IRQ](https://en.wikipedia.org/wiki/Interrupt_request_(PC_architecture))
* [Previous part](https://0xax.gitbooks.io/linux-insides/content/SyncPrim/sync-5.html)
* [Previous part](/SyncPrim/linux-sync-5.md)