diff --git a/Initialization/linux-initialization-1.md b/Initialization/linux-initialization-1.md index 28cd9fd..5fc7d23 100644 --- a/Initialization/linux-initialization-1.md +++ b/Initialization/linux-initialization-1.md @@ -4,11 +4,11 @@ 踏入内核代码的第一步(TODO: Need proofreading) -------------------------------------------------------------------------------- -[上一章](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-5.html)是[引导过程](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html)的最后一部分。从现在开始,我们将深入探究 Linux 内核的初始化过程。在解压缩完 Linux 内核镜像、并把它妥善地放入内存后,内核就开始工作了。我们在第一章中介绍了 Linux 内核引导程序,它的任务就是为执行内核代码做准备。而在本章中,我们将探究内核代码,看一看内核的初始化过程——即在启动 [PID](https://en.wikipedia.org/wiki/Process_identifier) 为 `1` 的 `init` 进程前,内核所做的大量工作。 +[上一章](/Booting/linux-bootstrap-5.md)是[引导过程](/Booting/)的最后一部分。从现在开始,我们将深入探究 Linux 内核的初始化过程。在解压缩完 Linux 内核镜像、并把它妥善地放入内存后,内核就开始工作了。我们在第一章中介绍了 Linux 内核引导程序,它的任务就是为执行内核代码做准备。而在本章中,我们将探究内核代码,看一看内核的初始化过程——即在启动 [PID](https://en.wikipedia.org/wiki/Process_identifier) 为 `1` 的 `init` 进程前,内核所做的大量工作。 本章的内容很多,介绍了在内核启动前的所有准备工作。[arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) 文件中定义了内核入口点,我们会从这里开始,逐步地深入下去。在 `start_kernel` 函数(定义在 [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L489)) 执行之前,我们会看到很多的初期的初始化过程,例如初期页表初始化、切换到一个新的内核空间描述符等等。 -在[上一章](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html)的[最后一节](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-5.html)中,我们跟踪到了 [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) 文件中的 [jmp](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) 指令: +在[上一章](/Booting/)的[最后一节](/Booting/linux-bootstrap-5.md)中,我们跟踪到了 [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) 文件中的 [jmp](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) 指令: ```assembly jmp *%rax @@ -90,7 +90,7 @@ rbp = 0x1000000 - (0xffffffff81000000 - 0xffffffff80000000) jnz bad_address ``` -在这里我们将 `rbp` 寄存器的低32位与 `PMD_PAGE_MASK` 进行比较。`PMD_PAGE_MASK` 代表中层页目录(`Page middle directory`)屏蔽位(相关信息请阅读 [paging](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/linux-theory-1.html) 一节),它的定义如下: +在这里我们将 `rbp` 寄存器的低32位与 `PMD_PAGE_MASK` 进行比较。`PMD_PAGE_MASK` 代表中层页目录(`Page middle directory`)屏蔽位(相关信息请阅读 [paging](/Theory/linux-theory-1.md) 一节),它的定义如下: ```C #define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1)) @@ -162,7 +162,7 @@ NEXT_PAGE(level1_fixmap_pgt) _PAGE_ACCESSED | _PAGE_DIRTY) ``` -更多信息请阅读 [分页](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/linux-theory-1.html) 部分. +更多信息请阅读 [分页](/Theory/linux-theory-1.md) 部分. `level3_kernel_pgt` 中保存的两项用来映射内核空间,在它的前 `510`(即 `L3_START_KERNEL`)项均为 `0`。这里的 `L3_START_KERNEL` 保存的是在上层页目录(Page Upper Directory)中包含`__START_KERNEL_map` 地址的那一条索引,它等于 `510`。后面一项 `level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE` 中的 `level2_kernel_pgt` 比较容易理解,它是一条页表项,包含了指向中层页目录的指针,它用来映射内核空间,并且具有如下的访问权限: @@ -491,7 +491,7 @@ INIT_PER_CPU(gdt_page); `INIT_PER_CPU` 扩展后也将得到 `init_per_cpu__gdt_page` 并将它的值设置为相对于 `__per_cpu_load` 的偏移量。这样,我们就得到了新GDT的正确的基地址。 -per-CPU变量是2.6内核中的特性。顾名思义,当我们创建一个 `per-CPU` 变量时,每个CPU都会拥有一份它自己的拷贝,在这里我们创建的是 `gdt_page` per-CPU变量。这种类型的变量有很多有点,比如由于每个CPU都只访问自己的变量而不需要锁等。因此在多处理器的情况下,每一个处理器核心都将拥有一份自己的 `GDT` 表,其中的每一项都代表了一块内存,这块内存可以由在这个核心上运行的线程访问。这里 [Concepts/per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) 有关于 `per-CPU` 变量的更详细的介绍。 +per-CPU变量是2.6内核中的特性。顾名思义,当我们创建一个 `per-CPU` 变量时,每个CPU都会拥有一份它自己的拷贝,在这里我们创建的是 `gdt_page` per-CPU变量。这种类型的变量有很多有点,比如由于每个CPU都只访问自己的变量而不需要锁等。因此在多处理器的情况下,每一个处理器核心都将拥有一份自己的 `GDT` 表,其中的每一项都代表了一块内存,这块内存可以由在这个核心上运行的线程访问。这里 [Concepts/per-cpu](/Concepts/linux-cpu-1.md) 有关于 `per-CPU` 变量的更详细的介绍。 在加载好了新的全局描述附表之后,跟之前一样我们重新加载一下各个段: @@ -620,7 +620,7 @@ write_cr3(__pa_nodebug(early_level4_pgt)); -------------------------------------------------------------------------------- * [Model Specific Register](http://en.wikipedia.org/wiki/Model-specific_register) -* [Paging](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/linux-theory-1.html) -* [Previous part - Kernel decompression](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-5.html) +* [Paging](/Theory/linux-theory-1.md) +* [Previous part - Kernel decompression](/Booting/linux-bootstrap-5.md) * [NX](http://en.wikipedia.org/wiki/NX_bit) * [ASLR](http://en.wikipedia.org/wiki/Address_space_layout_randomization) diff --git a/Initialization/linux-initialization-10.md b/Initialization/linux-initialization-10.md index 49b4850..e1a13d0 100644 --- a/Initialization/linux-initialization-10.md +++ b/Initialization/linux-initialization-10.md @@ -4,7 +4,7 @@ Kernel initialization. Part 10. End of the linux kernel initialization process ================================================================================ -This is tenth part of the chapter about linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) and in the [previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-9.html) we saw the initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update) and stopped on the call of the `acpi_early_init` function. This part will be the last part of the [Kernel initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) chapter, so let's finish it. +This is tenth part of the chapter about linux kernel [initialization process](/Initialization/) and in the [previous part](/Initialization/linux-initialization-9.md) we saw the initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update) and stopped on the call of the `acpi_early_init` function. This part will be the last part of the [Kernel initialization process](/Initialization/index.md) chapter, so let's finish it. After the call of the `acpi_early_init` function from the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c), we can see the following code: @@ -185,7 +185,7 @@ nrpages = (nr_free_buffer_pages() * 10) / 100; max_buffer_heads = nrpages * (PAGE_SIZE / sizeof(struct buffer_head)); ``` -which will be equal to the `10%` of the `ZONE_NORMAL` (all RAM from the 4GB on the `x86_64`). The next function after the `buffer_init` is - `vfs_caches_init`. This function allocates `SLAB` caches and hashtable for different [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) caches. We already saw the `vfs_caches_init_early` function in the eighth part of the linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-8.html) which initialized caches for `dcache` (or directory-cache) and [inode](http://en.wikipedia.org/wiki/Inode) cache. The `vfs_caches_init` function makes post-early initialization of the `dcache` and `inode` caches, private data cache, hash tables for the mount points, etc. More details about [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) will be described in the separate part. After this we can see `signals_init` function. This function is defined in the [kernel/signal.c](https://github.com/torvalds/linux/blob/master/kernel/signal.c) and allocates a cache for the `sigqueue` structures which represents queue of the real time signals. The next function is `page_writeback_init`. This function initializes the ratio for the dirty pages. Every low-level page entry contains the `dirty` bit which indicates whether a page has been written to after been loaded into memory. +which will be equal to the `10%` of the `ZONE_NORMAL` (all RAM from the 4GB on the `x86_64`). The next function after the `buffer_init` is - `vfs_caches_init`. This function allocates `SLAB` caches and hashtable for different [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) caches. We already saw the `vfs_caches_init_early` function in the eighth part of the linux kernel [initialization process](/Initialization/linux-initialization-8.md) which initialized caches for `dcache` (or directory-cache) and [inode](http://en.wikipedia.org/wiki/Inode) cache. The `vfs_caches_init` function makes post-early initialization of the `dcache` and `inode` caches, private data cache, hash tables for the mount points, etc. More details about [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) will be described in the separate part. After this we can see `signals_init` function. This function is defined in the [kernel/signal.c](https://github.com/torvalds/linux/blob/master/kernel/signal.c) and allocates a cache for the `sigqueue` structures which represents queue of the real time signals. The next function is `page_writeback_init`. This function initializes the ratio for the dirty pages. Every low-level page entry contains the `dirty` bit which indicates whether a page has been written to after been loaded into memory. Creation of the root for the procfs -------------------------------------------------------------------------------- @@ -440,7 +440,7 @@ That's all! Linux kernel initialization process is finished! Conclusion -------------------------------------------------------------------------------- -It is the end of the tenth part about the linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html). It is not only the `tenth` part, but also is the last part which describes initialization of the linux kernel. As I wrote in the first [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) of this chapter, we will go through all steps of the kernel initialization and we did it. We started at the first architecture-independent function - `start_kernel` and finished with the launch of the first `init` process in the our system. I skipped details about different subsystem of the kernel, for example I almost did not cover scheduler, interrupts, exception handling, etc. From the next part we will start to dive to the different kernel subsystems. Hope it will be interesting. +It is the end of the tenth part about the linux kernel [initialization process](/Initialization/). It is not only the `tenth` part, but also is the last part which describes initialization of the linux kernel. As I wrote in the first [part](/Initialization/linux-initialization-1.md) of this chapter, we will go through all steps of the kernel initialization and we did it. We started at the first architecture-independent function - `start_kernel` and finished with the launch of the first `init` process in the our system. I skipped details about different subsystem of the kernel, for example I almost did not cover scheduler, interrupts, exception handling, etc. From the next part we will start to dive to the different kernel subsystems. Hope it will be interesting. If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX). @@ -470,4 +470,4 @@ Links * [Tmpfs](http://en.wikipedia.org/wiki/Tmpfs) * [initrd](http://en.wikipedia.org/wiki/Initrd) * [panic](http://en.wikipedia.org/wiki/Kernel_panic) -* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-9.html) +* [Previous part](/Initialization/linux-initialization-9.md) diff --git a/Initialization/linux-initialization-2.md b/Initialization/linux-initialization-2.md index a2e87bc..5f5d617 100644 --- a/Initialization/linux-initialization-2.md +++ b/Initialization/linux-initialization-2.md @@ -4,9 +4,9 @@ 初期中断和异常处理 -------------------------------------------------------------------------------- -在上一个 [部分](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) 我们谈到了初期中断初始化。目前我们已经处于解压缩后的Linux内核中了,还有了用于初期启动的基本的 [分页](https://en.wikipedia.org/wiki/Page_table) 机制。我们的目标是在内核的主体代码执行前做好准备工作。 +在上一个 [部分](/Initialization/linux-initialization-1.md) 我们谈到了初期中断初始化。目前我们已经处于解压缩后的Linux内核中了,还有了用于初期启动的基本的 [分页](https://en.wikipedia.org/wiki/Page_table) 机制。我们的目标是在内核的主体代码执行前做好准备工作。 -我们已经在 [本章](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) 的 [第一部分](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) 做了一些工作,在这一部分中我们会继续分析关于中断和异常处理部分的代码。 +我们已经在 [本章](/Initialization/) 的 [第一部分](/Initialization/linux-initialization-1.md) 做了一些工作,在这一部分中我们会继续分析关于中断和异常处理部分的代码。 我们在上一部分谈到了下面这个循环: @@ -493,4 +493,4 @@ pmd_p[pmd_index(address)] = pmd; * [Page table](https://en.wikipedia.org/wiki/Page_table) * [Interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler) * [Page Fault](https://en.wikipedia.org/wiki/Page_fault), -* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) +* [Previous part](/Initialization/linux-initialization-1.md) diff --git a/Initialization/linux-initialization-3.md b/Initialization/linux-initialization-3.md index 65d0603..f2a7e86 100644 --- a/Initialization/linux-initialization-3.md +++ b/Initialization/linux-initialization-3.md @@ -77,7 +77,7 @@ extern char __initdata boot_command_line[]; 初始化内存页 -------------------------------------------------------------------------------- -至此,我们已经拷贝了 `boot_params` 结构体,接下来将对初期页表进行一些设置以便在初始化内核的过程中使用。我们之前已经对初始化了初期页表,以便支持换页,这在之前的[部分](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html)中已经讨论过。现在则通过调用 `reset_early_page_tables` 函数将初期页表中大部分项清零(在之前的部分也有介绍),只保留内核高地址的映射。然后我们调用: +至此,我们已经拷贝了 `boot_params` 结构体,接下来将对初期页表进行一些设置以便在初始化内核的过程中使用。我们之前已经对初始化了初期页表,以便支持换页,这在之前的[部分](/Initialization/linux-initialization-1.md)中已经讨论过。现在则通过调用 `reset_early_page_tables` 函数将初期页表中大部分项清零(在之前的部分也有介绍),只保留内核高地址的映射。然后我们调用: ```C clear_page(init_level4_pgt); diff --git a/Initialization/linux-initialization-4.md b/Initialization/linux-initialization-4.md index 6cc1376..4880957 100644 --- a/Initialization/linux-initialization-4.md +++ b/Initialization/linux-initialization-4.md @@ -208,7 +208,7 @@ int cpu = smp_processor_id(); #define raw_smp_processor_id() (this_cpu_read(cpu_number)) ``` -`this_cpu_read` 函数与其它很多函数一样如(`this_cpu_write`, `this_cpu_add` 等等...) 被定义在[include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) 此部分函数主要为对 `this_cpu` 进行操作. 这些操作提供不同的对每cpu[per-cpu](http://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) 变量相关访问方式. 譬如让我们来看看这个函数 `this_cpu_read`: +`this_cpu_read` 函数与其它很多函数一样如(`this_cpu_write`, `this_cpu_add` 等等...) 被定义在[include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) 此部分函数主要为对 `this_cpu` 进行操作. 这些操作提供不同的对每cpu[per-cpu](/Concepts/linux-cpu-1.md) 变量相关访问方式. 譬如让我们来看看这个函数 `this_cpu_read`: ``` __pcpu_size_call_return(this_cpu_read_, pcp) @@ -311,7 +311,7 @@ static inline int __check_is_bitmap(const unsigned long *bitmap) 原来此函数始终返回1,事实上我们需要这样的函数才达到我们的目的: 它在编译时给定一个`bitmap`,换句话将就是检查`bitmap`的类型是否是`unsigned long *`,因此我们仅仅通过`to_cpumask`宏指令将类型为`unsigned long`的数组转化为`struct cpumask *`。现在我们可以调用`cpumask_set_cpu` 函数,这个函数仅仅是一个 `set_bit`给CPU掩码的功能函数。所有的这些`set_cpu_*`函数的原理都是一样的。 -如果你还不确定`set_cpu_*`这些函数的操作并且不能理解 `cpumask`的概念,不要担心。你可以通过读取这些章节[cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) or [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).来继续了解和学习这些函数的原理。 +如果你还不确定`set_cpu_*`这些函数的操作并且不能理解 `cpumask`的概念,不要担心。你可以通过读取这些章节[cpumask](/Concepts/linux-cpu-2.md) or [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).来继续了解和学习这些函数的原理。 现在我们已经激活第一个CPU,我们继续接着start_kernel函数往下走,下面的函数是`page_address_init`,但是此函数不执行任何操作,因为只有当所有内存不能直接映射的时候才会执行。 @@ -349,7 +349,7 @@ Linux version 4.0.0-rc6+ (alex@localhost) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubu memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text); ``` -你可以阅读关于`memblock`的相关内容在[Linux kernel memory management Part 1.](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-1.html),你应该还记得`memblock_reserve`函数的两个参数: +你可以阅读关于`memblock`的相关内容在[Linux kernel memory management Part 1.](/MM/linux-mm-1.md),你应该还记得`memblock_reserve`函数的两个参数: * base physical address of a memory block; * size of a memory block. @@ -382,7 +382,7 @@ u64 ramdisk_size = get_ramdisk_size(); u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size); ``` -如果你阅读过这些章节[Linux Kernel Booting Process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html),你就知道所有的这些参数都来自于`boot_params`,时刻谨记`boot_params`在boot期间已经被赋值,内核启动头包含了一下几个字段用来描述RAM DISK: +如果你阅读过这些章节[Linux Kernel Booting Process](/Booting/),你就知道所有的这些参数都来自于`boot_params`,时刻谨记`boot_params`在boot期间已经被赋值,内核启动头包含了一下几个字段用来描述RAM DISK: ``` Field name: ramdisk_image Type: write (obligatory) @@ -434,7 +434,7 @@ memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image); 如果你有任何的问题或者建议,你可以留言,也可以直接发消息给我[twitter](https://twitter.com/0xAX)。 -**很抱歉,英语并不是我的母语,非常抱歉给您阅读带来不便,如果你发现文中描述有任何问题,请提交一个 PR 到 [linux-insides](https://github.com/hust-open-atom-club/linux-insides-zh).** +**很抱歉,英语并不是我的母语,非常抱歉给您阅读带来不便,如果你发现文中描述有任何问题,请提交一个 PR 到 [linux-insides-zh](https://github.com/hust-open-atom-club/linux-insides-zh).** 链接 -------------------------------------------------------------------------------- @@ -447,4 +447,4 @@ memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image); * [stack buffer overflow](http://en.wikipedia.org/wiki/Stack_buffer_overflow) * [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) * [initrd](http://en.wikipedia.org/wiki/Initrd) -* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-3.html) +* [Previous part](/Initialization/linux-initialization-3.md) diff --git a/Initialization/linux-initialization-5.md b/Initialization/linux-initialization-5.md index 7cb0b36..48eeb0b 100644 --- a/Initialization/linux-initialization-5.md +++ b/Initialization/linux-initialization-5.md @@ -4,14 +4,14 @@ 与系统架构有关的初始化后续分析 =========================================================== -在之前的[章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html)中, +在之前的[章节](/Initialization/linux-initialization-4.md)中, 我们讲到了与系统架构有关的 [setup_arch](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c#L856) 函数部分,本文会继续从这里开始。 我们为 [initrd](http://en.wikipedia.org/wiki/Initrd) 预留了内存之后,下一步是执行 `olpc_ofw_detect` 函数检测系统是否支持 [One Laptop Per Child support](http://wiki.laptop.org/go/OFW_FAQ)。 我们不会考虑与平台有关的东西,且会忽略与平台有关的函数。所以我们继续往下看。 下一步是执行 `early_trap_init` 函数。这个函数会初始化调试功能 (`#DB` -当 `TF` 标志位和rflags被设置时会被使用)和 `int3` (`#BP`)中断门。 -如果你不了解中断,你可以从 [初期中断和异常处理](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html) 中学习有关中断的内容。 +如果你不了解中断,你可以从 [初期中断和异常处理](/Initialization/linux-initialization-2.md) 中学习有关中断的内容。 在 `x86` 架构中,`INT`,`INT0` 和 `INT3` 是支持任务显式调用中断处理函数的特殊指令。`INT3` 指令调用断点(`#BP`)处理函数。 -你如果记得,我们在这[部分](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html) 看到过中断和异常概念: +你如果记得,我们在这[部分](/Initialization/linux-initialization-2.md) 看到过中断和异常概念: ``` ---------------------------------------------------------------------------------------------- @@ -190,7 +190,7 @@ movl $1,%ebx * I/O端口 * 设备内存 -我们在 linux [内核启动过程](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html)中见过第一种方法(通过 `outb/inb` 指令实现)。 +我们在 linux [内核启动过程](/Booting/linux-bootstrap-3.md)中见过第一种方法(通过 `outb/inb` 指令实现)。 第二种方法是把 `I/O` 的物理地址映射到虚拟地址。当 `CPU` 读取一段物理地址时,它可以读取到映射了 `I/O` 设备的物理 `RAM` 区域。 `ioremap` 就是用来把设备内存映射到内核地址空间的。 @@ -221,7 +221,7 @@ memset(bm_pte, 0, sizeof(bm_pte)); pmd_populate_kernel(&init_mm, pmd, bm_pte); ``` -这就是所有过程。如果你仍然觉得困惑,不要担心。在 [内核内存管理,第二部分](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html) 章节会有单独一部分讲解 `ioremap` 和 `fixmaps`。 +这就是所有过程。如果你仍然觉得困惑,不要担心。在 [内核内存管理,第二部分](/MM/linux-mm-2.md) 章节会有单独一部分讲解 `ioremap` 和 `fixmaps`。 获取根设备的主次设备号 ---------------------------------------------------------------------------- @@ -272,7 +272,7 @@ static inline dev_t new_decode_dev(u32 dev) Memory Map设置 ----------------------------------------------------------------------- -下一步是调用 `setup_memory_map` 函数设置内存映射。但是在这之前我们需要设置与显示屏有关的参数(目前有行、列,视频页等,你可以在 [显示模式初始化和进入保护模式](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html) 中了解), +下一步是调用 `setup_memory_map` 函数设置内存映射。但是在这之前我们需要设置与显示屏有关的参数(目前有行、列,视频页等,你可以在 [显示模式初始化和进入保护模式](/Booting/linux-bootstrap-3.md) 中了解), 与拓展显示识别数据,视频模式,引导启动器类型等参数: ```C @@ -393,7 +393,7 @@ struct x86_init_ops x86_init __initdata = { } ``` -我们可以看到,这里的 `memory_setup` 赋值为 `default_machine_specific_memory_setup`,它是我们在对 [内核启动](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html) 过程中的所有 [e820](http://en.wikipedia.org/wiki/E820) 条目经过整理和把内存分区填入 `e820map` 结构体中获得的。 +我们可以看到,这里的 `memory_setup` 赋值为 `default_machine_specific_memory_setup`,它是我们在对 [内核启动](/Booting/linux-bootstrap-2.md) 过程中的所有 [e820](http://en.wikipedia.org/wiki/E820) 条目经过整理和把内存分区填入 `e820map` 结构体中获得的。 所有收集的内存分区会用 `printk` 打印出来。你可以通过运行 `dmesg` 命令找到类似于下面的信息: ``` @@ -452,7 +452,7 @@ static inline void __init copy_edd(void) 下一步是在初始化阶段完成内存描述符的初始化。我们知道每个进程都有自己的运行内存地址空间。通过调用 `memory descriptor` 可以看到这些特殊数据结构。 在 linux 内核源码中内存描述符是用 `mm_struct` 结构体表示的。`mm_struct` 包含许多不同的与进程地址空间有关的字段,像内核代码/数据段的起始和结束地址, `brk` 的起始和结束,内存区域的数量,内存区域列表等。这些结构定义在 [include/linux/mm_types.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/mm_types.h) 中。`task_struct` 结构的 `mm` 和 `active_mm` 字段包含了每个进程自己的内存描述符。 -我们的第一个 `init` 进程也有自己的内存描述符。在之前的[章节](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html)我们看到过通过 `INIT_TASK` 宏实现 `task_struct` 的部分初始化信息: +我们的第一个 `init` 进程也有自己的内存描述符。在之前的[章节](/Initialization/linux-initialization-4.md)我们看到过通过 `INIT_TASK` 宏实现 `task_struct` 的部分初始化信息: ```C #define INIT_TASK(tsk) \ @@ -538,7 +538,7 @@ void x86_configure_nx(void) 以上是 linux 内核初始化过程的第五部分。在这一章我们讲解了有关架构初始化的 `setup_arch` 函数。内容很多,但是我们还没有学习完。其中,`setup_arch` 是一个很复杂的函数,甚至我不确定我们能在以后的章节中讲完它的所有内容。在这一章节中有一些很有趣的概念像 `Fix-mapped` 地址,`ioremap` 等等。 -如果没听明白也不用担心,在 [内核内存管理,第二部分](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html) 还会有更详细的解释。在下一章节我们会继续讲解有关结构初始化的东西, +如果没听明白也不用担心,在 [内核内存管理,第二部分](/MM/linux-mm-2.md) 还会有更详细的解释。在下一章节我们会继续讲解有关结构初始化的东西, 以及初期内核参数的解析,`pci` 设备的早期转存,直接媒体接口扫描等等。 如果你有任何问题或者建议,你可以留言,也可以直接发送消息给我[twitter](https://twitter.com/0xAX)。 diff --git a/Initialization/linux-initialization-6.md b/Initialization/linux-initialization-6.md index b767320..9854019 100644 --- a/Initialization/linux-initialization-6.md +++ b/Initialization/linux-initialization-6.md @@ -4,7 +4,7 @@ 仍旧是与系统架构有关的初始化 =========================================================== -在之前的[章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-5.html)我们从 [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c)了解了特定于系统架构的初始化事务(在我们的例子中是 `x86_64` 架构),并且通过 `x86_configure_nx` 函数根据对[NX bit](http://en.wikipedia.org/wiki/NX_bit)的支持配置了 `_PAGE_NX` 标志位。正如我之前写的, `setup_arch` 函数和 `start_kernel` 都非常复杂,所以在这个和下个章节我们将继续学习关于系统架构初始化进程的内容。`x86_configure_nx` 函数的下面是 `parse_early_param` 函数。这个函数定义在 [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) 中并且你可以从它的名字中了解到,这个函数解析内核命令行并且基于给定的参数创建不同的服务 (所有的内核命令行参数你都可以在 [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt) 找到)。 你可能记得在最前面的 [章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html) 我们是怎样创建 `earlyprintk`地。在前面我们用 [arch/x86/boot/cmdline.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/cmdline.c) 里面的 `cmdline_find_option` 和 `__cmdline_find_option`, `__cmdline_find_option_bool` 函数的帮助下寻找内核参数及其值。我们在通用内核部分不依赖于特定的系统架构,在这里我们使用另一种方法。 如果你正在阅读linux内核源代码,你可能注意到这样的调用: +在之前的[章节](/Initialization/linux-initialization-5.md)我们从 [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c)了解了特定于系统架构的初始化事务(在我们的例子中是 `x86_64` 架构),并且通过 `x86_configure_nx` 函数根据对[NX bit](http://en.wikipedia.org/wiki/NX_bit)的支持配置了 `_PAGE_NX` 标志位。正如我之前写的, `setup_arch` 函数和 `start_kernel` 都非常复杂,所以在这个和下个章节我们将继续学习关于系统架构初始化进程的内容。`x86_configure_nx` 函数的下面是 `parse_early_param` 函数。这个函数定义在 [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) 中并且你可以从它的名字中了解到,这个函数解析内核命令行并且基于给定的参数创建不同的服务 (所有的内核命令行参数你都可以在 [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt) 找到)。 你可能记得在最前面的 [章节](/Booting/linux-bootstrap-2.md) 我们是怎样创建 `earlyprintk`地。在前面我们用 [arch/x86/boot/cmdline.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/cmdline.c) 里面的 `cmdline_find_option` 和 `__cmdline_find_option`, `__cmdline_find_option_bool` 函数的帮助下寻找内核参数及其值。我们在通用内核部分不依赖于特定的系统架构,在这里我们使用另一种方法。 如果你正在阅读linux内核源代码,你可能注意到这样的调用: ```C early_param("gbpages", parse_direct_gbpages_on); @@ -102,7 +102,7 @@ noexec [X86] memblock_x86_reserve_range_setup_data(); ``` -这个函数的定义也在 [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c) 中,然后这个函数为 `setup_data` 重新映射内存并保留内存块(你可以阅读之前的 [章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-5.html) 了解关于 `setup_data` 的更多内容,你也可以在 [Linux kernel memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) 中阅读到关于 `ioremap` and `memblock` 的更多内容)。 +这个函数的定义也在 [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c) 中,然后这个函数为 `setup_data` 重新映射内存并保留内存块(你可以阅读之前的 [章节](/Initialization/linux-initialization-5.md) 了解关于 `setup_data` 的更多内容,你也可以在 [Linux kernel memory management](/MM/) 中阅读到关于 `ioremap` and `memblock` 的更多内容)。 接下来我们来看看下面的条件语句: @@ -134,7 +134,7 @@ int __init acpi_mps_check(void) ``` `acpi_mps_check` 函数检查内置的 `MPS` 又称 [多重处理器规范]((http://en.wikipedia.org/wiki/MultiProcessor_Specification)) 表。如果设置了 ` CONFIG_X86_LOCAL_APIC` 但未设置 `CONFIG_x86_MPPAARSE` ,而且传递给内核的命令行选项中有 `acpi=off`、`acpi=noirq` 或者 `pci=noacpi` 参数,那么`acpi_mps_check` 函数就会输出警告信息。如果 `acpi_mps_check` 返回了1,这就表示我们禁用了本地 [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) -,而且 `setup_clear_cpu_cap` 宏清除了当前CPU中的 `X86_FEATURE_APIC` 位。(你可以阅读 [CPU masks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) 了解关于CPU mask的更多内容)。 +,而且 `setup_clear_cpu_cap` 宏清除了当前CPU中的 `X86_FEATURE_APIC` 位。(你可以阅读 [CPU masks](/Concepts/linux-cpu-2.md) 了解关于CPU mask的更多内容)。 早期的PCI转储 -------------------------------------------------------------------------------- @@ -200,7 +200,7 @@ for (bus = 0; bus < 256; bus++) { -------------------------------------------------------------------------------- -在 `early_dump_pci_devices` 函数后面,有一些与可用内存和[e820](http://en.wikipedia.org/wiki/E820)相关的函数,其中 [e820](http://en.wikipedia.org/wiki/E820) 的相关信息我们在 [内核安装的第一步](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html) 章节中整理过。 +在 `early_dump_pci_devices` 函数后面,有一些与可用内存和[e820](http://en.wikipedia.org/wiki/E820)相关的函数,其中 [e820](http://en.wikipedia.org/wiki/E820) 的相关信息我们在 [内核安装的第一步](/Booting/linux-bootstrap-2.md) 章节中整理过。 ```C /* update the e820_saved too */ e820_reserve_setup_data(); @@ -541,15 +541,14 @@ MEMBLOCK configuration: * [NX bit](http://en.wikipedia.org/wiki/NX_bit) * [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt) * [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) -* [CPU masks](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html) -* [Linux kernel memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) +* [CPU masks](/Concepts/linux-cpu-2.md) +* [Linux kernel memory management](/MM/index.md) * [PCI](http://en.wikipedia.org/wiki/Conventional_PCI) * [e820](http://en.wikipedia.org/wiki/E820) * [System Management BIOS](http://en.wikipedia.org/wiki/System_Management_BIOS) -* [System Management BIOS](http://en.wikipedia.org/wiki/System_Management_BIOS) * [EFI](http://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface) * [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) * [MultiProcessor Specification](http://www.intel.com/design/pentium/datashts/24201606.pdf) * [BSS](http://en.wikipedia.org/wiki/.bss) * [SMBIOS specification](http://www.dmtf.org/sites/default/files/standards/documents/DSP0134v2.5Final.pdf) -* [前一个章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-5.html) +* [前一个章节](/Initialization/linux-initialization-5.md) diff --git a/Initialization/linux-initialization-7.md b/Initialization/linux-initialization-7.md index 6c9de2e..993da13 100644 --- a/Initialization/linux-initialization-7.md +++ b/Initialization/linux-initialization-7.md @@ -4,7 +4,7 @@ Kernel initialization. Part 7. The End of the architecture-specific initialization, almost... ================================================================================ -This is the seventh part of the Linux Kernel initialization process which covers insides of the `setup_arch` function from the [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup.c#L861). As you can know from the previous [parts](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), the `setup_arch` function does some architecture-specific (in our case it is [x86_64](http://en.wikipedia.org/wiki/X86-64)) initialization stuff like reserving memory for kernel code/data/bss, early scanning of the [Desktop Management Interface](http://en.wikipedia.org/wiki/Desktop_Management_Interface), early dump of the [PCI](http://en.wikipedia.org/wiki/PCI) device and many many more. If you have read the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), you can remember that we've finished it at the `setup_real_mode` function. In the next step, as we set limit of the [memblock](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-1.html) to the all mapped pages, we can see the call of the `setup_log_buf` function from the [kernel/printk/printk.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/printk/printk.c). +This is the seventh part of the Linux Kernel initialization process which covers insides of the `setup_arch` function from the [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup.c#L861). As you can know from the previous [parts](/Initialization/linux-initialization-6.md), the `setup_arch` function does some architecture-specific (in our case it is [x86_64](http://en.wikipedia.org/wiki/X86-64)) initialization stuff like reserving memory for kernel code/data/bss, early scanning of the [Desktop Management Interface](http://en.wikipedia.org/wiki/Desktop_Management_Interface), early dump of the [PCI](http://en.wikipedia.org/wiki/PCI) device and many many more. If you have read the previous [part](/Initialization/linux-initialization-6.md), you can remember that we've finished it at the `setup_real_mode` function. In the next step, as we set limit of the [memblock](/MM/linux-mm-1.md) to the all mapped pages, we can see the call of the `setup_log_buf` function from the [kernel/printk/printk.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/printk/printk.c). The `setup_log_buf` function setups kernel cyclic buffer and its length depends on the `CONFIG_LOG_BUF_SHIFT` configuration option. As we can read from the documentation of the `CONFIG_LOG_BUF_SHIFT` it can be between `12` and `21`. In the insides, buffer defined as array of chars: @@ -32,7 +32,7 @@ setup_log_buf(1); where `1` means that it is early setup. In the next step we check `new_log_buf_len` variable which is updated length of the kernel log buffer and allocate new space for the buffer with the `memblock_virt_alloc` function for it, or just return. -As kernel log buffer is ready, the next function is `reserve_initrd`. You can remember that we already called the `early_reserve_initrd` function in the fourth part of the [Kernel initialization](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html). Now, as we reconstructed direct memory mapping in the `init_mem_mapping` function, we need to move [initrd](http://en.wikipedia.org/wiki/Initrd) into directly mapped memory. The `reserve_initrd` function starts from the definition of the base address and end address of the `initrd` and check that `initrd` is provided by a bootloader. All the same as what we saw in the `early_reserve_initrd`. But instead of the reserving place in the `memblock` area with the call of the `memblock_reserve` function, we get the mapped size of the direct memory area and check that the size of the `initrd` is not greater than this area with: +As kernel log buffer is ready, the next function is `reserve_initrd`. You can remember that we already called the `early_reserve_initrd` function in the fourth part of the [Kernel initialization](/Initialization/linux-initialization-4.md). Now, as we reconstructed direct memory mapping in the `init_mem_mapping` function, we need to move [initrd](http://en.wikipedia.org/wiki/Initrd) into directly mapped memory. The `reserve_initrd` function starts from the definition of the base address and end address of the `initrd` and check that `initrd` is provided by a bootloader. All the same as what we saw in the `early_reserve_initrd`. But instead of the reserving place in the `memblock` area with the call of the `memblock_reserve` function, we get the mapped size of the direct memory area and check that the size of the `initrd` is not greater than this area with: ```C mapped_size = memblock_mem_size(max_pfn_mapped); @@ -68,7 +68,7 @@ memblock_free(ramdisk_image, ramdisk_end - ramdisk_image); After we relocated `initrd` ramdisk image, the next function is `vsmp_init` from the [arch/x86/kernel/vsmp_64.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsmp_64.c). This function initializes support of the `ScaleMP vSMP`. As I already wrote in the previous parts, this chapter will not cover non-related `x86_64` initialization parts (for example as the current or `ACPI`, etc.). So we will skip implementation of this for now and will back to it in the part which cover techniques of parallel computing. -The next function is `io_delay_init` from the [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/io_delay.c). This function allows to override default I/O delay `0x80` port. We already saw I/O delay in the [Last preparation before transition into protected mode](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html), now let's look on the `io_delay_init` implementation: +The next function is `io_delay_init` from the [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/io_delay.c). This function allows to override default I/O delay `0x80` port. We already saw I/O delay in the [Last preparation before transition into protected mode](/Booting/linux-bootstrap-3.md), now let's look on the `io_delay_init` implementation: ```C void __init io_delay_init(void) @@ -98,7 +98,7 @@ We can see `io_delay` command line parameter setup with the `early_param` macro early_param("io_delay", io_delay_param); ``` -More about `early_param` you can read in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). So the `io_delay_param` function which setups `io_delay_override` variable will be called in the [do_early_param](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c#L413) function. `io_delay_param` function gets the argument of the `io_delay` kernel command line parameter and sets `io_delay_type` depends on it: +More about `early_param` you can read in the previous [part](/Initialization/linux-initialization-6.md). So the `io_delay_param` function which setups `io_delay_override` variable will be called in the [do_early_param](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c#L413) function. `io_delay_param` function gets the argument of the `io_delay` kernel command line parameter and sets `io_delay_type` depends on it: ```C static int __init io_delay_param(char *s) @@ -296,19 +296,19 @@ BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) != (unsigned long)VSYSCALL_ADDR); ``` -Now `vsyscall` area is in the `fix-mapped` area. That's all about `map_vsyscall`, if you do not know anything about fix-mapped addresses, you can read [Fix-Mapped Addresses and ioremap](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html). We will see more about `vsyscalls` in the `vsyscalls and vdso` part. +Now `vsyscall` area is in the `fix-mapped` area. That's all about `map_vsyscall`, if you do not know anything about fix-mapped addresses, you can read [Fix-Mapped Addresses and ioremap](/MM/linux-mm-2.md). We will see more about `vsyscalls` in the `vsyscalls and vdso` part. Getting the SMP configuration -------------------------------------------------------------------------------- -You may remember how we made a search of the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) configuration in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). Now we need to get the `SMP` configuration if we found it. For this we check `smp_found_config` variable which we set in the `smp_scan_config` function (read about it the previous part) and call the `get_smp_config` function: +You may remember how we made a search of the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) configuration in the previous [part](/Initialization/linux-initialization-6.md). Now we need to get the `SMP` configuration if we found it. For this we check `smp_found_config` variable which we set in the `smp_scan_config` function (read about it the previous part) and call the `get_smp_config` function: ```C if (smp_found_config) get_smp_config(); ``` -The `get_smp_config` expands to the `x86_init.mpparse.default_get_smp_config` function which is defined in the [arch/x86/kernel/mpparse.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/mpparse.c). This function defines a pointer to the multiprocessor floating pointer structure - `mpf_intel` (you can read about it in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)) and does some checks: +The `get_smp_config` expands to the `x86_init.mpparse.default_get_smp_config` function which is defined in the [arch/x86/kernel/mpparse.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/mpparse.c). This function defines a pointer to the multiprocessor floating pointer structure - `mpf_intel` (you can read about it in the previous [part](/Initialization/linux-initialization-6.md)) and does some checks: ```C struct mpf_intel *mpf = mpf_found; @@ -320,7 +320,7 @@ if (acpi_lapic && early) return; ``` -Here we can see that multiprocessor configuration was found in the `smp_scan_config` function or just return from the function if not. The next check is `acpi_lapic` and `early`. And as we did this checks, we start to read the `SMP` configuration. As we finished reading it, the next step is - `prefill_possible_map` function which makes preliminary filling of the possible CPU's `cpumask` (more about it you can read in the [Introduction to the cpumasks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html)). +Here we can see that multiprocessor configuration was found in the `smp_scan_config` function or just return from the function if not. The next check is `acpi_lapic` and `early`. And as we did this checks, we start to read the `SMP` configuration. As we finished reading it, the next step is - `prefill_possible_map` function which makes preliminary filling of the possible CPU's `cpumask` (more about it you can read in the [Introduction to the cpumasks](/Concepts/linux-cpu-2.md)). The rest of the setup_arch -------------------------------------------------------------------------------- @@ -334,7 +334,7 @@ That's all, and now we can back to the `start_kernel` from the `setup_arch`. Back to the main.c ================================================================================ -As I wrote above, we have finished with the `setup_arch` function and now we can back to the `start_kernel` function from the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c). As you may remember or saw yourself, `start_kernel` function as big as the `setup_arch`. So the couple of the next part will be dedicated to learning of this function. So, let's continue with it. After the `setup_arch` we can see the call of the `mm_init_cpumask` function. This function sets the [cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) pointer to the memory descriptor `cpumask`. We can look on its implementation: +As I wrote above, we have finished with the `setup_arch` function and now we can back to the `start_kernel` function from the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c). As you may remember or saw yourself, `start_kernel` function as big as the `setup_arch`. So the couple of the next part will be dedicated to learning of this function. So, let's continue with it. After the `setup_arch` we can see the call of the `mm_init_cpumask` function. This function sets the [cpumask](/Concepts/linux-cpu-2.md) pointer to the memory descriptor `cpumask`. We can look on its implementation: ```C static inline void mm_init_cpumask(struct mm_struct *mm) @@ -379,7 +379,7 @@ static void __init setup_command_line(char *command_line) Here we can see that we allocate space for the three buffers which will contain kernel command line for the different purposes (read above). And as we allocated space, we store `boot_command_line` in the `saved_command_line` and `command_line` (kernel command line from the `setup_arch`) to the `static_command_line`. -The next function after the `setup_command_line` is the `setup_nr_cpu_ids`. This function setting `nr_cpu_ids` (number of CPUs) according to the last bit in the `cpu_possible_mask` (more about it you can read in the chapter describes [cpumasks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) concept). Let's look on its implementation: +The next function after the `setup_command_line` is the `setup_nr_cpu_ids`. This function setting `nr_cpu_ids` (number of CPUs) according to the last bit in the `cpu_possible_mask` (more about it you can read in the chapter describes [cpumasks](/Concepts/linux-cpu-2.md) concept). Let's look on its implementation: ```C void __init setup_nr_cpu_ids(void) @@ -479,4 +479,4 @@ Links * [vsyscalls](https://lwn.net/Articles/446528/) * [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) * [jiffy](http://en.wikipedia.org/wiki/Jiffy_%28time%29) -* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html) \ No newline at end of file +* [Previous part](/Initialization/linux-initialization-6.md) \ No newline at end of file diff --git a/Initialization/linux-initialization-8.md b/Initialization/linux-initialization-8.md index 64ccb7d..9cd50e9 100644 --- a/Initialization/linux-initialization-8.md +++ b/Initialization/linux-initialization-8.md @@ -4,7 +4,7 @@ Kernel initialization. Part 8. Scheduler initialization ================================================================================ -This is the eighth [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) of the Linux kernel initialization process and we stopped on the `setup_nr_cpu_ids` function in the [previous](https://github.com/hust-open-atom-club/linux-insides-zh/blob/master/Initialization/linux-initialization-7.md) part. The main point of the current part is [scheduler](http://en.wikipedia.org/wiki/Scheduling_%28computing%29) initialization. But before we will start to learn initialization process of the scheduler, we need to do some stuff. The next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the `setup_per_cpu_areas` function. This function setups areas for the `percpu` variables, more about it you can read in the special part about the [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html). After `percpu` areas is up and running, the next step is the `smp_prepare_boot_cpu` function. This function does some preparations for the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing): +This is the eighth [part](/Initialization/) of the Linux kernel initialization process and we stopped on the `setup_nr_cpu_ids` function in the [previous](https://github.com/hust-open-atom-club/linux-insides-zh/blob/master/Initialization/linux-initialization-7.md) part. The main point of the current part is [scheduler](http://en.wikipedia.org/wiki/Scheduling_%28computing%29) initialization. But before we will start to learn initialization process of the scheduler, we need to do some stuff. The next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the `setup_per_cpu_areas` function. This function setups areas for the `percpu` variables, more about it you can read in the special part about the [Per-CPU variables](/Concepts/linux-cpu-1.md). After `percpu` areas is up and running, the next step is the `smp_prepare_boot_cpu` function. This function does some preparations for the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing): ```C static inline void smp_prepare_boot_cpu(void) @@ -25,7 +25,7 @@ void __init native_smp_prepare_boot_cpu(void) } ``` -The `native_smp_prepare_boot_cpu` function gets the id of the current CPU (which is Bootstrap processor and its `id` is zero) with the `smp_processor_id` function. I will not explain how the `smp_processor_id` works, because we already saw it in the [Kernel entry point](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html) part. As we got processor `id` number we reload [Global Descriptor Table](http://en.wikipedia.org/wiki/Global_Descriptor_Table) for the given CPU with the `switch_to_new_gdt` function: +The `native_smp_prepare_boot_cpu` function gets the id of the current CPU (which is Bootstrap processor and its `id` is zero) with the `smp_processor_id` function. I will not explain how the `smp_processor_id` works, because we already saw it in the [Kernel entry point](/Initialization/linux-initialization-4.md) part. As we got processor `id` number we reload [Global Descriptor Table](http://en.wikipedia.org/wiki/Global_Descriptor_Table) for the given CPU with the `switch_to_new_gdt` function: ```C void switch_to_new_gdt(int cpu) @@ -39,7 +39,7 @@ void switch_to_new_gdt(int cpu) } ``` -The `gdt_descr` variable represents pointer to the `GDT` descriptor here (we already saw `desc_ptr` in the [Early interrupt and exception handling](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html)). We get the address and the size of the `GDT` descriptor where `GDT_SIZE` is `256` or: +The `gdt_descr` variable represents pointer to the `GDT` descriptor here (we already saw `desc_ptr` in the [Early interrupt and exception handling](/Initialization/linux-initialization-2.md)). We get the address and the size of the `GDT` descriptor where `GDT_SIZE` is `256` or: ```C #define GDT_SIZE (GDT_ENTRIES * 8) @@ -54,7 +54,7 @@ static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu) } ``` -The `get_cpu_gdt_table` uses `per_cpu` macro for getting `gdt_page` percpu variable for the given CPU number (bootstrap processor with `id` - 0 in our case). You may ask the following question: so, if we can access `gdt_page` percpu variable, where it was defined? Actually we already saw it in this book. If you have read the first [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) of this chapter, you can remember that we saw definition of the `gdt_page` in the [arch/x86/kernel/head_64.S](https://github.com/0xAX/linux/blob/master/arch/x86/kernel/head_64.S): +The `get_cpu_gdt_table` uses `per_cpu` macro for getting `gdt_page` percpu variable for the given CPU number (bootstrap processor with `id` - 0 in our case). You may ask the following question: so, if we can access `gdt_page` percpu variable, where it was defined? Actually we already saw it in this book. If you have read the first [part](/Initialization/linux-initialization-1.md) of this chapter, you can remember that we saw definition of the `gdt_page` in the [arch/x86/kernel/head_64.S](https://github.com/0xAX/linux/blob/master/arch/x86/kernel/head_64.S): ```assembly early_gdt_descr: @@ -86,7 +86,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = { ... ``` -more about `percpu` variables you can read in the [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) part. As we got address and size of the `GDT` descriptor we reload `GDT` with the `load_gdt` which just execute `lgdt` instruct and load `percpu_segment` with the following function: +more about `percpu` variables you can read in the [Per-CPU variables](/Concepts/linux-cpu-1.md) part. As we got address and size of the `GDT` descriptor we reload `GDT` with the `load_gdt` which just execute `lgdt` instruct and load `percpu_segment` with the following function: ```C void load_percpu_segment(int cpu) { @@ -180,11 +180,11 @@ After this we can see the kernel command line in the initialization output: ![kernel command line](http://oi58.tinypic.com/2m7vz10.jpg) -And a couple of functions such as `parse_early_param` and `parse_args` which handles linux kernel command line. You may remember that we already saw the call of the `parse_early_param` function in the sixth [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html) of the kernel initialization chapter, so why we call it again? Answer is simple: we call this function in the architecture-specific code (`x86_64` in our case), but not all architecture calls this function. And we need to call the second function `parse_args` to parse and handle non-early command line arguments. +And a couple of functions such as `parse_early_param` and `parse_args` which handles linux kernel command line. You may remember that we already saw the call of the `parse_early_param` function in the sixth [part](/Initialization/linux-initialization-6.md) of the kernel initialization chapter, so why we call it again? Answer is simple: we call this function in the architecture-specific code (`x86_64` in our case), but not all architecture calls this function. And we need to call the second function `parse_args` to parse and handle non-early command line arguments. In the next step we can see the call of the `jump_label_init` from the [kernel/jump_label.c](https://github.com/torvalds/linux/blob/master/kernel/jump_label.c). and initializes [jump label](https://lwn.net/Articles/412072/). -After this we can see the call of the `setup_log_buf` function which setups the [printk](http://www.makelinux.net/books/lkd2/ch18lev1sec3) log buffer. We already saw this function in the seventh [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-7.html) of the linux kernel initialization process chapter. +After this we can see the call of the `setup_log_buf` function which setups the [printk](http://www.makelinux.net/books/lkd2/ch18lev1sec3) log buffer. We already saw this function in the seventh [part](/Initialization/linux-initialization-7.md) of the linux kernel initialization process chapter. PID hash initialization -------------------------------------------------------------------------------- @@ -205,7 +205,7 @@ pid_hash = alloc_large_system_hash("PID", sizeof(*pid_hash), 0, 18, ``` The number of elements of the `pid_hash` depends on the `RAM` configuration, but it can be between `2^4` and `2^12`. The `pidhash_init` computes the size -and allocates the required storage (which is `hlist` in our case - the same as [doubly linked list](https://xinqiu.gitbooks.io/linux-insides-cn/content/DataStructures/linux-datastructures-1.html), but contains one pointer instead on the [struct hlist_head](https://github.com/torvalds/linux/blob/master/include/linux/types.h)]. The `alloc_large_system_hash` function allocates a large system hash table with `memblock_virt_alloc_nopanic` if we pass `HASH_EARLY` flag (as it in our case) or with `__vmalloc` if we did no pass this flag. +and allocates the required storage (which is `hlist` in our case - the same as [doubly linked list](/DataStructures/linux-datastructures-1.md), but contains one pointer instead on the [struct hlist_head](https://github.com/torvalds/linux/blob/master/include/linux/types.h)]. The `alloc_large_system_hash` function allocates a large system hash table with `memblock_virt_alloc_nopanic` if we pass `HASH_EARLY` flag (as it in our case) or with `__vmalloc` if we did no pass this flag. The result we can see in the `dmesg` output: @@ -230,7 +230,7 @@ pgtable_init(); vmalloc_init(); ``` -The first is `page_ext_init_flatmem` which depends on the `CONFIG_SPARSEMEM` kernel configuration option and initializes extended data per page handling. The `mem_init` releases all `bootmem`, the `kmem_cache_init` initializes kernel cache, the `percpu_init_late` - replaces `percpu` chunks with those allocated by [slub](http://en.wikipedia.org/wiki/SLUB_%28software%29), the `pgtable_init` - initializes the `page->ptl` kernel cache, the `vmalloc_init` - initializes `vmalloc`. Please, **NOTE** that we will not dive into details about all of these functions and concepts, but we will see all of they it in the [Linux kernel memory manager](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) chapter. +The first is `page_ext_init_flatmem` which depends on the `CONFIG_SPARSEMEM` kernel configuration option and initializes extended data per page handling. The `mem_init` releases all `bootmem`, the `kmem_cache_init` initializes kernel cache, the `percpu_init_late` - replaces `percpu` chunks with those allocated by [slub](http://en.wikipedia.org/wiki/SLUB_%28software%29), the `pgtable_init` - initializes the `page->ptl` kernel cache, the `vmalloc_init` - initializes `vmalloc`. Please, **NOTE** that we will not dive into details about all of these functions and concepts, but we will see all of they it in the [Linux kernel memory manager](/MM/) chapter. That's all. Now we can look on the `scheduler`. @@ -290,7 +290,7 @@ The root task group is the task group which belongs to every task in system. As DECLARE_PER_CPU(cpumask_var_t, load_balance_mask); ``` -Here `cpumask_var_t` is the `cpumask_t` with one difference: `cpumask_var_t` is allocated only `nr_cpu_ids` bits when the `cpumask_t` always has `NR_CPUS` bits (more about `cpumask` you can read in the [CPU masks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) part). As you can see: +Here `cpumask_var_t` is the `cpumask_t` with one difference: `cpumask_var_t` is allocated only `nr_cpu_ids` bits when the `cpumask_t` always has `NR_CPUS` bits (more about `cpumask` you can read in the [CPU masks](/Concepts/linux-cpu-2.md) part). As you can see: ```C #ifdef CONFIG_CPUMASK_OFFSTACK @@ -461,19 +461,19 @@ If you have any questions or suggestions write me a comment or ping me at [twitt Links -------------------------------------------------------------------------------- -* [CPU masks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) +* [CPU masks](/Concepts/linux-cpu-2.md) * [high-resolution kernel timer](https://www.kernel.org/doc/Documentation/timers/hrtimers.txt) * [spinlock](http://en.wikipedia.org/wiki/Spinlock) * [Run queue](http://en.wikipedia.org/wiki/Run_queue) -* [Linux kernem memory manager](http://0xax.gitbooks.io/linux-insides/content/MM/index.html) +* [Linux kernem memory manager](/MM/) * [slub](http://en.wikipedia.org/wiki/SLUB_%28software%29) * [virtual file system](http://en.wikipedia.org/wiki/Virtual_file_system) * [Linux kernel hotplug documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) * [IRQ](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) * [Global Descriptor Table](http://en.wikipedia.org/wiki/Global_Descriptor_Table) -* [Per-CPU variables](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html) +* [Per-CPU variables](/Concepts/linux-cpu-1.md) * [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) * [RCU](http://en.wikipedia.org/wiki/Read-copy-update) * [CFS Scheduler documentation](https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt) * [Real-Time group scheduling](https://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt) -* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-7.html) +* [Previous part](/Initialization/linux-initialization-7.md) diff --git a/Initialization/linux-initialization-9.md b/Initialization/linux-initialization-9.md index ac89678..4a697ed 100644 --- a/Initialization/linux-initialization-9.md +++ b/Initialization/linux-initialization-9.md @@ -4,7 +4,7 @@ Kernel initialization. Part 9. RCU initialization ================================================================================ -This is ninth part of the [Linux Kernel initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) and in the previous part we stopped at the [scheduler initialization](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-8.html). In this part we will continue to dive to the linux kernel initialization process and the main purpose of this part will be to learn about initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). We can see that the next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) after the `sched_init` is the call of the `preempt_disable`. There are two macros: +This is ninth part of the [Linux Kernel initialization process](/Initialization/) and in the previous part we stopped at the [scheduler initialization](/Initialization/linux-initialization-8.md). In this part we will continue to dive to the linux kernel initialization process and the main purpose of this part will be to learn about initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). We can see that the next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) after the `sched_init` is the call of the `preempt_disable`. There are two macros: * `preempt_disable` * `preempt_enable` @@ -38,7 +38,7 @@ In the first implementation of the `preempt_disable` we increment this `__preemp #define preempt_count_add(val) __preempt_count_add(val) ``` -where `preempt_count_add` calls the `raw_cpu_add_4` macro which adds `1` to the given `percpu` variable (`__preempt_count`) in our case (more about `precpu` variables you can read in the part about [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html)). Ok, we increased `__preempt_count` and the next step we can see the call of the `barrier` macro in the both macros. The `barrier` macro inserts an optimization barrier. In the processors with `x86_64` architecture independent memory access operations can be performed in any order. That's why we need the opportunity to point compiler and processor on compliance of order. This mechanism is memory barrier. Let's consider a simple example: +where `preempt_count_add` calls the `raw_cpu_add_4` macro which adds `1` to the given `percpu` variable (`__preempt_count`) in our case (more about `precpu` variables you can read in the part about [Per-CPU variables](/Concepts/linux-cpu-1.md)). Ok, we increased `__preempt_count` and the next step we can see the call of the `barrier` macro in the both macros. The `barrier` macro inserts an optimization barrier. In the processors with `x86_64` architecture independent memory access operations can be performed in any order. That's why we need the opportunity to point compiler and processor on compliance of order. This mechanism is memory barrier. Let's consider a simple example: ```C preempt_disable(); @@ -83,7 +83,7 @@ void __init idr_init_cache(void) } ``` -Here we can see the call of the `kmem_cache_create`. We already called the `kmem_cache_init` in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L485). This function create generalized caches again using the `kmem_cache_alloc` (more about caches we will see in the [Linux kernel memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) chapter). In our case, as we are using `kmem_cache_t` which will be used by the [slab](http://en.wikipedia.org/wiki/Slab_allocation) allocator and `kmem_cache_create` creates it. As you can see we pass five parameters to the `kmem_cache_create`: +Here we can see the call of the `kmem_cache_create`. We already called the `kmem_cache_init` in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L485). This function create generalized caches again using the `kmem_cache_alloc` (more about caches we will see in the [Linux kernel memory management](/MM/) chapter). In our case, as we are using `kmem_cache_t` which will be used by the [slab](http://en.wikipedia.org/wiki/Slab_allocation) allocator and `kmem_cache_create` creates it. As you can see we pass five parameters to the `kmem_cache_create`: * name of the cache; * size of the object to store in cache; @@ -127,7 +127,7 @@ The next step is [RCU](http://en.wikipedia.org/wiki/Read-copy-update) initializa In the first case `rcu_init` will be in the [kernel/rcu/tiny.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tiny.c) and in the second case it will be defined in the [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.c). We will see the implementation of the `tree rcu`, but first of all about the `RCU` in general. -`RCU` or read-copy update is a scalable high-performance synchronization mechanism implemented in the Linux kernel. On the early stage the linux kernel provided support and environment for the concurrently running applications, but all execution was serialized in the kernel using a single global lock. In our days linux kernel has no single global lock, but provides different mechanisms including [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure), [percpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) data structures and other. One of these mechanisms is - the `read-copy update`. The `RCU` technique is designed for rarely-modified data structures. The idea of the `RCU` is simple. For example we have a rarely-modified data structure. If somebody wants to change this data structure, we make a copy of this data structure and make all changes in the copy. In the same time all other users of the data structure use old version of it. Next, we need to choose safe moment when original version of the data structure will have no users and update it with the modified copy. +`RCU` or read-copy update is a scalable high-performance synchronization mechanism implemented in the Linux kernel. On the early stage the linux kernel provided support and environment for the concurrently running applications, but all execution was serialized in the kernel using a single global lock. In our days linux kernel has no single global lock, but provides different mechanisms including [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure), [percpu](/Concepts/linux-cpu-1.md) data structures and other. One of these mechanisms is - the `read-copy update`. The `RCU` technique is designed for rarely-modified data structures. The idea of the `RCU` is simple. For example we have a rarely-modified data structure. If somebody wants to change this data structure, we make a copy of this data structure and make all changes in the copy. In the same time all other users of the data structure use old version of it. Next, we need to choose safe moment when original version of the data structure will have no users and update it with the modified copy. Of course this description of the `RCU` is very simplified. To understand some details about `RCU`, first of all we need to learn some terminology. Data readers in the `RCU` executed in the [critical section](http://en.wikipedia.org/wiki/Critical_section). Every time when data reader get to the critical section, it calls the `rcu_read_lock`, and `rcu_read_unlock` on exit from the critical section. If the thread is not in the critical section, it will be in state which called - `quiescent state`. The moment when every thread is in the `quiescent state` called - `grace period`. If a thread wants to remove an element from the data structure, this occurs in two steps. First step is `removal` - atomically removes element from the data structure, but does not release the physical memory. After this thread-writer announces and waits until it is finished. From this moment, the removed element is available to the thread-readers. After the `grace period` finished, the second step of the element removal will be started, it just removes the element from the physical memory. @@ -378,7 +378,7 @@ Ok, we already passed the main theme of this part which is `RCU` initialization, After we initialized `RCU`, the next step which you can see in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the - `trace_init` function. As you can understand from its name, this function initialize [tracing](http://en.wikipedia.org/wiki/Tracing_%28software%29) subsystem. You can read more about linux kernel trace system - [here](http://elinux.org/Kernel_Trace_Systems). -After the `trace_init`, we can see the call of the `radix_tree_init`. If you are familiar with the different data structures, you can understand from the name of this function that it initializes kernel implementation of the [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). This function is defined in the [lib/radix-tree.c](https://github.com/torvalds/linux/blob/master/lib/radix-tree.c) and you can read more about it in the part about [Radix tree](https://xinqiu.gitbooks.io/linux-insides-cn/content/DataStructures/linux-datastructures-2.html). +After the `trace_init`, we can see the call of the `radix_tree_init`. If you are familiar with the different data structures, you can understand from the name of this function that it initializes kernel implementation of the [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). This function is defined in the [lib/radix-tree.c](https://github.com/torvalds/linux/blob/master/lib/radix-tree.c) and you can read more about it in the part about [Radix tree](/DataStructures/linux-datastructures-2.md). In the next step we can see the functions which are related to the `interrupts handling` subsystem, they are: @@ -394,18 +394,18 @@ The next couple of functions are related with the [perf](https://perf.wiki.kerne local_irq_enable(); ``` -which expands to the `sti` instruction and making post initialization of the [SLAB](http://en.wikipedia.org/wiki/Slab_allocation) with the call of the `kmem_cache_init_late` function (As I wrote above we will know about the `SLAB` in the [Linux memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) chapter). +which expands to the `sti` instruction and making post initialization of the [SLAB](http://en.wikipedia.org/wiki/Slab_allocation) with the call of the `kmem_cache_init_late` function (As I wrote above we will know about the `SLAB` in the [Linux memory management](/MM/) chapter). After the post initialization of the `SLAB`, next point is initialization of the console with the `console_init` function from the [drivers/tty/tty_io.c](https://github.com/torvalds/linux/blob/master/drivers/tty/tty_io.c). After the console initialization, we can see the `lockdep_info` function which prints information about the [Lock dependency validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt). After this, we can see the initialization of the dynamic allocation of the `debug objects` with the `debug_objects_mem_init`, kernel memory leak [detector](https://www.kernel.org/doc/Documentation/kmemleak.txt) initialization with the `kmemleak_init`, `percpu` pageset setup with the `setup_per_cpu_pageset`, setup of the [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access) policy with the `numa_policy_init`, setting time for the scheduler with the `sched_clock_init`, `pidmap` initialization with the call of the `pidmap_init` function for the initial `PID` namespace, cache creation with the `anon_vma_init` for the private virtual memory areas and early initialization of the [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface) with the `acpi_early_init`. -This is the end of the ninth part of the [linux kernel initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) and here we saw initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). In the last paragraph of this part (`Rest of the initialization process`) we will go through many functions but did not dive into details about their implementations. Do not worry if you do not know anything about these stuff or you know and do not understand anything about this. As I already wrote many times, we will see details of implementations in other parts or other chapters. +This is the end of the ninth part of the [linux kernel initialization process](/Initialization/) and here we saw initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). In the last paragraph of this part (`Rest of the initialization process`) we will go through many functions but did not dive into details about their implementations. Do not worry if you do not know anything about these stuff or you know and do not understand anything about this. As I already wrote many times, we will see details of implementations in other parts or other chapters. Conclusion -------------------------------------------------------------------------------- -It is the end of the ninth part about the linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html). In this part, we looked on the initialization process of the `RCU` subsystem. In the next part we will continue to dive into linux kernel initialization process and I hope that we will finish with the `start_kernel` function and will go to the `rest_init` function from the same [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file and will see the start of the first process. +It is the end of the ninth part about the linux kernel [initialization process](/Initialization/). In this part, we looked on the initialization process of the `RCU` subsystem. In the next part we will continue to dive into linux kernel initialization process and I hope that we will finish with the `start_kernel` function and will go to the `rest_init` function from the same [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file and will see the start of the first process. If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX). @@ -423,8 +423,8 @@ Links * [integer ID management](https://lwn.net/Articles/103209/) * [Documentation/memory-barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt) * [Runtime locking correctness validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt) -* [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) -* [Linux kernel memory management](http://0xax.gitbooks.io/linux-insides/content/MM/index.html) +* [Per-CPU variables](/Concepts/linux-cpu-1.md) +* [Linux kernel memory management](/MM/) * [slab](http://en.wikipedia.org/wiki/Slab_allocation) * [i2c](http://en.wikipedia.org/wiki/I%C2%B2C) -* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-8.html) +* [Previous part](/Initialization/linux-initialization-8.md) diff --git a/Interrupts/linux-interrupts-1.md b/Interrupts/linux-interrupts-1.md index f6655f9..4b395f0 100644 --- a/Interrupts/linux-interrupts-1.md +++ b/Interrupts/linux-interrupts-1.md @@ -4,7 +4,7 @@ Introduction -------------------------------------------------------------------------------- -这是 [linux 内核揭秘](http://xinqiu.gitbooks.io/linux-insides-cn/content/) 这本书最新章节的第一部分。我们已经在这本书前面的[章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html)中走过了漫长的道路。从内核初始化的[第一步](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html)开始,结束于第一个 `init` 程序的[启动](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-10.html)。我们见证了一系列与各种内核子系统相关的初始化步骤,但是我们并没有深入这些子系统。在这一章中,我们将会试着去了解这些内核子系统是如何工作和实现的。就像你在这章标题中看到的,第一个子系统是[中断(interrupts)](http://en.wikipedia.org/wiki/Interrupt)。 +这是 [linux 内核揭秘](/) 这本书最新章节的第一部分。我们已经在这本书前面的[章节](/Initialization/)中走过了漫长的道路。从内核初始化的[第一步](/Initialization/linux-initialization-1.md)开始,结束于第一个 `init` 程序的[启动](/Initialization/linux-initialization-10.md)。我们见证了一系列与各种内核子系统相关的初始化步骤,但是我们并没有深入这些子系统。在这一章中,我们将会试着去了解这些内核子系统是如何工作和实现的。就像你在这章标题中看到的,第一个子系统是[中断(interrupts)](http://en.wikipedia.org/wiki/Interrupt)。 什么是中断? -------------------------------------------------------------------------------- @@ -37,7 +37,7 @@ Introduction BUG_ON((unsigned)n > 0xFF); ``` -你可以在 Linux 内核源码中关于中断设置的地方找到这个定义(例如:`set_intr_gate`, `void set_system_intr_gate` 在 [arch/x86/include/asm/desc.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/desc.h)中)。从 `0` 到 `31` 的 32 个中断标识码被处理器保留,用作处理架构定义的异常和中断。你可以在 Linux 内核初始化程序的第二部分 - [早期中断和异常处理](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html)中找到这个表和关于这些中断标识码的描述。从 `32` 到 `255` 的中断标识码设计为用户定义中断并且不被系统保留。这些中断通常分配给外部 I/O 设备,使这些设备可以发送中断给处理器。 +你可以在 Linux 内核源码中关于中断设置的地方找到这个定义(例如:`set_intr_gate`, `void set_system_intr_gate` 在 [arch/x86/include/asm/desc.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/desc.h)中)。从 `0` 到 `31` 的 32 个中断标识码被处理器保留,用作处理架构定义的异常和中断。你可以在 Linux 内核初始化程序的第二部分 - [早期中断和异常处理](/Initialization/linux-initialization-2.md)中找到这个表和关于这些中断标识码的描述。从 `32` 到 `255` 的中断标识码设计为用户定义中断并且不被系统保留。这些中断通常分配给外部 I/O 设备,使这些设备可以发送中断给处理器。 现在,我们来讨论中断的类型。笼统地来讲,我们可以把中断分为两个主要类型: @@ -58,7 +58,7 @@ BUG_ON((unsigned)n > 0xFF); 最后的 `终止` 是一个从不报告引起异常的精确指令的异常,并且不允许被中断的程序继续执行。 -我们已经从前面的[部分](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html)知道,中断可以分为 `可屏蔽的(maskable)` 和 `不可屏蔽的(non-maskable)`。可屏蔽的中断可以被阻塞,使用 `x86_64` 的指令 - `sti` 和 `cli`。我们可以在 Linux 内核代码中找到他们: +我们已经从前面的[部分](/Booting/linux-bootstrap-3.md)知道,中断可以分为 `可屏蔽的(maskable)` 和 `不可屏蔽的(non-maskable)`。可屏蔽的中断可以被阻塞,使用 `x86_64` 的指令 - `sti` 和 `cli`。我们可以在 Linux 内核代码中找到他们: ```C static inline void native_irq_disable(void) @@ -135,13 +135,13 @@ static inline void native_irq_enable(void) +--------------+-------------------------------------------------+ ``` -现在我们了解了一些关于各种类型的中断和异常的内容,是时候转到更实用的部分了。我们从 `中断描述符表(IDT)` 开始。就如之前所提到的,`IDT` 保存了中断和异常处理程序的入口指针。`IDT` 是一个类似于 `全局描述符表(Global Descriptor Table)`的结构,我们在[内核启动程序](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html)的第二部分已经介绍过。但是他们确实有一些不同,`IDT` 的表项被称为 `门(gates)`,而不是 `描述符(descriptors)`。它可以包含下面的一种: +现在我们了解了一些关于各种类型的中断和异常的内容,是时候转到更实用的部分了。我们从 `中断描述符表(IDT)` 开始。就如之前所提到的,`IDT` 保存了中断和异常处理程序的入口指针。`IDT` 是一个类似于 `全局描述符表(Global Descriptor Table)`的结构,我们在[内核启动程序](/Booting/linux-bootstrap-2.md)的第二部分已经介绍过。但是他们确实有一些不同,`IDT` 的表项被称为 `门(gates)`,而不是 `描述符(descriptors)`。它可以包含下面的一种: * 中断门(Interrupt gates) * 任务门(Task gates) * 陷阱门(Trap gates) -在 `x86` 架构中,只有 [long mode](http://en.wikipedia.org/wiki/Long_mode) 中断门和陷阱门可以在 `x86_64` 中引用。就像 `全局描述符表`,`中断描述符表` 在 `x86` 上是一个 8 字节数组门,而在 `x86_64` 上是一个 16 字节数组门。让我们回忆在[内核启动程序](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html)的第二部分,`全局描述符表` 必须包含 `NULL` 描述符作为它的第一个元素。与 `全局描述符表` 不一样的是,`中断描述符表` 的第一个元素可以是一个门。它并不是强制要求的。比如,你可能还记得我们只是在早期的章节中过渡到[保护模式](http://en.wikipedia.org/wiki/Protected_mode)时用 `NULL` 门加载过中断描述符表: +在 `x86` 架构中,只有 [long mode](http://en.wikipedia.org/wiki/Long_mode) 中断门和陷阱门可以在 `x86_64` 中引用。就像 `全局描述符表`,`中断描述符表` 在 `x86` 上是一个 8 字节数组门,而在 `x86_64` 上是一个 16 字节数组门。让我们回忆在[内核启动程序](/Booting/linux-bootstrap-2.md)的第二部分,`全局描述符表` 必须包含 `NULL` 描述符作为它的第一个元素。与 `全局描述符表` 不一样的是,`中断描述符表` 的第一个元素可以是一个门。它并不是强制要求的。比如,你可能还记得我们只是在早期的章节中过渡到[保护模式](http://en.wikipedia.org/wiki/Protected_mode)时用 `NULL` 门加载过中断描述符表: ```C /* @@ -284,7 +284,7 @@ struct gate_struct64 { #endif ``` -`KASan` 是一个运行时内存[调试器](http://lwn.net/Articles/618180/)。所以,如果 `CONFIG_KASAN` 被禁用,`THREAD_SIZE` 是 `16384` ;如果内核配置选项打开,`THREAD_SIZE` 的值是 `32768`。这块栈空间保存着有用的数据,只要线程是活动状态或者僵尸状态。但是当线程在用户空间的时候,这个内核栈是空的,除非 `thread_info` 结构(关于这个结构的详细信息在 Linux 内核初始程序的第四[部分](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html))在这个栈空间的底部。活动的或者僵尸线程并不是在他们栈中的唯一的线程,与每一个 CPU 关联的特殊栈也存在于这个空间。当内核在这个 CPU 上执行代码的时候,这些栈处于活动状态;当在这个 CPU 上执行用户空间代码时,这些栈不包含任何有用的信息。每一个 CPU 也有一个特殊的 per-cpu 栈。首先是给外部中断使用的 `中断栈(interrupt stack)`。它的大小定义如下: +`KASan` 是一个运行时内存[调试器](http://lwn.net/Articles/618180/)。所以,如果 `CONFIG_KASAN` 被禁用,`THREAD_SIZE` 是 `16384` ;如果内核配置选项打开,`THREAD_SIZE` 的值是 `32768`。这块栈空间保存着有用的数据,只要线程是活动状态或者僵尸状态。但是当线程在用户空间的时候,这个内核栈是空的,除非 `thread_info` 结构(关于这个结构的详细信息在 Linux 内核初始程序的第四[部分](/Initialization/linux-initialization-4.md))在这个栈空间的底部。活动的或者僵尸线程并不是在他们栈中的唯一的线程,与每一个 CPU 关联的特殊栈也存在于这个空间。当内核在这个 CPU 上执行代码的时候,这些栈处于活动状态;当在这个 CPU 上执行用户空间代码时,这些栈不包含任何有用的信息。每一个 CPU 也有一个特殊的 per-cpu 栈。首先是给外部中断使用的 `中断栈(interrupt stack)`。它的大小定义如下: ```C #define IRQ_STACK_ORDER (2 + KASAN_STACK_ORDER) @@ -306,7 +306,7 @@ union irq_stack_union { 第一个 `irq_stack` 域是一个 16KB 的数组。然后你可以看到 `irq_stack_union` 联合包含了一个结构体,这个结构体有两个域: -* `gs_base` - 总是指向 `irqstack` 联合底部的 `gs` 寄存器。在 `x86_64` 中, per-cpu(更多关于 `per-cpu` 变量的信息可以阅读特定的[章节](http://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html)) 和 stack canary 共享 `gs` 寄存器。所有的 per-cpu 标志初始值为零,并且 `gs` 指向 per-cpu 区域的开始。你已经知道[段内存模式](http://en.wikipedia.org/wiki/Memory_segmentation)已经废除很长时间了,但是我们可以使用[特殊模块寄存器(Model specific registers)](http://en.wikipedia.org/wiki/Model-specific_register)给这两个段寄存器 - `fs` 和 `gs` 设置基址,并且这些寄存器仍然可以被用作地址寄存器。如果你记得 Linux 内核初始程序的第一[部分](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html),你会记起我们设置了 `gs` 寄存器: +* `gs_base` - 总是指向 `irqstack` 联合底部的 `gs` 寄存器。在 `x86_64` 中, per-cpu(更多关于 `per-cpu` 变量的信息可以阅读特定的[章节](/Concepts/linux-cpu-1.md)) 和 stack canary 共享 `gs` 寄存器。所有的 per-cpu 标志初始值为零,并且 `gs` 指向 per-cpu 区域的开始。你已经知道[段内存模式](http://en.wikipedia.org/wiki/Memory_segmentation)已经废除很长时间了,但是我们可以使用[特殊模块寄存器(Model specific registers)](http://en.wikipedia.org/wiki/Model-specific_register)给这两个段寄存器 - `fs` 和 `gs` 设置基址,并且这些寄存器仍然可以被用作地址寄存器。如果你记得 Linux 内核初始程序的第一[部分](/Initialization/linux-initialization-1.md),你会记起我们设置了 `gs` 寄存器: ```assembly movl $MSR_GS_BASE,%ecx @@ -478,7 +478,7 @@ asmlinkage void double_fault(void); 如果你有任何问题或建议,请给我发评论或者给我发 [Twitter](https://twitter.com/0xAX)。 -**请注意英语并不是我的母语,我为任何表达不清楚的地方感到抱歉。如果你发现任何错误请发 PR 到 [linux-insides](https://github.com/hust-open-atom-club/linux-insides-zh)。(译者注:翻译问题请发 PR 到 [linux-insides-cn](https://www.gitbook.com/book/xinqiu/linux-insides-cn))** +**请注意英语并不是我的母语,我为任何表达不清楚的地方感到抱歉。如果你发现任何错误请发 PR 到 [linux-insides](https://github.com/0xAX/linux-insides)。(译者注:翻译问题请发 PR 到 [linux-insides-zh](https://github.com/hust-open-atom-club/linux-insides-zh))** 链接 @@ -493,4 +493,4 @@ asmlinkage void double_fault(void); * [segmented memory model](http://en.wikipedia.org/wiki/Memory_segmentation) * [Model specific registers](http://en.wikipedia.org/wiki/Model-specific_register) * [Stack canary](http://en.wikipedia.org/wiki/Stack_buffer_overflow#Stack_canaries) -* [Previous chapter](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) +* [Previous chapter](/Initialization/) diff --git a/Interrupts/linux-interrupts-9.md b/Interrupts/linux-interrupts-9.md index de6c35a..0ae37e5 100644 --- a/Interrupts/linux-interrupts-9.md +++ b/Interrupts/linux-interrupts-9.md @@ -511,7 +511,7 @@ insert_work(pwq, work, worklist, work_flags); 如果你有任何问题或建议,请给我发评论或者给我发 [Twitter](https://twitter.com/0xAX)。 -**请注意英语并不是我的母语,我为任何表达不清楚的地方感到抱歉。如果你发现任何错误请发 PR 到 [linux-insides](https://github.com/MintCN/linux-insides-zh)。(译者注:翻译问题请发 PR 到 [linux-insides-cn](https://github.com/hust-open-atom-club/linux-insides-zh))** +**请注意英语并不是我的母语,我为任何表达不清楚的地方感到抱歉。如果你发现任何错误请发 PR 到 [linux-insides](https://github.com/0xAX/linux-insides)。(译者注:翻译问题请发 PR 到 [linux-insides-zh](https://github.com/hust-open-atom-club/linux-insides-zh))** 链接 diff --git a/Timers/linux-timers-2.md b/Timers/linux-timers-2.md index 18cdd29..3f3c488 100644 --- a/Timers/linux-timers-2.md +++ b/Timers/linux-timers-2.md @@ -4,7 +4,7 @@ Timers and time management in the Linux kernel. Part 2. Introduction to the `clocksource` framework -------------------------------------------------------------------------------- -The previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) was the first part in the current [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) that describes timers and time management related stuff in the Linux kernel. We got acquainted with two concepts in the previous part: +The previous [part](/Timers/linux-timers-1.md) was the first part in the current [chapter](/Timers/) that describes timers and time management related stuff in the Linux kernel. We got acquainted with two concepts in the previous part: * `jiffies` * `clocksource` @@ -92,7 +92,7 @@ Within this framework, each clock source is required to maintain a representatio The clocksource structure -------------------------------------------------------------------------------- -The fundamental of the `clocksource` framework is the `clocksource` structure that defined in the [include/linux/clocksource.h](https://github.com/torvalds/linux/blob/master/include/linux/clocksource.h) header file. We already saw some fields that are provided by the `clocksource` structure in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html). Let's look on the full definition of this structure and try to describe all of its fields: +The fundamental of the `clocksource` framework is the `clocksource` structure that defined in the [include/linux/clocksource.h](https://github.com/torvalds/linux/blob/master/include/linux/clocksource.h) header file. We already saw some fields that are provided by the `clocksource` structure in the previous [part](/Timers/linux-timers-1.md). Let's look on the full definition of this structure and try to describe all of its fields: ```C struct clocksource { @@ -197,7 +197,7 @@ That's all. From this moment we know all fields of the `clocksource` structure. New clock source registration -------------------------------------------------------------------------------- -We saw only one function from the `clocksource` framework in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html). This function was - `__clocksource_register`. This function defined in the [include/linux/clocksource.h](https://github.com/torvalds/linux/tree/master/include/linux/clocksource.h) header file and as we can understand from the function's name, main point of this function is to register new clocksource. If we will look on the implementation of the `__clocksource_register` function, we will see that it just makes call of the `__clocksource_register_scale` function and returns its result: +We saw only one function from the `clocksource` framework in the previous [part](/Timers/linux-timers-1.md). This function was - `__clocksource_register`. This function defined in the [include/linux/clocksource.h](https://github.com/torvalds/linux/tree/master/include/linux/clocksource.h) header file and as we can understand from the function's name, main point of this function is to register new clocksource. If we will look on the implementation of the `__clocksource_register` function, we will see that it just makes call of the `__clocksource_register_scale` function and returns its result: ```C static inline int __clocksource_register(struct clocksource *cs) @@ -241,7 +241,7 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq) } ``` -First of all we can see that the `__clocksource_register_scale` function starts from the call of the `__clocksource_update_freq_scale` function that defined in the same source code file and updates given clock source with the new frequency. Let's look on the implementation of this function. In the first step we need to check given frequency and if it was not passed as `zero`, we need to calculate `mult` and `shift` parameters for the given clock source. Why do we need to check value of the `frequency`? Actually it can be zero. if you attentively looked on the implementation of the `__clocksource_register` function, you may have noticed that we passed `frequency` as `0`. We will do it only for some clock sources that have self defined `mult` and `shift` parameters. Look in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) and you will see that we saw calculation of the `mult` and `shift` for `jiffies`. The `__clocksource_update_freq_scale` function will do it for us for other clock sources. +First of all we can see that the `__clocksource_register_scale` function starts from the call of the `__clocksource_update_freq_scale` function that defined in the same source code file and updates given clock source with the new frequency. Let's look on the implementation of this function. In the first step we need to check given frequency and if it was not passed as `zero`, we need to calculate `mult` and `shift` parameters for the given clock source. Why do we need to check value of the `frequency`? Actually it can be zero. if you attentively looked on the implementation of the `__clocksource_register` function, you may have noticed that we passed `frequency` as `0`. We will do it only for some clock sources that have self defined `mult` and `shift` parameters. Look in the previous [part](/Timers/linux-timers-1.md) and you will see that we saw calculation of the `mult` and `shift` for `jiffies`. The `__clocksource_update_freq_scale` function will do it for us for other clock sources. So in the start of the `__clocksource_update_freq_scale` function we check the value of the `frequency` parameter and if is not zero we need to calculate `mult` and `shift` for the given clock source. Let's look on the `mult` and `shift` calculation: @@ -448,4 +448,4 @@ Links * [clock rate](https://en.wikipedia.org/wiki/Clock_rate) * [mutex](https://en.wikipedia.org/wiki/Mutual_exclusion) * [sysfs](https://en.wikipedia.org/wiki/Sysfs) -* [previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) +* [previous part](/Timers/linux-timers-1.md) diff --git a/Timers/linux-timers-3.md b/Timers/linux-timers-3.md index 620d29d..cbfc1f2 100644 --- a/Timers/linux-timers-3.md +++ b/Timers/linux-timers-3.md @@ -4,7 +4,7 @@ Timers and time management in the Linux kernel. Part 3. The tick broadcast framework and dyntick -------------------------------------------------------------------------------- -This is third part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) which describes timers and time management related stuff in the Linux kernel and we stopped on the `clocksource` framework in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-2.html). We have started to consider this framework because it is closely related to the special counters which are provided by the Linux kernel. One of these counters which we already saw in the first [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) of this chapter is - `jiffies`. As I already wrote in the first part of this chapter, we will consider time management related stuff step by step during the Linux kernel initialization. Previous step was call of the: +This is third part of the [chapter](/Timers/) which describes timers and time management related stuff in the Linux kernel and we stopped on the `clocksource` framework in the previous [part](/Timers/linux-timers-2.md). We have started to consider this framework because it is closely related to the special counters which are provided by the Linux kernel. One of these counters which we already saw in the first [part](/Timers/linux-timers-1.md) of this chapter is - `jiffies`. As I already wrote in the first part of this chapter, we will consider time management related stuff step by step during the Linux kernel initialization. Previous step was call of the: ```C register_refined_jiffies(CLOCK_TICK_RATE); @@ -12,7 +12,7 @@ register_refined_jiffies(CLOCK_TICK_RATE); function which defined in the [kernel/time/jiffies.c](https://github.com/torvalds/linux/blob/master/kernel/time/jiffies.c) source code file and executes initialization of the `refined_jiffies` clock source for us. Recall that this function is called from the `setup_arch` function that defined in the [https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c](arch/x86/kernel/setup.c) source code and executes architecture-specific ([x86_64](https://en.wikipedia.org/wiki/X86-64) in our case) initialization. Look on the implementation of the `setup_arch` and you will note that the call of the `register_refined_jiffies` is the last step before the `setup_arch` function will finish its work. -There are many different `x86_64` specific things already configured after the end of the `setup_arch` execution. For example some early [interrupt](https://en.wikipedia.org/wiki/Interrupt) handlers already able to handle interrupts, memory space reserved for the [initrd](https://en.wikipedia.org/wiki/Initrd), [DMI](https://en.wikipedia.org/wiki/Desktop_Management_Interface) scanned, the Linux kernel log buffer is already set and this means that the [printk](https://en.wikipedia.org/wiki/Printk) function is able to work, [e820](https://en.wikipedia.org/wiki/E820) parsed and the Linux kernel already knows about available memory and and many many other architecture specific things (if you are interesting, you can read more about the `setup_arch` function and Linux kernel initialization process in the second [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) of this book). +There are many different `x86_64` specific things already configured after the end of the `setup_arch` execution. For example some early [interrupt](https://en.wikipedia.org/wiki/Interrupt) handlers already able to handle interrupts, memory space reserved for the [initrd](https://en.wikipedia.org/wiki/Initrd), [DMI](https://en.wikipedia.org/wiki/Desktop_Management_Interface) scanned, the Linux kernel log buffer is already set and this means that the [printk](https://en.wikipedia.org/wiki/Printk) function is able to work, [e820](https://en.wikipedia.org/wiki/E820) parsed and the Linux kernel already knows about available memory and and many many other architecture specific things (if you are interesting, you can read more about the `setup_arch` function and Linux kernel initialization process in the second [chapter](/Initialization/) of this book). Now, the `setup_arch` finished its work and we can back to the generic Linux kernel code. Recall that the `setup_arch` function was called from the `start_kernel` function which is defined in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file. So, we shall return to this function. You can see that there are many different function are called right after `setup_arch` function inside of the `start_kernel` function, but since our chapter is devoted to timers and time management related stuff, we will skip all code which is not related to this topic. The first function which is related to the time management in the Linux kernel is: @@ -42,7 +42,7 @@ void __init tick_init(void) As you can understand from the paragraph's title, we are interesting only in the `tick_broadcast_init` function for now. This function defined in the [kernel/time/tick-broadcast.c](https://github.com/torvalds/linux/blob/master/kernel/time/tick-broadcast.c) source code file and executes initialization of the `tick broadcast` framework related data structures. Before we will look on the implementation of the `tick_broadcast_init` function and will try to understand what does this function do, we need to know about `tick broadcast` framework. -Main point of a central processor is to execute programs. But sometimes a processor may be in a special state when it is not being used by any program. This special state is called - [idle](https://en.wikipedia.org/wiki/Idle_%28CPU%29). When the processor has no anything to execute, the Linux kernel launches `idle` task. We already saw a little about this in the last part of the [Linux kernel initialization process](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-10.html). When the Linux kernel will finish all initialization processes in the `start_kernel` function from the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file, it will call the `rest_init` function from the same source code file. Main point of this function is to launch kernel `init` thread and the `kthreadd` thread, to call the `schedule` function to start task scheduling and to go to sleep by calling the `cpu_idle_loop` function that defined in the [kernel/sched/idle.c](https://github.com/torvalds/linux/blob/master/kernel/sched/idle.c) source code file. +Main point of a central processor is to execute programs. But sometimes a processor may be in a special state when it is not being used by any program. This special state is called - [idle](https://en.wikipedia.org/wiki/Idle_%28CPU%29). When the processor has no anything to execute, the Linux kernel launches `idle` task. We already saw a little about this in the last part of the [Linux kernel initialization process](/Initialization/linux-initialization-10.md). When the Linux kernel will finish all initialization processes in the `start_kernel` function from the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file, it will call the `rest_init` function from the same source code file. Main point of this function is to launch kernel `init` thread and the `kthreadd` thread, to call the `schedule` function to start task scheduling and to go to sleep by calling the `cpu_idle_loop` function that defined in the [kernel/sched/idle.c](https://github.com/torvalds/linux/blob/master/kernel/sched/idle.c) source code file. The `cpu_idle_loop` function represents infinite loop which checks the need for rescheduling on each iteration. After the scheduler finds something to execute, the `idle` process will finish its work and the control will be moved to a new runnable task with the call of the `schedule_preempt_disabled` function: @@ -102,7 +102,7 @@ void __init tick_broadcast_init(void) } ``` -As we can see, the `tick_broadcast_init` function allocates different [cpumasks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) with the help of the `zalloc_cpumask_var` function. The `zalloc_cpumask_var` function defined in the [lib/cpumask.c](https://github.com/torvalds/linux/blob/master/lib/cpumask.c) source code file and expands to the call of the following function: +As we can see, the `tick_broadcast_init` function allocates different [cpumasks](/Concepts/linux-cpu-2.md) with the help of the `zalloc_cpumask_var` function. The `zalloc_cpumask_var` function defined in the [lib/cpumask.c](https://github.com/torvalds/linux/blob/master/lib/cpumask.c) source code file and expands to the call of the following function: ```C bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags) @@ -243,7 +243,7 @@ int tick_device_uses_broadcast(struct clock_event_device *dev, int cpu) } ``` -More about the `smp_processor_id` macro you can read in the fourth [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html) of the Linux kernel initialization process chapter. +More about the `smp_processor_id` macro you can read in the fourth [part](/Initialization/linux-initialization-4.md) of the Linux kernel initialization process chapter. The `tick_broadcast_start_periodic` function check the given `clock event` device and call the `tick_setup_periodic` function: @@ -407,7 +407,7 @@ for_each_cpu(cpu, tick_nohz_full_mask) context_tracking_cpu_set(cpu); ``` -The `context_tracking_cpu_set` function defined in the [kernel/context_tracking.c](https://github.com/torvalds/linux/blob/master/kernel/context_tracking.c) source code file and main point of this function is to set the `context_tracking.active` [percpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) variable to `true`. When the `active` field will be set to `true` for the certain processor, all [context switches](https://en.wikipedia.org/wiki/Context_switch) will be ignored by the Linux kernel context tracking subsystem for this processor. +The `context_tracking_cpu_set` function defined in the [kernel/context_tracking.c](https://github.com/torvalds/linux/blob/master/kernel/context_tracking.c) source code file and main point of this function is to set the `context_tracking.active` [percpu](/Concepts/linux-cpu-1.md) variable to `true`. When the `active` field will be set to `true` for the certain processor, all [context switches](https://en.wikipedia.org/wiki/Context_switch) will be ignored by the Linux kernel context tracking subsystem for this processor. That's all. This is the end of the `tick_nohz_init` function. After this `NO_HZ` related data structures will be initialzed. We didn't see API of the `NO_HZ` mode, but will see it soon. @@ -433,12 +433,12 @@ Links * [CPU idle](https://en.wikipedia.org/wiki/Idle_%28CPU%29) * [power management](https://en.wikipedia.org/wiki/Power_management) * [NO_HZ documentation](https://github.com/torvalds/linux/blob/master/Documentation/timers/NO_HZ.txt) -* [cpumasks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) +* [cpumasks](/Concepts/linux-cpu-2.md) * [high precision event timer](https://en.wikipedia.org/wiki/High_Precision_Event_Timer) * [irq](https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) * [IPI](https://en.wikipedia.org/wiki/Inter-processor_interrupt) * [CPUID](https://en.wikipedia.org/wiki/CPUID) * [APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) -* [percpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) +* [percpu](/Concepts/linux-cpu-1.md) * [context switches](https://en.wikipedia.org/wiki/Context_switch) -* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-2.html) +* [Previous part](/Timers/linux-timers-2.md) diff --git a/Timers/linux-timers-4.md b/Timers/linux-timers-4.md index 458592a..7d4f668 100644 --- a/Timers/linux-timers-4.md +++ b/Timers/linux-timers-4.md @@ -4,7 +4,7 @@ Timers and time management in the Linux kernel. Part 4. Timers -------------------------------------------------------------------------------- -This is fourth part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) which describes timers and time management related stuff in the Linux kernel and in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-3.html) we knew about the `tick broadcast` framework and `NO_HZ` mode in the Linux kernel. We will continue to dive into the time management related stuff in the Linux kernel in this part and will be acquainted with yet another concept in the Linux kernel - `timers`. Before we will look at timers in the Linux kernel, we have to learn some theory about this concept. Note that we will consider software timers in this part. +This is fourth part of the [chapter](/Timers/) which describes timers and time management related stuff in the Linux kernel and in the previous [part](/Timers/linux-timers-3.md) we knew about the `tick broadcast` framework and `NO_HZ` mode in the Linux kernel. We will continue to dive into the time management related stuff in the Linux kernel in this part and will be acquainted with yet another concept in the Linux kernel - `timers`. Before we will look at timers in the Linux kernel, we have to learn some theory about this concept. Note that we will consider software timers in this part. The Linux kernel provides a `software timer` concept to allow to kernel functions could be invoked at future moment. Timers are widely used in the Linux kernel. For example, look in the [net/netfilter/ipset/ip_set_list_set.c](https://github.com/torvalds/linux/blob/master/net/netfilter/ipset/ip_set_list_set.c) source code file. This source code file provides implementation of the framework for the managing of groups of [IP](https://en.wikipedia.org/wiki/Internet_Protocol) addresses. @@ -45,7 +45,7 @@ Now let's continue to research source code of Linux kernel which is related to t Introduction to dynamic timers in the Linux kernel -------------------------------------------------------------------------------- -As I already wrote, we knew about the `tick broadcast` framework and `NO_HZ` mode in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-3.html). They will be initialized in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file by the call of the `tick_init` function. If we will look at this source code file, we will see that the next time management related function is: +As I already wrote, we knew about the `tick broadcast` framework and `NO_HZ` mode in the previous [part](/Timers/linux-timers-3.md). They will be initialized in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file by the call of the `tick_init` function. If we will look at this source code file, we will see that the next time management related function is: ```C init_timers(); @@ -75,7 +75,7 @@ static void __init init_timer_cpus(void) } ``` -If you do not know or do not remember what is it a `possible` cpu, you can read the special [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) of this book which describes `cpumask` concept in the Linux kernel. In short words, a `possible` processor is a processor which can be plugged in anytime during the life of the system. +If you do not know or do not remember what is it a `possible` cpu, you can read the special [part](/Concepts/linux-cpu-2.md) of this book which describes `cpumask` concept in the Linux kernel. In short words, a `possible` processor is a processor which can be plugged in anytime during the life of the system. The `init_timer_cpu` function does main work for us, namely it executes initialization of the `tvec_base` structure for each processor. This structure defined in the [kernel/time/timer.c](https://github.com/torvalds/linux/blob/master/kernel/time/timer.c) source code file and stores data related to a `dynamic` timer for a certain processor. Let's look on the definition of this structure: @@ -136,13 +136,13 @@ static void __init init_timer_cpu(int cpu) } ``` -The `tvec_bases` represents [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) variable which represents main data structure for a dynamic timer for a given processor. This `per-cpu` variable defined in the same source code file: +The `tvec_bases` represents [per-cpu](/Concepts/linux-cpu-1.md) variable which represents main data structure for a dynamic timer for a given processor. This `per-cpu` variable defined in the same source code file: ```C static DEFINE_PER_CPU(struct tvec_base, tvec_bases); ``` -First of all we're getting the address of the `tvec_bases` for the given processor to `base` variable and as we got it, we are starting to initialize some of the `tvec_base` fields in the `init_timer_cpu` function. After initialization of the `per-cpu` dynamic timers with the [jiffies](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) and the number of a possible processor, we need to initialize a `tstats_lookup_lock` [spinlock](https://en.wikipedia.org/wiki/Spinlock) in the `init_timer_stats` function: +First of all we're getting the address of the `tvec_bases` for the given processor to `base` variable and as we got it, we are starting to initialize some of the `tvec_base` fields in the `init_timer_cpu` function. After initialization of the `per-cpu` dynamic timers with the [jiffies](/Timers/linux-timers-1.md) and the number of a possible processor, we need to initialize a `tstats_lookup_lock` [spinlock](https://en.wikipedia.org/wiki/Spinlock) in the `init_timer_stats` function: ```C void __init init_timer_stats(void) @@ -240,7 +240,7 @@ The last step in the `init_timers` function is the call of the: open_softirq(TIMER_SOFTIRQ, run_timer_softirq); ``` -function. The `open_softirq` function may be already familar to you if you have read the ninth [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Interrupts/linux-interrupts-9.html) about the interrupts and interrupt handling in the Linux kernel. In short words, the `open_softirq` function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file and executes initialization of the deferred interrupt handler. +function. The `open_softirq` function may be already familar to you if you have read the ninth [part](/Interrupts/linux-interrupts-9.md) about the interrupts and interrupt handling in the Linux kernel. In short words, the `open_softirq` function defined in the [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) source code file and executes initialization of the deferred interrupt handler. In our case the deferred function is the `run_timer_softirq` function that is will be called after a hardware interrupt in the `do_IRQ` function which defined in the [arch/x86/kernel/irq.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irq.c) source code file. The main point of this function is to handle a software dynamic timer. The Linux kernel does not do this thing during the hardware timer interrupt handling because this is time consuming operation. @@ -256,7 +256,7 @@ static void run_timer_softirq(struct softirq_action *h) } ``` -At the beginning of the `run_timer_softirq` function we get a `dynamic` timer for a current processor and compares the current value of the [jiffies](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) with the value of the `timer_jiffies` for the current structure by the call of the `time_after_eq` macro which is defined in the [include/linux/jiffies.h](https://github.com/torvalds/linux/blob/master/include/linux/jiffies.h) header file: +At the beginning of the `run_timer_softirq` function we get a `dynamic` timer for a current processor and compares the current value of the [jiffies](/Timers/linux-timers-1.md) with the value of the `timer_jiffies` for the current structure by the call of the `time_after_eq` macro which is defined in the [include/linux/jiffies.h](https://github.com/torvalds/linux/blob/master/include/linux/jiffies.h) header file: ```C #define time_after_eq(a,b) \ @@ -418,10 +418,10 @@ Links * [IP](https://en.wikipedia.org/wiki/Internet_Protocol) * [netfilter](https://en.wikipedia.org/wiki/Netfilter) * [network](https://en.wikipedia.org/wiki/Computer_network) -* [cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) +* [cpumask](/Concepts/linux-cpu-2.md) * [interrupt](https://en.wikipedia.org/wiki/Interrupt) -* [jiffies](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-1.html) -* [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) +* [jiffies](/Timers/linux-timers-1.md) +* [per-cpu](/Concepts/linux-cpu-1.md) * [spinlock](https://en.wikipedia.org/wiki/Spinlock) * [procfs](https://en.wikipedia.org/wiki/Procfs) -* [previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-3.html) +* [previous part](/Timers/linux-timers-3.md) diff --git a/Timers/linux-timers-5.md b/Timers/linux-timers-5.md index e70e081..b7b88ea 100644 --- a/Timers/linux-timers-5.md +++ b/Timers/linux-timers-5.md @@ -4,7 +4,7 @@ Timers and time management in the Linux kernel. Part 5. Introduction to the `clockevents` framework -------------------------------------------------------------------------------- -This is fifth part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) which describes timers and time management related stuff in the Linux kernel. As you might noted from the title of this part, the `clockevents` framework will be discussed. We already saw one framework in the [second](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-2.html) part of this chapter. It was `clocksource` framework. Both of these frameworks represent timekeeping abstractions in the Linux kernel. +This is fifth part of the [chapter](/Timers/) which describes timers and time management related stuff in the Linux kernel. As you might noted from the title of this part, the `clockevents` framework will be discussed. We already saw one framework in the [second](/Timers/linux-timers-2.md) part of this chapter. It was `clocksource` framework. Both of these frameworks represent timekeeping abstractions in the Linux kernel. At first let's refresh your memory and try to remember what is it `clocksource` framework and and what its purpose. The main goal of the `clocksource` framework is to provide `timeline`. As described in the [documentation](https://github.com/0xAX/linux/blob/master/Documentation/timers/timekeeping.txt): @@ -130,7 +130,7 @@ The next two fields `shift` and `mult` are familiar to us. They will be used to #define cpumask_of(cpu) (get_cpu_mask(cpu)) ``` -Where the `get_cpu_mask` returns the cpumask containing just a given `cpu` number. More about `cpumasks` concept you may read in the [CPU masks in the Linux kernel](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) part. In the last four lines of code we set callbacks for the clock event device suspend/resume, device shutdown and update of the clock event device state. +Where the `get_cpu_mask` returns the cpumask containing just a given `cpu` number. More about `cpumasks` concept you may read in the [CPU masks in the Linux kernel](/Concepts/linux-cpu-2.md) part. In the last four lines of code we set callbacks for the clock event device suspend/resume, device shutdown and update of the clock event device state. After we finished with the initialization of the `at91sam926x` periodic timer, we can register it by the call of the following functions: @@ -185,7 +185,7 @@ if (!dev->cpumask) { } ``` -Remember that we have set the `cpumask` of the `at91sam926x` periodic timer to first processor. If the `cpumask` field is zero, we check the number of possible processors in the system and print warning message if it is less than on. Additionally we set the `cpumask` of the given clock event device to the current processor. If you are interested in how the `smp_processor_id` macro is implemented, you can read more about it in the fourth [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html) of the Linux kernel initialization process chapter. +Remember that we have set the `cpumask` of the `at91sam926x` periodic timer to first processor. If the `cpumask` field is zero, we check the number of possible processors in the system and print warning message if it is less than on. Additionally we set the `cpumask` of the given clock event device to the current processor. If you are interested in how the `smp_processor_id` macro is implemented, you can read more about it in the fourth [part](/Initialization/linux-initialization-4.md) of the Linux kernel initialization process chapter. After this check we lock the actual code of the clock event device registration by the call following macros: @@ -385,7 +385,7 @@ That's all. Conclusion ------------------------------------------------------------------------------- -This is the end of the fifth part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) that describes timers and timer management related stuff in the Linux kernel. In the previous part got acquainted with the `timers` concept. In this part we continued to learn time management related stuff in the Linux kernel and saw a little about yet another framework - `clockevents`. +This is the end of the fifth part of the [chapter](/Timers/) that describes timers and timer management related stuff in the Linux kernel. In the previous part got acquainted with the `timers` concept. In this part we continued to learn time management related stuff in the Linux kernel and saw a little about yet another framework - `clockevents`. If you have questions or suggestions, feel free to ping me in twitter [0xAX](https://twitter.com/0xAX), drop me [email](anotherworldofworld@gmail.com) or just create [issue](https://github.com/hust-open-atom-club/linux-insides-zh/issues/new). @@ -409,7 +409,7 @@ Links * [local APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) * [C3 state](https://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface#Device_states) * [Periodic Interval Timer (PIT) for at91sam926x](http://www.atmel.com/Images/doc6062.pdf) -* [CPU masks in the Linux kernel](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) +* [CPU masks in the Linux kernel](/Concepts/linux-cpu-2.md) * [deadlock](https://en.wikipedia.org/wiki/Deadlock) * [CPU hotplug](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt) -* [previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-3.html) +* [previous part](/Timers/linux-timers-3.md) diff --git a/Timers/linux-timers-7.md b/Timers/linux-timers-7.md index 12dab7c..2f6dabb 100644 --- a/Timers/linux-timers-7.md +++ b/Timers/linux-timers-7.md @@ -12,7 +12,7 @@ This is the seventh and last part [chapter](/Timers/) which describes timers and We will start from simple userspace [C](https://en.wikipedia.org/wiki/C_%28programming_language%29) program and see all way from the call of the [standard library](https://en.wikipedia.org/wiki/Standard_library) function to the implementation of certain system call. As each [architecture](https://github.com/torvalds/linux/tree/master/arch) provides its own implementation of certain system call, we will consider only [x86_64](https://en.wikipedia.org/wiki/X86-64) specific implementations of system calls, as this book is related to this architecture. -Additionally we will not consider concept of system calls in this part, but only implementations of these three system calls in the Linux kernel. If you are interested in what is it a `system call`, there is special [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/SysCall/index.html) about this. +Additionally we will not consider concept of system calls in this part, but only implementations of these three system calls in the Linux kernel. If you are interested in what is it a `system call`, there is special [chapter](/SysCall/) about this. So, let's from the `gettimeofday` system call. @@ -57,7 +57,7 @@ The second parameter of the `gettimeofday` function is pointer to the `timezone` Current date/time: 03-26-2016/16:42:02 ``` -As you already may know, an userspace application does not call a system call directly from the kernel space. Before the actual system call entry will be called, we call a function from the standard library. In my case it is [glibc](https://en.wikipedia.org/wiki/GNU_C_Library), so I will consider this case. The implementation of the `gettimeofday` function is located in the [sysdeps/unix/sysv/linux/x86/gettimeofday.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86/gettimeofday.c;h=36f7c26ffb0e818709d032c605fec8c4bd22a14e;hb=HEAD) source code file. As you already may know, the `gettimeofday` is not usual system call. It is located in the special area which is called `vDSO` (you can read more about it in the [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/SysCall/linux-syscall-3.html) which describes this concept). +As you already may know, an userspace application does not call a system call directly from the kernel space. Before the actual system call entry will be called, we call a function from the standard library. In my case it is [glibc](https://en.wikipedia.org/wiki/GNU_C_Library), so I will consider this case. The implementation of the `gettimeofday` function is located in the [sysdeps/unix/sysv/linux/x86/gettimeofday.c](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/x86/gettimeofday.c;h=36f7c26ffb0e818709d032c605fec8c4bd22a14e;hb=HEAD) source code file. As you already may know, the `gettimeofday` is not usual system call. It is located in the special area which is called `vDSO` (you can read more about it in the [part](/SysCall/linux-syscall-3.md) which describes this concept). The `glibc` implementation of the `gettimeofday` tries to resolve the given symbol, in our case this symbol is `__vdso_gettimeofday` by the call of the `_dl_vdso_vsym` internal function. If the symbol will not be resolved, it returns `NULL` and we fallback to the call of the usual system call: @@ -356,7 +356,7 @@ SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp, } ``` -More about the `SYSCALL_DEFINE2` macro you may read in the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/SysCall/index.html) about system calls. If we look at the implementation of the `nanosleep` system call, first of all we will see that it starts from the call of the `copy_from_user` function. This function copies the given data from the userspace to kernelspace. In our case we copy timeout value to sleep to the kernelspace `timespec` structure and check that the given `timespec` is valid by the call of the `timesc_valid` function: +More about the `SYSCALL_DEFINE2` macro you may read in the [chapter](/SysCall/) about system calls. If we look at the implementation of the `nanosleep` system call, first of all we will see that it starts from the call of the `copy_from_user` function. This function copies the given data from the userspace to kernelspace. In our case we copy timeout value to sleep to the kernelspace `timespec` structure and check that the given `timespec` is valid by the call of the `timesc_valid` function: ```C static inline bool timespec_valid(const struct timespec *ts) @@ -369,7 +369,7 @@ static inline bool timespec_valid(const struct timespec *ts) } ``` -which just checks that the given `timespec` does not represent date before `1970` and nanoseconds does not overflow `1` second. The `nanosleep` function ends with the call of the `hrtimer_nanosleep` function from the same source code file. The `hrtimer_nanosleep` function creates a [timer](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-4.html) and calls the `do_nanosleep` function. The `do_nanosleep` does main job for us. This function provides loop: +which just checks that the given `timespec` does not represent date before `1970` and nanoseconds does not overflow `1` second. The `nanosleep` function ends with the call of the `hrtimer_nanosleep` function from the same source code file. The `hrtimer_nanosleep` function creates a [timer](/Timers/linux-timers-4.md) and calls the `do_nanosleep` function. The `do_nanosleep` does main job for us. This function provides loop: ```C do { @@ -392,7 +392,7 @@ That's all. Conclusion -------------------------------------------------------------------------------- -This is the end of the seventh part of the [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/index.html) that describes timers and timer management related stuff in the Linux kernel. In the previous part we saw [x86_64](https://en.wikipedia.org/wiki/X86-64) specific clock sources. As I wrote in the beginning, this part is the last part of this chapter. We saw important time management related concepts like `clocksource` and `clockevents` frameworks, `jiffies` counter and etc., in this chpater. Of course this does not cover all of the time management in the Linux kernel. Many parts of this mostly related to the scheduling which we will see in other chapter. +This is the end of the seventh part of the [chapter](/Timers/) that describes timers and timer management related stuff in the Linux kernel. In the previous part we saw [x86_64](https://en.wikipedia.org/wiki/X86-64) specific clock sources. As I wrote in the beginning, this part is the last part of this chapter. We saw important time management related concepts like `clocksource` and `clockevents` frameworks, `jiffies` counter and etc., in this chpater. Of course this does not cover all of the time management in the Linux kernel. Many parts of this mostly related to the scheduling which we will see in other chapter. If you have questions or suggestions, feel free to ping me in twitter [0xAX](https://twitter.com/0xAX), drop me [email](anotherworldofworld@gmail.com) or just create [issue](https://github.com/hust-open-atom-club/linux-insides-zh/issues/new). @@ -412,10 +412,10 @@ Links * [register](https://en.wikipedia.org/wiki/Processor_register) * [System V Application Binary Interface](http://www.x86-64.org/documentation/abi.pdf) * [context switch](https://en.wikipedia.org/wiki/Context_switch) -* [Introduction to timers in the Linux kernel](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-4.html) +* [Introduction to timers in the Linux kernel](/Timers/linux-timers-4.md) * [uptime](https://en.wikipedia.org/wiki/Uptime#Using_uptime) * [system calls table for x86_64](https://github.com/torvalds/linux/blob/master/arch/x86/entry/syscalls/syscall_64.tbl) * [High Precision Event Timer](https://en.wikipedia.org/wiki/High_Precision_Event_Timer) * [Time Stamp Counter](https://en.wikipedia.org/wiki/Time_Stamp_Counter) * [x86_64](https://en.wikipedia.org/wiki/X86-64) -* [previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Timers/linux-timers-6.html) +* [previous part](/Timers/linux-timers-6.md)