mirror of https://github.com/MintCN/linux-insides-zh.git, synced 2026-05-05 12:05:23 +08:00
remove all gitbook links
@@ -4,11 +4,11 @@
|
||||
First steps in the kernel code (TODO: Need proofreading)
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
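The [previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-5.html) was the last part of the [boot process](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html) chapter. From now on we dive into the initialization of the Linux kernel. Once the Linux kernel image has been decompressed and placed at the proper location in memory, the kernel starts to work. The first chapter covered the kernel boot code, whose job is to prepare for executing kernel code. In this chapter we explore the kernel code itself and look at the initialization process: the large amount of work the kernel does before it launches the `init` process with [PID](https://en.wikipedia.org/wiki/Process_identifier) `1`.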
|
||||
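The [previous part](/Booting/linux-bootstrap-5.md) was the last part of the [boot process](/Booting/) chapter. From now on we dive into the initialization of the Linux kernel. Once the Linux kernel image has been decompressed and placed at the proper location in memory, the kernel starts to work. The first chapter covered the kernel boot code, whose job is to prepare for executing kernel code. In this chapter we explore the kernel code itself and look at the initialization process: the large amount of work the kernel does before it launches the `init` process with [PID](https://en.wikipedia.org/wiki/Process_identifier) `1`.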
|
||||
|
||||
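There is a lot to cover in this chapter: all of the preparation that happens before the kernel proper starts. The kernel entry point is defined in [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S), and we will start there and work our way deeper. Before the `start_kernel` function (defined in [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L489)) runs, we will see many early initialization steps, such as setting up the early page tables and switching to a new descriptor in kernel space.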
|
||||
|
||||
In the [last part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-5.html) of the [previous chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html) we stopped at the [jmp](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) instruction in [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S):
|
||||
In the [last part](/Booting/linux-bootstrap-5.md) of the [previous chapter](/Booting/) we stopped at the [jmp](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S) instruction in [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S):
|
||||
|
||||
```assembly
|
||||
jmp *%rax
|
||||
@@ -90,7 +90,7 @@ rbp = 0x1000000 - (0xffffffff81000000 - 0xffffffff80000000)
|
||||
jnz bad_address
|
||||
```
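The check that ends in `jnz bad_address` verifies that the offset held in `rbp` is aligned to a page middle directory boundary. Below is a minimal C sketch of that arithmetic. It assumes a 2 MB `PMD_PAGE_SIZE` and follows the shape of the `rbp` calculation shown above; all `ex_` / `EX_` names are made up for this illustration and are not kernel symbols.

```C
#include <assert.h>
#include <stdint.h>

/* Assumption: on x86_64 one PMD entry maps a 2 MB page. */
#define EX_PMD_PAGE_SIZE (1ULL << 21)
#define EX_PMD_PAGE_MASK (~(EX_PMD_PAGE_SIZE - 1))

/* delta = physical load address - (virtual address of the text - kernel map base) */
static uint64_t ex_load_delta(uint64_t phys_load, uint64_t text_virt,
                              uint64_t kernel_map)
{
    return phys_load - (text_virt - kernel_map);
}

/*
 * Returns 1 when the low 32 bits of delta have no bits set below the
 * 2 MB boundary, 0 otherwise (the case that would branch to bad_address).
 */
static int ex_delta_is_aligned(uint64_t delta)
{
    uint32_t low = (uint32_t)delta;

    return (low & (uint32_t)~EX_PMD_PAGE_MASK) == 0;
}
```

With the addresses from the calculation above, `ex_load_delta(0x1000000, 0xffffffff81000000, 0xffffffff80000000)` yields `0`, which passes the alignment test.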
|
||||
|
||||
Here we compare the low 32 bits of the `rbp` register with `PMD_PAGE_MASK`. `PMD_PAGE_MASK` is the mask for the `Page middle directory` (see the [paging](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/linux-theory-1.html) part for details) and is defined as:
|
||||
Here we compare the low 32 bits of the `rbp` register with `PMD_PAGE_MASK`. `PMD_PAGE_MASK` is the mask for the `Page middle directory` (see the [paging](/Theory/linux-theory-1.md) part for details) and is defined as:
|
||||
|
||||
```C
|
||||
#define PMD_PAGE_MASK (~(PMD_PAGE_SIZE-1))
|
||||
@@ -162,7 +162,7 @@ NEXT_PAGE(level1_fixmap_pgt)
|
||||
_PAGE_ACCESSED | _PAGE_DIRTY)
|
||||
```
|
||||
|
||||
For more information, see the [paging](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/linux-theory-1.html) part.
|
||||
For more information, see the [paging](/Theory/linux-theory-1.md) part.
|
||||
|
||||
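`level3_kernel_pgt` holds two entries that map kernel space; its first `510` (that is, `L3_START_KERNEL`) entries are all `0`. `L3_START_KERNEL` is the index in the Page Upper Directory of the entry that covers the `__START_KERNEL_map` address, and it equals `510`. In the next entry, `level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE`, the `level2_kernel_pgt` part is easy to understand: it is a page table entry that holds a pointer to the page middle directory, which maps kernel space and has the following access rights: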
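As a side note on the index `510`: it can be derived from the address itself. The sketch below assumes 4-level paging with 9 index bits per level and the `0xffffffff80000000` value of `__START_KERNEL_map` used earlier in this part; the `ex_` helpers are illustrative names, not kernel functions.

```C
#include <assert.h>
#include <stdint.h>

/* Assumed value of __START_KERNEL_map on x86_64. */
#define EX_START_KERNEL_MAP 0xffffffff80000000ULL
/* 512 entries per table, so each level consumes 9 address bits. */
#define EX_PTRS_PER_TABLE   512u

/* Index into the top-level table (bits 47..39 of the address). */
static unsigned ex_pgd_index(uint64_t addr)
{
    return (unsigned)((addr >> 39) & (EX_PTRS_PER_TABLE - 1));
}

/* Index into the page upper directory (bits 38..30). */
static unsigned ex_pud_index(uint64_t addr)
{
    return (unsigned)((addr >> 30) & (EX_PTRS_PER_TABLE - 1));
}

/* Index into the page middle directory (bits 29..21). */
static unsigned ex_pmd_index(uint64_t addr)
{
    return (unsigned)((addr >> 21) & (EX_PTRS_PER_TABLE - 1));
}
```

For `EX_START_KERNEL_MAP` these helpers give a top-level index of `511`, a page upper directory index of `510` and a page middle directory index of `0`, which matches the claim that the first `510` entries of `level3_kernel_pgt` are empty.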
|
||||
|
||||
@@ -491,7 +491,7 @@ INIT_PER_CPU(gdt_page);
|
||||
|
||||
`INIT_PER_CPU` likewise expands to `init_per_cpu__gdt_page` and sets its value to an offset relative to `__per_cpu_load`. This gives us the correct base address of the new GDT.
|
||||
|
||||
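per-CPU variables are a feature of the 2.6 kernel. As the name suggests, when we create a `per-CPU` variable, every CPU gets its own copy of it; here we create the `gdt_page` per-CPU variable. Variables of this kind have many advantages, for example no locking is needed because each CPU accesses only its own copy. So on a multiprocessor system every processor core has its own `GDT`, and each of its entries describes a block of memory that threads running on that core can access. [Concepts/per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) covers `per-CPU` variables in more detail.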
|
||||
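per-CPU variables are a feature of the 2.6 kernel. As the name suggests, when we create a `per-CPU` variable, every CPU gets its own copy of it; here we create the `gdt_page` per-CPU variable. Variables of this kind have many advantages, for example no locking is needed because each CPU accesses only its own copy. So on a multiprocessor system every processor core has its own `GDT`, and each of its entries describes a block of memory that threads running on that core can access. [Concepts/per-cpu](/Concepts/linux-cpu-1.md) covers `per-CPU` variables in more detail.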
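The idea of "one copy per CPU" can be pictured with a plain array indexed by CPU number. This is only a user-space sketch of the concept, not how the kernel lays out its per-CPU area; `EX_NR_CPUS` and the `ex_` functions are hypothetical names chosen for the example.

```C
#include <assert.h>

/* Assumed number of CPUs for the illustration. */
#define EX_NR_CPUS 4

/* One slot per CPU: each CPU owns exactly one element. */
static unsigned long ex_counter[EX_NR_CPUS];

/* A CPU only ever writes its own slot, so no lock is taken here. */
static void ex_this_cpu_inc(int cpu)
{
    assert(cpu >= 0 && cpu < EX_NR_CPUS);
    ex_counter[cpu]++;
}

/* Read the copy that belongs to the given CPU. */
static unsigned long ex_per_cpu_read(int cpu)
{
    assert(cpu >= 0 && cpu < EX_NR_CPUS);
    return ex_counter[cpu];
}
```

Incrementing the slot of CPU `0` twice and the slot of CPU `2` once leaves the copies of the other CPUs untouched, which is why the copies never contend with each other.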
|
||||
|
||||
After loading the new Global Descriptor Table, we reload the segments as we did before:
|
||||
|
||||
@@ -620,7 +620,7 @@ write_cr3(__pa_nodebug(early_level4_pgt));
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
* [Model Specific Register](http://en.wikipedia.org/wiki/Model-specific_register)
|
||||
* [Paging](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/linux-theory-1.html)
|
||||
* [Previous part - Kernel decompression](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-5.html)
|
||||
* [Paging](/Theory/linux-theory-1.md)
|
||||
* [Previous part - Kernel decompression](/Booting/linux-bootstrap-5.md)
|
||||
* [NX](http://en.wikipedia.org/wiki/NX_bit)
|
||||
* [ASLR](http://en.wikipedia.org/wiki/Address_space_layout_randomization)
|
||||
|
||||
@@ -4,7 +4,7 @@ Kernel initialization. Part 10.
|
||||
End of the linux kernel initialization process
|
||||
================================================================================
|
||||
|
||||
This is the tenth part of the chapter about the linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html). In the [previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-9.html) we saw the initialization of [RCU](http://en.wikipedia.org/wiki/Read-copy-update) and stopped at the call of the `acpi_early_init` function. This is the last part of the [Kernel initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) chapter, so let's finish it.
|
||||
This is the tenth part of the chapter about the linux kernel [initialization process](/Initialization/). In the [previous part](/Initialization/linux-initialization-9.md) we saw the initialization of [RCU](http://en.wikipedia.org/wiki/Read-copy-update) and stopped at the call of the `acpi_early_init` function. This is the last part of the [Kernel initialization process](/Initialization/index.md) chapter, so let's finish it.
|
||||
|
||||
After the call of the `acpi_early_init` function from the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c), we can see the following code:
|
||||
|
||||
@@ -185,7 +185,7 @@ nrpages = (nr_free_buffer_pages() * 10) / 100;
|
||||
max_buffer_heads = nrpages * (PAGE_SIZE / sizeof(struct buffer_head));
|
||||
```
|
||||
|
||||
which will be equal to `10%` of `ZONE_NORMAL` (all RAM above 4GB on `x86_64`). The next function after `buffer_init` is `vfs_caches_init`. This function allocates `SLAB` caches and hash tables for the different [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) caches. We already saw the `vfs_caches_init_early` function in the eighth part of the linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-8.html), which initialized the `dcache` (directory cache) and [inode](http://en.wikipedia.org/wiki/Inode) caches. The `vfs_caches_init` function performs the later initialization of the `dcache` and `inode` caches, the private data cache, the hash tables for mount points, etc. More details about [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) will be given in a separate part. After this we can see the `signals_init` function. It is defined in [kernel/signal.c](https://github.com/torvalds/linux/blob/master/kernel/signal.c) and allocates a cache for the `sigqueue` structures, which represent the queue of real-time signals. The next function is `page_writeback_init`, which initializes the ratio for dirty pages. Every low-level page entry contains a `dirty` bit that indicates whether the page has been written to after being loaded into memory.
|
||||
which will be equal to `10%` of `ZONE_NORMAL` (all RAM above 4GB on `x86_64`). The next function after `buffer_init` is `vfs_caches_init`. This function allocates `SLAB` caches and hash tables for the different [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) caches. We already saw the `vfs_caches_init_early` function in the eighth part of the linux kernel [initialization process](/Initialization/linux-initialization-8.md), which initialized the `dcache` (directory cache) and [inode](http://en.wikipedia.org/wiki/Inode) caches. The `vfs_caches_init` function performs the later initialization of the `dcache` and `inode` caches, the private data cache, the hash tables for mount points, etc. More details about [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) will be given in a separate part. After this we can see the `signals_init` function. It is defined in [kernel/signal.c](https://github.com/torvalds/linux/blob/master/kernel/signal.c) and allocates a cache for the `sigqueue` structures, which represent the queue of real-time signals. The next function is `page_writeback_init`, which initializes the ratio for dirty pages. Every low-level page entry contains a `dirty` bit that indicates whether the page has been written to after being loaded into memory.
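Going back to the `max_buffer_heads` calculation at the start of this passage, the arithmetic can be tried out with concrete numbers. The sketch below takes the number of free buffer pages, the page size and the size of one buffer head as plain parameters, because their real values depend on the machine and kernel build; the `ex_` names are illustrative only.

```C
#include <assert.h>
#include <stddef.h>

/* nrpages: 10% of the free buffer pages, using integer arithmetic. */
static unsigned long ex_buffer_pages(unsigned long free_buffer_pages)
{
    return (free_buffer_pages * 10) / 100;
}

/* max_buffer_heads: how many buffer heads fit into those pages. */
static unsigned long ex_max_buffer_heads(unsigned long free_buffer_pages,
                                         size_t page_size, size_t bh_size)
{
    assert(bh_size > 0);
    return ex_buffer_pages(free_buffer_pages) * (page_size / bh_size);
}
```

For example, with 1,000,000 free pages of 4096 bytes and an assumed 104-byte buffer head, 100,000 pages are set aside and each page holds 39 buffer heads, giving a limit of 3,900,000.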
|
||||
|
||||
Creation of the root for the procfs
|
||||
--------------------------------------------------------------------------------
|
||||
@@ -440,7 +440,7 @@ That's all! Linux kernel initialization process is finished!
|
||||
Conclusion
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
This is the end of the tenth part about the linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html). It is not only the `tenth` part but also the last part that describes the initialization of the linux kernel. As I wrote in the first [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) of this chapter, we would go through all the steps of the kernel initialization, and we did. We started at the first architecture-independent function, `start_kernel`, and finished with the launch of the first `init` process in our system. I skipped details about different kernel subsystems; for example, I barely covered the scheduler, interrupts, exception handling, etc. From the next part on we will dive into the different kernel subsystems. I hope it will be interesting.
|
||||
This is the end of the tenth part about the linux kernel [initialization process](/Initialization/). It is not only the `tenth` part but also the last part that describes the initialization of the linux kernel. As I wrote in the first [part](/Initialization/linux-initialization-1.md) of this chapter, we would go through all the steps of the kernel initialization, and we did. We started at the first architecture-independent function, `start_kernel`, and finished with the launch of the first `init` process in our system. I skipped details about different kernel subsystems; for example, I barely covered the scheduler, interrupts, exception handling, etc. From the next part on we will dive into the different kernel subsystems. I hope it will be interesting.
|
||||
|
||||
If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX).
|
||||
|
||||
@@ -470,4 +470,4 @@ Links
|
||||
* [Tmpfs](http://en.wikipedia.org/wiki/Tmpfs)
|
||||
* [initrd](http://en.wikipedia.org/wiki/Initrd)
|
||||
* [panic](http://en.wikipedia.org/wiki/Kernel_panic)
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-9.html)
|
||||
* [Previous part](/Initialization/linux-initialization-9.md)
|
||||
|
||||
@@ -4,9 +4,9 @@
|
||||
Early interrupt and exception handling
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
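In the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) we covered the initialization of early interrupts. We are now in the decompressed Linux kernel, with basic [paging](https://en.wikipedia.org/wiki/Page_table) set up for early boot. Our goal is to finish the preparation before the main kernel code starts to run.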
|
||||
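In the previous [part](/Initialization/linux-initialization-1.md) we covered the initialization of early interrupts. We are now in the decompressed Linux kernel, with basic [paging](https://en.wikipedia.org/wiki/Page_table) set up for early boot. Our goal is to finish the preparation before the main kernel code starts to run.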
|
||||
|
||||
We already started this work in the [first part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) of [this chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html); in this part we continue with the code for interrupt and exception handling.
|
||||
We already started this work in the [first part](/Initialization/linux-initialization-1.md) of [this chapter](/Initialization/); in this part we continue with the code for interrupt and exception handling.
|
||||
|
||||
In the previous part we stopped at the following loop:
|
||||
|
||||
@@ -493,4 +493,4 @@ pmd_p[pmd_index(address)] = pmd;
|
||||
* [Page table](https://en.wikipedia.org/wiki/Page_table)
|
||||
* [Interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler)
|
||||
* [Page Fault](https://en.wikipedia.org/wiki/Page_fault),
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html)
|
||||
* [Previous part](/Initialization/linux-initialization-1.md)
|
||||
|
||||
@@ -77,7 +77,7 @@ extern char __initdata boot_command_line[];
|
||||
Initializing memory pages
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
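At this point we have copied the `boot_params` structure, and the next step is to set up the early page tables for use during kernel initialization. We already initialized the early page tables to support paging, as discussed in the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html). Now we call the `reset_early_page_tables` function (also described in the previous part), which clears most entries of the early page tables and keeps only the kernel high mapping. After that we call: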
|
||||
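At this point we have copied the `boot_params` structure, and the next step is to set up the early page tables for use during kernel initialization. We already initialized the early page tables to support paging, as discussed in the previous [part](/Initialization/linux-initialization-1.md). Now we call the `reset_early_page_tables` function (also described in the previous part), which clears most entries of the early page tables and keeps only the kernel high mapping. After that we call: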
|
||||
|
||||
```C
|
||||
clear_page(init_level4_pgt);
|
||||
|
||||
@@ -208,7 +208,7 @@ int cpu = smp_processor_id();
|
||||
#define raw_smp_processor_id() (this_cpu_read(cpu_number))
|
||||
```
|
||||
|
||||
`this_cpu_read`, like many similar functions (`this_cpu_write`, `this_cpu_add`, and so on), is defined in [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h). These functions implement the `this_cpu` operations, which provide different ways to access [per-cpu](http://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) variables. For example, let's look at `this_cpu_read`:
|
||||
`this_cpu_read`, like many similar functions (`this_cpu_write`, `this_cpu_add`, and so on), is defined in [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h). These functions implement the `this_cpu` operations, which provide different ways to access [per-cpu](/Concepts/linux-cpu-1.md) variables. For example, let's look at `this_cpu_read`:
|
||||
|
||||
```
|
||||
__pcpu_size_call_return(this_cpu_read_, pcp)
|
||||
@@ -311,7 +311,7 @@ static inline int __check_is_bitmap(const unsigned long *bitmap)
|
||||
|
||||
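So this function always returns 1. We need it for exactly one purpose: at compile time it checks that the given `bitmap` really is a bitmap, in other words that its type is `unsigned long *`. The `to_cpumask` macro therefore just converts an array of `unsigned long` into a `struct cpumask *`. Now we can call the `cpumask_set_cpu` function, which simply calls `set_bit` on the CPU mask. All of the `set_cpu_*` functions work the same way.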
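The compile-time check described above can be reproduced in a small user-space sketch. It mirrors the pattern of a function that always returns 1 and is referenced only inside `sizeof`, so it is type-checked but never called. The `ex_` names and the single-word mask are simplifications made for this example and are not the kernel definitions.

```C
#include <assert.h>
#include <limits.h>

#define EX_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for a CPU mask: a single machine word of bits. */
struct ex_cpumask {
    unsigned long bits[1];
};

/* Always returns 1; it exists so the compiler checks the argument type. */
static inline int ex_check_is_bitmap(const unsigned long *bitmap)
{
    (void)bitmap;
    return 1;
}

/*
 * sizeof() does not evaluate its operand, so the call is never executed.
 * Passing anything other than an unsigned long pointer triggers a
 * diagnostic at compile time.
 */
#define ex_to_cpumask(bitmap) \
    ((struct ex_cpumask *)(1 ? (bitmap) \
                             : (void *)sizeof(ex_check_is_bitmap(bitmap))))

/* Set the bit that corresponds to the given CPU number. */
static void ex_cpumask_set_cpu(unsigned int cpu, struct ex_cpumask *mask)
{
    assert(cpu < EX_BITS_PER_LONG);
    mask->bits[0] |= 1UL << cpu;
}

/* Return 1 if the bit for the given CPU is set, 0 otherwise. */
static int ex_cpumask_test_cpu(unsigned int cpu, const struct ex_cpumask *mask)
{
    assert(cpu < EX_BITS_PER_LONG);
    return (int)((mask->bits[0] >> cpu) & 1UL);
}
```

Given `unsigned long bits[1] = {0};`, calling `ex_cpumask_set_cpu(3, ex_to_cpumask(bits))` sets bit 3 of the word.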
|
||||
|
||||
If you are still unsure how the `set_cpu_*` functions work or do not understand the `cpumask` concept, don't worry. You can learn more about them in the [cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) part or in the [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).
|
||||
If you are still unsure how the `set_cpu_*` functions work or do not understand the `cpumask` concept, don't worry. You can learn more about them in the [cpumask](/Concepts/linux-cpu-2.md) part or in the [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).
|
||||
|
||||
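Now that we have activated the first CPU, we continue through the `start_kernel` function. The next function is `page_address_init`, but it does nothing in our case, because it only does work when not all of memory can be directly mapped.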
|
||||
|
||||
@@ -349,7 +349,7 @@ Linux version 4.0.0-rc6+ (alex@localhost) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubu
|
||||
memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text);
|
||||
```
|
||||
|
||||
You can read about `memblock` in [Linux kernel memory management Part 1.](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-1.html). As you may remember, the `memblock_reserve` function takes two parameters:
|
||||
You can read about `memblock` in [Linux kernel memory management Part 1.](/MM/linux-mm-1.md). As you may remember, the `memblock_reserve` function takes two parameters:
|
||||
|
||||
* base physical address of a memory block;
|
||||
* size of a memory block.
|
||||
@@ -382,7 +382,7 @@ u64 ramdisk_size = get_ramdisk_size();
|
||||
u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
|
||||
```
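The `PAGE_ALIGN` step rounds the end of the RAM disk up to the next page boundary, and the region that is reserved later runs from `ramdisk_image` to that aligned end. A minimal sketch of the arithmetic, assuming 4 KB pages; the `ex_` helpers are illustrative and not the kernel macros:

```C
#include <assert.h>
#include <stdint.h>

/* Assumed page size: 4 KB. */
#define EX_PAGE_SIZE 4096ULL

/* Round addr up to the next page boundary; aligned values stay unchanged. */
static uint64_t ex_page_align(uint64_t addr)
{
    return (addr + EX_PAGE_SIZE - 1) & ~(EX_PAGE_SIZE - 1);
}

/* Size to reserve: from the image start to the page-aligned end. */
static uint64_t ex_ramdisk_reserve_size(uint64_t image, uint64_t size)
{
    return ex_page_align(image + size) - image;
}
```

For a hypothetical image at `0x7f000000` with a size of `0x123456` bytes, the aligned end is `0x7f124000`, so `0x124000` bytes would be reserved.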
|
||||
|
||||
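If you have read the [Linux Kernel Booting Process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html) chapter, you know that all of these parameters come from `boot_params`. Remember that `boot_params` was filled in during boot. The kernel setup header contains several fields that describe the RAM disk: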
|
||||
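If you have read the [Linux Kernel Booting Process](/Booting/) chapter, you know that all of these parameters come from `boot_params`. Remember that `boot_params` was filled in during boot. The kernel setup header contains several fields that describe the RAM disk: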
|
||||
```
|
||||
Field name: ramdisk_image
|
||||
Type: write (obligatory)
|
||||
@@ -434,7 +434,7 @@ memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
|
||||
|
||||
If you have any questions or suggestions, leave a comment or message me on [twitter](https://twitter.com/0xAX).
|
||||
|
||||
**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send a PR to [linux-insides](https://github.com/hust-open-atom-club/linux-insides-zh).**
|
||||
**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send a PR to [linux-insides-zh](https://github.com/hust-open-atom-club/linux-insides-zh).**
|
||||
|
||||
Links
|
||||
--------------------------------------------------------------------------------
|
||||
@@ -447,4 +447,4 @@ memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
|
||||
* [stack buffer overflow](http://en.wikipedia.org/wiki/Stack_buffer_overflow)
|
||||
* [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29)
|
||||
* [initrd](http://en.wikipedia.org/wiki/Initrd)
|
||||
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-3.html)
|
||||
* [Previous part](/Initialization/linux-initialization-3.md)
|
||||
|
||||
@@ -4,14 +4,14 @@
|
||||
Continuing the architecture-specific initialization
|
||||
===========================================================
|
||||
|
||||
In the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html)
|
||||
In the previous [part](/Initialization/linux-initialization-4.md)
|
||||
we covered part of the architecture-specific [setup_arch](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c#L856) function, and this part continues from there.
|
||||
After reserving memory for the [initrd](http://en.wikipedia.org/wiki/Initrd), the next step is the `olpc_ofw_detect` function, which checks for [One Laptop Per Child support](http://wiki.laptop.org/go/OFW_FAQ).
|
||||
We will not consider platform-specific details and will skip the platform-specific functions, so let's move on.
|
||||
The next step is the `early_trap_init` function. It initializes the debug (`#DB`, raised when the `TF` flag of rflags is set) and `int3` (`#BP`) interrupt gates.
|
||||
If you are not familiar with interrupts, you can read about them in [Early interrupt and exception handling](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html).
|
||||
If you are not familiar with interrupts, you can read about them in [Early interrupt and exception handling](/Initialization/linux-initialization-2.md).
|
||||
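In the `x86` architecture, `INT`, `INTO` and `INT3` are special instructions that let a task explicitly call an interrupt handler. The `INT3` instruction calls the breakpoint (`#BP`) handler.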
|
||||
As you may remember, we saw the concepts of interrupts and exceptions in this [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html):
|
||||
As you may remember, we saw the concepts of interrupts and exceptions in this [part](/Initialization/linux-initialization-2.md):
|
||||
|
||||
```
|
||||
----------------------------------------------------------------------------------------------
|
||||
@@ -190,7 +190,7 @@ movl $1,%ebx
|
||||
* I/O ports
|
||||
* Device memory
|
||||
|
||||
We saw the first method (implemented with the `outb/inb` instructions) in the part about the linux [kernel booting process](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html).
|
||||
We saw the first method (implemented with the `outb/inb` instructions) in the part about the linux [kernel booting process](/Booting/linux-bootstrap-3.md).
|
||||
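The second method maps the physical address of the `I/O` to a virtual address. When the `CPU` accesses such a physical address, it may refer to a portion of physical `RAM` that is mapped onto the memory of the `I/O` device.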
|
||||
`ioremap` is what maps device memory into the kernel address space.
|
||||
|
||||
@@ -221,7 +221,7 @@ memset(bm_pte, 0, sizeof(bm_pte));
|
||||
pmd_populate_kernel(&init_mm, pmd, bm_pte);
|
||||
```
|
||||
|
||||
That's all. If you still feel confused, don't worry. There is a separate part about `ioremap` and `fixmaps` in [Linux Kernel Memory Management. Part 2](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html).
|
||||
That's all. If you still feel confused, don't worry. There is a separate part about `ioremap` and `fixmaps` in [Linux Kernel Memory Management. Part 2](/MM/linux-mm-2.md).
|
||||
|
||||
Obtaining the major and minor numbers of the root device
|
||||
----------------------------------------------------------------------------
|
||||
@@ -272,7 +272,7 @@ static inline dev_t new_decode_dev(u32 dev)
|
||||
Memory map setup
|
||||
-----------------------------------------------------------------------
|
||||
|
||||
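The next step is to set up the memory map by calling the `setup_memory_map` function. Before that, however, we set up the screen-related parameters (currently rows, columns, video page and so on; you can read about them in [Video mode initialization and transition to protected mode](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html)),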
|
||||
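The next step is to set up the memory map by calling the `setup_memory_map` function. Before that, however, we set up the screen-related parameters (currently rows, columns, video page and so on; you can read about them in [Video mode initialization and transition to protected mode](/Booting/linux-bootstrap-3.md)),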
|
||||
together with the Extended Display Identification Data, the video mode, the bootloader type and other parameters:
|
||||
|
||||
```C
|
||||
@@ -393,7 +393,7 @@ struct x86_init_ops x86_init __initdata = {
|
||||
}
|
||||
```
|
||||
|
||||
As we can see, `memory_setup` is set to `default_machine_specific_memory_setup`, which sanitizes all of the [e820](http://en.wikipedia.org/wiki/E820) entries that we collected during the [kernel boot](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html) process and fills the `e820map` structure with the memory regions.
|
||||
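As we can see, `memory_setup` is set to `default_machine_specific_memory_setup`, which sanitizes all of the [e820](http://en.wikipedia.org/wiki/E820) entries that we collected during the [kernel boot](/Booting/linux-bootstrap-2.md) process and fills the `e820map` structure with the memory regions.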
|
||||
All collected memory regions are printed with `printk`. You can find output like the following by running the `dmesg` command:
|
||||
|
||||
```
|
||||
@@ -452,7 +452,7 @@ static inline void __init copy_edd(void)
|
||||
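The next step is to finish initializing the memory descriptor during the initialization stage. As we know, every process has its own address space in memory. This address space is represented by a special data structure called the `memory descriptor`.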
|
||||
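In the linux kernel source code the memory descriptor is represented by the `mm_struct` structure. `mm_struct` contains many fields related to the process address space, such as the start and end addresses of the kernel code/data segments,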
|
||||
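the start and end of `brk`, the number of memory areas, the list of memory areas and so on. The structure is defined in [include/linux/mm_types.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/include/linux/mm_types.h). The `mm` and `active_mm` fields of the `task_struct` structure hold each process's own memory descriptor.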
|
||||
Our first `init` process has its own memory descriptor too. In the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html) we saw part of the `task_struct` initialization done with the `INIT_TASK` macro:
|
||||
Our first `init` process has its own memory descriptor too. In the previous [part](/Initialization/linux-initialization-4.md) we saw part of the `task_struct` initialization done with the `INIT_TASK` macro:
|
||||
|
||||
```C
|
||||
#define INIT_TASK(tsk) \
|
||||
@@ -538,7 +538,7 @@ void x86_configure_nx(void)
|
||||
|
||||
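This is the end of the fifth part about the linux kernel initialization process. In this part we looked at the `setup_arch` function, which performs architecture-specific initialization. It was a long part, but we are not done yet: `setup_arch`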
|
||||
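is a very complex function, and I am not even sure we can cover all of it in the following parts. This part introduced some interesting concepts such as `Fix-mapped` addresses, `ioremap` and so on.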
|
||||
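Don't worry if they are unclear; [Linux Kernel Memory Management. Part 2](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html) explains them in more detail. In the next part we will continue with the architecture-specific initialization,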
|
||||
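Don't worry if they are unclear; [Linux Kernel Memory Management. Part 2](/MM/linux-mm-2.md) explains them in more detail. In the next part we will continue with the architecture-specific initialization,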
|
||||
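as well as parsing of the early kernel parameters, the early dump of `pci` devices, Desktop Management Interface (DMI) scanning and more.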
|
||||
|
||||
If you have any questions or suggestions, leave a comment or message me on [twitter](https://twitter.com/0xAX).
|
||||
|
||||
@@ -4,7 +4,7 @@
|
||||
Architecture-specific initialization, again
|
||||
===========================================================
|
||||
|
||||
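In the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-5.html) we looked at architecture-specific initialization (`x86_64` in our case) in [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c) and finished at the `x86_configure_nx` function, which sets the `_PAGE_NX` flag depending on support for the [NX bit](http://en.wikipedia.org/wiki/NX_bit). As I wrote before, the `setup_arch` function and `start_kernel` are both very large, so in this and the next part we will continue with the architecture-specific initialization process. The function after `x86_configure_nx` is `parse_early_param`. It is defined in [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) and, as its name suggests, it parses the kernel command line and sets up different services depending on the given parameters (you can find all kernel command line parameters in [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt)). You may remember how we set up `earlyprintk` in one of the earliest [parts](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html). There we looked for kernel parameters and their values with the help of the `cmdline_find_option`, `__cmdline_find_option` and `__cmdline_find_option_bool` functions from [arch/x86/boot/cmdline.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/cmdline.c). Now we are in the generic kernel part, which does not depend on the architecture, and here we use another approach. If you read the linux kernel source code, you may have noticed calls like this: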
|
||||
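In the previous [part](/Initialization/linux-initialization-5.md) we looked at architecture-specific initialization (`x86_64` in our case) in [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c) and finished at the `x86_configure_nx` function, which sets the `_PAGE_NX` flag depending on support for the [NX bit](http://en.wikipedia.org/wiki/NX_bit). As I wrote before, the `setup_arch` function and `start_kernel` are both very large, so in this and the next part we will continue with the architecture-specific initialization process. The function after `x86_configure_nx` is `parse_early_param`. It is defined in [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) and, as its name suggests, it parses the kernel command line and sets up different services depending on the given parameters (you can find all kernel command line parameters in [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt)). You may remember how we set up `earlyprintk` in one of the earliest [parts](/Booting/linux-bootstrap-2.md). There we looked for kernel parameters and their values with the help of the `cmdline_find_option`, `__cmdline_find_option` and `__cmdline_find_option_bool` functions from [arch/x86/boot/cmdline.c](https://github.com/torvalds/linux/blob/master/arch/x86/boot/cmdline.c). Now we are in the generic kernel part, which does not depend on the architecture, and here we use another approach. If you read the linux kernel source code, you may have noticed calls like this: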
|
||||
|
||||
```C
|
||||
early_param("gbpages", parse_direct_gbpages_on);
|
||||
@@ -102,7 +102,7 @@ noexec [X86]
|
||||
memblock_x86_reserve_range_setup_data();
|
||||
```
|
||||
|
||||
This function is also defined in [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c); it remaps memory for `setup_data` and reserves the memory block (you can read more about `setup_data` in the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-5.html), and more about `ioremap` and `memblock` in [Linux kernel memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html)).
|
||||
This function is also defined in [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c); it remaps memory for `setup_data` and reserves the memory block (you can read more about `setup_data` in the previous [part](/Initialization/linux-initialization-5.md), and more about `ioremap` and `memblock` in [Linux kernel memory management](/MM/)).
|
||||
|
||||
Next, let's look at the following condition:
|
||||
|
||||
@@ -134,7 +134,7 @@ int __init acpi_mps_check(void)
|
||||
```
|
||||
|
||||
The `acpi_mps_check` function checks the built-in `MPS`, or [MultiProcessor Specification](http://en.wikipedia.org/wiki/MultiProcessor_Specification), table. If `CONFIG_X86_LOCAL_APIC` is set and `CONFIG_X86_MPPARSE` is not, `acpi_mps_check` prints a warning when the command line passed to the kernel contains `acpi=off`, `acpi=noirq` or `pci=noacpi`. If `acpi_mps_check` returns `1`, we disable the local [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller)
|
||||
and clear the `X86_FEATURE_APIC` bit for the current CPU with the `setup_clear_cpu_cap` macro (you can read more about CPU masks in [CPU masks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html)).
|
||||
and clear the `X86_FEATURE_APIC` bit for the current CPU with the `setup_clear_cpu_cap` macro (you can read more about CPU masks in [CPU masks](/Concepts/linux-cpu-2.md)).
|
||||
|
||||
Early PCI dump
|
||||
--------------------------------------------------------------------------------
|
||||
@@ -200,7 +200,7 @@ for (bus = 0; bus < 256; bus++) {
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
|
||||
After the `early_dump_pci_devices` function there are a few functions related to available memory and the [e820](http://en.wikipedia.org/wiki/E820) map, which we collected in the [First steps in the kernel setup](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-2.html) part.
|
||||
After the `early_dump_pci_devices` function there are a few functions related to available memory and the [e820](http://en.wikipedia.org/wiki/E820) map, which we collected in the [First steps in the kernel setup](/Booting/linux-bootstrap-2.md) part.
|
||||
```C
|
||||
/* update the e820_saved too */
|
||||
e820_reserve_setup_data();
|
||||
@@ -541,15 +541,14 @@ MEMBLOCK configuration:
|
||||
* [NX bit](http://en.wikipedia.org/wiki/NX_bit)
|
||||
* [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt)
|
||||
* [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller)
|
||||
* [CPU masks](http://0xax.gitbooks.io/linux-insides/content/Concepts/cpumask.html)
|
||||
* [Linux kernel memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html)
|
||||
* [CPU masks](/Concepts/linux-cpu-2.md)
|
||||
* [Linux kernel memory management](/MM/index.md)
|
||||
* [PCI](http://en.wikipedia.org/wiki/Conventional_PCI)
|
||||
* [e820](http://en.wikipedia.org/wiki/E820)
|
||||
* [System Management BIOS](http://en.wikipedia.org/wiki/System_Management_BIOS)
|
||||
* [System Management BIOS](http://en.wikipedia.org/wiki/System_Management_BIOS)
|
||||
* [EFI](http://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface)
|
||||
* [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing)
|
||||
* [MultiProcessor Specification](http://www.intel.com/design/pentium/datashts/24201606.pdf)
|
||||
* [BSS](http://en.wikipedia.org/wiki/.bss)
|
||||
* [SMBIOS specification](http://www.dmtf.org/sites/default/files/standards/documents/DSP0134v2.5Final.pdf)
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-5.html)
|
||||
* [Previous part](/Initialization/linux-initialization-5.md)
|
||||
|
||||
@@ -4,7 +4,7 @@ Kernel initialization. Part 7.
|
||||
The End of the architecture-specific initialization, almost...
|
||||
================================================================================
|
||||
|
||||
This is the seventh part of the Linux kernel initialization process, which covers the internals of the `setup_arch` function from [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup.c#L861). As you know from the previous [parts](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), the `setup_arch` function does architecture-specific (in our case [x86_64](http://en.wikipedia.org/wiki/X86-64)) initialization such as reserving memory for the kernel code/data/bss, early scanning of the [Desktop Management Interface](http://en.wikipedia.org/wiki/Desktop_Management_Interface), the early dump of [PCI](http://en.wikipedia.org/wiki/PCI) devices and much more. If you have read the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), you may remember that we finished it at the `setup_real_mode` function. In the next step, after we set the limit of the [memblock](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-1.html) to all mapped pages, we can see the call of the `setup_log_buf` function from [kernel/printk/printk.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/printk/printk.c).
|
||||
This is the seventh part of the Linux kernel initialization process, which covers the internals of the `setup_arch` function from [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup.c#L861). As you know from the previous [parts](/Initialization/linux-initialization-6.md), the `setup_arch` function does architecture-specific (in our case [x86_64](http://en.wikipedia.org/wiki/X86-64)) initialization such as reserving memory for the kernel code/data/bss, early scanning of the [Desktop Management Interface](http://en.wikipedia.org/wiki/Desktop_Management_Interface), the early dump of [PCI](http://en.wikipedia.org/wiki/PCI) devices and much more. If you have read the previous [part](/Initialization/linux-initialization-6.md), you may remember that we finished it at the `setup_real_mode` function. In the next step, after we set the limit of the [memblock](/MM/linux-mm-1.md) to all mapped pages, we can see the call of the `setup_log_buf` function from [kernel/printk/printk.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/printk/printk.c).
|
||||
|
||||
The `setup_log_buf` function sets up the kernel's cyclic log buffer, whose length depends on the `CONFIG_LOG_BUF_SHIFT` configuration option. According to the documentation of `CONFIG_LOG_BUF_SHIFT`, its value can be between `12` and `21`. Internally, the buffer is defined as an array of chars:
|
||||
|
||||
@@ -32,7 +32,7 @@ setup_log_buf(1);
|
||||
|
||||
where `1` means that this is the early setup. In the next step we check the `new_log_buf_len` variable, which holds the updated length of the kernel log buffer, and either allocate new space for the buffer with the `memblock_virt_alloc` function or just return.
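As a rough illustration of how the `CONFIG_LOG_BUF_SHIFT` range mentioned above relates to the buffer size, here is a minimal user-space sketch. It assumes that the length is simply `1 << shift` bytes. The names `LOG_BUF_SHIFT_MIN`, `LOG_BUF_SHIFT_MAX` and `log_buf_len_for_shift` are hypothetical and exist only for this example; they are not kernel identifiers.

```C
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounds mirroring the documented range of CONFIG_LOG_BUF_SHIFT. */
#define LOG_BUF_SHIFT_MIN 12
#define LOG_BUF_SHIFT_MAX 21

static_assert(LOG_BUF_SHIFT_MIN <= LOG_BUF_SHIFT_MAX, "shift range must not be empty");

/*
 * Return the buffer length in bytes for a given shift,
 * or 0 if the shift is outside the documented range.
 */
static size_t log_buf_len_for_shift(unsigned int shift)
{
	if (shift < LOG_BUF_SHIFT_MIN || shift > LOG_BUF_SHIFT_MAX)
		return 0;
	return (size_t)1 << shift;
}
```

Under this assumption the smallest buffer (`shift == 12`) is 4 KiB and the largest (`shift == 21`) is 2 MiB.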
|
||||
|
||||
As kernel log buffer is ready, the next function is `reserve_initrd`. You can remember that we already called the `early_reserve_initrd` function in the fourth part of the [Kernel initialization](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html). Now, as we reconstructed direct memory mapping in the `init_mem_mapping` function, we need to move [initrd](http://en.wikipedia.org/wiki/Initrd) into directly mapped memory. The `reserve_initrd` function starts from the definition of the base address and end address of the `initrd` and check that `initrd` is provided by a bootloader. All the same as what we saw in the `early_reserve_initrd`. But instead of the reserving place in the `memblock` area with the call of the `memblock_reserve` function, we get the mapped size of the direct memory area and check that the size of the `initrd` is not greater than this area with:
|
||||
As kernel log buffer is ready, the next function is `reserve_initrd`. You can remember that we already called the `early_reserve_initrd` function in the fourth part of the [Kernel initialization](/Initialization/linux-initialization-4.md). Now, as we reconstructed direct memory mapping in the `init_mem_mapping` function, we need to move [initrd](http://en.wikipedia.org/wiki/Initrd) into directly mapped memory. The `reserve_initrd` function starts from the definition of the base address and end address of the `initrd` and check that `initrd` is provided by a bootloader. All the same as what we saw in the `early_reserve_initrd`. But instead of the reserving place in the `memblock` area with the call of the `memblock_reserve` function, we get the mapped size of the direct memory area and check that the size of the `initrd` is not greater than this area with:
|
||||
|
||||
```C
|
||||
mapped_size = memblock_mem_size(max_pfn_mapped);
|
||||
@@ -68,7 +68,7 @@ memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
|
||||
|
||||
After we have relocated the `initrd` ramdisk image, the next function is `vsmp_init` from [arch/x86/kernel/vsmp_64.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsmp_64.c). This function initializes support for `ScaleMP vSMP`. As I already wrote in the previous parts, this chapter will not cover initialization parts that are not directly related to `x86_64` (for example this one, or `ACPI`). So we will skip the implementation for now and come back to it in the part which covers techniques of parallel computing.
|
||||
|
||||
The next function is `io_delay_init` from the [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/io_delay.c). This function allows to override default I/O delay `0x80` port. We already saw I/O delay in the [Last preparation before transition into protected mode](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html), now let's look on the `io_delay_init` implementation:
|
||||
The next function is `io_delay_init` from the [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/io_delay.c). This function allows to override default I/O delay `0x80` port. We already saw I/O delay in the [Last preparation before transition into protected mode](/Booting/linux-bootstrap-3.md), now let's look on the `io_delay_init` implementation:
|
||||
|
||||
```C
|
||||
void __init io_delay_init(void)
|
||||
@@ -98,7 +98,7 @@ We can see `io_delay` command line parameter setup with the `early_param` macro
|
||||
early_param("io_delay", io_delay_param);
|
||||
```
|
||||
|
||||
More about `early_param` you can read in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). So the `io_delay_param` function which setups `io_delay_override` variable will be called in the [do_early_param](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c#L413) function. `io_delay_param` function gets the argument of the `io_delay` kernel command line parameter and sets `io_delay_type` depends on it:
|
||||
More about `early_param` you can read in the previous [part](/Initialization/linux-initialization-6.md). So the `io_delay_param` function which setups `io_delay_override` variable will be called in the [do_early_param](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c#L413) function. `io_delay_param` function gets the argument of the `io_delay` kernel command line parameter and sets `io_delay_type` depends on it:
|
||||
|
||||
```C
|
||||
static int __init io_delay_param(char *s)
|
||||
@@ -296,19 +296,19 @@ BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
|
||||
(unsigned long)VSYSCALL_ADDR);
|
||||
```
|
||||
|
||||
Now `vsyscall` area is in the `fix-mapped` area. That's all about `map_vsyscall`, if you do not know anything about fix-mapped addresses, you can read [Fix-Mapped Addresses and ioremap](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html). We will see more about `vsyscalls` in the `vsyscalls and vdso` part.
|
||||
Now `vsyscall` area is in the `fix-mapped` area. That's all about `map_vsyscall`, if you do not know anything about fix-mapped addresses, you can read [Fix-Mapped Addresses and ioremap](/MM/linux-mm-2.md). We will see more about `vsyscalls` in the `vsyscalls and vdso` part.
|
||||
|
||||
Getting the SMP configuration
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
You may remember how we made a search of the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) configuration in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). Now we need to get the `SMP` configuration if we found it. For this we check `smp_found_config` variable which we set in the `smp_scan_config` function (read about it the previous part) and call the `get_smp_config` function:
|
||||
You may remember how we made a search of the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) configuration in the previous [part](/Initialization/linux-initialization-6.md). Now we need to get the `SMP` configuration if we found it. For this we check `smp_found_config` variable which we set in the `smp_scan_config` function (read about it the previous part) and call the `get_smp_config` function:
|
||||
|
||||
```C
|
||||
if (smp_found_config)
|
||||
get_smp_config();
|
||||
```
|
||||
|
||||
The `get_smp_config` expands to the `x86_init.mpparse.default_get_smp_config` function which is defined in the [arch/x86/kernel/mpparse.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/mpparse.c). This function defines a pointer to the multiprocessor floating pointer structure - `mpf_intel` (you can read about it in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)) and does some checks:
|
||||
The `get_smp_config` expands to the `x86_init.mpparse.default_get_smp_config` function which is defined in the [arch/x86/kernel/mpparse.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/mpparse.c). This function defines a pointer to the multiprocessor floating pointer structure - `mpf_intel` (you can read about it in the previous [part](/Initialization/linux-initialization-6.md)) and does some checks:
|
||||
|
||||
```C
|
||||
struct mpf_intel *mpf = mpf_found;
|
||||
@@ -320,7 +320,7 @@ if (acpi_lapic && early)
|
||||
return;
|
||||
```
|
||||
|
||||
Here we can see that multiprocessor configuration was found in the `smp_scan_config` function or just return from the function if not. The next check is `acpi_lapic` and `early`. And as we did this checks, we start to read the `SMP` configuration. As we finished reading it, the next step is - `prefill_possible_map` function which makes preliminary filling of the possible CPU's `cpumask` (more about it you can read in the [Introduction to the cpumasks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html)).
|
||||
Here we can see that multiprocessor configuration was found in the `smp_scan_config` function or just return from the function if not. The next check is `acpi_lapic` and `early`. And as we did this checks, we start to read the `SMP` configuration. As we finished reading it, the next step is - `prefill_possible_map` function which makes preliminary filling of the possible CPU's `cpumask` (more about it you can read in the [Introduction to the cpumasks](/Concepts/linux-cpu-2.md)).
|
||||
|
||||
The rest of the setup_arch
|
||||
--------------------------------------------------------------------------------
|
||||
@@ -334,7 +334,7 @@ That's all, and now we can back to the `start_kernel` from the `setup_arch`.
|
||||
Back to the main.c
|
||||
================================================================================
|
||||
|
||||
As I wrote above, we have finished with the `setup_arch` function and now we can back to the `start_kernel` function from the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c). As you may remember or saw yourself, `start_kernel` function as big as the `setup_arch`. So the couple of the next part will be dedicated to learning of this function. So, let's continue with it. After the `setup_arch` we can see the call of the `mm_init_cpumask` function. This function sets the [cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) pointer to the memory descriptor `cpumask`. We can look on its implementation:
|
||||
As I wrote above, we have finished with the `setup_arch` function and now we can back to the `start_kernel` function from the [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c). As you may remember or saw yourself, `start_kernel` function as big as the `setup_arch`. So the couple of the next part will be dedicated to learning of this function. So, let's continue with it. After the `setup_arch` we can see the call of the `mm_init_cpumask` function. This function sets the [cpumask](/Concepts/linux-cpu-2.md) pointer to the memory descriptor `cpumask`. We can look on its implementation:
|
||||
|
||||
```C
|
||||
static inline void mm_init_cpumask(struct mm_struct *mm)
|
||||
@@ -379,7 +379,7 @@ static void __init setup_command_line(char *command_line)
|
||||
|
||||
Here we can see that we allocate space for the three buffers which will contain kernel command line for the different purposes (read above). And as we allocated space, we store `boot_command_line` in the `saved_command_line` and `command_line` (kernel command line from the `setup_arch`) to the `static_command_line`.
|
||||
|
||||
The next function after `setup_command_line` is `setup_nr_cpu_ids`. This function sets `nr_cpu_ids` (the number of CPUs) according to the last bit in the `cpu_possible_mask` (you can read more about it in the chapter that describes the [cpumasks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) concept). Let's look at its implementation:
|
||||
The next function after `setup_command_line` is `setup_nr_cpu_ids`. This function sets `nr_cpu_ids` (the number of CPUs) according to the last bit in the `cpu_possible_mask` (you can read more about it in the chapter that describes the [cpumasks](/Concepts/linux-cpu-2.md) concept). Let's look at its implementation:
|
||||
|
||||
```C
|
||||
void __init setup_nr_cpu_ids(void)
|
||||
@@ -479,4 +479,4 @@ Links
|
||||
* [vsyscalls](https://lwn.net/Articles/446528/)
|
||||
* [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing)
|
||||
* [jiffy](http://en.wikipedia.org/wiki/Jiffy_%28time%29)
|
||||
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)
|
||||
* [Previous part](/Initialization/linux-initialization-6.md)
|
||||
@@ -4,7 +4,7 @@ Kernel initialization. Part 8.
|
||||
Scheduler initialization
|
||||
================================================================================
|
||||
|
||||
This is the eighth [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) of the Linux kernel initialization process and we stopped on the `setup_nr_cpu_ids` function in the [previous](https://github.com/hust-open-atom-club/linux-insides-zh/blob/master/Initialization/linux-initialization-7.md) part. The main point of the current part is [scheduler](http://en.wikipedia.org/wiki/Scheduling_%28computing%29) initialization. But before we will start to learn initialization process of the scheduler, we need to do some stuff. The next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the `setup_per_cpu_areas` function. This function setups areas for the `percpu` variables, more about it you can read in the special part about the [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html). After `percpu` areas is up and running, the next step is the `smp_prepare_boot_cpu` function. This function does some preparations for the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing):
|
||||
This is the eighth [part](/Initialization/) of the Linux kernel initialization process and we stopped on the `setup_nr_cpu_ids` function in the [previous](https://github.com/hust-open-atom-club/linux-insides-zh/blob/master/Initialization/linux-initialization-7.md) part. The main point of the current part is [scheduler](http://en.wikipedia.org/wiki/Scheduling_%28computing%29) initialization. But before we will start to learn initialization process of the scheduler, we need to do some stuff. The next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the `setup_per_cpu_areas` function. This function setups areas for the `percpu` variables, more about it you can read in the special part about the [Per-CPU variables](/Concepts/linux-cpu-1.md). After `percpu` areas is up and running, the next step is the `smp_prepare_boot_cpu` function. This function does some preparations for the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing):
|
||||
|
||||
```C
|
||||
static inline void smp_prepare_boot_cpu(void)
|
||||
@@ -25,7 +25,7 @@ void __init native_smp_prepare_boot_cpu(void)
|
||||
}
|
||||
```
|
||||
|
||||
The `native_smp_prepare_boot_cpu` function gets the id of the current CPU (which is Bootstrap processor and its `id` is zero) with the `smp_processor_id` function. I will not explain how the `smp_processor_id` works, because we already saw it in the [Kernel entry point](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html) part. As we got processor `id` number we reload [Global Descriptor Table](http://en.wikipedia.org/wiki/Global_Descriptor_Table) for the given CPU with the `switch_to_new_gdt` function:
|
||||
The `native_smp_prepare_boot_cpu` function gets the id of the current CPU (which is Bootstrap processor and its `id` is zero) with the `smp_processor_id` function. I will not explain how the `smp_processor_id` works, because we already saw it in the [Kernel entry point](/Initialization/linux-initialization-4.md) part. As we got processor `id` number we reload [Global Descriptor Table](http://en.wikipedia.org/wiki/Global_Descriptor_Table) for the given CPU with the `switch_to_new_gdt` function:
|
||||
|
||||
```C
|
||||
void switch_to_new_gdt(int cpu)
|
||||
@@ -39,7 +39,7 @@ void switch_to_new_gdt(int cpu)
|
||||
}
|
||||
```
|
||||
|
||||
The `gdt_descr` variable represents pointer to the `GDT` descriptor here (we already saw `desc_ptr` in the [Early interrupt and exception handling](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html)). We get the address and the size of the `GDT` descriptor where `GDT_SIZE` is `256` or:
|
||||
The `gdt_descr` variable represents pointer to the `GDT` descriptor here (we already saw `desc_ptr` in the [Early interrupt and exception handling](/Initialization/linux-initialization-2.md)). We get the address and the size of the `GDT` descriptor where `GDT_SIZE` is `256` or:
|
||||
|
||||
```C
|
||||
#define GDT_SIZE (GDT_ENTRIES * 8)
|
||||
@@ -54,7 +54,7 @@ static inline struct desc_struct *get_cpu_gdt_table(unsigned int cpu)
|
||||
}
|
||||
```
|
||||
|
||||
The `get_cpu_gdt_table` uses `per_cpu` macro for getting `gdt_page` percpu variable for the given CPU number (bootstrap processor with `id` - 0 in our case). You may ask the following question: so, if we can access `gdt_page` percpu variable, where it was defined? Actually we already saw it in this book. If you have read the first [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) of this chapter, you can remember that we saw definition of the `gdt_page` in the [arch/x86/kernel/head_64.S](https://github.com/0xAX/linux/blob/master/arch/x86/kernel/head_64.S):
|
||||
The `get_cpu_gdt_table` uses `per_cpu` macro for getting `gdt_page` percpu variable for the given CPU number (bootstrap processor with `id` - 0 in our case). You may ask the following question: so, if we can access `gdt_page` percpu variable, where it was defined? Actually we already saw it in this book. If you have read the first [part](/Initialization/linux-initialization-1.md) of this chapter, you can remember that we saw definition of the `gdt_page` in the [arch/x86/kernel/head_64.S](https://github.com/0xAX/linux/blob/master/arch/x86/kernel/head_64.S):
|
||||
|
||||
```assembly
|
||||
early_gdt_descr:
|
||||
@@ -86,7 +86,7 @@ DEFINE_PER_CPU_PAGE_ALIGNED(struct gdt_page, gdt_page) = { .gdt = {
|
||||
...
|
||||
```
|
||||
|
||||
You can read more about `percpu` variables in the [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) part. As we have the address and size of the `GDT` descriptor, we reload the `GDT` with `load_gdt`, which just executes the `lgdt` instruction, and load the `percpu_segment` with the following function:
|
||||
You can read more about `percpu` variables in the [Per-CPU variables](/Concepts/linux-cpu-1.md) part. As we have the address and size of the `GDT` descriptor, we reload the `GDT` with `load_gdt`, which just executes the `lgdt` instruction, and load the `percpu_segment` with the following function:
|
||||
|
||||
```C
|
||||
void load_percpu_segment(int cpu) {
|
||||
@@ -180,11 +180,11 @@ After this we can see the kernel command line in the initialization output:
|
||||
|
||||

|
||||
|
||||
And a couple of functions such as `parse_early_param` and `parse_args` which handles linux kernel command line. You may remember that we already saw the call of the `parse_early_param` function in the sixth [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html) of the kernel initialization chapter, so why we call it again? Answer is simple: we call this function in the architecture-specific code (`x86_64` in our case), but not all architecture calls this function. And we need to call the second function `parse_args` to parse and handle non-early command line arguments.
|
||||
And a couple of functions such as `parse_early_param` and `parse_args` which handles linux kernel command line. You may remember that we already saw the call of the `parse_early_param` function in the sixth [part](/Initialization/linux-initialization-6.md) of the kernel initialization chapter, so why we call it again? Answer is simple: we call this function in the architecture-specific code (`x86_64` in our case), but not all architecture calls this function. And we need to call the second function `parse_args` to parse and handle non-early command line arguments.
|
||||
|
||||
In the next step we can see the call of the `jump_label_init` function from [kernel/jump_label.c](https://github.com/torvalds/linux/blob/master/kernel/jump_label.c), which initializes [jump labels](https://lwn.net/Articles/412072/).
|
||||
|
||||
After this we can see the call of the `setup_log_buf` function which setups the [printk](http://www.makelinux.net/books/lkd2/ch18lev1sec3) log buffer. We already saw this function in the seventh [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-7.html) of the linux kernel initialization process chapter.
|
||||
After this we can see the call of the `setup_log_buf` function which setups the [printk](http://www.makelinux.net/books/lkd2/ch18lev1sec3) log buffer. We already saw this function in the seventh [part](/Initialization/linux-initialization-7.md) of the linux kernel initialization process chapter.
|
||||
|
||||
PID hash initialization
|
||||
--------------------------------------------------------------------------------
|
||||
@@ -205,7 +205,7 @@ pid_hash = alloc_large_system_hash("PID", sizeof(*pid_hash), 0, 18,
|
||||
```
|
||||
|
||||
The number of elements of the `pid_hash` depends on the `RAM` configuration, but it can be between `2^4` and `2^12`. The `pidhash_init` computes the size
|
||||
and allocates the required storage (which is an `hlist` in our case: the same as a [doubly linked list](https://xinqiu.gitbooks.io/linux-insides-cn/content/DataStructures/linux-datastructures-1.html), except that its head, [struct hlist_head](https://github.com/torvalds/linux/blob/master/include/linux/types.h), contains a single pointer). The `alloc_large_system_hash` function allocates a large system hash table with `memblock_virt_alloc_nopanic` if we pass the `HASH_EARLY` flag (as in our case), or with `__vmalloc` if we do not pass this flag.
|
||||
and allocates the required storage (which is an `hlist` in our case: the same as a [doubly linked list](/DataStructures/linux-datastructures-1.md), except that its head, [struct hlist_head](https://github.com/torvalds/linux/blob/master/include/linux/types.h), contains a single pointer). The `alloc_large_system_hash` function allocates a large system hash table with `memblock_virt_alloc_nopanic` if we pass the `HASH_EARLY` flag (as in our case), or with `__vmalloc` if we do not pass this flag.
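To make the single-pointer head more concrete, here is a small user-space sketch of an `hlist`. The two structure layouts follow the kernel's `struct hlist_head` and `struct hlist_node`. The helper bodies are simplified re-implementations written for this example, not copies of the kernel code.

```C
#include <assert.h>
#include <stddef.h>

/* A list node: `pprev` points at whatever pointer currently points to this node. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

/* The head holds a single pointer, so a table of heads costs one pointer per bucket. */
struct hlist_head {
	struct hlist_node *first;
};

/* Insert node `n` at the front of the bucket `h`. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first;

	assert(n != NULL && h != NULL);
	first = h->first;
	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink node `n` without needing to know which bucket it is in. */
static void hlist_del(struct hlist_node *n)
{
	assert(n != NULL && n->pprev != NULL);
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}
```

Because `pprev` is a pointer to a pointer, removal works the same way for the first node and for any later node, even though the head has no back pointer. This is what makes the structure attractive for hash tables with many mostly-empty buckets.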
|
||||
|
||||
The result we can see in the `dmesg` output:
|
||||
|
||||
@@ -230,7 +230,7 @@ pgtable_init();
|
||||
vmalloc_init();
|
||||
```
|
||||
|
||||
The first is `page_ext_init_flatmem`, which depends on the `CONFIG_SPARSEMEM` kernel configuration option and initializes extended per-page data handling. `mem_init` releases all `bootmem`, `kmem_cache_init` initializes the kernel cache, `percpu_init_late` replaces `percpu` chunks with those allocated by [slub](http://en.wikipedia.org/wiki/SLUB_%28software%29), `pgtable_init` initializes the `page->ptl` kernel cache, and `vmalloc_init` initializes `vmalloc`. Please **NOTE** that we will not dive into the details of all these functions and concepts here, but we will see all of them in the [Linux kernel memory manager](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) chapter.
|
||||
The first is `page_ext_init_flatmem`, which depends on the `CONFIG_SPARSEMEM` kernel configuration option and initializes extended per-page data handling. `mem_init` releases all `bootmem`, `kmem_cache_init` initializes the kernel cache, `percpu_init_late` replaces `percpu` chunks with those allocated by [slub](http://en.wikipedia.org/wiki/SLUB_%28software%29), `pgtable_init` initializes the `page->ptl` kernel cache, and `vmalloc_init` initializes `vmalloc`. Please **NOTE** that we will not dive into the details of all these functions and concepts here, but we will see all of them in the [Linux kernel memory manager](/MM/) chapter.
|
||||
|
||||
That's all. Now we can look on the `scheduler`.
|
||||
|
||||
@@ -290,7 +290,7 @@ The root task group is the task group which belongs to every task in system. As
|
||||
DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
|
||||
```
|
||||
|
||||
Here `cpumask_var_t` is the `cpumask_t` with one difference: `cpumask_var_t` is allocated only `nr_cpu_ids` bits when the `cpumask_t` always has `NR_CPUS` bits (more about `cpumask` you can read in the [CPU masks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) part). As you can see:
|
||||
Here `cpumask_var_t` is the `cpumask_t` with one difference: `cpumask_var_t` is allocated only `nr_cpu_ids` bits when the `cpumask_t` always has `NR_CPUS` bits (more about `cpumask` you can read in the [CPU masks](/Concepts/linux-cpu-2.md) part). As you can see:
|
||||
|
||||
```C
|
||||
#ifdef CONFIG_CPUMASK_OFFSTACK
|
||||
@@ -461,19 +461,19 @@ If you have any questions or suggestions write me a comment or ping me at [twitt
|
||||
Links
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
* [CPU masks](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html)
|
||||
* [CPU masks](/Concepts/linux-cpu-2.md)
|
||||
* [high-resolution kernel timer](https://www.kernel.org/doc/Documentation/timers/hrtimers.txt)
|
||||
* [spinlock](http://en.wikipedia.org/wiki/Spinlock)
|
||||
* [Run queue](http://en.wikipedia.org/wiki/Run_queue)
|
||||
* [Linux kernel memory manager](http://0xax.gitbooks.io/linux-insides/content/MM/index.html)
|
||||
* [Linux kernel memory manager](/MM/)
|
||||
* [slub](http://en.wikipedia.org/wiki/SLUB_%28software%29)
|
||||
* [virtual file system](http://en.wikipedia.org/wiki/Virtual_file_system)
|
||||
* [Linux kernel hotplug documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt)
|
||||
* [IRQ](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29)
|
||||
* [Global Descriptor Table](http://en.wikipedia.org/wiki/Global_Descriptor_Table)
|
||||
* [Per-CPU variables](http://0xax.gitbooks.io/linux-insides/content/Concepts/per-cpu.html)
|
||||
* [Per-CPU variables](/Concepts/linux-cpu-1.md)
|
||||
* [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing)
|
||||
* [RCU](http://en.wikipedia.org/wiki/Read-copy-update)
|
||||
* [CFS Scheduler documentation](https://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt)
|
||||
* [Real-Time group scheduling](https://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt)
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-7.html)
|
||||
* [Previous part](/Initialization/linux-initialization-7.md)
|
||||
|
||||
@@ -4,7 +4,7 @@ Kernel initialization. Part 9.
|
||||
RCU initialization
|
||||
================================================================================
|
||||
|
||||
This is ninth part of the [Linux Kernel initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html) and in the previous part we stopped at the [scheduler initialization](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-8.html). In this part we will continue to dive to the linux kernel initialization process and the main purpose of this part will be to learn about initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). We can see that the next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) after the `sched_init` is the call of the `preempt_disable`. There are two macros:
|
||||
This is ninth part of the [Linux Kernel initialization process](/Initialization/) and in the previous part we stopped at the [scheduler initialization](/Initialization/linux-initialization-8.md). In this part we will continue to dive to the linux kernel initialization process and the main purpose of this part will be to learn about initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). We can see that the next step in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) after the `sched_init` is the call of the `preempt_disable`. There are two macros:
|
||||
|
||||
* `preempt_disable`
|
||||
* `preempt_enable`
|
||||
@@ -38,7 +38,7 @@ In the first implementation of the `preempt_disable` we increment this `__preemp
|
||||
#define preempt_count_add(val) __preempt_count_add(val)
|
||||
```
|
||||
|
||||
where `preempt_count_add` calls the `raw_cpu_add_4` macro, which adds `1` to the given `percpu` variable (`__preempt_count` in our case; you can read more about `percpu` variables in the part about [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html)). OK, we have increased `__preempt_count`, and as the next step we can see the call of the `barrier` macro in both macros. The `barrier` macro inserts an optimization barrier. On processors with the `x86_64` architecture, independent memory access operations can be performed in any order. That's why we need a way to tell the compiler and the processor to respect a particular order. This mechanism is the memory barrier. Let's consider a simple example:
|
||||
where `preempt_count_add` calls the `raw_cpu_add_4` macro, which adds `1` to the given `percpu` variable (`__preempt_count` in our case; you can read more about `percpu` variables in the part about [Per-CPU variables](/Concepts/linux-cpu-1.md)). OK, we have increased `__preempt_count`, and as the next step we can see the call of the `barrier` macro in both macros. The `barrier` macro inserts an optimization barrier. On processors with the `x86_64` architecture, independent memory access operations can be performed in any order. That's why we need a way to tell the compiler and the processor to respect a particular order. This mechanism is the memory barrier. Let's consider a simple example:
|
||||
|
||||
```C
|
||||
preempt_disable();
|
||||
@@ -83,7 +83,7 @@ void __init idr_init_cache(void)
|
||||
}
|
||||
```
|
||||
|
||||
Here we can see the call of the `kmem_cache_create`. We already called the `kmem_cache_init` in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L485). This function create generalized caches again using the `kmem_cache_alloc` (more about caches we will see in the [Linux kernel memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) chapter). In our case, as we are using `kmem_cache_t` which will be used by the [slab](http://en.wikipedia.org/wiki/Slab_allocation) allocator and `kmem_cache_create` creates it. As you can see we pass five parameters to the `kmem_cache_create`:
|
||||
Here we can see the call of the `kmem_cache_create`. We already called the `kmem_cache_init` in the [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c#L485). This function create generalized caches again using the `kmem_cache_alloc` (more about caches we will see in the [Linux kernel memory management](/MM/) chapter). In our case, as we are using `kmem_cache_t` which will be used by the [slab](http://en.wikipedia.org/wiki/Slab_allocation) allocator and `kmem_cache_create` creates it. As you can see we pass five parameters to the `kmem_cache_create`:
|
||||
|
||||
* name of the cache;
|
||||
* size of the object to store in cache;
|
||||
@@ -127,7 +127,7 @@ The next step is [RCU](http://en.wikipedia.org/wiki/Read-copy-update) initializa
|
||||
|
||||
In the first case `rcu_init` is defined in [kernel/rcu/tiny.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tiny.c), and in the second case it is defined in [kernel/rcu/tree.c](https://github.com/torvalds/linux/blob/master/kernel/rcu/tree.c). We will look at the implementation of the `tree rcu`, but first a few words about `RCU` in general.
|
||||
|
||||
`RCU`, or read-copy update, is a scalable high-performance synchronization mechanism implemented in the Linux kernel. In its early days the Linux kernel supported concurrently running applications, but all execution inside the kernel was serialized by a single global lock. Today the Linux kernel has no single global lock; instead it provides different mechanisms, including [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure), [percpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html) data structures and others. One of these mechanisms is `read-copy update`. The `RCU` technique is designed for rarely modified data structures, and its idea is simple. Suppose we have a rarely modified data structure. If somebody wants to change it, we make a copy of the data structure and make all changes in the copy. In the meantime all other users of the data structure keep using the old version. Next, we need to choose a safe moment when the original version has no users left and replace it with the modified copy.
|
||||
`RCU`, or read-copy update, is a scalable high-performance synchronization mechanism implemented in the Linux kernel. In its early days the Linux kernel supported concurrently running applications, but all execution inside the kernel was serialized by a single global lock. Today the Linux kernel has no single global lock; instead it provides different mechanisms, including [lock-free data structures](http://en.wikipedia.org/wiki/Concurrent_data_structure), [percpu](/Concepts/linux-cpu-1.md) data structures and others. One of these mechanisms is `read-copy update`. The `RCU` technique is designed for rarely modified data structures, and its idea is simple. Suppose we have a rarely modified data structure. If somebody wants to change it, we make a copy of the data structure and make all changes in the copy. In the meantime all other users of the data structure keep using the old version. Next, we need to choose a safe moment when the original version has no users left and replace it with the modified copy.
|
||||
|
||||
Of course this description of `RCU` is very simplified. To understand some details about `RCU`, we first need to learn some terminology. Data readers in `RCU` execute in a [critical section](http://en.wikipedia.org/wiki/Critical_section). Every time a data reader enters the critical section it calls `rcu_read_lock`, and it calls `rcu_read_unlock` on exit from the critical section. A thread that is not in a critical section is in a state called the `quiescent state`. A `grace period` is an interval during which every thread passes through at least one `quiescent state`. If a thread wants to remove an element from the data structure, this occurs in two steps. The first step is `removal`: the element is atomically removed from the data structure, but its memory is not released. The writer thread then announces the removal and waits for a `grace period` to finish. From the moment of removal, new readers can no longer reach the removed element, although readers that were already in their critical sections may still be using it. After the `grace period` has finished, the second step of the element removal starts: it just frees the element's memory.
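
The copy-and-publish idea described above can be sketched in userspace C. This is only a toy model under loud assumptions: `struct config`, `current_config`, `init_config`, `read_timeout` and `update_timeout` are hypothetical names that do not exist in the kernel, C11 atomics stand in for the kernel's RCU primitives, there is a single writer, and the grace period is not modelled at all (the caller has to decide when the old version is safe to free).

```C
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical rarely modified structure, used only for illustration. */
struct config {
	int timeout;
	int retries;
};

/* The pointer that readers dereference; new versions are published atomically. */
static _Atomic(struct config *) current_config;

/* Install the first version. Returns 0 on success, -1 if allocation fails. */
static int init_config(int timeout, int retries)
{
	struct config *c = malloc(sizeof(*c));

	if (!c)
		return -1;
	c->timeout = timeout;
	c->retries = retries;
	atomic_store_explicit(&current_config, c, memory_order_release);
	return 0;
}

/* Reader side: take a snapshot of the pointer and read through it. */
static int read_timeout(void)
{
	struct config *c = atomic_load_explicit(&current_config,
						memory_order_acquire);

	return c->timeout;
}

/*
 * Writer side (single writer assumed): copy the current version, modify
 * the copy, then publish the copy with one atomic store.
 *
 * Returns the old version, or NULL if allocation fails. The caller may
 * free the old version only when no reader can still be using it; real
 * RCU waits for a grace period at this point, which this sketch omits.
 */
static struct config *update_timeout(int timeout)
{
	struct config *old = atomic_load_explicit(&current_config,
						  memory_order_acquire);
	struct config *copy = malloc(sizeof(*copy));

	if (!copy)
		return NULL;
	*copy = *old;			/* copy */
	copy->timeout = timeout;	/* update the copy only */
	atomic_store_explicit(&current_config, copy, memory_order_release);
	return old;
}
```

A reader that loaded the pointer before the update keeps seeing a complete old version, and a reader that loads it afterwards sees a complete new version; no reader ever observes a half-modified structure. What the sketch leaves out is exactly the hard part that `RCU` solves: knowing when the old version has no readers left.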
|
||||
|
||||
@@ -378,7 +378,7 @@ Ok, we already passed the main theme of this part which is `RCU` initialization,
|
||||
|
||||
After we initialized `RCU`, the next step you can see in [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) is the `trace_init` function. As you can understand from its name, this function initializes the [tracing](http://en.wikipedia.org/wiki/Tracing_%28software%29) subsystem. You can read more about the Linux kernel trace system [here](http://elinux.org/Kernel_Trace_Systems).
|
||||
|
||||
After `trace_init`, we can see the call of `radix_tree_init`. If you are familiar with different data structures, you can tell from the name of this function that it initializes the kernel implementation of the [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). This function is defined in [lib/radix-tree.c](https://github.com/torvalds/linux/blob/master/lib/radix-tree.c), and you can read more about it in the part about the [Radix tree](https://xinqiu.gitbooks.io/linux-insides-cn/content/DataStructures/linux-datastructures-2.html).
|
||||
After `trace_init`, we can see the call of `radix_tree_init`. If you are familiar with different data structures, you can tell from the name of this function that it initializes the kernel implementation of the [Radix tree](http://en.wikipedia.org/wiki/Radix_tree). This function is defined in [lib/radix-tree.c](https://github.com/torvalds/linux/blob/master/lib/radix-tree.c), and you can read more about it in the part about the [Radix tree](/DataStructures/linux-datastructures-2.md).
|
||||
|
||||
In the next step we can see the functions related to the `interrupt handling` subsystem. They are:
|
||||
|
||||
@@ -394,18 +394,18 @@ The next couple of functions are related with the [perf](https://perf.wiki.kerne
|
||||
local_irq_enable();
|
||||
```
|
||||
|
||||
which expands to the `sti` instruction, and the post-initialization of the [SLAB](http://en.wikipedia.org/wiki/Slab_allocation) with the call of the `kmem_cache_init_late` function (as I wrote above, we will learn about the `SLAB` in the [Linux memory management](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/index.html) chapter).
|
||||
which expands to the `sti` instruction, and the post-initialization of the [SLAB](http://en.wikipedia.org/wiki/Slab_allocation) with the call of the `kmem_cache_init_late` function (as I wrote above, we will learn about the `SLAB` in the [Linux memory management](/MM/) chapter).
|
||||
|
||||
After the post-initialization of the `SLAB`, the next step is the initialization of the console with the `console_init` function from [drivers/tty/tty_io.c](https://github.com/torvalds/linux/blob/master/drivers/tty/tty_io.c).
|
||||
|
||||
After the console initialization, we can see the `lockdep_info` function which prints information about the [Lock dependency validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt). After this, we can see the initialization of the dynamic allocation of the `debug objects` with the `debug_objects_mem_init`, kernel memory leak [detector](https://www.kernel.org/doc/Documentation/kmemleak.txt) initialization with the `kmemleak_init`, `percpu` pageset setup with the `setup_per_cpu_pageset`, setup of the [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access) policy with the `numa_policy_init`, setting time for the scheduler with the `sched_clock_init`, `pidmap` initialization with the call of the `pidmap_init` function for the initial `PID` namespace, cache creation with the `anon_vma_init` for the private virtual memory areas and early initialization of the [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface) with the `acpi_early_init`.
|
||||
|
||||
This is the end of the ninth part of the [linux kernel initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html), and here we saw the initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). In the last paragraph of this part (`Rest of the initialization process`) we went through many functions but did not dive into the details of their implementations. Do not worry if you do not know anything about these things yet, or if you know of them but do not understand them. As I have already written many times, we will see the details of the implementations in other parts or other chapters.
|
||||
This is the end of the ninth part of the [linux kernel initialization process](/Initialization/), and here we saw the initialization of the [RCU](http://en.wikipedia.org/wiki/Read-copy-update). In the last paragraph of this part (`Rest of the initialization process`) we went through many functions but did not dive into the details of their implementations. Do not worry if you do not know anything about these things yet, or if you know of them but do not understand them. As I have already written many times, we will see the details of the implementations in other parts or other chapters.
|
||||
|
||||
Conclusion
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
It is the end of the ninth part about the Linux kernel [initialization process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html). In this part, we looked at the initialization process of the `RCU` subsystem. In the next part we will continue to dive into the Linux kernel initialization process, and I hope that we will finish with the `start_kernel` function and move on to the `rest_init` function from the same [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file, where we will see the start of the first process.
|
||||
It is the end of the ninth part about the Linux kernel [initialization process](/Initialization/). In this part, we looked at the initialization process of the `RCU` subsystem. In the next part we will continue to dive into the Linux kernel initialization process, and I hope that we will finish with the `start_kernel` function and move on to the `rest_init` function from the same [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c) source code file, where we will see the start of the first process.
|
||||
|
||||
If you have any questions or suggestions, write me a comment or ping me on [twitter](https://twitter.com/0xAX).
|
||||
|
||||
@@ -423,8 +423,8 @@ Links
|
||||
* [integer ID management](https://lwn.net/Articles/103209/)
|
||||
* [Documentation/memory-barriers.txt](https://www.kernel.org/doc/Documentation/memory-barriers.txt)
|
||||
* [Runtime locking correctness validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt)
|
||||
* [Per-CPU variables](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-1.html)
|
||||
* [Linux kernel memory management](http://0xax.gitbooks.io/linux-insides/content/MM/index.html)
|
||||
* [Per-CPU variables](/Concepts/linux-cpu-1.md)
|
||||
* [Linux kernel memory management](/MM/)
|
||||
* [slab](http://en.wikipedia.org/wiki/Slab_allocation)
|
||||
* [i2c](http://en.wikipedia.org/wiki/I%C2%B2C)
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-8.html)
|
||||
* [Previous part](/Initialization/linux-initialization-8.md)
|
||||
|
||||