mirror of
https://github.com/MintCN/linux-insides-zh.git
synced 2026-04-24 10:40:16 +08:00
Merge commit '80cc145b3f973fad04b6d76c665ded4fafa47ccb' into dev
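* commit '80cc145b3f973fad04b6d76c665ded4fafa47ccb': update linker.md; update translation status; translate section 3.10; fix a bug; change links; claim translation of section 3.10; remove a paragraph; update translation status; change translation of the chapter table of contents; translate chapter 2.4; CStatus; modify translation status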
@@ -6,9 +6,6 @@
|
||||
|
||||
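If you have read my earlier [posts](http://0xax.blogspot.com/search/label/asm), you know that I started working with low-level programming some time ago. I wrote some posts about x86_64 assembly for Linux. At the same time, I started digging into the Linux source code. I am very interested in how low-level things work: how programs run on a computer, how they are placed in memory, how the kernel manages processes and memory, how the network stack works at a low level, and so on. So I decided to write another series of posts about the Linux kernel for the **x86_64** architecture.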
|
||||
|
||||
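Note that I am not a professional kernel hacker and I do not write kernel code for a living. This is just a hobby. I simply like low-level things, and how they work interests me a great deal. If anything confuses you, or you have questions or remarks, ping me on [twitter](https://twitter.com/0xAX), send me an [email](anotherworldofworld@gmail.com), or open an [issue](https://github.com/0xAX/linux-insides). (PS: for translation problems, mail xinqiu.94@gmail.com or reach @xinqiu on GitHub.) I will be glad to hear from you. All posts are also available at [linux-insides](https://github.com/0xAX/linux-insides); if you find a mistake in the English or in the content, feel free to send a PR. (PS: the Chinese version lives at https://github.com/xinqiu/linux-insides)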
|
||||
|
||||
|
||||
*Note that this is not official documentation; it is only for learning and sharing knowledge.*
|
||||
|
||||
**Required background**
|
||||
|
||||
@@ -6,11 +6,11 @@
|
||||
|
||||
* [First steps after kernel decompression](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-1.md) - describes the first steps in the kernel.
* [Early interrupt and exception handling](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-2.md) - describes early interrupt initialization and the early page fault handler.
* [Last preparations before reaching the kernel endpoint](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-3.md) - describes the last preparations before the call to start_kernel.
* [Kernel endpoint](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-4.md) - describes the first step in the generic kernel code.
* [Continuing architecture-specific initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-5.md) - describes architecture-specific initialization.
* [Architecture-specific initialization, again](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-6.md) - describes another round of architecture-specific initialization.
* [The last part of architecture-specific initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-7.md) - describes the end of architecture-specific initialization.
* [Scheduling initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-8.md) - describes the preparations before scheduler initialization and the initialization itself.
* [Last preparations before reaching the kernel entry point](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-3.md) - describes the last preparations before the call to start_kernel.
* [Kernel entry point - start_kernel](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-4.md) - describes the first initialization step in the generic kernel code.
* [Architecture initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-5.md) - describes architecture-specific initialization.
* [Further architecture-specific initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-6.md) - describes another round of architecture-specific initialization.
* [Final architecture-specific initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-7.md) - describes the end of architecture-specific initialization.
* [Scheduler initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-8.md) - describes the preparations before scheduler initialization and the initialization itself.
* [RCU initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-9.md) - describes RCU initialization.
* [End of initialization](https://github.com/0xAX/linux-insides/blob/master/Initialization/linux-initialization-10.md) - the last part of Linux kernel initialization.
|
||||
@@ -1,73 +1,71 @@
|
||||
Kernel initialization. Part 4.
|
||||
内核初始化. Part 4.
|
||||
================================================================================
|
||||
|
||||
Kernel entry point
|
||||
================================================================================
|
||||
|
||||
If you have read the previous part - [Last preparations before the kernel entry point](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-3.md), you may remember that we finished all the pre-initialization work and stopped right before the call to the `start_kernel` function from [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c). `start_kernel` is the entry point of the generic, architecture-independent kernel code, although we will return to the `arch/` folder many times. If you look inside `start_kernel`, you will see that it is very big: at the moment it contains about `86` function calls. This part will not cover everything that happens in this function; here we only make a start. This part and all the following parts of the [Kernel initialization process](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/README.md) chapter will cover it.
|
||||
还记得上一章的内容吗 - [跳转到内核入口之前的最后准备](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-3.md)?你应该还记得我们已经完成一系列初始化操作停在了`start_kernel`函数位于`init/main.c`.`start_kernel`函数是于体系架构无关的通用处理入口函数,尽管我们在此初始化过程中要无数次的返回arch/ 文件夹。如果你仔细看看`start_kernel`函数的内容,你将发现此函数涉及内容非常广泛。在此过程中约包含了86个调用函数,是的,你发现它真的是非常庞大但是此部分并不是全部的初始化过程,在当前阶段我们只看这些就可以了。此章节以及后续所有的内容章节[内核初始化过程](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/README.md)我们都将涉及并详述。
|
||||
|
||||
The main purpose of `start_kernel` is to finish the kernel initialization process and launch the first `init` process. Before the first process is started, `start_kernel` must do many things, such as: enable the [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt), initialize the processor id, enable the early [cgroups](http://en.wikipedia.org/wiki/Cgroups) subsystem, set up per-cpu areas, initialize different caches in [vfs](http://en.wikipedia.org/wiki/Virtual_file_system), and initialize the memory manager, rcu, vmalloc, scheduler, IRQs, ACPI and much more. Only after these steps will we see the launch of the first `init` process, in the last part of this chapter. So much kernel code awaits us, let's start.
|
||||
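The main purpose of `start_kernel` is to finish kernel initialization and launch the ancestor process (process 1). Before that process starts, `start_kernel` does a lot of work: it enables the [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt), initializes the processor id, enables the early cgroups subsystem, sets up the per-cpu areas, initializes the [VFS](http://en.wikipedia.org/wiki/Virtual_file_system) caches, and initializes memory management, rcu, vmalloc, the scheduler, IRQs (interrupt requests), ACPI (Advanced Configuration and Power Interface) and many other subsystems. Only after these steps will we see the ancestor process launched, in the last part of this chapter. With so many kernel subsystems and so much kernel code waiting for us, let's begin.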
|
||||
|
||||
**NOTE: All parts from this big chapter `Linux Kernel initialization process` will not cover anything about debugging. There will be a separate chapter about kernel debugging tips.**
|
||||
**注意:在此大章节的所有内容 `Linux Kernel initialization process`,并不涉及内核调试相关,关于内核调试部分会有一个单独的章节来进行描述**
|
||||
|
||||
A little about function attributes
|
||||
关于 `__attribute__`
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
As I wrote above, the `start_kernel` function is defined in [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c). It is defined with the `__init` attribute and, as you may already know from other parts, functions marked with this attribute are needed only during kernel initialization.
|
||||
正如我上述所写,`start_kernel`函数是定义在[init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c).从已知代码中我们能看到此函数使用了`__init`特性,你也许从其它地方了解过关于GCC `__attribute__`相关的内容。在内核初始化阶段这个机制在所有的函数中都是有必要的。
|
||||
|
||||
```C
|
||||
#define __init __section(.init.text) __cold notrace
|
||||
#define __init __section(.init.text) __cold notrace
|
||||
```
|
||||
|
||||
After the initialization process has finished, the kernel releases these sections with a call to the `free_initmem` function. Note also that `__init` is defined with two attributes: `__cold` and `notrace`. The first, `cold`, marks the function as rarely used, so the compiler optimizes it for size rather than speed. The second, `notrace`, is defined as:
|
||||
在初始化过程完成后,内核将通过调用`free_initmem`释放这些sections(段)。注意`__init`属性是通过`__cold`和`notrace`两个属性来定义的。第一个属性`cold`的目的是标记此函数很少使用所以编译器必须优化此函数的大小,第二个属性`notrace`定义如下:
|
||||
|
||||
```C
|
||||
#define notrace __attribute__((no_instrument_function))
|
||||
#define notrace __attribute__((no_instrument_function))
|
||||
```
|
||||
|
||||
where `no_instrument_function` says to the compiler not to generate profiling function calls.
|
||||
where `no_instrument_function` tells the compiler not to generate profiling function calls for the function.
|
||||
|
||||
In the definition of the `start_kernel` function, you can also see the `__visible` attribute which expands to the:
|
||||
在`start_kernel`函数的定义中,你也可以看到`__visible` 属性的扩展:
|
||||
|
||||
```
|
||||
#define __visible __attribute__((externally_visible))
|
||||
#define __visible __attribute__((externally_visible))
|
||||
```
|
||||
|
||||
where `externally_visible` tells the compiler that something outside the current compilation unit uses this function or variable, which prevents the compiler from treating it as unused and discarding it. You can find the definition of this and other macro attributes in [include/linux/init.h](https://github.com/torvalds/linux/blob/master/include/linux/init.h).
|
||||
含有`externally_visible`意思就是告诉编译器有一些过程在使用该函数或者变量,为了放至标记这个函数/变量是`unusable`。你可以在此[include/linux/init.h](https://github.com/torvalds/linux/blob/master/include/linux/init.h)处查到这些属性表达式的含义。
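
To make the attributes above more concrete, here is a minimal user-space sketch, assuming GCC or Clang on an ELF target. The `demo_*` names and the `.demo.init.text` section are hypothetical stand-ins, not the kernel's own macros; the sketch only shows how `section`, `cold` and `no_instrument_function` can be combined behind one macro.

```C
/* Hypothetical stand-ins for the kernel macros. The section name is made up
 * for this demo and assumes an ELF target built with GCC or Clang. */
#define demo_notrace __attribute__((no_instrument_function))
#define demo_init    __attribute__((section(".demo.init.text"), cold)) demo_notrace

/* One-time setup code: rarely executed, so it is marked cold and grouped
 * into its own section, which a kernel-like loader could free after boot. */
static int demo_init demo_setup(int value)
{
	return value * 2;
}

/* Ordinary code stays in the default .text section. */
int demo_run(int value)
{
	return demo_setup(value) + 1;
}
```

Nothing in user space frees the demo section afterwards; in the kernel that job belongs to `free_initmem`, as described above.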
|
||||
|
||||
First steps in the start_kernel
|
||||
start_kernel 初始化
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
At the beginning of the `start_kernel` you can see the definition of these two variables:
|
||||
在start_kernel的初始之初你可以看到这两个变量:
|
||||
|
||||
```C
|
||||
char *command_line;
|
||||
char *after_dashes;
|
||||
```
|
||||
|
||||
The first represents a pointer to the kernel command line and the second will contain the result of the `parse_args` function which parses an input string with parameters in the form `name=value`, looking for specific keywords and invoking the right handlers. We will not go into the details related with these two variables at this time, but will see it in the next parts. In the next step we can see a call to the:
|
||||
第一个变量表示内核命令行的全局指针,第二个变量将包含`parse_args`函数通过输入字符串中的参数'name=value',寻找特定的关键字和调用正确的处理程序。我们不想在这个时候参与这两个变量的相关细节,但是会在接下来的章节看到。我们接着往下走,下一步我们看到了此函数:
|
||||
|
||||
```C
|
||||
lockdep_init();
|
||||
```
|
||||
|
||||
function. `lockdep_init` initializes the [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt). Its implementation is pretty simple: it just initializes two hash tables of [list_head](https://github.com/MintCN/linux-insides-zh/blob/master/DataStructures/dlist.md) and sets the `lockdep_initialized` global variable to `1`. The lock validator detects circular lock dependencies and is invoked whenever a [spinlock](http://en.wikipedia.org/wiki/Spinlock) or [mutex](http://en.wikipedia.org/wiki/Mutual_exclusion) is acquired.
|
||||
`lockdep_init` 初始化 [lock validator](https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt). 其实现是相当简单的,它只是初始化了两个哈希表 [list_head](https://github.com/MintCN/linux-insides-zh/blob/master/DataStructures/dlist.md)并设置`lockdep_initialized` 全局变量为`1`。
|
||||
关于自旋锁 [spinlock](http://en.wikipedia.org/wiki/Spinlock)以及互斥锁[mutex](http://en.wikipedia.org/wiki/Mutual_exclusion) 如何获取请参考链接.
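
The initialization pattern described for `lockdep_init` can be sketched as follows. This is only an illustration under assumptions: the `demo_*` names, the table size and the two table names are hypothetical, and the real function works on the kernel's own `list_head` tables.

```C
#include <stddef.h>

/* Minimal doubly linked list head, modelled on the kernel's list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

#define DEMO_HASH_SIZE 16	/* hypothetical; the real tables are much larger */

static struct demo_list_head demo_classhash[DEMO_HASH_SIZE];
static struct demo_list_head demo_chainhash[DEMO_HASH_SIZE];
static int demo_initialized;

/* An empty list is a head that points to itself in both directions. */
static void demo_init_list_head(struct demo_list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Initialize every bucket of both hash tables once, then set the flag. */
void demo_lockdep_init(void)
{
	size_t i;

	if (demo_initialized)
		return;

	for (i = 0; i < DEMO_HASH_SIZE; i++) {
		demo_init_list_head(&demo_classhash[i]);
		demo_init_list_head(&demo_chainhash[i]);
	}
	demo_initialized = 1;
}

/* Returns 1 when the bucket holds no entries. */
int demo_bucket_is_empty(const struct demo_list_head *head)
{
	return head->next == head;
}
```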
|
||||
|
||||
The next function is `set_task_stack_end_magic` which takes address of the `init_task` and sets `STACK_END_MAGIC` (`0x57AC6E9D`) as canary for it. `init_task` represents the initial task structure:
|
||||
下一个函数是`set_task_stack_end_magic`,参数为`init_task`和设置`STACK_END_MAGIC` (`0x57AC6E9D`)。`init_task`代表初始化进程(任务)数据结构:
|
||||
|
||||
```C
|
||||
struct task_struct init_task = INIT_TASK(init_task);
|
||||
```
|
||||
`task_struct` 存储了进程的所有相关信息。因为它很庞大,我在这本书并不会去介绍,详细信息你可以查看调度相关数据结构定义头文件 [include/linux/sched.h](https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L1278)。在此刻`task_sreuct`包含了超过`100`个字段!虽然你不会看到`task_struct`是在这本书中的解释,但是我们会经常使用它,因为它是介绍在Linux内核`进程`的基本知识。我将描述这个结构中字段的一些含义,因为我们在后面的实践中见到它们。
|
||||
|
||||
where `task_struct` stores all the information about a process. I will not explain this structure in this book because it's very big. You can find its definition in [include/linux/sched.h](https://github.com/torvalds/linux/blob/master/include/linux/sched.h#L1278). At this moment `task_struct` contains more than `100` fields! Although you will not see the explanation of the `task_struct` in this book, we will use it very often since it is the fundamental structure which describes the `process` in the Linux kernel. I will describe the meaning of the fields of this structure as we meet them in practice.
|
||||
你也可以查看`init_task`的相关定义以及宏指令`INIT_TASK`的初始化流程。这个宏指令来自于[include/linux/init_task.h](https://github.com/torvalds/linux/blob/master/include/linux/init_task.h)在此刻只是设置和初始化了第一个进程来(0号进程)的值。例如这么设置:
|
||||
* 初始化进程状态为 zero 或者 `runnable`. 一个可运行进程即为等待CPU去运行;
|
||||
* 初始化仅存的标志位 - `PF_KTHREAD` 意思为 - 内核线程;
|
||||
* 一个可运行的任务列表;
|
||||
* 进程地址空间;
|
||||
* 初始化进程堆栈 `&init_thread_info` - `init_thread_union.thread_info` 和 `initthread_union` 使用共用体 - `thread_union` 包含了 `thread_info`进程信息以及进程栈:。
|
||||
|
||||
You can see the definition of `init_task` above; it is initialized by the `INIT_TASK` macro. This macro is from [include/linux/init_task.h](https://github.com/torvalds/linux/blob/master/include/linux/init_task.h) and it just fills `init_task` with the values for the first process. For example it sets:
|
||||
|
||||
* init process state to zero or `runnable`. A runnable process is one which is waiting only for a CPU to run on;
|
||||
* init process flags - `PF_KTHREAD` which means - kernel thread;
|
||||
* a list of runnable task;
|
||||
* process address space;
|
||||
* init process stack to `&init_thread_info`, which is `init_thread_union.thread_info`; `init_thread_union` has the type `thread_union`, which contains `thread_info` and the process stack:
|
||||
|
||||
```C
|
||||
union thread_union {
|
||||
@@ -75,8 +73,7 @@ union thread_union {
|
||||
unsigned long stack[THREAD_SIZE/sizeof(long)];
|
||||
};
|
||||
```
|
||||
|
||||
Every process has its own stack, which is 16 kilobytes, or 4 page frames, on `x86_64`. Note that it is defined as an array of `unsigned long`. The other member of `thread_union` is `thread_info`, defined as:
|
||||
每个进程都有其自己的堆栈,`x86_64`架构的CPU一般支持的页表是16KB or 4个页框大小。我们注意stack变量被定义为数据并且类型是`unsigned long`。`thread_union`结构的下一个字段为`thread_union` 定义如下:
|
||||
|
||||
```C
|
||||
struct thread_info {
|
||||
@@ -93,10 +90,8 @@ struct thread_info {
|
||||
unsigned int uaccess_err:1;
|
||||
};
|
||||
```
|
||||
|
||||
and occupies 52 bytes. The `thread_info` structure contains architecture-specific information about the thread. We know that on `x86_64` the stack grows down, and in our case `thread_union.thread_info` is stored at the bottom of the stack. So the process stack is 16 kilobytes and `thread_info` is at the bottom. The space remaining for the stack itself is `16 kilobytes - 52 bytes = 16332 bytes`. Note that `thread_union` is a [union](http://en.wikipedia.org/wiki/Union_type) and not a structure, which means that `thread_info` and the stack share the same memory.
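
The sharing of memory between `thread_info` and the stack can be checked with a small user-space sketch. The `demo_thread_info` structure below is a hypothetical, much smaller stand-in for the real `thread_info`, so the byte counts differ from the 52 bytes mentioned above; only the layout idea is the same.

```C
#include <stddef.h>

#define DEMO_THREAD_SIZE (16 * 1024)	/* 16 KB, i.e. 4 pages of 4 KB */

/* Hypothetical stand-in, much smaller than the real thread_info. */
struct demo_thread_info {
	unsigned int flags;
	unsigned int status;
	unsigned int cpu;
	int saved_preempt_count;
};

/* Both members start at the same address and overlap in memory. */
union demo_thread_union {
	struct demo_thread_info thread_info;
	unsigned long stack[DEMO_THREAD_SIZE / sizeof(long)];
};

/* Bytes left for the stack once the space of thread_info is accounted for. */
size_t demo_usable_stack_bytes(void)
{
	return sizeof(union demo_thread_union) - sizeof(struct demo_thread_info);
}

/* thread_info sits at the lowest address of the stack area. */
int demo_info_at_bottom(const union demo_thread_union *u)
{
	return (const void *)&u->thread_info == (const void *)u->stack;
}
```

Because the stack grows down from the highest address of the union, it only collides with `thread_info` after consuming all of the usable bytes.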
|
||||
|
||||
Schematically it can be represented as follows:
|
||||
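This structure occupies 52 bytes. The `thread_info` structure contains architecture-specific information about the thread. We know that on `x86_64` the kernel stack grows down, and `thread_union.thread_info` is stored at the bottom of the stack. So the process stack is 16 KB and `thread_info` is at its bottom, which leaves `16 kilobytes - 52 bytes = 16332 bytes` for the stack itself. Note that `thread_union` is a [union](http://en.wikipedia.org/wiki/Union_type) and not a structure, so `thread_info` and the stack share the same memory. A diagram describes the layout of this stack memory.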
|
||||
如下图所示:
|
||||
|
||||
```C
|
||||
+-----------------------+
|
||||
@@ -117,9 +112,10 @@ Schematically it can be represented as follows:
|
||||
|
||||
http://www.quora.com/In-Linux-kernel-Why-thread_info-structure-and-the-kernel-stack-of-a-process-binds-in-union-construct
|
||||
|
||||
So the `INIT_TASK` macro fills these `task_struct's` fields and many many more. As I already wrote above, I will not describe all the fields and values in the `INIT_TASK` macro but we will see them soon.
|
||||
所以`INIT_TASK`宏指令就是`task_struct's`'结构。正如我上述所写,我并不会去描述这些字段的含义和值,在`INIT_TASK`赋值处理的时候我们很快能看到这些。
|
||||
|
||||
现在让我们回到`set_task_stack_end_magic`函数,这个函数被定义在[kernel/fork.c](https://github.com/torvalds/linux/blob/master/kernel/fork.c#L297)功能为设置[canary](http://en.wikipedia.org/wiki/Stack_buffer_overflow) `init` 进程堆栈以检测堆栈溢出。
|
||||
|
||||
Now let's go back to the `set_task_stack_end_magic` function. This function is defined in [kernel/fork.c](https://github.com/torvalds/linux/blob/master/kernel/fork.c#L297) and sets a [canary](http://en.wikipedia.org/wiki/Stack_buffer_overflow) on the `init` process stack so that a stack overflow can be detected.
|
||||
|
||||
```C
|
||||
void set_task_stack_end_magic(struct task_struct *tsk)
|
||||
@@ -130,19 +126,20 @@ void set_task_stack_end_magic(struct task_struct *tsk)
|
||||
}
|
||||
```
|
||||
|
||||
Its implementation is simple. `set_task_stack_end_magic` gets the end of the stack for the given `task_struct` with the `end_of_stack` function. The end of a process stack depends on the `CONFIG_STACK_GROWSUP` configuration option. As we learn in `x86_64` architecture, the stack grows down. So the end of the process stack will be:
|
||||
上述函数比较简单,`set_task_stack_end_magic`函数的作用是先通过`end_of_stack`函数获取堆栈并赋给 `task_struct`。
|
||||
关于检测配置需要打开内核配置宏`CONFIG_STACK_GROWSUP`。因为我们学习的是x86架构的初始化,堆栈是逆生成,所以堆栈底部为:
|
||||
|
||||
```C
|
||||
(unsigned long *)(task_thread_info(p) + 1);
|
||||
```
|
||||
|
||||
where `task_thread_info` just returns the stack which we filled with the `INIT_TASK` macro:
|
||||
`task_thread_info`的定义如下,返回一个当前的堆栈;
|
||||
|
||||
```C
|
||||
#define task_thread_info(task) ((struct thread_info *)(task)->stack)
|
||||
```
|
||||
|
||||
As we got the end of the init process stack, we write `STACK_END_MAGIC` there. After `canary` is set, we can check it like this:
|
||||
进程的栈底,我们写`STACK_END_MAGIC`这个值。如果设置`canary`,我们可以像这样子去检测堆栈:
|
||||
|
||||
```C
|
||||
if (*end_of_stack(task) != STACK_END_MAGIC) {
|
||||
@@ -152,7 +149,7 @@ if (*end_of_stack(task) != STACK_END_MAGIC) {
|
||||
}
|
||||
```
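
The set-then-check idea behind the canary can be modelled in user space. This is a simplified sketch under assumptions: the `demo_task` structure is hypothetical, it has no `thread_info` in front of the stack, and the "end" of the stack is simply its lowest word because the stack is assumed to grow down.

```C
#include <stddef.h>

#define DEMO_STACK_END_MAGIC 0x57AC6E9DUL
#define DEMO_STACK_WORDS     64

/* Hypothetical task with an embedded, downward-growing stack. */
struct demo_task {
	unsigned long stack[DEMO_STACK_WORDS];
};

/* With a downward-growing stack, the end is the lowest word. */
static unsigned long *demo_end_of_stack(struct demo_task *task)
{
	return &task->stack[0];
}

/* Write the magic value at the end of the stack. */
void demo_set_stack_end_magic(struct demo_task *task)
{
	*demo_end_of_stack(task) = DEMO_STACK_END_MAGIC;
}

/* Returns 1 if something overwrote the canary, 0 if it is intact. */
int demo_stack_overflowed(struct demo_task *task)
{
	return *demo_end_of_stack(task) != DEMO_STACK_END_MAGIC;
}
```

The check cannot stop an overflow; it can only notice afterwards that the last word of the stack was overwritten.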
|
||||
|
||||
The next function after the `set_task_stack_end_magic` is `smp_setup_processor_id`. This function has an empty body for `x86_64`:
|
||||
`set_task_stack_end_magic` 初始化完毕后的下一个函数是 `smp_setup_processor_id`.此函数在`x86_64`架构上是空函数:
|
||||
|
||||
```C
|
||||
void __init __weak smp_setup_processor_id(void)
|
||||
@@ -160,11 +157,12 @@ void __init __weak smp_setup_processor_id(void)
|
||||
}
|
||||
```
|
||||
|
||||
because it is not implemented for all architectures, only for some such as [s390](http://en.wikipedia.org/wiki/IBM_ESA/390) and [arm64](http://en.wikipedia.org/wiki/ARM_architecture#64.2F32-bit_architecture).
|
||||
在此架构上没有实现此函数,但在别的体系架构的实现可以参考[s390](http://en.wikipedia.org/wiki/IBM_ESA/390) and [arm64](http://en.wikipedia.org/wiki/ARM_architecture#64.2F32-bit_architecture).
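
The `__weak` marker is what lets an architecture replace the empty generic body. A minimal sketch of that mechanism, assuming GCC or Clang on an ELF target: the `demo_*` names are hypothetical, and unlike the kernel function this one returns a value so the default can be observed.

```C
/* Generic default: does nothing useful and reports CPU 0.
 * Because it is weak, a non-weak definition with the same name in
 * another object file silently replaces it at link time. */
__attribute__((weak)) int demo_smp_setup_processor_id(void)
{
	return 0;
}

/* Callers never need to know which definition was linked in. */
int demo_boot_cpu(void)
{
	return demo_smp_setup_processor_id();
}
```

With only this file linked, the weak default is used; an architecture that needs real work would supply its own strong definition in a separate source file.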
|
||||
|
||||
The next function in `start_kernel` is `debug_objects_early_init`. Implementation of this function is almost the same as `lockdep_init`, but fills hashes for object debugging. As I wrote above, we will not see the explanation of this and other functions which are for debugging purposes in this chapter.
|
||||
我们接着往下走,下一个函数是`debug_objects_early_init`。此函数的执行几乎和`lockdep_init`是一样的,但是填充的哈希对象是调试相关。上述我已经表明,关于内核调试部分会在后续专门有一个章节来完成。
|
||||
|
||||
After the `debug_objects_early_init` function we can see the call of the `boot_init_stack_canary` function, which fills `task_struct->stack_canary` with the canary value for the `-fstack-protector` gcc feature. This function depends on the `CONFIG_CC_STACKPROTECTOR` configuration option. If this option is disabled, `boot_init_stack_canary` does nothing; otherwise it generates a random number based on the random pool and the [TSC](http://en.wikipedia.org/wiki/Time_Stamp_Counter):
|
||||
`debug_object_early_init`函数之后我们看到调用了`boot_init_stack_canary`函数。`task_struct->canary` 的值利用了GCC特性,但是此特性需要先使能内核`CONFIG_CC_STACKPROTECTOR`宏后才可以使用。
|
||||
`boot_init_stack_canary` 什么也没有做, 否则基于随机数和随机池产生 [TSC](http://en.wikipedia.org/wiki/Time_Stamp_Counter):
|
||||
|
||||
```C
|
||||
get_random_bytes(&canary, sizeof(canary));
|
||||
@@ -172,19 +170,19 @@ tsc = __native_read_tsc();
|
||||
canary += tsc + (tsc << 32UL);
|
||||
```
|
||||
|
||||
After we got a random number, we fill the `stack_canary` field of `task_struct` with it:
|
||||
我们要获取随机数, 我们可以给`stack_canary` 字段 `task_struct`赋值:
|
||||
|
||||
```C
|
||||
current->stack_canary = canary;
|
||||
```
|
||||
|
||||
and write this value to the top of the IRQ stack with the:
|
||||
然后将此值写入IRQ堆栈的顶部:
|
||||
|
||||
```C
|
||||
this_cpu_write(irq_stack_union.stack_canary, canary); // read below about this_cpu_write
|
||||
```
|
||||
|
||||
Again, we will not dive into details here; we will cover it in the part about [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29). Once the canary is set, we disable local and early boot IRQs and register the bootstrap CPU in the CPU maps. We disable local IRQs (interrupts for the current CPU) with the `local_irq_disable` macro, which expands to a call of the `arch_local_irq_disable` function from [arch/x86/include/asm/irqflags.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/irqflags.h). Its enabling counterpart, defined in the same header, looks like this:
|
||||
关于IRQ的章节我们这里也不会详细刨析, 关于这部分介绍看这里[IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29).如果canary被设置, 关闭本地中断注册bootstrap CPU以及CPU maps. 我们关闭本地中断 (interrupts for current CPU) 使用 `local_irq_disable` 函数,展开后原型为 `arch_local_irq_disable` 函数[include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h):
|
||||
|
||||
```C
|
||||
static inline notrace void arch_local_irq_enable(void)
|
||||
@@ -192,31 +190,30 @@ static inline notrace void arch_local_irq_enable(void)
|
||||
native_irq_enable();
|
||||
}
|
||||
```
|
||||
如果`native_irq_enable`通过`cli`指令判断架构,这里是`X86_64`,
|
||||
Where `native_irq_enable` is `cli` instruction for `x86_64`.中断的关闭(屏蔽)我们可以通过注册当前CPU ID到CPU bitmap来实现。
|
||||
|
||||
On `x86_64`, `native_irq_enable` is the `sti` instruction, and the `native_irq_disable` used by `arch_local_irq_disable` is the `cli` instruction. With interrupts disabled, we can register the current CPU with the given ID in the CPU bitmap.
|
||||
|
||||
The first processor activation
|
||||
激活第一个CPU
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
The current function from the `start_kernel` is `boot_cpu_init`. This function initializes various CPU masks for the bootstrap processor. First of all it gets the bootstrap processor id with a call to:
|
||||
当前已经走到`start_kernel`函数中的`boot_cpu_init`函数,此函数主要为了通过掩码初始化每一个CPU。首先我们需要获取当前处理器的ID通过下面函数:
|
||||
|
||||
```C
|
||||
int cpu = smp_processor_id();
|
||||
```
|
||||
|
||||
For now it is just zero. If the `CONFIG_DEBUG_PREEMPT` configuration option is disabled, `smp_processor_id` just expands to the call of `raw_smp_processor_id` which expands to the:
|
||||
For now it is 0. If the `CONFIG_DEBUG_PREEMPT` configuration option is disabled, `smp_processor_id` simply expands to a call of `raw_smp_processor_id`, which is defined as:
|
||||
|
||||
```C
|
||||
#define raw_smp_processor_id() (this_cpu_read(cpu_number))
|
||||
```
|
||||
|
||||
`this_cpu_read`, like many similar functions (`this_cpu_write`, `this_cpu_add` etc.), is defined in [include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) and represents a `this_cpu` operation. These operations provide an optimized way to access the [per-cpu](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/per-cpu.html) variables associated with the current processor. In our case it is `this_cpu_read`:
|
||||
`this_cpu_read` 函数与其它很多函数一样如(`this_cpu_write`, `this_cpu_add` 等等...) 被定义在[include/linux/percpu-defs.h](https://github.com/torvalds/linux/blob/master/include/linux/percpu-defs.h) 此部分函数主要为对 `this_cpu` 进行操作. 这些操作提供不同的对每cpu[per-cpu](http://xinqiu.gitbooks.io/linux-insides-cn/content/Theory/per-cpu.html) 变量相关访问方式. 譬如让我们来看看这个函数 `this_cpu_read`:
|
||||
|
||||
```
|
||||
__pcpu_size_call_return(this_cpu_read_, pcp)
|
||||
```
|
||||
|
||||
Remember that we have passed `cpu_number` as `pcp` to the `this_cpu_read` from the `raw_smp_processor_id`. Now let's look at the `__pcpu_size_call_return` implementation:
|
||||
还记得上面我们所写,每cpu变量`cpu_number` 的值是`this_cpu_read`通过`raw_smp_processor_id`来得到,现在让我们看看 `__pcpu_size_call_return`的执行:
|
||||
|
||||
```C
|
||||
#define __pcpu_size_call_return(stem, variable) \
|
||||
@@ -234,40 +231,40 @@ Remember that we have passed `cpu_number` as `pcp` to the `this_cpu_read` from t
|
||||
pscr_ret__; \
|
||||
})
|
||||
```
|
||||
|
||||
Yes, it looks a little strange, but it is easy. First of all we can see the definition of the `pscr_ret__` variable with the `int` type. Why int? Because `variable` is `cpu_number`, which was declared as a per-cpu int variable:
|
||||
是的,此函数虽然看起起奇怪但是它的实现是简单的,我们看到`pscr_ret__` 变量的定义是`int`类型,为什么是int类型呢?好吧,`变量`是`common_cpu` 它声明了每cpu(per-cpu)变量:
|
||||
|
||||
```C
|
||||
DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
|
||||
```
|
||||
|
||||
In the next step we call `__verify_pcpu_ptr` with the address of `cpu_number`. `__verify_pcpu_ptr` is used to verify that the given parameter is a per-cpu pointer. After that we set the `pscr_ret__` value, which depends on the size of the variable. Our `cpu_number` variable is an `int`, so it is 4 bytes in size. It means that `pscr_ret__` gets the result of `this_cpu_read_4(cpu_number)`, and at the end of `__pcpu_size_call_return` that value is returned. `this_cpu_read_4` is a macro:
|
||||
在下一个步骤中我们调用了`__verify_pcpu_ptr`通过使用一个有效的每cpu变量指针来取地址得到`cpu_number`。之后我们通过`pscr_ret__` 函数设置变量的大小,`common_cpu`变量是`int`,所以它的大小是4字节。意思就是我们通过`this_cpu_read_4(common_cpu)`获取cpu变量其大小被`pscr_ret__`决定。在`__pcpu_size_call_return`的结束 我们调用了__pcpu_size_call_return:
|
||||
|
||||
```C
|
||||
#define this_cpu_read_4(pcp) percpu_from_op("mov", pcp)
|
||||
```
|
||||
|
||||
which calls `percpu_from_op` and passes the `mov` instruction and the per-cpu variable to it. `percpu_from_op` expands to the inline assembly call:
|
||||
需要调用`percpu_from_op` 并且通过`mov`指令来传递每cpu变量,`percpu_from_op`的内联扩展如下:
|
||||
|
||||
|
||||
```C
|
||||
asm("movl %%gs:%1,%0" : "=r" (pfo_ret__) : "m" (cpu_number))
|
||||
```
|
||||
|
||||
Let's try to understand how it works and what it does. The `gs` segment register contains the base of the per-cpu area. Here we just copy `cpu_number`, which is in memory at an offset from that base, to `pfo_ret__` with the `movl` instruction. In other words:
|
||||
让我们尝试理解此函数是如果工作的,`gs`段寄存器包含每个CPU区域的初始值,这里我们通过`mov`指令copy `common_cpu`到内存中去,此函数还有另外的形式:
|
||||
|
||||
```C
|
||||
this_cpu_read(cpu_number)
|
||||
```
|
||||
|
||||
is the same as:
|
||||
等价于:
|
||||
|
||||
```C
|
||||
movl %gs:$cpu_number, $pfo_ret__
|
||||
```
|
||||
|
||||
As we have not set up the per-cpu areas yet, there is only one - for the currently running CPU - so `smp_processor_id` returns `zero`.
|
||||
由于我们没有设置每个CPU的区域,我们只有一个 - 为当前CPU的值`zero` 通过此函数 `smp_processor_id`返回.
|
||||
|
||||
As we got the current processor id, `boot_cpu_init` sets the given CPU online, active, present and possible with the:
|
||||
返回的ID表示我们处于哪一个CPU上, `boot_cpu_init` 函数设置了CPU的在线, 激活, 当前的设置为:
|
||||
|
||||
```C
|
||||
set_cpu_online(cpu, true);
|
||||
@@ -276,27 +273,26 @@ set_cpu_present(cpu, true);
|
||||
set_cpu_possible(cpu, true);
|
||||
```
|
||||
|
||||
All of these functions use the concept of a `cpumask`. `cpu_possible` is the set of CPU IDs which can be plugged in at any time during the life of that system boot. `cpu_present` represents which CPUs are currently plugged in. `cpu_online` is a subset of `cpu_present` and indicates the CPUs which are available for scheduling. These masks depend on the `CONFIG_HOTPLUG_CPU` configuration option: if this option is disabled, `possible == present` and `active == online`. The implementations of all these functions are very similar. Every function checks the second parameter: if it is `true`, it calls `cpumask_set_cpu`, otherwise it calls `cpumask_clear_cpu`.
|
||||
上述我们所有使用的这些CPU的配置我们称之为- CPU掩码`cpumask`. `cpu_possible` 则是设置支持CPU热插拔时候的CPU ID. `cpu_present` 表示当前热插拔的CPU. `cpu_online`表示当前所有在线的CPU以及通过 `cpu_present` 来决定被调度出去的CPU. CPU热插拔的操作需要打开内核配置宏`CONFIG_HOTPLUG_CPU`并且将 `possible == present` 以及`active == online`选项禁用。这些功能都非常相似,每个函数都需要检查第二个参数,如果设置为`true`,需要通过调用`cpumask_set_cpu` or `cpumask_clear_cpu`来改变状态。
|
||||
|
||||
For example let's look at `set_cpu_possible`. As we passed `true` as the second parameter, the:
|
||||
譬如我们可以通过true或者第二个参数来这么调用:
|
||||
|
||||
```C
|
||||
cpumask_set_cpu(cpu, to_cpumask(cpu_possible_bits));
|
||||
```
|
||||
|
||||
will be called. First of all let's try to understand the `to_cpumask` macro. This macro casts a bitmap to a `struct cpumask *`. CPU masks provide a bitmap suitable for representing the set of CPUs in a system, one bit position per CPU number. A CPU mask is represented by the `cpumask` structure:
|
||||
让我们继续尝试理解`to_cpumask`宏指令,此宏指令转化为一个位图通过`struct cpumask *`,CPU掩码提供了位图集代表了当前系统中所有的CPU's,每CPU都占用1bit,CPU掩码相关定义通过`cpu_mask`结构定义:
|
||||
|
||||
```C
|
||||
typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
|
||||
```
|
||||
|
||||
which is just bitmap declared with the `DECLARE_BITMAP` macro:
|
||||
在来看下面一组函数定义了位图宏指令。
|
||||
|
||||
```C
|
||||
#define DECLARE_BITMAP(name, bits) unsigned long name[BITS_TO_LONGS(bits)]
|
||||
```
|
||||
|
||||
As we can see from its definition, the `DECLARE_BITMAP` macro expands to the array of `unsigned long`. Now let's look at how the `to_cpumask` macro is implemented:
|
||||
正如我们看到的定义一样, `DECLARE_BITMAP`宏指令的原型是一个`unsigned long`的数组,现在让我们查看如何执行`to_cpumask`:
|
||||
|
||||
```C
|
||||
#define to_cpumask(bitmap) \
|
||||
@@ -304,7 +300,7 @@ As we can see from its definition, the `DECLARE_BITMAP` macro expands to the arr
|
||||
: (void *)sizeof(__check_is_bitmap(bitmap))))
|
||||
```
|
||||
|
||||
I don't know about you, but it looked really weird for me at the first time. We can see a ternary operator here which is `true` every time, but why the `__check_is_bitmap` here? It's simple, let's look at it:
|
||||
我不知道你是怎么想的, 但是我是这么想的,我看到此函数其实就是一个条件判断语句当条件为真的时候,但是为什么执行`__check_is_bitmap`?让我们看看`__check_is_bitmap`的定义:
|
||||
|
||||
```C
|
||||
static inline int __check_is_bitmap(const unsigned long *bitmap)
|
||||
@@ -313,70 +309,72 @@ static inline int __check_is_bitmap(const unsigned long *bitmap)
|
||||
}
|
||||
```
|
||||
|
||||
Yeah, it just returns `1` every time. We need it here for only one purpose: at compile time it checks that the given `bitmap` is a bitmap, or in other words that the given `bitmap` has the type `unsigned long *`. So we just pass `cpu_possible_bits` to the `to_cpumask` macro to convert the array of `unsigned long` to a `struct cpumask *`. Now we can call the `cpumask_set_cpu` function with `cpu` equal to 0 and the `struct cpumask *` obtained from `cpu_possible_bits`. This function makes only one call, to the `set_bit` function, which sets the given `cpu` in the cpumask. All of these `set_cpu_*` functions work on the same principle.
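
The compile-time check can be tried outside the kernel with a small sketch. The `demo_*` names are hypothetical, and the macro only mirrors the shape of `to_cpumask`; the point is that the operand of `sizeof` is type-checked but never evaluated.

```C
#include <stddef.h>

/* Accepts only something convertible to `const unsigned long *`. */
static inline int demo_check_is_bitmap(const unsigned long *bitmap)
{
	(void)bitmap;
	return 1;
}

struct demo_cpumask {
	unsigned long bits[1];
};

/* The condition is always true, so the result is always `bitmap`.
 * The other branch exists only so the compiler type-checks the
 * argument of demo_check_is_bitmap(); sizeof never calls it. */
#define demo_to_cpumask(bitmap)						\
	((struct demo_cpumask *)(1 ? (bitmap)				\
		: (void *)sizeof(demo_check_is_bitmap(bitmap))))

static unsigned long demo_possible_bits[1];

/* Returns the bitmap viewed as a cpumask pointer. */
struct demo_cpumask *demo_possible_mask(void)
{
	return demo_to_cpumask(demo_possible_bits);
}
```

Passing, say, an array of `char` to `demo_to_cpumask` makes the compiler complain about an incompatible pointer type, which is the whole purpose of the trick.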
|
||||
原来此函数始终返回1,事实上我们需要这样的函数才达到我们的目的: 它在编译时给定一个`bitmap`,换句话将就是检查`bitmap`的类型是否是`unsigned long *`,因此我们仅仅通过`to_cpumask`宏指令将类型为`unsigned long`的数组转化为`struct cpumask *`。现在我们可以调用`cpumask_set_cpu` 函数,这个函数仅仅是一个 `set_bit`给CPU掩码的功能函数。所有的这些`set_cpu_*`函数的原理都是一样的。
|
||||
|
||||
If the `set_cpu_*` operations and `cpumask` are not clear to you yet, don't worry about it. You can get more info by reading the special part about it - [cpumask](http://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/cpumask.html) or the [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).
|
||||
如果你还不确定`set_cpu_*`这些函数的操作并且不能理解 `cpumask`的概念,不要担心。你可以通过读取这些章节[cpumask](http://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/cpumask.html) or [documentation](https://www.kernel.org/doc/Documentation/cpu-hotplug.txt).来继续了解和学习这些函数的原理。
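
For readers who prefer code, the bitmap mechanics behind the `set_cpu_*` functions can be sketched in user space. All `demo_*` names and the CPU count are hypothetical, and the bit operations here are plain, non-atomic C, unlike the kernel's `set_bit`.

```C
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 64
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of longs needed to hold n bits, rounded up. */
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_DECLARE_BITMAP(name, bits) \
	unsigned long name[DEMO_BITS_TO_LONGS(bits)]

/* One bit per CPU number. */
struct demo_mask {
	DEMO_DECLARE_BITMAP(bits, DEMO_NR_CPUS);
};

static struct demo_mask demo_possible;

static void demo_mask_set(unsigned int cpu, struct demo_mask *m)
{
	m->bits[cpu / DEMO_BITS_PER_LONG] |= 1UL << (cpu % DEMO_BITS_PER_LONG);
}

static void demo_mask_clear(unsigned int cpu, struct demo_mask *m)
{
	m->bits[cpu / DEMO_BITS_PER_LONG] &= ~(1UL << (cpu % DEMO_BITS_PER_LONG));
}

/* Returns 1 if the bit for this CPU is set, 0 otherwise. */
int demo_mask_test(unsigned int cpu, const struct demo_mask *m)
{
	return (int)((m->bits[cpu / DEMO_BITS_PER_LONG]
		      >> (cpu % DEMO_BITS_PER_LONG)) & 1UL);
}

/* Same shape as set_cpu_possible(): the flag selects set or clear. */
void demo_set_cpu_possible(unsigned int cpu, bool possible)
{
	if (possible)
		demo_mask_set(cpu, &demo_possible);
	else
		demo_mask_clear(cpu, &demo_possible);
}
```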
|
||||
|
||||
Now that we have activated the bootstrap processor, it's time to go to the next function in `start_kernel`. It is `page_address_init`, but this function does nothing in our case, because it does real work only when not all of `RAM` can be mapped directly.
|
||||
现在我们已经激活第一个CPU,我们继续接着start_kernel函数往下走,下面的函数是`page_address_init`,但是此函数不执行任何操作,因为只有当所有内存不能直接映射的时候才会执行。
|
||||
|
||||
Print linux banner
|
||||
Linux 内核的第一条打印信息
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
The next call is `pr_notice`:
|
||||
下面调用了pr_notice函数。
|
||||
|
||||
```C
|
||||
#define pr_notice(fmt, ...) \
|
||||
printk(KERN_NOTICE pr_fmt(fmt), ##__VA_ARGS__)
|
||||
```
|
||||
|
||||
as you can see it just expands to the `printk` call. At this moment we use `pr_notice` to print the Linux banner:
|
||||
|
||||
pr_notice其实是printk的扩展,这里我们使用它打印了Linux 的banner。
|
||||
|
||||
```C
|
||||
pr_notice("%s", linux_banner);
|
||||
```
|
||||
|
||||
which is just the kernel version with some additional parameters:
|
||||
打印的是内核的版本号以及编译环境信息:
|
||||
|
||||
```
|
||||
Linux version 4.0.0-rc6+ (alex@localhost) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #319 SMP
|
||||
```
|
||||
|
||||
Architecture-dependent parts of initialization
|
||||
依赖于体系结构的初始化部分
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
The next step is architecture-specific initialization. The Linux kernel does it by calling the `setup_arch` function. Like `start_kernel`, this is a very big function, and we do not have time to cover all of its implementation in this part. Here we will only start on it and continue in the next part. Because it is `architecture-specific`, we need to go back to the `arch/` directory. The `setup_arch` function is defined in the [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c) source code file and takes only one argument: the address of the kernel command line.
|
||||
下个步骤我们就要进入到指定的体系架构的初始函数,Linux 内核初始化体系架构相关调用`setup_arch`函数,这又是一个类型于`start_kernel`一版的庞大函数,这里我们仅仅简单描述,在下一个章节我们将继续深入。指定体系架构的内容,我们需要再一次阅读`arch/`目录,`setup_arch`函数定义在[arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c) 文件中,此函数就一个参数-内核命令行。
|
||||
|
||||
This function starts by reserving the memory block for the kernel `_text` and `_data`, which starts at the `_text` symbol (you may remember it from [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S#L46)) and ends before `__bss_stop`. We use `memblock` to reserve the memory block:
|
||||
此函数解析内核的段`_text`和`_data`来自于`_text`符号和`_bss_stop`(你应该还记得此文件[arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S#L46))。我们使用`memblock`来解析内存块。
|
||||
|
||||
```C
|
||||
memblock_reserve(__pa_symbol(_text), (unsigned long)__bss_stop - (unsigned long)_text);
|
||||
```
|
||||
|
||||
You can read about `memblock` in [Linux kernel memory management Part 1](http://xinqiu.gitbooks.io/linux-insides-cn/content/mm/linux-mm-1.html). As you may remember, the `memblock_reserve` function takes two parameters:
|
||||
你可以阅读关于`memblock`的相关内容在[Linux kernel memory management Part 1.](http://xinqiu.gitbooks.io/linux-insides-cn/content/mm/linux-mm-1.html),你应该还记得`memblock_reserve`函数的两个参数:
|
||||
|
||||
* base physical address of a memory block;
|
||||
* size of a memory block.
|
||||
|
||||
We can get the base physical address of the `_text` symbol with the `__pa_symbol` macro:
|
||||
我们可以通过`__pa_symbol`宏指令来获取符号表`_text`段中的物理地址
|
||||
|
||||
```C
|
||||
#define __pa_symbol(x) \
|
||||
__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))
|
||||
```
|
||||
|
||||
First of all it calls the `__phys_reloc_hide` macro on the given parameter. `__phys_reloc_hide` does nothing on `x86_64` and just returns its argument. The implementation of the `__phys_addr_symbol` macro is simple: it subtracts the base virtual address of the kernel text mapping (you may remember that it is `__START_KERNEL_map`) from the symbol address and adds `phys_base`, the physical address to which the start of that mapping corresponds:
|
||||
上述宏指令调用 `__phys_reloc_hide` 宏指令来填充参数,`__phys_reloc_hide`宏指令在`x86_64`上返回的参数是给定的。宏指令 `__phys_addr_symbol`的执行是简单的,只是减去从`_text`符号表中读到的内核的符号映射地址并且加上物理地址的基地址。
|
||||
|
||||
```C
|
||||
#define __phys_addr_symbol(x) \
|
||||
((unsigned long)(x) - __START_KERNEL_map + phys_base)
|
||||
```
|
||||
|
||||
Once we have the physical address of the `_text` symbol, `memblock_reserve` can reserve a memory block that starts at `_text` and has a size of `__bss_stop - _text`.
|
||||
`memblock_reserve`函数对内存页进行分配。
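The address arithmetic above can be checked with a small user-space sketch. The `demo_*` names are hypothetical, `phys_base` is passed as a parameter instead of being a kernel global, and the sample addresses used with it are made-up values chosen only to make the subtraction easy to follow.

```C
#include <stdint.h>

/* Base virtual address of the kernel text mapping on x86_64. */
#define DEMO_START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Model of __phys_addr_symbol(): the virtual address of a kernel symbol,
 * minus the base of the text mapping, plus phys_base.
 */
static uint64_t demo_phys_addr_symbol(uint64_t vaddr, uint64_t phys_base)
{
    return vaddr - DEMO_START_KERNEL_MAP + phys_base;
}

/* Size handed to memblock_reserve(): everything from _text to __bss_stop. */
static uint64_t demo_reserve_size(uint64_t text, uint64_t bss_stop)
{
    return bss_stop - text;
}
```

For example, with a hypothetical `_text` at `0xffffffff81000000` and `phys_base` equal to `0`, the physical base of the reservation is `0x1000000`; a non-zero `phys_base` shifts the result by the same amount.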
|
||||
|
||||
Reserve memory for initrd
|
||||
|
||||
保留可用内存初始化initrd
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
The next step, after reserving space for the kernel text and data, is reserving space for the [initrd](http://en.wikipedia.org/wiki/Initrd). We will not go into the details of `initrd` in this post; you only need to know that it is a temporary root file system stored in memory and used by the kernel during startup. The `early_reserve_initrd` function does all the work. First of all, this function gets the base address of the ram disk, its size and its end address:
|
||||
之后我们保留替换内核的text和data段用来初始化[initrd](http://en.wikipedia.org/wiki/Initrd),我们暂时不去了解initrd的详细信息,你仅仅只需要知道根文件系统就是通过这方式来进行初始化这就是`early_reserve_initrd` 函数的工作,此函数获取RAM DISK的基地址以及大小以及大小加偏移。
|
||||
|
||||
```C
|
||||
u64 ramdisk_image = get_ramdisk_image();
|
||||
@@ -384,8 +382,7 @@ u64 ramdisk_size = get_ramdisk_size();
|
||||
u64 ramdisk_end = PAGE_ALIGN(ramdisk_image + ramdisk_size);
|
||||
```
|
||||
|
||||
All of these parameters are taken from `boot_params`. If you have read the chapter about the [Linux Kernel Booting Process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html), you will remember that we filled the `boot_params` structure during boot. The kernel setup header contains a couple of fields that describe the ramdisk, for example:
|
||||
|
||||
如果你阅读过这些章节[Linux Kernel Booting Process](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/index.html),你就知道所有的这些啊参数都来自于`boot_params`,时刻谨记`boot_params`在boot期间已经被赋值,内核启动头包含了一下几个字段用来描述RAM DISK:
|
||||
```
|
||||
Field name: ramdisk_image
|
||||
Type: write (obligatory)
|
||||
@@ -396,7 +393,8 @@ Protocol: 2.00+
|
||||
zero if there is no initial ramdisk/ramfs.
|
||||
```
|
||||
|
||||
So we can get all the information that interests us from `boot_params`. For example let's look at `get_ramdisk_image`:
|
||||
|
||||
我们可以得到关于 `boot_params`的一些信息. 具体查看`get_ramdisk_image`:
|
||||
|
||||
```C
|
||||
static u64 __init get_ramdisk_image(void)
|
||||
@@ -409,13 +407,13 @@ static u64 __init get_ramdisk_image(void)
|
||||
}
|
||||
```
|
||||
|
||||
Here we get the address of the ramdisk from `boot_params` and combine it with `ext_ramdisk_image` shifted left by `32` bits. We need to do this because, as you can read in [Documentation/x86/zero-page.txt](https://github.com/0xAX/linux/blob/master/Documentation/x86/zero-page.txt):
|
||||
关于32位的ramdisk的地址,我们可以阅读此部分内容来获取[Documentation/x86/zero-page.txt](https://github.com/0xAX/linux/blob/master/Documentation/x86/zero-page.txt):
|
||||
|
||||
```
|
||||
0C0/004 ALL ext_ramdisk_image ramdisk_image high 32bits
|
||||
```
|
||||
|
||||
So after shifting by 32 we get a 64-bit address in `ramdisk_image` and return it. `get_ramdisk_size` works on the same principle as `get_ramdisk_image`, but it uses `ext_ramdisk_size` instead of `ext_ramdisk_image`. Once we have the ramdisk's size, base address and end address, we check that the bootloader provided a ramdisk with:
|
||||
32位变化后,我们获取64位的ramdisk原理一样,为此我们可以检查bootloader 提供的ramdisk信息:
|
||||
|
||||
```C
|
||||
if (!boot_params.hdr.type_of_loader ||
|
||||
@@ -423,22 +421,22 @@ if (!boot_params.hdr.type_of_loader ||
|
||||
return;
|
||||
```
|
||||
|
||||
and finally reserve the memory block for the initial ramdisk with the calculated addresses:
|
||||
并保留内存块将ramdisk传输到最终的内存地址,然后进行初始化:
|
||||
|
||||
```C
|
||||
memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
|
||||
```
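The address arithmetic of `early_reserve_initrd` can be sketched in user space as follows. This is only an illustration under stated assumptions: the `demo_*` helpers are hypothetical, a 4 KiB page size is assumed, and the low and high halves are passed as plain parameters instead of being read from `boot_params`.

```C
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL  /* assumption: 4 KiB pages */

/*
 * Combine the low 32 bits (from the setup header) with the high 32 bits
 * (from the matching ext_* field of the zero page).
 */
static uint64_t demo_combine_u32(uint32_t low, uint32_t high_ext)
{
    return (uint64_t)low | ((uint64_t)high_ext << 32);
}

/* Round an address up to the next page boundary, as PAGE_ALIGN does. */
static uint64_t demo_page_align(uint64_t addr)
{
    return (addr + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
}

/* End address used for the reservation: PAGE_ALIGN(image + size). */
static uint64_t demo_ramdisk_end(uint64_t image, uint64_t size)
{
    return demo_page_align(image + size);
}
```

The size passed to `memblock_reserve` is then `demo_ramdisk_end(image, size) - image`, so the reservation always covers whole pages past the end of the ramdisk image.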
|
||||
|
||||
Conclusion
|
||||
结束语
|
||||
---------------------------------------------------------------------------------
|
||||
|
||||
This is the end of the fourth part about the Linux kernel initialization process. In this part we started to dive into the generic kernel code from the `start_kernel` function and stopped at the architecture-specific initialization in `setup_arch`. In the next part we will continue with the architecture-dependent initialization steps.
|
||||
以上就是第四部分关于内核初始化的部分内容,我们从`start_kernel`函数开始一直到指定体系架构初始化`setup_arch`的过程中停止,那么在下一个章节我们将继续研究体系架构相关的初始化内容。
|
||||
|
||||
If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX).
|
||||
如果你有任何的问题或者建议,你可以留言,也可以直接发消息给我[twitter](https://twitter.com/0xAX)。
|
||||
|
||||
**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to [linux-insides](https://github.com/MintCN/linux-insides-zh).**
|
||||
**很抱歉,英语并不是我的母的,非常抱歉给您阅读带来不便,如果你发现文中描述有任何问题,请提交一个 PR 到 [linux-insides](https://github.com/MintCN/linux-insides-zh).**
|
||||
|
||||
Links
|
||||
链接
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
* [GCC function attributes](https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html)
|
||||
@@ -449,4 +447,4 @@ Links
|
||||
* [stack buffer overflow](http://en.wikipedia.org/wiki/Stack_buffer_overflow)
|
||||
* [IRQs](http://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29)
|
||||
* [initrd](http://en.wikipedia.org/wiki/Initrd)
|
||||
* [Previous part](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-3.md)
|
||||
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-3.html)
|
||||
|
||||
@@ -479,4 +479,4 @@ Links
|
||||
* [vsyscalls](https://lwn.net/Articles/446528/)
|
||||
* [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing)
|
||||
* [jiffy](http://en.wikipedia.org/wiki/Jiffy_%28time%29)
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/%20linux-initialization-6.html)
|
||||
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)
|
||||
|
||||
22
README.md
@@ -22,16 +22,16 @@ Linux Insides
|
||||
|└ 1.5|[@chengong](https://github.com/chengong)|In progress|
|
||||
| 2. Initialization||In progress|
|
||||
|├ 2.0|[@mudongliang](https://github.com/mudongliang)|Done|
|
||||
|├ 2.1||Not started|
|
||||
|├ 2.2||Not started|
|
||||
|├ 2.3||Not started|
|
||||
|├ 2.4||Not started|
|
||||
|├ 2.5||Not started|
|
||||
|├ 2.6||Not started|
|
||||
|├ 2.7||Not started|
|
||||
|├ 2.8||Not started|
|
||||
|├ 2.9||Not started|
|
||||
|└ 2.10||Not started|
|
||||
|├ 2.1|[@dontpanic92](https://github.com/dontpanic92)|In progress|
|
||||
|├ 2.2|[@dontpanic92](https://github.com/dontpanic92)|In progress|
|
||||
|├ 2.3|[@dontpanic92](https://github.com/dontpanic92)|In progress|
|
||||
|├ 2.4|[@bjwrkj](https://github.com/bjwrkj)|Done|
|
||||
|├ 2.5|[@bjwrkj](https://github.com/bjwrkj)|In progress|
|
||||
|├ 2.6|[@bjwrkj](https://github.com/bjwrkj)|In progress|
|
||||
|├ 2.7|[@bjwrkj](https://github.com/bjwrkj)|In progress|
|
||||
|├ 2.8|[@bjwrkj](https://github.com/bjwrkj)|In progress|
|
||||
|├ 2.9|[@bjwrkj](https://github.com/bjwrkj)|In progress|
|
||||
|└ 2.10|[@bjwrkj](https://github.com/bjwrkj)|In progress|
|
||||
| 3. Interrupts||In progress|
|
||||
|├ 3.0|[@littleneko](https://github.com/littleneko)|In progress|
|
||||
|├ 3.1|[@littleneko](https://github.com/littleneko)|In progress|
|
||||
@@ -43,7 +43,7 @@ Linux Insides
|
||||
|├ 3.7|[@cloudusers](https://github.com/cloudusers)|In progress|
|
||||
|├ 3.8|[@cloudusers](https://github.com/cloudusers)|In progress|
|
||||
|├ 3.9|[@zhangyangjing](https://github.com/zhangyangjing)|Done|
|
||||
|└ 3.10||Not started|
|
||||
|└ 3.10|[@worldwar](https://github.com/worldwar)|Done|
|
||||
| 4. System calls||In progress|
|
||||
|├ 4.0|[@mudongliang](https://github.com/mudongliang)|Done|
|
||||
|├ 4.1|[@qianmoke](https://github.com/qianmoke)|Done|
|
||||
|
||||
@@ -25,3 +25,5 @@
|
||||
[@zhangyangjing](https://github.com/zhangyangjing)
|
||||
|
||||
[@huxq](https://github.com/huxq)
|
||||
|
||||
[@worldwar](https://github.com/worldwar)
|
||||
|
||||
503
interrupts/interrupts-10.md
Normal file
@@ -0,0 +1,503 @@
|
||||
Interrupts and Interrupt Handling. Part 10.
|
||||
=====================
|
||||
Last part
|
||||
-------------------------
|
||||
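This is the tenth part of the [Interrupts and Interrupt Handling](https://xinqiu.gitbooks.io/linux-insides-cn/content/interrupts/index.html) chapter of the Linux kernel. In the [previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/interrupts/interrupts-9.html) we looked at deferred interrupts and related concepts such as `softirq`, `tasklet` and `workqueue`. In this part we continue with this theme, and now it is time to look at a real hardware driver.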
|
||||
|
||||
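Taking the serial driver for the [StrongARM** SA-110/21285 Evaluation Board](http://netwinder.osuosl.org/pub/netwinder/docs/intel/datashts/27813501.pdf) as an example, we will see how a driver requests an [IRQ](https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29) line, what happens when an interrupt is triggered, and so on. The source code of this driver is in the [drivers/tty/serial/21285.c](https://github.com/torvalds/linux/blob/master/drivers/tty/serial/21285.c) source file. OK, we have the source code, so let's start.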
|
||||
|
||||
Initialization of a kernel module
|
||||
-----------------------------------------------
|
||||
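As with the other new concepts in this book, we start our look at this driver with its initialization. As you may already know, the Linux kernel provides two macros for the initialization and finalization of a driver or a kernel module: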
|
||||
|
||||
* `module_init`
|
||||
* `module_exit`
|
||||
|
||||
We can find the usage of these macros in the source code of our driver:
|
||||
|
||||
```C
|
||||
module_init(serial21285_init);
|
||||
module_exit(serial21285_exit);
|
||||
```
|
||||
|
||||
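Most device drivers can be compiled as a loadable kernel [module](https://en.wikipedia.org/wiki/Loadable_kernel_module) or be statically linked into the Linux kernel. In the first case, initialization of a device driver is triggered by the `module_init` and `module_exit` macros, which are defined in [include/linux/init.h](https://github.com/torvalds/linux/blob/master/include/linux/init.h):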
|
||||
|
||||
```C
|
||||
#define module_init(initfn) \
|
||||
static inline initcall_t __inittest(void) \
|
||||
{ return initfn; } \
|
||||
int init_module(void) __attribute__((alias(#initfn)));
|
||||
|
||||
#define module_exit(exitfn) \
|
||||
static inline exitcall_t __exittest(void) \
|
||||
{ return exitfn; } \
|
||||
void cleanup_module(void) __attribute__((alias(#exitfn)));
|
||||
```
|
||||
|
||||
and are called by the [initcall](http://kernelnewbies.org/Documents/InitcallMechanism) functions:
|
||||
|
||||
* `early_initcall`
|
||||
* `pure_initcall`
|
||||
* `core_initcall`
|
||||
* `postcore_initcall`
|
||||
* `arch_initcall`
|
||||
* `subsys_initcall`
|
||||
* `fs_initcall`
|
||||
* `rootfs_initcall`
|
||||
* `device_initcall`
|
||||
* `late_initcall`
|
||||
|
||||
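which are in turn called by the `do_initcalls` function from [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c). However, if a device driver is statically linked into the Linux kernel, these macros are implemented as follows: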
|
||||
|
||||
```C
|
||||
#define module_init(x) __initcall(x);
|
||||
#define module_exit(x) __exitcall(x);
|
||||
```
|
||||
|
||||
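Module loading itself is implemented in the [kernel/module.c](https://github.com/torvalds/linux/blob/master/kernel/module.c) source file, and initialization happens in the `do_init_module` function. We will not go into the details of loadable modules in this chapter; we will see them in a separate chapter about Linux kernel modules. Coming back to our driver: the `module_init` macro takes one parameter, which in our case is `serial21285_init`. As the name suggests, this function does the work related to driver initialization. Let's look at it: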
|
||||
|
||||
```C
|
||||
static int __init serial21285_init(void)
|
||||
{
|
||||
int ret;
|
||||
|
||||
printk(KERN_INFO "Serial: 21285 driver\n");
|
||||
|
||||
serial21285_setup_ports();
|
||||
|
||||
ret = uart_register_driver(&serial21285_reg);
|
||||
if (ret == 0)
|
||||
uart_add_one_port(&serial21285_reg, &serial21285_port);
|
||||
|
||||
return ret;
|
||||
}
|
||||
```
|
||||
|
||||
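As we can see, first it prints information about the driver to the kernel log buffer and then calls the `serial21285_setup_ports` function. This function sets the base [uart](https://en.wikipedia.org/wiki/Universal_asynchronous_receiver/transmitter) clock of the `serial21285_port` device: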
|
||||
|
||||
```C
|
||||
unsigned int mem_fclk_21285 = 50000000;
|
||||
|
||||
static void serial21285_setup_ports(void)
|
||||
{
|
||||
serial21285_port.uartclk = mem_fclk_21285 / 4;
|
||||
}
|
||||
```
|
||||
|
||||
Here `serial21285_reg` is the structure that describes the `uart` driver:
|
||||
|
||||
```C
|
||||
static struct uart_driver serial21285_reg = {
|
||||
.owner = THIS_MODULE,
|
||||
.driver_name = "ttyFB",
|
||||
.dev_name = "ttyFB",
|
||||
.major = SERIAL_21285_MAJOR,
|
||||
.minor = SERIAL_21285_MINOR,
|
||||
.nr = 1,
|
||||
.cons = SERIAL_21285_CONSOLE,
|
||||
};
|
||||
```
|
||||
|
||||
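If the driver is registered successfully, we add the port described by the driver's `serial21285_port` structure with the `uart_add_one_port` function from the [drivers/tty/serial/serial_core.c](https://github.com/torvalds/linux/blob/master/drivers/tty/serial/serial_core.c) source file and return from the `serial21285_init` function: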
|
||||
|
||||
```C
|
||||
if (ret == 0)
|
||||
uart_add_one_port(&serial21285_reg, &serial21285_port);
|
||||
|
||||
return ret;
|
||||
```
|
||||
|
||||
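That's all; our driver is initialized. When a `uart` port is opened with the `uart_open` function from [drivers/tty/serial/serial_core.c](https://github.com/torvalds/linux/blob/master/drivers/tty/serial/serial_core.c), that function calls `uart_startup` to start up the serial port, which in turn calls the `startup` function. `startup` is a member of the `uart_ops` structure, and every `uart` driver defines such a structure. In our case it looks like this: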
|
||||
|
||||
```C
|
||||
static struct uart_ops serial21285_ops = {
|
||||
...
|
||||
.startup = serial21285_startup,
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
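As we can see, the `.startup` field refers to the `serial21285_startup` function. The implementation of this function is what interests us most, because it is closely related to interrupts and interrupt handling.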
|
||||
|
||||
Requesting an IRQ line
|
||||
---------------------------------------
|
||||
|
||||
Let's look at the implementation of the `serial21285_startup` function:
|
||||
|
||||
```C
|
||||
static int serial21285_startup(struct uart_port *port)
|
||||
{
|
||||
int ret;
|
||||
|
||||
tx_enabled(port) = 1;
|
||||
rx_enabled(port) = 1;
|
||||
|
||||
ret = request_irq(IRQ_CONRX, serial21285_rx_chars, 0,
|
||||
serial21285_name, port);
|
||||
if (ret == 0) {
|
||||
ret = request_irq(IRQ_CONTX, serial21285_tx_chars, 0,
|
||||
serial21285_name, port);
|
||||
if (ret)
|
||||
free_irq(IRQ_CONRX, port);
|
||||
}
|
||||
|
||||
return ret;
|
||||
}
|
||||
```
|
||||
|
||||
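First of all, `TX` and `RX`. The serial bus of a device consists of just two wires: one for sending data and another for receiving it. Accordingly, a serial device has two serial pins: the receiver, `RX`, and the transmitter, `TX`. The `tx_enabled` and `rx_enabled` macros enable these lines. The next part of the function is the most interesting for us. Note the `request_irq` function: it registers an interrupt handler and enables the given interrupt line. Let's look at how it is implemented. This function is defined in the [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h) header file and looks like this: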
|
||||
|
||||
```C
|
||||
static inline int __must_check
|
||||
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
|
||||
const char *name, void *dev)
|
||||
{
|
||||
return request_threaded_irq(irq, handler, NULL, flags, name, dev);
|
||||
}
|
||||
```
|
||||
|
||||
As we can see, the `request_irq` function takes five parameters:
|
||||
|
||||
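* `irq` - the interrupt number being requested;
|
||||
* `handler` - a pointer to the interrupt handler;
|
||||
* `flags` - a bitmask of options;
|
||||
* `name` - the name of the owner of the interrupt;
|
||||
* `dev` - a pointer used for shared interrupt lines.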
|
||||
|
||||
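Now let's look at the calls of the `request_irq` function. As we can see, the first parameter is `IRQ_CONRX`. We know that it is the interrupt number, but what is `CONRX`? This macro is defined in the [arch/arm/mach-footbridge/include/mach/irqs.h](https://github.com/torvalds/linux/blob/master/arch/arm/mach-footbridge/include/mach/irqs.h) header file, where we can find the full set of interrupts that the `21285` board can generate. Note that in the second call of `request_irq` we pass the `IRQ_CONTX` interrupt number. Our driver handles `RX` and `TX` events in these interrupts. The implementation of these macros is simple: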
|
||||
|
||||
```C
|
||||
#define IRQ_CONRX _DC21285_IRQ(0)
|
||||
#define IRQ_CONTX _DC21285_IRQ(1)
|
||||
...
|
||||
...
|
||||
...
|
||||
#define _DC21285_IRQ(x) (16 + (x))
|
||||
```
|
||||
|
||||
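The [ISA](https://en.wikipedia.org/wiki/Industry_Standard_Architecture) interrupts on this board occupy the range `0` to `15`. So our interrupts are the first two numbers after that range: `16` and `17`. The second parameter in the two calls of `request_irq` is `serial21285_rx_chars` and `serial21285_tx_chars` respectively. These functions are called when an `RX` or `TX` interrupt occurs. We will not dive into them here, because this chapter is about interrupts and interrupt handling rather than devices and drivers. The next parameter is `flags`, which is zero in both calls. All valid flags are defined as `IRQF_*` macros in [include/linux/interrupt.h](https://github.com/torvalds/linux/blob/master/include/linux/interrupt.h). Some of them: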
|
||||
|
||||
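* `IRQF_SHARED` - allows the interrupt line to be shared among several devices;
|
||||
* `IRQF_PERCPU` - the interrupt is per-CPU;
|
||||
* `IRQF_NO_THREAD` - the interrupt cannot be threaded;
|
||||
* `IRQF_NOBALANCING` - excludes this interrupt from IRQ balancing;
|
||||
* `IRQF_IRQPOLL` - the interrupt is used for polling;
|
||||
* and so on.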
|
||||
|
||||
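In our case we pass `0`, which is `IRQF_TRIGGER_NONE`. This flag means that no level-triggered or edge-triggered behavior is configured for the interrupt. As the fourth parameter (`name`) we pass `serial21285_name`, which is defined as: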
|
||||
|
||||
```C
|
||||
static const char serial21285_name[] = "Footbridge UART";
|
||||
```
|
||||
|
||||
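It will be shown in the output of `/proc/interrupts`. As the last parameter we pass a pointer to our `uart_port` structure. Now that we know a little about `request_irq` and its parameters, let's look at its implementation. As we saw above, `request_irq` just calls the `request_threaded_irq` function, defined in the [kernel/irq/manage.c](https://github.com/torvalds/linux/blob/master/kernel/irq/manage.c) source file, which allocates the given interrupt line. The function starts with the definitions of the `irqaction` and `irq_desc` pointers: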
|
||||
|
||||
```C
|
||||
int request_threaded_irq(unsigned int irq, irq_handler_t handler,
|
||||
irq_handler_t thread_fn, unsigned long irqflags,
|
||||
const char *devname, void *dev_id)
|
||||
{
|
||||
struct irqaction *action;
|
||||
struct irq_desc *desc;
|
||||
int retval;
|
||||
...
|
||||
...
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
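We have already seen the `irqaction` and `irq_desc` structures in this chapter. The first represents a per-interrupt action descriptor and contains a pointer to the interrupt handler, the name of the device, the interrupt number and so on. The second represents an interrupt descriptor and contains a pointer to the `irqaction`, interrupt flags and so on. Note that `request_threaded_irq` takes one parameter more than `request_irq`: `irq_handler_t thread_fn`. If this parameter is not `NULL`, an `irq` thread is created and `thread_fn` runs in that thread. In the next step we make the following checks: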
|
||||
|
||||
```C
|
||||
if (((irqflags & IRQF_SHARED) && !dev_id) ||
|
||||
(!(irqflags & IRQF_SHARED) && (irqflags & IRQF_COND_SUSPEND)) ||
|
||||
((irqflags & IRQF_NO_SUSPEND) && (irqflags & IRQF_COND_SUSPEND)))
|
||||
return -EINVAL;
|
||||
```
|
||||
|
||||
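First, we make sure that a real `dev_id` is passed for a shared interrupt (translator's note: otherwise it is impossible to tell later which device raised the interrupt), that `IRQF_COND_SUSPEND` is used only with shared interrupts, and that it is not combined with `IRQF_NO_SUSPEND`. Otherwise we exit from the function with the `-EINVAL` error. After this we convert the given `irq` number to an `irq` descriptor with the `irq_to_desc` function, defined in the [kernel/irq/irqdesc.c](https://github.com/torvalds/linux/blob/master/kernel/irq/irqdesc.c) source file, and exit with the `-EINVAL` error if that fails: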
|
||||
|
||||
```C
|
||||
desc = irq_to_desc(irq);
|
||||
if (!desc)
|
||||
return -EINVAL;
|
||||
```
|
||||
|
||||
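The `irq_to_desc` function checks that the given `irq` number is less than the maximum number of IRQs and returns the interrupt descriptor; the `irq` number is simply an offset into the `irq_desc` array: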
|
||||
|
||||
```C
|
||||
struct irq_desc *irq_to_desc(unsigned int irq)
|
||||
{
|
||||
return (irq < NR_IRQS) ? irq_desc + irq : NULL;
|
||||
}
|
||||
```
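The lookup above can be modeled in user space with a small sketch. The `tbl_*` names, the table size of 32 and the single struct field are assumptions for illustration only; the sketch models the flat-array variant shown in the text (kernels built with sparse IRQ support look descriptors up differently).

```C
#include <stddef.h>

#define TBL_NR_IRQS 32  /* assumption: a small table for illustration */

struct tbl_irq_desc {
    unsigned int depth;  /* hypothetical field; real descriptors hold much more */
};

/* One descriptor per interrupt number, indexed directly by that number. */
static struct tbl_irq_desc tbl_irq_desc_table[TBL_NR_IRQS];

/* Same shape as irq_to_desc(): bounds check, then pointer arithmetic. */
static struct tbl_irq_desc *tbl_irq_to_desc(unsigned int irq)
{
    return (irq < TBL_NR_IRQS) ? tbl_irq_desc_table + irq : NULL;
}
```

With this layout the descriptors of interrupts `16` and `17` from our driver are neighbours in the table, and any number at or beyond the table size yields `NULL`, which `request_threaded_irq` turns into `-EINVAL`.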
|
||||
|
||||
Now that we have converted the `irq` number to an `irq` descriptor, we check the state of the descriptor to make sure that the interrupt can be requested:
|
||||
|
||||
```C
|
||||
if (!irq_settings_can_request(desc) || WARN_ON(irq_settings_is_per_cpu_devid(desc)))
|
||||
return -EINVAL;
|
||||
```
|
||||
|
||||
and return `-EINVAL` if it cannot. Next we check the given interrupt handler (translator's note: the `handler` variable). If none was passed in, we check `thread_fn`. If both are `NULL`, we return `-EINVAL`. If no handler was passed but `thread_fn` is not `NULL`, we set `handler` to `irq_default_primary_handler`:
|
||||
|
||||
```C
|
||||
if (!handler) {
|
||||
if (!thread_fn)
|
||||
return -EINVAL;
|
||||
handler = irq_default_primary_handler;
|
||||
}
|
||||
```
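The argument checks described so far can be combined into one user-space sketch. Everything prefixed with `val_` is hypothetical; the flag values and the return value of the default handler are written from memory of `include/linux/interrupt.h` and should be treated as assumptions, and the descriptor checks that sit between the two steps in the kernel are left out.

```C
#include <stddef.h>

/* Assumed flag values; check include/linux/interrupt.h for the real ones. */
#define VAL_IRQF_SHARED       0x00000080UL
#define VAL_IRQF_NO_SUSPEND   0x00004000UL
#define VAL_IRQF_COND_SUSPEND 0x00040000UL
#define VAL_EINVAL            22

typedef int (*val_handler_t)(int irq, void *dev_id);

/* Stand-in for irq_default_primary_handler: just asks to wake the thread. */
static int val_default_primary_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 2;  /* stands in for IRQ_WAKE_THREAD */
}

/* Stand-in for a driver's own handler. */
static int val_dummy_handler(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 1;  /* stands in for IRQ_HANDLED */
}

/*
 * Returns 0 when the request looks sane and -VAL_EINVAL otherwise.
 * On success *handler is guaranteed to be non-NULL.
 */
static int val_check_request(unsigned long irqflags, const void *dev_id,
                             val_handler_t *handler, val_handler_t thread_fn)
{
    /* Shared lines need a dev_id; COND_SUSPEND needs SHARED and excludes NO_SUSPEND. */
    if (((irqflags & VAL_IRQF_SHARED) && !dev_id) ||
        (!(irqflags & VAL_IRQF_SHARED) && (irqflags & VAL_IRQF_COND_SUSPEND)) ||
        ((irqflags & VAL_IRQF_NO_SUSPEND) && (irqflags & VAL_IRQF_COND_SUSPEND)))
        return -VAL_EINVAL;

    /* No primary handler: acceptable only when a thread function exists. */
    if (!*handler) {
        if (!thread_fn)
            return -VAL_EINVAL;
        *handler = val_default_primary_handler;
    }
    return 0;
}
```

The serial driver in this part passes flags of `0`, a non-`NULL` port pointer and a real handler, so it takes the simplest path through both checks.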
|
||||
|
||||
In the next step we allocate memory for our `irqaction` with the `kzalloc` function and return if the allocation fails:
|
||||
|
||||
```C
|
||||
action = kzalloc(sizeof(struct irqaction), GFP_KERNEL);
|
||||
if (!action)
|
||||
return -ENOMEM;
|
||||
```
|
||||
|
||||
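For more about `kzalloc`, see the separate chapter about [memory management](https://xinqiu.gitbooks.io/linux-insides-cn/content/mm/index.html) in the Linux kernel. Once the space for the `irqaction` is allocated, we initialize the structure with the interrupt handler, interrupt flags, device name and so on: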
|
||||
|
||||
```C
|
||||
action->handler = handler;
|
||||
action->thread_fn = thread_fn;
|
||||
action->flags = irqflags;
|
||||
action->name = devname;
|
||||
action->dev_id = dev_id;
|
||||
```
|
||||
|
||||
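At the end of `request_threaded_irq` we call the `__setup_irq` function from [kernel/irq/manage.c](https://github.com/torvalds/linux/blob/master/kernel/irq/manage.c), which registers the given `irqaction`. If it fails, we free the memory of the `irqaction`; in either case we return the result: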
|
||||
|
||||
```C
|
||||
chip_bus_lock(desc);
|
||||
retval = __setup_irq(irq, desc, action);
|
||||
chip_bus_sync_unlock(desc);
|
||||
|
||||
if (retval)
|
||||
kfree(action);
|
||||
|
||||
return retval;
|
||||
```
|
||||
|
||||
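Note that the call of `__setup_irq` is placed between the `chip_bus_lock` and `chip_bus_sync_unlock` functions. These functions lock and unlock access to chips on slow buses such as [i2c](https://en.wikipedia.org/wiki/I%C2%B2C). Now let's look at the implementation of `__setup_irq`. It begins with various checks. First we check that the given interrupt descriptor is not `NULL`, that its `irqchip` is not `NULL`, and that the owner module of the descriptor is not `NULL`. After this we check whether the interrupt is nested into another interrupt thread. If it is, we replace `irq_default_primary_handler` with `irq_nested_primary_handler`.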
|
||||
|
||||
In the next step, if the given interrupt is not nested and `thread_fn` is not `NULL`, we create an interrupt handler thread with `kthread_create`:
|
||||
|
||||
```C
|
||||
if (new->thread_fn && !nested) {
|
||||
struct task_struct *t;
|
||||
t = kthread_create(irq_thread, new, "irq/%d-%s", irq, new->name);
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
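and at the end we fill in the remaining fields of the given interrupt descriptor. So our interrupt request lines `16` and `17` are registered, and the `serial21285_rx_chars` and `serial21285_tx_chars` functions will be called when the interrupt controller receives the events for these interrupts. Now let's look at what actually happens when an interrupt occurs.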
|
||||
|
||||
Prepare to handle an interrupt
|
||||
------------------------------
|
||||
|
||||
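In the previous section we saw how an IRQ line is requested for a given interrupt descriptor and how an `irqaction` structure is registered for the given interrupt. We already know that when an interrupt event occurs, the interrupt controller notifies the processor about it, and the processor tries to find the appropriate interrupt gate for this interrupt. If you have read [part 8](https://xinqiu.gitbooks.io/linux-insides-cn/content/interrupts/interrupts-8.html) of this chapter, you may remember the `native_init_IRQ` function, which initializes the local [APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller). The following part of that function is the most interesting for us right now: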
|
||||
|
||||
```C
|
||||
for_each_clear_bit_from(i, used_vectors, first_system_vector) {
|
||||
set_intr_gate(i, irq_entries_start +
|
||||
8 * (i - FIRST_EXTERNAL_VECTOR));
|
||||
}
|
||||
```
|
||||
|
||||
Here we iterate over the cleared bits of the `used_vectors` bitmap, starting from the current value of `i` and stopping before `first_system_vector`, which is:
|
||||
|
||||
```C
|
||||
int first_system_vector = FIRST_SYSTEM_VECTOR; // 0xef
|
||||
```
|
||||
|
||||
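and set an interrupt gate for each of them, where `i` is the vector number and `irq_entries_start + 8 * (i - FIRST_EXTERNAL_VECTOR)` is the address of its entry point. Only one thing is still unclear: `irq_entries_start`. This symbol is defined in the [arch/x86/entry/entry_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S) assembly file and provides the `irq` entries. Let's look at it: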
|
||||
|
||||
```assembly
|
||||
.align 8
|
||||
ENTRY(irq_entries_start)
|
||||
vector=FIRST_EXTERNAL_VECTOR
|
||||
.rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
|
||||
pushq $(~vector+0x80)
|
||||
vector=vector+1
|
||||
jmp common_interrupt
|
||||
.align 8
|
||||
.endr
|
||||
END(irq_entries_start)
|
||||
```
|
||||
|
||||
Here we can see the [GNU assembler](https://en.wikipedia.org/wiki/GNU_Assembler) `.rept` directive, which repeats the lines before `.endr` `FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR` times. As we already know, `FIRST_SYSTEM_VECTOR` is `0xef` and `FIRST_EXTERNAL_VECTOR` is `0x20`. So the body is repeated:
|
||||
|
||||
```python
|
||||
>>> 0xef - 0x20
|
||||
207
|
||||
```
|
||||
|
||||
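times. In the body of the `.rept` directive we push the encoded vector number onto the stack (note that negative numbers are used for interrupt vector numbers, because positive numbers are reserved for identifying [system calls](https://en.wikipedia.org/wiki/System_call)), increment the `vector` variable and jump to the `common_interrupt` label. In `common_interrupt` we adjust the vector number on the stack and invoke the `interrupt` macro with `do_IRQ` as its argument: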
|
||||
|
||||
```assembly
|
||||
common_interrupt:
|
||||
addq $-0x80, (%rsp)
|
||||
interrupt do_IRQ
|
||||
```
|
||||
|
||||
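The `interrupt` macro is defined in the same source file. It saves the [general purpose](https://en.wikipedia.org/wiki/Processor_register) registers on the stack and, if necessary, switches the `gs` register from its user space value to the kernel one with the `SWAPGS` instruction. It increments the [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/per-cpu.html) `irq_count` variable, which indicates that we are in an interrupt, and calls the `do_IRQ` function. This function is defined in the [arch/x86/kernel/irq.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irq.c) source file and handles our device interrupt. Let's look at it. The `do_IRQ` function takes one parameter, the `pt_regs` structure, which holds the register values saved when the interrupt arrived: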
|
||||
|
||||
```C
|
||||
__visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
|
||||
{
|
||||
struct pt_regs *old_regs = set_irq_regs(regs);
|
||||
unsigned vector = ~regs->orig_ax;
|
||||
unsigned irq;
|
||||
|
||||
irq_enter();
|
||||
exit_idle();
|
||||
...
|
||||
...
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
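At the beginning of the function we call `set_irq_regs`, which stores the given pointer as the per-cpu interrupt registers pointer and returns the previously saved one, and then the `irq_enter` and `exit_idle` functions. The first function, `irq_enter`, enters an interrupt context and updates the `__preempt_count` variable. The second, `exit_idle`, checks whether the current process is the `idle` process with [pid](https://en.wikipedia.org/wiki/Process_identifier) `0` and, if so, notifies the `idle_notifier` with `IDLE_END`.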
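The vector encoding used by the entry stubs, `common_interrupt` and `do_IRQ` can be verified with a short round-trip sketch. The `enc_*` names are hypothetical, the arithmetic is done on plain 64-bit integers instead of the stack slot, and the 8-byte stub size is taken from the `.align 8` and the `8 * (i - FIRST_EXTERNAL_VECTOR)` expression in the text.

```C
#include <stdint.h>

#define ENC_FIRST_EXTERNAL_VECTOR 0x20
#define ENC_FIRST_SYSTEM_VECTOR   0xef
#define ENC_STUB_SIZE             8   /* each stub is padded by .align 8 */

/* Value pushed by a stub: pushq $(~vector+0x80). */
static int64_t enc_stub_push(unsigned int vector)
{
    return ~(int64_t)vector + 0x80;
}

/* common_interrupt: addq $-0x80, (%rsp) leaves ~vector on the stack. */
static int64_t enc_common_interrupt_adjust(int64_t pushed)
{
    return pushed - 0x80;
}

/* do_IRQ: vector = ~regs->orig_ax recovers the original number. */
static unsigned int enc_decode_vector(int64_t orig_ax)
{
    return (unsigned int)~orig_ax;
}

/* Byte offset of a vector's stub from irq_entries_start. */
static unsigned int enc_stub_offset(unsigned int vector)
{
    return ENC_STUB_SIZE * (vector - ENC_FIRST_EXTERNAL_VECTOR);
}

/* Number of stubs generated by the .rept block. */
static unsigned int enc_stub_count(void)
{
    return ENC_FIRST_SYSTEM_VECTOR - ENC_FIRST_EXTERNAL_VECTOR;
}
```

For every vector in the external range the pushed value stays between `-111` and `95`, which is consistent with the `+0x80` bias being there to keep the immediate within a signed byte; that reading of the intent is an interpretation rather than something stated in the text.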
|
||||
|
||||
In the next step we read the `irq` for the current cpu and call the `handle_irq` function:
|
||||
|
||||
```C
|
||||
irq = __this_cpu_read(vector_irq[vector]);
|
||||
|
||||
if (!handle_irq(irq, regs)) {
|
||||
...
|
||||
...
|
||||
...
|
||||
}
|
||||
...
|
||||
...
|
||||
...
|
||||
```
|
||||
|
||||
The `handle_irq` function is defined in the [arch/x86/kernel/irq_64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/irq_64.c) source file. It checks the given interrupt descriptor and calls `generic_handle_irq_desc`:
|
||||
|
||||
```C
|
||||
desc = irq_to_desc(irq);
|
||||
if (unlikely(!desc))
|
||||
return false;
|
||||
generic_handle_irq_desc(irq, desc);
|
||||
```
|
||||
|
||||
which in turn calls the interrupt handler:
|
||||
|
||||
```C
|
||||
static inline void generic_handle_irq_desc(unsigned int irq, struct irq_desc *desc)
|
||||
{
|
||||
desc->handle_irq(irq, desc);
|
||||
}
|
||||
```
|
||||
|
||||
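But wait... What is `handle_irq`, and why do we call our interrupt handler through the interrupt descriptor when we know that the `irqaction` points to the actual interrupt handler? In fact `irq_desc->handle_irq` is a high-level API for calling the interrupt handler routine. It is set up during initialization of the [device tree](https://en.wikipedia.org/wiki/Device_tree) and the [APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller). The kernel uses it to select the correct function and call the chain of `irq->action(s)`. In this way, the `serial21285_tx_chars` or `serial21285_rx_chars` function is called when an interrupt occurs.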
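The two-level dispatch described here, a per-descriptor flow handler that walks the list of registered actions, can be modeled in user space as follows. All `flow_*` names and the struct layouts are hypothetical simplifications; the real flow handlers also take care of masking, acknowledging the interrupt controller and locking.

```C
#include <stddef.h>

typedef int (*flow_handler_fn)(int irq, void *dev_id);

/* Simplified stand-in for struct irqaction: one registered handler. */
struct flow_irqaction {
    flow_handler_fn handler;
    void *dev_id;
    struct flow_irqaction *next;  /* further actions on a shared line */
};

struct flow_irq_desc;
typedef void (*flow_desc_fn)(unsigned int irq, struct flow_irq_desc *desc);

/* Simplified stand-in for struct irq_desc. */
struct flow_irq_desc {
    flow_desc_fn handle_irq;        /* the high-level flow handler */
    struct flow_irqaction *action;  /* chain of registered actions */
};

/* Hypothetical flow handler: call every registered action in order. */
static void flow_handle_simple(unsigned int irq, struct flow_irq_desc *desc)
{
    struct flow_irqaction *a;

    for (a = desc->action; a != NULL; a = a->next)
        a->handler((int)irq, a->dev_id);
}

/* Same shape as generic_handle_irq_desc() in the text. */
static void flow_generic_handle_irq_desc(unsigned int irq,
                                         struct flow_irq_desc *desc)
{
    desc->handle_irq(irq, desc);
}

/* Stand-in for a driver handler such as serial21285_rx_chars. */
static int flow_rx_chars(int irq, void *dev_id)
{
    (void)irq;
    ++*(int *)dev_id;  /* count invocations through dev_id */
    return 1;          /* stands in for IRQ_HANDLED */
}
```

Keeping the flow handler in the descriptor lets the same driver handler work unchanged whether the line is edge-triggered, level-triggered or shared, since only the descriptor-level function differs.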
|
||||
|
||||
At the end of the `do_IRQ` function we call `irq_exit` to leave the interrupt context, call `set_irq_regs` with the previously saved register pointer, and return:
|
||||
|
||||
```C
|
||||
irq_exit();
|
||||
set_irq_regs(old_regs);
|
||||
return 1;
|
||||
```
|
||||
|
||||
We already know that when an `IRQ` finishes its work, deferred interrupts are executed if there are any.
|
||||
|
||||
Exit from interrupt
|
||||
---------------------
|
||||
|
||||
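OK, the interrupt handler has finished and we must return from the interrupt. After `do_IRQ` has done its work, we return to the assembly code in [arch/x86/entry/entry_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S), at the `ret_from_intr` label. First we disable interrupts with the `DISABLE_INTERRUPTS` macro, which expands to the `cli` instruction, and decrement the [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/per-cpu.html) `irq_count` variable. Remember that this variable had the value `1` while we were in interrupt context: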
|
||||
|
||||
```assembly
|
||||
DISABLE_INTERRUPTS(CLBR_NONE)
|
||||
TRACE_IRQS_OFF
|
||||
decl PER_CPU_VAR(irq_count)
|
||||
```
|
||||
|
||||
In the last step we check the previous context (user or kernel), restore it correctly, and exit from the interrupt with:
|
||||
|
||||
```assembly
|
||||
INTERRUPT_RETURN
|
||||
```
|
||||
|
||||
where the `INTERRUPT_RETURN` macro is:
|
||||
|
||||
```C
|
||||
#define INTERRUPT_RETURN jmp native_iret
|
||||
```
|
||||
|
||||
and
|
||||
|
||||
```assembly
|
||||
ENTRY(native_iret)
|
||||
|
||||
.global native_irq_return_iret
|
||||
native_irq_return_iret:
|
||||
iretq
|
||||
```
|
||||
|
||||
That's all.
|
||||
|
||||
Conclusion
|
||||
--------------------------
|
||||
|
||||
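This is the end of the tenth part of the [Interrupts and Interrupt Handling](https://xinqiu.gitbooks.io/linux-insides-cn/content/interrupts/index.html) chapter and, as you read at the beginning of this part, the last part of the chapter. The chapter began with the theory of interrupts: we learned what an interrupt is and what kinds of interrupts exist, then we looked at exceptions and how this kind of interrupt is handled, and at deferred interrupts. Finally, in this part, we looked at hardware interrupts and how they are handled. Of course, neither this part nor the whole chapter covers every aspect of interrupts and interrupt handling in the Linux kernel. That would not be realistic, at least for me. It is a big topic; I don't know about you, but for me it really is big. The subject goes far beyond what this chapter describes, and I am not sure that a single book covering all of it exists. We have skipped many things about interrupts and interrupt handling, but I think this is a good starting point for diving into the kernel code related to them.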
|
||||
|
||||
If you have any questions or suggestions, write me a comment or ping me on [twitter](https://twitter.com/0xAX).
|
||||
|
||||
|
||||
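**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to [linux-insides](https://github.com/0xAX/linux-insides). (Translator's note: for translation issues, please send a PR to [linux-insides-cn](https://www.gitbook.com/book/xinqiu/linux-insides-cn).)**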
|
||||
|
||||
Links
|
||||
---------------------------------
|
||||
* [Serial driver documentation](https://www.kernel.org/doc/Documentation/serial/driver)
|
||||
* [StrongARM** SA-110/21285 Evaluation Board](http://netwinder.osuosl.org/pub/netwinder/docs/intel/datashts/27813501.pdf)
|
||||
* [IRQ](https://en.wikipedia.org/wiki/Interrupt_request_%28PC_architecture%29)
|
||||
* [module](https://en.wikipedia.org/wiki/Loadable_kernel_module)
|
||||
* [initcall](http://kernelnewbies.org/Documents/InitcallMechanism)
|
||||
* [uart](https://en.wikipedia.org/wiki/Universal_asynchronous_receiver/transmitter)
|
||||
* [ISA](https://en.wikipedia.org/wiki/Industry_Standard_Architecture)
|
||||
* [memory management](https://xinqiu.gitbooks.io/linux-insides-cn/content/mm/index.html)
|
||||
* [i2c](https://en.wikipedia.org/wiki/I%C2%B2C)
|
||||
* [APIC](https://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller)
|
||||
* [GNU assembler](https://en.wikipedia.org/wiki/GNU_Assembler)
|
||||
* [Processor register](https://en.wikipedia.org/wiki/Processor_register)
|
||||
* [per-cpu](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/per-cpu.html)
|
||||
* [pid](https://en.wikipedia.org/wiki/Process_identifier)
|
||||
* [device tree](https://en.wikipedia.org/wiki/Device_tree)
|
||||
* [system calls](https://en.wikipedia.org/wiki/System_call)
|
||||
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/interrupts/interrupts-9.html)
|
||||