Add translations of sections 1, 2, and 3 of Chapter 2

Shengqiu Li
2017-05-10 22:32:34 +08:00
parent f32aa2a3e5
commit d166d39e2b
3 changed files with 278 additions and 267 deletions


@@ -1,38 +1,38 @@
Kernel initialization. Part 2.
================================================================================
Early interrupt and exception handling
--------------------------------------------------------------------------------
In the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) we stopped just before setting up the early interrupt handlers. At this point we are in the decompressed Linux kernel, we have a basic [paging](https://en.wikipedia.org/wiki/Page_table) structure for early boot, and our current goal is to finish the early preparation before the main kernel code starts to run.
We already started this preparation in the [first](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-1.html) part of this [chapter](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/index.html). We continue in this part and will learn more about interrupt and exception handling.
Remember that we stopped before the following loop:
```C
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
set_intr_gate(i, early_idt_handler_array[i]);
```
from the [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c) source code file. But before we sort out this code, we need to know about interrupts and their handlers.
Some theory
--------------------------------------------------------------------------------
An interrupt is an event signaled to the CPU by software or hardware, for example when a user presses a key on the keyboard. On an interrupt, the CPU stops the current task and transfers control to a special routine called an [interrupt handler](https://en.wikipedia.org/wiki/Interrupt_handler). The interrupt handler handles the interrupt and transfers control back to the previously stopped task. We can split interrupts into three types:
* Software interrupts - when software signals the CPU that it needs kernel attention. These interrupts are generally used for system calls;
* Hardware interrupts - when a hardware event happens, for example a key is pressed on a keyboard;
* Exceptions - interrupts generated by the CPU when it detects an error, for example division by zero or an access to a memory page which is not in RAM.
Every interrupt and exception is assigned a unique number called the `vector number`. A `vector number` can be any number from `0` to `255`. It is common practice to use the first `32` vector numbers for exceptions, while vector numbers from `32` to `255` are used for user-defined interrupts. We can see this in the code above in `NUM_EXCEPTION_VECTORS`, which is defined as:
```C
#define NUM_EXCEPTION_VECTORS 32
```
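The split between exception vectors and the remaining vectors can be sketched in a few lines of C. This is only an illustration: `is_exception_vector`, `is_user_defined_vector` and `NR_VECTORS_SKETCH` are hypothetical names that do not come from the kernel sources.
```C
#include <stdbool.h>

#define NUM_EXCEPTION_VECTORS 32   /* value quoted in the text above */
#define NR_VECTORS_SKETCH     256  /* assumption: vectors run from 0 to 255 */

/* Hypothetical helper: true if the vector belongs to the CPU exception range. */
static bool is_exception_vector(unsigned int vector)
{
    return vector < NUM_EXCEPTION_VECTORS;
}

/* Hypothetical helper: true if the vector is left for user-defined interrupts. */
static bool is_user_defined_vector(unsigned int vector)
{
    return vector >= NUM_EXCEPTION_VECTORS && vector < NR_VECTORS_SKETCH;
}
```
With these helpers, vector `14` (`#PF`) falls into the exception range, while vector `32` is the first one available for user-defined interrupts.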
The CPU uses the vector number as an index into the `Interrupt Descriptor Table` (we will see a description of it soon). The CPU receives interrupts from the [APIC](http://en.wikipedia.org/wiki/Advanced_Programmable_Interrupt_Controller) or through its pins. The following table shows exceptions `0-31`:
```
----------------------------------------------------------------------------------------------
@@ -84,9 +84,9 @@ CPU uses vector number as an index in the `Interrupt Descriptor Table` (we will
----------------------------------------------------------------------------------------------
```
To react to an interrupt, the CPU uses a special structure, the Interrupt Descriptor Table or IDT. The IDT is an array of 8-byte descriptors like the Global Descriptor Table, but IDT entries are called `gates`. The CPU multiplies the vector number by 8 to find the offset of the IDT entry. In 64-bit mode, however, the IDT is an array of 16-byte descriptors and the CPU multiplies the vector number by 16 to find the offset of the entry in the IDT. We remember from the previous part that the CPU uses the special `GDTR` register to locate the Global Descriptor Table. In the same way, the CPU uses the special `IDTR` register for the Interrupt Descriptor Table, and the `lidt` instruction loads the base address of the table into this register.
In 64-bit mode an IDT entry has the following structure:
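The lookup described here is plain address arithmetic. The following is a minimal sketch, assuming the base address has already been read from `IDTR`; `idt_entry_address` and the two size constants are hypothetical names used only for illustration.
```C
#include <stdint.h>

/* Descriptor sizes named in the text: 8 bytes, or 16 bytes in 64-bit mode. */
enum { IDT_ENTRY_SIZE_32 = 8, IDT_ENTRY_SIZE_64 = 16 };

/*
 * Hypothetical helper: address of the gate for `vector`, given the IDT base.
 * `long_mode` is non-zero when the CPU runs in 64-bit mode.
 */
static uint64_t idt_entry_address(uint64_t idt_base, uint8_t vector, int long_mode)
{
    uint64_t entry_size = long_mode ? IDT_ENTRY_SIZE_64 : IDT_ENTRY_SIZE_32;

    return idt_base + (uint64_t)vector * entry_size;
}
```
For example, in 64-bit mode the gate for vector `14` starts `224` bytes after the base of the table.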
```
127 96
@@ -115,46 +115,46 @@ To react on interrupt CPU uses special structure - Interrupt Descriptor Table or
--------------------------------------------------------------------------------
```
Where:
* `Offset` - the offset to the entry point of an interrupt handler;
* `DPL` - Descriptor Privilege Level;
* `P` - Segment Present flag;
* `Segment selector` - a code segment selector in the GDT or LDT;
* `IST` - provides the ability to switch to a new stack for interrupt handling.
The last field, `Type`, describes the type of the `IDT` entry. There are three different kinds of descriptors for interrupts:
* Task descriptor
* Interrupt descriptor
* Trap descriptor
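The fields listed above can be pictured as a C structure. The sketch below is an illustration and not the kernel's own `gate_desc` definition: the structure and helper names are hypothetical, and the type values `0xE` (interrupt gate) and `0xF` (trap gate) are the encodings used for 64-bit gates.
```C
#include <stdint.h>

/* Type encodings for 64-bit gates (illustrative constant names). */
enum { GATE_TYPE_INTERRUPT = 0xE, GATE_TYPE_TRAP = 0xF };

/* Sketch of a 16-byte 64-bit IDT gate; field names are illustrative. */
struct idt_gate64 {
    uint16_t offset_low;     /* handler address, bits 0..15 */
    uint16_t selector;       /* code segment selector */
    uint8_t  ist;            /* bits 0..2: IST index, other bits reserved */
    uint8_t  type_attr;      /* bits 0..3: Type, bits 5..6: DPL, bit 7: P */
    uint16_t offset_middle;  /* handler address, bits 16..31 */
    uint32_t offset_high;    /* handler address, bits 32..63 */
    uint32_t reserved;       /* must be zero */
};

_Static_assert(sizeof(struct idt_gate64) == 16, "a 64-bit gate must be 16 bytes");

/* Pack Type, DPL and P into the attribute byte. */
static uint8_t make_type_attr(unsigned type, unsigned dpl, unsigned present)
{
    return (uint8_t)((type & 0xF) | ((dpl & 0x3) << 5) | ((present & 0x1) << 7));
}

static unsigned gate_type(const struct idt_gate64 *g)    { return g->type_attr & 0xF; }
static unsigned gate_dpl(const struct idt_gate64 *g)     { return (g->type_attr >> 5) & 0x3; }
static unsigned gate_present(const struct idt_gate64 *g) { return (g->type_attr >> 7) & 0x1; }
```
A present interrupt gate with `DPL` 0 ends up with the attribute byte `0x8E` in this layout.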
Interrupt and trap descriptors contain a far pointer to the entry point of the interrupt handler. The only difference between these types is how the CPU handles the `IF` flag. If the interrupt handler is entered through an interrupt gate, the CPU clears the `IF` flag to prevent other interrupts while the current interrupt handler executes. When the current interrupt handler finishes, the `iret` instruction restores the saved flags register, which sets the `IF` flag again.
The other bits in the interrupt gate are reserved and must be 0. Now let's look at how the CPU handles interrupts:
* The CPU saves the flags register, `CS`, and the instruction pointer on the stack.
* If the exception provides an error code (like `#PF` for example), the CPU saves the error code on the stack after the instruction pointer;
* After the interrupt handler has executed, the `iret` instruction is used to return from it.
Now let's get back to the code.
Fill and load IDT
--------------------------------------------------------------------------------
We stopped at the following point:
```C
for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
set_intr_gate(i, early_idt_handler_array[i]);
```
Here we call `set_intr_gate` in the loop, which takes two parameters:
* Number of an interrupt or `vector number`;
* Address of the IDT handler.
and inserts an interrupt gate into the `IDT`, which is referenced through `&idt_descr`. First of all let's look at the `early_idt_handler_array` array. It is declared in the [arch/x86/include/asm/segment.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/segment.h) header file and contains the entry points of the first `32` exception handlers:
```C
#define EARLY_IDT_HANDLER_SIZE 9
@@ -163,11 +163,11 @@ and inserts an interrupt gate to the `IDT` table which is represented by the `&i
extern const char early_idt_handler_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE];
```
The `early_idt_handler_array` is a `288`-byte array (`32` entries of `9` bytes) that holds one exception entry point every nine bytes. Each nine-byte entry consists of an optional two-byte instruction that pushes a dummy error code if the exception does not provide one, a two-byte instruction that pushes the vector number onto the stack, and a five-byte `jump` to the common exception handler code.
As we can see, we fill only the first 32 `IDT` entries in the loop. All of the early setup runs with interrupts disabled, so there is no need to set up interrupt handlers for vectors `32` and above. The `early_idt_handler_array` array contains generic IDT handlers, and we can find its definition in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file. For now we will skip it, but we will look at it soon. Before this we will look at the implementation of the `set_intr_gate` macro.
The `set_intr_gate` macro is defined in the [arch/x86/include/asm/desc.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/desc.h) header file and looks like this:
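The size arithmetic can be checked with a small C sketch. It uses a local zero-filled array with the same shape as `early_idt_handler_array`, so it only demonstrates the layout, not the real stubs; `stub_array`, `stub_array_size` and `stub_offset` are hypothetical names.
```C
#include <stddef.h>

#define NUM_EXCEPTION_VECTORS  32
#define EARLY_IDT_HANDLER_SIZE 9

/* Byte budget of one stub as described in the text: 2 + 2 + 5 = 9. */
enum { PUSH_DUMMY_ERROR = 2, PUSH_VECTOR = 2, JUMP_COMMON = 5 };

_Static_assert(PUSH_DUMMY_ERROR + PUSH_VECTOR + JUMP_COMMON == EARLY_IDT_HANDLER_SIZE,
               "stub pieces must add up to the stub size");

/* Local stand-in with the same shape as the kernel's array (contents are dummy). */
static const char stub_array[NUM_EXCEPTION_VECTORS][EARLY_IDT_HANDLER_SIZE] = {{0}};

/* Total size of the array in bytes. */
static size_t stub_array_size(void)
{
    return sizeof(stub_array);
}

/* Byte offset of the stub for `vector` from the start of the array. */
static size_t stub_offset(unsigned int vector)
{
    return (size_t)((const char *)stub_array[vector] - (const char *)stub_array);
}
```
Because every stub has the same size, the entry point for vector `i` is simply `9 * i` bytes from the start of the array, which is why the loop can pass `early_idt_handler_array[i]` directly as the handler address.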
```C
#define set_intr_gate(n, addr) \
@@ -180,7 +180,7 @@ The `set_intr_gate` macro is defined in the [arch/x86/include/asm/desc.h](https:
} while (0)
```
First of all it checks with the `BUG_ON` macro that the passed interrupt number is not greater than `255`. We need this check because we can have only `256` interrupts. After this, it calls the `_set_gate` function, which writes an interrupt gate to the `IDT`:
```C
static inline void _set_gate(int gate, unsigned type, void *addr,
@@ -193,7 +193,7 @@ static inline void _set_gate(int gate, unsigned type, void *addr,
}
```
At the start of the `_set_gate` function we can see a call to the `pack_gate` function, which fills a `gate_desc` structure with the given values:
```C
static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func,
@@ -211,8 +211,7 @@ static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long func,
gate->offset_high = PTR_HIGH(func);
}
```
As I mentioned above, we fill the gate descriptor in this function. We fill the three address fields of the descriptor with the address which we got in the main loop (the address of the interrupt handler entry point). We use the following three macros to split the address into three parts:
```C
#define PTR_LOW(x) ((unsigned long long)(x) & 0xFFFF)
@@ -220,9 +219,9 @@ As I mentioned above, we fill gate descriptor in this function. We fill three pa
#define PTR_HIGH(x) ((unsigned long long)(x) >> 32)
```
With the first macro, `PTR_LOW`, we get the lowest `2` bytes of the address, with the second, `PTR_MIDDLE`, we get the next `2` bytes of the address, and with the third, `PTR_HIGH`, we get the upper `4` bytes of the address. Next we set up the segment selector for the interrupt handler; it will be our kernel code segment, `__KERNEL_CS`. In the next step we fill the `Interrupt Stack Table` and `Descriptor Privilege Level` (highest privilege level) fields with zeros. Finally we set the `GATE_INTERRUPT` type.
Now we have a filled IDT entry and we can call the `native_write_idt_entry` function, which just copies the filled `IDT` entry to the `IDT`:
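To see how a handler address is split into the three descriptor fields and joined back, here is a minimal sketch. The helper names are hypothetical and the functions are written from the description above, not copied from the kernel macros.
```C
#include <stdint.h>

/* Lowest 2 bytes of the address (bits 0..15). */
static uint16_t addr_low(uint64_t addr)
{
    return (uint16_t)(addr & 0xFFFF);
}

/* Next 2 bytes of the address (bits 16..31). */
static uint16_t addr_middle(uint64_t addr)
{
    return (uint16_t)((addr >> 16) & 0xFFFF);
}

/* Upper 4 bytes of the address (bits 32..63). */
static uint32_t addr_high(uint64_t addr)
{
    return (uint32_t)(addr >> 32);
}

/* Reassemble the address the way the CPU does when it reads the gate. */
static uint64_t addr_join(uint16_t low, uint16_t middle, uint32_t high)
{
    return (uint64_t)low | ((uint64_t)middle << 16) | ((uint64_t)high << 32);
}
```
For a kernel address such as `0xffffffff81fe5014`, the three parts are `0x5014`, `0x81fe` and `0xffffffff`.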
```C
static inline void native_write_idt_entry(gate_desc *idt, int entry, const gate_desc *gate)
@@ -231,32 +230,32 @@ static inline void native_write_idt_entry(gate_desc *idt, int entry, const gate_
}
```
After the main loop has finished, we have a filled `idt_table` array of `gate_desc` structures and we can load the `Interrupt Descriptor Table` with a call to:
```C
load_idt((const struct desc_ptr *)&idt_descr);
```
Where `idt_descr` is:
```C
struct desc_ptr idt_descr = { NR_VECTORS * 16 - 1, (unsigned long) idt_table };
```
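The first member of `idt_descr` is the table limit, the size of the table in bytes minus one. The sketch below recomputes it under the assumption that `NR_VECTORS` is `256`. The structure and function names are hypothetical, and the `packed` attribute is a GCC/Clang extension used here so that the 2-byte limit is followed directly by the 8-byte base.
```C
#include <stdint.h>

#define NR_VECTORS_SKETCH 256  /* assumption: one gate per possible vector */
#define GATE_SIZE_64      16   /* size of a 64-bit gate in bytes */

/* Illustrative 10-byte operand for lidt: 16-bit limit followed by 64-bit base. */
struct desc_ptr_sketch {
    uint16_t size;     /* table size in bytes minus one */
    uint64_t address;  /* linear address of the table */
} __attribute__((packed));

/* Build the descriptor for a table that starts at `table`. */
static struct desc_ptr_sketch make_idt_descr(const void *table)
{
    struct desc_ptr_sketch descr;

    descr.size = NR_VECTORS_SKETCH * GATE_SIZE_64 - 1;
    descr.address = (uint64_t)(uintptr_t)table;
    return descr;
}
```
With these assumptions the limit is `4095`, which means the table occupies exactly one 4096-byte page.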
and `load_idt` just executes the `lidt` instruction:
```C
asm volatile("lidt %0"::"m" (*dtr));
```
You may notice that there are calls to `_trace_*` functions in `_set_gate` and other functions. These functions fill `IDT` gates in the same manner as `_set_gate`, with one difference: they use `trace_idt_table` as the `Interrupt Descriptor Table` instead of `idt_table`, for tracepoints (we will cover this topic in another part).
Okay, now we have filled and loaded the `Interrupt Descriptor Table`, and we know how the CPU acts during an interrupt. So now it is time to deal with the interrupt handlers.
Early interrupts handlers
--------------------------------------------------------------------------------
As you can read above, we filled the `IDT` with the addresses from `early_idt_handler_array`. We can find it in the [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S) assembly file:
```assembly
.globl early_idt_handler_array
@@ -273,7 +272,7 @@ early_idt_handlers:
.endr
```
Here we can see the generation of interrupt handlers for the first `32` exceptions. If an exception has an error code we do nothing; if it does not provide an error code, we push zero onto the stack. We do this so that the stack layout is uniform. After that we push the exception number onto the stack and jump to `early_idt_handler_common`, which is the generic interrupt handler for now. As we saw above, every nine bytes of the `early_idt_handler_array` array consist of an optional push of an error code, a push of the `vector number`, and a jump instruction. We can see it in the output of the `objdump` util:
```
$ objdump -D vmlinux
@@ -294,7 +293,7 @@ ffffffff81fe5014: 6a 02 pushq $0x2
...
```
As I wrote above, the CPU pushes the flags register, `CS`, and `RIP` onto the stack. So before `early_idt_handler_common` is executed, the stack contains the following data:
```
|--------------------|
@@ -305,14 +304,14 @@ As i wrote above, CPU pushes flag register, `CS` and `RIP` on the stack. So befo
|--------------------|
```
Now let's look at the `early_idt_handler_common` implementation. It is located in the same [arch/x86/kernel/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head_64.S#L343) assembly file, and first of all we can see a check for [NMI](http://en.wikipedia.org/wiki/Non-maskable_interrupt). We don't need to handle it, so we just ignore it in `early_idt_handler_common`:
```assembly
cmpl $2,(%rsp)
je .Lis_nmi
```
where `is_nmi`:
```assembly
is_nmi:
@@ -320,7 +319,9 @@ is_nmi:
INTERRUPT_RETURN
```
drops the error code and vector number from the stack and calls `INTERRUPT_RETURN`, which just expands to the `iretq` instruction. Once we have checked the vector number and it is not `NMI`, we check `early_recursion_flag` to prevent recursion in `early_idt_handler_common`, and if everything is fine we save the general purpose registers on the stack:
```assembly
pushq %rax
@@ -334,16 +335,14 @@ drops an error code and vector number from the stack and call `INTERRUPT_RETURN`
pushq %r11
```
We need to do this to prevent wrong register values when we return from the interrupt handler. After this we check the segment selector on the stack:
```assembly
cmpl $__KERNEL_CS,96(%rsp)
jne 11f
```
which must be equal to the kernel code segment. If it is not, we jump to label `11`, which prints a `PANIC` message and dumps the stack.
After the code segment has been checked, we check the vector number, and if it is `#PF` or [Page Fault](https://en.wikipedia.org/wiki/Page_fault), we put the value from `cr2` into the `rdi` register and call `early_make_pgtable` (we will see it soon):
```assembly
cmpl $14,72(%rsp)
@@ -354,8 +353,7 @@ After the code segment was checked, we check the vector number, and if it is `#P
jz 20f
```
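The offsets `72` and `96` used in the two comparisons follow from the order of the pushes. The sketch below models the stack as a C structure, lowest address first. It assumes that nine 8-byte registers are saved; the listing above elides the registers between `rax` and `r11`, so their names here are an assumption, and only their count matters for the offsets. The structure name is hypothetical.
```C
#include <stddef.h>
#include <stdint.h>

/* Stack contents as seen from %rsp after the registers are saved (sketch). */
struct early_frame_sketch {
    /* Saved by the common handler; the last register pushed has the lowest address. */
    uint64_t r11, r10, r9, r8, rdi, rsi, rdx, rcx, rax;
    /* Pushed by the per-vector stub. */
    uint64_t vector;
    /* Pushed by the CPU, or a dummy zero pushed by the stub. */
    uint64_t error_code;
    /* Pushed by the CPU on entry. */
    uint64_t rip, cs, rflags;
};

/* The vector number is compared at 72(%rsp), the code segment at 96(%rsp). */
_Static_assert(offsetof(struct early_frame_sketch, vector) == 72, "vector offset");
_Static_assert(offsetof(struct early_frame_sketch, cs) == 96, "cs offset");
```
Before the registers are saved, the vector number is at the top of the stack, which is why the NMI check earlier compares against `(%rsp)` directly.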
If the vector number is not `#PF`, we restore the general purpose registers from the stack:
```assembly
popq %r11
popq %r10
@@ -368,16 +366,16 @@ If vector number is not `#PF`, we restore general purpose registers from the sta
popq %rax
```
and exit from the handler with `iret`.
This is the end of the first interrupt handler. Note that it is a very early interrupt handler, so for now it handles only Page Fault. We will see handlers for the other interrupts later, but now let's look at the page fault handler.
Page fault handling
--------------------------------------------------------------------------------
In the previous paragraph we saw the first early interrupt handler, which checks whether the interrupt number is a page fault and, if it is, calls `early_make_pgtable` to build new page tables. We need a `#PF` handler at this step because there are plans to add the ability to load the kernel above `4G` and to access the `boot_params` structure above 4G.
You can find the implementation of `early_make_pgtable` in [arch/x86/kernel/head64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/head64.c). It takes one parameter, the address from the `cr2` register that caused the Page Fault. Let's look at it:
```C
int __init early_make_pgtable(unsigned long address)
@@ -393,60 +391,61 @@ int __init early_make_pgtable(unsigned long address)
}
```
It starts with the definition of some variables which have `*val_t` types. All of these types are just:
```C
typedef unsigned long pgdval_t;
```
We will also work with the `*_t` (not val) types, for example `pgd_t` and so on. All of these types are defined in [arch/x86/include/asm/pgtable_types.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/pgtable_types.h) and are structures like this:
```C
typedef struct { pgdval_t pgd; } pgd_t;
```
For example,
```C
extern pgd_t early_level4_pgt[PTRS_PER_PGD];
```
Here `early_level4_pgt` represents the early top-level page table directory. It is an array of `pgd_t` elements, and each `pgd` points to lower-level page table entries.
After we have checked that the address is valid, we get the address of the Page Global Directory entry that covers the faulting (`#PF`) address and put its value into the `pgd` variable:
```C
pgd_p = &early_level4_pgt[pgd_index(address)].pgd;
pgd = *pgd_p;
```
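Index helpers such as `pgd_index` pick a 9-bit field out of the virtual address. The following sketch assumes the usual x86_64 4-level layout (512 entries per table, with shifts of 39, 30 and 21 bits for the global, upper and middle directories). The `*_sketch` names are hypothetical and are not the kernel's own helpers.
```C
#include <stdint.h>

/* Assumed x86_64 4-level paging layout: 9 index bits per level. */
enum {
    PGDIR_SHIFT_SKETCH = 39,
    PUD_SHIFT_SKETCH   = 30,
    PMD_SHIFT_SKETCH   = 21,
    ENTRIES_PER_TABLE  = 512
};

/* Index into the page global directory. */
static unsigned int pgd_index_sketch(uint64_t address)
{
    return (unsigned int)((address >> PGDIR_SHIFT_SKETCH) & (ENTRIES_PER_TABLE - 1));
}

/* Index into the page upper directory. */
static unsigned int pud_index_sketch(uint64_t address)
{
    return (unsigned int)((address >> PUD_SHIFT_SKETCH) & (ENTRIES_PER_TABLE - 1));
}

/* Index into the page middle directory. */
static unsigned int pmd_index_sketch(uint64_t address)
{
    return (unsigned int)((address >> PMD_SHIFT_SKETCH) & (ENTRIES_PER_TABLE - 1));
}
```
For the address `0xffffffff81000000`, these helpers give the indexes `511`, `510` and `8`.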
In the next step we check `pgd`. If it contains a valid page global directory entry, we take the physical address stored in that entry, convert it to a virtual address, and put the result into `pud_p` with:
```C
pud_p = (pudval_t *)((pgd & PTE_PFN_MASK) + __START_KERNEL_map - phys_base);
```
where `PTE_PFN_MASK` is a macro:
```C
#define PTE_PFN_MASK ((pteval_t)PHYSICAL_PAGE_MASK)
```
which expands to:
```C
(~(PAGE_SIZE-1)) & ((1 << 46) - 1)
```
or
```
0b1111111111111111111111111111111111111111111111
```
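which is a 46-bit mask for the page frame. The `~(PAGE_SIZE-1)` term additionally clears the low 12 page-offset bits, which the binary form above does not show.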
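The mask can be computed directly. The sketch below assumes 4096-byte pages and uses a 64-bit constant for the shift, because shifting a plain `int` by 46 bits is not valid C. The `*_SKETCH` names and the helper functions are hypothetical.
```C
#include <stdint.h>

#define PAGE_SIZE_SKETCH       (1ULL << 12)  /* assumption: 4096-byte pages */
#define PHYS_MASK_SHIFT_SKETCH 46            /* width quoted in the text */

/* Bits 12..45 set: the physical page frame address inside a table entry. */
static uint64_t pfn_mask_sketch(void)
{
    return ~(PAGE_SIZE_SKETCH - 1) & ((1ULL << PHYS_MASK_SHIFT_SKETCH) - 1);
}

/* Strip the flag bits from a page table entry, leaving the physical address. */
static uint64_t entry_address_sketch(uint64_t entry)
{
    return entry & pfn_mask_sketch();
}
```
Applying the mask to an entry drops both the low flag bits and any high bits such as the no-execute bit, so only the physical address of the next-level table remains.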
If `pgd` does not contain a valid address, we check whether `next_early_pgt` has reached `EARLY_DYNAMIC_PAGE_TABLES`, which is `64` and represents a fixed number of buffers for setting up new page tables on demand. If `next_early_pgt` is greater than or equal to `EARLY_DYNAMIC_PAGE_TABLES`, we reset the page tables and start again. If `next_early_pgt` is less than `EARLY_DYNAMIC_PAGE_TABLES`, we create a new page upper directory pointer which points to the current dynamic page table and write its physical address with the `_KERNPG_TABLE` access rights into the page global directory:
```C
if (next_early_pgt >= EARLY_DYNAMIC_PAGE_TABLES) {
@@ -460,30 +459,32 @@ for (i = 0; i < PTRS_PER_PUD; i++)
*pgd_p = (pgdval_t)pud_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
```
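The entry written into the page global directory and the pointer read back from it earlier are two directions of the same conversion. Here is a minimal round-trip sketch, assuming that `__START_KERNEL_map` is `0xffffffff80000000` and using `0x63` as a stand-in for the table flags. All names are hypothetical.
```C
#include <stdint.h>

#define START_KERNEL_MAP_SKETCH 0xffffffff80000000ULL  /* assumed kernel mapping base */
#define PFN_MASK_SKETCH         0x00003ffffffff000ULL  /* bits 12..45 */

/* Virtual address of the table referenced by a directory entry. */
static uint64_t table_virt_from_entry(uint64_t entry, uint64_t phys_base)
{
    return (entry & PFN_MASK_SKETCH) + START_KERNEL_MAP_SKETCH - phys_base;
}

/* Directory entry that points at the table located at virtual address `virt`. */
static uint64_t entry_from_table_virt(uint64_t virt, uint64_t phys_base, uint64_t flags)
{
    return virt - START_KERNEL_MAP_SKETCH + phys_base + flags;
}
```
Both functions rely on unsigned wrap-around, so they give the same virtual address whether `phys_base` is zero or the kernel was loaded at a different physical address.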
After this we advance `pud_p` to the page upper directory entry for the faulting address and read it:
```C
pud_p += pud_index(address);
pud = *pud_p;
```
In the next step we do the same as we did before, but with the page middle directory. Finally, we write the page middle directory entry that maps the kernel text+data virtual addresses:
```C
pmd = (physaddr & PMD_MASK) + early_pmd_flags;
pmd_p[pmd_index(address)] = pmd;
```
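The middle directory entry is the physical address rounded down to a 2 MB boundary plus the flag bits. The sketch below assumes 2 MB mappings at this level (a shift of 21 bits); the helper name and the flag value used in the example are hypothetical.
```C
#include <stdint.h>

#define PMD_SHIFT_SKETCH 21                          /* assumption: 2 MB per entry */
#define PMD_SIZE_SKETCH  (1ULL << PMD_SHIFT_SKETCH)
#define PMD_MASK_SKETCH  (~(PMD_SIZE_SKETCH - 1))

/* Build a middle directory entry for the 2 MB region containing `physaddr`. */
static uint64_t make_early_pmd(uint64_t physaddr, uint64_t flags)
{
    return (physaddr & PMD_MASK_SKETCH) + flags;
}
```
For example, `make_early_pmd(0x1234567, 0xe3)` yields `0x12000e3`: the address is rounded down to `0x1200000` and the low bits carry the flags.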
After the page fault handler has finished its work, our `early_level4_pgt` contains entries that point to valid addresses.
Conclusion
--------------------------------------------------------------------------------
This is the end of the second part about Linux kernel insides. If you have questions or suggestions, ping me on Twitter [0xAX](https://twitter.com/0xAX), drop me an [email](anotherworldofworld@gmail.com), or just create an [issue](https://github.com/MintCN/linux-insides-zh/issues/new). In the next part we will see all the steps before the kernel entry point, the `start_kernel` function.
**Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to [linux-insides](https://github.com/0xAX/linux-insides).**
Links
--------------------------------------------------------------------------------
* [GNU assembly .rept](https://sourceware.org/binutils/docs-2.23/as/Rept.html)