mirror of
https://github.com/MintCN/linux-insides-zh.git
synced 2026-04-25 19:20:28 +08:00
Merge branch 'master' into linux-insides-zh-datastructures-1-fix
@@ -478,7 +478,7 @@ pgtable:
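movl %eax, 0(%edi)
```

Here again, we put the address of `pgtable`, relative to `ebx` (in other words, relative to `startup_32`), into the `ebi` register. Next, we put the address at offset `0x1007` from this address into the `eax` register. `0x1007` is the size of the `PML4` table (`4096`) plus `7`. The `7` here represents the flags of the `PML4` entry; in our case these flags are `PRESENT+RW+USER`. Finally, we write the address of the first `PDP (Page Directory Pointer)` entry into the `PML4`.
Here again, we put the address of `pgtable`, relative to `ebx` (in other words, relative to `startup_32`), into the `edi` register. Next, we put the address at offset `0x1007` from this address into the `eax` register. `0x1007` is the size of the `PML4` table (`4096`) plus `7`. The `7` here represents the flags of the `PML4` entry; in our case these flags are `PRESENT+RW+USER`. Finally, we write the address of the first `PDP (Page Directory Pointer)` entry into the `PML4`.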
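Here is a minimal sketch of the `0x1007` arithmetic described above. It is not the kernel's code. It assumes that the `PML4` table starts at a 4 KiB-aligned address and that the first `PDP` table occupies the next 4 KiB page. Under that assumption, adding `0x1007` gives the same result as taking the `PDP` address and OR-ing in the flag bits `PRESENT` (bit 0), `RW` (bit 1) and `USER` (bit 2). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the low flag bits of an x86_64 paging entry. */
#define PTE_PRESENT 0x1u    /* bit 0: the entry is valid              */
#define PTE_RW      0x2u    /* bit 1: the mapped region is writable   */
#define PTE_USER    0x4u    /* bit 2: user-mode access is allowed     */
#define TABLE_SIZE  0x1000u /* one table: 512 entries * 8 bytes = 4096 */

/* Value that `leal 0x1007(%edi), %eax` produces for a PML4 table at `pml4`:
 * the address of the next 4 KiB page (the first PDP table) plus the flags. */
static uint32_t first_pml4_entry(uint32_t pml4)
{
    assert((pml4 & (TABLE_SIZE - 1)) == 0); /* tables must be page aligned */
    uint32_t pdp = pml4 + TABLE_SIZE;
    return pdp | PTE_PRESENT | PTE_RW | PTE_USER; /* equals pml4 + 0x1007 */
}
```

The OR only matches the plain addition because the table is page aligned, so the low 12 bits of `pdp` are zero.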

In the next step, we build 4 `Page Directory (level-2 page table)` entries with the `PRESENT+RW+USER` flags in the `Page Directory Pointer (PDP)` table (level-3 page table):
@@ -45,3 +45,5 @@
[@narcijie](https://github.com/narcijie)

[@biopuppet](https://github.com/biopuppet)

[@Albertchamberlain](https://github.com/Albertchamberlain)
@@ -4,7 +4,7 @@ Kernel initialization. Part 7.
The End of the architecture-specific initialization, almost...
================================================================================

This is the seventh part of the Linux Kernel initialization process, which covers the insides of the `setup_arch` function from [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/setup.c#L861). As you may know from the previous [parts](http://0xax.gitbooks.io/linux-insides/content/Initialization/index.html), the `setup_arch` function performs architecture-specific (in our case [x86_64](http://en.wikipedia.org/wiki/X86-64)) initialization, such as reserving memory for the kernel code/data/bss, early scanning of the [Desktop Management Interface](http://en.wikipedia.org/wiki/Desktop_Management_Interface), early dump of the [PCI](http://en.wikipedia.org/wiki/PCI) devices, and much more. If you have read the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), you may remember that we stopped at the `setup_real_mode` function. In the next step, after we set the limit of [memblock](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-1.html) to all mapped pages, we can see the call of the `setup_log_buf` function from [kernel/printk/printk.c](https://github.com/torvalds/linux/blob/master/kernel/printk/printk.c).
This is the seventh part of the Linux Kernel initialization process, which covers the insides of the `setup_arch` function from [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/setup.c#L861). As you may know from the previous [parts](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), the `setup_arch` function performs architecture-specific (in our case [x86_64](http://en.wikipedia.org/wiki/X86-64)) initialization, such as reserving memory for the kernel code/data/bss, early scanning of the [Desktop Management Interface](http://en.wikipedia.org/wiki/Desktop_Management_Interface), early dump of the [PCI](http://en.wikipedia.org/wiki/PCI) devices, and much more. If you have read the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html), you may remember that we stopped at the `setup_real_mode` function. In the next step, after we set the limit of [memblock](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-1.html) to all mapped pages, we can see the call of the `setup_log_buf` function from [kernel/printk/printk.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/kernel/printk/printk.c).

The `setup_log_buf` function sets up the kernel cyclic buffer, whose length depends on the `CONFIG_LOG_BUF_SHIFT` configuration option. As we can read in the documentation of `CONFIG_LOG_BUF_SHIFT`, it can be between `12` and `21`. Internally, the buffer is defined as an array of chars:
@@ -32,7 +32,7 @@ setup_log_buf(1);
where `1` means that this is the early setup. In the next step we check the `new_log_buf_len` variable, which holds the updated length of the kernel log buffer, and either allocate new space for the buffer with the `memblock_virt_alloc` function or just return.
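The relationship between `CONFIG_LOG_BUF_SHIFT` and the buffer size, together with the early `new_log_buf_len` check described above, can be modelled in user-space C. This is a hypothetical sketch, not kernel code. It assumes the static buffer holds `1 << CONFIG_LOG_BUF_SHIFT` bytes, so the range `12`..`21` gives 4 KiB..2 MiB. It also uses `calloc` as a stand-in for `memblock_virt_alloc`. Both helper names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Size of the static log buffer for a given CONFIG_LOG_BUF_SHIFT value. */
static size_t log_buf_len_for_shift(unsigned int shift)
{
    assert(shift >= 12 && shift <= 21); /* range quoted in the text */
    return (size_t)1 << shift;
}

/* Hypothetical model of the early path: keep the current buffer when no
 * new length was requested, otherwise try to allocate a new one. */
static char *maybe_replace_log_buf(char *cur, size_t new_log_buf_len)
{
    if (new_log_buf_len == 0)
        return cur;                          /* nothing requested: just return */
    char *buf = calloc(1, new_log_buf_len);  /* stand-in for memblock_virt_alloc */
    return buf != NULL ? buf : cur;          /* keep the old buffer on failure */
}
```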

Now that the kernel log buffer is ready, the next function is `reserve_initrd`. You may remember that we already called the `early_reserve_initrd` function in the fourth part of the [Kernel initialization](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html). Now that we have reconstructed the direct memory mapping in the `init_mem_mapping` function, we need to move [initrd](http://en.wikipedia.org/wiki/Initrd) into directly mapped memory. The `reserve_initrd` function starts by defining the base address and the end address of the `initrd` and checks that the `initrd` is provided by a bootloader, just as we saw in `early_reserve_initrd`. But instead of reserving space in the `memblock` area with a call to the `memblock_reserve` function, we get the mapped size of the direct memory area and check that the size of the `initrd` is not greater than this area with:
Now that the kernel log buffer is ready, the next function is `reserve_initrd`. You may remember that we already called the `early_reserve_initrd` function in the fourth part of the [Kernel initialization](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-4.html). Now that we have reconstructed the direct memory mapping in the `init_mem_mapping` function, we need to move [initrd](http://en.wikipedia.org/wiki/Initrd) into directly mapped memory. The `reserve_initrd` function starts by defining the base address and the end address of the `initrd` and checks that the `initrd` is provided by a bootloader, just as we saw in `early_reserve_initrd`. But instead of reserving space in the `memblock` area with a call to the `memblock_reserve` function, we get the mapped size of the direct memory area and check that the size of the `initrd` is not greater than this area with:

```C
mapped_size = memblock_mem_size(max_pfn_mapped);
@@ -66,9 +66,9 @@ In the end of the `reserve_initrd` function, we free memblock memory which occup
memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
```
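The size check in `reserve_initrd` compares the ramdisk size against the amount of directly mapped memory. In the kernel source that this article walks through, the comparison is `ramdisk_size >= (mapped_size >> 1)`, which makes the kernel panic when the `initrd` needs half or more of the mapped memory. Below is a hypothetical user-space model of that check. `initrd_fits` is an illustrative name, and `mapped_size` stands for the result of `memblock_mem_size(max_pfn_mapped)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the reserve_initrd() size check.
 * mapped_size stands for memblock_mem_size(max_pfn_mapped): the number of
 * bytes of RAM covered by the direct mapping. */
static bool initrd_fits(uint64_t ramdisk_image, uint64_t ramdisk_end,
                        uint64_t mapped_size)
{
    assert(ramdisk_end >= ramdisk_image);
    uint64_t ramdisk_size = ramdisk_end - ramdisk_image;
    /* The kernel refuses the image when ramdisk_size >= mapped_size / 2. */
    return ramdisk_size < (mapped_size >> 1);
}
```

For example, a 16 MiB image fits comfortably with 1 GiB mapped, but a 512 MiB image does not, because it needs exactly half of the mapped memory.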

After we have relocated the `initrd` ramdisk image, the next function is `vsmp_init` from [arch/x86/kernel/vsmp_64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/vsmp_64.c). This function initializes support for `ScaleMP vSMP`. As I already wrote in the previous parts, this chapter will not cover unrelated `x86_64` initialization parts (for example, the current one, `ACPI`, etc.). So we will skip its implementation for now and come back to it in the part which covers techniques of parallel computing.
After we have relocated the `initrd` ramdisk image, the next function is `vsmp_init` from [arch/x86/kernel/vsmp_64.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsmp_64.c). This function initializes support for `ScaleMP vSMP`. As I already wrote in the previous parts, this chapter will not cover unrelated `x86_64` initialization parts (for example, the current one, `ACPI`, etc.). So we will skip its implementation for now and come back to it in the part which covers techniques of parallel computing.

The next function is `io_delay_init` from [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/io_delay.c). This function allows overriding the default I/O delay port `0x80`. We already saw I/O delay in [Last preparation before transition into protected mode](http://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html); now let's look at the `io_delay_init` implementation:
The next function is `io_delay_init` from [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/io_delay.c). This function allows overriding the default I/O delay port `0x80`. We already saw I/O delay in [Last preparation before transition into protected mode](https://xinqiu.gitbooks.io/linux-insides-cn/content/Booting/linux-bootstrap-3.html); now let's look at the `io_delay_init` implementation:

```C
void __init io_delay_init(void)
@@ -78,7 +78,7 @@ void __init io_delay_init(void)
}
```

This function checks the `io_delay_override` variable and overrides the I/O delay port if `io_delay_override` is set. We can set the `io_delay_override` variable by passing the `io_delay` option on the kernel command line. As we can read in [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt), the `io_delay` option is:
This function checks the `io_delay_override` variable and overrides the I/O delay port if `io_delay_override` is set. We can set the `io_delay_override` variable by passing the `io_delay` option on the kernel command line. As we can read in [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.rst), the `io_delay` option is:

```
io_delay= [X86] I/O delay method
@@ -92,13 +92,13 @@ io_delay= [X86] I/O delay method
No delay
```
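The following user-space sketch shows how the `io_delay=` values documented in the kernel parameters (`0x80`, `0xed`, `udelay` and `none`) can be mapped to a delay type plus an override flag. The enum and the function name are hypothetical stand-ins for `io_delay_type` and `io_delay_override`. This is an illustration of the idea, not the kernel's `io_delay_param`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel's io_delay_type values. */
enum io_delay { IO_DELAY_0X80, IO_DELAY_0XED, IO_DELAY_UDELAY, IO_DELAY_NONE };

/* Parse the io_delay= argument. Returns false for a missing or unknown
 * value; otherwise stores the type and marks the default as overridden. */
static bool parse_io_delay(const char *s, enum io_delay *type, bool *override)
{
    if (s == NULL)
        return false;
    if (strcmp(s, "0x80") == 0)
        *type = IO_DELAY_0X80;
    else if (strcmp(s, "0xed") == 0)
        *type = IO_DELAY_0XED;
    else if (strcmp(s, "udelay") == 0)
        *type = IO_DELAY_UDELAY;
    else if (strcmp(s, "none") == 0)
        *type = IO_DELAY_NONE;
    else
        return false;
    *override = true; /* corresponds to setting io_delay_override */
    return true;
}
```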

We can see that the `io_delay` command line parameter is set up with the `early_param` macro in [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/io_delay.c):
We can see that the `io_delay` command line parameter is set up with the `early_param` macro in [arch/x86/kernel/io_delay.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/io_delay.c):

```C
early_param("io_delay", io_delay_param);
```

You can read more about `early_param` in the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). The `io_delay_param` function, which sets up the `io_delay_override` variable, will be called in the [do_early_param](https://github.com/torvalds/linux/blob/master/init/main.c#L413) function. The `io_delay_param` function gets the argument of the `io_delay` kernel command line parameter and sets `io_delay_type` depending on it:
You can read more about `early_param` in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). The `io_delay_param` function, which sets up the `io_delay_override` variable, will be called in the [do_early_param](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c#L413) function. The `io_delay_param` function gets the argument of the `io_delay` kernel command line parameter and sets `io_delay_type` depending on it:

```C
static int __init io_delay_param(char *s)
@@ -127,7 +127,7 @@ The next functions are `acpi_boot_table_init`, `early_acpi_boot_init` and `initm
Allocate area for DMA
--------------------------------------------------------------------------------

In the next step we need to allocate an area for [Direct memory access](http://en.wikipedia.org/wiki/Direct_memory_access) with the `dma_contiguous_reserve` function, which is defined in [drivers/base/dma-contiguous.c](https://github.com/torvalds/linux/blob/master/drivers/base/dma-contiguous.c). `DMA` is a special mode in which devices communicate with memory without involving the CPU. Note that we pass one parameter, `max_pfn_mapped << PAGE_SHIFT`, to the `dma_contiguous_reserve` function; as this expression suggests, it is the upper limit of the reserved memory. Let's look at the implementation of this function. It starts with the definition of the following variables:
In the next step we need to allocate an area for [Direct memory access](http://en.wikipedia.org/wiki/Direct_memory_access) with the `dma_contiguous_reserve` function, which is defined in [drivers/base/dma-contiguous.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/drivers/base/dma-contiguous.c). `DMA` is a special mode in which devices communicate with memory without involving the CPU. Note that we pass one parameter, `max_pfn_mapped << PAGE_SHIFT`, to the `dma_contiguous_reserve` function; as this expression suggests, it is the upper limit of the reserved memory. Let's look at the implementation of this function. It starts with the definition of the following variables:

```C
phys_addr_t selected_size = 0;
@@ -189,7 +189,7 @@ The next step is the call of the function - `x86_init.paging.pagetable_init`. If
#define native_pagetable_init paging_init
```
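The DMA limit mentioned above, `max_pfn_mapped << PAGE_SHIFT`, converts a page frame number into a physical byte address. Here is a minimal sketch, assuming the x86_64 value `PAGE_SHIFT = 12` (4 KiB pages). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                  /* 4 KiB pages on x86_64 */
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Hypothetical helper: physical byte address of page frame number pfn. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
    return pfn << PAGE_SHIFT; /* same as pfn * PAGE_SIZE */
}
```

For example, a `max_pfn_mapped` of `0x100000` corresponds to a limit of 4 GiB.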

As you can see, this expands to a call to the `paging_init` function from [arch/x86/mm/init_64.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/init_64.c). The `paging_init` function initializes sparse memory and zone sizes. First of all, what are zones and what is `Sparsemem`? `Sparsemem` is a special foundation in the Linux kernel memory manager which is used to split the memory area into different memory banks in [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access) systems. Let's look at the implementation of the `paging_init` function:
As you can see, this expands to a call to the `paging_init` function from [arch/x86/mm/init_64.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/mm/init_64.c). The `paging_init` function initializes sparse memory and zone sizes. First of all, what are zones and what is `Sparsemem`? `Sparsemem` is a special foundation in the Linux kernel memory manager which is used to split the memory area into different memory banks in [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access) systems. Let's look at the implementation of the `paging_init` function:

```C
void __init paging_init(void)
@@ -205,7 +205,7 @@ void __init paging_init(void)
}
```

As you can see, there is a call to the `sparse_memory_present_with_active_regions` function, which records a memory area for every `NUMA` node in the array of `mem_section` structures; each of these contains a pointer to an array of `struct page`. The next function, `sparse_init`, allocates the non-linear `mem_section` and `mem_map`. In the next step we clear the state of the movable memory nodes and initialize the sizes of zones. Every `NUMA` node is divided into a number of pieces which are called `zones`. The `zone_sizes_init` function from [arch/x86/mm/init.c](https://github.com/torvalds/linux/blob/master/arch/x86/mm/init.c) initializes the sizes of these zones.
As you can see, there is a call to the `sparse_memory_present_with_active_regions` function, which records a memory area for every `NUMA` node in the array of `mem_section` structures; each of these contains a pointer to an array of `struct page`. The next function, `sparse_init`, allocates the non-linear `mem_section` and `mem_map`. In the next step we clear the state of the movable memory nodes and initialize the sizes of zones. Every `NUMA` node is divided into a number of pieces which are called `zones`. The `zone_sizes_init` function from [arch/x86/mm/init.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/mm/init.c) initializes the sizes of these zones.

Again, this part and the next parts do not cover this topic in full detail. There will be a special part about `NUMA`.
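To make zones a little more concrete: on x86_64 the usual split is `ZONE_DMA` for the first 16 MiB of physical memory, `ZONE_DMA32` for memory below 4 GiB, and `ZONE_NORMAL` for everything above that. The sketch below classifies a page frame number under those assumptions. It ignores `ZONE_MOVABLE` and the per-node details that `zone_sizes_init` handles, and its names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define MAX_DMA_PFN   ((16ULL << 20) >> PAGE_SHIFT) /* 16 MiB boundary */
#define MAX_DMA32_PFN ((4ULL << 30) >> PAGE_SHIFT)  /* 4 GiB boundary  */

enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

/* Hypothetical helper: which zone a page frame falls into on x86_64. */
static enum zone_type zone_for_pfn(uint64_t pfn)
{
    if (pfn < MAX_DMA_PFN)
        return ZONE_DMA;
    if (pfn < MAX_DMA32_PFN)
        return ZONE_DMA32;
    return ZONE_NORMAL;
}
```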
@@ -222,7 +222,7 @@ if (boot_cpu_data.cpuid_level >= 0) {
}
```

The next function you can see is `map_vsyscall` from [arch/x86/kernel/vsyscall_64.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/vsyscall_64.c). This function maps memory space for [vsyscalls](https://lwn.net/Articles/446528/) and depends on the `CONFIG_X86_VSYSCALL_EMULATION` kernel configuration option. A `vsyscall` is a special segment which provides fast access to certain system calls like `getcpu`, etc. Let's look at the implementation of this function:
The next function you can see is `map_vsyscall` from [arch/x86/kernel/vsyscall_64.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsyscall_64.c). This function maps memory space for [vsyscalls](https://lwn.net/Articles/446528/) and depends on the `CONFIG_X86_VSYSCALL_EMULATION` kernel configuration option. A `vsyscall` is a special segment which provides fast access to certain system calls like `getcpu`, etc. Let's look at the implementation of this function:

```C
void __init map_vsyscall(void)
@@ -241,7 +241,7 @@ void __init map_vsyscall(void)
}
```

At the beginning of `map_vsyscall` we can see the definition of two variables. The first is the extern variable `__vsyscall_page`. Because it is an extern variable, it is defined in another source file; indeed, we can find the definition of `__vsyscall_page` in [arch/x86/kernel/vsyscall_emu_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/vsyscall_emu_64.S). The `__vsyscall_page` symbol points to the aligned calls of the `vsyscalls`, such as `gettimeofday`, etc.:
At the beginning of `map_vsyscall` we can see the definition of two variables. The first is the extern variable `__vsyscall_page`. Because it is an extern variable, it is defined in another source file; indeed, we can find the definition of `__vsyscall_page` in [arch/x86/kernel/vsyscall_emu_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/vsyscall_emu_64.S). The `__vsyscall_page` symbol points to the aligned calls of the `vsyscalls`, such as `gettimeofday`, etc.:

```assembly
.globl __vsyscall_page
@@ -296,19 +296,19 @@ BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
(unsigned long)VSYSCALL_ADDR);
```

Now the `vsyscall` area is in the `fix-mapped` area. That's all about `map_vsyscall`. If you do not know anything about fix-mapped addresses, you can read [Fix-Mapped Addresses and ioremap](http://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html). We will see more about `vsyscalls` in the `vsyscalls and vdso` part.
Now the `vsyscall` area is in the `fix-mapped` area. That's all about `map_vsyscall`. If you do not know anything about fix-mapped addresses, you can read [Fix-Mapped Addresses and ioremap](https://xinqiu.gitbooks.io/linux-insides-cn/content/MM/linux-mm-2.html). We will see more about `vsyscalls` in the `vsyscalls and vdso` part.
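The `BUILD_BUG_ON` shown above compares two ways of computing the vsyscall address. The sketch below reproduces that arithmetic. It assumes the kernel definitions `__fix_to_virt(x) = FIXADDR_TOP - (x << PAGE_SHIFT)` and `VSYSCALL_ADDR = -10UL << 20`. `FIXADDR_TOP` is left as a parameter because its value depends on the kernel configuration, and the sample value used in the checks is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define VSYSCALL_ADDR ((uint64_t)-10 << 20) /* 0xffffffffff600000 */

/* Mirrors __fix_to_virt(): fix-mapped slots grow downwards from
 * fixaddr_top, one 4 KiB page per index. */
static uint64_t fix_to_virt(uint64_t fixaddr_top, uint64_t idx)
{
    return fixaddr_top - (idx << PAGE_SHIFT);
}
```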

Getting the SMP configuration
--------------------------------------------------------------------------------

You may remember how we searched for the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) configuration in the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). Now we need to get the `SMP` configuration if we found it. For this we check the `smp_found_config` variable, which we set in the `smp_scan_config` function (read about it in the previous part), and call the `get_smp_config` function:
You may remember how we searched for the [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing) configuration in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html). Now we need to get the `SMP` configuration if we found it. For this we check the `smp_found_config` variable, which we set in the `smp_scan_config` function (read about it in the previous part), and call the `get_smp_config` function:

```C
if (smp_found_config)
get_smp_config();
```

The `get_smp_config` call expands to the `x86_init.mpparse.default_get_smp_config` function, which is defined in [arch/x86/kernel/mpparse.c](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/mpparse.c). This function defines a pointer to the multiprocessor floating pointer structure `mpf_intel` (you can read about it in the previous [part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)) and does some checks:
The `get_smp_config` call expands to the `x86_init.mpparse.default_get_smp_config` function, which is defined in [arch/x86/kernel/mpparse.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/mpparse.c). This function defines a pointer to the multiprocessor floating pointer structure `mpf_intel` (you can read about it in the previous [part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)) and does some checks:

```C
struct mpf_intel *mpf = mpf_found;
@@ -334,7 +334,7 @@ That's all, and now we can back to the `start_kernel` from the `setup_arch`.
Back to the main.c
================================================================================

As I wrote above, we have finished with the `setup_arch` function and now we can go back to the `start_kernel` function from [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c). As you may remember or have seen yourself, the `start_kernel` function is as big as `setup_arch`, so the next couple of parts will be dedicated to studying it. Let's continue. After `setup_arch` we can see the call of the `mm_init_cpumask` function. This function sets the [cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) pointer to the memory descriptor `cpumask`. We can look at its implementation:
As I wrote above, we have finished with the `setup_arch` function and now we can go back to the `start_kernel` function from [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c). As you may remember or have seen yourself, the `start_kernel` function is as big as `setup_arch`, so the next couple of parts will be dedicated to studying it. Let's continue. After `setup_arch` we can see the call of the `mm_init_cpumask` function. This function sets the [cpumask](https://xinqiu.gitbooks.io/linux-insides-cn/content/Concepts/linux-cpu-2.html) pointer to the memory descriptor `cpumask`. We can look at its implementation:

```C
static inline void mm_init_cpumask(struct mm_struct *mm)
@@ -346,7 +346,7 @@ static inline void mm_init_cpumask(struct mm_struct *mm)
}
```

As you can see in [init/main.c](https://github.com/torvalds/linux/blob/master/init/main.c), we pass the memory descriptor of the init process to `mm_init_cpumask` and, depending on the `CONFIG_CPUMASK_OFFSTACK` configuration option, we clear the [TLB](http://en.wikipedia.org/wiki/Translation_lookaside_buffer) switch `cpumask`.
As you can see in [init/main.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/init/main.c), we pass the memory descriptor of the init process to `mm_init_cpumask` and, depending on the `CONFIG_CPUMASK_OFFSTACK` configuration option, we clear the [TLB](http://en.wikipedia.org/wiki/Translation_lookaside_buffer) switch `cpumask`.

In the next step we can see the call of the following function:
@@ -462,7 +462,7 @@ It is the end of the seventh part about the linux kernel initialization process.
If you have any questions or suggestions, write me a comment or ping me on [twitter](https://twitter.com/0xAX).

**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to [linux-insides](https://github.com/MintCN/linux-insides-zh).**
**Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR to [linux-insides](https://github.com/0xAX/linux-insides).**

Links
================================================================================
@@ -471,7 +471,7 @@ Links
* [x86_64](http://en.wikipedia.org/wiki/X86-64)
* [initrd](http://en.wikipedia.org/wiki/Initrd)
* [Kernel panic](http://en.wikipedia.org/wiki/Kernel_panic)
* [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/kernel-parameters.txt)
* [Documentation/kernel-parameters.txt](https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/kernel-parameters.rst)
* [ACPI](http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface)
* [Direct memory access](http://en.wikipedia.org/wiki/Direct_memory_access)
* [NUMA](http://en.wikipedia.org/wiki/Non-uniform_memory_access)
@@ -479,4 +479,4 @@ Links
* [vsyscalls](https://lwn.net/Articles/446528/)
* [SMP](http://en.wikipedia.org/wiki/Symmetric_multiprocessing)
* [jiffy](http://en.wikipedia.org/wiki/Jiffy_%28time%29)
* [Previous part](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)
* [Previous part](https://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-6.html)
@@ -37,7 +37,7 @@ Introduction
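BUG_ON((unsigned)n > 0xFF);
```

You can find this check in the parts of the Linux kernel source that set up interrupts (for example, `set_intr_gate` and `void set_system_intr_gate` in [arch/x86/include/asm/desc.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/desc.h)). The 32 vector numbers from `0` to `31` are reserved by the processor and used for architecture-defined exceptions and interrupts. You can find the table and a description of these vector numbers in the second part of the Linux kernel initialization process, [Early interrupt and exception handling](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html). The vector numbers from `32` to `255` are designated as user-defined interrupts and are not reserved by the system. These interrupts are generally assigned to external I/O devices, enabling those devices to send interrupts to the processor.
You can find this definition in the parts of the Linux kernel source that set up interrupts (for example, `set_intr_gate` and `void set_system_intr_gate` in [arch/x86/include/asm/desc.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/desc.h)). The 32 vector numbers from `0` to `31` are reserved by the processor and used for architecture-defined exceptions and interrupts. You can find the table and a description of these vector numbers in the second part of the Linux kernel initialization process, [Early interrupt and exception handling](http://xinqiu.gitbooks.io/linux-insides-cn/content/Initialization/linux-initialization-2.html). The vector numbers from `32` to `255` are designated as user-defined interrupts and are not reserved by the system. These interrupts are generally assigned to external I/O devices, enabling those devices to send interrupts to the processor.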
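The vector ranges described above can be expressed as a small check. This is a hypothetical helper, not a kernel function. Like the `BUG_ON` shown earlier, it rejects numbers above `0xFF`; it then treats `0`..`31` as processor-reserved and `32`..`255` as user-defined.

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_USER_VECTOR 32   /* vectors 0..31 are reserved by the CPU */
#define MAX_VECTOR        0xFF /* the IDT has 256 entries */

/* Hypothetical helper: true for architecture-reserved vectors. */
static bool vector_is_reserved(unsigned int n)
{
    assert(n <= MAX_VECTOR); /* same bound as BUG_ON((unsigned)n > 0xFF) */
    return n < FIRST_USER_VECTOR;
}
```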

Now let's discuss the types of interrupts. Broadly speaking, we can divide interrupts into two main types:
@@ -381,12 +381,14 @@ void load_percpu_segment(int cpu)
...
...
...
loadsegment(gs, 0);
wrmsrl(MSR_GS_BASE, (unsigned long)per_cpu(irq_stack_union.gs_base, cpu));
__loadsegment_simple(gs, 0);
wrmsrl(MSR_GS_BASE, cpu_kernelmode_gs_base(cpu));
...
load_stack_canary_segment();
}
```

As we already know, the `gs` register points to the bottom of the interrupt stack:
As we know, the `gs` register points to the bottom of the interrupt stack:

```assembly
movl $MSR_GS_BASE,%ecx
@@ -394,14 +396,14 @@ void load_percpu_segment(int cpu)
movl initial_gs+4(%rip),%edx
wrmsr

GLOBAL(initial_gs)
.quad INIT_PER_CPU_VAR(irq_stack_union)
SYM_DATA(initial_gs,
.quad INIT_PER_CPU_VAR(fixed_percpu_data))
```

Now we can see the `wrmsr` instruction, which loads the data from `edx:eax` into the [MSR](http://en.wikipedia.org/wiki/Model-specific_register) selected by `ecx`. Here the MSR is `MSR_GS_BASE`, which holds the base address of the memory segment pointed to by the `gs` register. `edx:eax` holds the value stored at `initial_gs`, which is the base address of `irq_stack_union`.
Now we can see the `wrmsr` instruction, which loads the data from `edx:eax` into the [MSR](http://en.wikipedia.org/wiki/Model-specific_register) selected by `ecx`. Here the MSR is `MSR_GS_BASE`, which holds the base address of the memory segment pointed to by the `gs` register. `edx:eax` holds the value stored at `initial_gs`, which is the base address of `fixed_percpu_data`.
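`wrmsr` takes its 64-bit operand split across two 32-bit registers: the low half goes in `eax` and the high half in `edx`. That is why the code above reads the upper four bytes from `initial_gs+4` into `edx`, since x86 is little-endian. Here is a sketch of the split and its inverse, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register pair consumed by wrmsr. */
struct edx_eax {
    uint32_t edx; /* bits 63..32 of the MSR value */
    uint32_t eax; /* bits 31..0 of the MSR value  */
};

static struct edx_eax split_msr_value(uint64_t value)
{
    struct edx_eax r = { (uint32_t)(value >> 32), (uint32_t)value };
    return r;
}

static uint64_t join_msr_value(struct edx_eax r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}
```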

We also know that `x86_64` has a feature called the `Interrupt Stack Table`, or `IST`, which provides the ability to switch to a new stack when a non-maskable interrupt, a double fault, etc. occurs. There can be up to seven `IST` entries per CPU. Some of them are:
There can be up to seven `IST` entries per-cpu. Some of them are:

* `DOUBLEFAULT_STACK`
* `NMI_STACK`
@@ -420,34 +422,36 @@ There can be up to seven `IST` entries per-cpu. Some of them are:
All interrupt gate descriptors that switch to a new stack through the `IST` are initialized by the `set_intr_gate_ist` function. For example:

```C
set_intr_gate_ist(X86_TRAP_NMI, &nmi, NMI_STACK);
...
...
...
set_intr_gate_ist(X86_TRAP_DF, &double_fault, DOUBLEFAULT_STACK);
static const __initconst struct idt_data def_idts[] = {
...
INTG(X86_TRAP_NMI, nmi),
...
INTG(X86_TRAP_DF, double_fault),
```

where `&nmi` and `&double_fault` are the entry addresses of the interrupt handlers:
where the entry points for `&nmi` and `&double_fault` are created in:

[arch/x86/entry/entry_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/entry/entry_64.S):

```assembly
idtentry double_fault do_double_fault has_error_code=1 paranoid=2 read_cr2=1
...
...
...
SYM_CODE_START(nmi)
...
...
...
SYM_CODE_END(nmi)
```
The declarations of the interrupt handlers are given in [arch/x86/include/asm/traps.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/traps.h):
```C
asmlinkage void nmi(void);
asmlinkage void double_fault(void);
```

defined in [arch/x86/kernel/entry_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/entry_64.S):

```assembly
idtentry double_fault do_double_fault has_error_code=1 paranoid=2
...
...
...
ENTRY(nmi)
...
...
...
END(nmi)
```

When an interrupt or exception occurs, the new `ss` selector is forced to `NULL` and the `rpl` field of the `ss` selector is set to the new `cpl`. The old `ss`, `rsp`, flags register, `cs` and `rip` are pushed onto the new stack. In 64-bit mode, each push to the interrupt stack frame is fixed at 8 bytes, so we get the following stack:

```
@@ -1,19 +1,17 @@
Interrupts and Interrupt Handling. Part 3.
Interrupts and Interrupt Handling. Part 3.
================================================================================

Exception Handling
Exception Handling
--------------------------------------------------------------------------------

This is the third part of the [chapter](http://0xax.gitbooks.io/linux-insides/content/interrupts/index.html) about interrupt and exception handling in the Linux kernel. In the previous [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/index.html) we stopped at the `setup_arch` function from the [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blame/master/arch/x86/kernel/setup.c) source code file.
This is the third part of the [chapter](http://0xax.gitbooks.io/linux-insides/content/interrupts/index.html) about interrupt and exception handling in the Linux kernel. In the previous [part](http://0xax.gitbooks.io/linux-insides/content/interrupts/index.html) we stopped at the `setup_arch` function in the [arch/x86/kernel/setup.c](https://github.com/torvalds/linux/blame/master/arch/x86/kernel/setup.c) source code file.

We already know that this function initializes architecture-specific stuff. In our case the `setup_arch` function does [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture-related initialization. `setup_arch` is a big function, and in the previous part we stopped at setting the exception handlers for the following two exceptions:
We already know that this function performs architecture-specific initialization. In our case, the `setup_arch` function performs initialization related to the [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture. `setup_arch` is a big function, and in the previous part we stopped at setting up the exception handlers for the following two exceptions:
* `#DB` - debug exception, transfers control from the interrupted process to the debug handler;
* `#BP` - breakpoint exception, caused by the `int 3` instruction.

* `#DB` - debug exception, transfers control from the interrupted process to the debug handler;
* `#BP` - breakpoint exception, caused by the `int 3` instruction.

These exceptions allow the `x86_64` architecture to have early exception processing for the purpose of debugging via [kgdb](https://en.wikipedia.org/wiki/KGDB).
|
||||
|
||||
As you can remember we set these exceptions handlers in the `early_trap_init` function:
|
||||
这些异常允许x86_64体系结构具有早期异常处理功能,以便于通过[kgdb](https://en.wikipedia.org/wiki/KGDB) 进行调试
|
||||
正如您记得的,我们在`early_trap_init`函数中设置了这些异常处理程序:
|
||||
|
||||
```C
void __init early_trap_init(void)
@@ -24,19 +22,18 @@ void __init early_trap_init(void)
}
```

from [arch/x86/kernel/traps.c](https://github.com/torvalds/linux/tree/master/arch/x86/kernel/traps.c). We already saw the implementation of the `set_intr_gate_ist` and `set_system_intr_gate_ist` functions in the previous part, and now we will look at the implementation of these two exception handlers.

Debug and Breakpoint exceptions
--------------------------------------------------------------------------------

OK, we set up handlers for the `#DB` and `#BP` exceptions in the `early_trap_init` function, and now it is time to consider their implementation. But before we do this, let's first look at the details of these exceptions.

The first exception, `#DB` or `debug`, occurs when a debug event occurs, for example an attempt to change the contents of a [debug register](http://en.wikipedia.org/wiki/X86_debug_register). Debug registers are special registers that have been present in `x86` processors since the [Intel 80386](http://en.wikipedia.org/wiki/Intel_80386) processor, and as you can guess from their name, their main purpose is debugging.

These registers allow setting breakpoints on code and on data reads or writes in order to trace them. Debug registers may be accessed only in privileged mode; an attempt to read or write them at any other privilege level causes a [general protection fault](https://en.wikipedia.org/wiki/General_protection_fault) exception. That's why we used `set_intr_gate_ist` for the `#DB` exception and not `set_system_intr_gate_ist`, which creates a gate that can also be invoked from userspace.

The vector number of the `#DB` exception is `1` (we pass it as `X86_TRAP_DB`) and, as we can read in the specification, this exception has no error code:
```
+-----------------------------------------------------+
|Vector|Mnemonic|Description |Type |Error Code|
@@ -45,8 +42,7 @@ The verctor number of the `#DB` exceptions is `1` (we pass it as `X86_TRAP_DB`)
+-----------------------------------------------------+
```
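
The table says that `#DB` pushes no error code. As a sketch of where that distinction matters, the helper below answers whether the CPU pushes an error code for a given vector in the architectural range 0-31. The vector list follows the exception table in the Intel SDM Volume 3A; the function name `pushes_error_code` is hypothetical and is not a kernel API.

```c
#include <stdbool.h>

/* Hypothetical helper: does the CPU push an error code for this exception
 * vector? Only architectural vectors 0-31 are considered. */
static bool pushes_error_code(unsigned int vector)
{
	switch (vector) {
	case 8:  /* #DF double fault (the error code is always 0) */
	case 10: /* #TS invalid TSS */
	case 11: /* #NP segment not present */
	case 12: /* #SS stack-segment fault */
	case 13: /* #GP general protection */
	case 14: /* #PF page fault */
	case 17: /* #AC alignment check */
	case 21: /* #CP control protection */
		return true;
	default: /* includes 1 (#DB) and 3 (#BP) */
		return false;
	}
}
```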

The second exception, `#BP` or `breakpoint`, occurs when the processor executes the [int 3](http://en.wikipedia.org/wiki/INT_%28x86_instruction%29#INT_3) instruction. Unlike the `#DB` exception, the `#BP` exception may occur in userspace. We can add it anywhere in our code; for example, let's look at a simple program:
```C
// breakpoint.c
#include <stdio.h>
@@ -61,7 +57,7 @@ int main() {
}
```

If we compile and run this program, we will see the following output:

```
$ gcc breakpoint.c -o breakpoint
@@ -69,7 +65,7 @@ i equal to: 0
Trace/breakpoint trap
```

But if we run it under gdb, we will see our breakpoint and can continue the execution of our program:

```
$ gdb breakpoint
@@ -102,17 +98,16 @@ Program received signal SIGTRAP, Trace/breakpoint trap.
...
```
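
Both runs show that userspace sees `#BP` as a `SIGTRAP` signal: without a debugger the process is killed ("Trace/breakpoint trap"), while gdb intercepts it. As a minimal sketch, assuming Linux, POSIX `sigaction` and a GCC/Clang-compatible compiler, a program can also catch the signal itself and keep running. The function names are hypothetical; on non-x86 builds the sketch falls back to `raise(SIGTRAP)`.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of SIGTRAP signals received by this process. */
static volatile sig_atomic_t trap_count = 0;

static void on_sigtrap(int sig)
{
	(void)sig;
	trap_count++;
}

/* Installs a SIGTRAP handler, executes one breakpoint and returns how many
 * SIGTRAPs were observed, or -1 if the handler could not be installed. */
static int run_breakpoint_once(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigtrap;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGTRAP, &sa, NULL) != 0)
		return -1;

	trap_count = 0;
#if defined(__x86_64__) || defined(__i386__)
	/* int3 raises #BP; the kernel turns it into SIGTRAP. The saved
	 * instruction pointer already points past the one-byte int3, so the
	 * handler returns to the next instruction. */
	__asm__ volatile("int3");
#else
	raise(SIGTRAP);   /* fallback for non-x86 builds */
#endif
	return trap_count;
}
```

Calling `run_breakpoint_once()` outside a debugger should return `1` instead of terminating the process.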

Now we know a little about these two exceptions and can move on to their handlers.

Preparation before an exception handler
--------------------------------------------------------------------------------

As you may have noticed before, the `set_intr_gate_ist` and `set_system_intr_gate_ist` functions take the address of an exception handler as their second parameter. In our case the two exception handlers are:

* `debug`;
* `int3`.

You will not find these functions in the C code. The kernel's `*.c/*.h` files contain only their declarations, which are located in the [arch/x86/include/asm/traps.h](https://github.com/torvalds/linux/tree/master/arch/x86/include/asm/traps.h) kernel header file:

```C
asmlinkage void debug(void);
@@ -124,10 +119,10 @@ and
asmlinkage void int3(void);
```

You may notice the `asmlinkage` directive in the declarations of these functions. It is a macro defined by the kernel rather than a [gcc](http://en.wikipedia.org/wiki/GNU_Compiler_Collection) keyword, and it describes the calling convention of functions that are called from assembly, where the convention has to be stated explicitly. On 32-bit x86 it expands to a `gcc` attribute (`regparm(0)`) that makes the function take all of its parameters from the stack; on `x86_64` it does not change the standard ABI, which passes the first arguments in registers.

So, both handlers are defined in the [arch/x86/entry/entry_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/entry_64.S) assembly source code file with the `idtentry` macro:
```assembly
idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
```
@@ -138,20 +133,19 @@ and
idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
```

Each exception handler consists of two parts. The first part is generic and the same for all exception handlers: it saves the [general purpose registers](https://en.wikipedia.org/wiki/Processor_register) on the stack, switches to the kernel stack if the exception came from userspace, and transfers control to the second part of the handler. The second part does work that is specific to the particular exception. For example, the page fault handler should find the virtual page for the given address, the invalid opcode handler should send the `SIGILL` [signal](https://en.wikipedia.org/wiki/Unix_signal), and so on.

As we just saw, an exception handler starts from the `idtentry` macro in the [arch/x86/entry/entry_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/entry_64.S) assembly source code file, so let's look at the implementation of this macro. As we can see, the `idtentry` macro takes five arguments:

* `sym` - defines a global symbol with `.globl name`, which will be the entry point of the exception handler;
* `do_sym` - the name of the symbol that represents the secondary entry of the exception handler;
* `has_error_code` - whether the exception provides an error code.

The last two parameters are optional:

* `paranoid` - shows how we need to check the current mode (we will see a detailed explanation later);
* `shift_ist` - shows whether the exception runs on a stack from the `Interrupt Stack Table`.

The definition of the `idtentry` macro looks like this:

```assembly
.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
@@ -165,6 +159,7 @@ END(\sym)

Before we consider the internals of the `idtentry` macro, we should know the state of the stack when an exception occurs. As we can read in the [Intel® 64 and IA-32 Architectures Software Developer’s Manual 3A](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html), the state of the stack when an exception occurs is the following:
```
+------------+
+40 | %SS |
@@ -176,26 +171,24 @@ Before we will consider internals of the `idtentry` macro, we should to know sta
+------------+
```

Now we can start to consider the implementation of the `idtentry` macro. Both the `#DB` and `#BP` exception handlers are defined as:
```assembly
idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
```

If we look at these definitions, we can see that the assembler will generate two routines named `debug` and `int3`, and after some preparation both of these exception handlers will call the secondary handlers `do_debug` and `do_int3`. The third parameter defines whether an error code exists, and as we can see, neither of our exceptions has one. As the diagram above shows, the processor pushes an error code on the stack only if the exception provides one. In our case the `debug` and `int3` exceptions do not have error codes. This may cause some difficulties, because the stack would look different for exceptions that provide an error code and for those that do not. That's why the implementation of the `idtentry` macro starts by pushing a fake error code on the stack if the exception does not provide one:
```assembly
.ifeq \has_error_code
pushq $-1
.endif
```
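
To see why the fake error code helps, here is a small C simulation of a downward-growing stack of 8-byte slots. It is a model, not kernel code, and the selector and register values it pushes are arbitrary examples. Whether or not the CPU pushed an error code, after this step the error-code slot is at `+0`, `%RIP` at `+8` and `%SS` at `+40`, matching the diagram above.

```c
#include <stdint.h>

/* Hypothetical simulation of a stack of 8-byte slots growing downwards. */
struct sim_stack {
	uint64_t mem[16];
	int top;            /* index of the last pushed slot; 16 means empty */
};

static void push(struct sim_stack *s, uint64_t v)
{
	s->mem[--s->top] = v;
}

/* What the CPU pushes on exception entry in 64-bit mode, followed by the
 * fake error code that idtentry pushes when the CPU did not push one. */
static void exception_entry(struct sim_stack *s, int cpu_pushes_error_code,
			    uint64_t error_code)
{
	push(s, 0x2b);          /* SS (example user data selector) */
	push(s, 0x7ffc0000);    /* RSP (example value) */
	push(s, 0x246);         /* RFLAGS (example value) */
	push(s, 0x33);          /* CS (example user code selector) */
	push(s, 0x401000);      /* RIP (example value) */
	if (cpu_pushes_error_code)
		push(s, error_code);          /* pushed by the CPU */
	else
		push(s, (uint64_t)-1);        /* pushq $-1 in idtentry */
}

/* Reads the slot at byte offset `off` above the current stack pointer. */
static uint64_t at(const struct sim_stack *s, int off)
{
	return s->mem[s->top + off / 8];
}
```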

But it is not only a fake error code. The `-1` also represents an invalid system call number, so that the system call restart logic will not be triggered.
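
In `struct pt_regs` this slot is the `orig_ax` field, and on x86 the signal-delivery code only considers restarting an interrupted system call when `orig_ax` holds a valid (non-negative) system call number. The sketch below models just that decision: the function name is hypothetical, the `ERESTART*` values are copied from the kernel's `include/linux/errno.h`, and the real logic also depends on the signal's `SA_RESTART` flag, which is omitted here.

```c
#include <stdbool.h>

/* Kernel-internal restart codes (values from include/linux/errno.h). */
#define ERESTARTSYS           512
#define ERESTARTNOINTR        513
#define ERESTARTNOHAND        514
#define ERESTART_RESTARTBLOCK 516

/* Hypothetical helper: could the interrupted system call be restarted?
 * orig_ax holds the syscall number, or -1 stored by the exception entry
 * code; ax holds the value the syscall was about to return. */
static bool may_restart_syscall(long orig_ax, long ax)
{
	if (orig_ax < 0)        /* -1: not a system call, nothing to restart */
		return false;

	switch (ax) {
	case -ERESTARTSYS:
	case -ERESTARTNOINTR:
	case -ERESTARTNOHAND:
	case -ERESTART_RESTARTBLOCK:
		return true;
	default:
		return false;
	}
}
```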

The last two parameters of the `idtentry` macro, `shift_ist` and `paranoid`, tell us whether an exception handler runs on a stack from the `Interrupt Stack Table` or not. You may already know that each kernel thread in the system has its own stack. In addition to these stacks, there are some specialized stacks associated with each processor in the system. One of them is the exception stack. The [x86_64](https://en.wikipedia.org/wiki/X86-64) architecture provides a special feature called the `Interrupt Stack Table`. This feature allows switching to a new stack for designated events, such as critical exceptions like `double fault`. So the `shift_ist` parameter tells the macro which `IST` stack, if any, the exception handler runs on.
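
A simplified model of how the CPU picks an `IST` stack may help here. Each IDT gate has a 3-bit `IST` field: `0` means no `IST` switch, and `1` to `7` select one of the seven `IST` pointers stored in the 64-bit TSS. When the field is non-zero, the CPU switches to that stack unconditionally, even if the exception happened in kernel mode. The structure and function below are hypothetical and leave out the other TSS fields.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the 64-bit TSS: only RSP0 and the seven
 * IST pointers are modelled (the real TSS has more fields). */
struct tss_model {
	uint64_t rsp0;      /* kernel stack loaded on a switch from user mode */
	uint64_t ist[7];    /* IST1..IST7 stored at indices 0..6 */
};

/* gate_ist is the 3-bit IST field of an IDT gate. Returns the stack pointer
 * the CPU would load from the IST, or 0 when the gate requests no IST
 * switch (the normal stack rules then apply; they are not modelled here). */
static uint64_t ist_stack_for_gate(const struct tss_model *tss,
				   unsigned int gate_ist)
{
	if (gate_ist == 0 || gate_ist > 7)
		return 0;
	return tss->ist[gate_ist - 1];
}
```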

The other parameter, `paranoid`, defines the method used to find out whether we entered the exception handler from userspace or not. The easiest way to determine this is via the `CPL` (`Current Privilege Level`) in the `CS` segment register. If it is equal to `3`, we came from userspace; if it is zero, we came from kernel space:

```
testl $3,CS(%rsp)
@@ -203,18 +196,18 @@ jnz userspace
...
...
...
// we are from the kernel space
```
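
The fast check boils down to testing the two low bits of the saved `CS` selector, which hold the privilege level of the interrupted code. A C equivalent, assuming the usual x86_64 Linux selectors `0x10` (`__KERNEL_CS`) and `0x33` (`__USER_CS`); the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* The low two bits of the CS value saved on exception entry are the CPL of
 * the interrupted code: 0 for the kernel, 3 for userspace. */
static bool came_from_userspace(uint64_t saved_cs)
{
	return (saved_cs & 3) != 0;   /* same test as: testl $3, CS(%rsp) */
}
```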

Unfortunately, this method does not give a 100% guarantee. As described in the kernel documentation:

> if we are in an NMI/MCE/DEBUG/whatever super-atomic entry context,
> which might have triggered right after a normal entry wrote CS to the
> stack but before we executed SWAPGS, then the only safe way to check
> for GS is the slower method: the RDMSR.

In other words, an `NMI`, for example, could arrive after a normal entry path has already written `CS` to the stack but before it has executed the [swapgs](http://www.felixcloutier.com/x86/SWAPGS.html) instruction, so the saved `CS` alone cannot tell us which `gs` base is active. In this case we should check the value of the `MSR_GS_BASE` [model specific register](https://en.wikipedia.org/wiki/Model-specific_register), which stores a pointer to the start of the per-cpu area. So, to check whether we came from userspace or not, we read the `MSR_GS_BASE` model specific register: if its value is negative we came from kernel space, otherwise we came from userspace:
```assembly
movl $MSR_GS_BASE,%ecx
rdmsr
@@ -222,15 +215,14 @@ testl %edx,%edx
js 1f
```

In the first two lines of code we read the value of the `MSR_GS_BASE` model specific register into the `edx:eax` pair. We can't set a negative value to `gs` from userspace. On the other hand, we know that the direct mapping of physical memory starts at the `0xffff880000000000` virtual address. Therefore `MSR_GS_BASE` will contain an address between `0xffff880000000000` and `0xffffc7ffffffffff`. After the `rdmsr` instruction is executed, the smallest possible value in the `%edx` register will be `0xffff8800`, which is `-30720` when interpreted as a signed 4-byte integer. That's why the kernel-space `gs`, which points to the start of the `per-cpu` area, contains a negative value.

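
The same test written in C, as a sketch: the helper names are hypothetical, and the sample kernel address is the start of the direct-mapping range quoted above. The signed conversion is written out explicitly so that it does not rely on implementation-defined casts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors "rdmsr; testl %edx,%edx; js 1f": rdmsr leaves the upper 32 bits
 * of MSR_GS_BASE in %edx, and js jumps when bit 31 of %edx is set. */
static bool gs_base_is_kernel(uint64_t msr_gs_base)
{
	uint32_t edx = (uint32_t)(msr_gs_base >> 32);
	return (edx & 0x80000000u) != 0;   /* sign bit of %edx */
}

/* The upper half interpreted as a signed 32-bit integer. */
static int32_t gs_base_high_signed(uint64_t msr_gs_base)
{
	uint32_t edx = (uint32_t)(msr_gs_base >> 32);

	if (edx < 0x80000000u)
		return (int32_t)edx;
	return -(int32_t)(0xFFFFFFFFu - edx) - 1;   /* two's complement */
}
```

For `0xffff880000000000` the upper half is `0xffff8800`, which `gs_base_high_signed` turns into `-30720`, while any canonical userspace address leaves bit 31 of `%edx` clear.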

After we have pushed the fake error code on the stack, we should allocate space for the general purpose registers with:

```assembly
ALLOC_PT_GPREGS_ON_STACK
```

macro, which is defined in the [arch/x86/entry/calling.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/calling.h) header file. This macro just allocates 15*8 = 120 bytes of space on the stack to preserve the general purpose registers:

```assembly
.macro ALLOC_PT_GPREGS_ON_STACK addskip=0
@@ -238,7 +230,8 @@ macro which is defined in the [arch/x86/entry/calling.h](https://github.com/torv
.endm
```

So the stack will look like this after the execution of `ALLOC_PT_GPREGS_ON_STACK`:

```
+------------+
@@ -267,7 +260,9 @@ So the stack will look like this after execution of the `ALLOC_PT_GPREGS_ON_STAC
+------------+
```
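
A C view of the resulting layout, modelled on the x86_64 `struct pt_regs` from `arch/x86/include/asm/ptrace.h`. Fixed-width types are used so the offsets do not depend on the host, and the struct name is hypothetical; treat it as an illustration rather than the kernel definition. It shows that the 15 register slots reserved by `ALLOC_PT_GPREGS_ON_STACK` end exactly where the error-code slot (`orig_ax`) begins, that `%r15` occupies the lowest slot, and that `%SS` sits 40 bytes above the error code.

```c
#include <stddef.h>
#include <stdint.h>

struct pt_regs_model {
	/* 15 general purpose register slots: the 15 * 8 = 120 bytes that
	 * ALLOC_PT_GPREGS_ON_STACK reserves below the error code. */
	uint64_t r15, r14, r13, r12, bp, bx;
	uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
	/* Error code pushed by the CPU, or the fake -1 pushed by idtentry. */
	uint64_t orig_ax;
	/* Frame pushed by the CPU on exception entry. */
	uint64_t ip, cs, flags, sp, ss;
};
```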

After we have allocated space for the general purpose registers, we do some checks to find out whether the exception came from userspace or not and, depending on that, either move to the interrupted process's kernel stack or stay on the exception stack:

```assembly
.if \paranoid
@@ -281,28 +276,31 @@ After we allocated space for general purpose registers, we do some checks to und
.endif
```

Let's consider all three of these cases in turn.

An exception occurred in userspace
--------------------------------------------------------------------------------

First, let's consider the case when an exception has `paranoid=1`, like our `debug` and `int3` exceptions. In this case we check the selector from the `CS` segment register and jump to the `1f` label if we came from userspace; otherwise `paranoid_entry` is called.

Let's consider the first case, when we came from userspace to an exception handler. As described above, we jump to the `1` label. The code at the `1` label starts with the call of the

```assembly
call error_entry
```

routine, which saves all general purpose registers in the previously allocated area on the stack:
```assembly
SAVE_C_REGS 8
SAVE_EXTRA_REGS 8
```

Both of these macros are defined in the [arch/x86/entry/calling.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/calling.h) header file and just move the values of the general purpose registers to certain places on the stack. The offset argument `8` skips the return address that `call error_entry` has just pushed on top of the allocated area. For example:
```assembly
.macro SAVE_EXTRA_REGS offset=0
movq %r15, 0*8+\offset(%rsp)
@@ -314,7 +312,7 @@ These both macros are defined in the [arch/x86/entry/calling.h](https://github.
.endm
```

After the execution of `SAVE_C_REGS` and `SAVE_EXTRA_REGS`, the stack will look like this:

```
+------------+
@@ -343,23 +341,21 @@ After execution of `SAVE_C_REGS` and `SAVE_EXTRA_REGS` the stack will look:
+------------+
```

After the kernel has saved the general purpose registers on the stack, we should check once more whether we came from userspace:
```assembly
testb $3, CS+8(%rsp)
jz .Lerror_kernelspace
```

because, as the kernel documentation describes, some faults in kernel space need special handling, for example when a truncated `%RIP` was reported. In the userspace case (and in those special kernel-space cases) the [SWAPGS](http://www.felixcloutier.com/x86/SWAPGS.html) instruction is executed, and the values of `MSR_KERNEL_GS_BASE` and `MSR_GS_BASE` are swapped. From this moment the `%gs` register points to the base address of the kernel structures. So executing the `SWAPGS` instruction is the main point of the `error_entry` routine.

Now we can go back to the `idtentry` macro. We may see the following assembler code after the call of `error_entry`:
```assembly
movq %rsp, %rdi
call sync_regs
```

Here we put the base address of the stack pointer into the `%rdi` register, which will be the first argument (according to the [x86_64 ABI](https://www.uclibc.org/docs/psABI-x86_64.pdf)) of the `sync_regs` function, and call this function, which is defined in the [arch/x86/kernel/traps.c](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/kernel/traps.c) source code file:

```C
asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
@@ -370,29 +366,31 @@ asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs)
}
```

This function takes the result of the `task_pt_regs` macro, which is defined in the [arch/x86/include/asm/processor.h](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/include/asm/processor.h) header file, copies the saved registers there and returns this pointer. The `task_pt_regs` macro expands to the address just below `thread.sp0`, which points to the top of the normal kernel stack:
```C
#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1)
```
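
The pointer arithmetic in `task_pt_regs` places the saved registers in the topmost `sizeof(struct pt_regs)` bytes of the task's kernel stack: casting first and then subtracting `1` moves down by one whole structure, not by one byte. A sketch with a hypothetical stand-in structure of the same size (21 eight-byte slots on x86_64) and a hypothetical helper:

```c
#include <stdint.h>

/* Stand-in with the size of the x86_64 struct pt_regs: 21 slots of 8 bytes. */
struct regs_frame {
	uint64_t slot[21];
};

/* Hypothetical equivalent of task_pt_regs(): sp0 is the top (highest
 * address) of the task's kernel stack. The cast binds before the
 * subtraction, so "- 1" steps back by sizeof(struct regs_frame) bytes. */
static struct regs_frame *task_pt_regs_model(uintptr_t sp0)
{
	return (struct regs_frame *)sp0 - 1;
}
```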

As we came from userspace, the exception handler will run in the context of the real process. After we get the new stack pointer from `sync_regs` (returned in `%rax`), we switch stacks:

```assembly
movq %rax, %rsp
```

The last two steps before the exception handler calls the secondary handler are:

1. Passing a pointer to the `pt_regs` structure, which contains the preserved general purpose registers, in the `%rdi` register:

```assembly
movq %rsp, %rdi
```

as it will be passed as the first parameter of the secondary exception handler.

2. Passing the error code in the `%rsi` register, as it will be the second argument of the exception handler, and setting it to `-1` on the stack for the same purpose as before: to prevent the restart of a system call:

```
.if \has_error_code
@@ -403,10 +401,10 @@ as it will be passed as first parameter of secondary exception handler.
.endif
```

Additionally, you may see above that we zero the `%esi` register in case the exception does not provide an error code.

In the end we just call the secondary exception handler:
```assembly
call \do_sym
```
@@ -417,20 +415,22 @@ which:
dotraplinkage void do_debug(struct pt_regs *regs, long error_code);
```

will be for the `debug` exception and:

```C
dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code);
```

will be for the `int 3` exception. In this part we will not see the implementations of the secondary handlers, because they are very specific, but we will see some of them in one of the next parts.

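
The calling convention of the secondary handlers can be modelled in C as below. This is a hypothetical sketch, not kernel code: it mirrors the two steps described above (pass `regs` and the error code as the first two arguments, and replace the stored error code with `-1`) and uses a recording handler as a stand-in for `do_debug` or `do_int3`.

```c
#include <stdint.h>

struct pt_regs;   /* left incomplete: only pointers are passed around */

/* Signature shared by secondary handlers such as do_debug and do_int3. */
typedef void (*secondary_handler_t)(struct pt_regs *regs, long error_code);

/* Hypothetical model of the tail of idtentry. orig_ax_slot points at the
 * error-code slot on the stack. */
static void call_secondary(secondary_handler_t handler, struct pt_regs *regs,
			   int64_t *orig_ax_slot, int has_error_code)
{
	long error_code = 0;                      /* like xorl %esi, %esi */

	if (has_error_code) {
		error_code = (long)*orig_ax_slot;     /* error code -> %rsi */
		*orig_ax_slot = -1;                   /* no syscall restart */
	}
	handler(regs, error_code);                /* %rdi = regs, %rsi = code */
}

/* Example handler standing in for do_debug/do_int3: records its input. */
static long last_error_code;

static void record_handler(struct pt_regs *regs, long error_code)
{
	(void)regs;
	last_error_code = error_code;
}
```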

We have just considered the first case, when an exception occurred in userspace. Let's consider the last two.

An exception with paranoid > 0 occurred in kernelspace
--------------------------------------------------------------------------------

In this case the exception occurred in kernelspace, and the `idtentry` macro is defined with `paranoid=1` for this exception. This value of `paranoid` means that we should use the slower method that we saw at the beginning of this part to check whether we really came from kernelspace or not. The `paranoid_entry` routine allows us to find this out:

```assembly
ENTRY(paranoid_entry)
@@ -448,7 +448,7 @@ ENTRY(paranoid_entry)
END(paranoid_entry)
```

As you may see, this function does the same thing we covered before: we use the second (slow) method to get information about the previous state of the interrupted task. After we have checked this and executed `SWAPGS` in case we came from userspace, we do the same as before: we put a pointer to the structure that holds the general purpose registers into `%rdi` (the first parameter of the secondary handler) and, if the exception provides one, the error code into `%rsi` (the second parameter of the secondary handler):

```assembly
movq %rsp, %rdi
@@ -461,47 +461,43 @@ movq %rsp, %rdi
.endif
```

The last step before the secondary handler of the exception is called is adjusting the `IST` stack pointer for the new stack frame:
```assembly
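.if \shift_ist != -1
subq $EXCEPTION_STKSZ, CPU_TSS_IST(\shift_ist)
.endif
```

You may remember that we passed `shift_ist` as an argument of the `idtentry` macro. Here we check its value and, if it is not equal to `-1`, we take the `Interrupt Stack Table` entry with index `shift_ist` and move it down by `EXCEPTION_STKSZ` bytes. This way, if the same kind of exception happens again while the handler is running (for example, a breakpoint hit inside the debug handler), the CPU switches to a fresh area of the stack instead of overwriting the current frame.
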
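
The effect of the `subq` on the `IST` entry can be sketched as follows. The model assumes that `EXCEPTION_STKSZ` is 4096 bytes and that the entry code adds the same amount back after `\do_sym` returns; that restoring step is not visible in the excerpt above, so treat it as an assumption. The variable and function names are hypothetical.

```c
#include <stdint.h>

#define EXCEPTION_STKSZ 4096u   /* assumption: one 4 KiB page */

/* Hypothetical stand-in for the TSS IST entry selected by shift_ist. */
static uint64_t ist_slot;

/* Models the subq on CPU_TSS_IST(shift_ist) before the secondary handler
 * and the assumed matching addition afterwards. Returns the stack pointer
 * the CPU would load if the same exception nested inside the handler. */
static uint64_t nested_ist_rsp_during_handler(void)
{
	uint64_t nested_rsp;

	ist_slot -= EXCEPTION_STKSZ;   /* shift the IST entry down */
	nested_rsp = ist_slot;         /* a nested #DB/#BP would start here */
	ist_slot += EXCEPTION_STKSZ;   /* restore once the handler returned */
	return nested_rsp;
}
```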

At the end of this second path we just call the secondary exception handler, as we did before:
```assembly
call \do_sym
```

The last case is similar to the previous two, but the exception has `paranoid=0`, so we can use the fast method to determine where we came from.

Exit from an exception handler
--------------------------------------------------------------------------------

After the secondary handler finishes its work, we return to the `idtentry` macro, and the next step is the jump to the `error_exit`:
```assembly
jmp error_exit
```

routine. The `error_exit` function is defined in the same [arch/x86/entry/entry_64.S](https://github.com/torvalds/linux/blob/16f73eb02d7e1765ccab3d2018e0bd98eb93d973/arch/x86/entry/entry_64.S) assembly source code file. Its main goal is to find out where we came from (userspace or kernelspace) and to execute `SWAPGS` depending on this, then to restore the registers to their previous state and execute the `iret` instruction to transfer control back to the interrupted task.

That's all.

Conclusion
--------------------------------------------------------------------------------

This is the end of the third part about interrupts and interrupt handling in the Linux kernel. In the previous part we saw the initialization of the [Interrupt descriptor table](https://en.wikipedia.org/wiki/Interrupt_descriptor_table) with the `#DB` and `#BP` gates, and in this part we started to dive into the preparation that happens before control is transferred to an exception handler, as well as the implementation of some interrupt handlers. In the next part we will continue with this theme, going further through the `setup_arch` function and trying to understand the code related to interrupt handling.

If you have any questions or suggestions write me a comment or ping me at [twitter](https://twitter.com/0xAX).

**Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send me PR to [linux-insides](https://github.com/0xAX/linux-insides).**

Links
--------------------------------------------------------------------------------

* [Debug registers](http://en.wikipedia.org/wiki/X86_debug_register)

@@ -207,7 +207,7 @@ Tasklets
* `TASKLET_SOFTIRQ`;
* `HI_SOFTIRQ`.

In short, `tasklets` are softirqs that are allocated and initialized at runtime. Unlike softirqs, `tasklets` of the same type cannot run on multiple processors at the same time. We have already learned a bit about softirqs; of course, the text above does not cover every detail, but now we can go deeper into softirqs step by step by reading the code directly. Let's return to the implementation of the `softirq_init` function that we discussed at the beginning; this function is defined in [kernel/softirq.c](https://github.com/torvalds/linux/blob/master/kernel/softirq.c) as follows:

```C
void __init softirq_init(void)
@@ -37,13 +37,13 @@
|├ [2.4](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-4.md)|[@bjwrkj](https://github.com/bjwrkj)|Completed|
|├ [2.5](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-5.md)|[@NeoCui](https://github.com/NeoCui)|Updated to [cf32dc6c](https://github.com/0xAX/linux-insides/commit/cf32dc6c81abce567af330c480afc3d58678443d)|
|├ [2.6](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-6.md)|[@kele1997](https://github.com/kele1997)|Updated to [e896e56](https://github.com/0xAX/linux-insides/commit/e896e56c867876397ef78da58d5e2a31b2e690b6)|
|├ [2.7](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-7.md)||Not started|
|├ [2.7](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-7.md)|[@kele1997](https://github.com/kele1997)|In progress|
|├ [2.8](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-8.md)||Not started|
|├ [2.9](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-9.md)||Not started|
|├ [2.9](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-9.md)|[@hindlee](https://github.com/hindlee)|In progress|
|└ [2.10](https://github.com/MintCN/linux-insides-zh/blob/master/Initialization/linux-initialization-10.md)||Not started|
| 3. [Interrupts](https://github.com/MintCN/linux-insides-zh/tree/master/Interrupts)||In progress|
|├ [3.0](https://github.com/MintCN/linux-insides-zh/blob/master/Interrupts/README.md)|[@littleneko](https://github.com/littleneko)|Updated to [57279321](https://github.com/0xAX/linux-insides/commit/5727932167a2ff6a1e647081c85d081d4ed8b508)|
|├ [3.1](https://github.com/MintCN/linux-insides-zh/blob/master/Interrupts/linux-interrupts-1.md)||Not started|
|├ [3.1](https://github.com/MintCN/linux-insides-zh/blob/master/Interrupts/linux-interrupts-1.md)|[@Albertchamberlain](https://github.com/Albertchamberlain)|Updated to [e58c06bf](https://github.com/0xAX/linux-insides/commit/e58c06bfca60d4af25d92562de1ee9959992fc68)|
|├ [3.2](https://github.com/MintCN/linux-insides-zh/blob/master/Interrupts/linux-interrupts-2.md)|[@narcijie](https://github.com/narcijie)|Updated to [4d635117](https://github.com/0xAX/linux-insides/commit/4d6351172486e5c046a7d3db2286fc0d0d0d7789)|
|├ [3.3](https://github.com/MintCN/linux-insides-zh/blob/master/Interrupts/linux-interrupts-3.md)||Not started|
|├ [3.4](https://github.com/MintCN/linux-insides-zh/blob/master/Interrupts/linux-interrupts-4.md)||Not started|
@@ -68,7 +68,7 @@
|├ [5.3](https://github.com/MintCN/linux-insides-zh/blob/master/Timers/linux-timers-3.md)||Not started|
|├ [5.4](https://github.com/MintCN/linux-insides-zh/blob/master/Timers/linux-timers-4.md)||Not started|
|├ [5.5](https://github.com/MintCN/linux-insides-zh/blob/master/Timers/linux-timers-5.md)||Not started|
|├ [5.6](https://github.com/MintCN/linux-insides-zh/blob/master/Timers/linux-timers-6.md)||Not started|
|├ [5.6](https://github.com/MintCN/linux-insides-zh/blob/master/Timers/linux-timers-6.md)||In progress|
|└ [5.7](https://github.com/MintCN/linux-insides-zh/blob/master/Timers/linux-timers-7.md)||Not started|
| 6. [Synchronization primitives](https://github.com/MintCN/linux-insides-zh/tree/master/SyncPrim)||In progress|
|├ [6.0](https://github.com/MintCN/linux-insides-zh/blob/master/SyncPrim/README.md)|[@mudongliang](https://github.com/mudongliang)|Updated to [6f85b63e](https://github.com/0xAX/linux-insides/commit/6f85b63e347b636e08e965e9dc22c177e972afe2)|