mirror of
https://github.com/MintCN/linux-insides-zh.git
synced 2026-04-24 18:50:42 +08:00
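Translated the README of the Theory chapter; the other two files have only been renamed and will be translated as soon as possible. Also updated the names of the corresponding chapters in the Summary.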
@@ -37,9 +37,9 @@
|
||||
* [Data Structures in the Linux Kernel](DataStructures/README.md)
|
||||
* [Doubly linked list](DataStructures/dlist.md)
|
||||
* [Radix tree](DataStructures/radix-tree.md)
|
||||
* [Theory](Theory/README.md)
|
||||
* [Paging](Theory/Paging.md)
|
||||
* [Elf64](Theory/ELF.md)
|
||||
* [Theory](Theory/README.md)
|
||||
* [Paging](Theory/Paging.md)
|
||||
* [Elf64 format](Theory/ELF.md)
|
||||
* [CPUID]()
|
||||
* [MSR]()
|
||||
* [Initial ram disk]()
|
||||
|
||||
218
Theory/ELF.md
Normal file
@@ -0,0 +1,218 @@
|
||||
ELF file format
|
||||
================================================================================
|
||||
|
||||
ELF (Executable and Linkable Format) is a standard file format for executable files, object code, shared libraries and core dumps. Linux and many UNIX-like operating systems use this format. Let's look at the structure of the ELF-64 Object File Format and at some related definitions in the Linux kernel source code.
|
||||
|
||||
An ELF object file consists of the following parts:
|
||||
|
||||
* ELF header - describes the main characteristics of the object file: type, CPU architecture, the virtual address of the entry point, the size and offset of the remaining parts, etc...;
|
||||
* Program header table - lists the available segments and their attributes. Loaders need the program header table to place sections of the file into virtual memory segments;
|
||||
* Section header table - contains the description of the sections.
|
||||
|
||||
Now let's take a closer look at these components.
|
||||
|
||||
**ELF header**
|
||||
|
||||
The ELF header is located at the beginning of the object file. Its main purpose is to locate all other parts of the object file. The ELF header contains the following fields:
|
||||
|
||||
* ELF identification - array of bytes which helps identify the file as an ELF object file and also provides information about general object file characteristics;
|
||||
* Object file type - identifies the object file type. This field indicates whether the ELF file is a relocatable object file, an executable file, etc.;
|
||||
* Target architecture;
|
||||
* Version of the object file format;
|
||||
* Virtual address of the program entry point;
|
||||
* File offset of the program header table;
|
||||
* File offset of the section header table;
|
||||
* Size of an ELF header;
|
||||
* Size of a program header table entry;
|
||||
* and other fields...
|
||||
|
||||
You can find the `elf64_hdr` structure, which represents the ELF64 header, in the Linux kernel source code:
|
||||
|
||||
```C
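typedef struct elf64_hdr {
    unsigned char e_ident[EI_NIDENT]; /* ELF identification */
    Elf64_Half e_type;                /* Object file type */
    Elf64_Half e_machine;             /* Target architecture */
    Elf64_Word e_version;             /* Object file format version */
    Elf64_Addr e_entry;               /* Entry point virtual address */
    Elf64_Off e_phoff;                /* Program header table file offset */
    Elf64_Off e_shoff;                /* Section header table file offset */
    Elf64_Word e_flags;               /* Processor-specific flags */
    Elf64_Half e_ehsize;              /* Size of the ELF header */
    Elf64_Half e_phentsize;           /* Size of a program header table entry */
    Elf64_Half e_phnum;               /* Number of program header table entries */
    Elf64_Half e_shentsize;           /* Size of a section header table entry */
    Elf64_Half e_shnum;               /* Number of section header table entries */
    Elf64_Half e_shstrndx;            /* Section name string table index */
} Elf64_Ehdr;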
```
|
||||
|
||||
This structure is defined in [elf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/elf.h#L220).
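
As a minimal sketch of how the identification bytes in `e_ident` might be checked, the helper below tests a buffer for the ELF magic, the 64-bit class and little-endian encoding. The function name `looks_like_elf64_le` is hypothetical and is not part of the kernel; the index and value constants are assumed to follow the ELF-64 specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Indexes into e_ident and the values checked below (per the ELF-64 spec). */
enum { EI_CLASS = 4, EI_DATA = 5, EI_NIDENT = 16 };
enum { ELFCLASS64 = 2, ELFDATA2LSB = 1 };

/*
 * Hypothetical helper: returns true if buf starts with the identification
 * bytes of a little-endian ELF64 object file.
 */
bool looks_like_elf64_le(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (len < EI_NIDENT)
        return false;                       /* too short to hold e_ident */
    if (memcmp(buf, magic, sizeof(magic)) != 0)
        return false;                       /* not an ELF file at all */
    return buf[EI_CLASS] == ELFCLASS64 && buf[EI_DATA] == ELFDATA2LSB;
}
```

Only the first 16 bytes are inspected here; a real loader would go on to validate `e_type`, `e_machine` and the table offsets as well.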
|
||||
|
||||
**Sections**
|
||||
|
||||
All data is stored in sections in an ELF object file. Sections are identified by their index in the section header table. A section header contains the following fields:
|
||||
|
||||
* Section name;
|
||||
* Section type;
|
||||
* Section attributes;
|
||||
* Virtual address in memory;
|
||||
* Offset in file;
|
||||
* Size of section;
|
||||
* Link to other section;
|
||||
* Miscellaneous information;
|
||||
* Address alignment boundary;
|
||||
* Size of entries, if the section holds a table.
|
||||
|
||||
A section header is represented by the following `elf64_shdr` structure in the Linux kernel:
|
||||
|
||||
```C
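typedef struct elf64_shdr {
    Elf64_Word sh_name;       /* Section name (index into the string table) */
    Elf64_Word sh_type;       /* Section type */
    Elf64_Xword sh_flags;     /* Section attributes */
    Elf64_Addr sh_addr;       /* Virtual address in memory */
    Elf64_Off sh_offset;      /* Offset in file */
    Elf64_Xword sh_size;      /* Size of section */
    Elf64_Word sh_link;       /* Link to other section */
    Elf64_Word sh_info;       /* Miscellaneous information */
    Elf64_Xword sh_addralign; /* Address alignment boundary */
    Elf64_Xword sh_entsize;   /* Size of entries, if section has table */
} Elf64_Shdr;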
```
|
||||
|
||||
This structure is defined in the same [elf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/elf.h#L312).
|
||||
|
||||
**Program header table**
|
||||
|
||||
All sections are grouped into segments in an executable or shared object file. The program header table is an array of structures, each of which describes a segment. It looks like:
|
||||
|
||||
```C
typedef struct elf64_phdr {
    Elf64_Word p_type;    /* Segment type */
    Elf64_Word p_flags;   /* Segment attributes */
    Elf64_Off p_offset;   /* Segment file offset */
    Elf64_Addr p_vaddr;   /* Segment virtual address */
    Elf64_Addr p_paddr;   /* Segment physical address */
    Elf64_Xword p_filesz; /* Segment size in file */
    Elf64_Xword p_memsz;  /* Segment size in memory */
    Elf64_Xword p_align;  /* Segment alignment */
} Elf64_Phdr;
```
|
||||
|
||||
`elf64_phdr` is defined in the same [elf.h](https://github.com/torvalds/linux/blob/master/include/uapi/linux/elf.h#L254) in the Linux kernel source code.
|
||||
|
||||
The ELF object file also contains other fields and structures, which you can find in the [documentation](http://www.uclibc.org/docs/elf-64-gen.pdf). Now let's take a look at the `vmlinux` ELF object.
|
||||
|
||||
vmlinux
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
`vmlinux` is also an ELF object file. We can take a look at it with the `readelf` utility. First of all, let's look at the header:
|
||||
|
||||
```
$ readelf -h vmlinux
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x1000000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          381608416 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         5
  Size of section headers:           64 (bytes)
  Number of section headers:         73
  Section header string table index: 70
```
|
||||
|
||||
Here we can see that `vmlinux` is a 64-bit executable file.
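
The header fields above are enough to locate any section header in the file: entry `i` starts at `e_shoff + i * e_shentsize`. The sketch below shows this arithmetic; the function name `section_header_offset` is hypothetical, and the sketch assumes the common case where `e_shnum` and `e_shstrndx` hold real values rather than the escape values the specification reserves for very large tables.

```c
#include <stdint.h>

/*
 * Hypothetical helper: file offset of section header number `index`,
 * computed from the e_shoff and e_shentsize fields of the ELF header.
 */
uint64_t section_header_offset(uint64_t e_shoff, uint16_t e_shentsize,
                               uint16_t index)
{
    return e_shoff + (uint64_t)e_shentsize * index;
}
```

With the values printed above (`e_shoff = 381608416`, `e_shentsize = 64`), the section header string table, index `70`, would start at byte `381608416 + 70 * 64 = 381612896` of the file.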
|
||||
|
||||
We can read from the [Documentation/x86/x86_64/mm.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt#L19):
|
||||
|
||||
```
|
||||
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
|
||||
```
|
||||
|
||||
We can then look this address up in the `vmlinux` ELF object with:
|
||||
|
||||
```
$ readelf -s vmlinux | grep ffffffff81000000
     1: ffffffff81000000     0 SECTION LOCAL  DEFAULT    1
 65099: ffffffff81000000     0 NOTYPE  GLOBAL DEFAULT    1 _text
 90766: ffffffff81000000     0 NOTYPE  GLOBAL DEFAULT    1 startup_64
```
|
||||
|
||||
Note that the address of the `startup_64` routine is not `ffffffff80000000` but `ffffffff81000000`. Let me explain why.
|
||||
|
||||
We can see the following definition in the [arch/x86/kernel/vmlinux.lds.S](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/vmlinux.lds.S):
|
||||
|
||||
```
    . = __START_KERNEL;
    ...
    ...
    ..
    /* Text and read-only data */
    .text :  AT(ADDR(.text) - LOAD_OFFSET) {
        _text = .;
        ...
        ...
        ...
    }
```
|
||||
|
||||
Where `__START_KERNEL` is:
|
||||
|
||||
```
|
||||
#define __START_KERNEL (__START_KERNEL_map + __PHYSICAL_START)
|
||||
```
|
||||
|
||||
`__START_KERNEL_map` is the value from the documentation, `ffffffff80000000`, and `__PHYSICAL_START` is `0x1000000`. That's why the address of `startup_64` is `ffffffff81000000`.
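
The addition can be checked with a few lines of C. This is only an illustrative sketch: the macro names are shortened stand-ins for the kernel's `__START_KERNEL_map` and `__PHYSICAL_START`, and the function name is hypothetical.

```c
#include <stdint.h>

/* Stand-ins for __START_KERNEL_map and __PHYSICAL_START, values from the text. */
#define START_KERNEL_MAP UINT64_C(0xffffffff80000000)
#define PHYSICAL_START   UINT64_C(0x1000000)

/* Hypothetical helper mirroring the __START_KERNEL definition. */
uint64_t start_kernel_address(void)
{
    return START_KERNEL_MAP + PHYSICAL_START;
}
```

`start_kernel_address()` yields `0xffffffff81000000`, the address that `readelf -s` reports for `_text` and `startup_64`.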
|
||||
|
||||
Finally, we can get the program headers from `vmlinux` with the following command:
|
||||
|
||||
```
$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
                 0x0000000000cfd000 0x0000000000cfd000  R E    200000
  LOAD           0x0000000001000000 0xffffffff81e00000 0x0000000001e00000
                 0x0000000000100000 0x0000000000100000  RW     200000
  LOAD           0x0000000001200000 0x0000000000000000 0x0000000001f00000
                 0x0000000000014d98 0x0000000000014d98  RW     200000
  LOAD           0x0000000001315000 0xffffffff81f15000 0x0000000001f15000
                 0x000000000011d000 0x0000000000279000  RWE    200000
  NOTE           0x0000000000b17284 0xffffffff81917284 0x0000000001917284
                 0x0000000000000024 0x0000000000000024         4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata __bug_table .pci_fixup .builtin_fw
          .tracedata __ksymtab __ksymtab_gpl __kcrctab __kcrctab_gpl
          __ksymtab_strings __param __modver
   01     .data .vvar
   02     .data..percpu
   03     .init.text .init.data .x86_cpu_dev.init .altinstructions
          .altinstr_replacement .iommu_table .apicdrivers .exit.text
          .smp_locks .data_nosave .bss .brk
```
|
||||
|
||||
Here we can see five segments together with the sections they contain. You can find all of these sections in the generated linker script, `arch/x86/kernel/vmlinux.lds`.
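
To illustrate how the `Offset`, `VirtAddr` and `FileSiz` columns relate, here is a minimal sketch that converts a virtual address inside a segment into the file offset that backs it. The `struct segment` type and the `vaddr_to_file_offset` function are hypothetical; they only carry the subset of `Elf64_Phdr` fields needed for the calculation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of Elf64_Phdr (not the kernel's structure). */
struct segment {
    uint64_t p_offset; /* segment offset in the file */
    uint64_t p_vaddr;  /* segment virtual address */
    uint64_t p_filesz; /* bytes of the segment stored in the file */
    uint64_t p_memsz;  /* bytes the segment occupies in memory */
};

/*
 * Stores the file offset that backs vaddr in *offset and returns true,
 * or returns false when vaddr is outside the file-backed part of the segment.
 */
bool vaddr_to_file_offset(const struct segment *seg, uint64_t vaddr,
                          uint64_t *offset)
{
    if (vaddr < seg->p_vaddr || vaddr - seg->p_vaddr >= seg->p_filesz)
        return false;
    *offset = seg->p_offset + (vaddr - seg->p_vaddr);
    return true;
}
```

For the first `LOAD` segment above, the virtual address `0xffffffff81000000` maps to file offset `0x200000`. In the fourth segment `MemSiz` is larger than `FileSiz`, so addresses past the first `0x11d000` bytes have no bytes in the file at all.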
|
||||
|
||||
That's all. Of course this is not a full description of ELF (Executable and Linkable Format), but if you want to know more, you can find it in the [ELF-64 documentation](http://www.uclibc.org/docs/elf-64-gen.pdf).
|
||||
263
Theory/Paging.md
Normal file
@@ -0,0 +1,263 @@
|
||||
Paging
|
||||
================================================================================
|
||||
|
||||
Introduction
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
In the fifth [part](http://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-5.html) of the series `Linux kernel booting process` we learned about what the kernel does in its earliest stage. In the next step the kernel will initialize different things like `initrd` mounting, lockdep initialization, and many other things, before we can see how the kernel runs the first init process.
|
||||
|
||||
Yes, there will be many different things, but a great many of them work with **memory**.
|
||||
|
||||
In my view, memory management is one of the most complex parts of the Linux kernel and of system programming in general. This is why, before we proceed with the kernel initialization stuff, we need to get acquainted with paging.
|
||||
|
||||
`Paging` is a mechanism that translates a linear memory address to a physical address. If you have read the previous parts of this book, you may remember that we saw segmentation in real mode, where physical addresses are calculated by shifting a segment register value left by four bits and adding an offset. We also saw segmentation in protected mode, where we used the descriptor tables and base addresses from descriptors with offsets to calculate the physical addresses. Now that we are in 64-bit mode, we will see paging.
|
||||
|
||||
As the Intel manual says:
|
||||
|
||||
> Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed.
|
||||
|
||||
So... in this post I will try to explain the theory behind paging. Of course it will be closely related to the `x86_64` version of the Linux kernel, but we will not go into too much detail (at least in this post).
|
||||
|
||||
Enabling paging
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
There are three paging modes:
|
||||
|
||||
* 32-bit paging;
|
||||
* PAE paging;
|
||||
* IA-32e paging.
|
||||
|
||||
We will only explain the last mode here. To enable the `IA-32e paging` mode we need to do the following things:
|
||||
|
||||
* set the `CR0.PG` bit;
|
||||
* set the `CR4.PAE` bit;
|
||||
* set the `IA32_EFER.LME` bit.
|
||||
|
||||
We already saw where these bits were set in [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S):
|
||||
|
||||
```assembly
    movl    $(X86_CR0_PG | X86_CR0_PE), %eax
    movl    %eax, %cr0
```
|
||||
|
||||
and
|
||||
|
||||
```assembly
    movl    $MSR_EFER, %ecx
    rdmsr
    btsl    $_EFER_LME, %eax
    wrmsr
```
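
As a rough sketch, the three conditions listed above can be expressed as bit tests on the register values. The bit positions are assumed from the Intel manual (`CR0.PG` is bit 31, `CR4.PAE` is bit 5, `IA32_EFER.LME` is bit 8); the macro and function names are hypothetical, and the code only inspects values passed to it rather than reading real control registers, which needs privileged instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PG   (UINT64_C(1) << 31) /* CR0.PG: paging enable */
#define CR4_PAE  (UINT64_C(1) << 5)  /* CR4.PAE: physical address extension */
#define EFER_LME (UINT64_C(1) << 8)  /* IA32_EFER.LME: long mode enable */

/* Hypothetical check: are all three bits needed for IA-32e paging set? */
bool ia32e_paging_requested(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
    return (cr0 & CR0_PG) && (cr4 & CR4_PAE) && (efer & EFER_LME);
}
```

If any one of the three bits is clear, the function returns `false`, which matches the list above: all three settings are required together.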
|
||||
|
||||
Paging structures
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
Paging divides the linear address space into fixed-size pages. Pages can be mapped into the physical address space or even to external storage. This fixed size is `4096` bytes for the `x86_64` Linux kernel. Special structures are used to translate a linear address to a physical address. Every structure is `4096` bytes in size and contains `512` entries (this holds only for the `PAE` and `IA-32e` paging modes). Paging structures are hierarchical, and the Linux kernel uses 4 levels of paging on the `x86_64` architecture. The CPU uses a part of the linear address to identify an entry in a paging structure; that entry points either to another paging structure at the next lower level or to a physical memory region (`page frame`), and the lowest bits of the linear address give the position inside that region (`page offset`). The address of the top-level paging structure is located in the `cr3` register. We already saw this in [arch/x86/boot/compressed/head_64.S](https://github.com/torvalds/linux/blob/master/arch/x86/boot/compressed/head_64.S):
|
||||
|
||||
```assembly
    leal    pgtable(%ebx), %eax
    movl    %eax, %cr3
```
|
||||
|
||||
We built the page table structures and put the address of the top-level structure in the `cr3` register. Here `cr3` is used to store the address of the top-level structure, the `PML4` or `Page Global Directory` as it is called in the Linux kernel. `cr3` is a 64-bit register and has the following structure:
|
||||
|
||||
```
63                  52 51                                                       32
 --------------------------------------------------------------------------------
|                     |                                                          |
|    Reserved MBZ     |            Address of the top level structure            |
|                     |                                                          |
 --------------------------------------------------------------------------------
31                                  12 11            5   4     3   2            0
 --------------------------------------------------------------------------------
|                                     |               |  P  |  P  |              |
|  Address of the top level structure |   Reserved    |  C  |  W  |   Reserved   |
|                                     |               |  D  |  T  |              |
 --------------------------------------------------------------------------------
```
|
||||
|
||||
These fields have the following meanings:
|
||||
|
||||
* Bits 2:0 - ignored;
|
||||
* Bits 51:12 - stores the address of the top level paging structure;
|
||||
* Bits 3 and 4 - PWT (page-level write-through) and PCD (page-level cache disable). These bits control the way the page or page table is handled by the hardware cache;
|
||||
* Bits 11:5 - reserved, must be 0;
|
||||
* Bits 63:52 - reserved, must be 0.
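
The field layout above translates directly into masks. The following sketch, with hypothetical macro and function names, extracts the address of the top-level paging structure (bits 51:12) and the two cache-control bits from a `cr3` value passed in as a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR3_ADDR_MASK UINT64_C(0x000ffffffffff000) /* bits 51:12 */
#define CR3_PWT       (UINT64_C(1) << 3)           /* page-level write-through */
#define CR3_PCD       (UINT64_C(1) << 4)           /* page-level cache disable */

/* Physical address of the top-level paging structure. */
uint64_t cr3_top_level_table(uint64_t cr3)
{
    return cr3 & CR3_ADDR_MASK;
}

bool cr3_pwt(uint64_t cr3)
{
    return (cr3 & CR3_PWT) != 0;
}

bool cr3_pcd(uint64_t cr3)
{
    return (cr3 & CR3_PCD) != 0;
}
```

Because the low 12 bits are masked off, the returned address is always aligned to `4096` bytes, which fits the size of a paging structure.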
|
||||
|
||||
Linear address translation proceeds as follows:
|
||||
|
||||
* A given linear address arrives at the [MMU](http://en.wikipedia.org/wiki/Memory_management_unit) instead of the memory bus.
|
||||
* The 64-bit linear address is split into several parts. Only the low 48 bits are significant, which means that `2^48` bytes or 256 TBytes of linear-address space may be accessed at any given time.
|
||||
* The `cr3` register stores the address of the top-level (level-4) paging structure.
|
||||
* Bits `47:39` of the given linear address store an index into the level-4 paging structure, bits `38:30` store an index into the level-3 paging structure, bits `29:21` store an index into the level-2 paging structure, bits `20:12` store an index into the level-1 paging structure and bits `11:0` provide the byte offset into the physical page.
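
The 48-bit figure follows from the split itself: four 9-bit indexes plus a 12-bit offset. The sketch below, using hypothetical names, computes how many bytes of linear address space one entry covers at each level, under the assumption of `4096`-byte pages and `512` entries per structure as described above.

```c
#include <stdint.h>

enum { PAGE_SHIFT = 12, INDEX_BITS = 9, LEVELS = 4 };

/*
 * Bytes of linear address space covered by a single entry at `level`,
 * where 1 is the lowest level (page table) and 4 is the top level.
 */
uint64_t bytes_per_entry(unsigned level)
{
    return UINT64_C(1) << (PAGE_SHIFT + INDEX_BITS * (level - 1));
}

/* Number of significant linear-address bits: 4 * 9 + 12. */
unsigned linear_address_bits(void)
{
    return PAGE_SHIFT + INDEX_BITS * LEVELS;
}
```

A level-1 entry covers 4 KBytes, a level-2 entry 2 MBytes, a level-3 entry 1 GByte and a level-4 entry 512 GBytes; 512 level-4 entries give the 256 TBytes mentioned above.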
|
||||
|
||||
Schematically, we can imagine it like this:
|
||||
|
||||

|
||||
|
||||
Every access to a linear address is either a supervisor-mode access or a user-mode access. The kind of access is determined by the `CPL` (current privilege level). If `CPL < 3`, it is a supervisor-mode access; otherwise it is a user-mode access. For example, the top level page table entry contains access bits and has the following structure:
|
||||
|
||||
```
63   62                 52 51                                                  32
 --------------------------------------------------------------------------------
| N |                     |                                                      |
|   |      Available      |    Address of the paging structure on lower level    |
| X |                     |                                                      |
 --------------------------------------------------------------------------------
31                                               12 11  9 8 7 6 5  4   3  2 1  0
 --------------------------------------------------------------------------------
|                                                  |     | M |I| | P | P |U|W|   |
|  Address of the paging structure on lower level  | AVL | B |G|A| C | W | | | P |
|                                                  |     | Z |N| | D | T |S|R|   |
 --------------------------------------------------------------------------------
```
|
||||
|
||||
Where:
|
||||
|
||||
* Bit 63 - N/X bit (No Execute bit), which controls the ability to execute code from the physical pages mapped by the table entry;
|
||||
* Bits 62:52 - ignored by the CPU, used by system software;
|
||||
* Bits 51:12 - store the physical address of the lower level paging structure;
|
||||
* Bits 11:9 - ignored by the CPU;
|
||||
* MBZ - must be zero bits;
|
||||
* IGN - ignored bit;
|
||||
* A - accessed bit, indicates whether the physical page or paging structure was accessed;
|
||||
* PWT and PCD - used for cache control;
|
||||
* U/S - user/supervisor bit, controls user access to all the physical pages mapped by this table entry;
|
||||
* R/W - read/write bit, controls read/write access to all the physical pages mapped by this table entry;
|
||||
* P - present bit, indicates whether the page table or physical page is loaded into primary memory or not.
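
The entry layout above can be decoded with a handful of masks. This is a minimal sketch with hypothetical names (`struct entry_info`, `decode_entry`); the bit positions are taken from the diagram, with `P` at bit 0, `R/W` at bit 1, `U/S` at bit 2, `A` at bit 5 and `N/X` at bit 63.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRY_P         (UINT64_C(1) << 0)  /* present */
#define ENTRY_RW        (UINT64_C(1) << 1)  /* read/write */
#define ENTRY_US        (UINT64_C(1) << 2)  /* user/supervisor */
#define ENTRY_A         (UINT64_C(1) << 5)  /* accessed */
#define ENTRY_NX        (UINT64_C(1) << 63) /* no execute */
#define ENTRY_ADDR_MASK UINT64_C(0x000ffffffffff000) /* bits 51:12 */

/* Illustrative, decoded view of a paging-structure entry. */
struct entry_info {
    uint64_t next_table; /* physical address of the lower level structure */
    bool present;
    bool writable;
    bool user;
    bool accessed;
    bool no_execute;
};

struct entry_info decode_entry(uint64_t entry)
{
    struct entry_info info = {
        .next_table = entry & ENTRY_ADDR_MASK,
        .present    = (entry & ENTRY_P) != 0,
        .writable   = (entry & ENTRY_RW) != 0,
        .user       = (entry & ENTRY_US) != 0,
        .accessed   = (entry & ENTRY_A) != 0,
        .no_execute = (entry & ENTRY_NX) != 0,
    };
    return info;
}
```

An entry whose `present` flag is clear should not be followed any further; the other fields are only meaningful when `P` is set.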
|
||||
|
||||
Ok, we know about the paging structures and their entries. Now let's see some details about 4-level paging in the linux kernel.
|
||||
|
||||
Paging structures in the linux kernel
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
As we've seen, the linux kernel in `x86_64` uses 4-level page tables. Their names are:
|
||||
|
||||
* Page Global Directory
|
||||
* Page Upper Directory
|
||||
* Page Middle Directory
|
||||
* Page Table Entry
|
||||
|
||||
After you've compiled and installed the linux kernel, you can see the `System.map` file which stores the virtual addresses of the functions that are used by the kernel. For example:
|
||||
|
||||
```
|
||||
$ grep "start_kernel" System.map
|
||||
ffffffff81efe497 T x86_64_start_kernel
|
||||
ffffffff81efeaa2 T start_kernel
|
||||
```
|
||||
|
||||
We can see `0xffffffff81efe497` here. I doubt you really have that much RAM installed. But anyway, `start_kernel` and `x86_64_start_kernel` will be executed. The address space in `x86_64` is `2^64` bytes in size, but that is too large, which is why a smaller address space is used, only 48 bits wide. So we have a situation where the virtual address space is limited to 48 bits, but addressing is still performed with 64-bit pointers. How is this problem solved? Look at this diagram:
|
||||
|
||||
```
0xffffffffffffffff  +-----------+
                    |           |
                    |           | Kernelspace
                    |           |
0xffff800000000000  +-----------+
                    |           |
                    |           |
                    |   hole    |
                    |           |
                    |           |
0x00007fffffffffff  +-----------+
                    |           |
                    |           |  Userspace
                    |           |
0x0000000000000000  +-----------+
```
|
||||
|
||||
The solution is `sign extension`. Here we can see that the lower 48 bits of a virtual address can be used for addressing. Bits `63:48` can be either only zeroes or only ones. Note that the virtual address space is split into 2 parts:
|
||||
|
||||
* Kernel space
|
||||
* Userspace
|
||||
|
||||
Userspace occupies the lower part of the virtual address space, from `0x0000000000000000` to `0x00007fffffffffff`, and kernel space occupies the highest part, from `0xffff800000000000` to `0xffffffffffffffff`. Note that bits `63:48` are all 0 for userspace and all 1 for kernel space. All addresses in kernel space and in userspace, in other words all addresses whose bits `63:48` are either all zeroes or all ones, are called `canonical` addresses. There is a `non-canonical` area between these memory regions. Together these two memory regions (kernel space and user space) are exactly `2^48` bytes in size. We can find the virtual memory map with 4 level page tables in the [Documentation/x86/x86_64/mm.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt):
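
A canonical-address check can be sketched in a few lines. Since the two regions end at `0x00007fffffffffff` and begin at `0xffff800000000000`, an address is canonical exactly when bits `63:47` are all equal; the function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63:47 of addr are all zeroes or all ones. */
bool is_canonical(uint64_t addr)
{
    uint64_t upper = addr >> 47; /* the 17 bits 63:47 */

    return upper == 0 || upper == UINT64_C(0x1ffff);
}

/* True if addr is a canonical address in the upper (kernel) half. */
bool is_kernel_address(uint64_t addr)
{
    return is_canonical(addr) && (addr >> 63) != 0;
}
```

With this check `0x00007fffffffffff` and `0xffff800000000000` are canonical, while `0x0000800000000000`, the first address of the hole, is not.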
|
||||
|
||||
```
|
||||
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
|
||||
hole caused by [48:63] sign extension
|
||||
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
|
||||
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
|
||||
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
|
||||
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
|
||||
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
|
||||
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
|
||||
... unused hole ...
|
||||
ffffec0000000000 - fffffc0000000000 (=44 bits) kasan shadow memory (16TB)
|
||||
... unused hole ...
|
||||
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
|
||||
... unused hole ...
|
||||
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
|
||||
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
|
||||
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
|
||||
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
|
||||
```
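
As a rough sketch, the listing above can be turned into a lookup table that names the region an address falls into. The `struct region` type and `region_name` function are hypothetical. The end addresses are inclusive; for the two rows whose printed end equals the start of the following range (kasan shadow memory and kernel text mapping), the sketch assumes the printed end is exclusive and subtracts one.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint64_t start;
    uint64_t end; /* inclusive */
    const char *name;
};

/* Rows copied from the memory map above; holes are left out. */
static const struct region memory_map[] = {
    { UINT64_C(0x0000000000000000), UINT64_C(0x00007fffffffffff), "user space" },
    { UINT64_C(0xffff800000000000), UINT64_C(0xffff87ffffffffff), "guard hole" },
    { UINT64_C(0xffff880000000000), UINT64_C(0xffffc7ffffffffff), "direct mapping" },
    { UINT64_C(0xffffc90000000000), UINT64_C(0xffffe8ffffffffff), "vmalloc/ioremap space" },
    { UINT64_C(0xffffea0000000000), UINT64_C(0xffffeaffffffffff), "virtual memory map" },
    { UINT64_C(0xffffec0000000000), UINT64_C(0xfffffbffffffffff), "kasan shadow memory" },
    { UINT64_C(0xffffff0000000000), UINT64_C(0xffffff7fffffffff), "%esp fixup stacks" },
    { UINT64_C(0xffffffff80000000), UINT64_C(0xffffffff9fffffff), "kernel text mapping" },
    { UINT64_C(0xffffffffa0000000), UINT64_C(0xffffffffff5fffff), "module mapping space" },
    { UINT64_C(0xffffffffff600000), UINT64_C(0xffffffffffdfffff), "vsyscalls" },
};

/* Returns the name of the region containing addr, or "hole". */
const char *region_name(uint64_t addr)
{
    size_t count = sizeof(memory_map) / sizeof(memory_map[0]);

    for (size_t i = 0; i < count; i++) {
        if (addr >= memory_map[i].start && addr <= memory_map[i].end)
            return memory_map[i].name;
    }
    return "hole";
}
```

For example, `region_name(0xffffffff81000000)` falls into the kernel text mapping, and any non-canonical address is reported as a hole.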
|
||||
|
||||
We can see here the memory map for user space, kernel space and the non-canonical area in-between them. The user space memory map is simple. Let's take a closer look at the kernel space. We can see that it starts from the guard hole which is reserved for the hypervisor. We can find the definition of this guard hole in [arch/x86/include/asm/page_64_types.h](https://github.com/torvalds/linux/blob/master/arch/x86/include/asm/page_64_types.h):
|
||||
|
||||
```C
|
||||
#define __PAGE_OFFSET _AC(0xffff880000000000, UL)
|
||||
```
|
||||
|
||||
Previously this guard hole and `__PAGE_OFFSET` were from `0xffff800000000000` to `0xffff80ffffffffff` to prevent access to the non-canonical area, but the range was later extended by 3 bits for the hypervisor.
|
||||
|
||||
Next is the lowest usable address in kernel space - `ffff880000000000`. This virtual memory region is for the direct mapping of all physical memory. After the memory space which maps all physical addresses comes the guard hole. It needs to be between the direct mapping of all the physical memory and the vmalloc area. After the virtual memory map for the first terabyte and the unused hole after it, we can see the `kasan` shadow memory. It was added by this [commit](https://github.com/torvalds/linux/commit/ef7f0d6a6ca8c9e4b27d78895af86c2fbfaeedb2) and provides the kernel address sanitizer. After the next unused hole we can see the `esp` fixup stacks (we will talk about them in other parts of this book) and the start of the kernel text mapping from the physical address `0`. We can find the definition of this address in the same file as `__PAGE_OFFSET`:
|
||||
|
||||
```C
|
||||
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)
|
||||
```
|
||||
|
||||
Usually the kernel's `.text` starts here with the `CONFIG_PHYSICAL_START` offset. We saw it in the post about [ELF64](https://github.com/0xAX/linux-insides/blob/master/Theory/ELF.md):
|
||||
|
||||
```
$ readelf -s vmlinux | grep ffffffff81000000
     1: ffffffff81000000     0 SECTION LOCAL  DEFAULT    1
 65099: ffffffff81000000     0 NOTYPE  GLOBAL DEFAULT    1 _text
 90766: ffffffff81000000     0 NOTYPE  GLOBAL DEFAULT    1 startup_64
```
|
||||
|
||||
Here I checked a `vmlinux` built with `CONFIG_PHYSICAL_START` set to `0x1000000`. So we have the start point of the kernel `.text` - `0xffffffff80000000` and the offset - `0x1000000`; the resulting virtual address will be `0xffffffff80000000 + 0x1000000 = 0xffffffff81000000`.
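
Both linear mappings mentioned above, the direct mapping of physical memory at `__PAGE_OFFSET` and the kernel text mapping at `__START_KERNEL_map`, are plain offsets. The sketch below uses hypothetical function names and shortened macro names, and assumes a kernel that was not relocated at boot, so that the text mapping really starts at physical address `0`.

```c
#include <stdint.h>

/* Stand-ins for __PAGE_OFFSET and __START_KERNEL_map, values from the text. */
#define PAGE_OFFSET      UINT64_C(0xffff880000000000)
#define START_KERNEL_MAP UINT64_C(0xffffffff80000000)

/* Direct mapping: physical address -> virtual address in the direct map. */
uint64_t direct_map_virt(uint64_t phys)
{
    return phys + PAGE_OFFSET;
}

/* Kernel text mapping: virtual address of kernel text -> physical address. */
uint64_t kernel_text_phys(uint64_t virt)
{
    return virt - START_KERNEL_MAP;
}
```

`kernel_text_phys(0xffffffff81000000)` gives `0x1000000`, the `CONFIG_PHYSICAL_START` value, and the same physical address is also visible through the direct mapping at `0xffff880001000000`.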
|
||||
|
||||
After the kernel `.text` region there is the virtual memory region for kernel modules, `vsyscalls` and an unused hole of 2 megabytes.
|
||||
|
||||
We've seen how the kernel's virtual memory map is laid out and how a virtual address is translated into a physical one. Let's take the following address as an example:
|
||||
|
||||
```
|
||||
0xffffffff81000000
|
||||
```
|
||||
|
||||
In binary it will be:
|
||||
|
||||
```
1111111111111111 111111111 111111110 000001000 000000000 000000000000
      63:48        47:39     38:30     29:21     20:12      11:0
```
|
||||
|
||||
This virtual address is split in parts as described above:
|
||||
|
||||
* `63:48` - not used for translation (sign extension);
|
||||
* `47:39` - index into the level-4 paging structure;
|
||||
* `38:30` - index into the level-3 paging structure;
|
||||
* `29:21` - index into the level-2 paging structure;
|
||||
* `20:12` - index into the level-1 paging structure;
|
||||
* `11:0` - byte offset into the physical page.
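
The same split can be done with shifts and masks. The structure and function names in this sketch are hypothetical; each index is 9 bits wide (mask `0x1ff`) and the page offset is 12 bits wide (mask `0xfff`).

```c
#include <stdint.h>

/* Illustrative container for the parts of a 4-level linear address. */
struct linear_address_parts {
    unsigned level4_index; /* bits 47:39 */
    unsigned level3_index; /* bits 38:30 */
    unsigned level2_index; /* bits 29:21 */
    unsigned level1_index; /* bits 20:12 */
    unsigned page_offset;  /* bits 11:0 */
};

struct linear_address_parts split_linear_address(uint64_t addr)
{
    struct linear_address_parts parts = {
        .level4_index = (unsigned)((addr >> 39) & 0x1ff),
        .level3_index = (unsigned)((addr >> 30) & 0x1ff),
        .level2_index = (unsigned)((addr >> 21) & 0x1ff),
        .level1_index = (unsigned)((addr >> 12) & 0x1ff),
        .page_offset  = (unsigned)(addr & 0xfff),
    };
    return parts;
}
```

For `0xffffffff81000000` this gives the indexes `511`, `510`, `8` and `0` with a page offset of `0`, which are the decimal values of the binary groups shown above.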
|
||||
|
||||
That is all. Now you know a little about the theory of `paging`, and we can go ahead in the kernel source code and see the first initialization steps.
|
||||
|
||||
Conclusion
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
It's the end of this short part about paging theory. Of course this post doesn't cover every detail of paging, but soon we'll see in practice how the linux kernel builds paging structures and works with them.
|
||||
|
||||
**Please note that English is not my first language and I am really sorry for any inconvenience. If you've found any mistakes, please send me a PR to [linux-internals](https://github.com/0xAX/linux-internals).**
|
||||
|
||||
|
||||
Links
|
||||
--------------------------------------------------------------------------------
|
||||
|
||||
* [Paging on Wikipedia](http://en.wikipedia.org/wiki/Paging)
|
||||
* [Intel 64 and IA-32 architectures software developer's manual volume 3A](http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html)
|
||||
* [MMU](http://en.wikipedia.org/wiki/Memory_management_unit)
|
||||
* [ELF64](https://github.com/0xAX/linux-insides/blob/master/Theory/ELF.md)
|
||||
* [Documentation/x86/x86_64/mm.txt](https://github.com/torvalds/linux/blob/master/Documentation/x86/x86_64/mm.txt)
|
||||
* [Last part - Kernel booting process](http://0xax.gitbooks.io/linux-insides/content/Booting/linux-bootstrap-5.html)
|
||||
6
Theory/README.md
Normal file
@@ -0,0 +1,6 @@
|
||||
# Theory
|
||||
|
||||
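This chapter describes various theoretical concepts, along with concepts that are not directly related to practice but are useful to know.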
|
||||
|
||||
* [Paging](http://xinqiu.gitbooks.io/linux-insides/content/Theory/Paging.html)
|
||||
* [Elf64 format](http://xinqiu.gitbooks.io/linux-insides/content/Theory/ELF.html)
|
||||