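Kernel booting process. Part 3.
Video mode initialization and transition to protected mode
This is the third part of the kernel booting process series. In the previous part, we stopped right before the call to the set_video function in main.c. In this part we continue from there and look at:
- video mode initialization in the kernel setup code,
- the preparations made before switching into protected mode,
- the transition to protected mode
Note: if you know nothing about protected mode, you can find some information about it in the previous part. There are also a couple of links which can help you learn more about protected mode.
As we wrote above, we will start from the set_video function, which is defined in arch/x86/boot/video.c. It starts by getting the video mode from the boot_params.hdr structure: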
u16 mode = boot_params.hdr.vid_mode;
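The boot_params.hdr structure was filled in by the copy_boot_params function (see the previous part for its implementation). vid_mode is an obligatory field which the bootloader must fill in. You can find detailed information about vid_mode in the kernel boot protocol documentation: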
Offset Proto Name Meaning
/Size
01FA/2 ALL vid_mode Video mode control
The Linux kernel boot protocol documentation also describes how a value for the vid_mode field can be passed through a command line option:
**** SPECIAL COMMAND LINE OPTIONS
vga=<mode>
<mode> here is either an integer (in C notation, either
decimal, octal, or hexadecimal) or one of the strings
"normal" (meaning 0xFFFF), "ext" (meaning 0xFFFE) or "ask"
(meaning 0xFFFD). This value should be entered into the
vid_mode field, as it is used by the kernel before the command
line is parsed.
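So we can add the vga option to the GRUB configuration file (or that of another bootloader), and the bootloader will pass it on the kernel command line. As the description says, the option accepts different kinds of values with the same meaning. For example, you can pass the integer 0xFFFD or the string ask; both mean that a menu is displayed so the user can choose a video mode.
From this menu the user picks the video mode to switch to. Before we look further at how the video mode is set up, let's step back and cover a few important concepts.
Kernel data types
Earlier we ran into kernel data types such as u16. Here are a few more data types provided by the kernel: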
| Type | char | short | int | long | u8 | u16 | u32 | u64 |
|---|---|---|---|---|---|---|---|---|
| Size | 1 | 2 | 4 | 8 | 1 | 2 | 4 | 8 |
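If you read kernel source code, it is worth keeping these types in mind.
Heap API
After set_video reads vid_mode from boot_params.hdr, it calls the RESET_HEAP macro, which points HEAP at the _end symbol. RESET_HEAP is defined in boot.h: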
#define RESET_HEAP() ((void *)( HEAP = _end ))
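If you have read the second part, you will remember that we initialized the heap with the init_heap function. boot.h defines a few utility macros and functions for working with the initialized heap. They are: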
#define RESET_HEAP() ((void *)( HEAP = _end ))
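As we saw above, this macro simply resets the heap by setting HEAP to the _end label. The previous part already introduced _end; boot.h refers to it through extern char _end[]; (this also shows that during kernel setup the heap and the stack share one memory region; see the heap initialization sections of the first and second parts for details).
Next is the GET_HEAP macro: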
#define GET_HEAP(type, n) \
((type *)__get_heap(sizeof(type),__alignof__(type),(n)))
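As we can see, it calls the internal function __get_heap to allocate memory. __get_heap takes three parameters:
- sizeof(type): the size of the data type in bytes
- __alignof__(type): the alignment required for objects of this type (as far as I know, this is a GCC extension)
- n: how many objects of this type to allocate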
Implementation of __get_heap is:
static inline char *__get_heap(size_t s, size_t a, size_t n)
{
    char *tmp;
    HEAP = (char *)(((size_t)HEAP+(a-1)) & ~(a-1));
    tmp = HEAP;
    HEAP += s*n;
    return tmp;
}
Later we will see it used like this:
saved.data = GET_HEAP(u16, saved.x * saved.y);
Let's try to understand how __get_heap works. First, HEAP (which equals _end right after RESET_HEAP()) is rounded up to the next multiple of the alignment a. The aligned address is then saved in tmp, HEAP is advanced by s*n bytes to the end of the allocated block, and tmp, the start address of the allocated memory, is returned.
And the last function is:
static inline bool heap_free(size_t n)
{
    return (int)(heap_end - HEAP) >= (int)n;
}
which subtracts the value of HEAP from heap_end (we calculated it in the previous part) and returns true if at least n bytes are still free.
That's all. Now we have a simple API for the heap and can set up the video mode.
Set up video mode
Now we can move directly to video mode initialization. We stopped at the RESET_HEAP() call in the set_video function. Next is the call to store_mode_params which stores video mode parameters in the boot_params.screen_info structure which is defined in include/uapi/linux/screen_info.h.
If we look at the store_mode_params function, we can see that it starts with the call to the store_cursor_position function. As you can understand from the function name, it gets information about cursor and stores it.
store_cursor_position first initializes two variables of type biosregs, sets AH = 0x3 and calls BIOS interrupt 0x10. After the interrupt returns successfully, the column is in the DL register and the row is in the DH register. They are stored in the orig_x and orig_y fields of the boot_params.screen_info structure.
After store_cursor_position is executed, the store_video_mode function will be called. It just gets the current video mode and stores it in boot_params.screen_info.orig_video_mode.
After this, it checks the current video mode and sets the video_segment. After the BIOS transfers control to the boot sector, the following addresses are for video memory:
0xB000:0x0000 32 Kb Monochrome Text Video Memory
0xB800:0x0000 32 Kb Color Text Video Memory
So we set the video_segment variable to 0xB000 if the current video mode is MDA, HGC, or VGA in monochrome mode and to 0xB800 if the current video mode is in color mode. After setting up the address of the video segment, font size needs to be stored in boot_params.screen_info.orig_video_points with:
set_fs(0);
font_size = rdfs16(0x485);
boot_params.screen_info.orig_video_points = font_size;
First of all we put 0 in the FS register with the set_fs function. We already saw functions like set_fs in the previous part. They are all defined in boot.h. Next we read the value which is located at address 0x485 (this memory location is used to get the font size) and save the font size in boot_params.screen_info.orig_video_points.
x = rdfs16(0x44a);
y = (adapter == ADAPTER_CGA) ? 25 : rdfs8(0x484)+1;
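Next we get the number of columns from address 0x44a and the number of rows from address 0x484 (the BIOS stores the row count minus one there, hence the +1; for CGA adapters 25 rows are assumed) and store them in boot_params.screen_info.orig_video_cols and boot_params.screen_info.orig_video_lines. After this, execution of store_mode_params is finished.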
Next we can see the save_screen function, which just saves the screen contents to the heap. This function collects the data we got in the previous functions, such as the number of rows and columns, and stores it in the saved_screen structure, which is defined as:
static struct saved_screen {
    int x, y;
    int curx, cury;
    u16 *data;
} saved;
It then checks whether the heap has free space for it with:
if (!heap_free(saved.x*saved.y*sizeof(u16)+512))
    return;
If there is enough room, it allocates space on the heap and stores the screen contents there.
The next call is probe_cards(0) from arch/x86/boot/video-mode.c. It goes over all video_cards and collects the number of modes provided by each card. Here is an interesting detail. We can see the loop:
for (card = video_cards; card < video_cards_end; card++) {
    /* collecting number of modes here */
}
but video_cards is not declared anywhere in the C code. The answer is simple: every video card driver in the x86 kernel setup code has a definition like this:
static __videocard video_vga = {
    .card_name = "VGA",
    .probe = vga_probe,
    .set_mode = vga_set_mode,
};
where __videocard is a macro:
#define __videocard struct card_info __attribute__((used,section(".videocards")))
which means that card_info structure:
struct card_info {
    const char *card_name;
    int (*set_mode)(struct mode_info *mode);
    int (*probe)(void);
    struct mode_info *modes;
    int nmodes;
    int unsafe;
    u16 xmode_first;
    u16 xmode_n;
};
is placed in the .videocards section. Let's look at the arch/x86/boot/setup.ld linker script, where we can see:
.videocards : {
    video_cards = .;
    *(.videocards)
    video_cards_end = .;
}
It means that video_cards is just a memory address, and all card_info structures are placed between video_cards and video_cards_end, so a loop can walk over all of them. After probe_cards executes, every structure such as static __videocard video_vga has its nmodes field (the number of video modes) filled in.
After probe_cards has finished, we move to the main loop in the set_video function. It is an infinite loop which tries to set up the video mode with the set_mode function, or prints a menu if we passed vga=ask on the kernel command line or if the video mode is undefined.
The set_mode function is defined in video-mode.c and takes only one parameter, mode, which is the number of the video mode (we got it from the menu or, at the start of set_video, from the kernel setup header).
The set_mode function checks the mode and calls the raw_set_mode function. raw_set_mode calls the set_mode function of the selected card, i.e. card->set_mode(struct mode_info*). We get access to this function through the card_info structure. Every video card driver defines this structure with values that depend on the card (for VGA it is the video_vga.set_mode function; see the card_info example for VGA above). video_vga.set_mode is vga_set_mode, which checks the VGA mode and calls the respective function:
static int vga_set_mode(struct mode_info *mode)
{
    vga_set_basic_mode();
    force_x = mode->x;
    force_y = mode->y;
    switch (mode->mode) {
    case VIDEO_80x25:
        break;
    case VIDEO_8POINT:
        vga_set_8font();
        break;
    case VIDEO_80x43:
        vga_set_80x43();
        break;
    case VIDEO_80x28:
        vga_set_14font();
        break;
    case VIDEO_80x30:
        vga_set_80x30();
        break;
    case VIDEO_80x34:
        vga_set_80x34();
        break;
    case VIDEO_80x60:
        vga_set_80x60();
        break;
    }
    return 0;
}
Every function which sets up a video mode just calls BIOS interrupt 0x10 with a certain value in the AH register.
After the video mode is set, we store it in boot_params.hdr.vid_mode.
Next, vesa_store_edid is called. This function simply stores the EDID (Extended Display Identification Data) information for kernel use. After this, store_mode_params is called again. Lastly, if do_restore is set, the screen is restored to its earlier state.
Now that the video mode is set, we can switch to protected mode.
Last preparation before transition into protected mode
We can see the last function call - go_to_protected_mode - in main.c. As the comment says: Do the last things and invoke protected mode, so let's see these last things and switch into protected mode.
go_to_protected_mode is defined in arch/x86/boot/pm.c. It contains some functions which make the last preparations before we can jump into protected mode, so let's look at it and try to understand what they do and how it works.
First is the call to the realmode_switch_hook function in go_to_protected_mode. This function invokes the real mode switch hook if it is present and disables NMI. Hooks are used if the bootloader runs in a hostile environment. You can read more about hooks in the boot protocol (see ADVANCED BOOT LOADER HOOKS).
The realmode_switch hook presents a pointer to the 16-bit real mode far subroutine which disables non-maskable interrupts. After realmode_switch hook (it isn't present for me) is checked, disabling of Non-Maskable Interrupts(NMI) occurs:
asm volatile("cli");
outb(0x80, 0x70); /* Disable NMI */
io_delay();
At first there is an inline assembly instruction with a cli instruction which clears the interrupt flag (IF). After this, external interrupts are disabled. The next line disables NMI (non-maskable interrupt).
An interrupt is a signal to the CPU which is emitted by hardware or software. After getting the signal, the CPU suspends the current instruction sequence, saves its state and transfers control to the interrupt handler. After the interrupt handler has finished its work, it transfers control back to the interrupted instruction. Non-maskable interrupts (NMI) are interrupts which are always processed, independently of permission. They cannot be ignored and are typically used to signal non-recoverable hardware errors. We will not dive into the details of interrupts now, but will discuss them in later posts.
Let's get back to the code. We can see that second line is writing 0x80 (disabled bit) byte to 0x70 (CMOS Address register). After that, a call to the io_delay function occurs. io_delay causes a small delay and looks like:
static inline void io_delay(void)
{
    const u16 DELAY_PORT = 0x80;
    asm volatile("outb %%al,%0" : : "dN" (DELAY_PORT));
}
Writing any byte to port 0x80 should delay for about 1 microsecond, so we can write any value there (the value from the AL register in our case). After this delay the realmode_switch_hook function has finished and we can move on to the next function.
The next function is enable_a20, which enables A20 line. This function is defined in arch/x86/boot/a20.c and it tries to enable the A20 gate with different methods. The first is the a20_test_short function which checks if A20 is already enabled or not with the a20_test function:
static int a20_test(int loops)
{
    int ok = 0;
    int saved, ctr;
    set_fs(0x0000);
    set_gs(0xffff);
    saved = ctr = rdfs32(A20_TEST_ADDR);
    while (loops--) {
        wrfs32(++ctr, A20_TEST_ADDR);
        io_delay(); /* Serialize and make delay constant */
        ok = rdgs32(A20_TEST_ADDR+0x10) ^ ctr;
        if (ok)
            break;
    }
    wrfs32(saved, A20_TEST_ADDR);
    return ok;
}
First of all we put 0x0000 in the FS register and 0xffff in the GS register. Next we read the value in address A20_TEST_ADDR (it is 0x200) and put this value into the saved variable and ctr.
Next we write the incremented ctr value to fs:A20_TEST_ADDR with the wrfs32 function, wait for a short delay with io_delay, and then read the value at gs:A20_TEST_ADDR+0x10 and XOR it with ctr. If the result is not zero, the two locations hold different values, so addresses above 1 MB no longer wrap around and the A20 line is already enabled. If A20 is disabled, we try to enable it with the other methods that you can find in a20.c, for example by calling BIOS interrupt 0x15 with AX=0x2401.
If the enable_a20 function fails, an error message is printed and the die function is called. You may remember it from the first source file we looked at, arch/x86/boot/header.S:
die:
    hlt
    jmp die
    .size die, .-die
After the A20 gate is successfully enabled, the reset_coprocessor function is called:
outb(0, 0xf0);
outb(0, 0xf1);
This function clears the Math Coprocessor by writing 0 to 0xf0 and then resets it by writing 0 to 0xf1.
After this, the mask_all_interrupts function is called:
outb(0xff, 0xa1); /* Mask all interrupts on the secondary PIC */
outb(0xfb, 0x21); /* Mask all but cascade on the primary PIC */
This masks all interrupts on the secondary PIC (Programmable Interrupt Controller) and primary PIC except for IRQ2 on the primary PIC.
And after all of these preparations, we can see the actual transition into protected mode.
Set up Interrupt Descriptor Table
Now we set up the Interrupt Descriptor Table (IDT) in the setup_idt function:
static void setup_idt(void)
{
    static const struct gdt_ptr null_idt = {0, 0};
    asm volatile("lidtl %0" : : "m" (null_idt));
}
which sets up the Interrupt Descriptor Table (it describes interrupt handlers and so on). For now the IDT is not installed (we will see it later); we just load a null IDT with the lidtl instruction. null_idt holds the address and size of the IDT, but for now both are zero. null_idt is a gdt_ptr structure, which is defined as:
struct gdt_ptr {
    u16 len;
    u32 ptr;
} __attribute__((packed));
where we can see the 16-bit length (len) of the IDT and the 32-bit pointer to it (more details about the IDT and interrupts will follow in later posts). __attribute__((packed)) means that no padding is inserted between the fields, so gdt_ptr has the minimum possible size: 6 bytes, or 48 bits. (Next we will load a gdt_ptr into the GDTR register, and you might remember from the previous post that it is 48 bits in size.)
Set up Global Descriptor Table
Next is the setup of the Global Descriptor Table (GDT). We can see the setup_gdt function which sets up GDT (you can read about it in the Kernel booting process. Part 2.). There is a definition of the boot_gdt array in this function, which contains the definition of the three segments:
static const u64 boot_gdt[] __attribute__((aligned(16))) = {
    [GDT_ENTRY_BOOT_CS] = GDT_ENTRY(0xc09b, 0, 0xfffff),
    [GDT_ENTRY_BOOT_DS] = GDT_ENTRY(0xc093, 0, 0xfffff),
    [GDT_ENTRY_BOOT_TSS] = GDT_ENTRY(0x0089, 4096, 103),
};
The three segments are for code, data and the TSS (Task State Segment). We will not use the task state segment for now; it was added to make Intel VT happy, as the comment in the source says (if you're interested, there is a commit which describes it). Let's look at boot_gdt. First of all, note that it has the __attribute__((aligned(16))) attribute, which means that the array is aligned on a 16-byte boundary. Let's look at a simple example:
#include <stdio.h>
struct aligned {
    int a;
}__attribute__((aligned(16)));
struct nonaligned {
    int b;
};
int main(void)
{
    struct aligned a;
    struct nonaligned na;
    printf("Not aligned - %zu \n", sizeof(na));
    printf("Aligned - %zu \n", sizeof(a));
    return 0;
}
Technically, a structure which contains one int field should be 4 bytes, but here the aligned structure takes 16 bytes:
$ gcc test.c -o test && ./test
Not aligned - 4
Aligned - 16
GDT_ENTRY_BOOT_CS has index 2 here, GDT_ENTRY_BOOT_DS is GDT_ENTRY_BOOT_CS + 1, and so on. The numbering starts from 2 because the first entry is the mandatory null descriptor (index 0) and the second is not used (index 1).
GDT_ENTRY is a macro which takes flags, base and limit and builds a GDT entry. For example, let's look at the code segment entry. GDT_ENTRY takes the following values:
- base - 0
- limit - 0xfffff
- flags - 0xc09b
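What does this mean? The segment's base address is 0 and the limit is 0xfffff. Because the granularity bit is set (see the flags below), the limit is counted in 4 KB units, so the segment covers the whole 4 GB address space. Let's look at the flags. The value is 0xc09b, which is: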
1100 0000 1001 1011
in binary. Let's try to understand what every bit means. We will go through all bits from left to right:
- 1 - (G) granularity bit
- 1 - (D) if 0 16-bit segment; 1 = 32-bit segment
- 0 - (L) executed in 64 bit mode if 1
- 0 - (AVL) available for use by system software
- 0000 - position of limit bits 19:16 in the descriptor (filled in from the limit argument)
- 1 - (P) segment presence in memory
- 00 - (DPL) - privilege level, 0 is the highest privilege
- 1 - (S) code or data segment, not a system segment
- 101 - segment type execute/read
- 1 - accessed bit
You can read more about every bit in the previous post or in the Intel® 64 and IA-32 Architectures Software Developer's Manuals 3A.
After this we get the length of the GDT with:
gdt.len = sizeof(boot_gdt)-1;
We take the size of boot_gdt and subtract 1, because the length field holds the offset of the last valid byte in the GDT.
Next we get a pointer to the GDT with:
gdt.ptr = (u32)&boot_gdt + (ds() << 4);
Here we just get the address of boot_gdt and add it to the address of the data segment left-shifted by 4 bits (remember we're in the real mode now).
Lastly we execute the lgdtl instruction to load the GDT into the GDTR register:
asm volatile("lgdtl %0" : : "m" (gdt));
Actual transition into protected mode
This is the end of the go_to_protected_mode function. We have loaded the IDT and GDT, disabled interrupts, and can now switch the CPU into protected mode. The last step is calling the protected_mode_jump function with two parameters:
protected_mode_jump(boot_params.hdr.code32_start, (u32)&boot_params + (ds() << 4));
which is defined in arch/x86/boot/pmjump.S. It takes two parameters:
- address of protected mode entry point
- address of boot_params
Let's look inside protected_mode_jump. As I wrote above, you can find it in arch/x86/boot/pmjump.S. The first parameter will be in the eax register and the second in edx.
First of all we put the address of boot_params in the esi register and the value of the code segment register cs (0x1000) in bx. After this we shift bx left by 4 bits and add the result to the memory location at label 2, so that the in_pm32 offset stored there becomes a linear address, and jump to label 1. Next we put the data segment and task state segment selectors in the cx and di registers with:
movw $__BOOT_DS, %cx
movw $__BOOT_TSS, %di
As you read above, GDT_ENTRY_BOOT_CS has index 2 and every GDT entry is 8 bytes, so __BOOT_CS is 2 * 8 = 16, __BOOT_DS is 24, and so on.
Next we set the PE (Protection Enable) bit in the CR0 control register:
movl %cr0, %edx
orb $X86_CR0_PE, %dl
movl %edx, %cr0
and make a long jump to protected mode:
.byte 0x66, 0xea
2: .long in_pm32
.word __BOOT_CS
where
- 0x66 is the operand-size prefix which allows us to mix 16-bit and 32-bit code,
- 0xea is the jump opcode,
- in_pm32 is the segment offset,
- __BOOT_CS is the code segment.
After this we are finally in the protected mode:
.code32
.section ".text32","ax"
Let's look at the first steps in protected mode. First of all we set up the data segment with:
movl %ecx, %ds
movl %ecx, %es
movl %ecx, %fs
movl %ecx, %gs
movl %ecx, %ss
If you paid attention, you will remember that we saved $__BOOT_DS in the cx register. Now we load it into all segment registers besides cs (cs is already __BOOT_CS). Next we zero out the general purpose registers ecx, edx, ebx, ebp and edi, leaving eax (the entry point) and esi (the boot_params pointer) untouched, with:
xorl %ecx, %ecx
xorl %edx, %edx
xorl %ebx, %ebx
xorl %ebp, %ebp
xorl %edi, %edi
And jump to the 32-bit entry point in the end:
jmpl *%eax
Remember that eax contains the address of the 32-bit entry (we passed it as first parameter into protected_mode_jump).
That's all. We're in protected mode and have stopped at its entry point. We will see what happens next in the next part.
Conclusion
This is the end of the third part about Linux kernel insides. In the next part we will see the first steps in protected mode and the transition into long mode.
If you have any questions or suggestions, write me a comment or ping me on twitter.
Please note that English is not my first language, and I am really sorry for any inconvenience. If you find any mistakes, please send me a PR with corrections at linux-insides.
