linux-insides-zh/Concepts/initcall.md
2017-08-26 15:16:58 +08:00


The initcall mechanism

Introduction

As you may understand from the title, this part covers an interesting and important concept of the Linux kernel called initcall. We have already seen definitions like these:

early_param("debug", debug_kernel);

or

arch_initcall(init_pit_clocksource);

Before we consider how this mechanism is implemented in the Linux kernel, we must know what it actually is and how the Linux kernel uses it. Definitions like these represent callback functions that are called either during initialization of the Linux kernel or right after it. The main point of the initcall mechanism is to determine the correct order in which built-in modules and subsystems are initialized. For example, let's look at the following function:

static int __init nmi_warning_debugfs(void)
{
    debugfs_create_u64("nmi_longest_ns", 0644,
                       arch_debugfs_dir, &nmi_longest_ns);
    return 0;
}

This function comes from the arch/x86/kernel/nmi.c source code file. As we can see, it just creates the nmi_longest_ns debugfs file in the arch_debugfs_dir directory. This debugfs file can be created only after the arch_debugfs_dir directory has been created, and that happens during the architecture-specific initialization of the Linux kernel, in the arch_kdebugfs_init function from the arch/x86/kernel/kdebugfs.c source code file. Note that the arch_kdebugfs_init function is marked as an initcall too:

arch_initcall(arch_kdebugfs_init);

The Linux kernel calls all architecture-specific initcalls before the fs-related ones, and nmi_warning_debugfs is registered as an fs initcall. So our nmi_longest_ns file will be created only after the arch_debugfs_dir directory exists. In total, the Linux kernel provides eight levels of main initcalls:

  • early;
  • core;
  • postcore;
  • arch;
  • subsys;
  • fs;
  • device;
  • late.

All of their names are listed in the initcall_level_names array, which is defined in the init/main.c source code file:

static char *initcall_level_names[] __initdata = {
	"early",
	"core",
	"postcore",
	"arch",
	"subsys",
	"fs",
	"device",
	"late",
};

All functions marked as initcalls with these identifiers are called in this order: early initcalls first, then core initcalls, and so on. Now that we know a little about the initcall mechanism, we can dive into the source code of the Linux kernel to see how it is implemented.
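To make the ordering rule concrete, here is a minimal user-space sketch in C. It is not kernel code: demo_level_names, demo_initcalls and the two demo_* callbacks are hypothetical stand-ins for the level table, arch_kdebugfs_init and nmi_warning_debugfs. The table deliberately lists the fs-level function first, to show that only the level decides when a function runs.

```c
#include <stdio.h>

/* Same shape as the kernel's initcall_t: no arguments, int result. */
typedef int (*initcall_t)(void);

/* Level names in the order of the article's initcall_level_names array. */
static const char *const demo_level_names[] = {
    "early", "core", "postcore", "arch", "subsys", "fs", "device", "late",
};

/* Hypothetical flags standing in for arch_debugfs_dir and nmi_longest_ns. */
static int arch_dir_created;
static int nmi_file_created;

static int demo_arch_kdebugfs_init(void)
{
    arch_dir_created = 1;
    return 0;
}

static int demo_nmi_warning_debugfs(void)
{
    if (!arch_dir_created)
        return -1;                 /* the parent directory does not exist yet */
    nmi_file_created = 1;
    return 0;
}

struct demo_initcall {
    int level;                     /* index into demo_level_names */
    const char *name;
    initcall_t fn;
};

/* Listed fs-first on purpose: only the level decides when a function runs. */
static const struct demo_initcall demo_initcalls[] = {
    { 5, "demo_nmi_warning_debugfs", demo_nmi_warning_debugfs },
    { 3, "demo_arch_kdebugfs_init",  demo_arch_kdebugfs_init  },
};

#define DEMO_N_LEVELS (sizeof(demo_level_names) / sizeof(demo_level_names[0]))
#define DEMO_N_CALLS  (sizeof(demo_initcalls) / sizeof(demo_initcalls[0]))

/* Walk the levels in order and run every initcall registered for each one. */
static void demo_do_initcalls(void)
{
    for (size_t level = 0; level < DEMO_N_LEVELS; level++)
        for (size_t i = 0; i < DEMO_N_CALLS; i++)
            if ((size_t)demo_initcalls[i].level == level) {
                int ret = demo_initcalls[i].fn();
                printf("[%s] %s returned %d\n", demo_level_names[level],
                       demo_initcalls[i].name, ret);
            }
}
```

Calling demo_do_initcalls() from main prints the [arch] line before the [fs] line, and the fs-level function succeeds because its "directory" already exists.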

Implementation of the initcall mechanism in the Linux kernel

The Linux kernel provides a set of macros in the include/linux/init.h header file to mark a given function as an initcall. All of these macros are pretty simple:

#define early_initcall(fn)		__define_initcall(fn, early)
#define core_initcall(fn)		__define_initcall(fn, 1)
#define postcore_initcall(fn)		__define_initcall(fn, 2)
#define arch_initcall(fn)		__define_initcall(fn, 3)
#define subsys_initcall(fn)		__define_initcall(fn, 4)
#define fs_initcall(fn)			__define_initcall(fn, 5)
#define device_initcall(fn)		__define_initcall(fn, 6)
#define late_initcall(fn)		__define_initcall(fn, 7)

As we can see, these macros just expand to a call of the __define_initcall macro from the same header file. The __define_initcall macro takes two arguments:

  • fn - the callback function that will be called when the initcalls of the given level are run;
  • id - the initcall level identifier. It selects the ELF section the initcall is placed in and becomes part of the generated symbol name, so registering the same handler at two different levels does not produce clashing symbols.

The __define_initcall macro is implemented as follows:

#define __define_initcall(fn, id) \
	static initcall_t __initcall_##fn##id __used \
	__attribute__((__section__(".initcall" #id ".init"))) = fn; \
	LTO_REFERENCE_INITCALL(__initcall_##fn##id)

To understand the __define_initcall macro, let's first look at the initcall_t type. It is defined in the same header file and represents a pointer to a function that takes no arguments and returns an int, the result of the initcall:

typedef int (*initcall_t)(void);

Now let's return to the __define_initcall macro. The ## operator concatenates two tokens, and # turns the id argument into a string. So the first line of the macro defines a static variable of type initcall_t named __initcall_<function name><id> (for example, __initcall_arch_kdebugfs_init3), initializes it with the given function, places it in the .initcall<id>.init ELF section with the gcc section attribute, and marks it __used. If we look in the include/asm-generic/vmlinux.lds.h header file, which provides data for the kernel linker script, we will see that all initcall sections are placed in the .init.data section:

#define INIT_CALLS					\
		VMLINUX_SYMBOL(__initcall_start) = .;	\
		*(.initcallearly.init)					\
		INIT_CALLS_LEVEL(0)					    \
		INIT_CALLS_LEVEL(1)					    \
		INIT_CALLS_LEVEL(2)					    \
		INIT_CALLS_LEVEL(3)					    \
		INIT_CALLS_LEVEL(4)					    \
		INIT_CALLS_LEVEL(5)					    \
		INIT_CALLS_LEVEL(rootfs)				\
		INIT_CALLS_LEVEL(6)					    \
		INIT_CALLS_LEVEL(7)					    \
		VMLINUX_SYMBOL(__initcall_end) = .;

#define INIT_DATA_SECTION(initsetup_align)	\
	.init.data : AT(ADDR(.init.data) - LOAD_OFFSET) {	   \
        ...                                                \
        INIT_CALLS						                   \
        ...                                                \
	}

The second attribute, __used, is defined in the include/linux/compiler-gcc.h header file and expands to the following gcc attribute:

#define __used   __attribute__((__used__))

It tells the compiler to emit the variable even though nothing in the code references it, which also suppresses the "defined but not used" warning. The last line of the __define_initcall macro is:

LTO_REFERENCE_INITCALL(__initcall_##fn##id)

This line depends on the CONFIG_LTO kernel configuration option and just provides a stub for the compiler's link-time optimization:

#ifdef CONFIG_LTO
#define LTO_REFERENCE_INITCALL(x) \
        static __used __exit void *reference_##x(void)  \
        {                                               \
                return &x;                              \
        }
#else
#define LTO_REFERENCE_INITCALL(x)
#endif

The stub is a function that returns the address of the initcall variable, so the variable stays referenced and link-time optimization does not discard it even if nothing else in the module refers to it. Because the stub is marked __exit, it is placed in the exit section rather than with the regular kernel code. That's all about the __define_initcall macro. All of the *_initcall macros are expanded during compilation of the Linux kernel, every initcall is placed in its section, all of these sections end up in the .init.data section, and the Linux kernel knows where to find each initcall in order to call it during initialization.
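The same section trick can be tried in user space. The sketch below assumes GCC or Clang with GNU ld or lld on an ELF target. Instead of a hand-written linker script like INIT_CALLS, it relies on the linker synthesizing __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers; that is a toolchain feature, not what the kernel does. The demo_* macros, functions and section names are hypothetical.

```c
typedef int (*initcall_t)(void);

/*
 * Hypothetical user-space analogue of __define_initcall(): put a pointer to
 * fn into a section whose name encodes the level. The name must be a valid
 * C identifier so that GNU ld/lld synthesize __start_<name>/__stop_<name>.
 */
#define demo_define_initcall(fn, id)                              \
    static initcall_t demo_initcall_##fn##id                        \
    __attribute__((used, section("demo_initcall" #id))) = fn

#define demo_arch_initcall(fn) demo_define_initcall(fn, 3)
#define demo_fs_initcall(fn)   demo_define_initcall(fn, 5)

/* Boundary symbols provided by the linker, not defined anywhere in C. */
extern initcall_t __start_demo_initcall3[], __stop_demo_initcall3[];
extern initcall_t __start_demo_initcall5[], __stop_demo_initcall5[];

static int demo_trace;             /* records the call order as decimal digits */

static int demo_setup_arch(void) { demo_trace = demo_trace * 10 + 3; return 0; }
static int demo_setup_fs(void)   { demo_trace = demo_trace * 10 + 5; return 0; }

/* Registration order in the file is irrelevant; the section decides. */
demo_fs_initcall(demo_setup_fs);
demo_arch_initcall(demo_setup_arch);

/* Walk every function pointer stored between two section boundaries. */
static void demo_run_range(initcall_t *start, initcall_t *end)
{
    for (initcall_t *fn = start; fn < end; fn++)
        (*fn)();
}

static void demo_do_initcalls(void)
{
    demo_run_range(__start_demo_initcall3, __stop_demo_initcall3);
    demo_run_range(__start_demo_initcall5, __stop_demo_initcall5);
}
```

demo_do_initcalls() runs demo_setup_arch before demo_setup_fs, so demo_trace ends up as 35, although the fs initcall is registered first in the file.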

Now that we know how initcalls are registered, let's see how the Linux kernel calls them. This process starts in the do_basic_setup function from the init/main.c source code file:

static void __init do_basic_setup(void)
{
    ...
    ...
    ...
   	do_initcalls();
    ...
    ...
    ...
}

This function is called during the initialization of the Linux kernel, right after the main initialization steps, such as memory manager and CPU subsystem initialization, have finished. The do_initcalls function just goes through the array of initcall levels and calls the do_initcall_level function for each level:

static void __init do_initcalls(void)
{
	int level;

	for (level = 0; level < ARRAY_SIZE(initcall_levels) - 1; level++)
		do_initcall_level(level);
}

The initcall_levels array is defined in the same source code file and contains pointers to the beginnings of the sections that __define_initcall places initcalls in, followed by __initcall_end, which marks the end of the last level:

static initcall_t *initcall_levels[] __initdata = {
	__initcall0_start,
	__initcall1_start,
	__initcall2_start,
	__initcall3_start,
	__initcall4_start,
	__initcall5_start,
	__initcall6_start,
	__initcall7_start,
	__initcall_end,
};

If you are interested, you can find these sections in the arch/x86/kernel/vmlinux.lds linker script, which is generated during Linux kernel compilation:

.init.data : AT(ADDR(.init.data) - 0xffffffff80000000) {
    ...
    ...
    ...
    ...
    __initcall_start = .;
    *(.initcallearly.init)
    __initcall0_start = .;
    *(.initcall0.init)
    *(.initcall0s.init)
    __initcall1_start = .;
    ...
    ...
}

If you are not familiar with this, you can learn more about linkers in the special part of this book.

The do_initcall_level function takes one parameter, the initcall level, and does two things. First, it copies the usual kernel command line, which may contain parameters for modules, into initcall_command_line and parses this copy with the parse_args function from the kernel/params.c source code file. Then it calls the do_one_initcall function for each initcall of the given level:

for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
		do_one_initcall(*fn);
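The pair of loops in do_initcalls and do_initcall_level works because all initcalls sit in one contiguous block, and initcall_levels only records where each level starts, plus an end marker. The following sketch models that layout with an ordinary array; demo_all_initcalls and demo_levels are hypothetical stand-ins for the linker-placed sections and for initcall_levels, and the callbacks just record their call order.

```c
#include <stddef.h>

typedef int (*initcall_t)(void);

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static char demo_order[8];         /* letters of the callbacks, in call order */
static size_t demo_called;

static int demo_core_a(void) { demo_order[demo_called++] = 'a'; return 0; }
static int demo_core_b(void) { demo_order[demo_called++] = 'b'; return 0; }
static int demo_arch_c(void) { demo_order[demo_called++] = 'c'; return 0; }
static int demo_late_d(void) { demo_order[demo_called++] = 'd'; return 0; }

/* One contiguous block laid out level after level, as the linker does. */
static initcall_t demo_all_initcalls[] = {
    demo_core_a, demo_core_b,      /* level 1 */
    demo_arch_c,                   /* level 3 */
    demo_late_d,                   /* level 7 */
};

/* Start of each level plus an end marker, like initcall_levels[]. */
static initcall_t *demo_levels[] = {
    demo_all_initcalls + 0,        /* 0: empty */
    demo_all_initcalls + 0,        /* 1: core_a, core_b */
    demo_all_initcalls + 2,        /* 2: empty */
    demo_all_initcalls + 2,        /* 3: arch_c */
    demo_all_initcalls + 3,        /* 4: empty */
    demo_all_initcalls + 3,        /* 5: empty */
    demo_all_initcalls + 3,        /* 6: empty */
    demo_all_initcalls + 3,        /* 7: late_d */
    demo_all_initcalls + 4,        /* end marker, like __initcall_end */
};

/* Like do_initcall_level(): run everything between this level and the next. */
static void demo_do_initcall_level(size_t level)
{
    for (initcall_t *fn = demo_levels[level]; fn < demo_levels[level + 1]; fn++)
        (*fn)();
}

static void demo_do_initcalls(void)
{
    /* The last entry only marks the end, hence the "- 1". */
    for (size_t level = 0; level < ARRAY_SIZE(demo_levels) - 1; level++)
        demo_do_initcall_level(level);
}
```

Empty levels are simply ranges whose start equals the next level's start, so the inner loop runs zero times for them. The final entry is only a boundary, which is why the outer loop stops at ARRAY_SIZE - 1.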

The do_one_initcall function does the main work. It takes one parameter, the initcall callback function, and calls it:

int __init_or_module do_one_initcall(initcall_t fn)
{
	int count = preempt_count();
	int ret;
	char msgbuf[64];

	if (initcall_blacklisted(fn))
		return -EPERM;

	if (initcall_debug)
		ret = do_one_initcall_debug(fn);
	else
		ret = fn();

	msgbuf[0] = 0;

	if (preempt_count() != count) {
		sprintf(msgbuf, "preemption imbalance ");
		preempt_count_set(count);
	}
	if (irqs_disabled()) {
		strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf));
		local_irq_enable();
	}
	WARN(msgbuf[0], "initcall %pF returned with %s\n", fn, msgbuf);

	return ret;
}

Let's try to understand what the do_one_initcall function does. First, it saves the current value of the preemption counter so that it can later check that the initcall left it balanced. After this, it calls the initcall_blacklisted function, which walks the blacklisted_initcalls list and returns true if the given initcall is in it; in that case do_one_initcall returns -EPERM without calling the initcall:

list_for_each_entry(entry, &blacklisted_initcalls, next) {
	if (!strcmp(fn_name, entry->buf)) {
		pr_debug("initcall %s blacklisted\n", fn_name);
		kfree(fn_name);
		return true;
	}
}

Blacklisted initcalls are stored in the blacklisted_initcalls list, which is filled from the Linux kernel command line (the initcall_blacklist= parameter) during early kernel initialization.
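A rough user-space model of this blacklist check is shown below. It assumes a comma-separated list such as the value of initcall_blacklist=, keeps the names in a fixed-size array instead of the kernel's linked list, and identifies initcalls by an explicit name field, because user-space C has no portable way to turn a function pointer into its symbol name. All demo_* names and struct named_initcall are hypothetical; the kernel's own parser differs in detail.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* Hypothetical: pair each callback with the name used for matching. */
struct named_initcall {
    const char *name;
    initcall_t fn;
};

#define DEMO_MAX_BLACKLIST 8
#define DEMO_MAX_NAME      64

/* Fixed-size array instead of the kernel's blacklisted_initcalls list. */
static char demo_blacklist[DEMO_MAX_BLACKLIST][DEMO_MAX_NAME];
static int demo_blacklist_len;

/* Split a comma-separated list such as "foo_init,bar_init". */
static void demo_parse_blacklist(const char *val)
{
    char buf[256];

    snprintf(buf, sizeof(buf), "%s", val);     /* strtok modifies its input */
    for (char *tok = strtok(buf, ",");
         tok != NULL && demo_blacklist_len < DEMO_MAX_BLACKLIST;
         tok = strtok(NULL, ","))
        snprintf(demo_blacklist[demo_blacklist_len++], DEMO_MAX_NAME, "%s", tok);
}

static bool demo_initcall_blacklisted(const char *fn_name)
{
    for (int i = 0; i < demo_blacklist_len; i++)
        if (!strcmp(fn_name, demo_blacklist[i]))
            return true;
    return false;
}

/* First step of do_one_initcall(): refuse to run a blacklisted initcall. */
static int demo_do_one_initcall(const struct named_initcall *ic)
{
    if (demo_initcall_blacklisted(ic->name))
        return -EPERM;
    return ic->fn();
}

/* A trivial callback for trying the functions above. */
static int demo_sample_initcall(void)
{
    return 0;
}
```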

Once the blacklist has been checked, the next part of the code calls the initcall itself:

if (initcall_debug)
	ret = do_one_initcall_debug(fn);
else
	ret = fn();

Depending on the value of the initcall_debug variable, the initcall is called either through the do_one_initcall_debug function or directly via fn(). The initcall_debug variable is defined in the same source code file:

bool initcall_debug;

It enables printing of additional information to the kernel log buffer. Its value can be set from the kernel command line with the initcall_debug parameter. As the documentation of the Linux kernel command line says:

initcall_debug	[KNL] Trace initcalls as they are executed.  Useful
                      for working out where the kernel is dying during
                      startup.

And that's true. The do_one_initcall_debug function does the same as the do_one_initcall function, i.e. it calls the given initcall, but it also prints some information about the call, such as the PID of the currently running task, the return value and how long the initcall took:

static int __init_or_module do_one_initcall_debug(initcall_t fn)
{
	ktime_t calltime, delta, rettime;
	unsigned long long duration;
	int ret;

	printk(KERN_DEBUG "calling  %pF @ %i\n", fn, task_pid_nr(current));
	calltime = ktime_get();
	ret = fn();
	rettime = ktime_get();
	delta = ktime_sub(rettime, calltime);
	duration = (unsigned long long) ktime_to_ns(delta) >> 10;
	printk(KERN_DEBUG "initcall %pF returned %d after %lld usecs\n",
		 fn, ret, duration);

	return ret;
}
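Note that the duration is computed with >> 10, which divides by 1024 rather than 1000, so the reported "usecs" value is about 2.3% lower than the exact number of microseconds. The user-space sketch below reproduces that arithmetic and wraps a call the same way; it assumes POSIX clock_gettime with CLOCK_MONOTONIC in place of ktime_get, and the demo_* names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

typedef int (*initcall_t)(void);

/* Same arithmetic as do_one_initcall_debug(): ">> 10" divides by 1024. */
static unsigned long long demo_approx_usecs(unsigned long long ns)
{
    return ns >> 10;
}

/* Stand-in for ktime_get(): monotonic time in nanoseconds. */
static long long demo_now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* User-space analogue of do_one_initcall_debug(): time the call and log it. */
static int demo_do_one_initcall_debug(const char *name, initcall_t fn)
{
    printf("calling  %s\n", name);
    long long calltime = demo_now_ns();
    int ret = fn();
    long long delta = demo_now_ns() - calltime;

    printf("initcall %s returned %d after %llu usecs\n",
           name, ret, demo_approx_usecs((unsigned long long)delta));
    return ret;
}
```

For example, an initcall that takes exactly 1 ms (1000000 ns) is reported as 976 usecs.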

Once the initcall has been called by do_one_initcall or do_one_initcall_debug, two checks follow at the end of the do_one_initcall function. The first one compares the current preemption counter with the value saved before the call. If they differ, the initcall made unbalanced __preempt_count_add and __preempt_count_sub calls, so we append the preemption imbalance string to the message buffer and restore the saved value of the counter:

if (preempt_count() != count) {
	sprintf(msgbuf, "preemption imbalance ");
	preempt_count_set(count);
}

This message is printed later by the WARN call at the end of the function. The last check looks at the state of local IRQs: if they are disabled, we append the disabled interrupts string to the message buffer and enable IRQs on the current processor, so that an initcall which disabled IRQs and did not re-enable them cannot leave them off:

if (irqs_disabled()) {
	strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf));
	local_irq_enable();
}
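The two checks can be modelled in user space with plain variables. In this sketch demo_preempt_count and demo_irqs_disabled are hypothetical stand-ins for preempt_count() and the local IRQ state, and buggy_initcall is a deliberately misbehaving callback; the snapshot, compare and restore logic follows the kernel code above, while the blacklist and debug paths are left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* Hypothetical stand-ins for preempt_count() and the local IRQ state. */
static int demo_preempt_count;
static bool demo_irqs_disabled;
static char demo_last_warning[64];

static int demo_do_one_initcall(initcall_t fn)
{
    int count = demo_preempt_count;            /* snapshot before the call */
    char msgbuf[64] = "";
    int ret = fn();

    if (demo_preempt_count != count) {
        strncat(msgbuf, "preemption imbalance ", sizeof(msgbuf) - strlen(msgbuf) - 1);
        demo_preempt_count = count;             /* like preempt_count_set(count) */
    }
    if (demo_irqs_disabled) {
        strncat(msgbuf, "disabled interrupts ", sizeof(msgbuf) - strlen(msgbuf) - 1);
        demo_irqs_disabled = false;             /* like local_irq_enable() */
    }
    if (msgbuf[0] != '\0') {                    /* like WARN(msgbuf[0], ...) */
        fprintf(stderr, "initcall returned with %s\n", msgbuf);
        snprintf(demo_last_warning, sizeof(demo_last_warning), "%s", msgbuf);
    }
    return ret;
}

/* Buggy callback: leaves the counter raised and the "IRQs" disabled. */
static int buggy_initcall(void)
{
    demo_preempt_count++;
    demo_irqs_disabled = true;
    return 0;
}
```

Running demo_do_one_initcall(buggy_initcall) returns 0, restores the counter to 0, re-enables the simulated IRQs and records "preemption imbalance disabled interrupts " as the warning text.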

That's all. In this way the Linux kernel initializes many subsystems in the correct order. We now know what the initcall mechanism in the Linux kernel is. In this part we covered the main, general portion of the mechanism, but we left out some important concepts. Let's take a short look at them.

First, we skipped one initcall level: rootfs. You can find the definition of rootfs_initcall in the include/linux/init.h header file, along with all the similar macros we saw in this part:

#define rootfs_initcall(fn)		__define_initcall(fn, rootfs)

As the name suggests, its main purpose is to hold callbacks related to the rootfs. Besides that, it is useful for other initialization that must run after the filesystem level but before device-related initialization. For example, the initramfs is unpacked by the populate_rootfs function from the init/initramfs.c source code file:

rootfs_initcall(populate_rootfs);

This is where the familiar message in the kernel log comes from:

[    0.199960] Unpacking initramfs...

Besides the rootfs_initcall level, there are additional console_initcall, security_initcall and other secondary initcall levels. The last thing we skipped is the set of *_initcall_sync levels. Almost every *_initcall macro we have seen in this part has a companion macro with the _sync suffix:

#define core_initcall_sync(fn)		__define_initcall(fn, 1s)
#define postcore_initcall_sync(fn)	__define_initcall(fn, 2s)
#define arch_initcall_sync(fn)		__define_initcall(fn, 3s)
#define subsys_initcall_sync(fn)	__define_initcall(fn, 4s)
#define fs_initcall_sync(fn)		__define_initcall(fn, 5s)
#define device_initcall_sync(fn)	__define_initcall(fn, 6s)
#define late_initcall_sync(fn)		__define_initcall(fn, 7s)

Each _sync section is placed right after the section of the corresponding regular level, so its callbacks run only after all regular initcalls of that level have completed.
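Following the generated vmlinux.lds excerpt above, where level 0 pulls in .initcall0.init followed by .initcall0s.init, the sketch below builds the resulting section order, assuming every INIT_CALLS_LEVEL entry expands the same way (the early section, which comes first, is omitted). It shows that each _sync section directly follows its regular level and that the rootfs sections land between level 5 and level 6. Since initcall_levels has no rootfs entry, rootfs initcalls therefore fall inside the range between __initcall5_start and __initcall6_start and run as part of the fs level. demo_levels, demo_build_order and demo_position are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/* Levels in the order they appear in INIT_CALLS (early omitted). */
static const char *const demo_levels[] = {
    "0", "1", "2", "3", "4", "5", "rootfs", "6", "7",
};

#define DEMO_N_LEVELS (sizeof(demo_levels) / sizeof(demo_levels[0]))

/* Each level contributes a regular section followed by its _sync section. */
static char demo_order[2 * DEMO_N_LEVELS][32];

static void demo_build_order(void)
{
    for (size_t i = 0; i < DEMO_N_LEVELS; i++) {
        snprintf(demo_order[2 * i], sizeof(demo_order[0]),
                 ".initcall%s.init", demo_levels[i]);
        snprintf(demo_order[2 * i + 1], sizeof(demo_order[0]),
                 ".initcall%ss.init", demo_levels[i]);
    }
}

/* Position of a section in the layout, or -1 if it is not there. */
static int demo_position(const char *section)
{
    for (size_t i = 0; i < 2 * DEMO_N_LEVELS; i++)
        if (!strcmp(demo_order[i], section))
            return (int)i;
    return -1;
}
```

After demo_build_order(), demo_position(".initcallrootfs.init") is 12, right after ".initcall5s.init" at 11 and before ".initcall6.init" at 14.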

That's all.

Conclusion

In this part we saw an important mechanism of the Linux kernel that allows it to call functions which depend on the current state of the kernel during its initialization, in a well-defined order.

If you have questions or suggestions, feel free to ping me on Twitter 0xAX, drop me an email or just create an issue.

Please note that English is not my first language and I am really sorry for any inconvenience. If you find any mistakes please send me a PR to linux-insides.