Mylène Josserand
September 25, 2020
In the first part of this blog post series on Linux kernel initcalls, we looked at their purpose, their usage, and ways to debug them (using initcall_debug or ftrace). In this second part, we'll go deeper into the implementation of initcalls, with a look at the colorful ___define_initcall() macro, the rootfs initcall, and how modules can be executed.
If you haven't already read part 1, I highly recommend reading it before continuing.
Now, let's begin. Here's a reminder of what we learned in part 1:
Each initcall macro ends up in __define_initcall() with an ID specific to its level: 2 in case of a postcore_initcall(). This is what we will now be focusing on.

If we use our dummy example, the postcore_initcall() leads to a first __define_initcall() (with an ID of 2), which leads to another ___define_initcall() taking 3 arguments. Here is a summary of where our previous article left off:
Now, let's expand this final ___define_initcall() macro line by line:
This ___define_initcall() macro takes the following parameters:
- fn: the function to register (mypostcore_init in our case),
- id: the ID of the initcall level (2 for postcore),
- __sec: the base name of the section (.initcall2).
All these parameters are used to create an initcall_t entry named after them: in our example, __initcall_mypostcore_init2.
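Both names come from the preprocessor's token pasting (##) and stringification (#). The small sketch below rebuilds them for our example. INITCALL_VAR_NAME() and INITCALL_SECTION() are hypothetical helpers following the same pattern, not kernel macros (the kernel builds the section name from its __sec argument instead).

```c
#include <assert.h>
#include <string.h>

/* Two-step stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical helpers mimicking the naming done by ___define_initcall(). */
#define INITCALL_VAR_NAME(fn, id) __initcall_##fn##id
#define INITCALL_SECTION(id)      ".initcall" #id ".init"

/* "__initcall_mypostcore_init2": the name of the function pointer. */
static const char *var_name = STR(INITCALL_VAR_NAME(mypostcore_init, 2));

/* ".initcall2.init": the section shared by every postcore initcall. */
static const char *sec_name = INITCALL_SECTION(2);

/* Returns 1 when both generated names match the ones seen in objdump. */
int names_match(void)
{
	return strcmp(var_name, "__initcall_mypostcore_init2") == 0 &&
	       strcmp(sec_name, ".initcall2.init") == 0;
}
```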
The __attribute__ and __section__ keywords let us name an object-file section: .initcall2.init in case of a postcore initcall. All postcore initcalls use this same section, so their entries are all grouped in .initcall2.init.
Using objdump confirms that. We can look at a freshly built kernel object file and search for our function's name:
$ objdump -t vmlinux.o | grep postcore_init2
0000007c l O .initcall2.init 00000004 __initcall_mypostcore_init2
We have a section .initcall2.init referring to our entry __initcall_mypostcore_init2, which points to our postcore dummy example. To summarize, the __define_initcall() macro stores a pointer to our function in an object-file section specific to the initcall level (derived from its ID).
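The same idea can be reproduced outside the kernel. This sketch assumes GCC or Clang with GNU ld or lld on an ELF target such as Linux; my_define_initcall() and my_postcore_initcall() are hypothetical stand-ins for ___define_initcall() and postcore_initcall(). Because a section name containing dots cannot appear in a C identifier, it uses a section called my_initcall2, for which these linkers automatically define the __start_my_initcall2 and __stop_my_initcall2 boundary symbols. The kernel does not rely on those symbols; it uses its own linker script instead.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Hypothetical userspace stand-in for ___define_initcall(): a static
 * function pointer named my_initcall_<fn><id>, kept by the compiler
 * ("used") and placed in the section "my_initcall<id>". */
#define my_define_initcall(fn, id)                                \
	static initcall_t my_initcall_##fn##id                     \
	__attribute__((used, section("my_initcall" #id))) = fn

/* Hypothetical wrapper mirroring postcore_initcall() => ID 2. */
#define my_postcore_initcall(fn) my_define_initcall(fn, 2)

static int mypostcore_init(void) { return 0; }
static int myotherpostcore_init(void) { return 0; }

my_postcore_initcall(mypostcore_init);
my_postcore_initcall(myotherpostcore_init);

/* Defined by the linker: start and one-past-the-end of my_initcall2. */
extern initcall_t __start_my_initcall2[];
extern initcall_t __stop_my_initcall2[];

/* Number of entries registered at "level 2". */
size_t count_level2(void)
{
	return (size_t)(__stop_my_initcall2 - __start_my_initcall2);
}

/* Calls every level-2 entry and returns how many returned 0. */
size_t run_level2(void)
{
	size_t ok = 0;

	for (initcall_t *fn = __start_my_initcall2; fn < __stop_my_initcall2; fn++)
		if ((*fn)() == 0)
			ok++;
	return ok;
}
```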
If we look at all existing initcall2 entries (i.e. postcore initcalls), we can see that the function pointers are stored at consecutive addresses, 4 bytes apart:
$ objdump -t vmlinux.o | grep .initcall2.init
00000000 l O .initcall2.init 00000004 __initcall_atomic_pool_init2
00000004 l O .initcall2.init 00000004 __initcall_mvebu_soc_device2
00000008 l O .initcall2.init 00000004 __initcall_coherency_late_init2
0000000c l O .initcall2.init 00000004 __initcall_imx_mmdc_init2
00000010 l O .initcall2.init 00000004 __initcall_omap_hwmod_setup_all2
[...]
0000007c l O .initcall2.init 00000004 __initcall_mypostcore_init2
00000080 l O .initcall2.init 00000004 __initcall_rockchip_grf_init2
[...]
The .initcall2.init section contains the addresses of all registered postcore initcalls. Their order is decided at build time, depending on the order of the objects in the Makefiles.
Level-initcalls ordering: Makefile!
Let's run an example to prove that, within one level, initcalls are ordered according to the Makefile and not in any other way (alphabetical order, ...).
We create two dummy drivers: mydriver.c, containing the mydriver_func() initcall, and myotherdriver.c, containing the myotherdriver_func() initcall. Let's put these two drivers in the RTC subsystem (of course, they could be anywhere else):
$ cat drivers/rtc/mydriver.c
#include <linux/init.h>

static int __init mydriver_func(void)
{
	return 0;
}
postcore_initcall(mydriver_func);

$ cat drivers/rtc/myotherdriver.c
#include <linux/init.h>

static int __init myotherdriver_func(void)
{
	return 0;
}
postcore_initcall(myotherdriver_func);
Then, we update the RTC Makefile so that mydriver is compiled first, followed by myotherdriver:
$ git diff drivers/rtc/Makefile
[...]
-rtc-core-y := class.o interface.o
+rtc-core-y := class.o interface.o mydriver.o myotherdriver.o
Let's look at the resulting symbols in vmlinux.o:
$ objdump -t vmlinux.o | grep "driver_func"
0008c3c8 l F .init.text 00000008 mydriver_func
000000c8 l O .initcall2.init 00000004 __initcall_mydriver_func2
0008c3d0 l F .init.text 00000008 myotherdriver_func
000000cc l O .initcall2.init 00000004 __initcall_myotherdriver_func2
As you can see, each entry has its own address within the .initcall2.init section: 000000c8 for __initcall_mydriver_func2 and 000000cc for __initcall_myotherdriver_func2. The entry for __initcall_mydriver_func2 comes before the one for __initcall_myotherdriver_func2, matching the Makefile order.
The initcall trace events confirm the execution order:
# cat /sys/kernel/debug/tracing/trace | grep driver_func
swapper/0-1 [000] .... 0.059546: initcall_start: func=mydriver_func+0x0/0x8
swapper/0-1 [000] .... 0.059556: initcall_finish: func=mydriver_func+0x0/0x8 ret=0
swapper/0-1 [000] .... 0.059571: initcall_start: func=myotherdriver_func+0x0/0x8
swapper/0-1 [000] .... 0.059581: initcall_finish: func=myotherdriver_func+0x0/0x8 ret=0
mydriver_func is executed before myotherdriver_func.
Now, let's swap the two objects in the Makefile:
$ git diff drivers/rtc/Makefile
[...]
-rtc-core-y := class.o interface.o
+rtc-core-y := class.o interface.o myotherdriver.o mydriver.o
The order of the entries in vmlinux.o is also inverted:
$ objdump -t vmlinux.o | grep "driver_func"
0008c3c8 l F .init.text 00000008 myotherdriver_func
000000c8 l O .initcall2.init 00000004 __initcall_myotherdriver_func2
0008c3d0 l F .init.text 00000008 mydriver_func
000000cc l O .initcall2.init 00000004 __initcall_mydriver_func2
The execution order is inverted as well:
# cat /sys/kernel/debug/tracing/trace | grep driver_func
swapper/0-1 [000] .... 0.059520: initcall_start: func=myotherdriver_func+0x0/0x8
swapper/0-1 [000] .... 0.059530: initcall_finish: func=myotherdriver_func+0x0/0x8 ret=0
swapper/0-1 [000] .... 0.059545: initcall_start: func=mydriver_func+0x0/0x8
swapper/0-1 [000] .... 0.059555: initcall_finish: func=mydriver_func+0x0/0x8 ret=0
So far, we know that declaring a function as an initcall creates, in each driver's object file, an entry in a section specific to the initcall level (postcore_initcall => .initcall2.init), and that all initcalls of a particular level are ordered in the final kernel image according to the Makefile ordering.
But how does the kernel order the initcall levels relative to each other? When is a postcore initcall executed compared to the other initcalls? How is it handled? Let's find out...
If you remember, each type of initcall has an ID, and this ID is the key to the ordering. From the above part, we know that each type of initcall gets a different section name according to its ID: .initcall1.init, .initcall2.init, etc.
The main implementation of initcall ordering lives in init/main.c. Yes, really, you are looking at init/main.c in the Linux kernel's code!
initcall_levels[] is an array with one entry per level, each pointing to the beginning of that level's initcalls: it contains the different __initcall<n>_start symbols, followed by __initcall_end.
extern initcall_entry_t __initcall_start[];
extern initcall_entry_t __initcall0_start[];
extern initcall_entry_t __initcall1_start[];
extern initcall_entry_t __initcall2_start[];
extern initcall_entry_t __initcall3_start[];
extern initcall_entry_t __initcall4_start[];
extern initcall_entry_t __initcall5_start[];
extern initcall_entry_t __initcall6_start[];
extern initcall_entry_t __initcall7_start[];
extern initcall_entry_t __initcall_end[];

static initcall_entry_t *initcall_levels[] __initdata = {
	__initcall0_start,
	__initcall1_start,
	__initcall2_start,
	__initcall3_start,
	__initcall4_start,
	__initcall5_start,
	__initcall6_start,
	__initcall7_start,
	__initcall_end,
};
We already know that initcalls are a mechanism to place chosen functions in specific object-file sections, which are iterated over at boot time. To do that, the kernel must somehow know where they actually are. This is achieved with the linker, using a script that creates the __initcall<n>_start symbols (include/asm-generic/vmlinux.lds.h):
#define INIT_CALLS_LEVEL(level)                                 \
		__initcall##level##_start = .;                  \
		KEEP(*(.initcall##level##.init))                \
		KEEP(*(.initcall##level##s.init))               \
After compilation, the resulting linker script (arch/arm/kernel/vmlinux.lds) looks like:
.init.data : AT(ADDR(.init.data) - 0)
	__initcall_start = .;
	KEEP(*(.initcallearly.init))
	__initcall0_start = .;
	KEEP(*(.initcall0.init))
	__initcall1_start = .;
	KEEP(*(.initcall1.init))
	__initcall2_start = .;
	KEEP(*(.initcall2.init))
	__initcall3_start = .;
	KEEP(*(.initcall3.init))
	__initcall4_start = .;
	KEEP(*(.initcall4.init))
	__initcall5_start = .;
	KEEP(*(.initcall5.init))
	__initcallrootfs_start = .;
	KEEP(*(.initcallrootfs.init))
	__initcall6_start = .;
	KEEP(*(.initcall6.init))
	__initcall7_start = .;
	KEEP(*(.initcall7.init))
	__initcall_end = .
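Without being a linker script expert, we can see that __initcall2_start is set to the current location (.) right before the .initcall2.init sections are placed, so it points to the first address of the .initcall2.init entries. And since the next level's symbol follows immediately, __initcall3_start also marks the end of level 2.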
The main function processing all the initcall levels is do_initcalls(), in init/main.c:
static void __init do_basic_setup(void)
{
	[...]
	do_initcalls();
}

static void __init do_initcalls(void)
{
	int level;
	[...]
	for (level = 0; level < ARRAY_SIZE(initcall_levels) - 1; level++) {
		[...]
		do_initcall_level(level, command_line);
	}
}
This function handles all the levels from this array. A quick word about the command_line parameter: it is only a copy of the usual command line, which can contain parameters for modules. This function calls another function, do_initcall_level(), whose (simplified) code is the following:
static void __init do_initcall_level(int level, char *command_line)
{
	initcall_entry_t *fn;
	[...]
	for (fn = initcall_levels[level]; fn < initcall_levels[level + 1]; fn++)
		do_one_initcall(initcall_from_entry(fn));
}
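The interplay between initcall_levels[], do_initcalls() and do_initcall_level() can be simulated in userspace. In this sketch, a plain array init_data[] stands in for the region filled by the linker script, and levels[] stands in for initcall_levels[] (only levels 0 to 2, plus an end marker). The dummy functions and the run_order[] log are hypothetical, and entries are plain function pointers, so no initcall_from_entry() conversion is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Records the order in which the dummy initcalls run. */
static int run_order[8];
static size_t nrun;

/* Dummy initcalls; the logged value identifies each one. */
int mypure_init(void)        { run_order[nrun++] = 0; return 0; }
int mycore_init(void)        { run_order[nrun++] = 1; return 0; }
int mydriver_func(void)      { run_order[nrun++] = 2; return 0; }
int myotherdriver_func(void) { run_order[nrun++] = 3; return 0; }

/* Stand-in for the linked region: each level's entries are contiguous
 * and the levels are laid out one after another. */
static initcall_t init_data[] = {
	mypure_init,                       /* level 0 */
	mycore_init,                       /* level 1 */
	mydriver_func, myotherdriver_func, /* level 2 */
};

/* Stand-in for initcall_levels[]: level n ends where level n + 1 starts. */
static initcall_t *levels[] = {
	init_data + 0, /* plays __initcall0_start */
	init_data + 1, /* plays __initcall1_start */
	init_data + 2, /* plays __initcall2_start */
	init_data + 4, /* plays __initcall_end (one past the last entry) */
};

static void do_initcall_level(size_t level)
{
	for (initcall_t *fn = levels[level]; fn < levels[level + 1]; fn++)
		(*fn)();
}

void do_initcalls(void)
{
	/* The last element only marks the end, hence the "- 1". */
	for (size_t level = 0; level < ARRAY_SIZE(levels) - 1; level++)
		do_initcall_level(level);
}
```

After do_initcalls(), run_order holds 0, 1, 2, 3: levels run in ascending order, and entries within a level run in their storage order.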
This function, do_initcall_level(), calls all the initcalls of a particular level through do_one_initcall(). Its for-loop walks over the initcall_entry_t entries, i.e. the function pointers stored sequentially in the level's section, and passes each of them to do_one_initcall(). In other words, the first value of fn is the address given by __initcall2_start (which corresponds to the first .initcall2.init entry linked), the entries being organized according to their order in the Makefiles. The loop then iterates over all the following addresses (fn++) until it reaches the start of the next level. Let's look at the .initcall2.init section again:
$ objdump -t vmlinux.o | grep .initcall2.init
00000000 l O .initcall2.init 00000004 __initcall_atomic_pool_init2
00000004 l O .initcall2.init 00000004 __initcall_mvebu_soc_device2
00000008 l O .initcall2.init 00000004 __initcall_coherency_late_init2
0000000c l O .initcall2.init 00000004 __initcall_imx_mmdc_init2
00000010 l O .initcall2.init 00000004 __initcall_omap_hwmod_setup_all2
[...]
0000007c l O .initcall2.init 00000004 __initcall_mypostcore_init2
00000080 l O .initcall2.init 00000004 __initcall_rockchip_grf_init2
[...]
In the above example, the values of fn would be:
- fn equals __initcall2_start, which corresponds to the address of .initcall2.init = 00000000 => __initcall_atomic_pool_init2
- fn equals the next address of .initcall2.init = 00000004 => __initcall_mvebu_soc_device2
- fn equals the next address of .initcall2.init = 00000008 => __initcall_coherency_late_init2
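The step between these addresses comes from C pointer arithmetic: fn++ advances by sizeof(initcall_t) bytes, not by one byte. On the 32-bit ARM build used for the objdump output a pointer is 4 bytes, which gives 00000000, 00000004, 00000008; on a 64-bit host, this sketch gives 8-byte steps. The array and dummy functions below are hypothetical stand-ins named after the objdump entries. (Some 64-bit architectures store 32-bit relative offsets instead of pointers, which is why the kernel loop goes through initcall_from_entry(); this sketch ignores that case.)

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Dummy functions named after the first postcore initcalls above. */
int atomic_pool_init(void)    { return 0; }
int mvebu_soc_device(void)    { return 0; }
int coherency_late_init(void) { return 0; }

/* Stand-in for .initcall2.init: pointers stored back to back. */
static initcall_t initcall2_init[] = {
	atomic_pool_init, mvebu_soc_device, coherency_late_init,
};

/* Byte offset, from the start of the section, of the value held by fn
 * after i iterations of "fn++" (i must not exceed the entry count). */
size_t fn_offset(size_t i)
{
	initcall_t *start = initcall2_init;
	initcall_t *fn = start;

	while (i--)
		fn++; /* moves by sizeof(initcall_t) bytes */
	return (size_t)((char *)fn - (char *)start);
}
```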
Each of these values is then handed to do_one_initcall() (simplified):
int __init_or_module do_one_initcall(initcall_t fn)
{
	int ret;
	[...]
	do_trace_initcall_start(fn);
	ret = fn();
	do_trace_initcall_finish(fn, ret);
	[...]
	return ret;
}
The important points of the code above are that do_one_initcall() takes an initcall_t, which corresponds to the function created by the user, and that it simply calls it (ret = fn()) between the two trace hooks producing the initcall_start and initcall_finish events seen earlier.

To summarize, initcall_levels is an array listing __initcall<n>_start for all initcall levels. Each of them is the first address of the .initcall<n>.init entries for that level. Take again the example of postcore_initcall: the first .initcall2.init entry linked (depending on the Makefile ordering) has the same address as the one pointed to by __initcall2_start, so it is the first function executed through do_one_initcall(). Then, the for-loop in do_initcall_level() moves to the next function pointer's address (thanks to fn++), and so on until it reaches the end of the initcall2 entries. Finally, do_initcalls() moves on to the next level, i.e. initcall3.
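The shape of do_one_initcall() can be mimicked in userspace to see what it guarantees: the callback runs exactly once, bracketed by the two trace hooks, and its return value is passed back. In this sketch, trace_initcall_start() and trace_initcall_finish() are hypothetical counters standing in for do_trace_initcall_start() and do_trace_initcall_finish(), and everything the kernel hides behind [...] is left out.

```c
#include <assert.h>
#include <errno.h>

typedef int (*initcall_t)(void);

/* Hypothetical stand-ins for the trace events: they only count calls
 * and remember the last return value. */
static unsigned int nstart, nfinish;
static int last_ret;

static void trace_initcall_start(initcall_t fn)
{
	(void)fn;
	nstart++;
}

static void trace_initcall_finish(initcall_t fn, int ret)
{
	(void)fn;
	nfinish++;
	last_ret = ret;
}

/* Same shape as the simplified do_one_initcall(): trace, call, trace. */
int do_one_initcall(initcall_t fn)
{
	int ret;

	trace_initcall_start(fn);
	ret = fn();
	trace_initcall_finish(fn, ret);
	return ret;
}

/* Dummy initcalls: one succeeds, one fails as if a device were missing. */
int ok_init(void)      { return 0; }
int failing_init(void) { return -ENODEV; }
```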
If you look at all the initcall definitions, everything is based on an ID: 2 in case of postcore_initcall(), but the string rootfs in the case of rootfs_initcall(). Let's have a look at this particular initcall.
Looking at the init folder, we can see that it is mainly used to mount a rootfs, either from an initramfs or from a block device:
$ git grep rootfs_initcall init/
init/initramfs.c:rootfs_initcall(populate_rootfs);
init/noinitramfs.c:rootfs_initcall(default_rootfs);
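According to what we have seen previously, we get an object-file section with the corresponding function pointer, depending on whether initial RAM filesystem support is enabled or not in our kernel's configuration. Here, with initramfs support enabled, the entry is __initcall_populate_rootfsrootfs (the function name followed by the rootfs ID):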
$ objdump -t vmlinux.o | grep .initcallrootfs
00000000 l d .initcallrootfs.init 00000000 .initcallrootfs.init
00000000 l .initcallrootfs.init 00000000 $d
00000000 l O .initcallrootfs.init 00000004 __initcall_populate_rootfsrootfs
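Note that initcall_levels[] has no rootfs entry, while the linker script places __initcallrootfs_start between the level 5 and level 6 sections. Since do_initcall_level() runs everything from __initcall5_start up to __initcall6_start, rootfs initcalls end up running at the end of level 5, before any level 6 initcall. The sketch below simulates that layout; myfs_init(), mydevice_init() and the trace[] buffer are hypothetical, populate_rootfs() is only a dummy with the kernel's name, and levels[] here covers level 5, level 6 and the end marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*initcall_t)(void);

/* One character per executed initcall, to observe the order. */
static char trace[8];
static size_t ntrace;

/* Dummy initcalls (hypothetical, except for the populate_rootfs name). */
int myfs_init(void)       { trace[ntrace++] = '5'; return 0; }
int populate_rootfs(void) { trace[ntrace++] = 'R'; return 0; }
int mydevice_init(void)   { trace[ntrace++] = '6'; return 0; }

/* Stand-in for the linked layout: .initcall5.init, then
 * .initcallrootfs.init, then .initcall6.init. */
static initcall_t init_data[] = { myfs_init, populate_rootfs, mydevice_init };

/* Level boundaries as in initcall_levels[]: no rootfs boundary exists. */
static initcall_t *levels[] = {
	init_data + 0, /* plays __initcall5_start */
	init_data + 2, /* plays __initcall6_start */
	init_data + 3, /* plays __initcall_end */
};

/* Runs the entries between levels[idx] and levels[idx + 1] and returns
 * how many were executed. */
size_t run_level(size_t idx)
{
	size_t n = 0;

	for (initcall_t *fn = levels[idx]; fn < levels[idx + 1]; fn++, n++)
		(*fn)();
	return n;
}
```

Running the level-5 slice executes two entries ("5R"), so populate_rootfs() runs as part of level 5.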
As you may remember from the previous part of this blog post series, module_init() makes a module's init function run as a device_initcall when the module is built in. In the case of a loadable module, the function is executed when the module is inserted. The code is the following:
#define early_initcall(fn)		module_init(fn)
#define core_initcall(fn)		module_init(fn)
#define postcore_initcall(fn)		module_init(fn)
#define arch_initcall(fn)		module_init(fn)
#define subsys_initcall(fn)		module_init(fn)
#define fs_initcall(fn)			module_init(fn)
#define rootfs_initcall(fn)		module_init(fn)
#define device_initcall(fn)		module_init(fn)
#define late_initcall(fn)		module_init(fn)
#define console_initcall(fn)		module_init(fn)

/* Each module must use one module_init(). */
#define module_init(initfn)					\
	static inline initcall_t __maybe_unused __inittest(void)	\
	{ return initfn; }					\
	int init_module(void) __copy(initfn) __attribute__((alias(#initfn)));
We have already seen the case of a built-in module (i.e. the #ifndef MODULE branch, in part 1), so let's quickly look at the case of a loadable module. All the initcall macros are replaced by one single definition: module_init(). This macro creates init_module as an alias to our function. For modules, additional code is generated to store this init_module alias in the .init field of struct module. The do_init_module() function is called at insertion time, through a syscall. If you look closer, this function uses a function that we already talked about:
static noinline int do_init_module(struct module *mod)
{
	[...]
	/* Start the module */
	if (mod->init != NULL)
		ret = do_one_initcall(mod->init);
	[...]
This function uses our previous do_one_initcall() function, with mod->init as the initcall function to execute! Thanks to additional code generated by modpost, .init = init_module, and init_module is an alias to our function.
To sum up, when loading a module, the syscall handling the module's insertion calls the function passed to module_init() as an initcall. To keep this generic, an alias (init_module) points to this specific function and is stored in the init field of the module's structure. In other words, when you load a module, the syscall executes do_init_module(), which runs our function through the existing do_one_initcall().
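The loadable-module path can be sketched in userspace too, assuming GCC or Clang on an ELF target (for the alias attribute). my_init_module, struct my_module and this_module are hypothetical names playing the roles of init_module, struct module and the modpost-generated module structure; the __copy() attribute and the __inittest() type check from module_init() are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

static int initialized;

/* The function a driver would pass to module_init(). */
int mymodule_init(void)
{
	initialized = 1;
	return 0;
}

/* What module_init(mymodule_init) boils down to for a loadable module:
 * a second symbol for the same function. The kernel calls it
 * init_module; my_init_module is a hypothetical name. */
int my_init_module(void) __attribute__((alias("mymodule_init")));

/* Hypothetical, heavily trimmed stand-in for struct module. */
struct my_module {
	const char *name;
	initcall_t init;
};

/* Plays the role of the modpost-generated ".init = init_module". */
struct my_module this_module = {
	.name = "mymodule",
	.init = my_init_module,
};

/* Minimal do_one_initcall(): just call the function. */
static int do_one_initcall(initcall_t fn)
{
	return fn();
}

/* Same shape as do_init_module(): run mod->init when there is one. */
int do_init_module(struct my_module *mod)
{
	int ret = 0;

	if (mod->init != NULL)
		ret = do_one_initcall(mod->init);
	return ret;
}
```

Calling do_init_module(&this_module) ends up in mymodule_init() even though the structure only knows the alias.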
To avoid repeating all the mechanisms we have seen around the implementation of initcalls, I will conclude with a drawing summing up how they interact.
And that's it! We have covered a lot with these two articles about initcalls. I hope you enjoyed reading this as much as I enjoyed writing it. And it is so cool to look at the main.c of the Linux kernel, right?!