Laura Nao
August 01, 2024
Over the past year, we at Collabora have embarked on a journey to improve the Linux kernel integration for everyone. A key part of that work is enhancing the quality of tests by developing new upstream tests and refining automated testing processes.
A significant portion of testing in the Linux kernel remains manual or only minimally automated, with many subsystems still lacking automated test coverage. This raised some critical questions: How can we alleviate the maintainers' burden through continuous integration (CI) testing? How can we better support developers by detecting and reporting regressions?
Driven by these questions, we started an effort to make CI systems more trustworthy and actively engage the upstream community in the testing process. Ultimately, focusing on test quality is key: good tests lead to reliable reports, which are essential for a strong CI process.
Collabora has been heavily involved in KernelCI over the years, working on its infrastructure, creating new tests, and operating a dedicated lab that tests specific hardware platforms for our clients. During this time, we found many issues related to the quality of the tests. With the recent launch of the new KernelCI infrastructure, we saw a chance to make improvements and focus on the reliability of the tests.
From our experience, tests that are poorly maintained or depend on unstable ABI often lead to false positives. Another major issue in kernel testing is fragmentation, where each subsystem operates its own CI. While different subsystems have specific CI requirements, essential aspects can be unified by using common, in-tree tests like kselftests instead of standalone ones.
To address these challenges, we developed a plan to improve the quality of tests in the new KernelCI system. Rather than simply enabling more and more tests, we concentrated on making sure that the tests meet some important quality criteria:
By focusing on tests that can be merged into the kernel tree, we grow a base of reliable tests that can be used for multiple CI systems.
Our initial focus was on the bootrr test, a sanity checker for boards under automated test on LAVA, which was generating many false positives in the legacy KernelCI system. This test relies on static descriptions of the hardware and drivers of the device under test (DUT) and therefore requires regular maintenance, especially since driver names can change over time. One goal of bootrr is to check whether the peripherals on the DUT are correctly bound to their drivers. This can largely be done by using information from the device tree and ACPI tables, instead of manually describing the hardware. You can find out more about this test in our previous blog post.
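To make the idea concrete, here is a minimal Python sketch of a driver-binding check that reads the hardware description from sysfs rather than from a hand-written list. It is an illustration only, not the test we contributed upstream. The function name `find_unbound_dt_devices` and the choice of the platform bus as the default directory are assumptions made for this example. It relies on two sysfs conventions: a device created from a device tree node exposes an `of_node` link, and a device that is bound to a driver exposes a `driver` link.

```python
import os


def find_unbound_dt_devices(bus_devices_dir="/sys/bus/platform/devices"):
    """Return names of devices that have a device tree node but no driver.

    Hypothetical helper for illustration; the default path is an assumption.
    """
    unbound = []
    for name in sorted(os.listdir(bus_devices_dir)):
        dev = os.path.join(bus_devices_dir, name)
        # 'of_node' is present when the device was created from a DT node.
        has_dt_node = os.path.exists(os.path.join(dev, "of_node"))
        # 'driver' is present only while a driver is bound to the device.
        has_driver = os.path.exists(os.path.join(dev, "driver"))
        if has_dt_node and not has_driver:
            unbound.append(name)
    return unbound
```

Calling `find_unbound_dt_devices()` on a booted board lists candidates for a closer look. Not every entry is a failure, since some nodes legitimately have no driver, so a real test needs a way to filter those out.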
This first test set the stage for creating more tests with the same approach: using generic kernel interfaces to provide extensive test coverage and reduce maintenance over time.
You can find a detailed summary of the tests we developed at the end of this post.
In addition to introducing new tests, we also focused on evaluating the quality of existing kselftests. Some upstream tests have been around for a while but still lack features they need in order to run entirely in a non-interactive system. The suspend/resume test within the cpufreq selftest is one example: after suspend has been invoked, the test relies on an external wakeup event to resume. By adding RTC wakeup alarm support to the test, we can ensure it works in a CI environment without needing manual intervention or prior configuration. Our goal is to ensure the existing tests run well when integrated into a CI, to document all configuration dependencies, and to make their output conform to the KTAP format.
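As a rough illustration of these two points, the Python sketch below arms an RTC wake alarm through sysfs and formats results as KTAP lines. It is not the code from the cpufreq selftest. The function names `arm_rtc_wakealarm` and `ktap_lines`, the default `rtc0` device, and the test names in the usage note are assumptions made for this example. The sketch relies on the `wakealarm` sysfs attribute accepting a value with a leading `+` as a number of seconds in the future, and on writing `0` to clear a pending alarm.

```python
def arm_rtc_wakealarm(seconds, wakealarm_path="/sys/class/rtc/rtc0/wakealarm"):
    """Ask the RTC to raise a wakeup event `seconds` from now.

    Hypothetical helper; the default path assumes the first RTC device.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Clear any alarm that is already pending before setting a new one.
    with open(wakealarm_path, "w") as f:
        f.write("0")
    # A leading '+' means "this many seconds in the future".
    value = f"+{seconds}"
    with open(wakealarm_path, "w") as f:
        f.write(value)
    return value


def ktap_lines(results):
    """Format (name, passed) pairs as a flat KTAP report."""
    lines = ["KTAP version 1", f"1..{len(results)}"]
    for number, (name, passed) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {number} {name}")
    return lines
```

A non-interactive test could call `arm_rtc_wakealarm(30)` as root, trigger the suspend, and then print something like `"\n".join(ktap_lines([("suspend_resume", True)]))` once the system has resumed on its own.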
All this work has already shown positive results by identifying failures and regressions. We will keep building on this to make testing in KernelCI even more reliable and effective.
You can follow our progress, including patch series and regression reports, here.
KernelCI hosts a bi-weekly call on Thursdays to discuss improvements to existing upstream tests, the development of new tests to increase kernel testing coverage, and the enablement of these tests in KernelCI. Minutes from the meetings are sent to the KernelCI mailing list, kernelci@lists.linux.dev (see the notes). Reach out to us if you want to join the discussion or talk more about any of these topics. We look forward to working with the community to improve upstream tests and expand coverage to more areas of the kernel.
Join us at LPC in September to talk about generic device testing and boot time testing:
For more details, check out the links below:
Here's an overview of the tests we've been developing and contributing to upstream. All these tests have been enabled in the new KernelCI system and have shown their value by identifying failures and regressions in the mainline and linux-next kernels.