Deborah Brouwer
October 08, 2024
Mesa is an open source 3D graphics library that implements a wide range of APIs including OpenGL, OpenGL ES, Vulkan, OpenCL, and hardware-acceleration interfaces like VDPAU and VA-API. It provides hardware drivers for many vendors: AMD, ARM, Broadcom, Imagination, Intel, Qualcomm, NVIDIA, and Vivante/VeriSilicon. It supports layered drivers to run APIs on top of other APIs and in virtualized environments. With software drivers like Lavapipe and LLVMpipe, it can run graphics on CPUs without dedicated GPU hardware.
Not only is Mesa large and complex, but development happens very quickly. Stable versions with bug fixes are released about every two weeks and development versions are released every three months with thousands of new commits. Mandatory and reliable pre-merge testing is essential to Mesa’s rapid development model. Its continuous integration (CI) system has been so successful that it now provides scripts and setup for DRM-CI to carry out pre-merge testing for the DRM subsystem of the Linux kernel.
So, what do we mean by pre-merge testing? Mesa developers work in parallel in their own forks of the Mesa repository hosted by Freedesktop. Once a change is ready for community review, a developer opens a merge request against the main branch of Mesa. The affected code owners will discuss, acknowledge, and/or review the changes. Then, after the review is complete, anyone with a "Developer" or higher role in the project can initiate a merge by assigning the merge request to Mesa's marge-bot.
Marge-bot will ensure that, before any changes are merged, the new code passes CI testing. This pre-merge testing is available on demand, 24/7, and provides access to real hardware distributed globally on CI farms. Given the critical role of CI in Mesa's development, supporting it is a large community effort involving many companies and developers. The members of Collabora’s Mesa CI team (Antonio Ospite, Daniel Stone, Deborah Brouwer, Guilherme Alcarde Gallo, Sergi Blanch Torne, and Vignesh Raman) are dedicated to this community effort of keeping Mesa CI running.
Let's take a closer look at how pre-merge testing is implemented for Mesa. Say, for example, a developer has forked the Mesa repository and made a change to Iris, Intel’s Gallium driver. Assuming the developer has configured their fork to use the same CI/CD settings as Mesa itself, pushing a commit will automatically create a new pipeline in their repository. Opening the pipeline tab may be disappointing, however, as the pipeline sits empty and grey. The developer has encountered the first line of defense protecting Mesa’s CI infrastructure: developers need permission to run Mesa’s CI pipelines. Access to the infrastructure is controlled by membership in the CI-OK group. Running a pipeline essentially gives developers free access to run any code changes on any of Freedesktop's GitLab runners that are registered with the Mesa project, so these resources need to be protected from abuse.
However, once the developer has permission to run a pipeline, pushing changes to a branch will still not start any CI activity. All of the jobs sit dormant waiting for manual action by the developer. This is another mechanism designed to protect Mesa’s CI resources since not every code change, particularly during development, needs to immediately trigger CI jobs. Developers may push code changes to share and collaborate during development, or just to save their work against local failures, and it would be a waste of resources to run pipelines automatically on development forks. If the developer does want to initiate some of the CI jobs on their fork, the GitLab user interface provides a manual button to initiate a job. Here is an example of the debian/x86_64_build-base job running after the developer starts it manually:
This manual action works fine for jobs that don’t depend on other jobs to run, but a closer look at the pipeline shows that many of the jobs simply can’t be started manually. These jobs depend on a series of other jobs completing successfully before they become available. Often the most interesting jobs are driver-specific and depend on multiple other jobs completing before they can be run. If our hypothetical developer is working on a change to the Iris driver, they might want to make sure that they haven’t caused any regressions on devices running the Gemini Lake platform. The iris-glk-deqp job will run dEQP quality and conformance tests on an HP Chromebook located in Collabora’s LAVA lab, but first all of these prerequisite jobs need to be run:
Since it is very dull to click through the user-interface and wait for each of these jobs to complete, developers can use the command-line tool ci_run_n_monitor to automatically start one or more test targets and just those jobs necessary to support those target jobs. A simple command might be:
    bash ./bin/ci/ci_run_n_monitor.sh \
        --pipeline-url https://gitlab.freedesktop.org/dbrouwer/mesa/-/pipelines/1260557 \
        --target "iris-glk-deqp"
This command generates a nicely pared down pipeline with just the essential jobs running for targeted tests:
Pipelines on developer forks include many jobs unrelated to the change, and none of them start automatically, so the ci_run_n_monitor tool makes it easy for developers to run just the jobs that they want to test.
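To make the idea of "just those jobs necessary to support the target" concrete, here is a minimal sketch of how a target's prerequisites could be collected from a dependency graph. It is not the implementation of ci_run_n_monitor. The graph, the function name jobs_to_run, and every job name other than iris-glk-deqp are hypothetical and chosen only for illustration.

```python
# Hypothetical dependency graph: each job maps to the jobs it needs first.
# Only "iris-glk-deqp" comes from the article; the other names are made up.
NEEDS = {
    "iris-glk-deqp": ["example-mesa-build", "example-rootfs"],
    "example-mesa-build": ["example-build-container"],
    "example-rootfs": ["example-build-container"],
    "example-build-container": [],
    "example-unrelated-test": ["example-mesa-build"],
}


def jobs_to_run(targets, needs):
    """Return the targets plus every job they transitively depend on."""
    required = set()
    stack = list(targets)
    while stack:
        job = stack.pop()
        if job in required:
            continue  # already handled, avoids repeated work and cycles
        required.add(job)
        stack.extend(needs.get(job, []))
    return required


if __name__ == "__main__":
    for job in sorted(jobs_to_run(["iris-glk-deqp"], NEEDS)):
        print(job)
```

In this toy graph the unrelated test job is left out, which mirrors the pared-down pipeline described above: only the target and the chain of jobs beneath it are started.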
Once a developer opens a merge request against the main Mesa repository, the CI system is more selective about which jobs to add to the pipeline because all of these jobs will need to pass before a proposed change is merged into Mesa. While a developer's fork for an Intel driver will indiscriminately include about 250 jobs from the entire Mesa project, the developer's pre-merge pipeline includes only about 85 jobs for Intel hardware and the build environment that supports it. Here is the pre-merge pipeline for the same changes as above:
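The article does not say how the pre-merge pipeline decides which jobs to include. One common approach in CI systems is to tie each job to the source paths it cares about, and the sketch below illustrates that idea only. The mapping, the path patterns, the function name select_jobs, and the job names other than iris-glk-deqp are all assumptions, not Mesa's actual rules.

```python
from fnmatch import fnmatch

# Hypothetical mapping from job name to the path patterns that should trigger it.
# These patterns are illustrative and are NOT taken from Mesa's CI configuration.
JOB_PATHS = {
    "iris-glk-deqp": ["src/gallium/drivers/iris/*", "src/intel/*"],
    "example-amd-test": ["src/gallium/drivers/radeonsi/*", "src/amd/*"],
    "example-build": ["*"],  # a build job that every change needs
}


def select_jobs(changed_files, job_paths=JOB_PATHS):
    """Return the sorted names of jobs whose patterns match any changed file."""
    selected = []
    for job, patterns in job_paths.items():
        if any(fnmatch(path, pat) for path in changed_files for pat in patterns):
            selected.append(job)
    return sorted(selected)


if __name__ == "__main__":
    print(select_jobs(["src/gallium/drivers/iris/iris_state.c"]))
```

With a change that touches only the Iris driver, the sketch keeps the Intel job and the shared build job and drops the job for other hardware, which is the kind of narrowing the pre-merge pipeline performs.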
Selecting only the essential jobs is necessary for efficient development, but it's not sufficient. Since the goal is to complete the CI testing quickly enough to keep pace with Mesa development, merge requests can't back up waiting for CI to run. Even with a reduced job set such as the one above, the sheer number of tests would quickly overwhelm CI infrastructure. For example, just the single iris-glk-deqp job runs dEQP and Khronos Conformance tests for OpenGL ES 2.0, 3.0, 3.1, and OpenGL 4.6 for a total of over 130 thousand discrete tests.
One method of reducing the pre-merge runtime is to run only a fraction of the total tests otherwise available in caselists. The iris-glk-deqp job sends only one of every six possible tests for OpenGL ES 2.0; one of every eight tests for OpenGL ES 3.0 and 3.1; and every other test for the remaining standards. Mesa CI also uses the deqp-runner tool to parallelize tests across a single system. Deqp-runner itself accepts fractional arguments, and in the case of iris-glk-deqp it runs every other test, ultimately winnowing down the run time to about 6 minutes for 14 thousand tests. Furthermore, as long as sufficient CI farm resources are available, most jobs can run in parallel across different machines. So while iris-glk-deqp is running, so are at least 15 other jobs working simultaneously.
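The fractional selection described above can be sketched in a few lines. This is an illustration of the idea rather than deqp-runner's code: the function name take_fraction, its start parameter, and the test names are hypothetical, and the example assumes that the job-level fraction and the runner-level fraction stack.

```python
def take_fraction(caselist, fraction, start=1):
    """Keep one test out of every `fraction`, beginning with the `start`-th (1-based)."""
    if fraction < 1 or not 1 <= start <= fraction:
        raise ValueError("fraction must be >= 1 and start must be in 1..fraction")
    return caselist[start - 1::fraction]


if __name__ == "__main__":
    # Hypothetical caselist of 120 test names.
    tests = [f"example.test.{i}" for i in range(120)]

    per_job = take_fraction(tests, 6)       # one of every six tests
    per_run = take_fraction(per_job, 2)     # then every other remaining test

    print(len(tests), len(per_job), len(per_run))
```

Under the stacking assumption, one in six followed by every other test keeps one test in twelve, so each additional fraction multiplies the reduction instead of adding to it.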
The run time for a pre-merge pipeline varies depending on the scope of the proposed changes and the number of jobs that need to be run. Changes that affect all of Mesa or revise the underlying structure of the CI system will take longer than usual, whereas small changes to drivers may finish quickly. In general, the pre-merge pipelines run quickly; for example, in the last week, the average run time for a pre-merge pipeline was 34 minutes, plus or minus 15 minutes.
While a more traditional open source development model may rely on a hierarchy of maintainers to review and manually accept code changes trusting that developers will fix what they have broken, the power of pre-merge testing is to distribute and, to some extent, parallelize the process of contributing to the code base. Mesa CI allows developers to work continuously in tandem, relying on objective testing to protect against significant regressions, and avoiding the bottleneck of a manual merge process. Pre-merge testing in Mesa's CI system ensures that every contribution is rigorously tested before merging. Mesa CI works silently in the background, keeping everything running smoothly across various hardware configurations.
Here we have just scratched the surface of the Mesa CI system. A whole series of posts is planned to describe the complexity of how Mesa uses templates, containers, compression, and stored images at each stage of the CI pipeline. The Collabora team is always working to improve Mesa CI, continuously monitoring performance, adding new hardware, and improving tools to make the CI system easier to use and more efficient. We look forward to sharing more with you in the future.