Rafael Garcia Ruiz
March 03, 2023
Back in September, I joined Collabora as an intern to work on Rust-related projects for six months. It's been a great experience and I would recommend it to anyone who is passionate about FOSS and wants to work in an inclusive environment with very skilled and supportive people!
Over the internship, my goal was to continue to "oxidise" bmaptool, a tool for creating the block map (bmap) for a file and copying files using the block map. Oxidising means rewriting some code, in this case a project written in Python, in Rust. Projects are oxidised for many reasons, the main one usually being memory safety. Mozilla has an interesting article on Oxidation if you would like to learn more about the general reasons why Rust is so great.
Of course, rewriting a project in Rust is pointless without good reasons: if a solution to a problem already exists, there should be no need to rewrite it :-). With this project, the main goal was to remove Python dependencies and instead create a statically linked binary which should save disk space and in future allow the bmap sparse file format to be used in other Rust projects. Another reason for the project was for me to gain experience in some more advanced Rust topics and have some fun during the process!
We decided to call the new project bmap-rs and host the source code on GitHub under a permissive open-source licence to allow the wider community to benefit from the project and make contributions easier.
bmaptool is a generic tool for flashing sparse images to a block device or file using a custom file format called bmap. The idea is that large files, like raw system image files, can be copied or flashed a lot faster and more reliably with bmaptool than with traditional tools, like dd or cp, because the bmap file format allows you to only flash the used parts of the system image and also verifies each written block. The tool's main use is to flash system images into block devices, but it can also be used for general image flashing purposes. The feature we were mainly interested in was the copy subcommand:
bmaptool copy <input> <output>
The input parameter can be a local file or a remote URL; the output parameter can be a local file or a block device. We wanted bmap-rs to be able to do that particular job in the short term, even though having other features of bmaptool would also be useful later. The goal, then, was to be able to execute the following command, with the same functionality as the original project:
bmap-rs copy <input> <output>
As the project had already been started by a colleague, the first step for me was to import the existing project into GitHub and prepare it as an open source project: set up a CI pipeline to make sure things build, add a licence file, a correct README and everything else needed for it to be open to contributions. From that moment on I handled bug reports, feature requests, pull requests and dependency updates as one of the team. From this I learnt about the never-ending responsibilities of maintaining open-source software!
The development roadmap of the internship was as follows:
Parse the XML .bmap format from a local file
Stream image contents from an HTTP/HTTPS source
Load the bmap file from the same HTTP/HTTPS source as the image
Stream a gzip compressed image file over HTTPS and verify the written checksums against the parsed bmap
Write the image contents to a normal file
Write the image contents to a block device (e.g. SD card or USB flash drive)
But what exactly is a bmap file and why is it so useful for this purpose? Well, it is an XML file which contains a list of mapped areas plus some additional information about the file it was created from. For example:
SHA256 checksum of the bmap file itself
SHA256 checksum of the mapped areas
the original file size
amount of mapped data
With a checksum for each mapped area, we can verify, once each part is copied to the destination, that the data has been copied correctly and is not corrupted. Having the data mapped lets us avoid reading or copying "holes", meaning runs of zeroes, so we only copy the parts of the image which are used. Here's an example of a bmap file.
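To illustrate how the mapped areas let a copy skip holes, here is a minimal sketch in Rust using only the standard library. It is not the bmap-rs implementation: the BlockRange type and the copy_mapped function are hypothetical names made up for this example, and the per-range checksum verification is only indicated by a comment, since hashing would need an extra crate.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// An inclusive range of mapped blocks, as listed in a bmap file.
/// Hypothetical type for this sketch, not the bmap-parser API.
#[derive(Debug, Clone, Copy)]
pub struct BlockRange {
    pub first: u64,
    pub last: u64,
}

/// Copy only the mapped ranges from `input` to `output`, seeking over
/// the holes in between. Returns the number of bytes copied.
pub fn copy_mapped<R, W>(
    input: &mut R,
    output: &mut W,
    block_size: u64,
    image_size: u64,
    ranges: &[BlockRange],
) -> io::Result<u64>
where
    R: Read + Seek,
    W: Write + Seek,
{
    let mut copied = 0;
    for range in ranges {
        let start = range.first * block_size;
        // The last block of the image may be partial, so clamp to the image size.
        let end = ((range.last + 1) * block_size).min(image_size);
        let len = end.saturating_sub(start);

        // Jump to the same offset on both sides; unmapped blocks are never read.
        input.seek(SeekFrom::Start(start))?;
        output.seek(SeekFrom::Start(start))?;

        // A real tool would also hash these bytes here and compare the
        // result with the checksum stored for this range in the bmap file.
        let n = io::copy(&mut input.by_ref().take(len), &mut *output)?;
        if n != len {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "short read in mapped range",
            ));
        }
        copied += n;
    }
    Ok(copied)
}
```

Because the function only asks for Read + Seek and Write + Seek, the same code can be exercised with in-memory cursors in tests and with files or block devices in practice.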
Although the initial project already contained the algorithm for local copies, it wasn't able to write to block devices or copy remote files. Allowing copying to block devices turned out to be a simple fix. Allowing remote input, on the other hand, was a bigger issue. To make an HTTP request, the code had to be able to wait for the response, so we needed to create an asynchronous context for that feature. At the same time, it also needed to fetch the bmap file remotely and accept a URL as the input argument on the command line.
Other enhancements have been made along the way, like the implementation of a progress bar and the ability to copy an image without using a bmap file, which can be useful in cases where you have an image without holes. These features are not required, but they improve the experience of using the tool and bring it closer to fully replacing bmaptool. Finally, we published the crates on Crates.io, Rust's package registry. The crates bmap-rs and bmap-parser have been published and are now ready for anyone to use and try out!
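As a rough sketch of the command-line side of this, the input argument can be classified as a remote URL or a local path before deciding whether the asynchronous HTTP path is needed. The ImageSource enum and classify_input function below are hypothetical names for this example and are not taken from the bmap-rs source; checking the URL scheme by prefix is also an assumption about how such a check might be done.

```rust
use std::path::PathBuf;

/// Where the image comes from. Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ImageSource {
    /// An HTTP or HTTPS URL that has to be streamed.
    Remote(String),
    /// A path on the local filesystem.
    Local(PathBuf),
}

/// Decide how to treat the <input> argument based on its scheme prefix.
pub fn classify_input(arg: &str) -> ImageSource {
    if arg.starts_with("http://") || arg.starts_with("https://") {
        ImageSource::Remote(arg.to_string())
    } else {
        ImageSource::Local(PathBuf::from(arg))
    }
}
```

Only the Remote variant would need the asynchronous context; a Local input can keep using the existing synchronous copy path.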
The intended use for bmap-rs was to integrate it into the tests which run on real hardware in Collabora's LAVA lab. Some tests boot a minimal Linux system from a network filesystem, then use bmaptool to flash an image to the target block device, for instance the SD card or eMMC. The device can then reboot into the flashed filesystem and run tests on the image.
Using bmaptool in this way increases the size of the NFS image since it includes the Python runtime and other libraries. In comparison, using Rust allows us to generate a small statically-linked binary to do the same job and even opens the door to further improvements in future, for instance booting to an EFI binary to complete the flashing rather than booting a complete Linux system.
Once it's integrated with LAVA, it should improve efficiency across all projects that use bmap files, which will benefit other teams at Collabora. Knowing that is the most rewarding part of this achievement.
There are still some features of bmaptool that could be interesting for bmap-rs to have, like the create command for generating the bmap of a file, and support for some more decompression algorithms.
During the internship, I've had to constantly learn new skills and challenge myself. For the first time I've acted as the maintainer of a project, keeping it up to date and managing it with an open-first, open-source philosophy. I've learned to use Rust from scratch and ended up using some advanced features of the language, including async. Participating in the development of bmap-rs and acting as its maintainer during this time has allowed me to improve my Rust skills and my overall abilities and confidence as an open source contributor.
This experience has also helped me to gain knowledge about the profession itself. I have a clearer idea of what kind of engineer I want to become, which areas I intend to investigate more and which abilities I want to acquire in the future. I see more clearly than ever that I want my work to be oriented towards Open Source, so it can be reused and shared, helping many others. Likewise, I'm looking forward to finishing my degree and rejoining the team better equipped to make an impact.
I'm really grateful to my mentor Christopher Obbard and also Gustavo Noronha. Their involvement and support during this experience have helped me a lot. I also really appreciate how Sjoerd Simons and Ryan Gonzalez have reviewed my code with their Rust experience and knowledge. I'm sure all I've learned during this internship is going to help me make a better impact with my future contributions to open source, and seeing how my work can be useful to other people really gave me a sense of fulfilment.
Comments (2)
Mikko:
Mar 04, 2023 at 02:14 PM
"Here can be a local or remote file and can be a file or a block device" seems to be missing < input > and output tags.
Christopher Obbard:
Mar 06, 2023 at 01:51 PM
Thank you, we have updated the blog post.