Oxidizing bmap-tools: rewriting a Python project in Rust

Oxidizing bmap-tools: rewriting a Python project in Rust

Rafael Garcia Ruiz
March 03, 2023

Share this post:

Reading time:

Back in September, I joined Collabora as an intern to work on Rust-related projects for six months. It's been a great experience and I would recommend it to anyone who is passionate about FOSS and wants to work in an inclusive environment with very skilled and supporting people!

Over the internship, my goal was to continue to "oxidise" bmaptool, a tool for creating the block map (bmap) for a file and copying files using the block map. Oxidising means to rewrite some code, in this case a project written in Python, into Rust code. Usually a project is oxidised into Rust because of many reasons, the main usually being memory safety. Mozilla has an interesting article on Oxidation if you would like to learn more about general reasons of why Rust is so great.

Of course, rewriting a project in Rust is pointless without good reasons: if a solution to a problem already exists, there should be no need to rewrite it :-). With this project, the main goal was to remove Python dependencies and instead create a statically linked binary which should save disk space and in future allow the bmap sparse file format to be used in other Rust projects. Another reason for the project was for me to gain experience in some more advanced Rust topics and have some fun during the process!

We decided to call the new project bmap-rs and host the source code on GitHub under a permissive open-source licence to allow the wider community to benefit from the project and make contributions easier.

More than just copying

bmaptool is a generic tool for flashing sparse images to a block device or file using a custom file format called bmap. The idea is that large files, like raw system image files, can be copied or flashed a lot faster and more reliably with bmaptool than with traditional tools, like dd or cp because the bmap file format allows you to only flash the used parts of the system image and also verifies each written block. The tool's main use is to flash system images into block devices, but it can also be used for general image flashing purposes. The feature we were mainly interested in was the copy subcommand:

bmaptool copy <input> <output>

The input parameter can be a local file or a remote URL, the output parameter can be a local file or a block device. We wanted bmap-rs to be able to execute that particular job in the short term, even though having other features of bmaptool would also be useful later. Then the goal was to be able to execute the following command, with the same functionality as the original project:

bmap-rs copy <input> <output>

As the project had already been started by a colleague, the first step for me would be to import the existing project into GitHub and prepare it as an open source project: setup a CI pipeline to make sure things build, a licence file, correct README and everything else needed for it to be open to contributions. For that moment on I handled bug reports, feature requests, pull-requests and updating dependencies as one of the team. From this I learnt about the never-ending responsibilities of maintaining open-source software!

The development roadmap of the internship was as follows:

Parse the XML .bmap format from a local file
Stream image contents from an HTTP/HTTPS source
Load the bmap file from the same HTTP/HTTPS source as the image
Stream a gzip compressed image file over HTTPS and verify the written checksums against the parsed bmap
Write the image contents to a normal file
Write the image contents to a block device (e.g. SD card or USB flash drive)

Bmap file

But what exactly is a bmap file and why is it so useful for this purpose? Well, it is an XML file which contains a list of mapped areas plus some additional information about the file it was created from. For example:

SHA256 checksum of the bmap file itself
SHA256 checksum of the mapped areas
the original file size
amount of mapped data

Having each mapped area's checksum, once each part is copied to the destination we can check that the information has been copied correctly and not corrupt. Having the data mapped allows to avoid reading or copping "holes", meaning a bunch of zeroes, which allows us to only copy the parts of the image which are used. Here's an example of a bmap file.

Remote input

Even if there was already an initial project containing the copying algorithm for local copy, it wasn't able to write into block devices or copy remote files. Allow copying into block devices turned out to be a simple fix. But on the other hand, allowing remote input was a bigger issue. To allow an HTTP request from the code, it had to be able to wait for the response so we needed to create an asynchronous context for that feature. At the same time, it also needed to fetch the bmap file remotely and accept a URL as input argument on the command line.

Other enhancements have been made along the way, like the implementation of a progress bar and the ability to copy an image without using a bmap file, which can be useful in cases where you have an image without holes. These features are not required but would most likely improve the experience of using it and allow bmaptools to be fully replaces. Finally, we published the crates on Crates.io, Rust's package registry. The crates bmap-rs and bmap-parser have been published and are now ready for anyone to use them and try them out!

What's next?

The intended context for this to work was to integrate it into the tests which run on real hardware in Collabora's LAVA lab. Some tests boot a minimal Linux system from a network filesystem, then use bmaptool to flash an image to the target block device, for instance the SD card or eMMC. The device can then reboot into the flashed filesystem and run tests on the image.

Using bmaptool in this way increases the size of the NFS image since it includes the Python runtime and other libraries. In comparison, using Rust allows us to generate a small statically-linked binary to do the same and even offers the ability to make further improvements in future, for instance booting to a EFI binary to complete the flashing rather than booting a complete Linux system.

Once it's integrated with LAVA it will result in an efficiency enhancement across all projects that use bmap files, resulting in a benefit for other teams in Collabora. Knowing that is the most rewarding feeling about this achievement.

There are still some features of bmaptools that could be interesting for bmap-rs to have like the create command for generating the bmap of a file and implementing some more decompression algorithms.

Wrapping Up

During the internship, I've had to constantly learn new skills and challenge myself. For the first time I've acted as the maintainer of a project, keeping it up to date and managing it using an open-source open-first philosophy. I've learned to use Rust from scratch and ended using some advanced features of the language including async features. Participating in the development of bmap-rs and acting as it's maintainer during this time has allowed me to improve on my Rust skills and overall open source contributing abilities and confidence.

This experience has also helped me to gain knowledge about the profession itself. I feel more oriented towards what kind of engineer I want to become, which areas do I intend to investigate more and which abilities do I want to obtain in the future. I see clearer than ever that I want my work to be oriented towards Open Source, so it can be reused and shared, helping many others. Likewise, I'm looking forward to finishing my degree and rejoin the team more equipped to make a better impact.

I'm really grateful to my mentor Christopher Obbard and also Gustavo Noronha. Their implication and support during this experience has helped me a lot. I appreciate a lot how Sjoerd Simons and Ryan Gonzalez has review my code with their Rust language experience and knowledge. I'm sure all I've learned during this internship is going to help me make a better impact with my future contributions to open source and seeing how my work can be useful to other people really gave me a sense of fulfilment.

Exploring Rust for Vulkan drivers, part 1

Tracing stateless video hardware decoding in V4L2

KernelCI now testing Linux Rust code

Exploring Rust for Vulkan drivers, part 1

Tracing stateless video hardware decoding in V4L2

KernelCI now testing Linux Rust code

Comments (2)

Mikko:
Mar 04, 2023 at 02:14 PM

"Here can be a local or remote file and can be a file or a block device" seems to be missing < input > and output tags.

Reply to this comment

Reply to this comment
1. Christopher Obbard:
  Mar 06, 2023 at 01:51 PM
  
  Thank you, we have updated the blog post.
  
  Reply to this comment
  
  Reply to this comment

Add a Comment

Search the newsroom

Latest Blog Posts

PipeWire workshop 2025: Updates on video transport, Rust efforts, TSN networking, and Bluetooth support

03/07/2025

As part of the activities Embedded Recipes in Nice, France, Collabora hosted a PipeWire workshop/hackfest, an opportunity for attendees…

Coccinelle for Rust progress report

25/06/2025

In collaboration with Inria, the French Institute for Research in Computer Science and Automation, Tathagata Roy shares the progress made…

Linux Media Summit 2025 recap

23/06/2025

Last month in Nice, active media developers came together for the annual Linux Media Summit to exchange insights and tackle ongoing challenges…

Constructor acquires, destructor releases

09/06/2025

In this final article based on Matt Godbolt's talk on making APIs easy to use and hard to misuse, I will discuss locking, an area where…

What if C++ had decades to learn?

21/05/2025

In this second article of a three-part series, I look at how Matt Godbolt uses modern C++ features to try to protect against misusing an…

Unleashing gst-python-ml: Python-powered ML analytics for GStreamer pipelines

12/05/2025

Powerful video analytics pipelines are easy to make when you're well-equipped. Combining GStreamer and Machine Learning frameworks are the…

About Collabora

Whether writing a line of code or shaping a longer-term strategic software development plan, we'll help you navigate the ever-evolving world of Open Source.

한국의 국기 한국어 버전의 Collabora.com 보기