Reducing the size of a Rust GStreamer plugin

Guillaume Desmottes
April 28, 2020

A common complaint heard about Rust is the size of the binaries it produces. There are various reasons why Rust binaries are generally bigger than ones produced with lower-level languages such as C. The main one is that Cargo, Rust's package manager and build tool, produces statically linked binaries by default. While larger binaries are generally not much of an issue for desktop or server applications, they may become more of a problem on embedded systems where storage and/or memory may be very limited.

GStreamer is used extensively at Collabora to help our clients to build embedded multimedia solutions. With Rust gaining traction among the GStreamer community as an alternative to C to write GStreamer applications and plugins, we began wondering if the size of such Rust plugins would be a problem for embedded systems, and what could be done to reduce sizes as much as possible.

Inspired by this Tiny Rocket analysis and the Minimizing Rust Binary Size repository, here are the different strategies we tried to reduce the size of a minimal Rust GStreamer plugin.

Environment

All the builds were done on Fedora 32 using the latest stable Rust with the stable-x86_64-unknown-linux-gnu toolchain.

$ rustc --version
rustc 1.42.0 (b8cedc004 2020-03-09)

We built the gst-plugin-tutorial plugin from gst-plugins-rs for this experiment. In order to make this plugin as minimal as possible we removed all the elements except rsidentity, a simpler version of the identity element. We also removed the existing profile settings to build with the default cargo settings.

Default size

Let's start by looking at the size of the plugin after a normal build:

$ cargo build
$ ls -l target/debug/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 32248640  1 avril 12:09 target/debug/libgstrstutorial.so

So 31M for a trivial plugin not doing much, that's quite a lot indeed! But this is a dev build which is not meant to be used in production. Let's retry using a release build:

$ cargo build --release
$ ls -l target/release/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 2740472  1 avril 12:14 target/release/libgstrstutorial.so

Switching from a dev build to a release one reduced the size by a factor of almost 12!

Let's keep those metrics as reference:

build    modifications  size (bytes)  size (human)  % change
dev      none           32248640      31M           0%
release  none           2740472       2.7M          0%

Stripping the binary

Cargo does not strip binaries and there is currently no setting to do it. We could use cargo-strip but it's easier to just strip manually in such a simple example:

$ strip target/debug/libgstrstutorial.so
$ ls -l target/debug/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 604512  1 avril 12:19 target/debug/libgstrstutorial.so
$ strip target/release/libgstrstutorial.so
$ ls -l target/release/libgstrstutorial.so
-rwxrwxr-x. 2 cassidy cassidy 305504  1 avril 12:19 target/release/libgstrstutorial.so

As the plugin is statically built, the symbol information of all the crates (dependencies) used by the plugin ended up in our binary. Stripping removed it and so saved us a lot of space.

build    modifications  size (bytes)  size (human)  % change
dev      none           32248640      31M           0%
dev      stripped       604512        591K          -98%
release  none           2740472       2.7M          0%
release  stripped       305504        299K          -88%
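
The "% change" column is relative to the unmodified build of the same profile. As a minimal sketch of that arithmetic (the function names are ours, chosen for illustration), the figures can be recomputed from the byte counts reported by ls:

```rust
/// Relative size change in percent, rounded to one decimal place.
/// Negative values mean the binary got smaller than the baseline.
fn percent_change(baseline_bytes: u64, new_bytes: u64) -> f64 {
    let ratio = new_bytes as f64 / baseline_bytes as f64;
    ((ratio - 1.0) * 1000.0).round() / 10.0
}

/// How many times smaller `new_bytes` is compared to `baseline_bytes`.
fn reduction_factor(baseline_bytes: u64, new_bytes: u64) -> f64 {
    baseline_bytes as f64 / new_bytes as f64
}
```

Calling percent_change(32248640, 604512) and percent_change(2740472, 305504) gives about -98.1 and -88.9, which the table shows as -98% and -88%. reduction_factor(32248640, 2740472) gives the dev to release ratio mentioned earlier.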


These numbers look much better. We already have something that should be usable on most systems, but we can still save some space by tweaking Cargo's build flags. All these settings are set using Cargo's profile sections.

From this point on we'll consider only the size of release builds as that's what actually matters when distributing software in production. So we'll set our build flags in the profile.release section of our Cargo manifest.

Use LLVM's full LTO

By using the LTO setting and reducing the number of compilation units we can ask the compiler to generate smaller binaries at the cost of a longer compile time. This is done by editing our Cargo.toml and setting lto and codegen-units in the release profile:

[profile.release]
lto = true
codegen-units = 1

These changes reduced the plugin size quite a lot, but once stripped we notice that we actually gained only 44K compared to the stripped release build.

build    modifications   size (bytes)  size (human)  % change
release  none            2740472       2.7M          0%
release  lto             888560        868K          -67.6%
release  lto + stripped  260368        255K          -90.5%

Optimize for size

Cargo offers different optimization levels. Some are better suited for debugging while others are meant to achieve better performance. There is also the 'z' level, which optimizes for size:

opt-level = "z"

We gained a few extra kilobytes:

build    modifications               size (bytes)  size (human)  % change
release  none                        2740472       2.7M          0%
release  lto                         888560        868K          -67.6%
release  lto + opt-level             855096        836K          -68.8%
release  lto + opt-level + stripped  231696        227K          -91.5%

Abort on Panic

By default, Rust unwinds the stack when panicking and can provide a nice backtrace. This can be quite handy when debugging but the unwinding code consumes some space which may not be useful in production builds. Aborting on panic! instead can save us the size of that unwinding code in our plugin.

panic = 'abort'

Disabling this feature saved us some extra bytes as well:

build    modifications                             size (bytes)  size (human)  % change
release  none                                      2740472       2.7M          0%
release  lto                                       888560        868K          -67.6%
release  lto + opt-level                           855096        836K          -68.8%
release  lto + opt-level + panic abort             792136        774K          -71.2%
release  lto + opt-level + panic abort + stripped  207024        203K          -92.4%

It's important to note that this change will not only remove the panic stacktrace but also affect the behavior of GStreamer Rust plugins.

gstreamer-rs provides a macro converting panics to proper GStreamer error messages that can be handled by the application.

When such a panic occurs the element will be marked as unusable but the application will continue running and have a chance to gracefully handle the problem.

By setting panic = 'abort' this whole system is disabled and the application process will abort right away.
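
To illustrate why the panic strategy matters here, the following is only a simplified sketch of the general mechanism, assuming panic recovery is built on std::panic::catch_unwind. run_guarded is a hypothetical helper written for this example and is not the gstreamer-rs macro.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: runs `f` and turns a panic into an `Err` carrying
/// the panic message, instead of letting it propagate to the caller.
/// This only works when the crate is built with the default
/// `panic = 'unwind'` strategy.
fn run_guarded<F: FnOnce() -> i32>(f: F) -> Result<i32, String> {
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // The payload is usually a &str or a String, depending on how
        // the panic message was built.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

With the default settings, run_guarded(|| panic!("boom")) returns an error that the caller can report. With panic = 'abort' there is no unwinding to catch, so the process aborts before catch_unwind can return anything.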

Reducing even further

At this point we have used all the options available with the stable Rust version. To reduce the size even further, we would have to switch to Rust nightly, the unstable version of the compiler. One interesting option would be to manually build libstd so it can benefit from our optimized build settings.

Extreme solutions such as not using libstd are not really an option here as glib-rs and gstreamer-rs are heavily using the Rust standard library.

Rust size overhead

So we managed to reduce the plugin size to 203K which is a 92% improvement from the default release build. Here are the final settings used:

[profile.release]
lto = true
codegen-units = 1
opt-level = "z"
panic = 'abort'

We reached a size reasonable enough to be used in lots of embedded use cases. But how does it compare to a C implementation? We could have used the existing identity element as a comparison but it's bundled in the coreelements plugin and provides more features than rsidentity.

For the sake of the experiment, we re-implemented rsidentity in C using the exact same features and APIs. It weighs 48K, reduced to 15K once stripped.

-rwxrwxr-x. 1 cassidy cassidy 15K  1 avril 15:50 libgstidentitylight.so

So the Rust size overhead seems to be around 190K for this simple plugin. That's not unexpected as the Rust version statically links Rust's standard library and contains all the bindings code for GLib and GStreamer.

We can use cargo bloat to list the biggest dependencies. Note that those numbers are for a pre-stripped build:

$ cargo bloat --release --crates
 9.8%  58.2%  76.2KiB std
 4.6%  27.2%  35.5KiB [Unknown]
 1.4%   8.2%  10.7KiB gstreamer
 0.5%   2.9%   3.8KiB glib
 0.1%   0.8%   1.1KiB gstrstutorial
 0.1%   0.7%     934B once_cell
 0.0%   0.2%     282B gstreamer_sys
 0.0%   0.0%      64B muldiv
 0.0%   0.0%      18B futures_task
 0.0%   0.0%      18B byte_slice_cast
 0.0%   0.0%      18B futures_util
 0.0%   0.0%      17B glib_sys
16.9% 100.0% 130.8KiB .text section size, the file size is 773.6KiB

As expected the standard library is the biggest culprit here.

Actual plugins size

The plugin we used for our experiments was very minimal. It would be interesting to look at the sizes of real Rust GStreamer plugins. We therefore built all the gst-plugins-rs plugins using the same build settings:

plugin                    size (bytes)  size (human)  stripped size (bytes)  stripped size (human)
libgstcdg.so              2876960       2.8M          334208                 327K
libgstclaxon.so           2795840       2.7M          354656                 347K
libgstfallbackswitch.so   2964136       2.9M          412000                 403K
libgstgif.so              2793224       2.7M          342392                 335K
libgstlewton.so           2985256       2.9M          420192                 411K
libgstrav1e.so            4511504       4.4M          1571208                1.5M
libgstreqwest.so          6762648       6.5M          3230480                3.1M
libgstrsaudiofx.so        815104        796K          223408                 219K
libgstrsclosedcaption.so  3447056       3.3M          741240                 724K
libgstrsdav1d.so          2748928       2.7M          313752                 307K
libgstrsfile.so           1403832       1.4M          739592                 723K
libgstrsflv.so            1007672       985K          321712                 315K
libgstrusoto.so           7412336       7.1M          3734024                3.6M
libgstsodium.so           3050656       3.0M          572432                 560K
libgstthreadshare.so      4530280       4.4M          1448376                1.4M
libgsttogglerecord.so     3012008       2.9M          436552                 427K


It's interesting to notice that most stripped plugins stay in the few hundred kilobytes range, with some notable exceptions. The plugins reaching the megabyte(s) size seem to be the ones relying on big Rust crates such as rav1e or reqwest. Those are "pure" Rust elements as they don't rely on external C libraries to actually process the data, like C plugins generally do.

The AV1 encoder and decoder are a good example here. The former, libgstrav1e.so, uses the rav1e crate which is also written in Rust and so is statically linked with the plugin. On the other hand, libgstrsdav1d.so wraps the dav1d C decoder, to which it's dynamically linked, so the actual decoding code isn't counted in the plugin size.

What about ARM binaries?

So far we have only considered x86_64 binaries, however embedded devices are generally based on ARM SoCs. We were interested in comparing the size of Rust plugins when built for this architecture, and wondered if we would observe any significant difference.

We therefore rebuilt all the plugins using the armv7-unknown-linux-gnueabihf toolchain as we would do to build for the Raspberry Pi, for example.

plugin                    size (bytes)  size (human)  stripped size (bytes)  stripped size (human)
libgstcdg.so              2810512       2.7M          251460                 246K
libgstclaxon.so           2815844       2.7M          263732                 258K
libgstfallbackswitch.so   2893712       2.8M          321076                 314K
libgstgif.so              2820696       2.7M          255556                 250K
libgstlewton.so           2912520       2.8M          316980                 310K
libgstrav1e.so            4376676       4.2M          1287752                1.3M
libgstreqwest.so          6213712       6.0M          2336548                2.3M
libgstrsaudiofx.so        902320        882K          165424                 162K
libgstrsclosedcaption.so  3347412       3.2M          563580                 551K
libgstrsfile.so           1348440       1.3M          501248                 490K
libgstrsflv.so            1094828       1.1M          243248                 238K
libgstrusoto.so           6818928       6.6M          2754168                2.7M
libgstsodium.so           3026284       2.9M          435892                 426K
libgstthreadshare.so      4419844       4.3M          1132128                1.1M
libgsttogglerecord.so     2954476       2.9M          337452                 330K


We notice here that unstripped sizes are close on both architectures, while stripped ARM binaries are lighter than their x86_64 equivalents, by roughly 20 to 30% for most plugins. Stripping remains by far the largest saving on both architectures.
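
As a rough way to quantify the difference, here is a small sketch comparing the stripped sizes of three plugins copied from the two tables above. The struct and function names are ours, chosen for illustration.

```rust
/// Stripped plugin sizes in bytes, copied from the x86_64 and ARM tables.
struct StrippedSize {
    plugin: &'static str,
    x86_64: u64,
    armv7: u64,
}

const SAMPLES: [StrippedSize; 3] = [
    StrippedSize { plugin: "libgstcdg.so", x86_64: 334208, armv7: 251460 },
    StrippedSize { plugin: "libgstrav1e.so", x86_64: 1571208, armv7: 1287752 },
    StrippedSize { plugin: "libgstreqwest.so", x86_64: 3230480, armv7: 2336548 },
];

/// Size of the ARM build relative to the x86_64 one, in percent.
/// Negative values mean the ARM binary is smaller.
fn arm_vs_x86_percent(s: &StrippedSize) -> f64 {
    (s.armv7 as f64 / s.x86_64 as f64 - 1.0) * 100.0
}

/// One line per plugin, for example "libgstcdg.so: -24.8%".
fn summary() -> Vec<String> {
    SAMPLES
        .iter()
        .map(|s| format!("{}: {:+.1}%", s.plugin, arm_vs_x86_percent(s)))
        .collect()
}
```

For these three samples the stripped ARM builds come out roughly 18 to 28% smaller than the x86_64 ones.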

Conclusion

We have to keep in mind that each size reduction technique comes at a cost: binaries that are less debug friendly, longer build times, etc. Depending on our actual needs and constraints, we need to consider the tradeoff between ease of debugging and binary size.

It's also important to note that we considered only a single Rust plugin in our setup. The total size would grow rapidly if we had to ship multiple Rust plugins as each one would statically ship the GStreamer and GLib Rust glue code. In a future blog post we'll discuss and analyze the options to reduce the total size in such multi-plugin scenarios, such as linking all Rust elements into a single larger Rust plugin so they can share common code.

Based on this research, we think that Rust is ready to deploy in embedded systems with limited memory resources. Rust brings numerous benefits to embedded systems. In particular, it's as fast as C/C++ while offering zero-cost abstractions and advanced memory safety that enable rapid development, easier multi-threaded programming and fearless concurrency. As the GStreamer community is embracing the Rust language for its memory safety while handling untrusted multimedia data, Collabora is happy to help you bring Rust to your embedded projects.

Comments (2)

  1. dbdr:
    Apr 29, 2020 at 05:58 AM

    Reqwest depends on tokio and the whole async stack. If the plugin only needs a single (or a few) HTTP requests at the same time, using instead a lightweight HTTP client library like attohttpc would probably provide huge savings.

    1. Guillaume Desmottes:
      Apr 29, 2020 at 02:11 PM

      Indeed, it would be nice to have another plugin using such lighter http crate so users can pick the one fitting best for their use case.
      Feel free to try writing one if you're interested contributing to gst-plugins-rs. :)
