Deborah Brouwer
December 02, 2022
Earlier this year, I joined Collabora for a six-month internship to learn how V4L2 (Video4Linux2) supports stateless video hardware decoding. My project was to build a utility that traced and replayed stateless decoding from a userspace perspective. The utility, called the v4l2-tracer, is intended to be part of v4l-utils, a collection of utilities and libraries to handle media devices. The code is currently under review on the mailing list: [PATCH v4] utils: add v4l2-tracer utility.
Although there are many excellent tracing tools, such as strace, the v4l2-tracer traces V4L2 stateless decoding more comprehensively. It adds the ability to replay (i.e. "retrace") the traced activity, portably, between different userspace environments. The project was inspired by another tool, apitrace, which provides the same tracing and retracing functionality for certain graphics APIs. Although the ability to trace V4L2 stateless decoding is interesting in itself, replaying the trace is additionally helpful for:
Each of the two main functions, tracing and retracing, is explained below.
To create a trace, the v4l2-tracer preloads a small, custom library that intercepts specific system calls made by userspace applications when decoding.
Along with basic calls that open and close devices and map memory, the v4l2-tracer primarily traces ioctls used in the V4L2 Memory-to-Memory Stateless Video Decoder Interface. Tracing these ioctls can be complex because V4L2 stateless decoding depends on a userspace application, like GStreamer, to provide crucial decoding metadata to the driver through ioctl arguments.
In V4L2 stateless decoding, userspace must parse the encoded bitstream to extract the information needed for the decoding of every frame, and then pass it to the decoder. For example, the stateless decoder does not know if any particular frame can be decoded just by itself or if it needs information from neighboring frames. If other frames are needed, the decoder doesn't know which ones. Userspace provides this crucial "state" information on a frame-by-frame basis through an ioctl with arguments that set the stateless codec controls. Subsequent ioctls will associate these controls with a unique request using the Request API. The request connects the encoded video data to its control information. The v4l2-tracer traces all of this state information and writes it to a JSON-formatted trace file.
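To make the role of the request more concrete, here is a minimal Python sketch that groups traced ioctls by the request they belong to. The record layout (the `ioctl`, `request_fd`, `controls`, `type` and `index` keys) and the file descriptor values are assumptions made up for this illustration; they are not the v4l2-tracer's actual JSON schema. The ioctl and control names themselves come from the V4L2 and media uAPI.

```python
from collections import defaultdict

# Hypothetical trace records. The keys and fd values are illustrative
# assumptions, NOT the real v4l2-tracer trace format.
trace = [
    # The request fd is what MEDIA_IOC_REQUEST_ALLOC hands back to userspace.
    {"ioctl": "MEDIA_IOC_REQUEST_ALLOC", "request_fd": 12},
    # Per-frame state: stateless codec controls attached to the request.
    {"ioctl": "VIDIOC_S_EXT_CTRLS", "request_fd": 12,
     "controls": ["V4L2_CID_STATELESS_VP8_FRAME"]},
    # Encoded data: an OUTPUT buffer queued against the same request.
    {"ioctl": "VIDIOC_QBUF", "request_fd": 12, "type": "OUTPUT", "index": 0},
    {"ioctl": "MEDIA_REQUEST_IOC_QUEUE", "request_fd": 12},
    # CAPTURE buffers are not tied to a request in this sketch.
    {"ioctl": "VIDIOC_DQBUF", "type": "CAPTURE", "index": 0},
    {"ioctl": "MEDIA_IOC_REQUEST_ALLOC", "request_fd": 13},
    {"ioctl": "VIDIOC_S_EXT_CTRLS", "request_fd": 13,
     "controls": ["V4L2_CID_STATELESS_VP8_FRAME"]},
]


def group_by_request(records):
    """Map each request fd to the ordered list of ioctls that used it."""
    grouped = defaultdict(list)
    for record in records:
        fd = record.get("request_fd")
        if fd is not None:
            grouped[fd].append(record["ioctl"])
    return dict(grouped)


if __name__ == "__main__":
    for fd, ioctls in group_by_request(trace).items():
        print(fd, ioctls)
```

Grouping by file descriptor is a simplification: a real application may reuse a request fd for later frames, so a fuller version would also need to track when a request is reinitialized or closed.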
In addition to system calls, the v4l2-tracer also traces the encoded video data passed to the kernel driver through OUTPUT buffers. The decoded video data, returned on CAPTURE buffers, is not traced by default because it is not needed for the retrace function and it significantly increases the trace file's size. Optionally, there are flags to turn on the tracing of the decoded video data or to write the video data to a separate .yuv file which can provide a good sanity check for the decoding.
Here is an example of a command to trace the stateless decoding of a VP8 compressed file:
v4l2-tracer trace gst-launch-1.0 -- filesrc location=test-25fps.vp8 ! parsebin ! v4l2slvp8dec ! videocodectestsink
It will produce a time-stamped trace file such as:
90608_trace.json
In this example, the userspace application is a GStreamer pipeline with the stateless decoding element v4l2slvp8dec. Of course this will only work on a machine with a stateless VP8 hardware decoder and the right kernel driver. For my internship, I used a Rockpi 4B which has a Rockchip RK3399 SoC and the Hantro VPU driver. If you want to test the v4l2-tracer without the hardware, one option is to try the virtual stateless decoder driver that is currently under development. This test driver will not actually decode any data, but it will accept a GStreamer pipeline and return a test pattern along with debug information on the CAPTURE buffer. Another option is to use the existing virtual codec driver, vicodec, which can emulate a stateless hardware codec for the patent-free FWHT (Fast Walsh-Hadamard Transform).
The second main function of the v4l2-tracer is retracing. The JSON-formatted trace file that is the output of the trace function becomes the input for the retracing function. Here is the simplest retrace command:
v4l2-tracer retrace 90608_trace.json
It will produce a new retrace file that can be compared with the original trace file:
90608_trace_retrace.json
The newly generated retrace file should be nearly identical to the original trace file except for changes to the video and media devices, file descriptors, and memory addresses.
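One way to check that a trace and its retrace agree is to drop the values that are expected to differ and compare the rest. The following is a minimal Python sketch of that idea. The key names in `VOLATILE_KEYS` are assumptions chosen for illustration and would need to be adjusted to whatever keys the real trace files use for device paths, file descriptors, and memory addresses.

```python
import json

# Keys whose values are expected to change between environments.
# These names are hypothetical, not the actual v4l2-tracer schema.
VOLATILE_KEYS = {"fd", "request_fd", "address", "path"}


def normalize(node):
    """Recursively drop volatile keys from parsed JSON data."""
    if isinstance(node, dict):
        return {key: normalize(value)
                for key, value in node.items()
                if key not in VOLATILE_KEYS}
    if isinstance(node, list):
        return [normalize(value) for value in node]
    return node


def traces_match(trace_text, retrace_text):
    """Compare two JSON documents while ignoring volatile keys."""
    return normalize(json.loads(trace_text)) == normalize(json.loads(retrace_text))


if __name__ == "__main__":
    original = '[{"syscall": "open", "path": "/dev/video1", "fd": 5}]'
    replayed = '[{"syscall": "open", "path": "/dev/video6", "fd": 9}]'
    print(traces_match(original, replayed))
```

Comparing parsed JSON rather than raw text also makes the check insensitive to whitespace and key ordering.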
When retracing, the v4l2-tracer reads the trace file and mimics the original userspace application that was traced. The v4l2-tracer makes all the same system calls and writes the same encoded video data to the OUTPUT buffers in exactly the same order and with exactly the same parameters as in the original trace file. The retracing function runs independently from the original userspace application that was traced.
A trace file generated on one machine can be retraced on another machine as long as a stateless hardware decoder and its V4L2 driver are available. Since the /dev/media and /dev/video device numbers will usually change between different machines, the v4l2-tracer will attempt to match the driver from the trace file with the device numbers available in the retrace environment. Alternatively, to use a different driver, the user can set specific video and/or media device nodes for the retracer to use. For example, to retrace on /dev/video6 and /dev/media3 the command is:
v4l2-tracer -d6 -m3 retrace 90608_trace.json
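The automatic matching described above can be pictured with a small Python sketch. It is not the v4l2-tracer's implementation: the `pick_device` helper and the example table of nodes and driver names are hypothetical. On a real system, the driver name for each node could be read with the VIDIOC_QUERYCAP ioctl (the `driver` field of `struct v4l2_capability`); here the table is hard-coded so that the example runs anywhere.

```python
import re


def node_number(node):
    """Return the trailing number of a device node, e.g. 6 for /dev/video6."""
    match = re.search(r"(\d+)$", node)
    return int(match.group(1)) if match else -1


def pick_device(traced_driver, available):
    """Return the lowest-numbered node whose driver matches, else None.

    `available` maps device node paths to driver names.
    """
    for node in sorted(available, key=node_number):
        if available[node] == traced_driver:
            return node
    return None


if __name__ == "__main__":
    # Example values only; a real table would be built by querying each node.
    available = {
        "/dev/video0": "uvcvideo",
        "/dev/video6": "hantro-vpu",
        "/dev/video7": "vicodec",
    }
    print(pick_device("hantro-vpu", available))
```

Sorting by the numeric suffix avoids the surprise of /dev/video10 being tried before /dev/video6, which a plain string sort would do.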
The v4l2-tracer has lots of room to grow. So far the v4l2-tracer fully supports the tracing of MPEG2, VP8, H.264, and FWHT compression formats. The stateless controls for VP9 and HEVC formats will also be traced and retraced, since they are part of the V4L2 uAPI, but more work is needed to write the decoded video data to yuv files. The v4l2-tracer could also be adapted to trace stateless encoding in addition to decoding. Eventually, the v4l2-tracer could be used in more automated testing of stateless V4L2 drivers, for example, by randomly editing the trace files to inject errors for fuzz testing.
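As a rough idea of what error injection on a trace file could look like, the Python sketch below replaces one randomly chosen integer in parsed JSON data with a boundary value. Nothing like this exists in the v4l2-tracer today as far as this post describes; the `mutate_one_int` helper, the choice of boundary values, and the sample record are all assumptions for illustration.

```python
import copy
import random

# Boundary values that tend to expose unchecked arithmetic (an assumption,
# not a list taken from any existing fuzzer).
BOUNDARY_VALUES = [0, -1, 2**31 - 1, 2**32 - 1]


def mutate_one_int(trace, rng):
    """Return a deep copy of `trace` with one integer value replaced."""
    mutated = copy.deepcopy(trace)
    slots = []  # (container, key) pairs that currently hold an integer

    def collect(node):
        items = node.items() if isinstance(node, dict) else enumerate(node)
        for key, value in items:
            if isinstance(value, bool):
                continue  # JSON booleans are ints in Python; leave them alone
            if isinstance(value, int):
                slots.append((node, key))
            elif isinstance(value, (dict, list)):
                collect(value)

    if isinstance(mutated, (dict, list)):
        collect(mutated)
    if slots:
        container, key = rng.choice(slots)
        # Never pick the current value, so the mutation is always visible.
        candidates = [v for v in BOUNDARY_VALUES if v != container[key]]
        container[key] = rng.choice(candidates)
    return mutated


if __name__ == "__main__":
    record = {"ioctl": "VIDIOC_S_EXT_CTRLS", "ctrl": {"width": 640, "height": 480}}
    print(mutate_one_int(record, random.Random(1)))
```

Passing in a seeded `random.Random` keeps each mutated trace reproducible, which matters when a fuzzed retrace triggers a driver bug that needs to be replayed.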
This internship project challenged me daily to learn, solve problems, and build new skills. It was the first time I had developed on a single-board computer and cross-compiled Linux for the ARM architecture needed by the board. I was introduced to GStreamer pipelines and how to build and configure them to run on the development board. Although I knew, theoretically, what a stateless decoder did, I didn't really understand what I was dealing with until I started to trace the hundreds of parameters parsed in userspace, watched the OUTPUT and CAPTURE buffer queues in action, and then received decoded video frames back, out-of-order, with extraneous padding skewing their display.
I cannot thank my mentors at Collabora, Daniel Almeida, Nícolas F. R. A. Prado, and Nicolas Dufresne, enough for proposing this project and for their daily guidance and support. I am also deeply appreciative of the advice we received from Linux media subsystem co-maintainer Hans Verkuil, and from the stateless codec developers at Collabora, along with the open-source community. This internship has enhanced not only my skills but also my confidence and commitment to open-source development, and I hope to be contributing for many more years ahead.