Dafna Hirschfeld
October 09, 2019
Prior to joining Collabora, I took part in Round 17 of the Outreachy internships, which ran from December 2018 to March 2019. Outreachy is a paid, remote internship program. Its goal is to support people from groups underrepresented in tech, and help newcomers to free software and open source make their first contributions. Open to applicants around the world, Outreachy internships run twice a year.
Once your application is approved, you must pick an open source project to make a contribution to, in hopes of being selected as an intern, and teamed with experienced mentors. You can read more about the program here.
In my case, I was selected as an intern to work on the media subsystem of the Linux kernel, and my mentors were Helen Koike, (who is now my colleague at Collabora!) and Hans Verkuil (who works for Cisco and has been working on the media subsystem for around 15 years).
In the media subsystem there are a few drivers that are 'virtual', in the sense that they do not interact with any specific hardware but are implemented entirely in software. The main purpose of those drivers is to test user space applications. Since no specific hardware or architecture is needed, user space applications can always interact with those drivers and rely on them for running their own tests. Therefore, it is important that those drivers implement the APIs accurately and support a wide range of features.
The virtual drivers are: vivid, vim2m, vimc and vicodec. During my Outreachy internship I worked on the vicodec driver.
vicodec, which stands for Virtual Codec, is a driver that implements a codec format based on the Fast Walsh-Hadamard Transform (FWHT). FWHT was designed to be fast and simple, and to share the characteristics of other video codecs so that it faces the same issues. Applications can interact with vicodec to compress videos to this format and decompress them. You can read more details about the FWHT format here.
A common problem that arises in decoding is that in many cases, sequential frames have different properties such as dimensions, pixel format and so on. With the traditional codecs API, called stateful codecs, the properties are configured before the decoding/encoding streaming starts. So when a frame has a property which is different from the configuration, the decoding stream should stop, reconfigure and then start again - this sequence is called 'Dynamic resolution change'. This causes a lag and is impractical if the frames' properties change too often.
To address this issue, a new API called Request API was recently introduced. The idea is that each frame is part of a 'request', which is basically a list of elements that are clustered together. The application first composes the request and then it asks the kernel to process it. In the context of stateless codecs, a request is a combination of the frame buffers and a list of properties that can also include pointers to reference frames. Each frame is processed separately without the need to stop and restart the decoding/encoding stream.
During my Outreachy internship, I added a stateless implementation to vicodec. Applications can now interact with the vicodec driver through either the stateful or the stateless API. With the stateless API, the userspace application has to do the 'hard work': parse the frames' headers and keep track of the reference frames and their order. The v4l-utils package is a good place to find code examples of how to use the various media APIs. I used it during my internship and added support for stateless vicodec to it.
Here are some hands-on examples of how to use it.
(tl;dr: the final script to test the stateless decoder is here).
The vicodec driver currently implements a stateful encoder, a stateful decoder and a stateless decoder. The driver exposes three device nodes, /dev/videoX, one for each supported implementation.
The way to test vicodec is to first run the encoder to generate encoded fwht format files and then run the decoder on those files.
For that we first need a video in a decoded (raw) format. During my internship I used a video from jell.yfish.us (screenshot below). You can read more about it in my internship blog. I wrote a script that generates decoded formats from that video with various pixel formats.
Here are commands to download the video and generate several 700x1000 raw videos, one per pixel format, in a directory named images:
wget http://jell.yfish.us/media/jellyfish-10-mbps-hd-h264.mkv
mkdir images
ffmpeg -i jellyfish-10-mbps-hd-h264.mkv -c:v rawvideo -pix_fmt yuv420p -f rawvideo images/jelly-1920-1080.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuyv422 -f rawvideo images/jelly-700-1000.YUYV -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt gray -f rawvideo images/jelly-700-1000.GREY -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv420p -f rawvideo images/jelly-700-1000.YU12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt nv12 -f rawvideo images/jelly-700-1000.NV12 -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt rgb24 -f rawvideo images/jelly-700-1000.RGB3 -loglevel quiet
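The crop commands above differ only in the ffmpeg pixel format and the file extension, so they can be generated in a loop. The sketch below is one possible way to do that, not the script used during the internship. The helper names ffmpeg_pix_fmt and gen_cmds are made up for illustration, and the fourcc-to-pix_fmt pairs are simply the ones that appear in the commands above. It only prints the commands; pipe the output to sh to run them.

```shell
#!/bin/sh
# Map a V4L2 fourcc (used here as the file extension) to the ffmpeg
# pix_fmt name. The pairs are the ones used in the commands above.
ffmpeg_pix_fmt() {
	case "$1" in
	YU12) echo yuv420p ;;
	YUYV) echo yuyv422 ;;
	422P) echo yuv422p ;;
	GREY) echo gray ;;
	NV12) echo nv12 ;;
	RGB3) echo rgb24 ;;
	*) echo "unknown fourcc: $1" >&2; return 1 ;;
	esac
}

# Print (not run) one ffmpeg crop/convert command per fourcc.
# gen_cmds is a hypothetical helper: gen_cmds WIDTH HEIGHT
gen_cmds() {
	width=$1
	height=$2
	for fourcc in YUYV 422P GREY YU12 NV12 RGB3; do
		fmt=$(ffmpeg_pix_fmt "$fourcc") || return 1
		echo "ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo" \
			"-i images/jelly-1920-1080.YU12" \
			"-filter:v crop=$width:$height:0:0 -pix_fmt $fmt -f rawvideo" \
			"images/jelly-$width-$height.$fourcc -loglevel quiet"
	done
}

gen_cmds 700 1000
```

Calling gen_cmds 800 900 instead prints the same set of commands for 800x900 crops.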
In order to play the files, you can run for example:
ffplay -loglevel warning -v info -f rawvideo -pixel_format yuv422p -video_size "700x1000" images/jelly-700-1000.422P
Now we can use the files in images to first test the vicodec encoder. Here is an example command:
v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
This will generate a 'fwht' compressed file called jelly_700-1000-422P.fwht from the decoded file images/jelly-700-1000.422P. The parameter -d0 indicates the use of /dev/video0 for that.
Then, to test the decoder, you can generate a decoded file back from the fwht file. This can be done with either the stateful decoder, exposed in my case as /dev/video1, or the stateless decoder, exposed as /dev/video2. Here is the command for the stateful decoder:
v4l2-ctl -d1 -x width=700,height=1000 -v width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from jelly_700-1000-422P.fwht --stream-to out-700-1000.422P
Running the above command with -d2 instead of -d1 will use the stateless decoder instead of the stateful one.
Now we want to test the stateless decoder on a more interesting video. For that we will take two videos with different dimensions and merge them so that each frame has different dimensions from the previous one.
I wrote a utility for that. Compile it with gcc merge_fwht_frames.c -o merge_fwht_frames; running it without parameters shows how to use it:
dafna@ubuntu:~/jelly$ ./merge_fwht_frames
usage: ./merge_fwht_frames
The utility takes the two files to merge, the output file, and two arguments giving the highest value of each dimension (width, then height). So the following set of commands will generate a fwht file called merged-dim.fwht that is composed of two merged videos, one of dimensions 700x1000 and one of dimensions 800x900:
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=700:1000:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-700-1000.422P -loglevel quiet
ffmpeg -s 1920x1080 -pix_fmt yuv420p -f rawvideo -i images/jelly-1920-1080.YU12 -filter:v crop=800:900:0:0 -pix_fmt yuv422p -f rawvideo images/jelly-800-900.422P -loglevel quiet
v4l2-ctl -d0 --set-selection-output target=crop,width=700,height=1000 -x width=700,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_700-1000-422P.fwht --stream-from images/jelly-700-1000.422P
v4l2-ctl -d0 --set-selection-output target=crop,width=800,height=900 -x width=800,height=900,pixelformat=422P --stream-mmap --stream-out-mmap --stream-to jelly_800-900-422P.fwht --stream-from images/jelly-800-900.422P
./merge_fwht_frames jelly_700-1000-422P.fwht jelly_800-900-422P.fwht merged-dim.fwht 800 1000
And now decoding merged-dim.fwht with the stateless decoder:
v4l2-ctl -d2 -x width=800,height=1000 -v width=800,height=1000,pixelformat=422P --stream-mmap --stream-out-mmap --stream-from merged-dim.fwht --stream-to out-800-1000.422P
Now we have a decoded video file, out-800-1000.422P, which is composed of frames with alternating dimensions. In the 422P format each pixel takes 2 bytes on average, so the first frame is 2*700*1000=1400000 bytes, the second is 2*800*900=1440000 bytes, the third is again 1400000 bytes, and so on.
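The frame-size arithmetic can be checked with a few lines of shell. This is a small sketch; frame_size_422p is a helper name introduced here for illustration, and the count of 450 frames per stream is taken from the split script in this post, so it needs adjusting for other inputs.

```shell
#!/bin/sh
# Bytes per frame for a planar 4:2:2 (422P) image: a full-resolution luma
# plane plus two chroma planes at half the horizontal resolution,
# i.e. w*h + 2*(w/2)*h = 2*w*h for even widths.
frame_size_422p() {
	echo $(( $1 * $2 * 2 ))
}

frm1_sz=$(frame_size_422p 700 1000)
frm2_sz=$(frame_size_422p 800 900)
frames_per_stream=450	# assumption: value used by the split script

echo "700x1000 frame: $frm1_sz bytes"
echo "800x900 frame:  $frm2_sz bytes"
echo "expected file:  $(( (frm1_sz + frm2_sz) * frames_per_stream )) bytes"
```

Comparing the last number with the output of stat --printf="%s" out-800-1000.422P is a quick way to confirm that the decoder wrote every frame.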
The following script will separate the file out-800-1000.422P into two files:
# Check that the file holds 450 frames of each size before splitting it.
size=$(stat --printf="%s" out-800-1000.422P)
frm1_sz=$((700 * 1000 * 2))
ex_size1=$(($frm1_sz * 450))
frm2_sz=$((800 * 900 * 2))
ex_size2=$(($frm2_sz * 450))
if [ $(($ex_size1 + $ex_size2)) != $size ]; then
	echo "expected size = $(($ex_size1 + $ex_size2))"
	echo "actual size = $size"
	exit 1
fi
# Each pair of consecutive frames is read as one block and cut in two.
double_frame=$(($frm1_sz + $frm2_sz))
i=0
while [ $i -lt 450 ]
do
	dd if=out-800-1000.422P obs=$double_frame ibs=$double_frame skip=$i count=1 >> tmp
	head -c $frm1_sz tmp >> out-mrg-700-1000.422P
	tail -c $frm2_sz tmp >> out-mrg-800-900.422P
	rm tmp
	i=$(($i + 1))
done
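The same de-interleaving idea can be tried on a tiny synthetic file, which makes it easy to check the dd/head/tail logic without running vicodec. This is only an illustrative sketch: split_alternating is a hypothetical helper, and the 3-byte and 5-byte "frames" stand in for the real frame sizes.

```shell
#!/bin/sh
# De-interleave a file made of alternating records: size_a bytes, then
# size_b bytes, repeated. split_alternating is a hypothetical helper:
#   split_alternating IN SIZE_A SIZE_B OUT_A OUT_B
split_alternating() {
	in=$1
	size_a=$2
	size_b=$3
	out_a=$4
	out_b=$5
	pair=$((size_a + size_b))
	total=$(wc -c < "$in")
	if [ $((total % pair)) -ne 0 ]; then
		echo "size $total is not a multiple of $pair" >&2
		return 1
	fi
	: > "$out_a"
	: > "$out_b"
	i=0
	while [ "$i" -lt $((total / pair)) ]; do
		# Read the i-th pair as a single block, then cut it in two.
		dd if="$in" bs="$pair" skip="$i" count=1 2>/dev/null > pair.tmp
		head -c "$size_a" pair.tmp >> "$out_a"
		tail -c "$size_b" pair.tmp >> "$out_b"
		i=$((i + 1))
	done
	rm -f pair.tmp
}

# Tiny synthetic check: 3-byte "frames" of A and 5-byte "frames" of B.
printf 'AAABBBBBAAABBBBB' > demo.bin
split_alternating demo.bin 3 5 demo_a.bin demo_b.bin
cat demo_a.bin; echo
cat demo_b.bin; echo
```

With the real output, the equivalent call would be split_alternating out-800-1000.422P 1400000 1440000 out-mrg-700-1000.422P out-mrg-800-900.422P.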
And now play the two files:
ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "700x1000" out-mrg-700-1000.422P
ffplay -loglevel warning -v info -f rawvideo -pixel_format "yuv422p" -video_size "800x900" out-mrg-800-900.422P
You will see glitches on the edges of the videos, such as constant-color squares at the bottom of the out-mrg-700-1000.422P video and on the left of the out-mrg-800-900.422P video.
This is because the fwht format uses the previous frame as a reference frame and since the dimensions of the previous frame don't match the current ones, there are those squares where frames don't overlap.
vicodec uses the video_gop_size control, which sets the period of I-frames (I-frames are frames that can be decoded without reference to other frames).
If you run v4l2-ctl -d0 --list-ctrls you will see:
dafna@ubuntu:~/outreachy$ v4l2-ctl -d0 --list-ctrls

User Controls

    min_number_of_output_buffers 0x00980928 (int) : min=1 max=1 step=1 default=1 value=1 flags=read-only

Codec Controls

    video_gop_size 0x009909cb (int) : min=1 max=16 step=1 default=10 value=10
    fwht_i_frame_qp_value 0x00990a22 (int) : min=1 max=31 step=1 default=20 value=20
    fwht_p_frame_qp_value 0x00990a23 (int) : min=1 max=31 step=1 default=20 value=20
So the default value for video_gop_size is 10, which means that there is an I-frame every 10 frames. If we set this value to 1, every frame will be an I-frame and we will not get those artifacts. We do this by adding the --set-ctrl video_gop_size=1 option to the decoding and encoding commands.
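As a rough sketch, this is how the encode, merge and stateless-decode commands from this post look with that option added. The run and encode helpers and the DRY_RUN variable are made up for this example, and the device numbers (-d0, -d2) are the ones from my setup and may differ on other machines. By default the script only prints the commands; set DRY_RUN=0 to actually run them on a machine where vicodec is loaded.

```shell
#!/bin/sh
# Prints the commands by default; DRY_RUN=0 executes them.
# DRY_RUN, run and encode are hypothetical names used for this sketch.
DRY_RUN=${DRY_RUN:-1}
GOP_OPT="--set-ctrl video_gop_size=1"

run() {
	echo "+ $*"
	[ "$DRY_RUN" = 1 ] || "$@"
}

# encode WIDTH HEIGHT: stateful encoder on /dev/video0, every frame an I-frame
encode() {
	run v4l2-ctl -d0 $GOP_OPT \
		--set-selection-output "target=crop,width=$1,height=$2" \
		-x "width=$1,height=$2,pixelformat=422P" \
		--stream-mmap --stream-out-mmap \
		--stream-to "jelly_$1-$2-422P.fwht" \
		--stream-from "images/jelly-$1-$2.422P"
}

encode 700 1000
encode 800 900
run ./merge_fwht_frames jelly_700-1000-422P.fwht jelly_800-900-422P.fwht \
	merged-dim.fwht 800 1000
# Stateless decoder on /dev/video2
run v4l2-ctl -d2 $GOP_OPT \
	-x width=800,height=1000 -v width=800,height=1000,pixelformat=422P \
	--stream-mmap --stream-out-mmap \
	--stream-from merged-dim.fwht --stream-to out-800-1000.422P
```

Keeping the option in a single variable makes it easy to compare the output with different GOP sizes.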
The final script can be found here.
Enjoy!