Daniel Stone
March 23, 2018
Reading time:
Following on from part 1 in the series, today we cover more developments in low-level graphics, which we’ve enabled through the whole stack. Support for buffer modifiers is something we’ve worked on in the kernel, Mesa, Wayland, Weston, Mutter and GNOME Shell, and X.Org. Meanwhile, our community continues to grow thanks to Google Summer of Code. Read on for more.
Another kernel feature from the same era we are now able to take full advantage of, is buffer modifiers.
Although we tend to think of images as being laid out in memory how they appear: left to right, then top to bottom, hardware would very much prefer you wouldn't. The most optimal way to access buffers is in tiled modes, and even supertiled modes (with up to three levels of tiles containing tiles). Not only this, but GPUs are beginning to be able to compress data quite well.
Whilst great for performance, until recently we didn't have a good way to annotate buffers with their modifiers. For some time, we could pretend that the buffers were linear and hope that we could get away with it, but with the increasing complexity of current use cases, these heuristics are not good enough. This illusion of being linear was also underpinned by magic side-channels: each driver implementing tiling support had its own way of telling userspace what the true configuration of the buffer was. But if you were not that driver, you had no way of knowing: if you want to share buffers across different GPUs, or even different drivers, you had to pessimise and disable all support for tiling, with an often unacceptable performance cost.
At Collabora, we started developing an EGL extension, which we contributed upstream to Khronos as well as support for Mesa. This extension allows GPU drivers to advertise which format modifiers (tiling/compression modes) they support, and to explicitly specify modifiers on import. We also wrote a Wayland extension with support in Weston and GStreamer, allowing Wayland to be used as a two-way transit for modifier information.
Last year, sponsored by Intel, we finished the task by adding support to the window systems. We helped finalise support for modifier advertisement in KMS, wrote an X11 equivalent to our Wayland protocol so legacy systems could also get the benefit of modifier support, wired up support in Mesa so it was able to discover the modifiers available to allocate with and communicate the chosen modifier back to the window system once done, wired this up for both Wayland and X11 clients running on any of EGL/GLX/Vulkan, implemented support for the Wayland protocol in Mutter (GNOME Shell's display server), Xwayland (for supporting legacy X11 clients in a Wayland session), and also the classic Xorg X server. Whilst we were there, we implemented support for the atomic modesetting API in the classic X.Org server, though not using planes as the X server is architecturally unable to do so.
With the groundwork having been laid by Mesa, Robert Foss implemented support for buffer modifiers in the open-source drm_hwcomposer implementation, allowing Android to get the benefit of this support. This built on top of support added to the Etnaviv open-source driver used in NXP i.MX SoC family by Lucas Stach of Pengutronix. The i.MX family is surprisingly complex, in that its GPU doesn't actually understand linear formats at all, requiring a completely separate copy to preserve the illusion that the buffer is laid out linearly. Expressing modifiers clearly and explicitly allows this copy to be elided where it is not necessary. Modifier support later spread to the open-source VC4 driver used in the Raspberry Pi (suffering similar shadow-copy problems), and now also the open-source NVIDIA Tegra driver.
The end result of all this work is that we have been able to eliminate the magic side channels which used to proliferate, and lay the groundwork for properly communicating this information across multiple devices as well. Devices supporting ARM's AFBC compression format are just beginning to hit the market, which share a single compression format between video decoder, GPU, and display controller. We are also beginning to see GPUs from different vendors share tiling formats, in order to squeeze the most performance possible from hybrid GPU systems.
Whilst it seems simple: take a 32-bit format token and 'just' add a 64-bit modifier token, the implementation was surprisingly complex. Part of the reason was that so much of the knowledge about modifiers was previously implicit, and mixing implicit and explicit systems is notoriously difficult. Unpicking all this took time and persistence, but it seems we've finally arrived at our end goal.
Finally, over the past year I had the pleasure of mentoring Roman Gilg, as he worked on reducing copies in XWayland for Google SUmmer of Code 2018. Thanks to the tireless work of Martin Peres and others, Wayland was included in GSoC under the umbrella of the X.Org Foundation. Roman's project was inside the X server, making things better for people using legacy X11 clients such as Google Chrome or Steam, when running in a Wayland session.
When Wayland clients submit content to the server, they render to a buffer which the display server then uses directly until the client provides a new one. This is true if the content is displayed directly on a hardware overlay, or if it is composited on the GPU. However, in a composited X11 environment, the compositor only has a single buffer handle for each client window, even when it's getting updated. In this environment, when the client sends a buffer to the X server, the X server copies the content of the client buffer into the staging buffer it has created for the compositor to source window content from, and releases the client buffer immediately.
This extra copy not only causes extra GPU workload, but also an unsightly tearing effect: the server may be copying new content from the client buffer at the same time as the compositor is rendering. Even worse, whilst an X server running on bare hardware can skip this for full-screen applications if there is only one screen, Xwayland cannot skip it at all.
Roman's work involved a lot of work on the core of the X11 Present extension, allowing it to do copy-free 'flips' for any window if the backend supports it. He then implemented direct flips for Xwayland, so the client buffer would be directly passed through to the Wayland server without an extra copy. This means that windowed GPU-accelerated X11 clients can be faster under Wayland then when running natively under X11.
This work is currently being reviewed, mainly by the tireless Michel Dänzer, and it's looking hopeful for landing quite soon. This will provide a great performance boost to X11 games (such as through Steam) in particular, but even browsers and other clients. Congratulations to Roman for a successful Summer of Code!
Over its various iterations, at Collabora the above work work has also seen work from Derek Foreman, Emil Velikov, Louis-Francis Ratté-Boulianne, Robert Foss, Tomeu Vizoso, and Varad Gautam; externally, in particular Ben Widawsky, Chad Versace, Daniel Vetter, Fabien Dessenne, Faith Ekstrand, Kristian Høgsberg, Michel Dänzer, Sergi Granell, and Tomohito Esaki, have provided assistance, code, review, and moral support. Thanks also to those who have sponsored our work on this: Intel, Google, Renesas, Zodiac Inflight Innovation, as well as Collabora's own internal efforts.
19/12/2024
In the world of deep learning optimization, two powerful tools stand out: torch.compile, PyTorch’s just-in-time (JIT) compiler, and NVIDIA’s…
08/10/2024
Having multiple developers work on pre-merge testing distributes the process and ensures that every contribution is rigorously tested before…
15/08/2024
After rigorous debugging, a new unit testing framework was added to the backend compiler for NVK. This is a walkthrough of the steps taken…
01/08/2024
We're reflecting on the steps taken as we continually seek to improve Linux kernel integration. This will include more detail about the…
27/06/2024
With each board running a mainline-first Linux software stack and tested in a CI loop with the LAVA test framework, the Farm showcased Collabora's…
26/06/2024
WirePlumber 0.5 arrived recently with many new and essential features including the Smart Filter Policy, enabling audio filters to automatically…
Comments (4)
Robert:
Jul 14, 2018 at 11:06 AM
Hej Daniel, I just hit these two posts and want to thank you for this highly interesting insights! I've heard about a lot of the terms before, but missed an explanation of exactly this kind (hardware planes, the x11 present extension). Wow this is such a big list of cool stuff!
I'm super excited about this great developments. The fall distributions (fedora 29, ubuntu 18.10) will make a big leap in term of wayland support and performance!
Reply to this comment
Reply to this comment
Robert:
Jul 14, 2018 at 11:10 AM
I missed to mention: when you wrote this posts, did phoronix.com cover/link them? Because I think a lot of its readers would be very interested in this stuff!
Reply to this comment
Reply to this comment
Umur Gedik:
Sep 27, 2018 at 05:42 AM
There is so much going on linux graphics stack, but unfortunately there is only few (dated) documentation is available for the developers. I want to learn low level graphics capabilities of linux and want to contribute them, but I've no idea where to start. Would be really nice if you can point some documentations/articles telling about the stack and the terminology. Great article btw, I'm very excited about the future of desktops in linux.
Reply to this comment
Reply to this comment
Daniel Stone:
Sep 27, 2018 at 09:54 AM
We don't have that yet, but it's something we really want to work on. We have a GitLab issue here which is documenting our effort so far: https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/76
Reply to this comment
Reply to this comment
Add a Comment