Alyssa Rosenzweig
June 05, 2020
Bifrost meets GNOME: Onward and upward to zero graphics blobs
In our last blog update for Panfrost, the free and open-source graphics driver for modern Mali GPUs, we announced initial support for the Bifrost architecture. We have since extended this support to all major features of OpenGL ES 2.0 and even some features of desktop OpenGL 2.1. With only free software, a Mali G31 chip can now run Wayland compositors with zero-copy graphics, including GNOME 3. Every scene in glmark2-es2 runs, and 3D games like Neverball are playable. In addition, we support the hardware-accelerated video players mpv and Kodi. The screenshots above are from a Mali G31 board running Panfrost.
All of the above is included in upstream Mesa with no out-of-tree patches required, with the upcoming Bifrost support enabled via the PAN_MESA_DEBUG=bifrost environment variable.
Screenshots: GNOME Shell | Neverball
Bringing up these new applications required implementing many new floating-point arithmetic opcodes, including comparisons, selections, and additional type conversions. Further, I’ve added initial support for integer arithmetic and bitwise operations, used to implement integer types directly as well as booleans. While there are a number of arithmetic logic unit (ALU) opcodes required, this is not an obstacle on architectures with regular instruction encodings.
Unfortunately, Bifrost is not a regular architecture and has dozens of distinct instruction encodings in order to conserve space. Adding opcodes to the compiler is still routine, but requires adding quite a bit more code. Plus, the duplication can be error-prone, so as soon as I add a new opcode, I add comprehensive tests against the real hardware iterating through different combinations of operand size and modifiers to exercise all the packing special cases.
The upshot is that the testing coverage eliminates entire classes of compiler bugs which tend to plague new drivers, allowing our open source Bifrost driver to flourish despite such a quirky architecture.
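As a rough illustration of the approach, the sketch below round-trips a toy instruction encoding through pack/unpack helpers for every combination of operand size, source modifiers, and saturation, asserting that nothing is lost. The encoding and all names here are invented for the example; the real tests exercise the compiler's actual Bifrost packing against hardware behaviour.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packing round-trip test with a toy encoding: iterate over
 * every combination of size, source modifiers and saturation, encode, then
 * decode and check that nothing was lost. Illustrative names only. */

enum src_mod { MOD_NONE, MOD_ABS, MOD_NEG, MOD_COUNT };

struct alu_instr {
        unsigned size;          /* 16 or 32 bits */
        enum src_mod mod[2];    /* per-source modifiers */
        bool saturate;          /* clamp result to [0, 1] */
};

/* Toy encoding: 1 size bit, two 2-bit modifier fields, 1 saturate bit. */
static uint64_t pack_fmul(const struct alu_instr *i)
{
        return ((uint64_t)(i->size == 16)) |
               ((uint64_t)i->mod[0] << 1) |
               ((uint64_t)i->mod[1] << 3) |
               ((uint64_t)i->saturate << 5);
}

static void unpack_fmul(uint64_t w, struct alu_instr *o)
{
        o->size = (w & 1) ? 16 : 32;
        o->mod[0] = (enum src_mod)((w >> 1) & 3);
        o->mod[1] = (enum src_mod)((w >> 3) & 3);
        o->saturate = (w >> 5) & 1;
}

int main(void)
{
        static const unsigned sizes[] = { 16, 32 };

        for (unsigned s = 0; s < 2; ++s)
        for (int m0 = MOD_NONE; m0 < MOD_COUNT; ++m0)
        for (int m1 = MOD_NONE; m1 < MOD_COUNT; ++m1)
        for (int sat = 0; sat <= 1; ++sat) {
                struct alu_instr in = {
                        .size = sizes[s],
                        .mod = { (enum src_mod)m0, (enum src_mod)m1 },
                        .saturate = sat != 0,
                };

                struct alu_instr out;
                unpack_fmul(pack_fmul(&in), &out);

                /* A special-cased encoding path that drops a modifier or
                 * a size would show up immediately as a mismatch here. */
                assert(in.size == out.size && in.saturate == out.saturate);
                assert(in.mod[0] == out.mod[0] && in.mod[1] == out.mod[1]);
        }

        return 0;
}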
Beyond new ALU opcodes, I extended the texture support to enable simple texture operations from vertex shaders, a pattern occurring in glmark2’s terrain scene. Mali GPUs use slightly different encodings for fragment and vertex texture operations, since fragment shaders can automatically compute the level-of-detail parameter based on neighboring fragments, whereas there is no notion of neighboring fragments in vertex shaders.
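Concretely, a GLSL ES 1.00 vertex shader has no neighbouring invocations to derive a level of detail from, so a heightmap fetch has to pass the LOD explicitly via texture2DLod, as in the generic sketch below (illustrative only, not glmark2's actual terrain shader).

/* Vertex texture fetch with an explicit LOD, as required in GLSL ES 1.00
 * vertex shaders: with no neighbouring fragments, there is no implicit LOD. */
static const char *heightmap_vertex_shader =
        "uniform sampler2D u_heightmap;\n"
        "uniform mat4 u_mvp;\n"
        "attribute vec2 a_position;\n"   /* x/z position on the terrain grid */
        "void main() {\n"
        "    vec2 uv = a_position * 0.5 + 0.5;\n"
        "    float height = texture2DLod(u_heightmap, uv, 0.0).r;\n"
        "    gl_Position = u_mvp * vec4(a_position.x, height, a_position.y, 1.0);\n"
        "}\n";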
Finally, I added initial control flow (branching) support for if/else statements and loops. As Bifrost is a Single Instruction, Multiple Thread (SIMT) architecture in which multiple threads run the same shader in lockstep, branching is a complicated affair if threads diverge. Most of the complexity is handled in hardware, but just enough seeps through that the branching implementation ends up a hair more complicated than that of Midgard. Still, it’s enough for glmark2’s loop scene, and there’s always room for improvement.
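As a concrete example of divergence, consider a fragment shader like the generic sketch below: threads in a group execute in lockstep, so when neighbouring pixels land on different sides of the if, the hardware must mask off the inactive threads for each path and reconverge afterwards.

/* Fragment shader with a potentially divergent branch (GLSL ES 1.00).
 * Threads whose v_uv.x straddles the threshold take different paths, so
 * the hardware masks each path and reconverges after the if/else. */
static const char *divergent_fragment_shader =
        "precision mediump float;\n"
        "uniform float u_threshold;\n"
        "varying vec2 v_uv;\n"
        "void main() {\n"
        "    vec3 color;\n"
        "    if (v_uv.x > u_threshold)\n"
        "        color = vec3(1.0, 0.0, 0.0);\n"
        "    else\n"
        "        color = vec3(0.0, 0.0, 1.0);\n"
        "    gl_FragColor = vec4(color, 1.0);\n"
        "}\n";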
Of course, Bifrost progress is no obstacle to improving our Midgard support. Inspired by the lessons learned designing the Bifrost Intermediate Representation as previously blogged, I revisited our Midgard Intermediate Representation as well. The focus was twofold:
Simplify to enable faster, more effective optimizations in fewer lines of code.
Generalize the IR to support non-32-bit operation.
To do so, I implemented generic helpers for inferring instruction modifiers like saturation. Consider a shader that squares a variable and saturates it to the range [0, 1].
X = clamp(X * X, 0.0, 1.0);
In NIR, Mesa’s common intermediate representation used across drivers, this line might look like the following, using NIR’s fsat opcode to clamp to [0, 1]:
ssa_10 = fmul ssa_9, ssa_9
ssa_11 = fsat ssa_10
Our hardware has native support for saturating the results of floating-point instructions. There are a few approaches to take advantage of this. One is to use NIR’s builtin saturation handling, as Midgard’s compiler used to. A NIR pass can fuse the fsat instruction into the multiply, producing the NIR:
ssa_10 = fmul.sat ssa_9, ssa_9
Then our backend compiler can use the .sat flag directly. While this is an easy approach, it is inflexible, since the hardware might be able to use modifiers that NIR does not express. For instance, Mali GPUs have a .clamp_positive operation which does max(x, 0.0) on the result for free. If we wrote X = max(X * X, 0.0), NIR could give us code using a dedicated fclamp_positive instruction:
ssa_10 = fmul ssa_9, ssa_9
ssa_11 = fclamp_positive ssa_10
However, it could not fuse the modifier in without substantial changes affecting common code. The second approach would be to compile this to two instructions in the IR, and use a second propagation pass on our backend IR to fuse it together.
10 = fmul 9, 9
11 = fclamp_positive 10
After the propagation pass, the modifier is fused in:
10 = fmul.pos 9, 9
However, there’s a third option unifying both cases and simplifying the compiler: inferring the modifiers generically while translating NIR into our backend IR. This enables us to use architecture-specific modifiers, like .pos, while still having the original NIR available for efficient handling. This approach enabled us to replace hundreds of lines of optimizations for floating-point modifiers and bitwise inverses, while optimizing new patterns that the original design could not, promising savings in code complexity and performance improvements. Since it’s generic, it allows us to optimize not just Midgard programs, but soon Bifrost modifiers as well.
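The rough idea can be sketched as follows: when translating an ALU instruction, check how its result is used in the NIR; if its only use is an fsat (or an fmax against zero), fold the corresponding output modifier into the backend instruction, and the consuming instruction reduces to a copy that later passes remove. The types and helpers below are invented for illustration and are not the actual Mesa/NIR or Panfrost API.

#include <assert.h>

/* Hypothetical sketch: infer an output modifier for a backend instruction
 * by inspecting how its definition is used in a NIR-like IR. */

enum outmod { OUTMOD_NONE, OUTMOD_SAT, OUTMOD_POS };

struct ir_def {
        enum { OP_FMUL, OP_FSAT, OP_FCLAMP_POS, OP_OTHER } op;
        unsigned num_uses;
        struct ir_def *only_use;   /* valid when num_uses == 1 */
};

/* Decide which output modifier, if any, can be folded into 'def'. */
static enum outmod infer_outmod(const struct ir_def *def)
{
        if (def->num_uses != 1)
                return OUTMOD_NONE;

        switch (def->only_use->op) {
        case OP_FSAT:       return OUTMOD_SAT;  /* clamp to [0, 1] */
        case OP_FCLAMP_POS: return OUTMOD_POS;  /* max(x, 0.0)     */
        default:            return OUTMOD_NONE;
        }
}

int main(void)
{
        struct ir_def fsat_use = { .op = OP_FSAT };
        struct ir_def mul = { .op = OP_FMUL, .num_uses = 1, .only_use = &fsat_use };

        /* The multiply feeding only an fsat picks up the .sat modifier. */
        assert(infer_outmod(&mul) == OUTMOD_SAT);
        return 0;
}

Because the inference looks at the NIR uses directly, the same helper pattern works for whatever modifiers a particular backend happens to expose, which is what makes it reusable for Bifrost.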
With a simpler compiler, I was able to add 16-bit support to the Midgard compiler to reduce register pressure and improve thread count (occupancy) due to the architecture’s register sharing mechanism. As previously blogged, our Bifrost compiler is built to support this from day 1, and through the lessons learned there, I was able to backport the improvements to Midgard.
To prepare, I added types into the IR to avoid compiler passes requiring type inference, a complex and error-prone pursuit. Once type sizes were preserved cleanly, I added additional support to the Midgard compiler’s packing routines to handle some outstanding details of 16-bit instructions. Midgard is significantly simpler to pack than Bifrost; whereas 16-bit and 32-bit instructions on Bifrost involve separate instructions with dramatically differing opcodes and formats, Midgard has a one-size-fits-most approach which – despite its inherent limitations – is refreshing. Miscellaneous fixes were needed across the compiler; nevertheless, the simplified IR lived up to its design and is now able to support 16-bit operations.
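In practice, the 16-bit path is driven by GLSL ES precision qualifiers: mediump only requires the range and precision that a 16-bit float provides, so a shader like the generic sketch below (not taken from any particular application) is a candidate for FP16 execution, with roughly half the register footprint per value.

/* A mediump fragment shader (GLSL ES 1.00). The precision qualifier only
 * guarantees what a 16-bit float can represent, so the compiler is free
 * to execute this arithmetic in FP16 and halve its register usage. */
static const char *mediump_fragment_shader =
        "precision mediump float;\n"
        "uniform sampler2D u_tex;\n"
        "uniform vec4 u_tint;\n"
        "varying vec2 v_uv;\n"
        "void main() {\n"
        "    vec4 texel = texture2D(u_tex, v_uv);\n"
        "    gl_FragColor = texel * u_tint;\n"
        "}\n";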
The bulk of the code required for FP16 has now landed in upstream Mesa but is disabled by default pending further testing. Nevertheless, for the adventurous among you, you can set PAN_MESA_DEBUG=fp16 on a recent build of master. Beware: here be dragons.
Stepping away from the compiler, an interesting improvement is the new handling of draws with colour masked out. A typical draw in OpenGL that does not use blending or colour masks might look like:
glColorMask(true, true, true, true);
glDepthMask(true);
glDrawArrays(GL_TRIANGLES, 0, 15);
Since blending is disabled and all colour channels (RGBA) are written simultaneously, this draw does not need to read from the colour buffer (tilebuffer). But what if the draw does not write to any colour channels?
glColorMask(false, false, false, false);
glDepthMask(true);
glDrawArrays(GL_TRIANGLES, 0, 15);
Naively, the GPU would need to read the previous colour and write it back immediately - but that’s wasteful. Instead, we can detect the case where no colour is written, and elide all access to the colour buffer, skipping both the read and the write.
Could we skip the draw entirely? If there are no side effects, we can, but applications typically mask out colour while also unmasking the depth buffer, which is independent of the colour computation. Midgard has a solution.
Even if depth/stencil updates are required, as long as the shader only computes colour with no side effects, there’s no reason to run the shader. While Bifrost does not appear to, Midgard allows the driver to specify a draw with no shader, saving not only colour buffer read/write but also shader execution.
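Putting the two observations together, the per-draw decision reduces to something like the sketch below. The structure and helper names are hypothetical pseudologic for illustration, not Panfrost's actual code path.

#include <assert.h>
#include <stdbool.h>

struct draw_state {
        unsigned color_mask;            /* RGBA write mask; 0 = fully masked */
        bool shader_has_side_effects;   /* e.g. discard, buffer/image writes */
        bool depth_or_stencil_write;
};

static bool writes_color(const struct draw_state *d)
{
        /* With no channels written, both the tilebuffer read and the
         * write can be elided. */
        return d->color_mask != 0;
}

static bool needs_fragment_shader(const struct draw_state *d)
{
        /* Midgard can issue the draw with no fragment shader at all when
         * the shader would only have produced (masked-out) colour. */
        return writes_color(d) || d->shader_has_side_effects;
}

static bool can_skip_draw_entirely(const struct draw_state *d)
{
        return !needs_fragment_shader(d) && !d->depth_or_stencil_write;
}

int main(void)
{
        /* The glColorMask(false, ...) + glDepthMask(true) draw above:
         * no shader and no colour access, but the draw itself must still
         * run for the depth update. */
        struct draw_state d = {
                .color_mask = 0,
                .shader_has_side_effects = false,
                .depth_or_stencil_write = true,
        };

        assert(!needs_fragment_shader(&d));
        assert(!can_skip_draw_entirely(&d));
        return 0;
}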
In addition to our work on Midgard performance, community Panfrost hacker Icecream95 has been improving the Midgard stack nonstop.
Since our last blog post, they contributed a major bug fix for handling discard instructions. For background, OpenGL conceptually first runs the fragment shader for each pixel on the screen and then performs depth testing. In practice, modern hardware attempts to perform depth tests before running the shader, known as “early-z” testing, in order to avoid needlessly executing the shader for occluded pixels.
However, games use discard, an OpenGL directive allowing shaders to eliminate fragments, which can interfere with optimizations like early-z. The driver is responsible for detecting these situations, disabling these optimizations, and enabling standards-compliant fallback paths including “late-z” testing. After Icecream95 investigated issues with Panfrost’s handling of depth testing in the presence of discard instructions, they were able to fix rendering bugs in many games including SuperTuxKart, OpenMW, and RVGL.
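For illustration, a shader pattern that triggers this fallback is alpha testing, as in the generic sketch below (not taken from any of the games above): because the shader may discard, the depth buffer cannot be updated until after the shader has run, so the driver has to disable early depth writes for such draws.

/* Alpha-test style fragment shader (GLSL ES 1.00). The discard means the
 * depth result is only known once the shader has executed, forcing the
 * driver to fall back from early-z to a standards-compliant late-z path. */
static const char *alpha_test_fragment_shader =
        "precision mediump float;\n"
        "uniform sampler2D u_tex;\n"
        "varying vec2 v_uv;\n"
        "void main() {\n"
        "    vec4 texel = texture2D(u_tex, v_uv);\n"
        "    if (texel.a < 0.5)\n"
        "        discard;\n"
        "    gl_FragColor = texel;\n"
        "}\n";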
On the performance front, in the past they have significantly optimized Panfrost’s tiling routines and Mesa’s min/max index calculation, and added support for ASTC and ETC compressed textures.
Some Panfrost (Mali T760) screenshots of games improved by Icecream95’s patches:
Hats off to a great community contributor!
One final area that we’ve been working on is exposing Mali’s performance counters to userspace in Panfrost, allowing us to identify bottlenecks in the driver, and other developers to identify bottlenecks in their applications running on Panfrost. For about a year, we have had experimental support for passing the raw counters from kernelspace. Collaborans Antonio Caggiano and Rohan Garg, in conjunction with Icecream95 and other contributors, have been working on integrating these counters with Perfetto to enable high-level analysis with an elegant, free software user interface.
In the three months since we began work on Bifrost, fellow Collaboran Tomeu Vizoso and I have progressed from stubbing out the new compiler and command stream in March to running real programs by May. Driven by a reverse-engineering effort in tandem with the free software community, we are confident that against proprietary blobs and downstream hacks, open-source software will prevail.
Looking to the future, we plan to improve Bifrost’s coverage of OpenGL ES 2.0 to support more 3D games, now that the basic accelerated desktop is working. We also plan to improve Bifrost compiler performance, in order to approach the proprietary stack’s performance as we did for Midgard. Most of all, we’d like to build a community around the driver, with software freedom and an open-first approach as core values.
It worked for Freedreno, Etnaviv, and Lima. It worked for Panfrost on Midgard. And I’m confident it will work again on Bifrost.
Happy hacking.
Comments (30)
deuteragenie:
Jun 05, 2020 at 08:46 PM
Congratulations to you and everybody who is contributing to this effort!
Question: is there a plan to improve the scheduler in Panfrost to be "state-of-the-art"? As scheduling is both an art and a science, is there a way to make the scheduling architecture pluggable, or at least easily replaceable, so that different approaches could be tried?
Alyssa Rosenzweig:
Jun 05, 2020 at 09:05 PM
Both the instruction scheduler in Mesa and the job scheduler in the kernel are prime candidates for optimization, and we're always looking to improve them. Thank you for reading!
anon:
Jun 05, 2020 at 11:09 PM
Awesome work!
(that is all)
Alyssa Rosenzweig:
Jun 08, 2020 at 07:45 PM
Thank you!
Maor:
Jun 06, 2020 at 08:38 AM
Alyssa, you and the rest of the team are doing an amazing job. Thank you!
I have a question, but I'm not an expert in this, so the answer might be obvious.
I saw some videos of people running Linux on an Arm chip (the RK3399, for example) and the video-playing capability was pretty bad. Does Panfrost have support for hardware-accelerated video encoding/decoding?
sre:
Jun 08, 2020 at 04:54 PM
Hi Maor,
In ARM SoCs, hardware acceleration for video de/encoding is usually not performed by the GPU. Instead, there are separate hardware blocks (IP cores) just for this task. The Rockchip RK3399 is no exception. The kernel's staging area has drivers available: CONFIG_VIDEO_ROCKCHIP_VDEC for VP9/H264/H265 codecs and CONFIG_VIDEO_HANTRO_ROCKCHIP for MPG2/VP8/H264 (the RK3399 has two different IP cores). Note that the drivers are still WIP. While not covered by a dedicated blog post so far, you can find some news about those drivers in our kernel blog posts.
-- Sebastian
Maor:
Jun 08, 2020 at 08:37 PM
Hi Sebastian,
Thank you very much for the detailed explanation!
jimmij:
Jun 07, 2020 at 04:28 PM
Congratulations! Thank you for making such a tremendous contribution to the free and open source community.
Alyssa Rosenzweig:
Jun 08, 2020 at 07:47 PM
Thank you!
LP:
Jun 08, 2020 at 09:46 AM
Great progress!
What machine(s) are you testing/running this on?
ASUS C101/C201? Is there anything with better specs available (in laptop or tablet form factor)?
Alyssa Rosenzweig:
Jun 08, 2020 at 07:47 PM
Personally, I use a Samsung Chromebook Plus for Midgard (Mali T860) development, and the screenshots for Bifrost are from an ODROID GO Advance. Other developers like using RK3399-based single-board computers.
Alexander Stein:
Jun 10, 2020 at 10:15 PM
This sounds really great. So you used a G31 as the Bifrost GPU. I would like to try/test and maybe even hack myself on a G52 (ODROID-N2). AFAICS the current mainline kernel support in Panfrost is only Mali-Txxx. What did you have to change in order to use the Panfrost kernel driver on the G31? Could you please share this, or do you even have a public repository?
Alyssa Rosenzweig:
Jun 12, 2020 at 02:36 PM
Hi,
While Midgard and Bifrost have drastically different instruction sets requiring separate compilers, the interface exposed to the kernel is quite similar, so we've largely been able to reuse the already mainlined code with just a few Bifrost-specific patches (https://gitlab.freedesktop.org/tomeu/linux/-/commits/panfrost-odroid-n2/ and https://gitlab.freedesktop.org/tomeu/linux/-/commits/panfrost-go-advance are WIP branches). Unfortunately, the Mali G52 on Amlogic boards still needs a few more magic kernel bits to work (hence my focus on the G31 used in Rockchip), but ironing out those bugs so you can use it on your board is a top priority: stay tuned!
Thank you for reading.
Alyssa
Solomon Shantz-Kreutzkamp:
Nov 09, 2020 at 06:28 AM
Thanks for your hard work! Looking forward to the upcoming Mali G52 support!
Michal Lazo:
Jun 15, 2020 at 03:17 PM
I have an ODROID-C4 SBC and an Armbian build of Ubuntu 20.04 with
Mesa master (https://launchpad.net/~oibaf/+archive/ubuntu/graphics-drivers).
I also added PAN_MESA_DEBUG=bifrost to /etc/environment,
and it looks like the Ubuntu desktop is working.
There are some glitches with Ubuntu Settings (GTK),
but glmark2-es-wayland is working.
Nice job!!!
Alyssa Rosenzweig:
Jun 16, 2020 at 01:19 AM
Thank you!
Michal Lazo:
Jun 16, 2020 at 07:47 AM
Is there any chance to stabilize the C51?
GNOME is "running", but when I start glmark it crashes in one benchmark.
BTW, my experience from the ODROID-C4 (with Mali G31):
Ubuntu 20.04 GNOME is running fine.
I think it will need some optimization :)
Nice work!
Alyssa Rosenzweig:
Jun 17, 2020 at 01:21 AM
Our current focus has been Mali G31, but improvements for G52 are in the pipes, as of course are optimizations!
Eric:
Jun 24, 2020 at 02:58 AM
I've used the blob driver directly to DRM to get 3D accelerated drawing without a windowing system.
Is it possible to do this with Bifrost?
Alyssa Rosenzweig:
Jun 29, 2020 at 04:41 PM
Yes, via DRM/GBM. In fact, this is how the compositors themselves (Weston, for instance) are accelerated.
Eric H:
Jun 29, 2020 at 05:51 PM
That's great. Now I need to figure out why the old code that uses DRM fails under the new driver.
Michal Lazo:
Jun 30, 2020 at 07:28 AM
I think it will be the same as with a lot of other software.
I fixed mutter for Lima and Panfrost.
A lot of software doesn't expect the GPU device to be first in /dev/dri/
and the VPU second.
Andy:
Jul 01, 2020 at 05:15 PM
Alyssa, thank you so much for your hard work and dedication!!!
I have a 'X96 Max+' TV box (Amlogic S905X3 with G31) and yesterday I was able to get Panfrost running on it for the first time using Armbian with kernel 5.7.6 and Gnome desktop backed by Wayland!
'glmark2-es2-wayland' is working well on the box but the desktop gets more and more blurry over time, especially text.
One question:
Do I have to start Supertuxkart and Neverball from terminal with some special commands? When I try to run them using the respective icons they crash and send me back to login.
Another question:
Is it already possible to activate Panfrost (on G31) using a lighter desktop environment like Xfce (which is the default of Armbian) or Lxde? I did not manage to do so. 'glxinfo -B' always shows that llvmpipe is doing the rendering.
Thanks again for your brilliant work!
Alyssa Rosenzweig:
Jul 02, 2020 at 05:25 PM
Thank you for reading. Mali G31 support, while making fast progress, still has some bugs to work out. For Neverball, try PAN_MESA_DEBUG=bifrost neverball. Stay tuned for Supertuxkart and X11 support!
Andy:
Jul 03, 2020 at 08:38 AM
Thanks for your reply.
I am excited like a small kid to finally see Panfrost working on Bifrost GPUs and am very much looking forward to every small evolution of the driver - running Supertuxkart and X would be a dream :-)
I hope I don't sound too impatient. You are doing an amazing job!
I would really like to give something back myself, but I am not that good at programming. I could only do some testing and give feedback, if that would help.
Aureal:
Jun 12, 2021 at 12:53 PM
Any tutorial to get Armbian with Panfrost? I have an S905X3 TV box (A95X F3 Slim).
hackan:
Jul 09, 2020 at 12:24 AM
First of all - Awesome work!
Not sure if this is the right place for this, but I'll give it a go anyway:
I got a Pinebook Pro running GNOME on Manjaro, and I cannot change the screen colour temperature. I can flip on "Night Light" (or whatever the feature is called in English), but it doesn't have any effect on the actual screen temperature. Does this have something to do with the Panfrost driver?
Alyssa Rosenzweig:
Jul 09, 2020 at 03:06 PM
Thank you! That sounds like a display driver issue; I don't think it's related to Panfrost.
Bill Sanders:
Jan 13, 2021 at 05:27 AM
Running on a Khadas VIM3 under Wayfire (Wayland), glmark2-es2-wayland gives mostly great results.
=======================================================
glmark2 2020.04
=======================================================
OpenGL Information
GL_VENDOR: Panfrost
GL_RENDERER: Mali G52 (Panfrost)
GL_VERSION: OpenGL ES 2.0 Mesa 21.0.0-devel (git-f01bca8100)
=======================================================
[build] use-vbo=false: FPS: 604 FrameTime: 1.656 ms
[build] use-vbo=true: FPS: 652 FrameTime: 1.534 ms
[texture] texture-filter=nearest: FPS: 1766 FrameTime: 0.566 ms
[texture] texture-filter=linear: FPS: 1781 FrameTime: 0.561 ms
[texture] texture-filter=mipmap: FPS: 1781 FrameTime: 0.561 ms
[shading] shading=gouraud: FPS: 413 FrameTime: 2.421 ms
[shading] shading=blinn-phong-inf: FPS: 420 FrameTime: 2.381 ms
[shading] shading=phong: FPS: 378 FrameTime: 2.646 ms
[shading] shading=cel: FPS: 370 FrameTime: 2.703 ms
[bump] bump-render=high-poly: FPS: 147 FrameTime: 6.803 ms
[bump] bump-render=normals: FPS: 1483 FrameTime: 0.674 ms
[bump] bump-render=height: FPS: 1361 FrameTime: 0.735 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 751 FrameTime: 1.332 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 300 FrameTime: 3.333 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1641 FrameTime: 0.609 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 207 FrameTime: 4.831 ms
[desktop] effect=shadow:windows=4: FPS: 911 FrameTime: 1.098 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 129 FrameTime: 7.752 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 130 FrameTime: 7.692 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 214 FrameTime: 4.673 ms
[ideas] speed=duration: FPS: 164 FrameTime: 6.098 ms
[jellyfish] : FPS: 467 FrameTime: 2.141 ms
[terrain] : FPS: 22 FrameTime: 45.455 ms
[shadow] : FPS: 296 FrameTime: 3.378 ms
[refract] : FPS: 36 FrameTime: 27.778 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 1260 FrameTime: 0.794 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 918 FrameTime: 1.089 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 1233 FrameTime: 0.811 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 1223 FrameTime: 0.818 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 912 FrameTime: 1.096 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 1179 FrameTime: 0.848 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 1171 FrameTime: 0.854 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 713 FrameTime: 1.403 ms
=======================================================
glmark2 Score: 758
=======================================================
Congratulations and thank-yous to the Panfrost team!
In comparison to results I have seen on the Raspberry Pi 4, G52 Panfrost yields 2-6x the framerate, except in [bump] bump-render=high-poly and [refract], in which I score lower.
Senor:
Nov 26, 2023 at 03:40 AM
This is at the limit of my understanding to appreciate how impressive it is. What a feat. Come back to Twitter! This series of accomplishments is hard to follow for the layperson. Looking forward to testing on a G31 shortly.. if I can even handle the installation XD