Faith Ekstrand
June 26, 2023
It's been a while since I've written about NVK. Rebecca, my intern, has written a couple of blog posts about her NVK work but I've been mostly quiet. Part of that is because I've been primarily focused on something else NVK will need but we'll get to that in a bit. That doesn't mean nothing has happened in NVK, though. Quite a bit has landed in the main NVK branch since October and we're long overdue for an update.
Along with Rebecca's work, which you may have seen on the Collabora blog, we've seen a number of community contributions, and I've done a bit of work here and there. Here are some highlights since my last post in October:
Here's a fairly complete list of extensions and notable features that have been enabled since my "Introducing NVK" blog post in October:
Importantly, we are getting very close to being able to bump our Vulkan core version. Currently, we're advertising Vulkan 1.0 but now we have most of what's needed to get us to Vulkan 1.2 or maybe even 1.3. The only two major features missing from Vulkan 1.2 are timeline semaphores and VK_KHR_sampler_ycbcr_conversion. Proper timeline semaphore work is waiting on the new kernel uAPI (more on that in a moment) and Mohamed is working on YCbCr support right now.
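For readers unfamiliar with how a driver "advertises" a core version: Vulkan packs the version into a single 32-bit integer, and applications compare against it. The sketch below mirrors the bit layout of the `VK_MAKE_API_VERSION` macro from the Vulkan headers; the Rust function names and the `supports_core` helper are made up for illustration and are not part of NVK or any Vulkan binding.

```rust
/// Pack a version the way VK_MAKE_API_VERSION does:
/// variant in bits 31..29, major in 28..22, minor in 21..12, patch in 11..0.
pub const fn make_api_version(variant: u32, major: u32, minor: u32, patch: u32) -> u32 {
    (variant << 29) | (major << 22) | (minor << 12) | patch
}

/// Extract the major version (7 bits).
pub const fn api_version_major(version: u32) -> u32 {
    (version >> 22) & 0x7F
}

/// Extract the minor version (10 bits).
pub const fn api_version_minor(version: u32) -> u32 {
    (version >> 12) & 0x3FF
}

/// Extract the patch version (12 bits).
pub const fn api_version_patch(version: u32) -> u32 {
    version & 0xFFF
}

/// Hypothetical helper: does the advertised version cover core `major.minor`?
pub fn supports_core(advertised: u32, major: u32, minor: u32) -> bool {
    (api_version_major(advertised), api_version_minor(advertised)) >= (major, minor)
}
```

With this layout, a driver advertising 1.0 fails a `supports_core(v, 1, 2)` check, while one advertising 1.3 passes it.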
Before NVK will be considered a conformant Vulkan implementation, it will need to be able to pass the Vulkan CTS (conformance test suite). We've been testing with the CTS heavily during the entire development period. My latest CTS run on an RTX 2060 had the following results:
Pass: 313758, Fail: 1282, Crash: 161, Skip: 1672133, Flake: 41
The biggest difference between this and the results shared in my earlier blog post in October is that we fixed about a thousand crashes and the added features enabled about 60% more tests to be run. There's still a way to go fixing the remaining failures but it's good enough that we have decent regression testing.
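To put the numbers above in perspective, here is a small sketch that computes the pass rate among tests that actually ran. The struct and method names are invented for this example, and counting flakes as executed tests is an assumption on my part rather than anything the CTS tooling defines.

```rust
/// CTS result counts, as reported for the run above.
#[derive(Debug, Clone, Copy)]
pub struct CtsResults {
    pub pass: u64,
    pub fail: u64,
    pub crash: u64,
    pub skip: u64,
    pub flake: u64,
}

impl CtsResults {
    /// Tests that actually ran (assumption: everything except skips).
    pub fn executed(&self) -> u64 {
        self.pass + self.fail + self.crash + self.flake
    }

    /// Fraction of executed tests that passed, or 0.0 if nothing ran.
    pub fn pass_rate(&self) -> f64 {
        let executed = self.executed();
        if executed == 0 {
            return 0.0;
        }
        self.pass as f64 / executed as f64
    }
}

/// The run quoted above.
pub const LATEST_RUN: CtsResults = CtsResults {
    pass: 313_758,
    fail: 1_282,
    crash: 161,
    skip: 1_672_133,
    flake: 41,
};
```

Under that assumption, `LATEST_RUN.executed()` is 315,242 tests and the pass rate works out to roughly 99.5%.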
Probably the single most common question I get from folks is, "When will NVK be in upstream mesa?" The short answer is that it'll be upstreamed along with the new kernel API. The new API is going to be required in order to implement Vulkan correctly in a bunch of cases. Even though it mostly works on top of upstream nouveau, I don't want to be maintaining support for that interface for another 10 years when it only partially works.
We don't yet have an exact timetable for when the new API will be ready. I'm currently hoping that we get it all upstream this year but I can't say when exactly.
Performance is still far from where it needs to be. So far, I've been more focused on getting something that's correct than getting maximum speed. As far as I know, there are no architectural problems with the driver that will prevent us from achieving good performance, but NVK is still maturing and there are a few things that are a bit naive at the moment. Here are a few of the performance issues we know about:
Every vkCmdPipelineBarrier() call does a full wait-for-idle no matter what barriers are requested. This is way more aggressive than we actually need and relaxing the stall rules is going to be necessary for good performance. However, we need to be very careful when relaxing it lest we end up causing data races. As with the descriptor changes, the ability to regression test is very important here.
Those are the big performance holes I'm aware of off-hand. I'm sure we'll find many more along the way. We also still have issues with reclocking on upstream kernels. Those will be solved on Turing and later with the GSP firmware but older hardware is still problematic.
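To make the pipeline barrier point concrete, here is a toy model of the difference between a full wait-for-idle and a relaxed stall. Everything in it is hypothetical: the stage bits are invented for the example (they are not Vulkan's `VkPipelineStageFlags` values) and neither function reflects how NVK or the hardware actually tracks work.

```rust
// Hypothetical classes of in-flight GPU work, one bit each.
pub const STAGE_COMPUTE: u32 = 1 << 0;
pub const STAGE_GRAPHICS: u32 = 1 << 1;
pub const STAGE_TRANSFER: u32 = 1 << 2;
pub const STAGE_ALL: u32 = STAGE_COMPUTE | STAGE_GRAPHICS | STAGE_TRANSFER;

/// Conservative policy: ignore what the barrier asked for and wait for
/// everything. Always safe, but stalls work the barrier never mentioned.
pub fn stages_to_wait_conservative(_src_stages: u32, _in_flight: u32) -> u32 {
    STAGE_ALL
}

/// Relaxed policy (sketch): wait only for in-flight work that the barrier
/// names as a source. Returns 0 when no stall is needed at all.
pub fn stages_to_wait_relaxed(src_stages: u32, in_flight: u32) -> u32 {
    src_stages & in_flight
}
```

In this model, a barrier whose source is transfer work while only compute work is in flight needs no stall under the relaxed policy, whereas the conservative policy still waits for everything. The data-race risk mentioned above is the flip side: if the relaxed mask is ever narrower than what the barrier really requires, a dependent command can start too early.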
Speaking of GSP firmware... It's probably worth a few words about what's happening kernel-side. Broadly speaking, the ongoing kernel work breaks down into three categories:
All this work is still ongoing and is being done by the lovely folks at Red Hat. My role is mostly in advising the kernel API design. Otherwise, I'm trusting them to do that part. As far as I understand, both GSP and the new kernel API are mostly working in some form if you have the right development branch but getting it all put together upstream is still ongoing.
I mentioned earlier that I haven't been very focused on NVK itself lately because I've been working on something else. That something else is a new back-end compiler for NVIDIA hardware, code-named NAK or Nvidia Awesome Kompiler. It's written in Rust and is intended to eventually be a replacement for the old nv50 codegen, at least for modern hardware.
Currently, I'm only targeting Turing GPUs. It will be expanded to more hardware eventually. Unfortunately, unlike with the command stream, the shader encoding does change from generation to generation so it makes sense to fix everything to a particular generation for initial development.
Getting into all the details is probably a topic for another blog post but here are a few highlights:
Overall, I've been very happy with Rust as a language for back-end compiler development. It's way more fun writing Rust code than C or C++ and I can already feel it guiding me away from mistakes. There are a few things that were tricky to get right but I'm pretty happy with the overall design.
As for the current development status, the core of NAK seems to be in pretty good shape at this point. It's near parity for CTS runs as long as it's only enabled for compute shaders. My latest CTS run with NAK enabled for compute shaders had the following results:
Pass: 313148, Fail: 906, Crash: 1150, Skip: 1672133, Flake: 38
There are a few holes to be filled in yet but the biggest thing we're currently missing is support for spilling when the number of temporary values gets to be more than we can fit in registers. Also, the above assumes you're running compute-only. While some of the other shader stages do work to some extent, more debugging needs to be done for 3D shaders. There are also a few opcodes that have yet to be implemented.
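For anyone wondering what "more temporary values than we can fit in registers" means in practice, here is a minimal sketch of a register pressure check. It is not NAK's algorithm: it assumes straight-line code, one register per value, and live ranges given as half-open intervals of instruction indices, and all the names are invented for the example.

```rust
/// A value is live from its definition up to (not including) `end`.
#[derive(Debug, Clone, Copy)]
pub struct LiveRange {
    pub start: u32,
    pub end: u32,
}

/// Maximum number of values live at the same time, found by sweeping
/// over the start and end points of all ranges.
pub fn max_pressure(ranges: &[LiveRange]) -> usize {
    let mut events: Vec<(u32, i32)> = Vec::with_capacity(ranges.len() * 2);
    for r in ranges {
        events.push((r.start, 1)); // value becomes live
        events.push((r.end, -1)); // value dies
    }
    // Sorting tuples orders by position first, then -1 before +1, so a
    // value that dies at index p frees its register before another value
    // defined at p claims one.
    events.sort();

    let mut live = 0i32;
    let mut max = 0i32;
    for (_, delta) in events {
        live += delta;
        max = max.max(live);
    }
    max as usize
}

/// Spilling is needed when peak pressure exceeds the register budget.
pub fn needs_spilling(ranges: &[LiveRange], num_regs: usize) -> bool {
    max_pressure(ranges) > num_regs
}
```

A real compiler has to deal with control flow, values that span multiple registers, and choosing which values to spill and where to reload them, which is where most of the actual work lies.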
Looking back, it's amazing how much has happened in NVK in just the last 7 months. If development continues at this crazy pace, we may be looking at a pretty decent driver before too much longer.