Faith Ekstrand
June 09, 2022
One of the important lessons the graphics industry has learned over the last decade or so is the need for explicit synchronization between different pieces of asynchronous accelerator work. The Vulkan graphics and compute API learned this lesson and uses explicit synchronization for everything. On Linux, unfortunately, we still need implicit synchronization to talk to various window systems which has caused pain for Linux Vulkan drivers for years.
With older graphics APIs like OpenGL, the client makes a series of API calls, each of which either mutates some bit of state or performs a draw operation. There are a number of techniques that have been used over the years to parallelize the rendering work, but the implementation has to ensure that everything appears to happen in order from the client's perspective. While this served us well for years, it's become harder and harder to keep the GPU fully occupied. Games and other 3D applications have gotten more complex and need multiple CPU cores in order to have enough processing power to reliably render their entire scene in less than 16 milliseconds and achieve a smooth 60 frames per second. GPUs have also gotten larger with more parallelism, and there's only so much a driver can do behind the client's back to parallelize things.
To improve both GPU and CPU utilization, modern APIs like Vulkan take a different approach. Most Vulkan objects such as images are immutable: while the underlying image contents may change, the fundamental properties of the image such as its dimensions, color format, and number of miplevels do not. This is different from OpenGL where the application can change any property of anything at any time. To draw, the client records sequences of rendering commands in command buffers which are submitted to the GPU as a separate step. The command buffers themselves are still stateful, and the recorded commands have the same in-order guarantees as OpenGL. However, the state and ordering guarantees only apply within the command buffer, making it safe to record multiple command buffers simultaneously from different threads. The client only needs to synchronize between threads at the last moment when they submit those command buffers to the GPU. Vulkan also allows the driver to expose multiple hardware work queues of different types which all run in parallel. Getting the most out of a large desktop GPU often requires having 3D rendering, compute, and image/buffer copy (DMA) work happening all at the same time and in parallel with the CPU prep work for the next batch of GPU work.
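As a loose illustration of the recording model, here is a toy sketch in Python rather than real Vulkan code. The `record` helper and the scene chunks are hypothetical; the point is only that each thread owns its command buffer, so the sole synchronization point is the join right before submission.

```python
import threading

def record(commands, scene_chunk):
    # State and ordering are private to this command buffer, so no locking is needed.
    for obj in scene_chunk:
        commands.append(f"draw {obj}")

chunks = [["terrain", "trees"], ["characters"], ["ui"]]
command_buffers = [[] for _ in chunks]  # one command buffer per recording thread

threads = [
    threading.Thread(target=record, args=(cb, chunk))
    for cb, chunk in zip(command_buffers, chunks)
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # the only cross-thread synchronization, right before "submit"

# "Submission": hand the finished command buffers over in a chosen order.
submitted = [cmd for cb in command_buffers for cmd in cb]
print(submitted)
```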
Enabling all this additional CPU and GPU parallelism comes at a cost: synchronization. One piece of GPU work may depend on other pieces of GPU work, possibly on a different queue. For instance, you may upload a texture on a copy queue and then use that texture on a 3D queue. Because command buffers can be built in parallel and the driver has no idea what the client is actually trying to do, the client has to explicitly provide that dependency information to the driver. In Vulkan, this is done through VkSemaphore objects. If command buffers are the nodes in the dependency graph of work to be done, semaphores are the edges. When a command buffer is submitted to a queue, the client provides two sets of semaphores: a set to wait on before executing the command buffer and a set to signal when the command buffer completes. In our texture upload example, the client would tell the driver to signal a semaphore when the texture upload operation completes and then have it wait on that same semaphore before doing the 3D rendering which uses the texture. This allows the client to take advantage of as much parallelism as it can manage while still having things happen in the correct order as needed.
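The graph picture can be made concrete with a small toy scheduler. This is a sketch in Python, not Vulkan: the `Submission` class and `execution_order` function are hypothetical names, semaphores are plain strings, and the list order only stands in for work sitting on independent queues.

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    name: str
    waits: list = field(default_factory=list)    # semaphores that must be signaled first
    signals: list = field(default_factory=list)  # semaphores signaled on completion

def execution_order(submissions):
    """Return one valid execution order that honors only the semaphore edges."""
    signaled, done, pending = set(), [], list(submissions)
    while pending:
        ready = [s for s in pending if all(w in signaled for w in s.waits)]
        if not ready:
            raise RuntimeError("deadlock: a wait semaphore is never signaled")
        for s in ready:
            done.append(s.name)
            signaled.update(s.signals)
            pending.remove(s)
    return done

upload = Submission("copy: upload texture", signals=["texture_ready"])
draw = Submission("3D: draw with texture", waits=["texture_ready"])
compute = Submission("compute: unrelated work")

# Queues run independently, so the scheduler may look at the draw first.
# The semaphore edge is the only thing that orders the upload before the draw.
order = execution_order([draw, compute, upload])
print(order)
```

The unrelated compute work is free to run whenever it likes because no edge constrains it.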
Everything we just discussed is in the context of a single client trying to get as much out of the GPU as it can. But what if we have multiple clients involved? While this isn't something most game engine developers want to think about, it's critical when you look at a desktop system as a whole. In the common case, you don't just have a single game rendering and displaying on the screen; you have multiple clients all rendering their own window and a compositor putting everything together into the final image you see on-screen. If you're watching a video, you may also have a video decoder which is feeding into your web browser, adding another layer of complexity.
All of these pieces working together to create the final image you see on your screen poses many of the same problems as multi-queue rendering. Instead of having multiple queues being targeted by a single client, each client has its own queues and we need to synchronize between them. In particular, we need to make sure that the composition happens after each of the clients has completed its rendering or else we risk getting stale or incomplete data on the screen.
The way this typically works with OpenGL on Linux is that the client will draw to the back buffer (framebuffer 0) and then call eglSwapBuffers() (or glXSwapBuffers() if using X11 and GLX). Inside the eglSwapBuffers() call, the driver ensures that all the rendering work has been submitted to the kernel driver and then hands the back buffer to the compositor (either the X server or a Wayland compositor) to be composited in the next frame. The compositor then submits its rendering commands to composite the latest frames from all the apps. Who ensures that the compositor's rendering work happens only after all the clients have completed rendering to their respective back buffers? The kernel does, implicitly. For each shared buffer, it tracks all the GPU work which has been submitted globally, across the entire system, which may touch that buffer and ensures it happens in order. While this auto-magic tracking sounds nice, it has the same over-synchronization downsides as the rest of OpenGL that we discussed above.
The way this is supposed to work with Vulkan is via explicit synchronization. The client first acquires an image to render to via vkAcquireNextImageKHR() which takes an optional semaphore and fence to be signaled once the acquired image is actually ready for rendering. The client is expected to block its rendering on that semaphore or fence. Then, once the client has submitted its rendering, it calls vkQueuePresentKHR() and passes it a set of semaphores to wait on before reading the image. Exactly how those fences and semaphores get shared between the compositor and client and exactly what they do is left as an implementation detail. The mental model, however, is that the semaphore and fence in vkAcquireNextImageKHR() are the signaled semaphore and fence from the compositor's last GPU job which read that image and the semaphores passed to vkQueuePresentKHR() are the ones the compositor waits on before compositing.
The description above is how Vulkan is "supposed to work" because, as nice as that mental model is, it's all a lie. The fundamental problem is that, even if the app is using Vulkan, the compositors are typically written in OpenGL and the window-system protocols (X11 and Wayland) are written assuming implicit synchronization. In Wayland, once the wl_surface.commit request is sent across the wire, the compositor is free to assume the surface is ready and begin rendering, trusting in implicit synchronization to make it all work. There has been some work to allow passing sync files along with wl_surface.commit and wl_buffer.release events but it's still incomplete and not broadly supported. The X11 PRESENT extension has a mechanism for creating a synchronization primitive which is shared between the X server and client. However, that primitive is always xshmfence which only synchronizes between the two userspace processes on the CPU; implicit synchronization is required to ensure the GPU work happens in order. In the end, then, in spite of all the nice explicit semaphores we have in the Vulkan window-system APIs, we have to somehow turn that into implicit synchronization because we live in an implicitly synchronized world.
As a quick side note, none of the above is a problem on Android. The Android APIs are designed to use explicit synchronization from the ground up. All the SurfaceFlinger APIs pass sync files between the client and compositor to do the synchronization. This maps fairly well to the Vulkan APIs. It's only generic Linux where we have a real problem here.
If implicit synchronization is auto-magic and handled by the kernel, doesn't Vulkan get it for free? Why is this a problem? Good questions! Yes, Vulkan drivers are running on top of the same kernel drivers as OpenGL but they typically shut off implicit synchronization and use the explicit primitives. There are a few different reasons for this, all of which come down to trying to avoid over-synchronization:
With multiple queues the client can submit to, if implicit synchronization were enabled, the client might end up synchronizing with itself more than needed. We don't know what the client is trying to do and it's better to only do the synchronization it explicitly asks for so we can get maximum parallelism and keep that beast full.
Vulkan doesn't know when a piece of memory is being written as opposed to read, so we would always have to assume the worst case. The kernel implicit synchronization stuff is smart enough to allow multiple simultaneous reads but only one client job writing at a time. If everything looks like a write, everything which touches a given memory object would get serialized.
Vulkan lets the client sub-allocate images out of larger memory objects. Because the kernel's implicit synchronization is at the memory object granularity, every job which touches the same memory object would get synchronized, even if two jobs are accessing completely independent images within it.
If you're using bindless (UPDATE_AFTER_BIND_BIT) or buffer device address, the Vulkan driver doesn't even know which memory objects are being used by any given command buffer. It has to assume any memory object which exists may be used by any command buffer. If we left implicit synchronization enabled, this would mean everything would synchronize on everything else.
Each of those can be pretty bad by itself but when you put them together the result is that, in practice, using implicit synchronization in Vulkan would completely serialize all work and kill your multi-queue parallelism. So we shut it off if the kernel driver allows it.
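The sub-allocation point in particular is easy to see with a toy comparison. The sketch below is Python, the job and image names are made up, and every access is treated as a write (the worst case described above); it simply counts which pairs of jobs would be ordered against each other at each tracking granularity.

```python
from itertools import combinations

# Hypothetical jobs: (name, memory object, image sub-allocated from it).
jobs = [
    ("shadow pass", "mem0", "shadow_map"),
    ("post-process", "mem0", "bloom_buffer"),
    ("ui pass", "mem1", "ui_atlas"),
]

def serialized_pairs(jobs, key):
    """Pairs of jobs that get ordered against each other when tracking by `key`."""
    return [(a[0], b[0]) for a, b in combinations(jobs, 2) if key(a) == key(b)]

# What kernel implicit sync sees: only the memory object.
per_memory_object = serialized_pairs(jobs, key=lambda j: j[1])
# What the client actually knows: which image inside the memory object.
per_image = serialized_pairs(jobs, key=lambda j: (j[1], j[2]))

print(per_memory_object)  # the two mem0 jobs are serialized
print(per_image)          # nothing needs to be serialized
```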
If we're turning off implicit synchronization, how do we synchronize with the window system? That's the real question, isn't it? There are a number of different strategies for this which have been employed by various drivers over the years and they all come down to some form of selective enabling of implicit synchronization. Also, they're all terrible and lead to over-synchronization somewhere.
The RADV driver currently tracks when each window-system buffer is acquired by the client and only enables implicit synchronization for window-system buffers and only when owned by the client. Thanks to details of the amdgpu kernel driver, enabling implicit synchronization doesn't actually cause the client to synchronize with itself when doing work on multiple queues. However, because of our inability to accurately track when a buffer is in use, this strategy leads to over-synchronization if the client acquires multiple images from the swapchain and is working on them simultaneously. There's no way for us to separate which work is targeting which image and only make the vkQueuePresentKHR() wait on the work for the one image.
In ANV (the Intel driver), we get hints from the window-system code and flag the window-system buffer as written by the dummy submit done as part of vkQueuePresentKHR() and consider it to be read by everything that waits on the semaphore from vkAcquireNextImageKHR(). This strategy works well for GPU <-> GPU synchronization but we run into problems when implementing vkWaitForFences() for the fence from vkAcquireNextImageKHR(). That has to be done via DRM_IOCTL_I915_GEM_WAIT which can't tell the difference between the compositor's work and work which has since been submitted by the client. If you call vkWaitForFences() on such a fence after submitting any client work, it basically ends up being a vkDeviceWaitIdle() which isn't at all what you want.
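A toy model of that wait problem, in Python with hypothetical `Fence` and `Buffer` classes (this is not how the i915 driver is implemented): a whole-buffer idle wait has to wait for every outstanding fence on the buffer, so work the client submits after the acquire gets swept in too.

```python
from dataclasses import dataclass, field

@dataclass
class Fence:
    owner: str
    signaled: bool = False

@dataclass
class Buffer:
    fences: list = field(default_factory=list)

    def busy_fences(self):
        """What a whole-buffer idle wait must wait for: every unsignaled fence."""
        return [f for f in self.fences if not f.signaled]

image = Buffer()
image.fences.append(Fence("compositor"))

# At acquire time the only outstanding work is the compositor's.
at_acquire = [f.owner for f in image.busy_fences()]

# The client then submits rendering that targets the same image.
image.fences.append(Fence("client"))
after_submit = [f.owner for f in image.busy_fences()]

print(at_acquire)    # only the compositor
print(after_submit)  # now the wait also covers the client's own work
```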
If you didn't follow all that, don't worry too much. It's all very complicated and detailed and annoying. The important thing to understand is that there is no one strategy for dealing with this; every driver has its own. Also, all the strategies we've employed to date can cause massive over-synchronization somewhere. We need a better plan.
So, how do we do this better? We've tried and failed so many times. Is there a better way? Yes, I believe there is.
Before getting into the details of marrying implicit and explicit synchronization, we need to understand how implicit synchronization works in the kernel. Each graphics memory allocation in the kernel is represented by a dma-buf object. (This corresponds to a VkDeviceMemory object in Vulkan or a single buffer or image in OpenGL.) Each dma-buf object in the kernel has a dma reservation object attached to it which is a container of dma fences. A dma fence is a lightweight object which represents an event that is guaranteed to happen at some point, possibly in the future. Whenever some GPU job is enqueued in kernel space, a dma fence is created which signals when that GPU job is complete and that fence is added to the reservation object on any buffers used by that job. Each dma fence in the reservation object has a usage flag saying what kind of fence it is. When a job is created, it captures some subset of the fences associated with the buffers used by the job and the kernel waits on those before executing the job. Depending on the job and its relation to the buffer in question, it may wait on some or all of the fences. For instance, if doing a read with implicit synchronization, the job must wait on any fences from previously enqueued jobs which write the buffer.
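The read/write rule can be sketched as a small model. This is Python with hypothetical `DmaFence` and `ReservationObject` classes, and it collapses the kernel's usage flags down to a single read-or-write bit, which is a simplification rather than the kernel's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class DmaFence:
    job: str
    is_write: bool
    signaled: bool = False

@dataclass
class ReservationObject:
    fences: list = field(default_factory=list)

    def fences_to_wait_on(self, will_write):
        """Writers wait on everything; readers only wait on earlier writers."""
        return [f for f in self.fences
                if not f.signaled and (will_write or f.is_write)]

    def enqueue_job(self, job, will_write):
        # Capture dependencies first, then add this job's own fence.
        deps = self.fences_to_wait_on(will_write)
        fence = DmaFence(job, is_write=will_write)
        self.fences.append(fence)
        return fence, [d.job for d in deps]

resv = ReservationObject()
_, deps_render = resv.enqueue_job("client render", will_write=True)
_, deps_read_a = resv.enqueue_job("compositor read", will_write=False)
_, deps_read_b = resv.enqueue_job("screenshot read", will_write=False)
_, deps_next = resv.enqueue_job("client render 2", will_write=True)

print(deps_read_a, deps_read_b)  # both readers wait only on the writer
print(deps_next)                 # the next writer waits on everything
```

The two readers do not wait on each other, which is the "multiple simultaneous reads" behavior mentioned earlier.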
So how do we tie implicit and explicit sync together? Let userspace extract and set fences itself, of course! The new API, which should be in Linux 5.20, adds two new ioctls on dma-buf file descriptors which allow userspace to extract and insert fences directly. In userspace, these dma fences are represented by sync files. A sync file wraps a dma fence, turning it into a file descriptor that can be passed around by userspace and waited on via poll(). The first ioctl extracts all of the fences from a dma-buf's reservation object and returns them to userspace as a single sync file. It takes a flags parameter which lets you specify whether you expect to read the data in the dma-buf, write it, or both. If you specify read-only, the returned sync file will only contain write fences but if you specify write or read-write, the returned sync file will contain all implicit sync fences currently in the reservation object. The second ioctl allows userspace to add a sync file to the reservation object. It also takes read/write flags to allow you to control whether the newly added fence is considered a write fence or only a read fence.
These new ioctls have unfortunately been quite a long time in coming. I typed the initial patches around a year ago and they got quickly nacked by Christian König at AMD who saw two big problems. First was that the sync file export patch was going to cause serious over-synchronization if it was ever used on the amdgpu kernel driver because of some of the clever tricks they play with dma fences to avoid over-synchronization internally. Second, thanks to the design of reservation objects at the time, the sync file import patch made both him and Daniel Vetter nervous because of the way it let userspace add arbitrary dma fences that might interact with low-level kernel operations such as memory eviction and swapping. Neither Daniel nor Christian was opposed to the API in principle, but it had to wait until we had solutions to those problems. Over the course of the past year, Christian has been working steadily on refactoring the amdgpu driver and reworking the design of reservation objects away from the old read/write lock design towards a new "bag of fences" design which allows a lot more flexibility. Now that his work has landed, it's safe to go ahead with the new fence import/export API and it should be landing in time for Linux 5.20.
With this new API, we can finally move to a new implicit synchronization strategy in the Mesa Vulkan window-system code which should work correctly for everyone with no additional over-synchronization. In vkAcquireNextImageKHR(), we can export the fences from the dma-buf that backs the window-system image as a sync file and then import that sync file into the semaphore and fence provided by the client. Because the export takes a snapshot of the dma fences, any calls to vkWaitForFences() on the acquire fence won't have the GPU-stalling effect the ANV solution has today. In vkQueuePresentKHR(), instead of playing all the object ownership and memory object signaling tricks we play today, we can take the wait semaphores passed in from the client or produced by the present blit, turn them into a sync file, and then import that sync file into the dma-buf that backs the window system image before handing it off to the compositor. As far as the compositor is concerned, we look just like an OpenGL driver using implicit synchronization and, from the perspective of the Vulkan driver, it all looks like explicit synchronization. Everyone wins!
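The acquire/present round trip can be walked through with a toy model. This is Python, the `Fence` and `DmaBuf` classes are hypothetical stand-ins for the kernel objects, and the method names only echo the export/import operations described above; it is a sketch of the idea, not the Mesa implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Fence:
    job: str
    is_write: bool

@dataclass
class DmaBuf:
    fences: list = field(default_factory=list)

    def export_sync_file(self, will_write):
        """Snapshot of the fences a new reader or writer would have to wait on."""
        return tuple(f for f in self.fences if will_write or f.is_write)

    def import_sync_file(self, fence):
        self.fences.append(fence)

image = DmaBuf()
image.import_sync_file(Fence("compositor: read previous frame", is_write=False))

# Acquire: the client will write the image, so snapshot every fence.
acquire_wait = image.export_sync_file(will_write=True)

# The client renders with implicit sync off, so nothing lands on the dma-buf yet.
render_done = Fence("client: render frame", is_write=True)

# Present: attach the client's completion fence as a write fence.
image.import_sync_file(render_done)

# The compositor, acting as an implicit-sync reader, now picks up that fence.
compositor_wait = image.export_sync_file(will_write=False)

print([f.job for f in acquire_wait])     # unchanged by the later import
print([f.job for f in compositor_wait])  # only the client's render
```

Because the acquire-time export is a snapshot, the fence imported at present time never shows up in it, which is the property that avoids the stall described for the ANV strategy.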
Of course, all those old strategies will have to hang around in the tree for several years while we wait for the new ioctls to be reliably available everywhere. In another 3-5 years or so, we can delete support for all the legacy implicit synchronization mechanisms and we'll finally be living in explicit synchronization nirvana.
Before we wrap up, it's worth addressing one more question. A lot of people have asked me over the last couple years why we don't just plumb explicit synchronization support through Wayland and call it a day. That's how things work on Android, and it worked out okay.
The fundamental problem is that Linux is heterogeneous by nature. People mix and match different components and versions of those components all the time. Even in the best case, there are version differences. Ubuntu and Fedora come out at roughly the same time every 6 months but they still don't ship the same versions of every package. There are also LTS versions which update some packages but not others, spins which make different choices from the main distro, etc. The end result is that we can't just rewire everything and drop in a new solution atomically. Whatever we do has to be something that can be rolled out one component at a time.
This solution allows us to roll out better explicit synchronization support to users seamlessly. Vulkan drivers seamlessly work with compositors which only understand implicit synchronization and, if Wayland compositors pick up sufficient explicit synchronization support, we can transition to that once the compositors are ready. We could have driven this from the Wayland side first and rolled out explicit synchronization support to a bunch of Wayland compositors and said you need a new Wayland compositor if you want to get the fastest possible Vulkan experience. However, that would have been a lot more work. It would have involved a bunch of protocol, adding sync file support to KMS, and touching every Wayland compositor we collectively care about. It would also have been much harder to get 100% transitioned to explicit synchronization because you can only use explicit synchronization without stalling if every component in the entire display path supports it. Likely, had we taken that path, some configurations would be stuck with the old hacky solutions forever and we would never be able to delete that code from Mesa.
There are two other advantages of the kernel ioctl over relying on Wayland protocol. First is that we can check for support on driver initialization. Because of the way Vulkan is structured, we know nothing about the window system when the driver first starts up. We do, however, know about the kernel. If we ever want to have driver features or other behavior depend on "real" explicit synchronization, we can check for these new ioctls early in the driver initialization process and adjust accordingly instead of having to wait until the client connects us to the window system, possibly after they've already done some rendering work.
Second, these new ioctls allow people to write Wayland compositors in Vulkan! We've had the dma-buf import/export APIs in Vulkan for a while but synchronization was left for later. Now that we have these ioctls, a Wayland compositor written in Vulkan can do the same thing as I described above with vkAcquireNextImageKHR() and vkQueuePresentKHR() only in reverse. When they get the composite request from the client, they can export the fences from the client's buffer to a sync file and use that as a wait semaphore for their Vulkan composite job. Once they submit the composite job, the completion semaphore for the composite job can then be exported as a sync file and re-imported into each of the clients' buffers. For a Vulkan client, this will be equivalent to if they had just passed VkSemaphore objects back and forth. For an OpenGL client, this will appear the same as if the compositor were running OpenGL with implicit synchronization.
There you have it! After fighting with the divide between implicit and explicit synchronization with Vulkan on Linux for over seven years, we may finally have some closure. The work's not all done, however. A few of us in the Linux graphics space have a lot of ideas on where we'd like to see synchronization go in the future. We're not to synchronization nirvana quite yet, but this is an important step along the way.
Comments (15)
Mikko Rantalainen:
Jun 10, 2022 at 05:14 PM
Great article. I have just one question:
Does this make it possible to have multiple clients running in parallel in variable frame rate mode? Games are obviously one typical example but rendering 50 fps video on a 60 Hz (or even 120 Hz) display results in jitter without the ability to run the display at different framerates.
And if that's possible, can it handle two monitor case where 50 fps video is on the middle of the seam between the monitors (about half of the frame on one monitor, the rest on the other)?
Jason Ekstrand:
Jun 10, 2022 at 07:03 PM
No, not directly. If a compositor wants to handle variable refresh better, one strategy for that would be to delay until right before vblank and take whatever app frames are ready. This new ioctl would provide one tool to aid in that. However, you can already wait for a dma-buf to be idle via poll() so there's nothing preventing them from implementing that strategy today.
IC Rainbow:
Jun 13, 2022 at 09:27 AM
Would this work for vulkan-1.2 timeline semaphores too?
Jason Ekstrand:
Jun 13, 2022 at 02:53 PM
Yes. The DRM syncobj we're using to implement timeline semaphores everywhere allows you to export a particular time point as a sync_file. It's a bit clunky with the current API but it can be done. Once exported as a sync_file, it can be used to signal implicit sync on a dma-buf. Similarly, there is a way to import a sync_file into a syncobj as a new future time point to go the other direction.
James Jones:
Aug 26, 2022 at 11:56 PM
First of all, I think this work is great, and I'm happy it's being merged. However, I wanted to question a few things you've stated as fact above:
> We've had the dma-buf import/export APIs in Vulkan for a while but synchronization was left for later.
I don't think that's accurate. sync files could be imported to and exported from VkSemaphore objects and exported from VkFence objects long before we had dma-buf support was added to Vulkan. They were included in the original external semaphore/external fence specifications that were ultimately merged into Vulkan 1.1.
> It would have involved a bunch of protocol
This protocol already exists and is on its (WIP) second revision:
https://gitlab.freedesktop.org/wayland/wayland-protocols/-/tree/master/unstable/linux-explicit-synchronization
https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/90
While v1 has little to offer over what you've added, I think v2 is preferable where available, being based on DRM syncobj natively.
Granted, I think only Weston has implemented v1, and poorly, as outlined here:
https://github.com/swaywm/wlroots/issues/894#issuecomment-465096345
> adding sync file support to KMS, and touching every Wayland compositor we collectively care about.
DRM-KMS has already had explicit sync support for quite some time. IIRC Weston has support for it, but I haven't checked the source to verify.
I think the primary benefit to your work is the note about its "always-there" property, so it can be used as a fallback when talking to "old" (i.e., all current) compositors or existing versions of Xwayland and used as a tool for incremental deployment, and both of those are great outcomes I don't want to downplay at all. However, I don't think this generalizes to the Vulkan Timeline Semaphore/DRM syncobj case as well as you claim above, nor as I had naively hoped a few years ago. It's my impression you can't extract a sync file for an unsubmitted point on a DRM syncobj's timeline, you can't block in vkQueuePresentKHR() waiting for it to be submitted because that can deadlock your application thread, and you can't defer it to another thread because that violates Wayland protocol/Vulkan interaction rules, so a true explicit synchronization protocol based on DRM syncobj is still needed long term.
Further, as a general rule and the above aside, I don't feel using objects attached to dma-bufs as a substitute for proper userspace IPC is really something to be celebrated as a way to solve the "There are a lot of Wayland compositors of varying quality" issue. It's a solution for this particular issue to be sure, and I'll happily take it, but I think we are going to have to ultimately live with the fact that some compositors are better maintained and support more protocol than others, and Vulkan/GL implementations and especially applications that code to the Wayland or presentation mechanisms directly might perform significantly better or expose significantly more features on the better compositors, unlike in the X11 world where we could more or less rely on most people eventually picking up newer versions of the one (or two, briefly) X server implementation anyone cared about (Not to knock Xi Graphics. That was very cool code on a technical level). It would be beneficial in many regards if all the major compositors coalesced around a common protocol handling+rendering backend library, but history and the nature of the Linux ecosystem don't suggest this will happen.
Jason Ekstrand:
Aug 29, 2022 at 04:21 PM
> I don't think that's accurate. sync files could be imported to and exported from VkSemaphore objects and exported from VkFence objects long before we had dma-buf support was added to Vulkan. They were included in the original external semaphore/external fence specifications that were ultimately merged into Vulkan 1.1.
Yes and no. It's true that Vulkan has had sync file import/export for some time. However, most other dma-buf based APIs such as those in EGL and various media APIs have assumed implicit synchronization and many of them have never been updated to allow disabling implicit sync and support importing/exporting sync files. So while it's technically true that Vulkan has supported enough that you can build dma-buf based APIs on top of it, it isn't enough to let you use Vulkan with existing APIs.
> DRM-KMS has already had explicit sync support for quite some time.
Import, yes; export, no. As with early Wayland protocol, explicit sync was only considered in the forward direction where you can provide a sync file for KMS to wait on. If you, for instance, do an async flip and shove in a new buffer, there's no return sync file to tell you when the buffer that was scanning out is complete. The KMS application can wait on various scanout events but there's no way to get it as a sync file or other explicit synchronization primitive.
> I think the primary benefit to your work is the note about its "always-there" property, so it can be used as a fallback when talking to "old" (I.e., all current) compositors or existing versions of Xwayland and used as a tool for incremental deployment, and both of those are great outcomes I don't want to downplay at all. [...] Further, as a general rule and the above aside, I don't feel using objects attached to dma-bufs as a substitute for proper userspace IPC is really something to be celebrated as a way to solve the issue of "There are a lot of Wayland compositors of varying quality" issue.
I think we're actually 100% agreed here that fixing this in compositors is necessary long-term. Much of my commentary about not being able to fix it with protocol really comes down to legacy support as you mentioned. I can't count the number of compositor people who've tried to tell me all my driver woes will be supported with a bit of protocol. When asked how they plan to plumb that all through XWayland which 90% of perf-sensitive Vulkan apps still rely on, I get mostly blank stares. :-)
Long-term, we need a better OS synchronization primitive that's more similar to WDDM2 fences and we need to plumb everything through everywhere. In the short to medium term, we need something that works with the current dma-fence-based synchronization primitives we have today. As much as it may be theoretically better for the Wayland protocol to carry the fences, it doesn't solve all the cases we need to solve in X11 and drivers and so will never be a complete solution. Is this a hack? That depends entirely on whether or not you think implicit sync itself is a hack. This lets explicit sync drivers live in an implicit sync world which is either wonderful or an abomination depending on perspective. :-)
Long-term, we need a better solution and this isn't that solution. I never intended to claim that it is. Unfortunately, I've been tilting at that windmill for years and making little progress. I do hope to make progress on it in the next few years but it's going to involve boiling the ocean.
James Jones:
Aug 29, 2022 at 07:05 PM
Yes, I find we're generally largely in agreement. Two more notes:
> When asked how they plan to plumb that all through XWayland which 90% of perf-sensitive Vulkan apps still rely on, I get mostly blank stares. :-)
I agree Xwayland continues to be the weakest link in the chain. Fortunately, Erik Kurzinger has been spending quite a bit of time on this. Proposals are coming, and I believe they'll line up well with your thoughts on future directions above.
> [DRM-KMS] Import, yes; export, no.
There is OUT_FENCE_PTR, but I just realized it may not be quite correctly defined for the purpose you're talking about here. It signals when a given commit starts presenting, but perhaps that could be different than when the prior surfaces are free to be re-used? Still, it should generally work as an explicit pre-fence for the "old" framebuffers' dma-bufs/GEM buffers/etc., and if it wasn't intended for this usage, I'm not clear what it was intended for. It's also at the CRTC level rather than the plane level, which seems a little asymmetric compared to the IN_FENCE property, but I think you can logically associate it with the right buffers from userspace even when using multiple planes.
m4gr4th34:
Mar 27, 2023 at 06:28 PM
Apparently glamor on Xwayland with NVIDIA drivers doesn't work because userspace memory fences have not been implemented in the Linux kernel. Is there any timeline on that?
Faith Ekstrand:
Mar 27, 2023 at 07:37 PM
That's a very long-term project. We're multiple years out from having a design that's ready to land upstream. Also, it's not really fair to say that that's the blocker. NVIDIA could implement proper implicit sync in their driver stack if they chose to. They've chosen not to.
m4gr4th34:
Mar 27, 2023 at 09:16 PM
Thanks for the clarification, I appreciate it. NVIDIA and doing the right thing ... I guess I should prepare to move to Arch, Windows, or AMD.
Faith Ekstrand:
Mar 27, 2023 at 09:33 PM
There is a project underway to reboot the open-source NVIDIA driver stack and build competent open-source drivers. It won't be ready for production for probably another 6 months to a year but it's underway.
https://www.collabora.com/news-and-blog/news-and-events/introducing-nvk.html
m4gr4th34:
Mar 27, 2023 at 10:01 PM
Thank you, 6 months is a bit far off.
Morris:
Jun 25, 2023 at 03:49 PM
The best way to close the gap is to eliminate implicit sync entirely and so modernize Linux. All other solutions are palliatives that increase complexity. Linux developers must cooperate to make this transition possible.
Faith Ekstrand:
Jul 05, 2023 at 10:03 PM
No one is arguing that implicit sync is a good long-term solution. There may have been a few hold-outs for a while, but they shut up years ago. However, switching to full explicit sync is a years-long project and this is a piece of that transition. This isn't Windows where Microsoft can roll out a new kernel and userspace and compositor and force IHVs to roll out new drivers all at the same time like they did with Windows 8 and WDDM2. We need to keep everything working as we transition one piece at a time. This piece improves Vulkan drivers today but also gives us an important tool for moving kernel and userspace drivers over to a fully explicit sync model while keeping existing compositors working. We also need to move compositors over to explicit sync but it needs to remain decoupled from drivers or it won't happen.
Morris:
Jul 06, 2023 at 08:58 PM
Drivers are ready, Vulkan is natively explicit sync, and Wayland is ready too. What is not ready are the compositors, except for Weston. Once compositors are ready, Linux operating systems will be able to complete the transition to Wayland without any need for Xorg and OpenGL. Older machines? Older, still-maintained Xorg operating systems.