
Venus on QEMU: Enabling the new virtual Vulkan driver

Antonio Caggiano
November 26, 2021

With virtualization we can create multiple virtual machines on a single physical computer. The benefits are many, from creating virtual representations of different machines to making efficient use of the available hardware. A virtual machine, like any real computer, needs an operating system (OS). In this context it is called a Guest OS, as opposed to the one running on real hardware, called the Host OS.

Running graphics applications in a Guest OS can be frustrating: they tend to be greedy for computing resources, which can slow the guest down and degrade graphics performance. Offloading this workload to the host hardware can make a big difference. This is where the VirtIO-GPU virtual GPU device comes into play, allowing a Guest OS to send graphics commands to it through OpenGL or Vulkan. While this already works for OpenGL, we could not say the same for Vulkan. Well, until now.

Jump to a section: Overview | Definitions | Prerequisites | Create an image for QEMU | Running QEMU | Testing Venus | Troubleshooting | Conclusions


Overview

This blog post describes how to enable 3D acceleration of Vulkan applications in QEMU through the Venus experimental Vulkan driver for VirtIO-GPU with a local development environment.

As an alternative you could cherry-pick this commit which contains a set of scripts you could use to set up a Docker development environment.

Definitions

Let us start with a brief description of the projects mentioned in this post:

  • QEMU is an open-source machine emulator, and we will use it to run an Ubuntu guest operating system and take advantage of the VirtIO-GPU device available in the virtual machine.
  • VirGL is an OpenGL driver for VirtIO-GPU, available in Mesa.
  • Venus is an experimental Vulkan driver for VirtIO-GPU, also available in Mesa.
  • Virglrenderer is a library that enables hardware acceleration to VM guests, effectively translating commands from the two drivers just mentioned to either OpenGL or Vulkan.

Prerequisites

The following snippets are prefixed by either (host) or (guest) to specify where they should run. Of course, in order to run something in the guest, you should have QEMU and an image already in place.

Host

  1. Venus requires BLOB resources support in QEMU, which in turn requires /dev/udmabuf. This is not enabled in the default Debian kernel, so make sure your kernel was built with CONFIG_UDMABUF.

    Please note that you could encounter the following error with kvm on AMD when enabling BLOB support: error: kvm run failed Bad address.

  2. Clone the res-sharing branch of virglrenderer from FDO/fahien and compile it with:

    (host)
    
    git clone -b res-sharing https://gitlab.freedesktop.org/Fahien/virglrenderer.git
    
    cd virglrenderer
    
    meson build \
        -Dprefix=$HOME/.local \
        -Dplatforms=egl \
        -Dvenus-experimental=true \
        -Dminigbm_allocation=false \
        -Dbuildtype=debugoptimized
    
    ninja -C build install
    
  3. Clone the venus-dev branch of QEMU from FDO/fahien, then configure and compile it with OpenGL, VirGL, and GTK enabled (or SDL, if you prefer that frontend):
    (host)
    
    git clone -b venus-dev https://gitlab.freedesktop.org/Fahien/qemu.git
    
    cd qemu
    
    mkdir build && cd build
    
    ../configure                   \
      --prefix=$HOME/.local        \
      --target-list=x86_64-softmmu \
      --enable-kvm                 \
      --disable-werror             \
      --enable-opengl              \
      --enable-virglrenderer       \
      --enable-gtk                 \
      --enable-sdl
    
    make -j4 qemu-system-x86_64 && make install
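
(host) Before building anything, it can save time to check the udmabuf requirement from step 1. The following is a minimal sketch, not an official tool: has_udmabuf_config is a hypothetical helper name, and the config path assumes a Debian-like layout where the running kernel's config lives under /boot.

```shell
# Sketch: check the host-side udmabuf requirement from step 1.
# has_udmabuf_config is a hypothetical helper, not part of any tool.

# Succeed if the kernel config file at $1 enables CONFIG_UDMABUF.
has_udmabuf_config() {
    grep -q '^CONFIG_UDMABUF=y' "$1"
}

# Assumption: the running kernel's config is at /boot/config-<release>,
# as on Debian and Ubuntu. Other distributions may store it elsewhere.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    if has_udmabuf_config "$cfg"; then
        echo "CONFIG_UDMABUF is enabled in $cfg"
    else
        echo "CONFIG_UDMABUF is not enabled in $cfg"
    fi
else
    echo "no readable kernel config at $cfg"
fi

# The device node is what QEMU ultimately needs.
if [ -e /dev/udmabuf ]; then
    echo "/dev/udmabuf is present"
else
    echo "/dev/udmabuf is missing"
fi
```

If the config check passes but the device node is missing, permissions or the running kernel (as opposed to the installed one) are the first things I would look at.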
    

Guest

  • Linux kernel v5.16-rc1+.

  • Mesa version 21.1+, configured with meson -Dvulkan-drivers=virtio-experimental.

  • Install vulkan-utils and run vulkaninfo | grep driver to get some info on the available Vulkan drivers.

  • Test vkcube.
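
(guest) The kernel requirement is easy to get wrong when the guest image ships an older distribution kernel. Here is a small sketch that compares the running kernel against 5.16. The helper name kernel_at_least is hypothetical, and it only looks at the major and minor numbers, so a 5.16 release candidate counts as 5.16.

```shell
# kernel_at_least is a hypothetical helper: succeed when the release
# string $2 (as printed by `uname -r`) is at least version $1 (major.minor).
kernel_at_least() {
    req_major=${1%%.*}
    req_minor=${1#*.}
    rel=${2%%-*}                       # drop suffixes such as -rc1 or -generic
    cur_major=${rel%%.*}
    rest=${rel#*.}
    cur_minor=${rest%%.*}
    cur_minor=${cur_minor%%[!0-9]*}    # drop trailing non-digits such as "+"
    [ "$cur_major" -gt "$req_major" ] ||
        { [ "$cur_major" -eq "$req_major" ] && [ "$cur_minor" -ge "$req_minor" ]; }
}

if kernel_at_least 5.16 "$(uname -r)"; then
    echo "guest kernel is new enough for Venus"
else
    echo "guest kernel is older than 5.16"
fi
```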

Create an image for QEMU

You will need to provide QEMU an image. Here is an example of how to make one.

(host)

ISO=ubuntu-21.04-desktop-amd64.iso
wget https://releases.ubuntu.com/21.04/$ISO

IMG=ubuntu.qcow2
qemu-img create -f qcow2 $IMG 16G

# Start ubuntu installation by booting from CD-ROM.
# No need for graphics acceleration at the moment.
qemu-system-x86_64                  \
    -enable-kvm                     \
    -M q35                          \
    -smp 1                          \
    -m 4G                           \
    -net nic,model=virtio           \
    -net user,hostfwd=tcp::2222-:22 \
    -hda $IMG                       \
    -display gtk                    \
    -boot d -cdrom $ISO

Running QEMU

Running with -d guest_errors will show error messages from the guest.

(host)

qemu-system-x86_64                                               \
    -enable-kvm                                                  \
    -M q35                                                       \
    -smp 1                                                       \
    -m 4G                                                        \
    -cpu host                                                    \
    -net nic,model=virtio                                        \
    -net user,hostfwd=tcp::2222-:22                              \
    -hda $IMG                                                    \
    -device virtio-vga-gl,context_init=true,blob=true,hostmem=4G \
    -vga none                                                    \
    -initrd /image/rootfs.cpio.gz                                \
    -kernel /kernel/arch/x86_64/boot/bzImage                     \
    -append "root=/dev/sda3 nokaslr"                             \
    -display gtk,gl=on,show-cursor=on                            \
    -usb -device usb-tablet                                      \
    -object memory-backend-memfd,id=mem1,size=4G                 \
    -machine memory-backend=mem1                                 \
    -d guest_errors
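
(host) The command above repeats the 4G size in -m and in the memfd backend. To my knowledge QEMU expects the -m value and the size of the backend used as main memory to agree, so when changing the amount of RAM it helps to derive both from one value. The sketch below does that; venus_mem_args is a hypothetical helper name, and the hostmem value of the virtio-vga-gl device is deliberately left out since it is a separate setting.

```shell
# venus_mem_args is a hypothetical helper that prints the memory-related
# options from the command above, one per line, for a single size value.
venus_mem_args() {
    size=${1:-4G}
    printf '%s\n' \
        "-m" "$size" \
        "-object" "memory-backend-memfd,id=mem1,size=$size" \
        "-machine" "memory-backend=mem1"
}

# Print the options for an 8G guest.
venus_mem_args 8G
```

The output can be spliced into the QEMU command line with an unquoted $(venus_mem_args 8G), replacing the three corresponding options.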

OpenGL-enabled GTK3 display

VirtIO VGA GL (-device virtio-vga-gl) requires OpenGL support in the QEMU display, which can be enabled with the command line option -display gtk,gl=on.

For some reason, we hit a GTK assertion due to a failure in gtk_widget_get_realized(). The solution is to run QEMU with -vga none to avoid having two scanouts, one for VGA and another for virtio-vga-gl.

Build the kernel

I made a custom config (x86_64.config) to build VirtIO-GPU and DRM within the kernel with debug info.

(host)

git clone --depth 1 -b v5.16-rc1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git kernel

cd kernel

./scripts/kconfig/merge_config.sh arch/x86/configs/x86_64_defconfig x86_64.config

make -j12 vmlinux bzImage
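
(host) The x86_64.config fragment itself is not reproduced in this post. As an illustration only, a fragment for merge_config.sh could look like the one written by the sketch below. The option list is my assumption of a reasonable starting point for a v5.16 guest kernel (VirtIO-GPU and DRM built in, debug info and the gdb helper scripts enabled), and write_virtio_gpu_fragment is a hypothetical helper name.

```shell
# write_virtio_gpu_fragment is a hypothetical helper. The option list is
# an assumption about what such a fragment could contain; it is not the
# x86_64.config used for this post.
write_virtio_gpu_fragment() {
    cat > "$1" <<'EOF'
CONFIG_DRM=y
CONFIG_DRM_VIRTIO_GPU=y
CONFIG_VIRTIO_PCI=y
CONFIG_DEBUG_INFO=y
CONFIG_GDB_SCRIPTS=y
EOF
}

# Usage, from the kernel source directory:
#   write_virtio_gpu_fragment x86_64.config
```

merge_config.sh reports any requested option that did not make it into the final .config, which is a good hint that a dependency is missing from the fragment.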

Starting QEMU with our custom kernel can be done by adding the following command line options:

-kernel arch/x86_64/boot/bzImage \
-initrd ramdisk.img \
-append "root=/dev/sda3" \

You can create ramdisk.img by running mkinitramfs -o ramdisk.img

Testing Venus

Make sure VirGL is correctly detected and used by running the following:

(guest) glxinfo -B

If it outputs llvmpipe instead, build mesa with this configuration:

(guest)

git clone -b qemu-venus https://gitlab.freedesktop.org/Fahien/mesa.git

cd mesa

meson build                                   \
  -Dprefix=/usr                               \
  -Ddri3=enabled                              \
  -Dglx=dri                                   \
  -Degl=enabled                               \
  -Dgbm=enabled                               \
  -Dgallium-vdpau=disabled                    \
  -Dgallium-vs=disabled                       \
  -Dvalgrind=disabled                         \
  -Dbuildtype=debugoptimized                  \
  -Ddri-drivers=[]                            \
  -Dgallium-drivers=swrast,virgl              \
  -Dvulkan-drivers=swrast,virtio-experimental \
  -Dvulkan-layers=device-select

ninja -C build install
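
(guest) After installing Mesa, the renderer check can be repeated from a script. This is a rough sketch: classify_renderer is a hypothetical helper that reads glxinfo -B output and looks only at the renderer line, assuming the VirGL renderer string contains the lowercase word virgl.

```shell
# classify_renderer is a hypothetical helper: read `glxinfo -B` output on
# stdin and print "virgl", "llvmpipe" or "other" based on the renderer line.
classify_renderer() {
    line=$(grep -i 'renderer string' | head -n 1)
    case "$line" in
        *virgl*)    echo virgl ;;
        *llvmpipe*) echo llvmpipe ;;
        *)          echo other ;;
    esac
}

# Only run the real check when glxinfo is installed.
if command -v glxinfo >/dev/null 2>&1; then
    glxinfo -B 2>/dev/null | classify_renderer
fi
```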

Then compile and run vkcube to test Venus, making sure to tell Mesa the correct Vulkan ICD file name:

(guest)

sudo apt install meson build-essential libdrm-dev libgbm-dev libpng-dev libwayland-dev libxcb1-dev libvulkan-dev

git clone https://github.com/krh/vkcube.git

cd vkcube

meson build && meson compile -C build

VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/virtio_icd.x86_64.json build/vkcube
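
(guest) If vkcube cannot find the driver, the ICD file may simply be somewhere else, depending on the prefix Mesa was installed with. The sketch below looks for it in a couple of likely directories. find_virtio_icd is a hypothetical helper, and the directory list is an assumption based on the /usr and /usr/local prefixes.

```shell
# find_virtio_icd is a hypothetical helper: print the first
# virtio_icd.*.json found in the given directories, or fail if none exists.
find_virtio_icd() {
    for dir in "$@"; do
        for f in "$dir"/virtio_icd.*.json; do
            if [ -f "$f" ]; then
                echo "$f"
                return 0
            fi
        done
    done
    return 1
}

# Assumption: Mesa was installed with prefix /usr or /usr/local.
if icd=$(find_virtio_icd /usr/share/vulkan/icd.d /usr/local/share/vulkan/icd.d); then
    echo "found $icd"
else
    echo "no virtio ICD file found"
fi
```

The printed path is what VK_ICD_FILENAMES should point to when launching vkcube.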

Troubleshooting

At this point the venus driver should be correctly loaded, and debug messages can be enabled by setting the VN_DEBUG environment variable to one of the following: init, result, vtest, or wsi.
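
(guest) As a small convenience, a wrapper can guard against typos in the variable's value. This is only a sketch: with_vn_debug is a hypothetical name, and it accepts exactly the four values listed above.

```shell
# with_vn_debug is a hypothetical wrapper: run a command with VN_DEBUG set
# to one of the four values listed above, rejecting anything else.
with_vn_debug() {
    flag=$1
    shift
    case "$flag" in
        init|result|vtest|wsi) ;;
        *) echo "unknown VN_DEBUG value: $flag" >&2; return 2 ;;
    esac
    VN_DEBUG=$flag "$@"
}

# Demonstration with a harmless command that prints the variable it sees.
with_vn_debug init sh -c 'echo "VN_DEBUG=$VN_DEBUG"'
```

With vkcube built as described earlier, the equivalent call would be with_vn_debug init build/vkcube.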

Debugging the kernel

  1. Run QEMU with arguments -s -S:
    • -S stops qemu waiting for gdb
    • -s starts a gdb server at localhost:1234
  2. Your ~/.gdbinit should contain this (make sure it does not point directly to scripts/gdb/vmlinux-gdb.py):

    add-auto-load-safe-path /path/to/linux/vmlinux-gdb.py
    
  3. Run gdb and attach to QEMU gdb server:

    (host)
    
    gdb vmlinux
    (gdb) target remote :1234
    (gdb) hbreak start_kernel
    (gdb) c
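
(host) Typing the same three commands on every debugging session gets tedious. gdb accepts commands on its command line through -ex, so the session above can be wrapped in a function. This is a sketch under assumptions: attach_kernel_gdb is a hypothetical helper, and the GDB variable is only there so a different gdb binary can be substituted.

```shell
# attach_kernel_gdb is a hypothetical helper. It starts gdb on a vmlinux
# file and replays the three commands from step 3 through -ex options.
# GDB can be overridden to point at a different gdb binary.
attach_kernel_gdb() {
    vmlinux=${1:-vmlinux}
    port=${2:-1234}
    "${GDB:-gdb}" \
        -ex "target remote :$port" \
        -ex "hbreak start_kernel" \
        -ex "continue" \
        "$vmlinux"
}

# Usage, from the kernel build directory while QEMU waits with -s -S:
#   attach_kernel_gdb vmlinux 1234
```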
    

VSCode debugger

You can use VSCode Debug UI thanks to the Native Debug extension, attaching to gdbserver target :1234, and autorun: [ "hbreak kernel_init" ].


GDB

Sometimes your only choice is to debug with GDB, therefore here are some useful commands to keep in mind.

Command             Description
bt                  Print backtrace
f                   Print the current frame
f <n>               Change to frame number <n>
list                Print lines of code around the current position
b <func_name>       Set a breakpoint at a function
stepi               Execute one machine instruction, stepping into calls
n                   Step to the next source line, stepping over calls
fin                 Step out of the current function
delete              Delete all breakpoints
delete <n>          Delete breakpoint number <n>
info sharedlibrary  Show the list of loaded libraries

Vulkan loader

If your libvulkan.so fails with a segmentation fault, it is a good idea to build it from source and debug it with gdb. Make sure to check out a version in line with your libvulkan-dev.

(guest)

git clone https://github.com/KhronosGroup/Vulkan-Loader.git vulkan-loader
cd vulkan-loader
git checkout v1.2.162
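
(guest) To pick a tag that matches the installed libvulkan-dev, the package version can be turned into a tag name. The sketch below assumes a Debian-based guest and assumes that the first three components of the package version correspond to a loader tag, as in the v1.2.162 example above. loader_tag_for is a hypothetical helper, and the resulting tag should still be checked against the repository.

```shell
# loader_tag_for is a hypothetical helper: turn a package version such as
# "1.2.162.0-1" into the tag name "v1.2.162" (first three components).
loader_tag_for() {
    echo "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+).*/v\1/'
}

# Assumption: a Debian-based guest where dpkg-query is available.
if command -v dpkg-query >/dev/null 2>&1; then
    ver=$(dpkg-query -W -f='${Version}' libvulkan-dev 2>/dev/null || true)
    if [ -n "$ver" ]; then
        echo "suggested tag: $(loader_tag_for "$ver")"
    fi
fi
```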

Conclusions

While you can enable Vulkan hardware acceleration by checking out the development branches and following this guide, there is still further work to do on Virglrenderer and QEMU for proper upstreaming, which might take some time to complete. Virglrenderer needs to fix resource import/export between OpenGL and Vulkan contexts, and QEMU needs various patches that are currently under review.

To sum up, if you need assistance with graphics virtualization, we would be happy to help, so please do not hesitate to contact us.

On the other hand, if you are a developer and would be thrilled to work on the open source Linux graphics stack, check out our careers page!

Comments (25)

  1. Trigger Huang:
    Dec 15, 2021 at 11:46 AM

    Hello,

    >>Please note that you could encounter the following error with kvm on AMD when enabling BLOB support:
    Unfortunately, I saw this error on a KVM+INTEL CPU + AMDGPU platform. would you help?

    I saw a error 'error: kvm run failed Bad address' when running vkcube in the guest VM after strictly followed all the steps.
    However, vulkaninfo on guest shows that my enviroment is good
    Virtio-GPU Venus (AMD RADV NAVI14 (ACO))

    My system:
    CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
    MemTotal: 16288232 kB
    GPU on Host: AMD Radeon Pro W5500


    1. Antonio Caggiano:
      Dec 16, 2021 at 03:47 PM

      Hi, we are aware of this issue, unfortunately it is a configuration we have not tested with and somebody would need to debug it. I can not give you an estimate as the change list is still under the review process.


      1. Trigger Huang:
        Dec 17, 2021 at 11:57 AM

        Hi Antonio,
        Thank you for the quick response.
        This article is still helpful for me as I managed to set up the VirGL rendering for OpenGL in guest VM without any extra steps. :) :) :)
        BTW, could you share the recommended system configuration for the current Venus?


        1. Antonio Caggiano:
          Dec 17, 2021 at 03:13 PM

          Awesome!
          My testing machine has an Intel x86_64 processor with integrated GPU.


          1. Janboe Ye:
            Jun 04, 2022 at 04:40 AM

            Do you have chance to test on nvidia dGPU? it reports 'Virgl blob create error: Unknown error -22' on my 1070 GPU

            Thanks


            1. Antonio Caggiano:
              Jun 14, 2022 at 10:20 AM

              Hi Janboe Ye,

              Unfortunately I do not have a Nvidia dGPU at the moment. I would try with debugging QEMU and virglrenderer, stepping into virglrenderer.c:virgl_renderer_resource_create_blob() to see exactly what is going on.

              Cheers!


  2. Mitchel Stewart:
    Dec 19, 2021 at 04:11 AM

    Great work, can't wait to see this get upstreamed, lots of cool projects can be done with this


  3. Trigger Huang:
    Jan 13, 2022 at 05:53 AM

    Happy new year:)

    This week I got a chance to debug this issue on AMD dGPU.
    This issue happened in the following scenario:
    1, Mesa Vulkan in guest VM request to create and map blob resource (GPA is allocated from BAR4 of Virtio GPU PCI dev)
    2, Host Qemu create AMDGPU BO and export it by vkGetMemoryFdKHR(RADV driver) in virgl_renderer_resource_create_blob() to get the FD
    3, Host qemu call mmap for this FD to get HVA of this BO in virgl_renderer_resource_map()
    4, With the HVA and GPA, host qemu will call kvm_set_user_memory_region() to insert this guest memory region into KVM
    5, AMDGPU TTM driver will allocate the host pages of this BO when page fault happened
    6, kvm_mmu_page_fault() in ept_violation() will be called to setup the EPT page table for this guest memory region. For the first page of this region, EPT is setup well and guest can access it with GVA successfully. But kvm_mmu_page_fault() failed on the second page, then Qemu will report this 'kvm run failed Bad address' error due to the EPT page table is not set successfully.

    The root cause:
    kvm_try_get_pfn() in kvm_mmu_page_fault() failed due to the second page has a refcount of zero
    After check TTM driver, alloc_pages() will be called to allocate host pages, and it will only call set_page_refcounted(page) for this first page.

    Fix:
    I have a workaround patch to fix it in case anyone who wants to have a quick try Venus on AMD dGPU.

    drm/amdgpu: increase ref count for pages from TTM

    Signed-off-by: Trigger Huang
    ---
    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 ++++++--
    1 file changed, 6 insertions(+), 2 deletions(-)

    diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
    index c875f1cdd..6d7664a1f 100644
    --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
    +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
    @@ -1143,8 +1143,10 @@ static int amdgpu_ttm_tt_populate(struct ttm_device *bdev,
    if (ret)
    return ret;

    - for (i = 0; i < ttm->num_pages; ++i)
    + for (i = 0; i < ttm->num_pages; ++i) {
    ttm->pages[i]->mapping = bdev->dev_mapping;
    + page_ref_inc(ttm->pages[i]);
    + }

    return 0;
    }
    @@ -1174,8 +1176,10 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
    if (ttm->page_flags & TTM_TT_FLAG_EXTERNAL)
    return;

    - for (i = 0; i < ttm->num_pages; ++i)
    + for (i = 0; i < ttm->num_pages; ++i) {
    ttm->pages[i]->mapping = NULL;
    + page_ref_dec(ttm->pages[i]);
    + }

    adev = amdgpu_ttm_adev(bdev);
    return ttm_pool_free(&adev->mman.bdev.pool, ttm);

    According to the patch f8be156be163a052a067306417cd0ff679068c97 in kernel KVM, due to CVE-2021-22543, KVM does not allow mapping valid but non-reference-counted pages

    So, set_page_refcounted() should be called for each page of pages from alloc_pages()in TTM.
    Maybe we need talk with people

    diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
    index 72c4e6b39..043c97b3c 100644
    --- a/virt/kvm/kvm_main.c
    +++ b/virt/kvm/kvm_main.c
    @@ -2382,8 +2382,10 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
    * would then underflow the refcount when the caller does the
    * required put_page. Don't allow those pages here.
    */
    - if (!kvm_try_get_pfn(pfn))
    - r = -EFAULT;
    + if (!kvm_try_get_pfn(pfn)) {
    + //r = -EFAULT;
    + printk("Not force EFAULT: %s %d r = %d, pfn = 0x%016llx\n", __FUNCTION__, __LINE__, r, pfn);
    + }



    1. Alex:
      Apr 03, 2022 at 09:05 AM

      Hi Trigger Huang,

      Is this the fix for the note above?
      "Please note that you could encounter the following error with kvm on AMD when enabling BLOB support: error: kvm run failed Bad address."

      I'm on a Ubuntu host with HWE kernel 5.13.0-39-generic amdgpu Vega dGPU and experiencing this after a few moments in both a Fedora 36 guest (kernel 5.17 + mesa 21.x) and Ubuntu 22.04 guest (kernel 5.15 + mesa 22.0).

      Cheers!


      1. Trigger Huang:
        May 30, 2022 at 02:41 AM

        Hi Alex,

        Sorry for the late response as I didn't check this thread recently.
        Yes, the workaround patch for the host AMDGPU driver (increase the ref count for pages from TTM) should fix the issue "error: kvm run failed Bad address" when enabling BLOB support. This workaround has nothing to do with the ASIC family (Vega or Navi).

        Thanks,
        Trigger


    2. Mitchel Stewart:
      Jun 18, 2022 at 11:58 AM

      has this been reported upstream? it would be nice if they were made aware.


    3. Fafa Kitten:
      Jun 26, 2022 at 11:26 AM

      Thank you for making this patch!! I had the error: kvm run failed Bad address and using a kernel I compiled with this patch made it go away and now Venus device appears in my guest! AMD Radeon RX 5700 XT


  4. Lorna McNeill:
    Jan 17, 2022 at 03:04 PM

    Hi Antonio, can you explain why you used -vga none with -device virtio-vga-gl?


    1. Antonio Caggiano:
      Jan 17, 2022 at 06:27 PM

      IIRC, this was needed due to a limitation with supporting multiple QEMU scanouts. Without specifying -vga none, I believe QEMU would create two scanouts, "scanout-0" for the standard display device (-vga std) and "scanout-1" for the VirtIO-based display with virgl support (-device virtio-vga-gl). While working on this, I noticed the latter would only work on scanout-0. The quickest workaround for me was to just disable the standard display device.


  5. Trigger Huang:
    Jan 18, 2022 at 10:01 AM

    Hi Antonio,
    I can't do the single step debug for the guest kernel after follow your instructions, would you help?
    Each time I input gdb command 'n' or 's', I always got a interrupt of '__sysvec_apic_timer_interrupt'

    Thread 1 hit Breakpoint 1, virtio_gpu_execbuffer_ioctl (dev=0xffff888100ae1000, data=0xffffc90000aa7e50, file=0xffff88810201c000) at drivers/gpu/drm/virtio/virtgpu_ioctl.c:121
    121 struct virtio_gpu_device *vgdev = dev->dev_private;
    (gdb) n
    __sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1102
    1102 trace_local_timer_entry(LOCAL_TIMER_VECTOR);
    (gdb) c
    Continuing.

    Thread 1 hit Breakpoint 1, virtio_gpu_execbuffer_ioctl (dev=0xffff888100ae1000, data=0xffffc90000aa7e50, file=0xffff88810201c000) at drivers/gpu/drm/virtio/virtgpu_ioctl.c:121
    121 struct virtio_gpu_device *vgdev = dev->dev_private;
    (gdb) n
    __sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1102
    1102 trace_local_timer_entry(LOCAL_TIMER_VECTOR);
    (gdb) bt
    #0 __sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1102
    #1 0xffffffff81c11a89 in sysvec_apic_timer_interrupt (regs=0xffffc90000aa7d18) at arch/x86/kernel/apic/apic.c:1097
    Backtrace stopped: Cannot access memory at address 0xffffc90000004008


    1. Antonio Caggiano:
      Jan 18, 2022 at 03:12 PM

      Hi Trigger, I would try with enabling this option:
      make menuconfig
      > Processor type and features >
      [*] Support x2apic


      1. Trigger Huang:
        Jan 19, 2022 at 03:34 AM

        Hi Antonio,

        Unfortunately, I still saw this issue after enable x2apic on both host & guest kernel. :)
        CONFIG_X86_X2APIC=y
        The single step can only work well inside function __sysvec_apic_timer_interrupt()


        1. Antonio Caggiano:
          Jan 19, 2022 at 04:59 PM

          Another thing you can try is this:
          https://stackoverflow.com/questions/64961631/how-to-skip-timer-interrupt-while-debugging-linux
          If this does not work either, I am afraid your best options would be to just set breakpoints on the lines you want and hit C.


          1. Trigger Huang:
            Jan 20, 2022 at 06:11 AM

            Hi Antonio,

            Thanks for pointing out this link. This method helped me a lot, and now GDB for guest kernel worked much better than before.


  6. Reggie:
    Jan 28, 2022 at 06:01 PM

    how to get this driver to work in chrome os crostini with the default debian container?


  7. Mitchel Stewart:
    Jun 11, 2022 at 06:21 PM

    Is this still being worked on? it would be something very nice to see working in qemu. and if so is there anywhere we can follow the progress of this and test it as it gets worked on?


  8. DocMAX:
    Feb 13, 2023 at 12:28 PM

    Can somebody create a PKGBUILD for Arch Linux? I get compile errors with the qemu source mentioned here. Thanks.


  9. WB:
    Mar 05, 2023 at 01:55 AM

    Hi Antonio,

    that is really cool.

    What is status of upstreaming qemu changes?

    I checked qemu-devel pages, and see only v2 patch series, but nothing after that. Also the Fab


    1. Daniel Stone:
      Mar 06, 2023 at 02:01 PM

      We've been coming back to this and expect to be able to post our changes up in the next month or two.


