
Trimming apitrace workload captures for better Mesa testing


Gert Wollny
February 01, 2021


Apitrace is a very good tool for recording the OpenGL calls of applications, so that one can debug OpenGL issues and test the performance and the correctness of the rendering without having access to the application. The latter is already done in the Mesa CI. However, especially with games, intros, menus, and loading screens can make traces rather large before the actual, interesting rendering happens. This puts a heavy computational burden on the CI runners that could be avoided if the trace could be trimmed to render just the frame(s) of interest. As an alternative, RenderDoc can be used to capture and replay exactly one frame, but it doesn’t support compatibility contexts, a feature that is used by many games.

Enter gltrim, a tool that has recently been added to apitrace and that is designed to trim traces to user-selected target frame(s). Unlike the already available trim tool, gltrim always produces traces that can be replayed. However, given the complexity of the object relationships that can be created in OpenGL, it is not guaranteed that the rendering produced by the trimmed trace is also correct. For such cases gltrim provides means to keep additional states and setup frames. Below you’ll find a few notes about the possible object interdependencies in OpenGL, a sketch of how gltrim works, and a few examples of how it is applied in practice.

About OpenGL object interdependencies

OpenGL offers many ways to fill textures with data:

  • Upload the data directly,
  • upload from the buffer bound to the PIXEL_UNPACK_BUFFER target,
  • bind the texture to a framebuffer and draw to it or blit from another framebuffer,
  • copy directly from a read buffer that is either bound to a framebuffer object or is one of the default draw buffers, or
  • manipulate textures with shaders by binding them as images.

The output to framebuffers also depends on many states that need to be tracked for each draw call. Since textures created this way can in turn be used as sources for draw calls that modify or fill other textures, the dependency chains can become arbitrarily long.

Buffer data can also be changed in various ways: apart from direct uploads, the contents of buffers can be manipulated by shaders, or by copying data from one buffer to another. Since OpenGL 4.4, buffers can be mapped persistently, so that writes to the mapped memory also need to be tracked. Finally, the rendering output may not be cleared at the start of each frame, so that the displayed output is the result of rendering done over multiple frames. Given all these possible interdependencies between OpenGL objects, exhaustively tracking the objects and states required to reproduce a certain frame from scratch can be very time and memory intensive.
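To illustrate why exhaustive tracking gets expensive, here is a toy model in Python, not gltrim code. The object names and the graph are hypothetical; an edge means "the contents of this object were produced from that one". Rebuilding one texture from scratch requires the transitive closure of this graph, and feedback from the previous frame shows up as a cycle.

```python
from collections import deque

# Hypothetical dependency graph: object -> objects its contents were built from.
DEPENDS_ON: dict[str, list[str]] = {
    "tex:final": ["fbo:post", "tex:scene"],   # rendered through an FBO, sampling tex:scene
    "fbo:post": [],
    "tex:scene": ["buf:vertices", "tex:shadow", "tex:history"],
    "tex:shadow": ["buf:vertices"],
    "tex:history": ["tex:scene"],             # last frame's output feeds this frame
    "buf:vertices": ["buf:staging"],          # filled by a buffer-to-buffer copy
    "buf:staging": [],
}


def full_closure(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every object transitively needed to rebuild `root` (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for dep in graph.get(obj, []):
            if dep not in seen:  # the cycle through tex:history terminates here
                seen.add(dep)
                queue.append(dep)
    return seen
```

Here the single texture `tex:final` already drags in every object of the model. A real tracker would additionally have to version each object, since every upload, copy, or draw changes its contents, and that is where the time and memory go.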

RenderDoc works around this by actually executing the OpenGL calls during capture, so that a captured frame does not contain all the calls needed to re-create the output from scratch, but instead contains intermediate texture and buffer data. In the worst case, one might capture a frame that re-creates the output by only uploading texture data and doing a blit.

How gltrim trims traces

With the approach implemented in gltrim, the aim is to re-create the output from scratch, always keeping all the needed draw calls and state changes. Early attempts at exhaustive tracking did not work well: on the one hand, they were quite slow and needed a lot of memory; on the other hand, because some games build up the displayed content over a number of frames, an approach where only the target frames were specified turned out to be quite unreliable.

Instead, an alternative was implemented that accepts that one might not get it right the first time when trying to trim a trace. Here, all OpenGL objects are tracked, but only a lightweight tracking of dependencies is done. Some setup calls, like the calls to create the rendering context, are always copied to the output trace. When the first of one or more consecutive target frames is reached, the calls that set the current state and the state of the currently bound objects are copied to the output. Then, whenever an object is bound, all the calls that are required to reproduce its state, and also those of its directly dependent objects, are copied.
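The tracking scheme described above can be sketched as a simplified Python model. This is not the actual gltrim implementation; it rests on assumptions of its own: calls are reduced to a name, the object they touch, and the objects they read; the GLX names in `ALWAYS_KEEP` are examples; unbinding and plain state calls are ignored; and, as in the description, only direct dependencies are followed when an object is bound.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: context setup calls are always copied (GLX names as examples).
ALWAYS_KEEP = {"glXCreateContext", "glXMakeCurrent"}


@dataclass
class Call:
    index: int                  # position in the original trace
    frame: int                  # frame the call belongs to
    name: str
    obj: str | None = None      # object the call creates, fills, or binds
    deps: tuple[str, ...] = ()  # objects whose contents the call reads
    is_bind: bool = False


@dataclass
class Tracker:
    history: dict[str, list[Call]] = field(default_factory=dict)
    direct_deps: dict[str, set[str]] = field(default_factory=dict)
    bound: set[str] = field(default_factory=set)
    kept: dict[int, Call] = field(default_factory=dict)

    def record(self, call: Call) -> None:
        """Note which calls shaped which object; no transitive analysis."""
        if call.obj is None:
            return
        if call.is_bind:
            self.bound.add(call.obj)  # toy: unbinding is ignored
        self.history.setdefault(call.obj, []).append(call)
        self.direct_deps.setdefault(call.obj, set()).update(call.deps)

    def copy_object(self, obj: str) -> None:
        """Keep the calls of `obj` and of its direct dependencies only."""
        for name in {obj} | self.direct_deps.get(obj, set()):
            for call in self.history.get(name, []):
                self.kept[call.index] = call


def trim(calls: list[Call], first_target: int) -> list[Call]:
    t = Tracker()
    entered = False
    for call in calls:
        if call.name in ALWAYS_KEEP:
            t.kept[call.index] = call
        if call.frame < first_target:
            t.record(call)
            continue
        if not entered:
            # First target frame: restore the objects that are bound right now.
            entered = True
            for obj in sorted(t.bound):
                t.copy_object(obj)
        t.kept[call.index] = call  # target-frame calls are kept as they are
        if call.is_bind and call.obj is not None:
            t.copy_object(call.obj)
    return [t.kept[i] for i in sorted(t.kept)]  # original trace order
```

In this model, binding a texture that was filled by copying from another texture pulls in the calls of both, while an object two steps away is not copied. Gaps of that kind are one reason why the rendering of a trimmed trace has to be checked.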

With this method, trimming a trace is pretty fast and has a low memory footprint. While methods with more exhaustive tracking may take hours to complete, here trimming a trace is a matter of minutes, or for short traces even only seconds. The output trace consists of one frame doing all the setup calls, followed by the target frame(s).

When the trimming is done, qapitrace can and should be used to check the visual output. If the result is not correct, there are various options to improve it. On the one hand, one can specify that all simple state changes are kept. This especially helps with errors in the brightness of the output. The resulting increase in trace size is usually small, and the run time of the final trace does not grow much either. If whole objects are missing in the output, gltrim has the option to add extra frames to the setup frame. Usually, when a scene changes in a game, for instance when a new level is loaded, objects are created and data is uploaded. These frames can often be identified easily because they contain considerably more OpenGL calls than average, and gltrim offers an option to list such frames when trimming is complete. When these additional frames are specified as setup frames, their calls are copied verbatim (except for the frame-end buffer swap) to the output setup frame, and so are the calls of the dependencies of objects used in these additional frames.
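How the setup frame might be assembled from these ingredients can be sketched as follows. Again this is a hypothetical Python model, not gltrim's code: frames are lists of call names, the calls found by dependency tracking are passed in as (frame, position) pairs, the set of "cheap" state calls is an illustrative guess, and `glXSwapBuffers` stands in for the frame-end buffer swap.

```python
from collections.abc import Iterable

SWAP = "glXSwapBuffers"  # frame-end call, assuming a GLX trace
# Hypothetical selection of "cheap" state calls: no bulk data, no draws.
CHEAP_STATE = {"glEnable", "glDisable", "glBlendFunc", "glDepthFunc", "glViewport"}


def build_output(
    frames: list[list[str]],
    targets: list[int],
    setup_frames: Iterable[int] = (),
    keep_states: bool = False,
    dependency_calls: Iterable[tuple[int, int]] = (),
) -> list[list[str]]:
    """Return [setup frame, *target frames]; calls are given by name only."""
    first_target = min(targets)
    extra = set(setup_frames)
    deps = set(dependency_calls)  # (frame, position) pairs found by tracking
    setup: list[str] = []
    for f in range(first_target):
        for pos, call in enumerate(frames[f]):
            if call == SWAP:
                continue  # the setup frame gets exactly one swap, at its end
            if f in extra or (keep_states and call in CHEAP_STATE) or (f, pos) in deps:
                setup.append(call)
    setup.append(SWAP)
    return [setup] + [list(frames[t]) for t in sorted(targets)]
```

Keeping states only adds small state calls, whereas an extra setup frame contributes its entire contents, which fits the observation that the first option costs little in trace size and run time.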

To obtain a trace that re-creates the intended output correctly, usually some experimenting is needed, but given the relatively short turnaround time for trimming a trace, this is a feasible approach.

Examples on how to use gltrim

The simple case

In the best case, applying gltrim is straightforward. In the first example, a trace of the game Doom3 running on the open source engine dhewm3 was taken and trimmed:

  apitrace gltrim -f 666 -o doom3_trim_frame666.trace doom3.trace 

In this case the original trace was trimmed from a file size of 300 MB down to 32 MB, and the target frame 666 renders as in the original trace.

When the states don't seem to be right

For a trace taken from Alien Isolation, the naive approach to trimming does not result in a trace that correctly reproduces the original image; running

   apitrace gltrim -f 2000 -o AI_trim_frame2000.trace AI.trace

results in some textures not being rendered with the correct brightness (Figure 1).

Figure 1: Original rendering (left), naive trimming (right); note the lower intensities in some areas.


Trimming the trace while keeping all state changes fixes the problem:

  apitrace gltrim -f 2000 -k -o AI_trim_keep_states_frame2000.trace AI.trace

Geometry is missing

Sometimes, when traces are trimmed in the naive way, geometry goes missing (Figure 2).

Figure 2: Original rendering (left) versus naive trimming (right). Geometry is missing in the trimmed output, possibly because the background was drawn in another frame and was not retained.


Here one can look for large frames in the trace, since these usually contain extra setup calls, possibly including the creation of the missing parts. To obtain a list of the frames with the most OpenGL calls, run:

  apitrace gltrim -f 4000 -t 10 -o Unturned-virgl-f4000.trace Unturned.trace

resulting in an output like:

...
Calls per frame:
  Frame[3334] = 56743
  Frame[3806] = 42453
  Frame[3813] = 33820
  Frame[3795] = 26747
  Frame[3814] = 24379
  Frame[0] = 6145
  Frame[2032] = 5848
  Frame[3817] = 5392
  Frame[3934] = 5313
  Frame[3937] = 5282

With this information one can start to add setup frames. Here, the frames closest to the target frame with a significantly larger number of calls are 3806, 3813, and 3814. Adding only the very large frame 3806 is not sufficient (see Figure 3, left), but it is required to get the geometry right. Adding frame 3814 as well finally results in correct rendering.
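Picking candidates from this listing can be partly automated. The sketch below parses the `Frame[N] = M` lines shown above and ranks the frames before the target whose call count exceeds the median of the listed frames, nearest to the target first. The median threshold is an assumption of this sketch, and it is computed over the top-N listing only, not over the whole trace.

```python
import re
from statistics import median

FRAME_LINE = re.compile(r"Frame\[(\d+)\]\s*=\s*(\d+)")


def parse_calls_per_frame(text: str) -> dict[int, int]:
    """Map frame number -> call count from a 'Calls per frame:' listing."""
    return {int(f): int(n) for f, n in FRAME_LINE.findall(text)}


def candidate_setup_frames(counts: dict[int, int], target: int, factor: float = 1.0) -> list[int]:
    """Frames before `target` with more than factor * median calls, nearest first."""
    threshold = factor * median(counts.values())
    large = [f for f, n in counts.items() if f < target and n > threshold]
    return sorted(large, key=lambda f: target - f)


# The listing from the Unturned example above.
SAMPLE = """Calls per frame:
  Frame[3334] = 56743
  Frame[3806] = 42453
  Frame[3813] = 33820
  Frame[3795] = 26747
  Frame[3814] = 24379
  Frame[0] = 6145
  Frame[2032] = 5848
  Frame[3817] = 5392
  Frame[3934] = 5313
  Frame[3937] = 5282
"""
```

For this listing and target frame 4000, the ranking is 3814, 3813, 3806, 3795, 3334. The ranking only proposes an order in which to try frames. As in the example above, the trimmed output still has to be checked visually.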

Figure 3: Rendering after adding the frame with the highest number of GL calls close to the target frame, 3806, to the setup frame (left), and after also adding the large frame closest to the target frame, 3814 (right). Adding only frame 3806 as a setup frame corrects the geometry, but the shading of the mountains compared to the original (Figure 2, left) is still off; adding another setup frame fixes this (right).

Output is created over time (1)

Civilization V creates the board tiles over time using the same combination of texture and framebuffer object, which makes tracking the dependencies difficult. Direct trimming results in uninitialized tile memory (Figure 4, upper right). Adding setup frames, starting from the frame right after the loading screen, removes the error. In this case a total of 45 additional frames is required to get correct rendering.
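Why so many setup frames can be needed is easier to see in a toy model that tracks the tiles individually, something per-object tracking cannot do here because all tiles go into the same texture through the same framebuffer object. The sketch assumes, hypothetically, that each draw fully overwrites its tile and that the texture is never cleared. A tile then renders correctly only if its last write falls into the kept frames. Tile names and frame numbers are made up.

```python
def uninitialized_tiles(writes: dict[str, list[int]], target: int, n_setup: int) -> set[str]:
    """Tiles whose last write at or before `target` precedes the kept frames."""
    first_kept = target - n_setup  # setup frames: the n_setup frames before target
    missing = set()
    for tile, frames in writes.items():
        last = max((f for f in frames if f <= target), default=None)
        if last is None or last < first_kept:
            missing.add(tile)
    return missing


def min_setup_frames(writes: dict[str, list[int]], target: int) -> int:
    """Smallest n_setup that leaves no tile uninitialized.

    Assumes every tile is written at least once at or before `target`.
    """
    lasts = [max(f for f in frames if f <= target) for frames in writes.values()]
    return target - min(lasts)


# Hypothetical data: tile -> frames in which it was drawn into the shared texture.
writes = {"tile_a": [20], "tile_b": [25, 50], "tile_c": [58], "tile_d": [33]}
```

With target frame 60, 15 setup frames leave `tile_a` and `tile_d` uninitialized, and 40 setup frames are the minimum. The oldest tile that is never redrawn dictates how far back the setup frames have to reach.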

Figure 4: Civilization V: rendering of the original trace (upper left), and after trimming to one frame without setup frames (upper right), with 15 setup frames starting at a large frame (middle left), with 25 setup frames (middle right), with 35 setup frames (lower left), and finally with 45 setup frames (lower right), which results in correct rendering.

Output is created over time (2)

Another example of output created over time is motion blur, where the rendering output is accumulated over a series of frames. One game where this occurs is SOMA (Figure 5).

Figure 5: Original (upper left), trimming to the target frame only (upper right), trimming to the target frame only while keeping all states (lower left), adding one setup frame and keeping all states (lower right). The latter output looks close, but it is actually not correct.


Here, keeping all frames that contribute to the motion-blurred scene might be needed to get pixel-exact rendering.
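The post does not say how SOMA implements its motion blur. For illustration, assume a simple exponential accumulation: F_n is the freshly rendered frame n, C_n is the accumulated output, and alpha is the blend weight. The first line below is the per-frame update. Unrolling it K times gives the second line. If trimming leaves the accumulation buffer at frame n-K with different contents (for example cleared or uninitialized, written with a tilde), while frames n-K+1 to n are replayed correctly, the replayed output differs by the third line.

```latex
C_n = \alpha F_n + (1-\alpha)\, C_{n-1}, \qquad 0 < \alpha \le 1

C_n = \sum_{k=0}^{K-1} \alpha (1-\alpha)^k F_{n-k} + (1-\alpha)^K C_{n-K}

C_n - \tilde{C}_n = (1-\alpha)^K \left( C_{n-K} - \tilde{C}_{n-K} \right)
```

Under this model the error shrinks geometrically with every additional setup frame but never vanishes exactly for alpha < 1. With alpha = 0.5, trimming to the target frame only (K = 1) leaves half of the initial difference, and one setup frame (K = 2) still leaves a quarter, which is consistent with an output that looks close but is not pixel-exact.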

Conclusion

gltrim provides the means to trim apitrace traces so that they contain only a few frames that can still render correctly. It is a semi-automatic tool that requires some manual interaction to check and optimize the output, but the resulting trimmed traces are significantly smaller (depending on the input trace, as little as 10% of the original size) and run faster (usually in about half the time the original trace, cut off after the target frame, needs to run, sometimes even less).

By removing the intro and menu frames, the resulting traces are far more useful for performance analysis, and by reducing the memory footprint and the run time, it becomes more feasible to use traces of resource-intensive games in the Mesa CI. In theory, it should be possible to wrap the tool in a script that automatically picks setup frames and options to optimize the output; this is something we might look into in the future.

Games used for tracing

  • Doom3, id Software (source port dhewm3)
  • Alien Isolation, Creative Assembly, Feral Interactive (Linux and MacOSX port)
  • Unturned, Smartly Dressed Games
  • Civilization V, Firaxis Games, Aspyr (Linux and MacOSX port)
  • Soma, Frictional Games

Comments (5)

  1. Martin Peres:
    Feb 04, 2021 at 01:16 PM

    Gert, you rock for making it not only work for actual games, but upstreaming it too!

    I'm looking forward to playing with it :)


  2. alphaprog:
    Apr 15, 2021 at 04:06 AM

    Is the retained/saved opengl state (while trimming the log) sufficient to enable the application continuing its execution after the last replayed frame? Supposing that we choose to keep only the last frame (gltrim -k $last_frame fullTrace)


    1. Gert Wollny:
      Apr 16, 2021 at 08:38 AM

      I'm not sure whether I understood the question correctly. The application is not actually running when you replay a trace; instead, one of the apitrace replay tools is running and issuing the OpenGL commands as recorded in the trace, which means that after replaying the trace the tool exits, and with the trimmed trace it simply exits after the last frame the trace was trimmed to.

      Now as for the states retained by using the "-k" flag, this just keeps all the "cheap" state calls in the trimmed trace (cheap in the sense that they don't pass large amounts of data and don't directly result in draw calls), but that has nothing to do with running the original application.


  3. alphaprog:
    Apr 16, 2021 at 06:32 PM

    Thank you for your answer, my bad for not giving more details.

    I was thinking of interfacing the gltrim capability with a checkpoint-restart software.
    The checkpoint software will use apitrace to trace the opengl calls, and then restarts using the ckpt-restart software and the gltrim to replay only the last frame (instead of replaying all the frames ...)

    So in this case:
    Do you think that the "cheap" retained state will be enough so the application could continue its execution normally after a restart (i.e., with a correct/valid OpenGL state)?


    1. Gert:
      Apr 16, 2021 at 06:58 PM

      It may work in some cases, but it is more likely to fail, because of the things that are not kept: shaders, textures, and buffers that were (pre-)loaded but are not used in the specified target frame. So if these are used later in the application, they will be missing. Think of a 3D scene that doesn't display certain things in the target frame, so the related textures, shaders, etc. are dropped; when the view then changes, e.g. when turning around, these objects should be displayed, but they would now be missing.




Collabora Limited © 2005-2025. All rights reserved.