Gert Wollny
February 01, 2021
Apitrace is a very good tool for recording the OpenGL calls of applications so that one can debug OpenGL issues and test performance and the correctness of the rendering without having access to the application. The latter is already done in the Mesa CI. However, especially with games, intros, menus, and loading screens can make the traces rather large before the actual, interesting rendering happens. This puts a heavy computational burden on the CI runners that could be avoided if the trace were trimmed to render just the frame(s) of interest. As an alternative, RenderDoc can be used to capture and replay exactly one frame, but it doesn’t support compatibility contexts, a feature that is used by many games.
Enter gltrim, a tool that has recently been added to apitrace and that is designed to trim traces to user-selected target frame(s). Unlike with the already available trim tool, its resulting traces can always be replayed. However, given the complexity of the object relationships that can be created in OpenGL, it is not guaranteed that the rendering produced by the trimmed trace is also correct. For this reason gltrim provides means to keep additional states and setup frames. Below you’ll find a few notes about the possible object inter-dependencies in OpenGL, a sketch of how gltrim works, and a few examples of how it is applied in practice.
OpenGL offers many ways to fill textures with data, among them uploads from a buffer bound to the PIXEL_UNPACK_BUFFER target.
The output to framebuffers also depends on many states that need to be tracked for each draw call. Considering that textures created in these ways can in turn be used as a source for draw calls that manipulate or fill other textures, the dependency chains can become arbitrarily long.
Buffer data can also be changed in various ways: Apart from direct uploads, the contents of buffers can be manipulated by using shaders, or by copying the data from one buffer to another. With OpenGL 4.4 buffers can be mapped persistently, so that memory copies need to be tracked. Finally, the rendering output may not be cleared at each frame start, so that the actual displayed output is the result of the rendering done in multiple frames. Given all these possible inter-dependencies between OpenGL objects, doing an exhaustive tracking of the objects and states required to reproduce a certain frame from scratch can be very time and memory intensive.
RenderDoc works around this by actually executing the OpenGL calls, so that a captured frame does not contain all the calls needed to re-create the output from scratch, but instead also contains intermediate textures and buffer data. In the worst case one might capture a frame that re-creates the output by only uploading texture data and doing a blit.
With the approach implemented in gltrim the aim is to re-create the output from scratch, always keeping all the needed draw calls and state changes. Early attempts at exhaustive tracking did not work well: on the one hand, they were quite slow and needed a lot of memory; on the other hand, the fact that some games build up the displayed content over a number of frames made an approach in which only the target frames are specified quite unreliable.
Instead, an alternative was implemented that accepts that one might not get it right the first time when trying to trim a trace. Here, all OpenGL objects are tracked, but only a lightweight tracking of dependencies is done. Some setup calls, like the calls to create the rendering context, are always copied to the output trace. When the first target frame of one or more contiguous frames is analysed, the calls that establish the current states and the states of the currently bound objects are copied to the output. Then, whenever an object is bound, all the calls that are required to reproduce its state, and also those of directly dependent objects, are copied.
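The bind-time copying described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not gltrim's actual implementation: the `TrackedObject` class, the `emit_on_bind` helper, and all call numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedObject:
    """One OpenGL object (texture, buffer, ...) in this toy model."""
    name: str
    calls: list = field(default_factory=list)       # trace call numbers that set up its state
    depends_on: list = field(default_factory=list)  # direct dependencies only
    emitted: bool = False


def emit_on_bind(obj, output):
    """Copy the calls of obj and of its direct dependencies into the output set."""
    for o in [obj, *obj.depends_on]:
        if not o.emitted:
            o.emitted = True
            output.update(o.calls)


# Hypothetical trace content; the call numbers are made up.
pixel_buffer = TrackedObject("pixel_buffer", calls=[10, 11])
texture = TrackedObject("texture", calls=[20, 21], depends_on=[pixel_buffer])
unused = TrackedObject("unused_texture", calls=[30, 31])

required_calls = set()
emit_on_bind(texture, required_calls)  # the texture is bound in the target frame
print(sorted(required_calls))          # [10, 11, 20, 21]
```

The object that is never bound in the target frame contributes no calls, which is what keeps the trimmed trace small. It is also why anything that only matters through a longer chain or an earlier frame may need to be kept explicitly.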
With this method trimming a trace is pretty fast and has a low memory footprint. While the methods with more exhaustive tracking may take hours to complete, here trimming a trace is a matter of minutes, or for short traces even only seconds. The output trace consists of one frame doing all the setup calls, and the target frame(s). When the trimming is done, qapitrace can and should be used to check the visual output, and if the result is not correct, there are various options to improve it. On the one hand, one can specify that all simple state changes are kept. This especially helps with errors in the brightness of the output. The impact on the size of the trace is usually low, and the final run-time of the trace does not increase a lot either. On the other hand, if whole objects are missing in the output, gltrim has the option to add extra frames to the setup frame. Usually, when a scene changes in a game, for instance when a new level is loaded, objects are created and data is uploaded. These frames can often be identified easily because they consist of an above-average number of OpenGL calls, and gltrim offers an option to list such frames when trimming is complete. By specifying these additional frames as setup frames, they are copied verbatim (except for the frame-end buffer swap) to the output setup frame, and so are the calls of dependencies of objects used in these additional frames.
To obtain a trace that re-creates the intended output correctly, usually some experimenting is needed, but given the relatively short turnaround time for trimming a trace, this is a feasible approach.
In the best case the application of gltrim is straightforward: In the first example, a trace of the game Doom3 running on the Open Source implementation dhewm was taken and trimmed:
apitrace gltrim -f 666 -o doom3_trim_frame666.trace doom3.trace
In this case the original trace was trimmed from a file size of 300 MB down to 32 MB with the target frame 666 rendering like in the original trace.
For a trace taken from Alien Isolation the naive approach to trimming doesn’t result in a trace that correctly reproduces the original image, that is, running
apitrace gltrim -f 2000 -o AI_trim_frame2000.trace AI.trace
results in some textures not being rendered with the correct brightness (Figure 1).
Figure 1: Original rendering (left), naive trimming (right); note the lower intensities in some areas.
Trimming the trace while keeping all state changes fixes the problem; the command looks like
apitrace gltrim -f 2000 -k -o AI_trim_keep_states_frame2000.trace AI.trace
Sometimes when trimming traces in the naive way geometry goes missing (Figure 2).
Figure 2: Original rendering (left) versus naive trimming (right). Geometry is missing in the trimmed output because the background might have been drawn in another frame and was not retained.
Here one might analyse the trace for large frames, which usually contain extra setup calls and may include the creation of the missing parts. To obtain a list of the frames with the most OpenGL calls, run:
gltrim -f 4000 -t 10 Unturned.trace -o Unturned-virgl-f4000.trace
resulting in an output like:
...
Calls per frame:
Frame[3334] = 56743
Frame[3806] = 42453
Frame[3813] = 33820
Frame[3795] = 26747
Frame[3814] = 24379
Frame[0] = 6145
Frame[2032] = 5848
Frame[3817] = 5392
Frame[3934] = 5313
Frame[3937] = 5282
With this information one can start to add additional setup frames. Here the frames with a significantly larger number of calls that are closest to the target frame are frames 3806, 3813, and 3814. Adding only the very large frame 3806 is not sufficient (see Figure 3, right), but it is required to get the geometry right. Adding frame 3814 as well finally results in correct rendering.
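Picking candidate frames from such a listing can be done by hand, but a small script makes the reasoning explicit. The following is a minimal sketch, not part of gltrim: the helper names, the call-count threshold of 20000, and the 200-frame window before the target are hypothetical values chosen to match the selection described above.

```python
import re

# The "Calls per frame" listing from the gltrim run above.
listing = """
Frame[3334] = 56743
Frame[3806] = 42453
Frame[3813] = 33820
Frame[3795] = 26747
Frame[3814] = 24379
Frame[0] = 6145
Frame[2032] = 5848
Frame[3817] = 5392
Frame[3934] = 5313
Frame[3937] = 5282
"""


def parse_calls_per_frame(text):
    """Return {frame number: call count} for every 'Frame[n] = c' entry."""
    pairs = re.findall(r"Frame\[(\d+)\]\s*=\s*(\d+)", text)
    return {int(frame): int(calls) for frame, calls in pairs}


def candidate_setup_frames(counts, target, min_calls, window):
    """Frames with at least min_calls calls within `window` frames before the target."""
    return sorted(
        frame for frame, calls in counts.items()
        if calls >= min_calls and target - window <= frame < target
    )


counts = parse_calls_per_frame(listing)
candidates = candidate_setup_frames(counts, target=4000, min_calls=20000, window=200)
print(candidates)  # [3806, 3813, 3814]
```

The result is only a list of candidates. As the example shows, which of these frames are actually needed still has to be checked by replaying the trimmed trace and comparing the output.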
Civilization V creates the board tiles over time using the same combination of texture and framebuffer object, so that tracking the dependency becomes difficult. Direct trimming results in uninitialized tile memory (Figure 4, upper right). Adding setup frames starting from the frame right after the loading screen removes the error. In this case a total of 45 additional frames are required to get correct rendering.
Another example of output created over time is motion blur, where the rendering output is accumulated over a series of frames. An example where this occurs is SOMA (Figure 5).
Here keeping the whole motion blurred scene might be needed to get the pixel-exact same rendering.
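Why accumulated output cannot be reproduced from a single frame can be shown with a little arithmetic. This is a sketch under an assumption: the blending actually used by SOMA is not known here, so a simple exponential accumulation of a single pixel intensity with a hypothetical weight of 0.5 stands in for it.

```python
def accumulate(frames, weight, start=0.0):
    """Blend each new frame value into the running output (assumed blur model)."""
    out = start
    for value in frames:
        out = weight * value + (1.0 - weight) * out
    return out


# Intensity of one pixel in four consecutive frames (made-up values).
frames = [1.0, 1.0, 1.0, 1.0]

full = accumulate(frames, weight=0.5)          # all four frames replayed
trimmed = accumulate(frames[-1:], weight=0.5)  # only the last frame replayed
print(full, trimmed)  # 0.9375 0.5
```

In this model the last frame alone reaches only about half of the intensity of the full sequence, so a trimmed trace matches the original pixel for pixel only if the frames contributing to the accumulation are kept.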
gltrim provides the means to trim apitrace traces to contain only a few frames that can still render correctly. It is a semi-automatic tool that requires some manual interaction to check and optimize the output, but the obtained trimmed traces have a significantly lower size (depending on the input trace as low as 10% of the original trace), and a lower run-time (usually about half the time the original trace trimmed to end at the target frame needs to run, sometimes even less).
By removing the intro and menu frames, the resulting traces become far more useful for performance analysis, and by reducing the memory footprint and the run time, it becomes more feasible to use traces of resource-intensive games in the Mesa CI. In theory, it should be possible to wrap the tool in a script that automatically picks setup frames and options to optimize the output, something we might look into in the future.
Comments (5)
Martin Peres:
Feb 04, 2021 at 01:16 PM
Gert, you rock for making it not only work for actual games, but upstreaming it too!
I'm looking forward to playing with it :)
alphaprog:
Apr 15, 2021 at 04:06 AM
Is the retained/saved opengl state (while trimming the log) sufficient to enable the application continuing its execution after the last replayed frame? Supposing that we choose to keep only the last frame (gltrim -k $last_frame fullTrace)
Gert Wollny:
Apr 16, 2021 at 08:38 AM
I'm not sure whether I understood the question correctly. The application is not actually running when you replay a trace; instead, one of the apitrace replay tools is running and issuing the OpenGL commands as recorded in the trace. This means that after replaying the trace the tool exits, and with the trimmed trace it simply exits after the last frame the trace was trimmed to.
Now as for the states retained by using the "-k" flag, this just keeps all the "cheap" state calls in the trimmed trace (cheap in the sense that they don't pass large amounts of data and don't directly result in draw calls), but that has nothing to do with running the original application.
alphaprog:
Apr 16, 2021 at 06:32 PM
Thank you for your answer, my bad for not giving more details.
I was thinking of interfacing the gltrim capability with a checkpoint-restart software.
The checkpoint software will use apitrace to trace the opengl calls, and then restarts using the ckpt-restart software and the gltrim to replay only the last frame (instead of replaying all the frames ...)
So in this case:
Do you think that the ``cheap'' retained state will be enough so the application could continue its execution normally after a restart (i.e; with a correct/valid opengl state)?
Gert:
Apr 16, 2021 at 06:58 PM
It may work in some cases, but it is more likely to fail because of the things that are not kept: shaders, textures, and buffers that were (pre-)loaded but are not used in the specified target frame. If these are used later in the application, they will be missing. Think of a 3D scene that doesn't display certain things in the target frame, so the related textures, shaders, etc. are dropped; when turning around, these objects should be displayed, but they would now be missing.