Panfrost performance counters with Perfetto

Panfrost performance counters with Perfetto

Antonio Caggiano
August 21, 2020

Share this post:

Reading time:

Linux system information is available in several scattered forms. We can query kernel events, CPU counters, and memory counters through ftrace, procfs and sysfs, but historically we've lacked a holistic view of the system - including graphics performance counters - to target optimization. But we have now integrated Mali GPU hardware counters supported by Panfrost with Perfetto's tracing SDK, unlocking all-in-one graphics-aware profiling on Panfrost systems!

What is Perfetto?

Perfetto is an open-source project for performance instrumentation and tracing of Linux/Android/Chrome platforms and user-space apps. It enables you to capture the state of various components of your system into a trace file, which can be loaded into a web-based trace viewer, also available online.

At the moment of writing, Perfetto offers a good number of probes to see what is going on in the CPU, what is the memory usage, and other things like power consumption. There is also a GPU probe, but it is only capable of sampling the GPU frequency when the driver outputs that information via ftrace.

A key feature of Perfetto is its extendibility. You can feed your own data into Perfetto by either instrumenting your program or making a custom data source. So how about making one to put a magnifying glass on the GPU?

Graphics Perfetto producers

Collabora is working on a new project, namely gfx-pps, which aims to collect various perfetto data sources related to graphics hardware and sofware. The term producer is a key Perfetto concept which refers to a client process contributing to the tracing service with one or more data sources.

The gfx-pps project is under active development on FreeDesktop's GitLab licensed under MIT. It currently includes two data sources: one is able to sample Mali performance counters, while the other generates track events about Weston timeline.

Panfrost data source

One of the Perfetto data sources available in gfx-pps is the Panfrost data source, which is able to query Mali Midgard performance counters using the Panfrost driver.

You can follow the README to compile the tools for your target platform, but all the binaries you need to start tracing your application are already available as artifacts of the project's GitLab CI, on master and perfetto branches. The executables you can find for either x86_64 and aarch64 are the following:

traced, the tracing service.
traced_probes, the OS probes service.
libperfetto.so and perfetto, which is the command line tool used for recording traces.
producer-gpu, which provides the Panfrost data source.
gpu.cfg, config file to feed as input to perfetto describing what to trace. This one and other config files can be found under the gfx-pps/scripts directory.

Once you have everything ready on your target platform, follow these steps to capture a trace.

Start the tracing service by running traced.
Start the OS probes service with traced_probes.
Start the GPU producer producer-gpu.
Start perfetto to capture a trace following the directives of a config file:
```
perfetto --txt -c gpu.cfg -o trace
```

Once tracing has finished, you will find a trace file ready to be opened with ui.perfetto.dev.

Analysis guidelines

The golden rule of performance analysis is to find the bottleneck, which means applying Amdahl's law in order to parallelize as much as possible or to optimize the longest of a series of processes. Then it is all about finding the right balance.

CPU/GPU balancing

The first thing to check is the balance between CPU and GPU workload. If the GPU is idle most of the time, while the CPU is continuously busy, it would not make sense to focus on graphics, but the code running on the CPU needs to be optimized instead.

Once we know the GPU is the bottleneck, we need to make sure to parallelize work on CPU and GPU as much as possible by taking advantage of multi-buffering. Multi-buffering enables us to draw multiple frames in-flight, therefore exploiting the full potentiality of the GPU.

The screenshot below shows a trace of WebGL Aquarium taken on a RK3399 processor which uses a Mali Midgard GPU. You can see a frame generated in 69.3 ms. You can also notice how both GPU and CPU activity occupy respectively the first and the second half of the highlighted area. This suggests that improvements are needed on both sides.

Vertex/Fragment balancing

The next step moves our focus on vertex and fragment workloads. We generally expect to spend more time processing fragments. If the opposite is true, it means we would probably achieve a better speedup by either optimizing the vertex shader or reducing the number of geometry submitted for drawing.

Note that spending more time processing fragments does not mean it should occupy 100% of GPU time. If that happens, it is a sign we need to simplify this stage of the graphics pipeline, by reducing the complexity of the fragment shader.

Tripipe: Arithmetic/Load-Store/Texture balancing

In order to optimize our shaders, it is important to review the Midgard shader core structure, by focusing on the Tripipe execution core. The Tripipe is so-called because it can run arithmetic, load/store, and texture instructions in parallel. By looking at the counters of these three pipes, we could find a hint on which one to focus.

Bandwidth

Last but not least, optimizing memory bandwidth usage is crucial for mobile devices, as it directly results in less power consumption and better performance.

Note:

With the following formula you can calculate the bandwidth in bytes:

( L2 external reads + L2 external writes ) * bus width

While L2 counters are available for tracing, the bus width depends on the specific GPU, e.g. Mali T860 GPUs have a 16 bytes AXI bus. Keep in mind that an ideal value for a mobile GPU should stay below 5 GB/s.

Conclusion

The gfx-pps project is a good starting point for empowering the open source world of performance analysis tooling. As it has proven to be greatly valuable for our work on Panfrost, we have already planned to implement Perfetto data sources for other GPU families!

Learn more

Also useful: Mali Midgard family performance counters.

Bifrost meets GNOME: Onward & upward to zero graphics blobs

From Bifrost to Panfrost - deep dive into the first render

Experimental Panfrost GLES 3.0 support has landed in Mesa

Bifrost meets GNOME: Onward & upward to zero graphics blobs

From Bifrost to Panfrost - deep dive into the first render

Experimental Panfrost GLES 3.0 support has landed in Mesa

Comments (0)

Add a Comment

Search the newsroom

Latest Blog Posts

What to do about differing product life cycles

25/09/2025

Abandoned vendor-provided BSP roadblocks can be overcome when mainline Open Source projects like the Linux kernel are integrated directly.…

Writing a Rust GPU kernel driver: a brief introduction on how GPU drivers work

06/08/2025

This second post in the Tyr series dives deeper into GPU driver internals by using the Vulkan-based VkCube application to explain how User…

A practical debugging guide for media driver developers

22/07/2025

Getting into kernel development can be daunting. There are layers upon layers of knowledge to master, but no clear roadmap, especially when…

Quick notes from the GStreamer Spring Hackfest 2025

15/07/2025

This past May, we met with the community at the GStreamer Spring Hackfest in Nice, France, and were able to make great strides, including…

PipeWire workshop 2025: Updates on video transport, Rust efforts, TSN networking, and Bluetooth support

03/07/2025

As part of the activities Embedded Recipes in Nice, France, Collabora hosted a PipeWire workshop/hackfest, an opportunity for attendees…

Coccinelle for Rust progress report

25/06/2025

In collaboration with Inria, the French Institute for Research in Computer Science and Automation, Tathagata Roy shares the progress made…

About Collabora

Whether writing a line of code or shaping a longer-term strategic software development plan, we'll help you navigate the ever-evolving world of Open Source.

한국의 국기 한국어 버전의 Collabora.com 보기