Daniel Morin
February 13, 2024
GStreamer has long been the best framework to build pipelines to handle video streams, and in particular, live ones. It's no coincidence that it has been adopted widely by engineers wishing to build video analytics pipelines.
Within computers, we represent media data as a series of discrete samples over time, and in the case of images, over space. We generally don't care about the meaning of those samples, as the goal is to display them back to humans. This data is unstructured. Sometimes, we instead want to structure the content of this data to extract meaning. For example, instead of just reddish pixels, we want to know that it's a strawberry. There are a number of different types of algorithms to do this, from traditional computer vision to the latest trends in deep learning. But they all have in common that they produce some structured data describing the content of the input.
A typical example of object detection and classification using strawberries and leaves. More examples available here: https://col.la/gstanalyticsexamplesmodels.
GStreamer is a natural choice to handle this kind of metadata describing the underlying media data. It has a flexible system to attach arbitrary bits of data to a media buffer. Many companies have built their machine learning analysis framework around GStreamer, but no one had made the effort to contribute upstream, until now.
Our goal was to create an analytics framework for GStreamer that decouples analysis steps from each other, leverages platform-specific acceleration where available, defines generic elements that function across platforms, and scales to large amounts of data and detections.
GStreamer has a feature called GstMeta, which is a way to attach an arbitrary structure to a buffer (such as a video frame). In particular, there is also a region of interest meta that allows defining a rectangle in the image and attaching some data to it. Our first idea was to extend this, but we realized that it couldn't scale. For example, in a wide shot of a crowd, you could detect hundreds of people. The other thing we wanted was to make it easier to do the analysis in multiple steps, for example by having one step that detects objects, then further steps that find more information about specific objects.
We defined a new GstAnalyticsRelationMeta that stores an array of metadata structures along with a graph of relations between them. This enables us to have an object at a specific location, then define a class of objects and have a "this object belongs to this class" type of relationship. For example, we can have a "car" class and a "tire" class, so we can record that object 1 is a car and object 2 is a tire. Furthermore, we can include a relationship between the objects themselves, such as object 2 being part of object 1: the tire is part of the car.
In this example, there are 2 types of metadata, classification and object detection. The classification further describes the objects.
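To make the idea of "an array of metadata structures plus a graph of relations" concrete, here is a minimal sketch in plain Python. It is a toy model only, not the GStreamer API: the `RelationMeta` class, its methods, the relation flag names and the box coordinates are all hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical relation types, modeled as bit flags so that one pair of
# entries can carry several relations at once.
IS_PART_OF = 1 << 0
CONTAINS = 1 << 1
RELATES_TO = 1 << 2


@dataclass
class RelationMeta:
    """Toy stand-in for a per-buffer meta: entries plus a relation graph."""
    entries: list = field(default_factory=list)    # array of metadata structures
    relations: dict = field(default_factory=dict)  # (first_id, second_id) -> flags

    def add(self, kind, **data):
        """Append a metadata entry and return its id (its index in the array)."""
        self.entries.append({"kind": kind, **data})
        return len(self.entries) - 1

    def relate(self, rel, first, second):
        """Record a directed relation from entry `first` to entry `second`."""
        key = (first, second)
        self.relations[key] = self.relations.get(key, 0) | rel

    def related(self, first, rel):
        """Return ids of entries that `first` points to with relation `rel`."""
        return [b for (a, b), flags in self.relations.items()
                if a == first and flags & rel]


meta = RelationMeta()
car = meta.add("object", x=40, y=60, w=320, h=180)
tire = meta.add("object", x=70, y=190, w=60, h=60)
car_cls = meta.add("classification", label="car", confidence=0.92)
tire_cls = meta.add("classification", label="tire", confidence=0.81)

meta.relate(RELATES_TO, car, car_cls)    # object 1 belongs to class "car"
meta.relate(RELATES_TO, tire, tire_cls)  # object 2 belongs to class "tire"
meta.relate(IS_PART_OF, tire, car)       # the tire is part of the car
meta.relate(CONTAINS, car, tire)         # reverse edge, stored explicitly

# Walk the graph: which classes describe the things the car contains?
parts = [meta.entries[i]["label"]
         for obj in meta.related(car, CONTAINS)
         for i in meta.related(obj, RELATES_TO)]
print(parts)
```

Because every entry is addressed by a small integer id, a relation costs only a pair of ids and a flag, no matter how large the entries themselves are. In this sketch the edges are directed, so the "contains" direction is stored separately from "is part of".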
We've also defined some base classes of metadata: objects, classification and tracking. But more classes can be defined in the future, and plugins can even define their own.
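The multi-step analysis described earlier can be sketched with these metadata kinds. The following is a plain Python illustration under stated assumptions, not GStreamer code: `detect`, `classify`, `size_labeler` and the dictionary layout are hypothetical, and the "model" is a placeholder that labels by box area instead of running inference.

```python
def detect(frame_meta):
    """First step: a hypothetical detector that only records object locations."""
    frame_meta.append({"id": len(frame_meta), "kind": "object",
                       "box": (40, 60, 320, 180)})
    frame_meta.append({"id": len(frame_meta), "kind": "object",
                       "box": (70, 190, 60, 60)})


def classify(frame_meta, labeler):
    """Second step: adds a classification entry for each object it finds.

    It knows nothing about the detector; it only reads entries of kind "object".
    """
    # Snapshot the objects first, since entries are appended inside the loop.
    for obj in [m for m in frame_meta if m["kind"] == "object"]:
        label, confidence = labeler(obj["box"])
        frame_meta.append({"id": len(frame_meta), "kind": "classification",
                           "label": label, "confidence": confidence,
                           "describes": obj["id"]})


def size_labeler(box):
    """Placeholder model: labels by box area instead of running inference."""
    _, _, w, h = box
    return ("car", 0.9) if w * h > 10_000 else ("tire", 0.8)


frame_meta = []  # stands in for the metadata attached to one buffer
detect(frame_meta)
classify(frame_meta, size_labeler)

# Map each object id to the label that a later step attached to it.
labels = {m["describes"]: m["label"]
          for m in frame_meta if m["kind"] == "classification"}
print(labels)
```

The two steps share nothing but the metadata format, so either one could be swapped for a different implementation without touching the other.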
We hope that this will be a first step to foster more collaboration between everyone using GStreamer as a common language for video analysis. Please don't hesitate to contact us if you want to discuss your GStreamer projects, or want help building media analytics into your products.