Jakub Piotr Cłapa
January 17, 2023
Modern datasets contain hundreds of thousands to millions of labels that must be kept accurate. In practice, some errors in the dataset average out and can be ignored, but systematic biases transfer to the model. After quick initial wins in areas where abundant data is readily available, deep learning needs to become more data efficient to help solve difficult business problems. In the words of deep learning pioneer Andrew Ng:
In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. – Andrew Ng: Unbiggen AI - IEEE Spectrum
Over the course of 2022, we worked on an open-source tool that combines novel unsupervised machine-learning pipelines with a new user interface concept that, together, help annotators and machine-learning engineers identify and filter out label errors.
Labeling is a difficult cognitive task and accurate labels require a serious Quality Assurance (QA) process. Most existing labeling tools (both commercial and open source) have only minimal support for review. Frequently, the QA process is more difficult (and expensive!) than the initial labeling, since you are forced to use an interface optimized for drawing bounding boxes to verify whether all labels were assigned correctly. Here is the process described by a leading annotation service provider:
Annotations are reviewed four times in order to confirm accuracy. Two annotators label a given object, a supervisor then checks the quality of their work. – keymakr, a leading annotation provider
Can you spot the mistake in the following photo? If not, I can't blame you. This is hard because it requires expert knowledge and a lot of cognitive resources to read all the labels, remember what each of these signs should look like, and finally spot the ones that are incorrect.
What if instead we show the exact same data like this:
Now it's not so difficult to spot the one speed limit sign that does not fit with the rest (the 30km/h speed limit). It requires you to only keep a single type of object in your working memory at a time and taps into the intuitive skill of spotting items that stand out from the rest. It also takes an order of magnitude less time.
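To make the idea concrete, here is a minimal sketch of that kind of single-class review view. The annotation format (a list of dicts with image paths, bounding boxes, and labels) and the class name used in the example are illustrative assumptions, not the Mapillary or MLfix formats:

```python
# A minimal sketch of the "one class at a time" review idea: crop every
# annotation of a single class out of its source image and show the crops
# side by side, so an outlier is easy to spot.  The annotation format
# (dicts with "image", "bbox" and "label" keys) is an illustrative assumption.
from PIL import Image
import matplotlib.pyplot as plt

def review_class(annotations, class_name, thumb_size=(96, 96), cols=10):
    # Keep only the annotations of the class we want to review.
    crops = []
    for ann in annotations:
        if ann["label"] != class_name:
            continue
        x0, y0, x1, y1 = ann["bbox"]
        crop = Image.open(ann["image"]).convert("RGB").crop((x0, y0, x1, y1))
        crops.append(crop.resize(thumb_size))
    if not crops:
        return

    # Lay the crops out in a grid; mislabeled signs tend to visually "pop out".
    rows = (len(crops) + cols - 1) // cols
    _, axes = plt.subplots(rows, cols, figsize=(cols, rows), squeeze=False)
    for ax in axes.flat:
        ax.axis("off")
    for ax, crop in zip(axes.flat, crops):
        ax.imshow(crop)
    plt.show()

# Example: review every annotation labeled as a 60 km/h speed limit sign
# (the class name is hypothetical).
# review_class(annotations, "speed-limit-60")
```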
This insight directly led to the creation of MLfix. Using the streamlined interface lets us perform the QA process more than 10 times faster and catch errors that a conventional review would otherwise miss, in some cases as many as 30% of them.
The video below shows a user quickly scrolling through 40 objects belonging to 5 classes and finding 6 mislabeled examples.
You can also try it yourself on a selection of 60km/h speed limit signs coming from the Mapillary Traffic Sign Dataset. Note that depending on demand the live demo can take some time to start.
MLfix can be used as a standalone tool, but it can also be embedded directly into Jupyter notebooks that data scientists use to prepare and train deep learning networks. Thanks to that, MLfix can tap into all the metadata you have about your dataset and also utilize networks you've trained to help you with the QA process. You can, for example, review all the annotations labeled other-sign that the model believed to be the do-not-enter sign; we can see that it was right most of the time:
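As a rough illustration of this kind of model-assisted review (not the actual MLfix API), the sketch below runs a trained classifier over the annotation crops and collects the ones whose ground-truth label disagrees with the prediction. The annotation format, the classes list, and the class names are assumptions carried over from the earlier sketch:

```python
# A hedged sketch of model-assisted label review: flag annotations whose
# ground-truth label disagrees with a trained classifier's prediction.
# The annotation format, `classes` list and class names are illustrative
# assumptions; this is not the MLfix API.
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def find_disagreements(model, annotations, classes, labelled_as, predicted_as):
    """Return annotations labelled `labelled_as` that the model predicts as `predicted_as`."""
    model.eval()
    suspects = []
    for ann in annotations:
        if ann["label"] != labelled_as:
            continue
        x0, y0, x1, y1 = ann["bbox"]
        crop = Image.open(ann["image"]).convert("RGB").crop((x0, y0, x1, y1))
        pred = model(preprocess(crop).unsqueeze(0)).argmax(dim=1).item()
        if classes[pred] == predicted_as:
            suspects.append(ann)
    return suspects

# Example: find "other-sign" annotations the classifier believes are
# "do-not-enter" signs, then review only those in the grid view above.
# suspects = find_disagreements(model, annotations, classes,
#                               labelled_as="other-sign",
#                               predicted_as="do-not-enter")
```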
We ran a comparison on the Mapillary Traffic Sign Dataset, an extensive dataset of 206 thousand traffic signs divided into 401 classes. Among these, there are 6,400 annotations of speed limit signs, and with MLfix we were able to find and remove the roughly 3% of them that were erroneous in about 30 minutes. In other words, we corrected 0.11% of all the labels in the whole dataset.
We trained image classification models (based on the ResNet50 backbone) on both the original and fixed datasets 20 times and averaged the accuracy metrics. After fixing the dataset, the model error rate went down from 7.28% to 7.05%, and the error rate for speed limit signs improved by almost 2 percentage points (from 10.42% to 8.49%), which is a significant improvement for a very modest amount of effort. More information about these experiments (including the code to reproduce the results) can be found in the GitHub repo - jpc/mlfix-mapillary-traffic-signs. The accuracy histograms show that the improvement is consistent over multiple training runs:
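For readers who want a sense of the comparison setup without digging into the repository, here is a rough sketch assuming the annotation crops are exported into ImageFolder-style directories. The directory names, hyperparameters, and training schedule are assumptions; the actual experiment code lives in the linked repository:

```python
# A rough sketch of the original-vs-fixed comparison: fine-tune a ResNet50
# classifier on each dataset variant several times and compare the averaged
# error rates.  Directory layout and hyper-parameters are assumptions.
import statistics
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def train_once(train_dir, valid_dir, epochs=5, lr=1e-3, device="cuda"):
    train_ds = datasets.ImageFolder(train_dir, tfm)
    valid_ds = datasets.ImageFolder(valid_dir, tfm)
    train_dl = DataLoader(train_ds, batch_size=64, shuffle=True, num_workers=4)
    valid_dl = DataLoader(valid_ds, batch_size=64, num_workers=4)

    # Start from an ImageNet-pretrained ResNet50 and replace the classifier head.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))
    model = model.to(device)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        model.train()
        for xb, yb in train_dl:
            xb, yb = xb.to(device), yb.to(device)
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

    # Error rate on a validation set shared between both dataset variants.
    model.eval()
    wrong, total = 0, 0
    with torch.no_grad():
        for xb, yb in valid_dl:
            preds = model(xb.to(device)).argmax(dim=1).cpu()
            wrong += (preds != yb).sum().item()
            total += len(yb)
    return wrong / total

# Average over repeated runs for both dataset variants (paths are hypothetical).
# runs = 20
# original = [train_once("crops/original", "crops/valid") for _ in range(runs)]
# fixed    = [train_once("crops/fixed",    "crops/valid") for _ in range(runs)]
# print(statistics.mean(original), statistics.mean(fixed))
```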
Our work would not have been possible without the help of countless open-source resources. We hope MLfix will help the annotation community build the next generation of innovative technology.
If you have questions or ideas, join us on our Gitter #lounge channel or leave a comment in the comment section.