
Carlafox: Towards reliable open-source 3D perception


Vineet Suryan
April 05, 2023


Overview


Extracting precise 3D object information is one of the prime goals of comprehensive scene understanding. However, labeling errors are common in today's open-source 3D perception datasets, and those errors propagate directly into the models trained on them. To tackle this issue, we used Carlafox to automatically generate an error-free synthetic dataset for 3D perception.

Deep 3D object detectors can be misled during training by the inherent ambiguity of ground-truth 3D bounding box annotations, caused by occlusions, missing objects, or manual annotation errors, which lowers detection accuracy. Existing methods largely overlook these issues and treat the labels as deterministic. A virtual simulation with known labels, by contrast, makes it possible to create enormous datasets at negligible cost. Research has shown that combining simulated and real data makes AI models more accurate, and our results show that simulated data can significantly reduce the amount of training on real data required to achieve satisfactory levels of accuracy.

Key Takeaways

  • Simulated data is becoming more crucial than ever in autonomous driving, both for testing pre-trained models and for developing new ones.
  • For neural network models to generalize to real-world applications, the underlying dataset must cover a variety of driving scenarios, and the simulated sensor readings must closely resemble those of real-world sensors.
  • Carlafox can export high-quality, synchronized LiDAR and camera data with object annotations, and offers configuration options to accurately mirror a real-life sensor array.
  • Using Carlafox, we generated a dataset of 10,000+ samples and trained SFA3D, a fast open-source 3D object detection neural network, on it.
  • For testing, we integrated the model back into Carlafox and visualized its predictions against the ground truth data from the simulator.

Data Collection

Carlafox, a web-based CARLA visualizer, substantially demystifies the arduous task of synthetic dataset generation for 3D object detection. We use Carlafox to set up sensor configurations, create diverse weather conditions, and generate data from different maps in the KITTI format. One advantage of the dataset is that the same LiDAR and camera configuration used to record the original KITTI data was recreated in the open-source CARLA simulator.

The objective is to offer a challenging dataset for assessing and improving approaches to complicated vision tasks such as 3D object detection. In total, the dataset contains 12,807 cars, 10,252 pedestrians, and 11,624 cyclists. It provides 2D and 3D bounding box annotations for the classes Car, Pedestrian, and Cyclist, along with both LiDAR and camera sensor data and the corresponding sensor calibration matrices.

Figure 1: 3D bounding boxes projected onto the LiDAR point cloud.

Figure 2: 2D and 3D bounding boxes with occlusion flags and unique IDs.
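
For readers who want to build a similar pipeline by hand, the sketch below shows how synchronized camera and LiDAR frames can be captured with the CARLA Python API. It is a minimal illustration rather than Carlafox's actual implementation; the sensor placements and attributes are assumed values loosely modeled on a KITTI-style rig.

```python
import queue

import carla

# Connect to a running CARLA server.
client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Synchronous mode keeps the camera and LiDAR aligned: every tick
# advances the simulation by a fixed step and yields one frame per sensor.
settings = world.get_settings()
settings.synchronous_mode = True
settings.fixed_delta_seconds = 0.05  # 20 Hz
world.apply_settings(settings)

traffic_manager = client.get_trafficmanager()
traffic_manager.set_synchronous_mode(True)

blueprints = world.get_blueprint_library()

# Spawn an ego vehicle and let the traffic manager drive it.
vehicle_bp = blueprints.filter("vehicle.tesla.model3")[0]
vehicle = world.spawn_actor(vehicle_bp, world.get_map().get_spawn_points()[0])
vehicle.set_autopilot(True, traffic_manager.get_port())

# Camera and LiDAR mounts loosely modeled on the KITTI rig (assumed values).
camera_bp = blueprints.find("sensor.camera.rgb")
camera_bp.set_attribute("image_size_x", "1242")
camera_bp.set_attribute("image_size_y", "375")
camera = world.spawn_actor(
    camera_bp, carla.Transform(carla.Location(x=1.5, z=1.7)), attach_to=vehicle)

lidar_bp = blueprints.find("sensor.lidar.ray_cast")
lidar_bp.set_attribute("channels", "64")
lidar_bp.set_attribute("range", "100")
lidar_bp.set_attribute("rotation_frequency", "20")
lidar = world.spawn_actor(
    lidar_bp, carla.Transform(carla.Location(z=1.8)), attach_to=vehicle)

# Queues collect exactly one measurement per sensor per tick.
camera_queue, lidar_queue = queue.Queue(), queue.Queue()
camera.listen(camera_queue.put)
lidar.listen(lidar_queue.put)

for frame in range(100):
    world.tick()
    image = camera_queue.get()
    sweep = lidar_queue.get()
    image.save_to_disk(f"out/image_2/{frame:06d}.png")
    # sweep.raw_data is a flat float32 buffer of (x, y, z, intensity)
    # points that can be written out as KITTI-style .bin files.
```

Running the simulator in synchronous mode is the key detail: each world.tick() then yields exactly one camera image and one LiDAR sweep for the same simulation step, which is what makes consistent per-frame annotation possible.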

Training 3D Perception Models on the CARLA dataset

Due to its numerous applications across various industries, including robotics and autonomous driving, 3D object detection has been gaining attention from both industry and academia. LiDAR sensors are commonly used in robotics and autonomous vehicles to capture 3D scene data as sparse and irregular point clouds, which have been shown to serve as helpful cues for 3D scene perception and comprehension.

We trained several LiDAR-based networks, namely PointRCNN, PV-RCNN, and SFA3D, as well as a multimodal (RGB + LiDAR) 3D object detection model, MVX-Net, on the CARLA synthetic dataset, but fine-tuned only one of them, SFA3D, mainly because it is faster and uses less memory without much loss in performance. That said, any of the other models might have performed better if optimized and tuned beyond a baseline training run, as shown in the following panel.

A closer look at SFA3D

SFA3D (Super Fast and Accurate 3D object detection) operates on 3D LiDAR point clouds. Its backbone is a ResNet-based Keypoint Feature Pyramid Network (KFPN), originally proposed in RTM3D.

The model takes a bird's-eye-view (BEV) map as input, encoded from the height, intensity, and density of the 3D LiDAR point cloud. As output, it produces a heatmap for the main center, along with the center offset, the heading angle, the object dimensions, and the z coordinate.
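
As a rough illustration of this encoding, the sketch below rasterizes a point cloud into a three-channel BEV map. It is a minimal example under assumed, KITTI-style range limits, not SFA3D's exact preprocessing code.

```python
import numpy as np

def make_bev_map(points, size=608, x_range=(0.0, 50.0),
                 y_range=(-25.0, 25.0), z_range=(-2.73, 1.27)):
    """Encode an (N, 4) array of [x, y, z, intensity] LiDAR points into a
    (3, size, size) BEV map with height, intensity, and density channels."""
    x, y, z, r = points[:, 0], points[:, 1], points[:, 2], points[:, 3]

    # Keep only points inside the region of interest.
    mask = ((x >= x_range[0]) & (x < x_range[1]) &
            (y >= y_range[0]) & (y < y_range[1]) &
            (z >= z_range[0]) & (z < z_range[1]))
    x, y, z, r = x[mask], y[mask], z[mask], r[mask]

    # Discretize metric coordinates into pixel indices.
    xi = ((x - x_range[0]) / (x_range[1] - x_range[0]) * (size - 1)).astype(np.int64)
    yi = ((y - y_range[0]) / (y_range[1] - y_range[0]) * (size - 1)).astype(np.int64)

    height = np.zeros((size, size), dtype=np.float32)
    intensity = np.zeros_like(height)
    density = np.zeros_like(height)

    # Height: highest normalized z per cell; intensity: strongest return.
    z_norm = ((z - z_range[0]) / (z_range[1] - z_range[0])).astype(np.float32)
    np.maximum.at(height, (xi, yi), z_norm)
    np.maximum.at(intensity, (xi, yi), r.astype(np.float32))

    # Density: log-normalized point count per cell.
    np.add.at(density, (xi, yi), 1.0)
    density = np.minimum(1.0, np.log(density + 1.0) / np.log(64.0))

    return np.stack([height, intensity, density])
```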

Figure 3: SFA3D predictions on the held-out test set.


As for the loss functions, focal loss is used for the main-center heatmap and L1 loss for the heading angle (yaw), while balanced L1 loss is employed for the z coordinate and the three dimensions (height, width, and length). We trained the model for a total of 300 epochs, weighting the aforementioned loss components equally and using a cosine LR scheduler with an initial learning rate of 0.001 and a batch size of 32 (on two RTX 2080 Ti GPUs). Refer to the following wandb panels for the results of the SFA3D experiments on all towns and on Town01, respectively.
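
In code, the overall objective is roughly the weighted sum sketched below. The head names are illustrative, and the two helpers are textbook implementations (a CornerNet-style focal loss and the balanced L1 loss from Libra R-CNN) rather than copies of the SFA3D repository's code.

```python
import math

import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0):
    """CornerNet-style focal loss for the main-center heatmap."""
    pos = gt.eq(1).float()
    neg_weights = torch.pow(1.0 - gt, beta) * (1.0 - pos)
    pred = pred.clamp(1e-6, 1.0 - 1e-6)
    pos_loss = torch.log(pred) * torch.pow(1.0 - pred, alpha) * pos
    neg_loss = torch.log(1.0 - pred) * torch.pow(pred, alpha) * neg_weights
    return -(pos_loss.sum() + neg_loss.sum()) / pos.sum().clamp(min=1.0)

def balanced_l1_loss(pred, target, alpha=0.5, gamma=1.5, beta=1.0):
    """Balanced L1 loss (Libra R-CNN), used for z and the box dimensions."""
    diff = torch.abs(pred - target)
    b = math.exp(gamma / alpha) - 1.0
    loss = torch.where(
        diff < beta,
        alpha / b * (b * diff + 1.0) * torch.log(b * diff / beta + 1.0) - alpha * diff,
        gamma * diff + gamma / b - alpha * beta)
    return loss.mean()

def total_loss(pred, target):
    """Equal-weighted sum of the loss components used during training."""
    return (focal_loss(pred["heatmap"], target["heatmap"])          # main center
            + F.l1_loss(pred["offset"], target["offset"])           # center offset
            + F.l1_loss(pred["yaw"], target["yaw"])                 # heading angle
            + balanced_l1_loss(pred["z"], target["z"])              # z coordinate
            + balanced_l1_loss(pred["dim"], target["dim"]))         # h, w, l
```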

Optimising with TensorRT

TensorRT enables developers to optimize inference by leveraging CUDA libraries. It supports both INT8 and FP16 post-training quantization, which greatly reduces application latency, a requirement for many real-time services as well as autonomous and embedded applications.

As a first step, we convert the SFA3D PyTorch model to ONNX and then use the ONNX parser to build a TensorRT engine from it. We could also bypass the parser and convert directly from PyTorch to TensorRT; however, doing so would require rewriting the SFA3D network with the TensorRT network-definition API, which would be time-intensive for a negligible speed benefit, although it could pay off on an embedded device such as a Jetson Nano.
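
The parser route takes only a few lines. The sketch below exports the model to ONNX and builds an FP16 engine with the TensorRT Python API; the input name, the 1×3×608×608 BEV shape, and the single fused output are illustrative assumptions, and `model` stands for the trained SFA3D network loaded beforehand.

```python
import torch
import tensorrt as trt

# 1. Export the trained PyTorch model (assumed already loaded) to ONNX.
model.eval()
dummy = torch.randn(1, 3, 608, 608, device="cuda")  # assumed BEV input shape
torch.onnx.export(model, dummy, "sfa3d.onnx",
                  input_names=["bev_map"], output_names=["outputs"],
                  opset_version=11)

# 2. Parse the ONNX graph and build an FP16 TensorRT engine.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("sfa3d.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # post-training FP16 quantization

serialized = builder.build_serialized_network(network, config)
with open("sfa3d.engine", "wb") as f:
    f.write(serialized)
```

The same engine can also be produced without any code via the stock trtexec CLI: trtexec --onnx=sfa3d.onnx --fp16 --saveEngine=sfa3d.engine.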

In addition, we examined benchmarks across multiple frameworks, such as Apache TVM and ONNX Runtime, to verify that TensorRT performs best. Our results make it clear that TensorRT yields higher throughput on the same hardware, and quantization to FP16 boosts performance further still. On an RTX 2080 Ti, TensorRT may be the most efficient solution for SFA3D, but another framework, such as Apache TVM, could well perform better on a different device with the same or another network; results will vary with the hardware.
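
Comparisons like these are easy to get wrong without warm-up iterations and explicit GPU synchronization. A harness along the lines of the sketch below, where `run_fn` wraps whichever backend is being measured, keeps the timings honest.

```python
import time

import numpy as np
import torch

def benchmark(run_fn, warmup=50, iters=200):
    """Return mean and 99th-percentile latency (ms) of an inference callable."""
    for _ in range(warmup):          # let clocks, caches, and autotuners settle
        run_fn()
    torch.cuda.synchronize()

    times = []
    for _ in range(iters):
        start = time.perf_counter()
        run_fn()
        torch.cuda.synchronize()     # wait for the GPU before stopping the clock
        times.append(time.perf_counter() - start)

    times = np.asarray(times) * 1e3
    return times.mean(), np.percentile(times, 99)
```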

Testing

To make it easier to compare the model's predictions with CARLA's ground truth, we integrated the model into Carlafox and exposed its predictions in a separate Foxglove image panel. For more details on the Carlafox visualizer, please refer to this dedicated blog post.

Outlook

Numerous open-source resources paved the way for this work. In the future, we plan to fine-tune the trained models on the official KITTI dataset. Given the cost of acquiring real-world data, synthetic data for training machine learning models has grown in popularity in recent years, especially in autonomous driving, where models must generalize to a wide range of driving conditions. We hope our findings help others in research and development.

If you have questions or ideas on how to leverage synthetic data for 3D perception, join us on our Gitter #lounge channel or leave a comment in the comment section.
