
Evgenia Slivko:

Multimodal object detection based on deep learning


Deep learning is widely used in the environment perception systems of autonomous vehicles, which utilize data from a variety of sensors such as camera, LiDAR, RADAR, and Time-of-Flight. To build a reliable environment perception system for real-life applications, a vast amount of multi-modal labeled data covering different light, weather, and climatic conditions is necessary for training deep learning perception models. Preparing such datasets is a non-trivial task that requires a huge amount of resources. Annotating LiDAR data requires designating object bounding boxes in 3D space and a yaw angle for each object, while manual annotation of nighttime camera data is complicated because objects are not clearly distinguishable in low-contrast images captured without sufficient light. Moreover, as sensor technology develops, new sensors appear whose distinctive features determine the specific characteristics of their data. The problem of transferring knowledge between sensors of the same and of different modalities therefore arises.

This master thesis was prepared with the German semiconductor and sensor manufacturer Infineon Technologies AG, which conducts research in AI based on sensor data. It addresses the lack of labeled data for training deep learning-based environment perception models for autonomous vehicles and the problem of transferring knowledge between sensors, using camera and LiDAR data as an example. While working on the thesis, a custom dataset for multimodal environment perception was collected with the Infineon multi-sensor setup, which included a camera, a LiDAR, and other sensors. Camera and LiDAR data were synchronized by extracting 2D depth maps from the 3D LiDAR point cloud and by calibrating the sensors with a planar checkerboard pattern.
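A common way to obtain such a 2D depth map is to project the point cloud through the calibrated camera intrinsics. The sketch below illustrates this under the assumption that the points are already transformed into the camera frame; the function name, the intrinsic matrix `K`, and the nearest-return tie-breaking rule are illustrative choices, not the thesis's exact implementation.

```python
import numpy as np

def project_to_depth_map(points, K, image_size):
    """Project a LiDAR point cloud (N, 3), given in the camera frame,
    onto the image plane using the camera intrinsic matrix K,
    producing a 2D depth map aligned with the camera image."""
    w, h = image_size
    depth_map = np.zeros((h, w), dtype=np.float32)

    # Keep only points in front of the camera.
    pts = points[points[:, 2] > 0]

    # Perspective projection: u = fx*X/Z + cx, v = fy*Y/Z + cy.
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u = uv[:, 0].astype(int)
    v = uv[:, 1].astype(int)

    # Discard points that fall outside the image bounds.
    mask = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[mask], v[mask], pts[mask, 2]

    # Where several points hit one pixel, keep the nearest return:
    # writing far-to-near lets the closest depth overwrite the rest.
    order = np.argsort(-z)
    depth_map[v[order], u[order]] = z[order]
    return depth_map
```

The extrinsic rotation and translation recovered from the checkerboard calibration would be applied to the raw point cloud before this projection step.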
Using a transfer learning approach, the YOLOv5 object detection model was trained on Infineon camera image data, with weights initialized from a model pretrained on the MS COCO dataset. A technique to extrapolate labels from the camera images to the LiDAR 2D depth maps was devised and implemented, and the resulting labels were used to train an independent object detection model on Infineon 2D depth map data. Finally, a sensor fusion algorithm based on a late fusion approach was implemented to provide a unified perception of the environment for the autonomous vehicle. This approach makes it possible to label multimodal data automatically and therefore significantly decreases the time and resources required for dataset annotation.
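The abstract does not specify the exact fusion rule, so the following is one minimal late-fusion sketch under assumed conventions: each modality produces a list of `(box, score, class)` detections, detections are matched by class and IoU, agreement across modalities boosts confidence, and unmatched detections from either sensor are kept. All names and the fixed +0.1 confidence boost are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def late_fusion(cam_dets, lidar_dets, iou_thr=0.5):
    """Fuse per-modality detection lists of (box, score, cls) tuples.
    Matching boxes (same class, IoU above the threshold) are merged
    into one detection with boosted confidence; unmatched detections
    from either sensor are passed through."""
    fused, used = [], set()
    for box_c, score_c, cls_c in cam_dets:
        best_j, best_iou = None, iou_thr
        for j, (box_l, score_l, cls_l) in enumerate(lidar_dets):
            if j in used or cls_l != cls_c:
                continue
            o = iou(box_c, box_l)
            if o >= best_iou:
                best_j, best_iou = j, o
        if best_j is not None:
            used.add(best_j)
            score_l = lidar_dets[best_j][1]
            # Agreement across modalities raises confidence (capped at 1.0).
            fused.append((box_c, min(1.0, max(score_c, score_l) + 0.1), cls_c))
        else:
            fused.append((box_c, score_c, cls_c))
    # Keep LiDAR-only detections, e.g. objects the camera misses at night.
    fused += [d for j, d in enumerate(lidar_dets) if j not in used]
    return fused
```

Keeping the unmatched LiDAR detections is what makes the fusion useful in poor lighting, where the depth-map model can still detect objects the camera model misses.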

Master of Science (M.Sc.)