Ruber, Simon: Impact of Domain-Based Data Sampling Approaches on the Performance of Object Detection Models.
Deep learning-based models have become essential for progress in object detection, but they are heavily dependent on the data with which they are trained. The optimal constitution of this training data to support the model in learning the patterns of the data is not yet fully understood. One subtopic of this area is the relevance of image domains, which is investigated within this thesis. Image domains and their associated classes influence the visual attributes of images and therefore lead to systematic differences between domain classes (e.g., time of day or weather). This work proposes a process to evaluate the relevance of image domain classes and to measure their impact on object detection models. Furthermore, the impact of image domain-based sampling on model performance is evaluated. The BDD100K dataset was used as the data source for the experiments. Cleaning and label validation processes were developed to prepare the dataset. The relevance of an image domain class and the impact of domain-based sampling were tested with the YOLOv5s-P6 model. Twelve image domain classes, belonging to three image domains (weather, time of day, and scene), were investigated. Ten out of twelve image domain classes are considered relevant for the performance of the object detection model. Three model groups, trained with stratified sampled data, were tested against models trained with randomly sampled data. Stratified sampling was not superior in any of the conducted comparisons. Instead, the object size distribution in the training data showed a significant impact on model performance.
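The stratified sampling tested against random sampling amounts to proportional per-class sampling over domain labels. The following is an illustrative Python sketch of that idea, not code from the thesis; the function name and toy data are made up:

```python
import random
from collections import defaultdict

def stratified_sample(image_ids, domain_labels, n_total, seed=0):
    """Sample n_total images so that each domain class keeps its
    original share of the data (proportional stratified sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for img, lbl in zip(image_ids, domain_labels):
        by_class[lbl].append(img)

    sample = []
    for lbl, imgs in by_class.items():
        # per-class quota proportional to the class frequency
        k = round(n_total * len(imgs) / len(image_ids))
        sample.extend(rng.sample(imgs, min(k, len(imgs))))
    return sample

# toy example: 6 daytime and 2 nighttime images, sample 4 of them
ids = [f"img{i}" for i in range(8)]
labels = ["day"] * 6 + ["night"] * 2
subset = stratified_sample(ids, labels, n_total=4)
```

With the 3:1 class ratio above, the sample keeps three daytime and one nighttime image, preserving the domain distribution that random sampling would only match in expectation.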
Deep learning is widely used in autonomous vehicles’ environment perception systems that utilize data from a variety of sensors such as camera, LiDAR, RADAR, and Time-of-Flight. In order to build a reliable environment perception system for real-life applications, a vast amount of multi-modal labeled data for different light, weather, and climatic conditions is necessary for the training of deep learning environment perception models. Preparing such datasets is a non-trivial task that requires a huge amount of resources. Moreover, annotation of LiDAR data requires the designation of object bounding boxes in 3D space and a yaw angle for each object, while manual annotation of nighttime camera data is complicated because the objects are not clearly distinguishable due to the lack of contrast in images obtained without sufficient light. In addition, as technology develops, new sensors appear with distinctive features that determine the specific characteristics of their data. Therefore, the problem of transferring knowledge between sensors of the same and different modalities arises. This master thesis was prepared with the German semiconductor and sensor manufacturer Infineon Technologies AG, which conducts research in AI based on sensor data. It addresses the lack of labeled data for training deep learning-based environment perception models for autonomous vehicles, as well as the problem of transferring knowledge between sensors, using camera and LiDAR data as an example. During the work on the thesis, a custom dataset for multimodal environment perception was collected with the Infineon multi-sensor setup, which included a camera, a LiDAR, and other sensors. Camera and LiDAR data were synchronized by extracting 2D depth maps from the 3D LiDAR point cloud and by calibrating the sensors using a planar checkerboard pattern.
Using the transfer learning approach, the YOLOv5 object detection model was trained on Infineon camera image data. The weights were initialized from an object detection model pretrained on the MS COCO dataset. A technique to extrapolate labels from the camera images to the LiDAR 2D depth maps was devised and implemented. The resulting labels were used to train an independent object detection model for Infineon 2D depth map data. Using the late fusion approach, a sensor fusion algorithm was implemented to provide a unified perception of the environment for the autonomous vehicle. This approach makes it possible to label multimodal data automatically and therefore significantly decreases the time and resources needed for dataset annotation.
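The extraction of 2D depth maps from the 3D LiDAR point cloud is, at its core, a pinhole projection with the calibrated camera intrinsics. A minimal sketch under simplifying assumptions (points already transformed into the camera frame, a made-up intrinsic matrix `K`, no lens distortion):

```python
import numpy as np

def lidar_to_depth_map(points, K, width, height):
    """Project 3D LiDAR points (in camera coordinates, z forward) onto
    the image plane and keep the nearest depth per pixel.

    points -- (N, 3) array of x, y, z in the camera frame
    K      -- (3, 3) camera intrinsic matrix from calibration
    """
    depth = np.zeros((height, width), dtype=np.float32)
    in_front = points[points[:, 2] > 0]      # drop points behind the camera
    uvw = (K @ in_front.T).T                 # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    z = in_front[:, 2]
    for ui, vi, zi in zip(u, v, z):
        if 0 <= ui < width and 0 <= vi < height:
            if depth[vi, ui] == 0 or zi < depth[vi, ui]:
                depth[vi, ui] = zi           # nearest surface wins
    return depth

# toy intrinsics: focal length 100 px, principal point at image center
K = np.array([[100., 0., 32.], [0., 100., 32.], [0., 0., 1.]])
pts = np.array([[0., 0., 5.], [1., 0., 10.]])  # two points ahead of camera
dm = lidar_to_depth_map(pts, K, 64, 64)
```

Once such a depth map is aligned with the camera image, 2D boxes detected in the image can be copied onto the corresponding depth-map pixels, which is the essence of the label-extrapolation step described above.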
Baga, Oleksandra: User Position Prediction in 6-DoF Mixed Reality Applications Using Recurrent Neural Networks
This thesis focuses on the design and evaluation of an approach for predicting the human head position with six degrees of freedom (6-DoF) in Extended Reality (XR) applications for a given look-ahead time (LAT), in order to reduce the Motion-to-Photon (M2P) latency caused by network and computational delays. At the beginning of the work, existing head motion prediction methods were analysed, and their similarities and differences were taken into account in the development of the proposed Recurrent Neural Network-based predictor. The main goal is a systematic analysis of the potential of recurrent neural networks for head motion prediction. The proposed approach was evaluated on a real head motion dataset collected from a Microsoft HoloLens. Based on a discussion of the obtained results, suggestions for future work are provided.
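Training data for such a look-ahead predictor is typically built by pairing a window of past 6-DoF poses with the pose one LAT ahead; the RNN then learns to map the window to the future pose. A hypothetical sketch of this slicing step (window and LAT sizes are illustrative, not values from the thesis):

```python
import numpy as np

def make_prediction_pairs(poses, window, lat_steps):
    """Slice a pose time series into (history, future target) pairs:
    given `window` past poses, the target is the pose `lat_steps`
    samples later, i.e. one look-ahead time into the future.

    poses -- (T, 6) array: x, y, z, yaw, pitch, roll per time step
    """
    X, y = [], []
    for t in range(window - 1, len(poses) - lat_steps):
        X.append(poses[t - window + 1:t + 1])  # history fed to the RNN
        y.append(poses[t + lat_steps])         # pose one LAT ahead
    return np.array(X), np.array(y)

# toy trajectory: 10 time steps of 6-DoF poses
poses = np.arange(60, dtype=float).reshape(10, 6)
X, y = make_prediction_pairs(poses, window=3, lat_steps=2)
```

Each `X[i]` is a short pose history and `y[i]` the ground-truth pose the predictor should output, so a recurrent model can be fitted with a standard regression loss.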
Pinto Diaz, Luz Adriana: Classification of Unseen Categories Through Few-Shot Learning for Garment Sorting Processes
The garment industry is one of the most polluting and prominent producers of waste on the planet. Only a low percentage of used textile resources is reused to manufacture new clothes, and second-hand markets are becoming more saturated, causing clothing that could still be worn to be discarded. A crucial alternative for the garment industry to reduce its environmental impact is closed-loop recycling; however, challenges such as the automation of sorting processes still need to be tackled to enable circularity. This thesis was developed within the cooperation framework of the Freie Universität Berlin, the Technische Universität Berlin, and a circular fashion company to support CRTX. CRTX is a collaborative project that researches solutions to automate the sorting of used garments for high-quality purposes and to support human sorters in achieving a fine-grained classification. During the sorting process, previously unseen garment categories may appear that need to be classified. This work explores a meta-learning approach, which recognizes new classes from only a few labeled examples of each class, as an alternative for classifying such categories. Results show that these methods are scalable to new classes and robust to imbalanced datasets that are closer to real-world conditions. For the experimentation, a machine learning pipeline was built using state-of-the-art tools, which also contributes to the objective of an eventual system deployment for production-level serving.
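Recognizing new classes from only a few labeled examples is commonly realized with class prototypes in a learned embedding space, as in prototypical networks: each class is represented by the mean of its few support embeddings, and a query is assigned to the nearest prototype. A minimal sketch of that classification step (the 2-D embeddings and garment labels below are toy values, not CRTX data):

```python
import numpy as np

def prototype_classify(support, support_labels, query):
    """Nearest-prototype classification in embedding space.

    support        -- (N, D) support-set embeddings
    support_labels -- length-N list of class labels
    query          -- (D,) embedding of the garment to classify
    """
    classes = sorted(set(support_labels))
    labels = np.array(support_labels)
    # class prototype = mean of the few support embeddings per class
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(protos - query, axis=1)  # Euclidean distance
    return classes[int(np.argmin(dists))]

# toy 2-D embeddings: two shirts near (0, 0), two jackets near (5, 5)
sup = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]])
lbl = ["shirt", "shirt", "jacket", "jacket"]
pred = prototype_classify(sup, lbl, np.array([4.5, 4.9]))
```

Because a new category only requires computing one more prototype from a handful of embeddings, the approach scales to previously unseen classes without retraining the embedding model.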
Winiger, Helena: Artificial Intelligence, Context and Emotion. Dreyfus' Critique of Symbolic AI from an Ethical Perspective
The phenomenon of intelligence still raises fascinating questions. Hence studying it means entering unknown depths and constantly giving rise to new disciplines. Perhaps the attraction of the phenomenon stems precisely from the fact that numerous mysteries still seem unsolved: We are left with the phenomenon itself. Nevertheless, in the research field of AI, efforts are being made to thoroughly assess and even replicate the phenomenon by technical means. Historically and methodologically, two different approaches are opposed to one another, reflecting the dichotomy of mind and brain: The initially dominant symbolic approach to AI attempts to capture the phenomenon of the mind by logically representing world knowledge, ontologically relating it, and automatically reasoning on its basis. In contrast, today’s dominant sub-symbolic AI intends to bionically mimic the human brain with artificial neural networks that sense the world and learn from it through data.
Marine habitats are an increasingly relevant research field, and with more capable and affordable options for capturing images in underwater environments, the need for more effective pipelines to process these images becomes apparent. We compare different image enhancement methods, namely contrast limited adaptive histogram equalization (CLAHE), multi-scale retinex with color restoration (MSRCR), and a fusion-based approach, on their efficacy in an object detection pipeline. Their specific use in the training and inference process with convolutional neural network (CNN) models is evaluated. In our setup, we build flexible pipelines to train several models with different enhancement strategies for the training dataset and assess their detection capabilities by measuring their inference precision on differently enhanced test datasets. We chose the region-based CNN (R-CNN) architectures Faster R-CNN and Mask R-CNN for our analysis, as they are widely used and deployed in all sorts of practical applications. We found that the use of these enhancement methods during the training phase results in better models, though their application in an inference pipeline is still inconclusive. Our data shows that a significant subset of the images would benefit from some form of enhancement, as these methods are shown to mitigate some of the image degradations introduced by the underwater environment. We therefore argue with this work for the necessity of a reliable method that determines the best enhancement procedure for each image as part of an extended detection process.
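CLAHE builds on plain histogram equalization, applying it per tile with a contrast limit and interpolating between tiles. A minimal NumPy sketch of the global variant that CLAHE refines, to illustrate how low-contrast underwater intensities get spread over the full range (illustrative only; practical pipelines typically use a library implementation such as OpenCV's):

```python
import numpy as np

def equalize_histogram(gray):
    """Global histogram equalization on an 8-bit grayscale image:
    map intensities through the normalized cumulative histogram so the
    output uses the full 0-255 range."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # classic equalization mapping via the normalized CDF
    lut = np.clip(np.round((cdf - cdf_min) / (gray.size - cdf_min) * 255),
                  0, 255)
    return lut.astype(np.uint8)[gray]

# toy low-contrast image: all values squeezed into [100, 103]
img = np.tile(np.array([100, 101, 102, 103], dtype=np.uint8), (4, 1))
out = equalize_histogram(img)
```

The four input levels 100-103 are stretched to span 0-255, which is the contrast gain that makes degraded underwater images easier for a detector to process; CLAHE additionally limits the slope of the mapping per tile to avoid amplifying noise.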
The decline of honey bees (Apis mellifera) due to various threats, such as ongoing global climate change or pesticides, endangers wild plant diversity, ecosystem stability, and crop production. The EU-funded project Hiveopolis wants to address this problem with technology. A newly developed intelligent bee colony system equipped with sensors, actuators, and robots will be used to optimally manage and guide the bee colony through today's challenges. Part of the research within the Hiveopolis project deals with automated methods for monitoring the brood nest on a honeycomb, which is useful for assessing colony strength. This thesis leverages high-resolution image data of a honey bee colony, recorded with the hive observation setup from the BeesBook project at the Biorobotic Lab at the Freie Universität Berlin, whose team also contributes to Hiveopolis, in order to investigate how well honey bee brood age can be predicted from high-resolution image data.