In this thesis we present novel techniques for the robust estimation and segmentation of indoor room structure using common 2.5D sensors, namely AIT Stereo Vision and Microsoft’s Kinect. The underlying concept of this work is the so-called Manhattan world assumption, i.e., the frequently observed dominance of three mutually orthogonal vanishing directions in man-made environments. Our work emphasizes processing speed and robustness over high-quality segmentation. Many indoor environments can be considered approximately Manhattan-like when the furniture is aligned with the walls and the room is roughly rectangular.
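The Manhattan world assumption can be made concrete with a minimal sketch (not from the thesis; the function and axis names are our own): every surface normal in such a scene should align closely with one of three mutually orthogonal axes, so a normal can be labeled by its dominant axis.

```python
def manhattan_axis(normal, axes):
    """Return the index of the Manhattan axis best aligned with a unit normal.

    Alignment is measured by the absolute dot product, so a wall normal and
    its opposite both map to the same axis.
    """
    best, best_dot = 0, 0.0
    for k, axis in enumerate(axes):
        dot = abs(sum(axis[i] * normal[i] for i in range(3)))
        if dot > best_dot:
            best, best_dot = k, dot
    return best

# Illustrative Manhattan frame: the canonical orthonormal basis.
AXES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
```

In a real scene the frame would be a rotated orthonormal basis estimated from the data rather than the identity basis assumed here.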
Our method works in three steps: first we estimate the Manhattan world frame, then we extract features, and finally we fuse them into a segmentation. The estimation uses three different techniques: vanishing-point detection in 2D images, minimum-entropy histogram analysis in 3D, and MSAC-based normal-vector estimation. All methods work efficiently and independently of each other and are robust to noise and occlusion. Feature extraction builds on these estimators and performs geometrically constrained line and plane detection. Lines are extracted using histograms and Gabor filters, while planes are extracted using mean-shift clustering and connected-component RANSAC estimators. All estimates are fused using a traditional particle filter for a coherent sensor data representation. In the final step we apply multi-label graph segmentation and extract the room structure.
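The plane-extraction step above relies on RANSAC-family estimators. A minimal sketch of a RANSAC plane fit on a 3D point list (an illustrative assumption, not the thesis implementation; function names and parameters are our own) repeatedly samples minimal three-point models and keeps the one with the most inliers:

```python
import random

def fit_plane(p1, p2, p3):
    """Plane through three points: unit normal n and offset d with n.x + d = 0."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    # Normal via cross product of two edge vectors.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:          # degenerate (collinear) sample
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p1[i] for i in range(3))
    return n, d

def ransac_plane(points, iters=300, thresh=0.01, seed=0):
    """Fit the dominant plane by sampling minimal 3-point models."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue
        n, d = model
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) + d) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

MSAC, used in the thesis for normal-vector estimation, differs only in the scoring: instead of counting inliers, it sums a truncated residual cost, which makes the estimate less sensitive to the inlier threshold.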
We also present applications such as geometrically constrained visual odometry and mapping. We show that our method is robust and accurate in realistic environments using a database we created ourselves. This work can be applied to indoor robot navigation, object recognition and holistic scene understanding. Our approach is not limited to AIT Stereo Vision and Microsoft’s Kinect and can be used with any 2.5D sensor, such as those in Google’s Project Tango.