Object pose estimation is a common computer vision problem. The recent growth of augmented reality applications and the increasing use of deep convolutional neural networks for computer vision problems were the original motivation for this thesis. Specifically, this thesis explores the possibility of using a deep convolutional neural network trained on rendered data to estimate the pose of a cube in an industrial application.
This industrial application is to be realized in the context of a project at Fraunhofer FOKUS. In an industrial-like setting, the location and rotation of a cube on a workbench are to be predicted by a trained neural network from a single RGB image. Given the exact pose of the object, an augmented reality experience can be realized that lets a user interact with the cube. More importantly, a robot within the workspace can use this information to further process the cube.
The convolutional neural network trained in this thesis generalizes well over the pose of a cuboid when tested on rendered images. At the end of this thesis, some approaches to bridging the gap between prediction on rendered images and prediction on real images are discussed as an outlook.