Egomotion estimation and multi-run depth data integration for 3D reconstruction of street scenes
Digitisation of 3D scenes has long been a fundamental yet highly active topic in computer science. The acquisition of detailed 3D information on street sides is essential to many applications such as driver assistance, autonomous driving, and urban planning. Over decades, many techniques, including active scanning and passive reconstruction, have been developed and applied to achieve this goal. One state-of-the-art passive solution uses a moving stereo camera to record a video sequence of a street. The sequence is later analysed to recover the scene structure and the sensor's egomotion, which together yield a 3D scene reconstruction in a consistent coordinate system.
As a single reconstruction may be incomplete, the scene needs to be scanned multiple times, possibly with different types of sensors, to fill in the missing data. This thesis studies the egomotion estimation problem from a wider perspective and proposes a framework that unifies multiple alignment models which existing methods generally consider individually. The integrated models lead to an energy minimisation-based egomotion estimation algorithm that is applicable to a wider range of sensor configurations, including monocular cameras, stereo cameras, and LiDAR-engaged vision systems.
This thesis also studies the integration of 3D street-side models reconstructed from multiple video sequences based on the proposed framework. A keyframe-based sequence bag-of-words matching pipeline is proposed. To integrate depth data from different sequences, an initial alignment is found from established cross-sequence landmark-feature observations, using the aforementioned outlier-aware pose estimation algorithm. The solution is then optimised using an improved bundle adjustment technique. The aligned point clouds are finally integrated into a 3D mesh of the scanned street scene.
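To illustrate the cross-sequence alignment step, the following is a minimal sketch of a closed-form least-squares rigid alignment (the Kabsch method) between matched 3D landmarks from two sequences. It is not the outlier-aware pose estimation algorithm or the bundle adjustment proposed in this thesis: it assumes outlier-free 3D-to-3D correspondences, and the function name `rigid_align` and the synthetic landmark data are hypothetical, introduced only for illustration.

```python
import numpy as np


def rigid_align(src, dst):
    """Least-squares rotation R and translation t such that dst ≈ R @ src + t.

    src, dst: (N, 3) arrays of matched 3D landmarks (assumed outlier-free).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)

    # Centre both point sets on their centroids.
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)

    # 3x3 cross-covariance between the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)

    # The SVD of H gives the optimal rotation; the diagonal correction
    # prevents the solution from being a reflection.
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    t = dst_c - R @ src_c
    return R, t


# Synthetic example: landmarks seen in sequence A, and the same landmarks
# expressed in the coordinate system of sequence B (hypothetical data).
rng = np.random.default_rng(0)
landmarks_a = rng.uniform(-10.0, 10.0, size=(20, 3))

angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
t_true = np.array([2.0, -1.0, 0.5])
landmarks_b = landmarks_a @ R_true.T + t_true

R_est, t_est = rigid_align(landmarks_a, landmarks_b)
aligned_a = landmarks_a @ R_est.T + t_est  # sequence A mapped into B's frame
```

In practice such a closed-form estimate would only serve as an initial solution. Mismatched landmarks would need to be rejected first, and the result would then be refined jointly with the camera poses, as the outlier-aware estimation and bundle adjustment stages described above are intended to do.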