Lightweight deep learning for monocular depth estimation
Master of Science
Monocular depth estimation is a challenging but significant problem in computer vision, with many applications in other areas of study. It aims to predict relative depth from a single input image. Conventional methods could produce rough depth estimates, but their accuracy was insufficient. In recent years, the rise of deep convolutional neural networks (DCNNs) has increased the accuracy of depth estimation, but at the expense of compute resources and time. This creates a need for more lightweight solutions. In this thesis, we use recent advances in lightweight network design to reduce complexity, and we use conventional methods to improve the performance of lightweight networks. Specifically, we propose a novel lightweight network architecture that has significantly lower complexity than current methods while maintaining competitive accuracy. It is an encoder-decoder architecture that uses DiCE units to reduce the complexity of the encoder, together with a custom-designed decoder based on depthwise-separable convolutions. Furthermore, we propose a novel lightweight self-supervised training framework. Like current unsupervised and self-supervised methods, our method requires a pair of stereo images during training. We take advantage of this requirement by using conventional methods to compute a ground truth approximation from the stereo pair, which eliminates the need for pose estimation that other self-supervised approaches have. Both our lightweight network and our self-supervised framework reduce the size and complexity of current state-of-the-art methods while maintaining competitive results in their respective areas.
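To give a sense of why depthwise-separable convolutions help reduce decoder complexity, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise-separable one. The function names and the layer sizes (128 input channels, 64 output channels, 3x3 kernel) are illustrative assumptions and are not taken from the thesis; bias terms are ignored.

```python
def standard_conv_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    cin, cout, k = 128, 64, 3
    std = standard_conv_params(cin, cout, k)
    sep = depthwise_separable_params(cin, cout, k)
    print(f"standard: {std} weights")
    print(f"depthwise-separable: {sep} weights")
    print(f"reduction factor: {std / sep:.2f}")
```

For these assumed sizes the separable layer needs roughly an eighth of the weights. The actual savings in the proposed network depend on its layer configuration, which this record does not describe.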