Enhancing semantic segmentation: architectural innovations and strategies for label-efficient learning
Abstract
Semantic segmentation is a fundamental component of modern computer vision applications. Although
supervised learning models have achieved state-of-the-art performance in this domain, they
rely heavily on large volumes of labeled data, which are expensive and time-consuming to annotate.
Thus, this research aims to develop enhanced supervised semantic segmentation models that
balance accuracy and data efficiency for visual perception tasks in autonomous driving environments.
To achieve this, the thesis is organized into two distinct phases. The first phase investigates
a dual-network architecture, in which an auxiliary boundary detection network is incorporated into
the primary segmentation framework to mitigate pixelation artifacts at object boundaries in multiclass
segmentation of complex scenes. The experimental findings show that unified segmentation
models benefit from architectural enhancements that extract richer feature representations,
which in turn improve performance. The second phase leverages
insights from the first phase and focuses on the development of an efficient deep learning
model with attention mechanisms and multi-scale feature refinement. The proposed method introduces
a novel depth-wise, point-wise feature pyramid module that extracts information-rich
spatio-semantic context from early and deep feature representations, improving model efficacy.
Extensive experimental studies conducted on widely used benchmark datasets validate the effectiveness
of the proposed models, which achieve competitive performance while offering improved
computational efficiency relative to baseline approaches. The findings highlight that strategically
balancing resource utilization with architectural innovation can yield strong performance while
reducing annotation demands and environmental impact. This research sets a valuable precedent
for building competitive, resource-aware vision systems suited to constrained application settings.
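
As a rough illustration of why a depth-wise, point-wise design can improve computational efficiency, the sketch below compares the weight count of a standard convolution with that of a depth-wise convolution followed by a 1 x 1 point-wise convolution. The channel counts and kernel size are illustrative assumptions, not values taken from the thesis, and the helper names are hypothetical. This is not the proposed feature pyramid module itself.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_pointwise_params(c_in, c_out, k):
    """Weights in a depth-wise k x k convolution followed by a
    1 x 1 point-wise convolution (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative values only (assumed, not from the thesis).
c_in, c_out, k = 256, 256, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_pointwise_params(c_in, c_out, k)
print(f"standard: {std}, depth-wise + point-wise: {sep}")
```

Under these assumed sizes the factorized layer uses 67,840 weights against 589,824 for the standard layer, which is the general kind of saving that motivates depth-wise, point-wise designs.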