Advancing object detection models: an investigation focused on small object detection in complex scenes
Abstract
Small object detection remains a persistent challenge in computer vision, especially in safety-critical
applications, such as autonomous driving and aerial surveillance, where objects of interest
often occupy only a few pixels and are easily lost in cluttered scenes. To advance the performance
of small object detection models, this thesis proposes two novel approaches focused on increasing
both accuracy and robustness.
The first approach introduces a semantic segmentation-guided feature fusion framework, where
contextual cues from a segmentation model are integrated into the object detection pipeline. A
lightweight attention mechanism is used to merge semantic and visual features, enhancing the
detection of small objects. The experimental results demonstrate clear improvements in identifying
challenging small targets, supporting the effectiveness of cross-task feature integration.
The second approach builds on feature pyramid structures to improve multi-scale feature representation
through a novel dilated strip-wise spatial feature pyramid, which employs dilated strip-wise
depth convolutions. Evaluated on the VisDrone and AI-TOD benchmark datasets, this model
shows significant improvements over the baseline, effectively detecting objects in densely packed
environments. The approach achieves state-of-the-art performance on the AI-TOD dataset.
Together, these approaches offer distinct strategies for overcoming the limitations of existing
object detection models. The research findings emphasize the importance of both semantic
guidance and spatial feature refinement in enhancing small object detection.