Urban Scene Segmentation and Cross-Dataset Transfer Learning using SegFormer
Authors
Hatkar, Tanmay Sunil
Ahmed, Saad Bin
Publisher
SPIE
Abstract
Semantic segmentation is essential for autonomous driving applications, but state-of-the-art models are typically evaluated on large datasets like Cityscapes, leaving smaller datasets underexplored. This research gap limits our understanding of how transformer-based models generalize across diverse urban scenes with limited training data. This paper presents a comprehensive evaluation of SegFormer architectural variants (B3, B4, B5) on the CamVid dataset and investigates cross-dataset transfer learning from CamVid to KITTI. Using an optimization framework combining cross-entropy loss with class weighting and boundary-aware components, our experiments establish new performance baselines on CamVid and demonstrate that transfer learning provides benefits when target domain data is limited. We achieve a modest 2.57% relative mean Intersection over Union (mIoU) improvement on KITTI through knowledge transfer from CamVid, along with 61.1% faster convergence. Additionally, we observe substantial class-specific improvements of up to 30.75% for challenging categories. Our analysis provides insights into model scaling effects, cross-dataset knowledge transfer mechanisms, and practical strategies for addressing data scarcity in urban scene segmentation.
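The abstract reports its results as mIoU and as a relative mIoU improvement. As a reading aid, the sketch below shows how these two quantities are commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function names (`confusion_matrix`, `mean_iou`, `relative_improvement`) are hypothetical, the toy labels are made up for illustration, and details such as ignore labels are omitted.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Build a num_classes x num_classes matrix; rows are ground truth, columns are predictions."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (truth, prediction) pair as a single index and count occurrences.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(cm):
    """Return (mIoU, per-class IoU). Classes absent from both truth and prediction are skipped."""
    tp = np.diag(cm).astype(float)
    # Union = predicted pixels + ground-truth pixels - intersection.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    iou = np.full(cm.shape[0], np.nan)
    valid = union > 0
    iou[valid] = tp[valid] / union[valid]
    return float(np.nanmean(iou)), iou

def relative_improvement(baseline, improved):
    """Relative change in percent, i.e. 100 * (improved - baseline) / baseline."""
    return 100.0 * (improved - baseline) / baseline

# Toy example with three classes (illustrative values only, not results from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou, per_class = mean_iou(cm)
print("per-class IoU:", per_class)
print("mIoU:", miou)
```

Note that a relative improvement is measured against the baseline score, so it differs from an absolute difference in mIoU points; the 2.57% figure in the abstract is stated as relative.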
Keywords
Semantic segmentation, Transfer learning, Transformer, Computer vision, Autonomous driving
Citation
Hatkar, T. S., & Ahmed, S. B. (2025, August). Urban scene segmentation and cross-dataset transfer learning using SegFormer. In Eighth International Conference on Machine Vision and Applications (ICMVA 2025) 13734: 39-46. SPIE.
