ISBN: 978-981-18-7950-0 DOI: 10.18178/wcse.2023.06.022
T-SSD: A Transformer-based Single-Stage Multi-Scale Sampling Object Detector
Abstract—Object detection algorithms are a cornerstone of autonomous driving systems, and most are based on convolutional neural networks (CNNs) with one or two stages. Because detection accuracy bears directly on the safety of the driver, it is crucial, yet it is limited by the underlying CNN, which has become hard to improve further. Meanwhile, even a basic transformer outperforms advanced CNNs, so transformers appear to be the better route to higher accuracy. However, most transformer-based detectors are only backbone replacements, extensions of the ViT concept, or fusions with CNNs, and so do not fully exploit the characteristics of the transformer. We propose T-SSD (Transformer-based Single-Stage Detector), a single-stage object detector with multi-scale feature modeling ability. The transformer backbone extracts features at different scales and aggregates them into an intermediate representation. The transformer neck then queries semantic information directly from the aggregated representation and feeds it to the heads, which make all predictions in a single pass. Trained on COCO2017, and combining the design philosophy of object detectors with the characteristics of transformers, our T-SSD-Tiny achieves an AP (Average Precision) up to 9.0 higher than CNN-based detectors with 100 fewer epochs, outperforming YOLOv3-Base and SSD-300. The AP of our T-SSD-Small is up to 4.7 higher than that of transformer-based detectors trained for the same number of epochs, indicating performance comparable to DETR-ResNet-101 and YOLOS-Small.
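The data flow described in the abstract (features at several scales, aggregated into one intermediate representation, then queried once) can be illustrated with a small sketch. This is not the authors' implementation: average pooling stands in for the transformer backbone stages, the attention has no learned projections, the prediction heads are omitted, and every name here (`avg_pool_tokens`, `aggregate_multi_scale`, `cross_attend`, the strides, and the toy sizes) is a hypothetical choice made only for illustration.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def avg_pool_tokens(fmap, stride):
    """Average-pool an H x W grid of D-dim vectors and flatten it into tokens.

    Assumption: pooling is a stand-in for a backbone stage at a coarser scale.
    """
    h, w, d = len(fmap), len(fmap[0]), len(fmap[0][0])
    tokens = []
    for i in range(0, h, stride):
        for j in range(0, w, stride):
            cells = [fmap[y][x]
                     for y in range(i, min(i + stride, h))
                     for x in range(j, min(j + stride, w))]
            tokens.append([sum(c[k] for c in cells) / len(cells) for k in range(d)])
    return tokens

def aggregate_multi_scale(fmap, strides=(1, 2, 4)):
    """Concatenate the tokens of every scale into one intermediate representation."""
    memory = []
    for s in strides:
        memory.extend(avg_pool_tokens(fmap, s))
    return memory

def cross_attend(queries, memory):
    """Single-pass scaled dot-product cross-attention (no learned weights)."""
    d = len(memory[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, m)) / math.sqrt(d) for m in memory]
        weights = softmax(scores)
        outputs.append([sum(w * m[k] for w, m in zip(weights, memory)) for k in range(d)])
    return outputs

# Toy inputs: an 8 x 8 feature map with 4 channels and 5 object queries.
rng = random.Random(0)
H, W, D, NUM_QUERIES = 8, 8, 4, 5
fmap = [[[rng.gauss(0, 1) for _ in range(D)] for _ in range(W)] for _ in range(H)]
queries = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(NUM_QUERIES)]

memory = aggregate_multi_scale(fmap)      # 64 + 16 + 4 tokens from the three scales
decoded = cross_attend(queries, memory)   # one D-dim vector per query, for the heads
```

In a real detector each decoded query vector would be passed to classification and box-regression heads. The sketch only shows that every query reads all scales at once, which is the single-stage behaviour the abstract describes.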
Index Terms—Transformer, Vision Transformer, Object Detection, One-Stage Algorithm
Kailai Huang, Mi Wen, Chen Wang, Lina Ling
College of Computer Science and Technology, Shanghai University of Electric Power, CHINA
Cite: Kailai Huang, Mi Wen, Chen Wang, Lina Ling, "T-SSD: A Transformer-based Single-Stage Multi-Scale Sampling Object Detector," Proceedings of 2023 the 13th International Workshop on Computer Science and Engineering (WCSE 2023), pp. 156-163, June 16-18, 2023.