Vol. 3 No. 5 (2024)
Articles

RT-DETR-Based Multimodal Detection with Modality Attention and Feature Alignment

Published 2024-08-30

Keywords

  • Infrared-visible image fusion, object detection, RT-DETR, modality attention

How to Cite

Lou, Y. (2024). RT-DETR-Based Multimodal Detection with Modality Attention and Feature Alignment. Journal of Computer Technology and Software, 3(5). https://doi.org/10.5281/zenodo.15392302

Abstract

Infrared-visible fusion object detection plays a vital role in visual perception in complex environments, yet existing methods still struggle with feature alignment, modality complementarity, and detection accuracy. To address these issues, this paper proposes a multimodal object detection method based on an improved RT-DETR. A dual-branch feature extraction network processes infrared and visible images separately, a modality attention mechanism is introduced to enhance cross-modal information interaction, and a feature alignment loss is employed to optimize the fusion process and improve the model's adaptability across modalities. Experimental results show that the proposed method achieves superior performance on multiple benchmark datasets. Compared with traditional single-modality approaches, the improved RT-DETR attains higher mAP@50 and mAP@95 scores and demonstrates greater robustness under challenging lighting conditions. Compared with existing multimodal detection methods, the proposed model maintains high detection accuracy while improving class discrimination and reducing false positives and missed detections, validating its effectiveness in multimodal visual perception tasks.
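To make the two components named in the abstract concrete, the following is a minimal NumPy sketch of one plausible form of a modality attention gate that fuses infrared and visible feature maps, together with a simple feature alignment term. The function names, the channel-wise softmax gating design, and the pooled-descriptor alignment loss are illustrative assumptions for exposition; they are not the paper's actual implementation.

```python
import numpy as np

def modality_attention_fuse(feat_ir, feat_vis):
    """Fuse infrared and visible feature maps of shape (C, H, W)
    with a per-channel softmax gate over the two modalities.
    (Hypothetical design, not the paper's exact mechanism.)"""
    # Global-average-pool each modality into a channel descriptor.
    d_ir = feat_ir.mean(axis=(1, 2))    # shape (C,)
    d_vis = feat_vis.mean(axis=(1, 2))  # shape (C,)
    # Softmax across the two modalities, per channel.
    logits = np.stack([d_ir, d_vis])          # shape (2, C)
    e = np.exp(logits - logits.max(axis=0))   # numerically stable
    w = e / e.sum(axis=0)                     # (2, C); columns sum to 1
    # Attention-weighted sum of the two feature maps.
    fused = (w[0][:, None, None] * feat_ir
             + w[1][:, None, None] * feat_vis)
    return fused, w

def alignment_loss(feat_ir, feat_vis):
    """Mean-squared distance between pooled modality descriptors,
    a simple stand-in for a feature alignment loss."""
    d_ir = feat_ir.mean(axis=(1, 2))
    d_vis = feat_vis.mean(axis=(1, 2))
    return float(np.mean((d_ir - d_vis) ** 2))
```

In a dual-branch detector of the kind the abstract describes, a gate like this would sit where the infrared and visible branches merge, and the alignment term would be added to the detection loss so that the two branches produce comparable feature statistics before fusion.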