{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T05:13:04Z","timestamp":1775884384333,"version":"3.50.1"},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,5,8]],"date-time":"2024-05-08T00:00:00Z","timestamp":1715126400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,8]],"date-time":"2024-05-08T00:00:00Z","timestamp":1715126400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Object detection plays a vital role in remote sensing applications. Although object detection has achieved proud results in natural images, these methods are difficult to be directly applied to remote sensing images. Remote sensing images often have complex backgrounds and small objects, which results in a highly unbalanced distribution of foreground and complex background information. In order to solve the above problems, this paper proposes a multi-head channel and spatial trans-attention (MCSTA) module, which performs remote pixel interaction from the channel and spatial dimensions respectively to complete the attention feature capture function. It is a plug-and-play module that can be easily embedded in any other natural image object detection convolutional neural network, making it quickly applicable to remote sensing images. First, in order to reduce computational complexity and improve feature richness, we use a special linear convolution to obtain three projection features instead of the simple matrix multiplication transformation in Transformer. 
Second, we obtain trans-attention maps along different dimensions in a manner similar to the self-attention mechanism, capturing the interrelationships of features across channels and spatial locations. In this process, we use a multi-head mechanism to run operations in parallel and improve speed. Furthermore, to avoid large-scale matrix operations, we design an attention blocking mode that reduces memory usage and increases operation speed. Finally, we embed the trans-attention module into YOLOv8, add a new detection head, and optimize the feature fusion method, thus designing a lightweight small object detection model for remote sensing images named TA-YOLO. It has fewer parameters than the baseline model YOLOv8, and its mAP on the PASCAL VOC and VisDrone datasets increases by 1.3% and 6.2%, respectively. The experimental results demonstrate the effectiveness of the trans-attention module and the excellent performance of TA-YOLO.<\/jats:p>","DOI":"10.1007\/s40747-024-01448-6","type":"journal-article","created":{"date-parts":[[2024,5,8]],"date-time":"2024-05-08T07:01:47Z","timestamp":1715151707000},"page":"5459-5473","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":52,"title":["TA-YOLO: a lightweight small object detection model based on multi-dimensional trans-attention module for remote sensing images"],"prefix":"10.1007","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-2537-9982","authenticated-orcid":false,"given":"Minze","family":"Li","sequence":"first","affiliation":[]},{"given":"Yuling","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Tao","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Wu","family":"Huang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,8]]},"reference":[{"key":"1448_CR1","doi-asserted-by":"crossref","unstructured":"Zou Z, Chen K, Shi Z, Guo Y, Ye J 
(2023) Object detection in 20 years: a survey. In: Proceedings of the IEEE","DOI":"10.1109\/JPROC.2023.3238524"},{"key":"1448_CR2","doi-asserted-by":"crossref","unstructured":"Li J, Xu R, Ma J, Zou Q, Ma J, Yu H (2023) Domain adaptive object detection for autonomous driving under foggy weather. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp 612\u2013622","DOI":"10.1109\/WACV56688.2023.00068"},{"issue":"4","key":"1448_CR3","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6501\/acb075","volume":"34","author":"L Shen","year":"2023","unstructured":"Shen L, Tao H, Ni Y, Wang Y, Stojanovic V (2023) Improved yolov3 model with feature map cropping for multi-scale road object detection. Meas Sci Technol 34(4):045406","journal-title":"Meas Sci Technol"},{"key":"1448_CR4","doi-asserted-by":"crossref","unstructured":"Mao J, Shi S, Wang X, Li H (2022) 3D object detection for autonomous driving: a review and new outlooks. arXiv:2206.09474","DOI":"10.1016\/j.neucom.2021.11.048"},{"key":"1448_CR5","doi-asserted-by":"publisher","DOI":"10.1016\/j.iot.2023.100709","volume":"22","author":"A El-Ghamry","year":"2023","unstructured":"El-Ghamry A, Darwish A, Hassanien AE (2023) An optimized CNN-based intrusion detection system for reducing risks in smart farming. Internet Things 22:100709","journal-title":"Internet Things"},{"key":"1448_CR6","doi-asserted-by":"crossref","unstructured":"Zhou W, Guan H, Li Z, Shao Z, Delavar MR (2023) Remote sensing image retrieval in the past decade: achievements, challenges, and future directions. In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","DOI":"10.1109\/JSTARS.2023.3236662"},{"key":"1448_CR7","doi-asserted-by":"crossref","unstructured":"Liang Y, Han Y, Jiang F (2022) Deep learning-based small object detection: a survey. 
In: Proceedings of the 8th International Conference on Computing and Artificial Intelligence, pp 432\u2013438","DOI":"10.1145\/3532213.3532278"},{"key":"1448_CR8","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","volume":"88","author":"M Everingham","year":"2010","unstructured":"Everingham M, Gool LV, Williams CKI, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vis 88:303\u2013338","journal-title":"Int J Comput Vis"},{"key":"1448_CR9","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S , Hays J, Perona P, Ramanan D, Doll\u00e1r P , Zitnick CL (2014) Microsoft coco: Common objects in context. In: Computer vision\u2014ECCV 2014: 13th European Conference, Zurich, Switzerland, Sept 6\u201312, 2014, Proceedings, Part V 13, pp 740\u2013755. Springer","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1448_CR10","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.119960","volume":"24","author":"L Wen","year":"2023","unstructured":"Wen L, Cheng Y, Fang Y, Li X (2023) A comprehensive survey of oriented object detection in remote sensing images. Expert Syst Appl 24:119960","journal-title":"Expert Syst Appl"},{"key":"1448_CR11","first-page":"1","volume":"61","author":"C Li","year":"2023","unstructured":"Li C, Cheng G, Wang G, Zhou P, Han J (2023) Instance-aware distillation for efficient object detection in remote sensing images. IEEE Trans Geosci Remote Sens 61:1\u201311","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"1448_CR12","first-page":"1","volume":"61","author":"J Zhang","year":"2023","unstructured":"Zhang J, Lei J, Xie W, Fang Z, Li Y, Qian D (2023) Superyolo: super resolution assisted object detection in multimodal remote sensing imagery. 
IEEE Trans Geosci Remote Sens 61:1\u201315","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"1448_CR13","first-page":"1","volume":"61","author":"L Gao","year":"2023","unstructured":"Gao L, Liu B, Ping F, Mingzhu X (2023) Adaptive spatial tokenization transformer for salient object detection in optical remote sensing images. IEEE Trans Geosci Remote Sens 61:1\u201315","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"1448_CR14","doi-asserted-by":"crossref","unstructured":"Liu Y, Yuan Y, Wang Q (2023) Uncertainty-aware graph reasoning with global collaborative learning for remote sensing salient object detection. In: IEEE Geoscience and Remote Sensing Letters","DOI":"10.1109\/LGRS.2023.3299245"},{"key":"1448_CR15","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"1448_CR16","doi-asserted-by":"crossref","unstructured":"Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263\u20137271","DOI":"10.1109\/CVPR.2017.690"},{"key":"1448_CR17","unstructured":"Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767"},{"key":"1448_CR18","unstructured":"Bochkovskiy A, Wang C-Y, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934"},{"key":"1448_CR19","unstructured":"Jocher G (2022) Yolov5. code repository https:\/\/www.github.com\/ultralytics\/yolov5"},{"key":"1448_CR20","unstructured":"Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W et\u00a0al (2022) Yolov6: a single-stage object detection framework for industrial applications. 
arXiv:2209.02976"},{"key":"1448_CR21","doi-asserted-by":"crossref","unstructured":"Wang C-Y, Bochkovskiy A, Liao HYM (2023) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 7464\u20137475","DOI":"10.1109\/CVPR52729.2023.00721"},{"key":"1448_CR22","unstructured":"Jocher G (2023) Yolov8. code repository https:\/\/github.com\/ultralytics\/ultralytics"},{"key":"1448_CR23","doi-asserted-by":"crossref","unstructured":"Wang K, Liew JH, Zou Y, Zhou D, Feng J (2019) Panet: few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 9197\u20139206","DOI":"10.1109\/ICCV.2019.00929"},{"key":"1448_CR24","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. In: 2017 The Thirty-first Conference on neural information processing systems (NeurIPS), pp 5998\u20136008"},{"key":"1448_CR25","doi-asserted-by":"crossref","unstructured":"Du D, Zhu P, Wen L, Bian X, Lin H, Hu Q, Peng T, Zheng J, Wang X, Zhang Y , et\u00a0al (2019) Visdrone-det2019: The vision meets drone object detection in image challenge results. In: Proceedings of the IEEE\/CVF international conference on computer vision workshops","DOI":"10.1109\/ICCVW.2019.00031"},{"key":"1448_CR26","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580\u2013587","DOI":"10.1109\/CVPR.2014.81"},{"key":"1448_CR27","doi-asserted-by":"crossref","unstructured":"Girshick Ross (2015) Fast r-CNN. 
In: Proceedings of the IEEE international conference on computer vision, pp 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"key":"1448_CR28","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: 2015 The Twenty-nine Conference on neural information processing systems (NeurIPS), pp 91\u201399"},{"key":"1448_CR29","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"key":"1448_CR30","unstructured":"Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861"},{"key":"1448_CR31","doi-asserted-by":"crossref","unstructured":"Sandler M, Howard A , Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In :Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510\u20134520","DOI":"10.1109\/CVPR.2018.00474"},{"key":"1448_CR32","doi-asserted-by":"crossref","unstructured":"Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V et al (2019) Searching for mobilenetv3. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1314\u20131324","DOI":"10.1109\/ICCV.2019.00140"},{"key":"1448_CR33","doi-asserted-by":"crossref","unstructured":"Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848\u20136856","DOI":"10.1109\/CVPR.2018.00716"},{"key":"1448_CR34","doi-asserted-by":"crossref","unstructured":"Ma N, Zhang X, Zheng H-T, Sun J (2018) Shufflenet v2: practical guidelines for efficient CNN architecture design. 
In: Proceedings of the European conference on computer vision (ECCV), pp 116\u2013131","DOI":"10.1007\/978-3-030-01264-9_8"},{"key":"1448_CR35","doi-asserted-by":"crossref","unstructured":"Liu W , Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. In: Computer vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14, pp 21\u201337. Springer","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1448_CR36","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Goyal P, Girshick R, He K, Doll\u00e1r P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980\u20132988","DOI":"10.1109\/ICCV.2017.324"},{"key":"1448_CR37","unstructured":"Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning (PMLR), pp 6105\u20136114"},{"key":"1448_CR38","doi-asserted-by":"crossref","unstructured":"Hu P, Ramanan D (2017) Finding tiny faces. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 951\u2013959","DOI":"10.1109\/CVPR.2017.166"},{"key":"1448_CR39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.isprsjprs.2020.04.019","volume":"166","author":"Z Zheng","year":"2020","unstructured":"Zheng Z, Zhong Y, Ma A, Han X, Zhao J, Liu Y, Zhang L (2020) Hynet: hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery. ISPRS J Photogram Remote Sens 166:1\u201314","journal-title":"ISPRS J Photogram Remote Sens"},{"key":"1448_CR40","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S , Fu C-Y, Berg AC (2016) Ssd: Single shot multibox detector. In Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, Oct 11\u201314, 2016, Proceedings, Part I 14, pp 21\u201337. 
Springer","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1448_CR41","unstructured":"Fu C-Y, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: Deconvolutional single shot detector. arXiv:1701.06659"},{"key":"1448_CR42","doi-asserted-by":"crossref","unstructured":"Xiang W, Zhang D-Q, Yu H, Athitsos V (2018) Context-aware single-shot detector. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pp 1784\u20131793","DOI":"10.1109\/WACV.2018.00198"},{"key":"1448_CR43","doi-asserted-by":"crossref","unstructured":"Cao G, Xie X, Yang W, Liao Q , Shi G, Wu J (2018) Feature-fused SSD: fast detection for small objects. In: Ninth international conference on graphic and image processing (ICGIP 2017), vol 10615, pp 381\u2013388","DOI":"10.1117\/12.2304811"},{"key":"1448_CR44","doi-asserted-by":"crossref","unstructured":"Bell S, Zitnick CL, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874\u20132883","DOI":"10.1109\/CVPR.2016.314"},{"key":"1448_CR45","doi-asserted-by":"crossref","unstructured":"Bai Y, Zhang Y, Ding M, Ghanem B (2018) SOD-MTGAN: Small object detection via multi-task generative adversarial network. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 206\u2013221","DOI":"10.1007\/978-3-030-01261-8_13"},{"key":"1448_CR46","doi-asserted-by":"crossref","unstructured":"Noh J, Bae W, Lee W, Seo J, Kim G (2019) Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp 9725\u20139734","DOI":"10.1109\/ICCV.2019.00982"},{"key":"1448_CR47","unstructured":"Mnih V, Heess N, Graves A et al (2014) Recurrent models of visual attention. 
In: 2014 The Twenty-nine Conference on neural information processing systems (NeurIPS), pp 2204\u20132212"},{"key":"1448_CR48","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"1448_CR49","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee J-Y, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"1448_CR50","doi-asserted-by":"crossref","unstructured":"Hou Q, Zhou D, Feng J (2021) Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 13713\u201313722","DOI":"10.1109\/CVPR46437.2021.01350"},{"key":"1448_CR51","doi-asserted-by":"crossref","unstructured":"Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) Ghostnet: more features from cheap operations. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1580\u20131589","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"1448_CR52","doi-asserted-by":"crossref","unstructured":"Liu Z, Hu H, Lin Y, Yao Z, Xie Z, Wei Y, Ning J , Cao Y, Zhang Z, Dong L, et\u00a0al (2022) Swin transformer v2: scaling up capacity and resolution. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 12009\u201312019","DOI":"10.1109\/CVPR52688.2022.01170"},{"key":"1448_CR53","doi-asserted-by":"crossref","unstructured":"Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: visual explanations from deep networks via gradient-based localization. 
In: Proceedings of the IEEE international conference on computer vision, pp 618\u2013626","DOI":"10.1109\/ICCV.2017.74"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01448-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01448-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01448-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,18]],"date-time":"2024-11-18T07:46:39Z","timestamp":1731915999000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01448-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,8]]},"references-count":53,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["1448"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01448-6","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,8]]},"assertion":[{"value":"19 September 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests 
or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}