{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:21:50Z","timestamp":1775229710234,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2025,4,14]],"date-time":"2025-04-14T00:00:00Z","timestamp":1744588800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/pages\/standard-publication-reuse-rights"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,21]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Due to the challenges posed by background noise and the limited information available for small targets in remote sensing images, the detection performance for such targets remains unsatisfactory. To address these issues and enhance detection accuracy, we propose an improved algorithm based on RTDETR, named Adaptive Selective Transformer. Firstly, in the feature extraction network, we introduce an adaptive convolutional feature enhancement module to improve the multi-scale feature extraction capability in low-resolution remote sensing images. Secondly, we design a multi-scale enhancement structure to extract detailed information from small target images through enhanced multi-scale representation learning, thereby generating target features with stronger discriminative power. Finally, we propose a hierarchical frequency attention mechanism to achieve localized enhancement of contextual awareness, effectively capturing high-frequency local feature information of small targets. Experimental results demonstrate that the Adaptive Selective Transformer achieves superior small target detection performance, validating the effectiveness of our modifications to the original RTDETR model.<\/jats:p>","DOI":"10.1093\/comjnl\/bxaf040","type":"journal-article","created":{"date-parts":[[2025,3,29]],"date-time":"2025-03-29T18:42:52Z","timestamp":1743273772000},"page":"1329-1344","source":"Crossref","is-referenced-by-count":3,"title":["Small object detection in remote sensing images through multi-scale feature fusion"],"prefix":"10.1093","volume":"68","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1111-907X","authenticated-orcid":false,"given":"Sumin","family":"Li","sequence":"first","affiliation":[{"name":"School of Information Engineering, Minzu University of China , 27 Zhongguancun South Avenue, Beijing, 100081 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5252-0396","authenticated-orcid":false,"given":"Jinhua","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China , 27 Zhongguancun South Avenue, Beijing, 100081 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-5830-5309","authenticated-orcid":false,"given":"Yijin","family":"Gang","sequence":"additional","affiliation":[{"name":"School of Human Settlements and Civil Engineering Xi\u2019an Jiaotong University , No. 28 Xianning West Road, Xi'an City, Shaanxi Province, 710049 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0630-7914","authenticated-orcid":false,"given":"Xiuqin","family":"Pan","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Minzu University of China , 27 Zhongguancun South Avenue, Beijing, 100081 ,","place":["China"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,4,14]]},"reference":[{"key":"2025092706213606300_ref1","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1109\/MGRS.2021.3115137","article-title":"Deep learning for unmanned aerial vehicle-based object detection and tracking: a survey","volume":"10","author":"Wu","year":"2021","journal-title":"IEEE Geosci Remote Sens Mag"},{"key":"2025092706213606300_ref2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/AVSS.2019.8909830","article-title":"Drone detection in long-range surveillance videos","volume-title":"Proceedings of the 16th International Conference on Advanced Video and Signal-based Surveillance (AVSS)","author":"Nalamati","year":"2019"},{"key":"2025092706213606300_ref3","doi-asserted-by":"crossref","first-page":"5786","DOI":"10.1109\/JSTARS.2021.3079968","article-title":"Lightweight oriented object detection using multiscale context and enhanced channel attention in remote sensing images","volume":"14","author":"Ran","year":"2021","journal-title":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing"},{"key":"2025092706213606300_ref4","doi-asserted-by":"crossref","first-page":"3201","DOI":"10.3390\/electronics12143201","article-title":"A decoupled semantic\u2013detail learning network for remote sensing object detection in complex backgrounds","volume":"12","author":"Ruan","year":"2023","journal-title":"Electronic"},{"key":"2025092706213606300_ref5","doi-asserted-by":"crossref","first-page":"1498","DOI":"10.1109\/TIP.2023.3243853","article-title":"Single-source domain expansion network for cross-scene hyperspectral image classification","volume":"32","author":"Zhang","year":"2023","journal-title":"IEEE Trans Image Process"},{"key":"2025092706213606300_ref6","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1109\/JPROC.2023.3238524","article-title":"Object detection in 20 years: a survey","volume":"111","author":"Zou","year":"2023","journal-title":"Proc IEEE"},{"key":"2025092706213606300_ref7","first-page":"2125","article-title":"Image retrieval on real-life images with pre-trained vision-and-language models","volume-title":"Proceedings of the 18th International Conference on Computer Vision (ICCV)","author":"Liu","year":"2021"},{"key":"2025092706213606300_ref8","article-title":"Real-time flying object detection with YOLOv8","author":"Reis","year":"2023"},{"key":"2025092706213606300_ref9","first-page":"341","article-title":"Joint feature learning and relation modeling for tracking: A one-stream framework","volume-title":"Proceedings of the 17th European Conference on Computer Vision (ECCV)","author":"Ye","year":"2022"},{"key":"2025092706213606300_ref10","first-page":"580","article-title":"Rich feature hierarchies for accurate object detection and semantic segmentation","volume-title":"Proceedings of the 27th Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Girshick","year":"2014"},{"key":"2025092706213606300_ref11","article-title":"Faster r-cnn: towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2016","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence, New Jersey"},{"key":"2025092706213606300_ref12","first-page":"779","article-title":"You only look once: Unified, real-time object detection","volume-title":"Proceedings of the 29th Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Redmon","year":"2016"},{"key":"2025092706213606300_ref13","first-page":"7464","article-title":"YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors","volume-title":"Proceedings of the 36th International Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang","year":"2023"},{"key":"2025092706213606300_ref14","first-page":"213","article-title":"End-to-end object detection with transformers","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV)","author":"Carion","year":"2020"},{"key":"2025092706213606300_ref15","first-page":"16965","article-title":"DETRs beat YOLOs on real-time object detection","volume-title":"Proceedings of the 37th Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhao","year":"2024"},{"key":"2025092706213606300_ref16","first-page":"bxae119","article-title":"Remote sensing image dehazing method in mountaineering equipment","volume":"77","author":"Guo","year":"2024","journal-title":"Comput J"},{"key":"2025092706213606300_ref17","doi-asserted-by":"crossref","DOI":"10.1109\/TGRS.2023.3304710","article-title":"Geodesic distance based scattering power decomposition for compact polarimetric SAR data","volume":"61","author":"Muhuri","year":"2023","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"2025092706213606300_ref18","doi-asserted-by":"crossref","first-page":"9528","DOI":"10.1109\/TNNLS.2022.3151138","article-title":"DualConv: dual convolutional kernels for lightweight deep neural networks","volume":"34","author":"Zhong","year":"2022","journal-title":"IEEE Trans Neural Networks Learn Syst"},{"key":"2025092706213606300_ref19","first-page":"9308","article-title":"Deformable ConvNets v2: More deformable, better results","volume-title":"Proceedings of the 32nd Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Zhu","year":"2019"},{"key":"2025092706213606300_ref20","first-page":"268","article-title":"MFFNet: a lightweight multi-feature fusion network for UAV infrared object detection","volume":"27","author":"Chen","year":"2024","journal-title":"Egypt J Remote Sens Space Sci"},{"key":"2025092706213606300_ref21","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1016\/j.cviu.2007.09.014","article-title":"Speeded-up robust features (SURF)","volume":"110","author":"Bay","year":"2008","journal-title":"Comput Vision Image Understanding"},{"key":"2025092706213606300_ref22","article-title":"Steganography and its advancements in spatial domain","author":"Garg","year":"2019","journal-title":"EasyChair"},{"key":"2025092706213606300_ref23","first-page":"242","article-title":"Digital preservation of cultural heritage for future generations. Interdisciplinary digital preservation tools and technologies","author":"Aggarwal","year":"2017","journal-title":"IGI Global"},{"key":"2025092706213606300_ref24","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/978-3-030-27157-2_12","article-title":"Fusion and enhancement techniques for processing of multispectral images","volume-title":"Unmanned Aerial Vehicle: Applications in Agriculture and Environment","author":"Aggarwal","year":"2020"},{"key":"2025092706213606300_ref25","doi-asserted-by":"crossref","first-page":"25345","DOI":"10.1109\/TITS.2022.3158253","article-title":"Edge YOLO: real-time intelligent object detection system based on edge-cloud cooperation in autonomous vehicles","volume":"23","author":"Liang","year":"2022","journal-title":"IEEE Trans Intell Transp Syst"},{"key":"2025092706213606300_ref26","doi-asserted-by":"crossref","first-page":"2162","DOI":"10.3390\/app14052162","article-title":"Small-scale foreign object debris detection using deep learning and dual light modes","volume":"14","author":"Mo","year":"2024","journal-title":"Appl Sci"},{"key":"2025092706213606300_ref27","doi-asserted-by":"crossref","first-page":"464","DOI":"10.3390\/electronics13020464","article-title":"Detection of small lesions on grape leaves based on improved YOLOv7","volume":"13","author":"Yang","year":"2024","journal-title":"Electronics"},{"key":"2025092706213606300_ref28","doi-asserted-by":"crossref","first-page":"25","DOI":"10.3390\/rs16010025","article-title":"An efficient rep-style gaussian\u2013wasserstein network: improved UAV infrared small object detection for urban road surveillance and safety","volume":"16","author":"Aibibu","year":"2023","journal-title":"Remote Sens (Basel)"},{"key":"2025092706213606300_ref29","doi-asserted-by":"crossref","first-page":"4970","DOI":"10.3390\/electronics12244970","article-title":"Revolutionizing target detection in intelligent traffic systems: Yolov8-snakevision","volume":"12","author":"Liu","year":"2023","journal-title":"Electronics"},{"key":"2025092706213606300_ref30","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV51070.2023.00558","article-title":"Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation","volume-title":"Proceedings of the International Conference on Computer Vision (ICCV)","author":"Qi","year":"2023"},{"key":"2025092706213606300_ref31","doi-asserted-by":"publisher","first-page":"2251","DOI":"10.1007\/s00371-023-02914-x","article-title":"Joint attribute soft-sharing and contextual local: a multi-level features learning network for person re-identification","volume":"40","author":"Wang","year":"2024","journal-title":"The Visual Computer"},{"key":"2025092706213606300_ref32","article-title":"SMFANet: A lightweight self-modulation feature aggregation network for efficient image super-resolution","volume-title":"Proceedings of the 17th European Conference on Computer Vision (ECCV)","author":"Zheng","year":"2024"}],"container-title":["The Computer Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/9\/1329\/62926601\/bxaf040.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/comjnl\/article-pdf\/68\/9\/1329\/62926601\/bxaf040.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,27]],"date-time":"2025-09-27T10:21:50Z","timestamp":1758968510000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/comjnl\/article\/68\/9\/1329\/8113253"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,14]]},"references-count":32,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2025,4,14]]},"published-print":{"date-parts":[[2025,9,21]]}},"URL":"https:\/\/doi.org\/10.1093\/comjnl\/bxaf040","relation":{},"ISSN":["0010-4620","1460-2067"],"issn-type":[{"value":"0010-4620","type":"print"},{"value":"1460-2067","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,9]]},"published":{"date-parts":[[2025,4,14]]}}}