{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T04:13:46Z","timestamp":1773202426091,"version":"3.50.1"},"reference-count":68,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Important Research Project of Hebei Province","award":["22370301D"],"award-info":[{"award-number":["22370301D"]}]},{"name":"Scientific Research Foundation of Hebei University for Distinguished Young Scholars","award":["521100221081"],"award-info":[{"award-number":["521100221081"]}]},{"name":"Scientific Research Foundation of Colleges and Universities in Hebei Province","award":["QN2022107"],"award-info":[{"award-number":["QN2022107"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Object detection in unmanned aerial vehicle (UAV) images has attracted the increasing attention of researchers in recent years. However, it is challenging for small object detection using conventional detection methods because less location and semantic information are extracted from the feature maps of UAV images. To remedy this problem, three new feature extraction modules are proposed in this paper to refine the feature maps for small objects in UAV images. 
These are the <jats:bold>S<\/jats:bold>mall-<jats:bold>K<\/jats:bold>ernel-<jats:bold>Block <\/jats:bold>(SKBlock), <jats:bold>L<\/jats:bold>arge-<jats:bold>K<\/jats:bold>ernel-<jats:bold>Block <\/jats:bold>(LKBlock), and <jats:bold>C<\/jats:bold>onv-<jats:bold>T<\/jats:bold>rans-<jats:bold>Block <\/jats:bold>(CTBlock). Based on these three modules, a novel backbone called <jats:bold>H<\/jats:bold>igh-<jats:bold>R<\/jats:bold>esolution <jats:bold>C<\/jats:bold>onv-<jats:bold>T<\/jats:bold>rans <jats:bold>N<\/jats:bold>etwork (HRCTNet) is proposed. Additionally, the Acon activation function is used in the network to reduce the risk of dying ReLU and to remove redundant features. Because labels in UAV image datasets are extremely imbalanced, the Ployloss loss function is adopted to train HRCTNet. To verify the effectiveness of HRCTNet, experiments were conducted on several datasets. On the VisDrone dataset, HRCTNet achieves 49.5% AP<jats:sub>50<\/jats:sub> and 29.1% AP. On the COCO dataset, with limited FLOPs, HRCTNet achieves 37.9% AP and 24.1% AP<jats:sub>S<\/jats:sub>. 
The experimental results demonstrate that HRCTNet outperforms the existing methods for object detection in UAV images.<\/jats:p>","DOI":"10.1007\/s40747-023-01076-6","type":"journal-article","created":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T13:51:02Z","timestamp":1684158662000},"page":"6437-6457","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["HRCTNet: a hybrid network with high-resolution representation for object detection in UAV image"],"prefix":"10.1007","volume":"9","author":[{"given":"Wenjie","family":"Xing","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7345-1422","authenticated-orcid":false,"given":"Zhenchao","family":"Cui","sequence":"additional","affiliation":[]},{"given":"Jing","family":"Qi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,15]]},"reference":[{"key":"1076_CR1","doi-asserted-by":"publisher","first-page":"1670","DOI":"10.3390\/rs13091670","volume":"13","author":"D Avola","year":"2021","unstructured":"Avola D, Cinque L, Diko A, Fagioli A, Foresti GL, Mecca A, Pannone D, Piciarelli C (2021) MS-faster R-CNN: multi-stream backbone for improved faster R-CNN object detection and aerial tracking from UAV images. Remote Sens 13:1670","journal-title":"Remote Sens"},{"key":"1076_CR2","doi-asserted-by":"publisher","first-page":"653","DOI":"10.3390\/rs13040653","volume":"13","author":"V Stojni\u0107","year":"2021","unstructured":"Stojni\u0107 V, Risojevic V, Mustra M, Jovanovic V, Filipi J, Kezic N, Babic Z (2021) A method for detection of small moving objects in UAV videos. 
Remote Sens 13:653","journal-title":"Remote Sens"},{"key":"1076_CR3","doi-asserted-by":"publisher","first-page":"230","DOI":"10.3390\/rs13020230","volume":"13","author":"Y Ma","year":"2021","unstructured":"Ma Y, Li Q, Chu L, Zhou Y, Xu C (2021) Real-time detection and spatial localization of insulators for UAV inspection based on binocular stereo vision. Remote Sens 13:230","journal-title":"Remote Sens"},{"key":"1076_CR4","unstructured":"Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv: arXiv:1804.02767abs\/1804.02767"},{"key":"1076_CR5","unstructured":"Bochkovskiy A, Wang C-Y, Liao H-YM (2020) YOLOv4: optimal speed and accuracy of object detection. arXiv: arXiv:2004.10934"},{"key":"1076_CR6","doi-asserted-by":"crossref","unstructured":"Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Honolulu, USA, 2017), pp 6517\u20136525","DOI":"10.1109\/CVPR.2017.690"},{"key":"1076_CR7","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala SK, Girshick RB, Farhadi A (2016) You only look once: unified, real-time object detection. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Las Vegas, USA, 2016), pp 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"1076_CR8","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Doll\u00e1r P, Girshick RB, He K, Hariharan B, Belongie SJ (2017) Feature pyramid networks for object detection. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Honolulu, USA, 2017), pp 936\u2013944","DOI":"10.1109\/CVPR.2017.106"},{"key":"1076_CR9","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, Berg AC (2016) Ssd: single shot multibox detector. 
In: 14th European Conference on computer vision (ECCV), (Cham: Springer International Publishing, Amsterdam, The Netherlands, 2016), pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"1076_CR10","doi-asserted-by":"publisher","DOI":"10.1088\/1361-6501\/ac8368","volume":"33","author":"H Tao","year":"2022","unstructured":"Tao H, Cheng L, Qiu J, Stojanovic V (2022) Few shot cross equipment fault diagnosis method based on parameter optimization and feature mertic. Meas Sci Technol 33:115005","journal-title":"Meas Sci Technol"},{"key":"1076_CR11","doi-asserted-by":"crossref","unstructured":"Zhu P, Wen L, Du D, Bian X, Fan H, Hu Q, Ling H (2021), Detection and tracking meet drones challenge. IEEE Trans Pattern Anal Mach Intell 44:7380\u20137399","DOI":"10.1109\/TPAMI.2021.3119563"},{"key":"1076_CR12","doi-asserted-by":"crossref","unstructured":"Wen L, Du D, Zhu P, Hu Q, Wang Q, Bo L, Lyu S (2021) Detection, tracking, and counting meets drones in crowds: a benchmark. In: 2021 IEEE\/CVF Conference on computer vision and pattern Recognition (CVPR), (Piscataway: IEEE Press, Electr Network, 2021), pp 7808\u20137817","DOI":"10.1109\/CVPR46437.2021.00772"},{"key":"1076_CR13","doi-asserted-by":"publisher","first-page":"1556","DOI":"10.1109\/TIP.2020.3045636","volume":"30","author":"S Deng","year":"2021","unstructured":"Deng S, Li S, Xie K, Song W, Liao X, Hao A, Qin H (2021) A global-local self-adaptive network for drone-view object detection. IEEE Trans Image Process 30:1556\u20131569","journal-title":"IEEE Trans Image Process"},{"key":"1076_CR14","doi-asserted-by":"publisher","first-page":"936","DOI":"10.1109\/TSMC.2020.3005231","volume":"52","author":"G Chen","year":"2022","unstructured":"Chen G, Wang HT, Chen K, Li ZJ, Song ZD, Liu YL, Chen WK, Knoll A (2022) a survey of the four pillars for small object detection: multiscale representation, contextual information, super-resolution, and region proposal. 
IEEE Trans Syst Man Cybern-Syst 52:936\u2013953","journal-title":"IEEE Trans Syst Man Cybern-Syst"},{"key":"1076_CR15","doi-asserted-by":"publisher","first-page":"045406","DOI":"10.1088\/1361-6501\/acb075","volume":"34","author":"L Shen","year":"2023","unstructured":"Shen L, Tao H, Ni Y, Wang Y, Vladimir S (2023) Improved YOLOv3 model with feature map cropping for multi-scale road object detection. Meas Sci Technol 34:045406","journal-title":"Meas Sci Technol"},{"key":"1076_CR16","doi-asserted-by":"publisher","first-page":"78311","DOI":"10.1109\/ACCESS.2019.2922479","volume":"7","author":"K-J Kim","year":"2019","unstructured":"Kim K-J, Kim P-K, Chung Y-S, Choi D-H (2019) Multi-scale detector for accurate vehicle detection in traffic surveillance data. IEEE Access 7:78311\u201378319","journal-title":"IEEE Access"},{"key":"1076_CR17","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1007\/s00034-013-9633-0","volume":"33","author":"V Stojanovic","year":"2014","unstructured":"Stojanovic V, Filipovic V (2014) Adaptive input design for identification of output error model with constrained output. Circ Syst Signal Process 33:97\u2013113","journal-title":"Circ Syst Signal Process"},{"key":"1076_CR18","doi-asserted-by":"publisher","first-page":"439","DOI":"10.1016\/j.neunet.2022.08.029","volume":"155","author":"K Min","year":"2022","unstructured":"Min K, Lee G-H, Lee S-W (2022) Attentional feature pyramid network for small object detection. Neural Netw 155:439\u2013450","journal-title":"Neural Netw"},{"key":"1076_CR19","doi-asserted-by":"crossref","unstructured":"Huang L, Chen C, Yun J, Sun Y, Tian J, Hao Z, Yu H, Ma H (2022) Multi-scale feature fusion convolutional neural network for indoor small target detection. 
Front Neurorobot 16:881021","DOI":"10.3389\/fnbot.2022.881021"},{"key":"1076_CR20","doi-asserted-by":"publisher","first-page":"522","DOI":"10.3390\/rs14030522","volume":"14","author":"B Peng","year":"2022","unstructured":"Peng B, Ren D, Zheng C, Lu A (2022) TRDet: two-stage rotated detection of rural buildings in remote sensing images. Remote Sensing 14:522","journal-title":"Remote Sensing"},{"key":"1076_CR21","doi-asserted-by":"crossref","unstructured":"Noh J, Bae W, Lee W, Seo J, Kim G (2019) Better to follow, follow to be better: towards precise supervision of feature super-resolution for small object detection. In: 2019 IEEE\/CVF International Conference on computer vision (ICCV), (Piscataway: IEEE Press, Seoul, Korea (South), 2019), pp 9724\u20139733","DOI":"10.1109\/ICCV.2019.00982"},{"key":"1076_CR22","doi-asserted-by":"publisher","first-page":"1854","DOI":"10.3390\/rs13091854","volume":"13","author":"SMA Bashir","year":"2021","unstructured":"Bashir SMA, Wang Y (2021) Small object detection in remote sensing images with residual feature aggregation-based super-resolution and object detector network. Remote Sens 13:1854","journal-title":"Remote Sens"},{"key":"1076_CR23","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2015","unstructured":"Ren S, He K, Girshick RB, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137\u20131149","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1076_CR24","doi-asserted-by":"publisher","first-page":"108199","DOI":"10.1016\/j.patcog.2021.108199","volume":"121","author":"J Peng","year":"2022","unstructured":"Peng J, Wang H, Yue S, Zhang Z (2022) Context-aware co-supervision for accurate object detection. 
Pattern Recognit 121:108199","journal-title":"Pattern Recognit"},{"key":"1076_CR25","doi-asserted-by":"publisher","first-page":"313","DOI":"10.1016\/j.cja.2021.10.022","volume":"35","author":"YH Zhang","year":"2022","unstructured":"Zhang YH, Xu TB, Wei ZZ (2022) Pre-locate net for object detection in high-resolution images. Chin J Aeronaut 35:313\u2013325","journal-title":"Chin J Aeronaut"},{"key":"1076_CR26","doi-asserted-by":"crossref","unstructured":"Tang X, Du SK, He Z, Liu J (2018), Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European Conference on computer vision (ECCV), (Cham: Springer International Publishing, 2018), pp 797\u2013813","DOI":"10.1007\/978-3-030-01240-3_49"},{"key":"1076_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.107867","volume":"114","author":"Y Kong","year":"2021","unstructured":"Kong Y, Feng M, Li X, Lu H, Liu X, Yin B (2021) Spatial context-aware network for salient object detection. Pattern Recognit 114:107867","journal-title":"Pattern Recognit"},{"key":"1076_CR28","doi-asserted-by":"crossref","unstructured":"Tan M, Pang R, Le QV (2020) EfficientDet: scalable and efficient object detection. In: 2020 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Electr Network, 2020), pp 10778\u201310787","DOI":"10.1109\/CVPR42600.2020.01079"},{"key":"1076_CR29","doi-asserted-by":"publisher","first-page":"3423","DOI":"10.1109\/TIP.2019.2896952","volume":"28","author":"Y Yuan","year":"2019","unstructured":"Yuan Y, Xiong Z, Wang Q (2019) VSSA-NET: vertical spatial sequence attention network for traffic sign detection. IEEE Trans Image Process 28:3423\u20133434","journal-title":"IEEE Trans Image Process"},{"key":"1076_CR30","doi-asserted-by":"crossref","unstructured":"Qiao S, Chen L-C, Yuille AL (2021) DetectoRS: detecting objects with recursive feature pyramid and switchable Atrous convolution. 
In: 2021 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Electr Network, 2021), pp 10208\u201310219","DOI":"10.1109\/CVPR46437.2021.01008"},{"key":"1076_CR31","doi-asserted-by":"crossref","unstructured":"Dai X, Chen Y, Xiao B, Chen D, Liu M, Yuan L, Zhang L (2021) Dynamic head: unifying object detection heads with attentions. In: 2021 IEEE\/CVF Conference on Computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Electr Network, 2021), pp 7369\u20137378","DOI":"10.1109\/CVPR46437.2021.00729"},{"key":"1076_CR32","doi-asserted-by":"publisher","first-page":"1747","DOI":"10.1016\/j.cja.2020.02.024","volume":"33","author":"YD Li","year":"2020","unstructured":"Li YD, Dong H, Li HG, Zhang XY, Zhang BC, Xiao ZF (2020) Multi-block SSD based on small object detection for UAV railway scene surveillance. Chin J Aeronaut 33:1747\u20131755","journal-title":"Chin J Aeronaut"},{"key":"1076_CR33","doi-asserted-by":"crossref","unstructured":"Jiao J, Gao J, Liu X, Liu F, Yang S, Hou B (2021) Multi-scale representation learning for image classification: a survey. IEEE Trans Artif Intell 4:23\u201343","DOI":"10.1109\/TAI.2021.3135248"},{"key":"1076_CR34","first-page":"1","volume":"63","author":"L Cui","year":"2020","unstructured":"Cui L (2020) MDSSD: multi-scale deconvolutional single shot detector for small objects, Science China. Inf Sci 63:1\u20133","journal-title":"Inf Sci"},{"key":"1076_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10489-019-01511-7","volume":"50","author":"Z Liu","year":"2019","unstructured":"Liu Z, Li D, Ge SS, Tian F (2019) Small traffic sign detection from large image. 
Appl Intell 50:1\u201313","journal-title":"Appl Intell"},{"key":"1076_CR36","doi-asserted-by":"publisher","first-page":"57120","DOI":"10.1109\/ACCESS.2019.2913882","volume":"7","author":"Z Liu","year":"2019","unstructured":"Liu Z, Du J, Tian F, Wen J (2019) MR-CNN: a multi-scale region-based convolutional neural network for small traffic sign recognition. IEEE Access 7:57120\u201357128","journal-title":"IEEE Access"},{"key":"1076_CR37","unstructured":"Song L, Li Y, Jiang Z, Li Z, Sun H, Sun J, Zheng N (2020) Fine-grained dynamic head for object detection. In: 2020 The Thirty-fourth Conference on neural information processing systems (NeurIPS), (New York: Curran Associates Press, Electr Network, 2020), pp 11131\u201311141"},{"key":"1076_CR38","doi-asserted-by":"publisher","first-page":"579","DOI":"10.1109\/TPAMI.2019.2933510","volume":"44","author":"J Han","year":"2022","unstructured":"Han J, Yao X, Cheng G, Feng X, Xu D (2022) P-CNN: part-based convolutional neural networks for fine-grained visual categorization. IEEE Trans Pattern Anal Mach Intell 44:579\u2013590","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1076_CR39","doi-asserted-by":"crossref","unstructured":"Wang GQ, Zhuang Y, Chen H, Liu X, Zhang T, Li LL, Dong S, Sang QB (2022) FSoD-Net: full-scale object detection from optical remote sensing imagery. IEEE Trans Geosci Remote Sens 60:5602918","DOI":"10.1109\/TGRS.2021.3064599"},{"key":"1076_CR40","doi-asserted-by":"publisher","first-page":"2416","DOI":"10.3390\/rs12152416","volume":"12","author":"ZZ Tian","year":"2020","unstructured":"Tian ZZ, Zhan RH, Hu JM, Wang W, He ZQ, Zhuang ZW (2020) Generating anchor boxes based on attention mechanism for object detection in remote sensing images. 
Remote Sens 12:2416","journal-title":"Remote Sens"},{"key":"1076_CR41","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1016\/j.isprsjprs.2019.12.001","volume":"160","author":"YT Yu","year":"2020","unstructured":"Yu YT, Guan HY, Li DL, Gu TN, Tang E, Li AX (2020) Orientation guided anchoring for geospatial object detection from remote sensing imagery. ISPRS-J Photogramm Remote Sens 160:67\u201382","journal-title":"ISPRS-J Photogramm Remote Sens"},{"key":"1076_CR42","doi-asserted-by":"crossref","unstructured":"Hou JB, Zhu XB, Yin XC (2021) Self-adaptive aspect ratio anchor for oriented object detection in remote sensing images. Remote Sens 13:1318","DOI":"10.3390\/rs13071318"},{"key":"1076_CR43","doi-asserted-by":"crossref","unstructured":"Shen JQ, Zhou WC, Liu NZ, Sun H, Li DG, Zhang YX An anchor-free lightweight deep convolutional network for vehicle detection in aerial images. IEEE Trans Intell Transp Syst 23:24330\u201324342","DOI":"10.1109\/TITS.2022.3203715"},{"key":"1076_CR44","doi-asserted-by":"crossref","unstructured":"Shi LK, Kuang LY, Xu X, Pan B, Shi ZW (2022) CANet: centerness-aware network for object detection in remote sensing images. IEEE Trans Geosci Remote Sens 60:5603613","DOI":"10.1109\/TGRS.2021.3068970"},{"key":"1076_CR45","doi-asserted-by":"crossref","unstructured":"Wang P, Niu YX, Xiong R, Ma F, Zhang CX (2021) DGANet: dynamic gradient adjustment anchor-free object detection in optical remote sensing images. Remote Sens 13:1642","DOI":"10.3390\/rs13091642"},{"key":"1076_CR46","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1016\/j.cja.2021.09.016","volume":"35","author":"L Ni","year":"2022","unstructured":"Ni L, Huo CL, Zhang X, Wang P, Zhou ZX (2022) GroupNet: learning to group corner for object detection in remote sensing imagery. 
Chin J Aeronaut 35:273\u2013284","journal-title":"Chin J Aeronaut"},{"key":"1076_CR47","doi-asserted-by":"publisher","first-page":"8826","DOI":"10.1109\/TGRS.2021.3053311","volume":"59","author":"ZY Cui","year":"2021","unstructured":"Cui ZY, Leng JX, Liu Y, Zhang TL, Quan P, Zhao W (2021) SKNet: detecting rotated ships as keypoints in optical remote sensing images. IEEE Trans Geosci Remote Sens 59:8826\u20138840","journal-title":"IEEE Trans Geosci Remote Sens"},{"key":"1076_CR48","unstructured":"Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N (2021) An image is worth 16x16 words: transformers for image recognition at scale. arXiv: arXiv:2010.11929"},{"key":"1076_CR49","doi-asserted-by":"crossref","unstructured":"Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: 16th European Conference on computer vision (ECCV), (Cham: Springer International Publishing, Electr Network, 2020), pp 213\u2013229","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"1076_CR50","unstructured":"Park N, Kim S (2022) How do vision transformers work? arXiv: arXiv:2202.06709"},{"key":"1076_CR51","doi-asserted-by":"crossref","unstructured":"Gulati A, Qin J, Chiu C-C, Parmar N, Zhang Y, Yu J, Han Q, Wang S, Zhang X, Wu Y (2020) Conformer: convolution-augmented transformer for speech recognition. arXiv:2005.08100","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"1076_CR52","doi-asserted-by":"crossref","unstructured":"Chen Q, Wu Q, Wang J, Hu Q, Hu T, Ding E, Cheng J, Wang J (2022) MixFormer: mixing features across windows and dimensions. 
In: 2022 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, New Orleans, USA, 2022), pp 5239\u20135249","DOI":"10.1109\/CVPR52688.2022.00518"},{"key":"1076_CR53","doi-asserted-by":"crossref","unstructured":"Wu H, Xiao B, Codella NCF, Liu M, Dai X, Yuan L, Zhang L (2021) CvT: introducing convolutions to vision transformers. In: 2021 IEEE\/CVF International Conference on computer vision (ICCV), (Piscataway: IEEE Press, Montreal, BC, Canada, 2021), pp 22\u201331","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"1076_CR54","doi-asserted-by":"crossref","unstructured":"Zhu XK, Lyu SC, Wang X, Zhao Q, Soc IC (2021) TPH-YOLOv5: improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In: 2021 IEEE\/CVF International Conference on computer vision (ICCV), (Piscataway: IEEE Press, Montreal, BC, Canada, 2021), pp 2778\u20132788","DOI":"10.1109\/ICCVW54120.2021.00312"},{"key":"1076_CR55","unstructured":"Dai Z, Liu H, Le QV, Tan M (2021) CoAtNet: marrying convolution and attention for all data sizes. In: 2021 The Thirty-fifth Conference on neural information processing systems (NeurIPS), (New York: Curran Associates Press, Electr Network, 2021), pp 3965\u20133977"},{"key":"1076_CR56","doi-asserted-by":"crossref","unstructured":"Han K, Wang Y, Tian Q, Guo J, Xu C, Xu C (2020) GhostNet: more features from cheap operations. In: 2020 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Electr Network, 2020), pp 1577\u20131586","DOI":"10.1109\/CVPR42600.2020.00165"},{"key":"1076_CR57","doi-asserted-by":"crossref","unstructured":"Tay Y, Dehghani M, Bahri D, Metzler D (2022) Efficient transformers: a survey. ACM Comput Surv 55:1\u201328","DOI":"10.1145\/3530811"},{"key":"1076_CR58","unstructured":"Qin Z, Sun W, Deng H, Li D, Wei Y, Lv B, Yan J, Kong L, Zhong Y (2022) cosFormer: rethinking softmax in attention. 
arXiv: arXiv:2202.08791"},{"key":"1076_CR59","unstructured":"Ma X, Kong X, Wang S, Zhou C, May J, Ma H, Zettlemoyer L (2021) Luna: Linear unified nested attention. In: 2021 The Thirty-fifth Conference on neural information processing systems (NeurIPS), (New York: Curran Associates Press, Electr Network, 2021), pp 2441\u20132453."},{"key":"1076_CR60","unstructured":"Lu J, Yao J, Zhang J, Zhu X, Xu H, Gao W, Xu C, Xiang T, Zhang L (2021) SOFT: softmax-free transformer with linear complexity. In: 2021 The Thirty-fifth Conference on neural information processing systems (NeurIPS), (New York: Curran Associates Press, Electr Network, 2021), pp 21297\u201321309"},{"key":"1076_CR61","unstructured":"Touvron H, Cord M, Douze M, Massa F, Sablayrolles A, J\u00e9gou H (2021) Training data-efficient image transformers & distillation through attention. In: 2021 International Conference on machine learning (ICML), (PMLR, Electr Network, 2021), pp 10347\u201310357"},{"key":"1076_CR62","unstructured":"Bello I (2021) LambdaNetworks: modeling long-range interactions without attention. arXiv: arXiv:2102.08602"},{"key":"1076_CR63","doi-asserted-by":"publisher","first-page":"3349","DOI":"10.1109\/TPAMI.2020.2983686","volume":"43","author":"J Wang","year":"2021","unstructured":"Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, Liu W, Xiao B (2021) Deep high-resolution representation learning for visual recognition. IEEE Trans Pattern Anal Mach Intell 43:3349\u20133364","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1076_CR64","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: common objects in context. 
In: 13th European Conference on Computer Vision (ECCV), (Cham: Springer International Publishing, Zurich, Switzerland, 2014), pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1076_CR65","doi-asserted-by":"crossref","unstructured":"Xia G-S, Bai X, Ding J, Zhu Z, Belongie SJ, Luo J, Datcu M, Pelillo M, Zhang L-p (2018) DOTA: a large-scale dataset for object detection in aerial images. In: 2018 IEEE\/CVF Conference on computer vision and pattern recognition (CVPR), (Piscataway: IEEE Press, Salt Lake, USA, 2018), pp 3974\u20133983","DOI":"10.1109\/CVPR.2018.00418"},{"key":"1076_CR66","doi-asserted-by":"publisher","first-page":"4244","DOI":"10.1007\/s10489-021-02512-1","volume":"52","author":"G Tian","year":"2022","unstructured":"Tian G, Liu J, Zhao H, Yang W (2022) Small object detection via dual inspection mechanism for UAV visual images. Appl Intell 52:4244\u20134257","journal-title":"Appl Intell"},{"key":"1076_CR67","doi-asserted-by":"crossref","unstructured":"Du D, Zhu P, Wen L, Bian X, Lin H, Hu Q, Peng T, Zheng J, Wang X, Zhang Y (2019) VisDrone-DET2019: The vision meets drone object detection in image challenge results. In: Proceedings of the IEEE\/CVF international conference on computer vision workshops (CVPR), (Piscataway: IEEE Press, Long Beach, USA, 2019)","DOI":"10.1109\/ICCVW.2019.00031"},{"key":"1076_CR68","unstructured":"Tan M, Le Q (2021) Efficientnetv2: Smaller models and faster training. 
In: 2021 International Conference on Machine Learning (ICML), (PMLR, Electr Network, 2021), pp 10096\u201310106"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01076-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01076-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01076-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,27]],"date-time":"2023-10-27T19:16:52Z","timestamp":1698434212000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01076-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,15]]},"references-count":68,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["1076"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01076-6","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,15]]},"assertion":[{"value":"27 January 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or 
personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}