{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,15]],"date-time":"2026-04-15T17:45:08Z","timestamp":1776275108200,"version":"3.50.1"},"reference-count":48,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2022,6,15]],"date-time":"2022-06-15T00:00:00Z","timestamp":1655251200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China (NSFC)","award":["62175196"],"award-info":[{"award-number":["62175196"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["61775176"],"award-info":[{"award-number":["61775176"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["62125505"],"award-info":[{"award-number":["62125505"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["2021GXLH-Z-058"],"award-info":[{"award-number":["2021GXLH-Z-058"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["2020GY-131"],"award-info":[{"award-number":["2020GY-131"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["2021SF-135"],"award-info":[{"award-number":["2021SF-135"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["2021TD-57"],"award-info":[{"award-number":["2021TD-57"]}]},{"name":"National Natural Science Foundation of China (NSFC)","award":["xjh012020021"],"award-info":[{"award-number":["xjh012020021"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["62175196"],"award-info":[{"award-number":["62175196"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["61775176"],"award-info":[{"award-number":["61775176"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["62125505"],"award-info":[{"award-number":["62125505"]}]},{"name":"Shaanxi Province Key Research and Development 
Program","award":["2021GXLH-Z-058"],"award-info":[{"award-number":["2021GXLH-Z-058"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["2020GY-131"],"award-info":[{"award-number":["2020GY-131"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["2021SF-135"],"award-info":[{"award-number":["2021SF-135"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["2021TD-57"],"award-info":[{"award-number":["2021TD-57"]}]},{"name":"Shaanxi Province Key Research and Development Program","award":["xjh012020021"],"award-info":[{"award-number":["xjh012020021"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["62175196"],"award-info":[{"award-number":["62175196"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["61775176"],"award-info":[{"award-number":["61775176"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["62125505"],"award-info":[{"award-number":["62125505"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["2021GXLH-Z-058"],"award-info":[{"award-number":["2021GXLH-Z-058"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["2020GY-131"],"award-info":[{"award-number":["2020GY-131"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["2021SF-135"],"award-info":[{"award-number":["2021SF-135"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["2021TD-57"],"award-info":[{"award-number":["2021TD-57"]}]},{"name":"Innovation Capability Support Program of Shaanxi","award":["xjh012020021"],"award-info":[{"award-number":["xjh012020021"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["62175196"],"award-info":[{"award-number":["62175196"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["61775176"],"award-info":[{"award-number":["61775176"]}]},{"name":"Fundamental Research Funds for the Central 
Universities","award":["62125505"],"award-info":[{"award-number":["62125505"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2021GXLH-Z-058"],"award-info":[{"award-number":["2021GXLH-Z-058"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2020GY-131"],"award-info":[{"award-number":["2020GY-131"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2021SF-135"],"award-info":[{"award-number":["2021SF-135"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["2021TD-57"],"award-info":[{"award-number":["2021TD-57"]}]},{"name":"Fundamental Research Funds for the Central Universities","award":["xjh012020021"],"award-info":[{"award-number":["xjh012020021"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Object detection has made tremendous progress in natural images over the last decade. However, the results are hardly satisfactory when the natural image object detection algorithm is directly applied to satellite images. This is due to the intrinsic differences in the scale and orientation of objects generated by the bird\u2019s-eye perspective of satellite photographs. Moreover, the background of satellite images is complex and the object area is small; as a result, small objects tend to be missing due to the challenge of feature extraction. Dense objects overlap and occlusion also affects the detection performance. Although the self-attention mechanism was introduced to detect small objects, the computational complexity increased with the image\u2019s resolution. We modified the general one-stage detector YOLOv5 to adapt the satellite images to resolve the above problems. First, new feature fusion layers and a prediction head are added from the shallow layer for small object detection for the first time because it can maximally preserve the feature information. 
Second, the original convolutional prediction heads are, for the first time, replaced with Swin Transformer Prediction Heads (SPHs). The SPH employs an advanced self-attention mechanism whose shifted-window design reduces the computational complexity to linear. Finally, Normalization-based Attention Modules (NAMs) are integrated into YOLOv5 to improve attention performance in a normalized way. The improved network, termed SPH-YOLOv5, is evaluated on the NWPU-VHR10 and DOTA datasets, which are widely used for satellite image object detection. Compared with the baseline YOLOv5, SPH-YOLOv5 improves the mean Average Precision (mAP) by 0.071 on the DOTA dataset.<\/jats:p>","DOI":"10.3390\/rs14122861","type":"journal-article","created":{"date-parts":[[2022,6,16]],"date-time":"2022-06-16T03:01:22Z","timestamp":1655348482000},"page":"2861","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":189,"title":["Swin-Transformer-Enabled YOLOv5 with Attention Mechanism for Small Object Detection on Satellite Images"],"prefix":"10.3390","volume":"14","author":[{"given":"Hang","family":"Gong","sequence":"first","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7927-7760","authenticated-orcid":false,"given":"Tingkui","family":"Mu","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Qiuxia","family":"Li","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics 
and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Haishan","family":"Dai","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Satellite Engineering, Shanghai Academy of Spaceflight Technology, Shanghai 201109, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4364-6867","authenticated-orcid":false,"given":"Chunlai","family":"Li","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China"}]},{"given":"Zhiping","family":"He","sequence":"additional","affiliation":[{"name":"Shanghai Institute of Technical Physics, Chinese Academy of Sciences, Shanghai 200083, China"}]},{"given":"Wenjing","family":"Wang","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Feng","family":"Han","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Abudusalamu","family":"Tuniyazi","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Haoyang","family":"Li","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Xuechan","family":"Lang","sequence":"additional","affiliation":[{"name":"MOE 
Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Zhiyuan","family":"Li","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]},{"given":"Bin","family":"Wang","sequence":"additional","affiliation":[{"name":"MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, Research Center for Space Optics and Astronomy, School of Physics, Xi\u2019an Jiaotong University, Xi\u2019an 710049, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,15]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.compind.2018.03.014","article-title":"Real-time object detection in agricultural\/remote environments using the multiple- expert colour feature extreme learning machine (mec-elm)","volume":"98","author":"Sadgrove","year":"2018","journal-title":"Comput. Ind."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Daniilidis, K., Maragos, P., and Paragios, N. (2010). Detection and tracking of large number of targets in wide area surveillance. Computer Vision\u2014ECCV 2010, Proceedings of the 11th European Conference on Computer Vision, Heraklion, Crete, Greece, 5\u201311 September 2010, Springer.","DOI":"10.1007\/978-3-642-15561-1"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"778","DOI":"10.1109\/LGRS.2017.2681128","article-title":"Deep learning classification of land cover and crop types using remote sensing data","volume":"14","author":"Kussul","year":"2017","journal-title":"IEEE Geosci. Remote Sens. 
Lett."},{"key":"ref_4","first-page":"1","article-title":"Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote-Sensing Images","volume":"60","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","article-title":"Focal loss for dense object detection","volume":"42","author":"Lin","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Leibe, B., Matas, J., Sebe, N., and Welling, M. (2016). Ssd: Single shot multibox detector. Computer Vision\u2014ECCV 2016, Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 11\u201314 October 2022, Springer International Publishing.","DOI":"10.1007\/978-3-319-46493-0"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"116793","DOI":"10.1016\/j.eswa.2022.116793","article-title":"Remote sensing image super-resolution and object detection: Benchmark and state of the art","volume":"197","author":"Wang","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s11263-009-0275-4","article-title":"The pascal visual object classes (voc) challenge","volume":"88","author":"Everingham","year":"2010","journal-title":"Int. J. Comput. Vis."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollar, P., and Zitnick, C.L. (2014). Microsoft coco: Common objects in context. 
European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"9805389","DOI":"10.34133\/2021\/9805389","article-title":"Feature enhancement network for object detection in optical remote sensing images","volume":"2021","author":"Cheng","year":"2021","journal-title":"J. Remote Sens."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"2486","DOI":"10.1109\/TGRS.2016.2645610","article-title":"Accurate object localization in remote sensing images based on convolutional neural networks","volume":"55","author":"Long","year":"2017","journal-title":"IEEE Trans. Geosci. Remote Sens."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1016\/j.isprsjprs.2019.11.023","article-title":"Object detection in optical remote sensing images: A survey and a new benchmark","volume":"159","author":"Ke","year":"2020","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27\u201330). You only look once: Unified, real-time object detection. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.91"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Redmon, J., and Farhadi, A. (2017, January 21\u201327). Yolo9000: Better, faster, stronger. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.690"},{"key":"ref_15","unstructured":"Joseph, R., and Ali, F. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_16","unstructured":"Bochkovskiy, A., Wang, C.Y., and Liao, H.Y.M. (2020). Yolov4: Optimal speed and accuracy of object detection. 
arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"3895","DOI":"10.1007\/s00521-021-06651-x","article-title":"A fast accurate fine-grain object detection model based on yolov4 deep neural network","volume":"34","author":"Roy","year":"2022","journal-title":"Neural Comput. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Liao, H.Y.M., Wu, Y.H., Chen, P.Y., Hsieh, J.W., and Yeh, I.H. (2020, January 14\u201319). Cspnet: A new backbone that can enhance learning capability of CNN. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00203"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Liu, S., Qi, L., Qin, H., Shi, J., and Jia, J. (2018, January 18\u201322). Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00913"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 11\u201317). Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_21","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905), San Diego, CA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","article-title":"Distinctive image features from scale-invariant keypoints","volume":"60","author":"Low","year":"2004","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_23","unstructured":"Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., and Pietika inen, M. (2018). Deeplearningforgenericobjectdetection: A survey. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.81"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1109\/TPAMI.2015.2389824","article-title":"Spatial pyramid pooling in deep convolutional networks for visual recognition","volume":"37","author":"He","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 11\u201314). Fast r-cnn. Proceedings of the IEEE International Conference on Computer Vision, Berlin, Germany.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster r-cnn: Towards real-time object detection with region proposal networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Wang, P., Liu, W., Li, J., Ye, R., and Ren, D. (2020, January 7\u201312). Distance-iouloss: Faster and better learning for bounding box regression. 
Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6999"},{"key":"ref_30","unstructured":"Misra, D. (2019). Mish: A self regularized non-monotonic activation function. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, C.Y., Bochkovskiy, A., and Liao, H.Y.M. (2021, January 19\u201325). Scaled-yolov4: Scaling cross stage partial network. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Virtual.","DOI":"10.1109\/CVPR46437.2021.01283"},{"key":"ref_32","unstructured":"Simard, P.Y., Steinkraus, D., and Platt, J.C. (2003, January 3\u20136). Best practices for convolutional neural networks applied to visual document analysis. Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), Edinburgh, UK."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhang, H., Cisse, M., Dauphin, Y.N., and Lopez-Paz, D. (2017). mixup: Beyond empirical risk minimization. arXiv.","DOI":"10.1007\/978-1-4899-7687-1_79"},{"key":"ref_34","unstructured":"DeVries, T., and Taylor, G.W. (2017). Improved regularization of convolutional neural networks with cutout. arXiv."},{"key":"ref_35","unstructured":"Yun, S., Han, D., Oh, S.J., Chun, S., Choe, J., and Yoo, Y. (2019, October 27\u2013November 2). Cutmix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_36","first-page":"2204","article-title":"Recurrent models of visual attention","volume":"27","author":"Mnih","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_37","first-page":"2017","article-title":"Spatial transformer networks","volume":"28","author":"Jaderberg","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). 
Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). Cbam: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Wang, X., Girshick, R., Gupta, A., and He, K. (2018, January 18\u201322). Non-local neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00813"},{"key":"ref_41","unstructured":"Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., and Gelly, S. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Yuan, L., Chen, Y., Wang, T., Yu, W., Shi, Y., Jiang, Z.H., Tay, F.E., Feng, J., and Yan, S. (2021, January 11\u201317). Tokens-to-token vit: Training vision transformers from scratch on imagenet. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.00060"},{"key":"ref_43","unstructured":"Guo, M.H., Xu, T.X., Liu, J.J., Liu, Z.N., Jiang, P.T., Mu, T.J., Zhang, S.H., Martin, R.R., Cheng, M.M., and Hu, S.M. (2021). Attention mechanisms in computer vision: A survey. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lyu, S., Wang, X., and Zhao, Q. (2021, January 19\u201325). Tph-yolov5: Improved yolov5 based on transformer prediction head for object detection on drone-captured scenarios. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision, Nashville, TN, USA.","DOI":"10.1109\/ICCVW54120.2021.00312"},{"key":"ref_45","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv."},{"key":"ref_46","unstructured":"Liu, Y., Shao, Z., Teng, Y., and Hoffmann, N. (2021). NAM: Normalization-based Attention Module. arXiv."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.isprsjprs.2014.10.002","article-title":"Multi-class geospatial object detection and geographic image classification based on collection of part detectors","volume":"98","author":"Cheng","year":"2014","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., Datcu, M., Pelillo, M., and Zhang, L. (2018, January 18\u201323). Dota: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00418"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/12\/2861\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:31:52Z","timestamp":1760139112000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/12\/2861"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,15]]},"references-count":48,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["rs14122861"],"URL":"https:\/\/doi.org\/10.3390\/rs14122861","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,15]]}}}