{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T03:48:10Z","timestamp":1771645690111,"version":"3.50.1"},"reference-count":52,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2022,3,18]],"date-time":"2022-03-18T00:00:00Z","timestamp":1647561600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62076019"],"award-info":[{"award-number":["62076019"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Mid-to-high altitude Unmanned Aerial Vehicle (UAV) imagery can provide important remote sensing information between satellite and low altitude platforms, and vehicle detection in mid-to-high altitude UAV images plays a crucial role in land monitoring and disaster relief. However, the high background complexity of images and limited pixels of objects challenge the performance of tiny vehicle detection. Traditional methods suffer from poor adaptation ability to complex backgrounds, while deep neural networks (DNNs) have inherent defects in feature extraction of tiny objects with finite pixels. To address the issue above, this paper puts forward a vehicle detection method combining the DNNs-based and traditional methods for mid-to-high altitude UAV images. We first employ the deep segmentation network to exploit the co-occurrence of the road and vehicles, then detect tiny vehicles based on visual attention mechanism with spatial-temporal constraint information. Experimental results show that the proposed method achieves effective detection of tiny vehicles in complex backgrounds. 
In addition, ablation experiments are performed to inspect the effectiveness of each component, and comparative experiments on tinier objects are carried out to prove the superior generalization performance of our method in detecting vehicles with a limited size of 5 \u00d7 5 pixels or less.<\/jats:p>","DOI":"10.3390\/s22062354","type":"journal-article","created":{"date-parts":[[2022,3,20]],"date-time":"2022-03-20T21:37:17Z","timestamp":1647812237000},"page":"2354","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Tiny Vehicle Detection for Mid-to-High Altitude UAV Images Based on Visual Attention and Spatial-Temporal Information"],"prefix":"10.3390","volume":"22","author":[{"given":"Ruonan","family":"Yu","sequence":"first","affiliation":[{"name":"School of Electrical and Information Engineering, Beihang University, Beijing 100191, China"}]},{"given":"Hongguang","family":"Li","sequence":"additional","affiliation":[{"name":"Unmanned System Research Institute, Beihang University, Beijing 100191, China"}]},{"given":"Yalong","family":"Jiang","sequence":"additional","affiliation":[{"name":"Unmanned System Research Institute, Beihang University, Beijing 100191, China"}]},{"given":"Baochang","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8713-3153","authenticated-orcid":false,"given":"Yufeng","family":"Wang","sequence":"additional","affiliation":[{"name":"Unmanned System Research Institute, Beihang University, Beijing 100191, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, H., Ding, W., Cao, X., and Liu, C. (2017). 
Image Registration and Fusion of Visible and Infrared Integrated Camera for Medium-Altitude Unmanned Aerial Vehicle Remote Sensing. Remote Sens., 9.","DOI":"10.3390\/rs9050441"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"12606","DOI":"10.3390\/rs71012606","article-title":"Metadata-Assisted Global Motion Estimation for Medium-Altitude Unmanned Aerial Vehicle Video Applications","volume":"7","author":"Li","year":"2015","journal-title":"Remote Sens."},{"key":"ref_3","unstructured":"Zhu, P., Wen, L., Du, D., Bian, X., Hu, Q., and Ling, H. (2020). Vision Meets Drones: Past, Present and Future. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Mueller, M., Smith, N., and Ghanem, B. (2016, January 8\u201316). A Benchmark and Simulator for UAV Tracking. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Robicquet, A., Sadeghian, A., and Alahi, A. (2016, January 8\u201316). Learning Social Etiquette: Human Trajectory Understanding In Crowded Scenes. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_33"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Barekatain, M., Mart\u00ed, M., and Shih, H. (2017, January 21\u201326). Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA.","DOI":"10.1109\/CVPRW.2017.267"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., and Tian, Q. (2018, January 8\u201314). The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_23"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., and Belongie, S. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_9","unstructured":"Zhang, W., Cong, M., and Wang, L. (2003, January 14\u201317). Algorithms for optical weak small targets detection and tracking: Review. Proceedings of the International Conference on Neural Networks and Signal Processing, Nanjing, China."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Ozbay, M., and \u015eahingil, M. (2017, January 15\u201318). A fast and robust automatic object detection algorithm to detect small objects in infrared images. Proceedings of the 2017 25th Signal Processing and Communications Applications Conference (SIU), Antalya, Turkey.","DOI":"10.1109\/SIU.2017.7960456"},{"key":"ref_11","unstructured":"Yang, Y., and Sun, W. (2018, January 19\u201321). Adaptive Detection of Infrared Small Target Based on Target-Background Separation with Ratio Minimization of Singular Values. Proceedings of the 2018 4th Annual International Conference on Network and Information Systems for Computers (ICNISC), Wuhan, China."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1643","DOI":"10.1016\/j.sigpro.2009.11.014","article-title":"Enhancement of dim small target through modified top-hat transformation under the condition of heavy clutter","volume":"90","author":"Bai","year":"2010","journal-title":"Signal Process."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1109\/LGRS.2016.2556218","article-title":"An Efficient Infrared Small Target Detection Method Based on Visual Contrast Mechanism","volume":"13","author":"Chen","year":"2016","journal-title":"IEEE Geosci. 
Remote Sens. Lett."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zhao, J., Liu, F., and Bo, M. (2012, January 28\u201329). An Algorithm of Dim and Small Target Detection Based on Wavelet Transform and Image Fusion. Proceedings of the International Symposium on Computational Intelligence and Design (ISCID), Washington, DC, USA.","DOI":"10.1109\/ISCID.2012.162"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Duk, V., Ng, B., and Rosenberg, L. (2015, January 10\u201315). The potential of 2D wavelet transforms for target detection in sea-clutter. Proceedings of the IEEE National Radar Conference, Arlington, VA, USA.","DOI":"10.1109\/RADAR.2015.7131123"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"994","DOI":"10.1007\/s10762-009-9518-2","article-title":"Small Target Detection Utilizing Robust Methods of the Human Visual System for IRST","volume":"30","author":"Kim","year":"2009","journal-title":"J. Infrared Millim. Terahertz Waves"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1016\/j.infrared.2012.08.004","article-title":"Infrared dim target detection based on visual attention","volume":"55","author":"Wang","year":"2012","journal-title":"Infrared Phys. Technol."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Achanta, R., Hemami, S., and Estrada, F. (2009, January 20\u201325). Frequency-tuned salient region detection. Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206596"},{"key":"ref_19","unstructured":"Redmon, J., and Farhadi, A. (2018). Yolov3: An incremental improvement. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1109\/TPAMI.2018.2858826","article-title":"Focal loss for dense object detection","volume":"42","author":"Lin","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"14781","DOI":"10.1007\/s11042-016-4025-7","article-title":"Small target detection combining regional stability and saliency in a color image","volume":"76","author":"Lou","year":"2017","journal-title":"Multimed. Tools Appl."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/j.patcog.2016.04.002","article-title":"Multiscale patch-based contrast measure for small infrared target detection","volume":"58","author":"Wei","year":"2016","journal-title":"Pattern Recognit."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"761","DOI":"10.1016\/j.imavis.2004.02.006","article-title":"Robust wide-baseline stereo from maximally stable extremal regions","volume":"22","author":"Matas","year":"2004","journal-title":"Image Vis. Comput."},{"key":"ref_24","unstructured":"Elgammal, A., Harwood, D., and Davis, L. (July, January 26). Non-parametric model for background subtraction. Proceedings of the European Conference on Computer Vision (ECCV), Dublin, Ireland."},{"key":"ref_25","unstructured":"Zheng, M., Wu, Z., and Bakhdavlatov, S. (November, January 29). Real-time aerial targets detection algorithm based background subtraction. Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Kaohsiung, Taiwan."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Li, W., Yao, J., and Dong, T. (2015, January 14\u201316). Moving vehicle detection based on an improved interframe difference and a Gaussian model. Proceedings of the Congress on Image and Signal Processing (CISP), Shenyang, China.","DOI":"10.1109\/CISP.2015.7408019"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, Y., and Dong, J. (2016, January 10\u201311). Target Detection Based on the Interframe Difference of Block and Graph-Based. 
Proceedings of the 2016 9th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China.","DOI":"10.1109\/ISCID.2016.2115"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Hossen, M., and Tuli, S. (2016, January 13\u201314). A surveillance system based on motion detection and motion estimation using optical flow. Proceedings of the 2016 5th International Conference on Informatics, Electronics and Vision (ICIEV), Dhaka, Bangladesh.","DOI":"10.1109\/ICIEV.2016.7760081"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Han, X., Gao, Y., and Lu, Z. (2015, January 18\u201320). Research on Moving Object Detection Algorithm Based on Improved Three Frame Difference Method and Optical Flow. Proceedings of the 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), Qinhuangdao, China.","DOI":"10.1109\/IMCCC.2015.420"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.infrared.2004.06.001","article-title":"Detecting and tracking dim moving point target in IR image sequence","volume":"46","author":"Zhang","year":"2005","journal-title":"Infrared Phys. Technol."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"3609","DOI":"10.1109\/TVT.2021.3066516","article-title":"Multi-Frame Integration Method for Radar Detection of Weak Moving Target","volume":"70","author":"Li","year":"2021","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_32","first-page":"8234349","article-title":"Dim-Small Target Detection Based on Adaptive Pipeline Filtering","volume":"1","author":"Li","year":"2020","journal-title":"Math. Probl. Eng."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"3037","DOI":"10.1109\/TGRS.2017.2660879","article-title":"Robust Infrared Maritime Target Detection Based on Visual Attention and Spatiotemporal Filtering","volume":"55","author":"Dong","year":"2017","journal-title":"IEEE Trans. Geosci. 
Remote Sens."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, B., Dong, L., and Zhao, M. (2015, January 23\u201325). A small dim infrared maritime target detection algorithm based on local peak detection and pipeline-filtering. Proceedings of the International Conference on Graphic & Image Processing, Singapore.","DOI":"10.1117\/12.2228418"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_36","unstructured":"Fu, C., Liu, W., Rang, A., Tyagi, A., and Berg, A.C. (2017). DSSD: Deconvolutional single shot detector. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, C., Liu, M., Tuzel, O., and Xiao, J. (2016, January 20\u201324). R-CNN for Small Object Detection. Proceedings of the Asian Conference on Computer Vision, Taipei, Taiwan.","DOI":"10.1007\/978-3-319-54193-8_14"},{"key":"ref_38","first-page":"936","article-title":"SCAN: Semantic Context Aware Network for Accurate Small Object Detection","volume":"11","author":"Guang","year":"2018","journal-title":"Int. J. Comput. Intell. Syst."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Wang, J., Chen, K., Yang, S., Loy, C.C., and Lin, D. (2019, January 15\u201320). Region Proposal by Guided Anchoring. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00308"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., and Li, S.Z. (2017, January 22\u201329). S3fd: Single shot scale-invariant face detector. 
Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.30"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"9651","DOI":"10.1109\/TIE.2019.2899548","article-title":"Simultaneously Detecting and Counting Dense Vehicles From Drone Images","volume":"66","author":"Li","year":"2019","journal-title":"IEEE Trans. Ind. Electron."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Dai, X., Chen, Y., Xiao, B., Chen, D., Liu, M., Yuan, L., and Zhang, L. (2021, January 20\u201325). Dynamic Head: Unifying Object Detection Heads with Attentions. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00729"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Ghiasi, G., Cui, Y., Srinivas, A., Qian, R., Lin, T.-Y., Cubuk, E.D., Le, Q.V., and Zoph, B. (2021, January 20\u201325). Simple copy-paste is a strong data augmentation method for instance segmentation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00294"},{"key":"ref_44","unstructured":"Bochkovskiy, A., Wang, C.-Y., and Liao, H.-J.M. (2020). YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv."},{"key":"ref_45","unstructured":"Zhu, X., Su, W., Lu, L., Li, B., Wang, X., and Dai, J. (2020). Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Zhu, X., Lyu, S., Wang, X., and Zhao, Q. (2021, January 11\u201317). TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-captured Scenarios. 
Proceedings of the IEEE\/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada.","DOI":"10.1109\/ICCVW54120.2021.00312"},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1016\/j.patrec.2021.09.002","article-title":"FAVOD: Feature fusion architecture for video object detection","volume":"151","author":"Perreault","year":"2021","journal-title":"Pattern Recognit. Lett."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Perreault, H., Heritier, M., Gravel, P., Bilodeau, G.-A., and Saunier, N. (2020). RN-VID: A Feature Fusion Architecture for Video Object Detection. Proceedings of the International Conference on Image Analysis and Recognition, Varzim, Portugal, 24\u201326 June 2020, Springer.","DOI":"10.1007\/978-3-030-50347-5_12"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Han, M., Wang, Y., Chang, X., and Qiao, Y. (2020, January 23\u201328). Mining Inter-Video Proposal Relations for Video Object Detection. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","DOI":"10.1007\/978-3-030-58589-1_26"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Perreault, H., Bilodeau, G.-A., Saunier, N., and Heritier, M. (2020, January 13\u201315). Spotnet: Self-attention multi-task network for object detection. Proceedings of the 2020 17th Conference on Computer and Robot Vision (CRV), Ottawa, ON, Canada.","DOI":"10.1109\/CRV50864.2020.00038"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1109\/TSMC.1979.4310076","article-title":"A Threshold Selection Method from Gray-Level Histograms","volume":"9","author":"Otsu","year":"1979","journal-title":"IEEE Trans. Syst. Man Cybern."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Chen, L., Zhu, Y., and Papandreou, G. (2018, January 8\u201314). Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_49"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2354\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:38:49Z","timestamp":1760135929000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/6\/2354"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,18]]},"references-count":52,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["s22062354"],"URL":"https:\/\/doi.org\/10.3390\/s22062354","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,18]]}}}