{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T09:30:38Z","timestamp":1768987838621,"version":"3.49.0"},"reference-count":46,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2023,5,31]],"date-time":"2023-05-31T00:00:00Z","timestamp":1685491200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100011789","name":"Department of Science and Technology of Jilin Province","doi-asserted-by":"publisher","award":["20210201132GX"],"award-info":[{"award-number":["20210201132GX"]}],"id":[{"id":"10.13039\/501100011789","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Based on the versatility and effectiveness of the siamese neural network, the technology of unmanned aerial vehicle visual object tracking has found widespread application in various fields including military reconnaissance, intelligent transportation, and visual positioning. However, due to complex factors, such as occlusions, viewpoint changes, and interference from similar objects during UAV tracking, most existing siamese neural network trackers struggle to combine superior performance with efficiency. To tackle this challenge, this paper proposes a novel SiamSTM tracker that is based on Slight Aware Enhancement Transformer and Multiple matching networks for real-time UAV tracking. The SiamSTM leverages lightweight transformers to encode robust target appearance features while using the Multiple matching networks to fully perceive response map information and enhance the tracker\u2019s ability to distinguish between the target and background. The results are impressive: evaluation results based on three UAV tracking benchmarks showed superior speed and precision. 
Moreover, SiamSTM achieves over 35 FPS on NVIDIA Jetson AGX Xavier, which satisfies the real-time requirements in engineering.<\/jats:p>","DOI":"10.3390\/rs15112857","type":"journal-article","created":{"date-parts":[[2023,5,31]],"date-time":"2023-05-31T02:57:10Z","timestamp":1685501830000},"page":"2857","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Slight Aware Enhancement Transformer and Multiple Matching Network for Real-Time UAV Tracking"],"prefix":"10.3390","volume":"15","author":[{"given":"Anping","family":"Deng","sequence":"first","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 101408, China"}]},{"given":"Guangliang","family":"Han","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Dianbin","family":"Chen","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China"}]},{"given":"Tianjiao","family":"Ma","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1952-2370","authenticated-orcid":false,"given":"Zhichao","family":"Liu","sequence":"additional","affiliation":[{"name":"Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), Chinese Academy of Sciences, Changchun 130033, China"},{"name":"University of Chinese Academy of Sciences, Beijing 101408, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,5,31]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Li, B., Fu, C., Ding, F., Ye, 
J., and Lin, F. (2022). All-day object tracking for unmanned aerial vehicle. IEEE Trans. Mob. Comput.","DOI":"10.1109\/TMC.2022.3162892"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, Z. (2022). Object Tracking based on satellite videos: A literature review. Remote Sens., 14.","DOI":"10.3390\/rs14153674"},{"key":"ref_3","unstructured":"Fu, C., Lu, K., Zheng, G., Ye, J., Cao, Z., and Li, B. (2022). Siamese object tracking for unmanned aerial vehicle: A review and comprehensive analysis. arXiv."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Bolme, D.S., Beveridge, J.R., Draper, B.A., and Lui, Y.M. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Tao, R., Gavves, E., and Smeulders, A.W.M. (2016, January 27\u201330). Siamese instance search for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.158"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Marvasti-Zadeh, S.M., Cheng, L., and Ghanei-Yakhdan, H. (2021). Deep learning for visual tracking: A comprehensive survey. IEEE Trans. Intell. Transp. Syst.","DOI":"10.1109\/TITS.2020.3046478"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1109\/MGRS.2021.3115137","article-title":"Deep learning for unmanned aerial vehicle-based object detection and tracking: A survey","volume":"10","author":"Wu","year":"2021","journal-title":"IEEE Geosci. Remote Sens. Mag."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-speed tracking with kernelized correlation filters","volume":"37","author":"Henriques","year":"2014","journal-title":"IEEE Trans. 
Pattern Anal. Mach. Intell."},{"key":"ref_9","unstructured":"Huang, Z., Fu, C., Li, Y., Lin, F., and Lu, P. (2019, October 27\u2013November 2). Learning aberrance repressed correlation filters for real-time UAV tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Li, Y., Fu, C., Ding, F., Huang, Z., and Lu, G. (2020, January 13\u201319). AutoTrack: Towards high-performance visual tracking for UAV with automatic spatio-temporal regularization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01194"},{"key":"ref_11","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P. (2016). Computer Vision\u2013ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8\u201310 and 15\u201316, 2016, Proceedings, Part II 14, Springer International Publishing."},{"key":"ref_12","unstructured":"Bo, L., Yan, J., Wei, W., Zheng, Z., and Hu, X. (2018, January 18\u201322). High performance visual tracking with siamese region proposal network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 15\u201320). Siamrpn++: Evolution of siamese visual tracking with very deep networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00441"},{"key":"ref_14","unstructured":"Zhang, Z., and Zhang, L. (2021). Domain Adaptive SiamRPN++ for Object Tracking in the Wild. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Peng, J., Jiang, Z., Gu, Y., Wu, Y., Wang, Y., and Tai, Y. (2021). Siamrcr: Reciprocal classification and regression for visual object tracking. 
arXiv.","DOI":"10.24963\/ijcai.2021\/132"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Voigtlaender, P., Luiten, J., Torr, P., and Leibe, B. (2020, January 13\u201319). Siam r-cnn: Visual tracking by re-detection. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00661"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Guo, D., Shao, Y., Cui, Y., Wang, Z., Zhang, L., and Shen, C. (2021, January 20\u201325). Graph attention tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00942"},{"key":"ref_18","first-page":"I","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_19","unstructured":"Thangavel, J., Kokul, T., Ramanan, A., and Fernando, S. (2023). Transformers in Single Object Tracking: An Experimental Survey. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Deng, A., Liu, J., Chen, Q., Wang, X., and Zuo, Y. (2022). Visual Tracking with FPN Based on Transformer and Response Map Enhancement. Appl. Sci., 12.","DOI":"10.3390\/app12136551"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Cao, Z., Fu, C., Ye, J., Li, B., and Li, Y. (2021, January 11\u201317). Hift: Hierarchical feature transformer for aerial tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01517"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Cao, Z., Fu, C., Ye, J., Li, B., and Li, Y. (2021, September 27\u2013October 1). SiamAPN++: Siamese attentional aggregation network for real-time UAV tracking. 
Proceedings of the 2021 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic.","DOI":"10.1109\/IROS51168.2021.9636309"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Yao, L., Fu, C., and Li, S. (2023). SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking. arXiv.","DOI":"10.1109\/ICRA48891.2023.10161487"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cao, Z., Huang, Z., Pan, L., Zhang, S., Liu, Z., and Fu, C. (2022, January 18\u201324). TCTrack: Temporal contexts for aerial tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01438"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Blatter, P., Kanakis, M., Danelljan, M., and Gool, L.V. (2023, January 2\u20137). Efficient visual tracking with exemplar transformers. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV56688.2023.00162"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1145\/3065386","article-title":"Imagenet classification with deep convolutional neural networks","volume":"60","author":"Krizhevsky","year":"2017","journal-title":"Commun. ACM"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201322). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., and Lu, H. (2021, January 20\u201325). Transformer tracking. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"ref_29","first-page":"28522","article-title":"Vitae: Vision transformer advanced by exploring intrinsic inductive bias","volume":"34","author":"Xu","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_30","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer normalization. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., and Batra, D. (2017, January 22\u201329). Grad-cam: Visual explanations from deep networks via gradient-based localization. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.74"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Javed, S., Danelljan, M., Khan, F.S., Khan, M.H., Felsberg, M., and Matas, J. (2022). Visual object tracking with discriminative filters and siamese networks: A survey and outlook. IEEE Trans. Pattern Anal. Mach. Intell.","DOI":"10.1109\/TPAMI.2022.3212594"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yan, B., Zhang, X., Wang, D., Lu, H., and Yang, X. (2021, January 20\u201325). Alpha-refine: Boosting tracking performance by precise bounding box estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00525"},{"key":"ref_34","unstructured":"Ge, Z., Liu, S., Wang, F., Li, Z., and Sun, J. (2021). Yolox: Exceeding yolo series in 2021. arXiv."},{"key":"ref_35","unstructured":"Mueller, M., Smith, N., and Ghanem, B. (2016). 
Computer Vision\u2013ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11\u201314, 2016, Proceedings, Part I 14, Springer International Publishing."},{"key":"ref_36","unstructured":"Isaac-Medina, B., Poyser, M., Organisciak, D., Willcocks, C.G., Breckon, T.P., and Shum, H. (2018, January 8\u201314). The unmanned aerial vehicle benchmark: Object detection and tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","article-title":"Got-10k: A large high-diversity benchmark for generic object tracking in the wild","volume":"43","author":"Huang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","unstructured":"Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Lawrence Zitnick, C., and Doll\u00e1r, P. (2014). Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6\u201312, 2014, Proceedings, Part V 13, Springer International Publishing."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 15\u201320). Deeper and wider siamese networks for real-time visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., and Torr, P.H.S. (2016, January 27\u201330). Staple: Complementary learners for real-time tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.156"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Fu, C., Peng, W., Li, S., Ye, J., and Cao, Z. (2022, January 23\u201327). Local Perception-Aware Transformer for Aerial Tracking. 
Proceedings of the 2022 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan.","DOI":"10.1109\/IROS47612.2022.9981248"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wang, Z., Li, Z., Yuan, Y., and Yu, G. (2020, January 7\u201312). Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6944"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., and Hu, W. (2018, January 8\u201314). Distractor-aware siamese networks for visual object tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Zolfaghari, M., Singh, K., and Brox, T. (2018, January 8\u201314). Eco: Efficient convolutional network for online video understanding. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01216-8_43"},{"key":"ref_45","unstructured":"Zhang, L., Gonzalez-Garcia, A., Weijer, J.V.D., Danelljan, M., and Khan, F.S. (2019, October 27\u2013November 2). Learning the model update for siamese trackers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_46","unstructured":"Zhang, Z., Peng, H., Fu, J., Li, B., and Hu, W. (2020). 
Computer Vision\u2013ECCV 2020: 16th European Conference, Glasgow, UK, August 23\u201328, 2020, Proceedings, Part XXI 16, Springer International Publishing."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/11\/2857\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:45:52Z","timestamp":1760125552000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/11\/2857"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,31]]},"references-count":46,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2023,6]]}},"alternative-id":["rs15112857"],"URL":"https:\/\/doi.org\/10.3390\/rs15112857","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,31]]}}}