{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:26:43Z","timestamp":1772119603356,"version":"3.50.1"},"reference-count":47,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T00:00:00Z","timestamp":1648166400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012130","name":"Aeronautical Science Foundation of China","doi-asserted-by":"publisher","award":["20185142003"],"award-info":[{"award-number":["20185142003"]}],"id":[{"id":"10.13039\/501100012130","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012335","name":"National Defense Basic Scientific Research Program of China","doi-asserted-by":"publisher","award":["JCKY2018419C001"],"award-info":[{"award-number":["JCKY2018419C001"]}],"id":[{"id":"10.13039\/501100012335","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Science and Technology Innovative Talents in Universities of Henan Province","award":["21HASTIT030"],"award-info":[{"award-number":["21HASTIT030"]}]},{"name":"National Thirteen-Five Equipment Pre-Research Foundation of China","award":["61403120207"],"award-info":[{"award-number":["61403120207"]}]},{"name":"Young Backbone Teachers in Universities of Henan Province","award":["2020GGJS073"],"award-info":[{"award-number":["2020GGJS073"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>Visual object tracking for unmanned aerial vehicles (UAVs) is widely used in fields such as military reconnaissance, search and rescue, and film shooting. However, the performance of existing methods remains unsatisfactory because of complex factors including viewpoint change, background clutter and occlusion. 
Siamese trackers, which conveniently formulate visual tracking as a template matching process, have achieved success on recent visual tracking datasets. Unfortunately, these template-matching trackers cannot adapt well to the frequent appearance changes in UAV video datasets. To deal with this problem, this paper proposes a template-driven Siamese network (TDSiam), which consists of a feature extraction subnetwork, a feature fusion subnetwork and a bounding box estimation subnetwork. In particular, a template library branch is proposed for the feature extraction subnetwork to adapt to the changing appearance of the target. In addition, a feature aligned (FA) module is proposed as the core of the feature fusion subnetwork, which fuses information in the form of center alignment. More importantly, a method for occlusion detection is proposed to reduce the noise caused by occlusion. Experiments were conducted on two challenging benchmarks, UAV123 and UAV20L, and the results verified the competitive performance of the proposed method compared with existing algorithms.<\/jats:p>","DOI":"10.3390\/rs14071584","type":"journal-article","created":{"date-parts":[[2022,3,27]],"date-time":"2022-03-27T21:29:36Z","timestamp":1648416576000},"page":"1584","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Visual Object Tracking for Unmanned Aerial Vehicles Based on the Template-Driven Siamese Network"],"prefix":"10.3390","volume":"14","author":[{"given":"Lifan","family":"Sun","sequence":"first","affiliation":[{"name":"School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China"},{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]},{"given":"Zhe","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Henan 
University of Science and Technology, Luoyang 471023, China"}]},{"given":"Jinjin","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China"}]},{"given":"Zhumu","family":"Fu","sequence":"additional","affiliation":[{"name":"School of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China"}]},{"given":"Zishu","family":"He","sequence":"additional","affiliation":[{"name":"School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Fu, C., Lin, F., Li, Y., and Chen, G. (2019). Correlation filter-based visual tracking for UAV with online multi-feature learning. Remote Sens., 11.","DOI":"10.3390\/rs11050549"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Zhang, S., Zhuo, L., Zhang, H., and Li, J. (2020). Object tracking in unmanned aerial vehicle videos via multifeature discrimination and instance-aware attention network. Remote Sens., 12.","DOI":"10.3390\/rs12162646"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Zhuo, L., Liu, B., Zhang, H., Zhang, S., and Li, J. (2021). MultiRPN-DIDNet: Multiple RPNs and Distance-IoU Discriminative Network for Real-Time UAV Target Tracking. Remote Sens., 13.","DOI":"10.3390\/rs13142772"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Xue, X., Li, Y., Dong, H., and Shen, Q. (2018). Robust correlation tracking for UAV videos via feature fusion and saliency proposals. Remote Sens., 10.","DOI":"10.3390\/rs10101644"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Wu, Y., Lim, J., and Yang, M.H. (2013, January 23\u201328). Online object tracking: A benchmark. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA.","DOI":"10.1109\/CVPR.2013.312"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wu, W., Zou, W., and Yan, J. (2018, January 18\u201323). End-to-end flow correlation tracking with spatial-temporal attention. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00064"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2002","DOI":"10.1109\/TPAMI.2014.2315808","article-title":"Fast compressive tracking","volume":"36","author":"Zhang","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Bolme, D.S., Beveridge, J.R., Draper, B.A., and Lui, Y.M. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Henriques, J.F., Caseiro, R., Martins, P., and Batista, J. (2012, January 7\u201313). Exploiting the circulant structure of tracking-by-detection with kernels. Proceedings of the European Conference on Computer Vision, Proceedings of the Computer Vision\u2014ECCV 2012, 12th European Conference on Computer Vision, Florence, Italy.","DOI":"10.1007\/978-3-642-33765-9_50"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-speed tracking with kernelized correlation filters","volume":"37","author":"Henriques","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","unstructured":"Wang, N., and Yeung, D.Y. (2013, January 5\u201310). Learning a deep compact image representation for visual tracking. 
Proceedings of the Advances in Neural Information Processing Systems, Harrahs, NV, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wang, L., Ouyang, W., Wang, X., and Lu, H. (2015, January 7\u201313). Visual tracking with fully convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.357"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Nam, H., and Han, B. (2016, January 27\u201330). Learning multi-domain convolutional neural networks for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.465"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201323). High performance visual tracking with siamese region proposal network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Shahbaz Khan, F., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H. (15\u201316, January 8\u201310). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision, Computer Vision\u2014ECCV 2016 Workshops, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Shahbaz Khan, F., and Felsberg, M. (2015, January 7\u201313). Convolutional features for correlation filter based visual tracking. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCVW.2015.84"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Mueller, M., Smith, N., and Ghanem, B. (2016, January 11\u201314). A benchmark and simulator for uav tracking. Proceedings of the European Conference on Computer Vision, Computer Vision\u2014ECCV 2016, 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tao, R., Gavves, E., and Smeulders, A.W. (2016, January 27\u201330). Siamese instance search for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.158"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Held, D., Thrun, S., and Savarese, S. (2016, January 11\u201314). Learning to track at 100 fps with deep regression networks. Proceedings of the European Conference on Computer Vision, Computer Vision\u2014ECCV 2016, 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_45"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., and Torr, P.H. (2017, January 21\u201326). End-to-end representation learning for correlation filter based tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.531"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., and Maybank, S. (2018, January 18\u201323). Learning attentions: Residual attentional siamese network for high performance online visual tracking. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00510"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., and Hu, W. (2018, January 8\u201314). Distractor-aware siamese networks for visual object tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Wang, M., Liu, Y., and Huang, Z. (2017, January 21\u201326). Large margin object tracking with circulant feature maps. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.510"},{"key":"ref_25","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20138). Imagenet classification with deep convolutional neural networks. Proceedings of the Advances in Neural Information Processing Systems, Harrahs, NV, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 15\u201320). Deeper and wider siamese networks for real-time visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 15\u201320). Siamrpn++: Evolution of siamese visual tracking with very deep networks. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00441"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yu, Y., Xiong, Y., Huang, W., and Scott, M.R. (2020, January 14\u201319). Deformable siamese attention networks for visual object tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00676"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, L., Gonzalez-Garcia, A., Weijer, J.v.d., Danelljan, M., and Khan, F.S. (2019, January 15\u201320). Learning the model update for siamese trackers. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/ICCV.2019.00411"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Xu, Y., Wang, Z., Li, Z., Yuan, Y., and Yu, G. (2020, January 7\u201312). Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6944"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Guo, D., Wang, J., Cui, Y., Wang, Z., and Chen, S. (2020, January 14\u201319). SiamCAR: Siamese fully convolutional classification and regression for visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00630"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, Z., Zhong, B., Li, G., Zhang, S., and Ji, R. (2020, January 14\u201319). Siamese box adaptive network for visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00670"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Huang, Z., Fu, C., Li, Y., Lin, F., and Lu, P. 
(2019, January 27\u201328). Learning aberrance repressed correlation filters for real-time uav tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00298"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"4814","DOI":"10.1109\/TIP.2021.3076272","article-title":"Learning Deep Lucas-Kanade Siamese Network for Visual Tracking","volume":"30","author":"Yao","year":"2021","journal-title":"IEEE Trans. Image Process."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Doll\u00e1r, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017, January 21\u201326). Feature pyramid networks for object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.106"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Du, D., Qi, Y., Yu, H., Yang, Y., Duan, K., Li, G., Zhang, W., Huang, Q., and Tian, Q. (2018, January 8\u201314). The unmanned aerial vehicle benchmark: Object detection and tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01249-6_23"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. 
Vis."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","article-title":"Got-10k: A large high-diversity benchmark for generic object tracking in the wild","volume":"43","author":"Huang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Real, E., Shlens, J., Mazzocchi, S., Pan, X., and Vanhoucke, V. (2017, January 21\u201326). Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.789"},{"key":"ref_42","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Zhang, J., Ma, S., and Sclaroff, S. (2014, January 6\u201312). MEEM: Robust tracking via multiple experts using entropy minimization. Proceedings of the European Conference on Computer Vision, Computer Vision\u2014ECCV 2014, 13th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10599-4_13"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Li, Y., and Zhu, J. (2014, January 6\u201312). A scale adaptive kernel correlation filter tracker with feature integration. Proceedings of the European Conference on Computer Vision, Computer Vision\u2014ECCV 2014, 13th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-16181-5_18"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1109\/TPAMI.2015.2509974","article-title":"Struck: Structured output tracking with kernels","volume":"38","author":"Hare","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Shahbaz Khan, F., and Felsberg, M. (2015, January 7\u201313). 
Learning spatially regularized correlation filters for visual tracking. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.490"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Robinson, A., Khan, F.S., and Felsberg, M. (2016, January 11\u201314). Beyond correlation filters: Learning continuous convolution operators for visual tracking. Proceedings of the European Conference on Computer Vision, Computer Vision\u2014ECCV 2016, 14th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46454-1_29"}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/7\/1584\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:43:22Z","timestamp":1760136202000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/14\/7\/1584"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,25]]},"references-count":47,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2022,4]]}},"alternative-id":["rs14071584"],"URL":"https:\/\/doi.org\/10.3390\/rs14071584","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,25]]}}}