{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T11:01:35Z","timestamp":1761562895812,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"14","license":[{"start":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T00:00:00Z","timestamp":1657584000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Science Basic Research Project of Shaanxi Provincial Department of Science and Technology","award":["2022JQ-677","18JK0383"],"award-info":[{"award-number":["2022JQ-677","18JK0383"]}]},{"name":"Natural Science Foundation of Shaanxi Provincial Department of Education","award":["2022JQ-677","18JK0383"],"award-info":[{"award-number":["2022JQ-677","18JK0383"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>This paper proposes a tracking method combining feature enhancement and template update, aiming to solve the problems of existing trackers lacking global information attention, weak feature characterization ability, and not being well adapted to the changing appearance of the target. Pre-extracted features are enhanced in context and on channels through a feature enhancement network consisting of channel attention and transformer architectures. The enhanced feature information is input into classification and regression networks to achieve the final target state estimation. At the same time, the template update strategy is introduced to update the sample template judiciously. 
Experimental results show that the proposed tracking method exhibits good tracking performance on the OTB100, LaSOT, and GOT-10k benchmark datasets.<\/jats:p>","DOI":"10.3390\/s22145219","type":"journal-article","created":{"date-parts":[[2022,7,12]],"date-time":"2022-07-12T23:02:01Z","timestamp":1657666921000},"page":"5219","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Transformer Feature Enhancement Network with Template Update for Object Tracking"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6326-7130","authenticated-orcid":false,"given":"Xiuhua","family":"Hu","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Xi\u2019an Technological University, Xi\u2019an 710021, China"},{"name":"State and Provincial Joint Engineering Laboratory of Advanced Network, Monitoring and Control, Xi\u2019an 710021, China"}]},{"given":"Huan","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Xi\u2019an Technological University, Xi\u2019an 710021, China"},{"name":"State and Provincial Joint Engineering Laboratory of Advanced Network, Monitoring and Control, Xi\u2019an 710021, China"}]},{"given":"Yan","family":"Hui","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Xi\u2019an Technological University, Xi\u2019an 710021, China"},{"name":"State and Provincial Joint Engineering Laboratory of Advanced Network, Monitoring and Control, Xi\u2019an 710021, China"}]},{"given":"Xi","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Xi\u2019an Technological University, Xi\u2019an 710021, China"},{"name":"State and Provincial Joint Engineering Laboratory of Advanced Network, Monitoring and Control, Xi\u2019an 710021, 
China"}]},{"given":"Jing","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Xi\u2019an Technological University, Xi\u2019an 710021, China"},{"name":"State and Provincial Joint Engineering Laboratory of Advanced Network, Monitoring and Control, Xi\u2019an 710021, China"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,12]]},"reference":[{"key":"ref_1","first-page":"3943","article-title":"Deep learning for visual tracking: A comprehensive survey","volume":"23","author":"Cheng","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Javed, S., Danelljan, M., Khan, F.S., Khan, M.H., Felsberg, M., and Matas, J. (2021). Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook. arXiv.","DOI":"10.1109\/TPAMI.2022.3212594"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H. (2016, January 8\u201316). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Chen, Z., Zhong, B., Li, G., Zhang, S., and Ji, R. (2020, January 14\u201319). Siamese box adaptive network for visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00670"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Shen, Q., Qiao, L., Guo, J., Li, P., Li, X., Li, B., Feng, W., Gan, W., Wu, W., and Ouyang, W. (2022). Unsupervised learning of accurate siamese tracking. arXiv.","DOI":"10.1109\/CVPR52688.2022.00793"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201322). 
High performance visual tracking with siamese region proposal network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 16\u201320). Siamrpn++: Evolution of siamese visual tracking with very deep networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00441"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fan, H., and Ling, H. (2019, January 16\u201320). Siamese cascaded region proposal networks for real-time visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00814"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., and Hu, W. (2018, January 8\u201314). Distractor-aware Siamese Networks for Visual Object Tracking. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 16\u201320). Deeper and wider siamese networks for real-time visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhang, L., Bertinetto, L., Hu, W., and Torr, P.H. (2019, January 16\u201320). Fast online object tracking and segmentation: A unifying approach. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00142"},{"key":"ref_12","unstructured":"He, A., Luo, C., Tian, X., and Zeng, W. (2008, January 23\u201328). 
A twofold siamese network for real-time object tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201322). Squeeze-and-excitation networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Wang, Q., Teng, Z., Xing, J., Gao, J., Hu, W., and Maybank, S. (2018, January 18\u201322). Learning attentions: Residual attentional siamese network for high performance online visual tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00510"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Yu, Y., Xiong, Y., Huang, W., and Scott, M.R. (2020, January 14\u201319). Deformable Siamese attention networks for visual object tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00676"},{"key":"ref_16","unstructured":"Zhao, M., Okada, K., and Inaba, M. (2021). TrTr: Visual tracking with transformer. arXiv."},{"key":"ref_17","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 12). Attention is all you need. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., and Lu, H. (2021, January 20\u201325). Transformer Tracking. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Virtual.","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ma, F., Shou, M.Z., Zhu, L., Fan, H., Xu, Y., Yang, Y., and Yan, Z. (2022). Unified Transformer Tracker for Object Tracking. arXiv.","DOI":"10.1109\/CVPR52688.2022.00858"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yan, B., Peng, H., Fu, J., Wang, D., and Lu, H. (2021, January 11\u201317). Learning Spatio-Temporal Transformer for Visual Tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01028"},{"key":"ref_21","unstructured":"Xu, Y., Ban, Y., Delorme, G., Gan, C., Rus, D., and Alameda-Pineda, X. (2021). Transcenter: Transformers with dense queries for multiple-object tracking. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Tang, C., Qin, P., and Zhang, J. (2021). Robust template adjustment siamese network for Object Tracking. Sensors, 21.","DOI":"10.3390\/s21041466"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Guo, Q., Feng, W., Zhou, C., Huang, R., Wan, L., and Wang, S. (2017, January 22\u201329). Learning dynamic siamese network for visual object tracking. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.196"},{"key":"ref_24","unstructured":"Li, P., Chen, B., Ouyang, W., Wang, D., Yang, X., and Lu, H. (November, January 27). Gradnet: Gradient-guided network for visual object tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Yang, T., and Chan, A.B. (2018, January 20). Learning dynamic memory networks for object tracking. 
Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01240-3_10"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Bai, H., Xu, Y., Liao, C., and Ling, H. (2019, January 16\u201320). LaSOT: A high-quality benchmark for large-scale single object tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00552"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","article-title":"Got-10k: A large high-diversity benchmark for generic object tracking in the wild","volume":"43","author":"Huang","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft coco: Common objects in context. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"Imagenet large scale visual recognition challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_30","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/TPAMI.2014.2388226","article-title":"Object tracking benchmark","volume":"37","author":"Wu","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., and Torr, P.H. (2017, January 21\u201326). 
End-to-end representation learning for correlation filter based tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.531"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Hager, G., Shahbaz Khan, F., and Felsberg, M. (2015, January 11\u201318). Learning spatially regularized correlation filters for visual tracking. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.490"},{"key":"ref_34","unstructured":"Bertinetto, L., Valmadre, J., Golodetz, S., Miksik, O., and Torr, P.H. (July, January 26). Staple: Complementary learners for real-time tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Danelljan, M., H\u00e4ger, G., Khan, F., and Felsberg, M. (2014, January 1\u20135). Accurate scale estimation for robust visual tracking. Proceedings of the British Machine Vision Conference (BMVC), Nottingham, UK.","DOI":"10.5244\/C.28.65"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Zhang, J., Ma, S., and Sclaroff, S. (2014, January 5\u201312). MEEM: Robust tracking via multiple experts using entropy minimization. Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10599-4_13"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2019, January 16\u201320). Atom: Accurate tracking by overlap maximization. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00479"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Song, Y., Ma, C., Wu, X., Gong, L., Bao, L., Zuo, W., Shen, C., Lau, R.W., and Yang, M.H. (2018, January 18\u201322). 
Vital: Visual tracking via adversarial learning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00937"},{"key":"ref_39","unstructured":"Yan, B., Zhao, H., Wang, D., Lu, H., and Yang, X. (November, January 27). \u2018Skimming-Perusal\u2019 Tracking: A framework for real-time and robust long-term tracking. Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea."},{"key":"ref_40","unstructured":"Zhang, L., Gonzalez-Garcia, A., van de Weijer, J., Danelljan, M., and Khan, F.S. (November, January 27). Learning the model update for siamese trackers. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Korea."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Shahbaz Khan, F., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Nam, H., and Han, B. (2016, January 27\u201330). Learning multi-domain convolutional neural networks for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.465"},{"key":"ref_43","unstructured":"Sauer, A., Aljalbout, E., and Haddadin, S. (2019). Tracking Holistic Object Representations. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Wang, G., Luo, C., Xiong, Z., and Zeng, W. (2019, January 16\u201320). Spm-tracker: Series-parallel matching for real-time visual object tracking. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00376"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Held, D., Thrun, S., and Savarese, S. 
(2016, January 8\u201316). Learning to track at 100 fps with deep regression networks. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46448-0_45"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/14\/5219\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:49:09Z","timestamp":1760140149000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/14\/5219"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,12]]},"references-count":45,"journal-issue":{"issue":"14","published-online":{"date-parts":[[2022,7]]}},"alternative-id":["s22145219"],"URL":"https:\/\/doi.org\/10.3390\/s22145219","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2022,7,12]]}}}