{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:28:48Z","timestamp":1772166528792,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T00:00:00Z","timestamp":1672185600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T00:00:00Z","timestamp":1672185600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Key Research and Development Program of China","award":["2018YFB1601200"],"award-info":[{"award-number":["2018YFB1601200"]}]},{"name":"Graduate Research Innovation Grant Program of Civil Aviation University of China","award":["2021YJS026"],"award-info":[{"award-number":["2021YJS026"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["EURASIP J. Adv. Signal Process."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    To address tracking failures caused by the disappearance of the target during long-term tracking, this paper proposes a long-term target tracking network based on the visual transformer and template update. First, we construct a transformer-based feature extraction network and adopt a knowledge distillation strategy to improve the network's global feature extraction. Second, in the modeling transformer, the encoder fully fuses the target features with the search-area features, and the decoder learns the position information in the target query. Then, target prediction is performed on the encoder\u2013decoder output to obtain the tracking results. 
Meanwhile, we design a score head to judge, before tracking the next frame, whether the dynamic template of the current frame is still valid, and we select the appropriate dynamic template for tracking the next frame according to the score. We conduct extensive experiments on the LaSOT, VOT2021-LT, TrackingNet, TLP, and UAV123 datasets, and the results demonstrate the effectiveness of our method. In particular, our method exceeds STARK by 0.8\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\\%$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mo>%<\/mml:mo>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    (F score) on VOT2021-LT, 1.0\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\\%$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mo>%<\/mml:mo>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    (S score) on LaSOT, and 1.1\n                    <jats:inline-formula>\n                      <jats:alternatives>\n                        <jats:tex-math>$$\\%$$<\/jats:tex-math>\n                        <mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                          <mml:mo>%<\/mml:mo>\n                        <\/mml:math>\n                      <\/jats:alternatives>\n                    <\/jats:inline-formula>\n                    (NP score) on TrackingNet, which further demonstrates the superiority of our method.\n                  
<\/jats:p>","DOI":"10.1186\/s13634-022-00954-4","type":"journal-article","created":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T06:05:51Z","timestamp":1672207551000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Long-term tracking with transformer and template update"],"prefix":"10.1186","volume":"2022","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8113-9810","authenticated-orcid":false,"given":"Hongying","family":"Zhang","sequence":"first","affiliation":[]},{"given":"Xiaowen","family":"Peng","sequence":"additional","affiliation":[]},{"given":"Xuyong","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,12,28]]},"reference":[{"key":"954_CR1","doi-asserted-by":"crossref","unstructured":"S.M. Marvasti-Zadeh, L. Cheng, H. Ghanei-Yakhdan, S. Kasaei, Deep learning for visual tracking: a comprehensive survey. IEEE Trans. Intell. Transp. Syst. 23(5), 3943\u20133968 (2021)","DOI":"10.1109\/TITS.2020.3046478"},{"key":"954_CR2","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, \u0141. Kaiser, I. Polosukhin, Attention is all you need. Adv. Neural Inform. Process. Syst. 30 (2017)"},{"key":"954_CR3","doi-asserted-by":"crossref","unstructured":"T. Bian, Y. Hua, T. Song, Z. Xue, R. Ma, N. Robertson, H. Guan, Vtt: long-term visual tracking with transformers. In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9585\u20139592 (2021). IEEE","DOI":"10.1109\/ICPR48806.2021.9412156"},{"key":"954_CR4","doi-asserted-by":"crossref","unstructured":"B. Yan, H. Peng, J. Fu, et al. Learning spatio-temporal transformer for visual tracking. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 10448-10457 (2021)","DOI":"10.1109\/ICCV48922.2021.01028"},{"key":"954_CR5","unstructured":"N. Parmar, A. Vaswani, J. Uszkoreit, et al. Image transformer. 
In: International conference on machine learning. PMLR, pp. 4055\u20134064 (2018)"},{"key":"954_CR6","unstructured":"H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, H. J\u00e9gou, Training data-efficient image transformers & distillation through attention. In: International Conference on Machine Learning, pp. 10347\u201310357 (2021). PMLR"},{"key":"954_CR7","doi-asserted-by":"crossref","unstructured":"K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013778 (2016)","DOI":"10.1109\/CVPR.2016.90"},{"key":"954_CR8","unstructured":"Y. Zhang, D. Wang, L. Wang, J. Qi, H. Lu, Learning regression and verification networks for long-term visual tracking. arXiv preprint arXiv:1809.04320 (2018)"},{"key":"954_CR9","doi-asserted-by":"crossref","unstructured":"J. Valmadre, L. Bertinetto, J.F. Henriques, R. Tao, A. Vedaldi, A.W. Smeulders, P.H. Torr, E. Gavves, Long-term tracking in the wild: a benchmark. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 670\u2013685 (2018)","DOI":"10.1007\/978-3-030-01219-9_41"},{"key":"954_CR10","doi-asserted-by":"crossref","unstructured":"L. Bertinetto, J. Valmadre, J.F. Henriques, A. Vedaldi, P.H. Torr, Fully-convolutional siamese networks for object tracking. In: European Conference on Computer Vision, pp. 850\u2013865 (2016). Springer","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"954_CR11","doi-asserted-by":"crossref","unstructured":"P. Voigtlaender, J., Luiten, P.H. Torr, B. Leibe, Siam r-cnn: Visual tracking by re-detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 6578\u20136588 (2020)","DOI":"10.1109\/CVPR42600.2020.00661"},{"key":"954_CR12","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1016\/j.patrec.2018.09.017","volume":"127","author":"T Li","year":"2019","unstructured":"T. Li, S. Zhao, Q. Meng, Y. Chen, J. 
Shen, A stable long-term object tracking method with re-detection strategy. Pattern Recogn. Lett. 127, 119\u2013127 (2019)","journal-title":"Pattern Recogn. Lett."},{"issue":"3","key":"954_CR13","doi-asserted-by":"publisher","first-page":"730","DOI":"10.1109\/TCSVT.2018.2816570","volume":"29","author":"N Wang","year":"2018","unstructured":"N. Wang, W. Zhou, H. Li, Reliable re-detection for long-term tracking. IEEE Trans. Circuits Syst. Video Technol. 29(3), 730\u2013743 (2018)","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"954_CR14","doi-asserted-by":"crossref","unstructured":"L. Huang, X. Zhao, K. Huang, Globaltrack: A simple and strong baseline for long-term tracking. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11037\u201311044 (2020)","DOI":"10.1609\/aaai.v34i07.6758"},{"key":"954_CR15","doi-asserted-by":"crossref","unstructured":"L. Zhang, A. Gonzalez-Garcia, J.V.d. Weijer, M. Danelljan, F.S. Khan, Learning the model update for siamese trackers. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 4010\u20134019 (2019)","DOI":"10.1109\/ICCV.2019.00411"},{"key":"954_CR16","doi-asserted-by":"crossref","unstructured":"Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan, W. Hu, Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 101\u2013117 (2018)","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"954_CR17","doi-asserted-by":"crossref","unstructured":"K. Dai, Y. Zhang, D. Wang, J. Li, H. Lu, X. Yang, High-performance long-term tracking with meta-updater. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 6298\u20136307 (2020)","DOI":"10.1109\/CVPR42600.2020.00633"},{"key":"954_CR18","doi-asserted-by":"crossref","unstructured":"B. Li, J. Yan, W. Wu, Z. Zhu, X. Hu, High performance visual tracking with siamese region proposal network. 
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8971\u20138980 (2018)","DOI":"10.1109\/CVPR.2018.00935"},{"key":"954_CR19","unstructured":"Y. Cui, C. Jiang, L. Wang, G. Wu, Target transformed regression for accurate tracking. arXiv preprint arXiv:2104.00403 (2021)"},{"key":"954_CR20","doi-asserted-by":"crossref","unstructured":"Z. Fu, Q. Liu, Z. Fu, Y. Wang, Stmtrack: template-free visual tracking with space-time memory networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 13774\u201313783 (2021)","DOI":"10.1109\/CVPR46437.2021.01356"},{"key":"954_CR21","unstructured":"P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, P. Luo, Transtrack: Multiple object tracking with transformer (2020). arXiv preprint arXiv:2012.15460"},{"key":"954_CR22","doi-asserted-by":"crossref","unstructured":"N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, S. Zagoruyko, End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213\u2013229 (2020). Springer","DOI":"10.1007\/978-3-030-58452-8_13"},{"key":"954_CR23","doi-asserted-by":"crossref","unstructured":"T. Meinhardt, A. Kirillov, L. Leal-Taixe, C. Feichtenhofer, Trackformer: multi-object tracking with transformers. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 8844\u20138854 (2022)","DOI":"10.1109\/CVPR52688.2022.00864"},{"key":"954_CR24","unstructured":"A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, C. Heigold, S. Gelly, et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)"},{"key":"954_CR25","doi-asserted-by":"crossref","unstructured":"Y. Cui, C. Jiang, L. Wang, G. Wu, Mixformer: End-to-end tracking with iterative mixed attention. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 13608\u201313618 (2022)","DOI":"10.1109\/CVPR52688.2022.01324"},{"key":"954_CR26","doi-asserted-by":"crossref","unstructured":"H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, S. Savarese, Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 658\u2013666 (2019)","DOI":"10.1109\/CVPR.2019.00075"},{"key":"954_CR27","doi-asserted-by":"crossref","unstructured":"H. Fan, L, Lin, F. Yang, P. Chu, G. Deng, S. Yu, H. Bai, Y. Xu, C. Liao, H. Ling, Lasot: a high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 5374\u20135383 (2019)","DOI":"10.1109\/CVPR.2019.00552"},{"issue":"5","key":"954_CR28","doi-asserted-by":"publisher","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","volume":"43","author":"L Huang","year":"2019","unstructured":"L. Huang, X. Zhao, K. Huang, Got-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 43(5), 1562\u20131577 (2019)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"954_CR29","doi-asserted-by":"crossref","unstructured":"T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll\u00e1r, C.L. Zitnick, Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740\u2013755 (2014). Springer","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"954_CR30","doi-asserted-by":"crossref","unstructured":"M. Muller, A. Bibi, S. Giancola, S. Alsubaihi, B. Ghanem, Trackingnet: a large-scale dataset and benchmark for object tracking in the wild. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 
300\u2013317 (2018)","DOI":"10.1007\/978-3-030-01246-5_19"},{"key":"954_CR31","unstructured":"M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J.-K. K\u00e4m\u00e4r\u00e4inen, H.J. Chang, M. Danelljan, L. Cehovin, A. Luke\u017ei\u010d, et\u00a0al.: The ninth visual object tracking vot2021 challenge results. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 2711\u20132738 (2021)"},{"key":"954_CR32","doi-asserted-by":"publisher","unstructured":"Fast guided global interpolation for depth and motion. In: B. Leibe, J. Matas, N. Sebe, M. Welling (eds.) Computer Vision \u2013 ECCV 2016, Lecture Notes in Computer Science, vol. 9907, Chapter 44, pp. 717\u2013733 (2016). Springer. https:\/\/doi.org\/10.1007\/978-3-319-46487-9","DOI":"10.1007\/978-3-319-46487-9"},{"key":"954_CR33","doi-asserted-by":"crossref","unstructured":"A. Moudgil, V. Gandhi, Long-term visual object tracking benchmark. In: Asian Conference on Computer Vision, pp. 629\u2013645 (2018). Springer","DOI":"10.1007\/978-3-030-20890-5_40"},{"key":"954_CR34","doi-asserted-by":"crossref","unstructured":"B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, J. Yan, Siamrpn++: evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 4282\u20134291 (2019)","DOI":"10.1109\/CVPR.2019.00441"},{"key":"954_CR35","unstructured":"M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, J.-K. K\u00e4m\u00e4r\u00e4inen, M. Danelljan, L. \u010c. Zajc, A. Luke\u017ei\u010d, O. Drbohlav, et\u00a0al., The eighth visual object tracking vot2020 challenge results. In: European Conference on Computer Vision, pp. 547\u2013601 (2020). Springer"},{"issue":"7","key":"954_CR36","doi-asserted-by":"publisher","first-page":"1409","DOI":"10.1109\/TPAMI.2011.239","volume":"34","author":"Z Kalal","year":"2011","unstructured":"Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection. IEEE Trans. Pattern Anal. Mach. 
Intell. 34(7), 1409\u20131422 (2011)","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"954_CR37","doi-asserted-by":"crossref","unstructured":"B. Yan, H. Zhao, D. Wang, H. Lu, X. Yang, \u2019Skimming-perusal\u2019 tracking: a framework for real-time and robust long-term tracking. In: 2019 IEEE\/CVF International Conference on Computer Vision (ICCV) (2019)","DOI":"10.1109\/ICCV.2019.00247"},{"key":"954_CR38","doi-asserted-by":"crossref","unstructured":"S. Choi, J. Lee, Y. Lee, A. Hauptmann, Robust long-term object tracking via improved discriminative model prediction. In: European Conference on Computer Vision, pp. 602\u2013617 (2020). Springer","DOI":"10.1007\/978-3-030-68238-5_40"},{"key":"954_CR39","doi-asserted-by":"crossref","unstructured":"G. Bhat, M. Danelljan, L.V. Gool, R. Timofte, Learning discriminative model prediction for tracking. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 6182\u20136191 (2019)","DOI":"10.1109\/ICCV.2019.00628"},{"key":"954_CR40","doi-asserted-by":"crossref","unstructured":"M. Danelljan, L.V. Gool, R. Timofte, Probabilistic regression for visual tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 7183\u20137192 (2020)","DOI":"10.1109\/CVPR42600.2020.00721"},{"key":"954_CR41","doi-asserted-by":"crossref","unstructured":"M. Danelljan, G. Bhat, F.S. Khan, M. Felsberg, Atom: accurate tracking by overlap maximization. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 4660\u20134669 (2019)","DOI":"10.1109\/CVPR.2019.00479"},{"key":"954_CR42","doi-asserted-by":"crossref","unstructured":"Z. Zhang, B. Zhong, S. Zhang, Z. Tang, X. Liu, Z. Zhang, Distractor-aware fast tracking via dynamic convolutions and mot philosophy. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
1024\u20131033 (2021)","DOI":"10.1109\/CVPR46437.2021.00108"}],"container-title":["EURASIP Journal on Advances in Signal Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13634-022-00954-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13634-022-00954-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13634-022-00954-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T06:15:56Z","timestamp":1672208156000},"score":1,"resource":{"primary":{"URL":"https:\/\/asp-eurasipjournals.springeropen.com\/articles\/10.1186\/s13634-022-00954-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,28]]},"references-count":42,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,12]]}},"alternative-id":["954"],"URL":"https:\/\/doi.org\/10.1186\/s13634-022-00954-4","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-1941223\/v1","asserted-by":"object"}]},"ISSN":["1687-6180"],"issn-type":[{"value":"1687-6180","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,28]]},"assertion":[{"value":"11 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 November 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 December 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they 
have no competing interests","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"124"}}