{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,2]],"date-time":"2026-01-02T07:32:13Z","timestamp":1767339133078,"version":"3.37.3"},"reference-count":72,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2023,4,7]],"date-time":"2023-04-07T00:00:00Z","timestamp":1680825600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,4,7]],"date-time":"2023-04-07T00:00:00Z","timestamp":1680825600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["6202"],"award-info":[{"award-number":["6202"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Transformer-based trackers greatly improve tracking success rate and precision rate. The attention mechanism in Transformer can fully explore the context information across successive frames. Nevertheless, it ignores the equally important local information and structured spatial information, and irrelevant regions may also affect the template features and search region features. In this work, a multi-scale feature fusion network is designed with box attention and instance attention in an Encoder\u2013Decoder architecture based on Transformer. After extracting features, the local information and structured spatial information are learnt by multi-scale box attention, and the global context information is explored by instance attention. Box attention samples grid features from the region of interest. 
Therefore, it effectively focuses on the region of interest (ROI) and avoids the influence of irrelevant regions in feature extraction. At the same time, instance attention can also pay attention to the context information across successive frames, and avoid falling into a local optimum. The long-range feature dependencies are learned in this stage. Extensive experiments are conducted on six challenging tracking datasets to demonstrate the superiority of the proposed tracker MDTT, including UAV123, GOT-10k, LaSOT, VOT2018, TrackingNet, and NfS. In particular, the proposed tracker achieves an AUC score of <jats:inline-formula><jats:alternatives><jats:tex-math>$$64.7 \\% $$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>64.7<\/mml:mn>\n                    <mml:mo>%<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> on LaSOT, <jats:inline-formula><jats:alternatives><jats:tex-math>$$78.1 \\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>78.1<\/mml:mn>\n                    <mml:mo>%<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> on TrackingNet and a precision score of <jats:inline-formula><jats:alternatives><jats:tex-math>$$89.2 \\%$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:mn>89.2<\/mml:mn>\n                    <mml:mo>%<\/mml:mo>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> on UAV123, which outperforms the baseline and most recent advanced 
trackers.<\/jats:p>","DOI":"10.1007\/s40747-023-01043-1","type":"journal-article","created":{"date-parts":[[2023,4,7]],"date-time":"2023-04-07T08:02:41Z","timestamp":1680854561000},"page":"5793-5806","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Transformer tracking with multi-scale dual-attention"],"prefix":"10.1007","volume":"9","author":[{"given":"Jun","family":"Wang","sequence":"first","affiliation":[]},{"given":"Changwang","family":"Lai","sequence":"additional","affiliation":[]},{"given":"Wenshuang","family":"Zhang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6446-5873","authenticated-orcid":false,"given":"Yuanyun","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Chenchen","family":"Meng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,4,7]]},"reference":[{"key":"1043_CR1","first-page":"1","volume":"2","author":"K Chen","year":"2022","unstructured":"Chen K, Guo X, Xu L, Zhou T, Li R (2022) A robust target tracking algorithm based on spatial regularization and adaptive updating model. Complex Intell Syst 2:1\u201315","journal-title":"Complex Intell Syst"},{"key":"1043_CR2","first-page":"2","volume":"30","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. Adv Neural Inform Process Syst 30:2","journal-title":"Adv Neural Inform Process Syst"},{"key":"1043_CR3","doi-asserted-by":"crossref","unstructured":"Wang N, Zhou W, Wang J, Li H (2021) Transformer meets tracker: Exploiting temporal context for robust visual tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 
1571\u20131580","DOI":"10.1109\/CVPR46437.2021.00162"},{"key":"1043_CR4","doi-asserted-by":"crossref","unstructured":"Mueller M, Smith N, Ghanem B (2016) A benchmark and simulator for uav tracking. In: European conference on computer vision, Springer, pp. 445\u2013461","DOI":"10.1007\/978-3-319-46448-0_27"},{"key":"1043_CR5","unstructured":"Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Cehovin Zajc L, Vojir T, Bhat G, Lukezic A, Eldesokey A et al (2018) The sixth visual object tracking vot2018 challenge results. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops"},{"issue":"5","key":"1043_CR6","doi-asserted-by":"publisher","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","volume":"43","author":"L Huang","year":"2019","unstructured":"Huang L, Zhao X, Huang K (2019) Got-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Trans Pattern Anal Mach Intell 43(5):1562\u20131577","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1043_CR7","unstructured":"Kiani Galoogahi H, Fagg A, Huang C, Ramanan D, Lucey S (2017) Need for speed: A benchmark for higher frame rate object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1125\u20131134"},{"key":"1043_CR8","doi-asserted-by":"crossref","unstructured":"Fan H, Lin L, Yang F, Chu P, Deng G, Yu S, Bai H, Xu Y, Liao C, Ling H (2019) Lasot: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5374\u20135383","DOI":"10.1109\/CVPR.2019.00552"},{"key":"1043_CR9","doi-asserted-by":"crossref","unstructured":"Muller M, Bibi A, Giancola S, Alsubaihi S, Ghanem B (2018) Trackingnet: A large-scale dataset and benchmark for object tracking in the wild. 
In: Proceedings of the European conference on computer vision (ECCV), pp 300\u2013317","DOI":"10.1007\/978-3-030-01246-5_19"},{"key":"1043_CR10","doi-asserted-by":"crossref","unstructured":"Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PH (2016) Fully-convolutional siamese networks for object tracking. In: European conference on computer vision, Springer, pp 850\u2013865","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"1043_CR11","doi-asserted-by":"crossref","unstructured":"Dong X, Shen J (2018) Triplet loss in siamese network for object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 459\u2013474","DOI":"10.1007\/978-3-030-01261-8_28"},{"key":"1043_CR12","doi-asserted-by":"crossref","unstructured":"Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8971\u20138980","DOI":"10.1109\/CVPR.2018.00935"},{"key":"1043_CR13","doi-asserted-by":"crossref","unstructured":"Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) Siamrpn++: Evolution of siamese visual tracking with very deep networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 4282\u20134291","DOI":"10.1109\/CVPR.2019.00441"},{"key":"1043_CR14","doi-asserted-by":"crossref","unstructured":"Abdelpakey MH, Shehata MS, Mohamed MM (2018) Denssiam: End-to-end densely-siamese network with self-attention model for object tracking. In: International Symposium on Visual Computing, Springer, pp. 463\u2013473","DOI":"10.1007\/978-3-030-03801-4_41"},{"key":"1043_CR15","doi-asserted-by":"crossref","unstructured":"Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132\u20137141","DOI":"10.1109\/CVPR.2018.00745"},{"key":"1043_CR16","doi-asserted-by":"crossref","unstructured":"Woo S, Park J, Lee JY, Kweon IS (2018) Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"1043_CR17","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2021.107338","volume":"229","author":"Z Xiao","year":"2021","unstructured":"Xiao Z, Xu X, Xing H, Song F, Wang X, Zhao B (2021) A federated learning system with enhanced feature extraction for human activity recognition. Knowl-Based Syst 229:107338","journal-title":"Knowl-Based Syst"},{"issue":"11","key":"1043_CR18","doi-asserted-by":"publisher","first-page":"8583","DOI":"10.1002\/int.22957","volume":"37","author":"H Xing","year":"2022","unstructured":"Xing H, Xiao Z, Zhan D, Luo S, Dai P, Li K (2022) Selfmatch: Robust semisupervised time-series classification with self-distillation. Int J Intell Syst 37(11):8583\u20138610","journal-title":"Int J Intell Syst"},{"key":"1043_CR19","doi-asserted-by":"crossref","unstructured":"Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp 10012\u201310022","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"1043_CR20","doi-asserted-by":"crossref","unstructured":"Xia Z, Pan X, Song S, Li LE, Huang G (2022) Vision transformer with deformable attention. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 4794\u20134803","DOI":"10.1109\/CVPR52688.2022.00475"},{"key":"1043_CR21","doi-asserted-by":"crossref","unstructured":"Yan B, Peng H, Fu J, Wang D, Lu H (2021) Learning spatio-temporal transformer for visual tracking. 
In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp 10448\u201310457","DOI":"10.1109\/ICCV48922.2021.01028"},{"key":"1043_CR22","unstructured":"Lin L, Fan H, Xu Y, Ling H (2021) Swintrack: A simple and strong baseline for transformer tracking, arXiv preprint arXiv:2112.00995"},{"key":"1043_CR23","unstructured":"Zhao M, Okada K, Inaba M (2021) Trtr: Visual tracking with transformer, arXiv preprint arXiv:2105.03817"},{"key":"1043_CR24","doi-asserted-by":"crossref","unstructured":"Chen X, Yan B, Zhu J, Wang D, Yang X, Lu H (2021) Transformer tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 8126\u20138135","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"1043_CR25","doi-asserted-by":"crossref","unstructured":"Mayer C, Danelljan M, Bhat G, Paul M, Paudel DP, Yu F, Van Gool L (2022) Transforming model prediction for tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 8731\u20138740","DOI":"10.1109\/CVPR52688.2022.00853"},{"key":"1043_CR26","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"key":"1043_CR27","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"1043_CR28","doi-asserted-by":"crossref","unstructured":"Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: Common objects in context. 
In: European conference on computer vision, Springer, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"1043_CR29","unstructured":"Kingma DP, Ba J (2014) Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980"},{"key":"1043_CR30","doi-asserted-by":"crossref","unstructured":"Bhat G, Danelljan M, Gool LV, Timofte R (2019) Learning discriminative model prediction for tracking. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 6182\u20136191","DOI":"10.1109\/ICCV.2019.00628"},{"key":"1043_CR31","doi-asserted-by":"crossref","unstructured":"Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware siamese networks for visual object tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 101\u2013117","DOI":"10.1007\/978-3-030-01240-3_7"},{"key":"1043_CR32","doi-asserted-by":"crossref","unstructured":"Cao Z, Fu C, Ye J, Li B, Li Y (2021) Hift: Hierarchical feature transformer for aerial tracking. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 15457\u201315466","DOI":"10.1109\/ICCV48922.2021.01517"},{"key":"1043_CR33","doi-asserted-by":"crossref","unstructured":"Cao Z, Huang Z, Pan L, Zhang S, Liu Z, Fu C (2022) Tctrack: Temporal contexts for aerial tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 14798\u201314808","DOI":"10.1109\/CVPR52688.2022.01438"},{"key":"1043_CR34","doi-asserted-by":"crossref","unstructured":"Fu Z, Liu Q, Fu Z, Wang Y (2021) Stmtrack: Template-free visual tracking with space-time memory networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 13774\u201313783","DOI":"10.1109\/CVPR46437.2021.01356"},{"key":"1043_CR35","doi-asserted-by":"crossref","unstructured":"Guo D, Shao Y, Cui Y, Wang Z, Zhang L, Shen C (2021) Graph attention tracking. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9543\u20139552","DOI":"10.1109\/CVPR46437.2021.00942"},{"key":"1043_CR36","doi-asserted-by":"crossref","unstructured":"Yu Y, Xiong Y, Huang E, Scott MR (2020) Deformable siamese attention networks for visual object tracking. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6728\u20136737","DOI":"10.1109\/CVPR42600.2020.00676"},{"key":"1043_CR37","doi-asserted-by":"crossref","unstructured":"Tang F, Ling Q (2022) Ranking-based siamese visual tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 8741\u20138750","DOI":"10.1109\/CVPR52688.2022.00854"},{"key":"1043_CR38","doi-asserted-by":"crossref","unstructured":"Guo M, Zhang Z, Fan H, Jing L, Lyu Y, Li B, Hu W (2022) Learning target-aware representation for visual tracking via informative interactions, arXiv preprint arXiv:2201.02526","DOI":"10.24963\/ijcai.2022\/130"},{"key":"1043_CR39","doi-asserted-by":"crossref","unstructured":"Ma F, Shou MZ, Zhu L, Fan H, Xu Y, Yang Y, Yan Z (2022) Unified transformer tracker for object tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 8781\u20138790","DOI":"10.1109\/CVPR52688.2022.00858"},{"key":"1043_CR40","unstructured":"Cui Y, Jiang C, Wang L, Wu G (2021) Target transformed regression for accurate tracking, arXiv preprint arXiv:2104.00403"},{"key":"1043_CR41","doi-asserted-by":"crossref","unstructured":"Xie F, Wang C, Wang G, Cao Y, Yang W, Zeng W (2022) Correlation-aware deep tracking. 
In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 8751\u20138760","DOI":"10.1109\/CVPR52688.2022.00855"},{"key":"1043_CR42","unstructured":"Danelljan M, Bhat G (2019) Pytracking: Visual tracking library based on pytorch"},{"key":"1043_CR43","doi-asserted-by":"crossref","unstructured":"Zhang Z, Liu Y, Wang X, Li B, Hu W (2021) Learn to match: Automatic matching network design for visual tracking. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp 13339\u201313348","DOI":"10.1109\/ICCV48922.2021.01309"},{"key":"1043_CR44","doi-asserted-by":"crossref","unstructured":"Voigtlaender P, Luiten J, Torr PH, Leibe B (2020) Siam r-cnn: Visual tracking by re-detection. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6578\u20136588","DOI":"10.1109\/CVPR42600.2020.00661"},{"key":"1043_CR45","doi-asserted-by":"crossref","unstructured":"Zhou Z, Pei W, Li X, Wang H, Zheng F, He Z (2021) Saliency-associated object tracking. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp 9866\u20139875","DOI":"10.1109\/ICCV48922.2021.00972"},{"key":"1043_CR46","doi-asserted-by":"crossref","unstructured":"Bhat G, Danelljan M, Gool LV, Timofte R (2020) Know your surroundings: Exploiting scene information for object tracking. In: European Conference on Computer Vision, Springer, pp 205\u2013221","DOI":"10.1007\/978-3-030-58592-1_13"},{"key":"1043_CR47","unstructured":"Cui Y, Jiang C, Wang L, Wu G (2020) Fully convolutional online tracking, arXiv preprint arXiv:2004.07109"},{"key":"1043_CR48","doi-asserted-by":"crossref","unstructured":"Danelljan M, Gool LV, Timofte R (2020) Probabilistic regression for visual tracking. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 7183\u20137192","DOI":"10.1109\/CVPR42600.2020.00721"},{"key":"1043_CR49","doi-asserted-by":"crossref","unstructured":"Nie J, Wu H, He Z, Yang Y, Gao M, Dong Z (2022) Learning localization-aware target confidence for siamese visual tracking, arXiv preprint arXiv:2204.14093","DOI":"10.1109\/TMM.2022.3206668"},{"key":"1043_CR50","doi-asserted-by":"crossref","unstructured":"Zhang Z, Peng H, Fu J, Li B, Hu W (2020) Ocean: Object-aware anchor-free tracking. In: European Conference on Computer Vision, Springer, pp 771\u2013787","DOI":"10.1007\/978-3-030-58589-1_46"},{"key":"1043_CR51","doi-asserted-by":"crossref","unstructured":"Lukezic A, Matas J, Kristan M (2020) D3s-a discriminative single shot segmentation tracker. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 7133\u20137142","DOI":"10.1109\/CVPR42600.2020.00716"},{"key":"1043_CR52","doi-asserted-by":"crossref","unstructured":"Guo D, Wang J, Cui Y, Wang Z, Chen S (2020) Siamcar: Siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6269\u20136277","DOI":"10.1109\/CVPR42600.2020.00630"},{"key":"1043_CR53","doi-asserted-by":"crossref","unstructured":"Danelljan M, Bhat G, Khan FS, Felsberg M (2019) Atom: Accurate tracking by overlap maximization. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 4660\u20134669","DOI":"10.1109\/CVPR.2019.00479"},{"key":"1043_CR54","doi-asserted-by":"crossref","unstructured":"Xie F, Wang C, Wang G, Yang W, Zeng W (2021) Learning tracking representations via dual-branch fully transformer networks. 
In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp 2688\u20132697","DOI":"10.1109\/ICCVW54120.2021.00303"},{"key":"1043_CR55","doi-asserted-by":"crossref","unstructured":"Xing D, Evangeliou N, Tsoukalas A, Tzes A (2022) Siamese transformer pyramid networks for real-time uav tracking. In: Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, pp 2139\u20132148","DOI":"10.1109\/WACV51458.2022.00196"},{"key":"1043_CR56","unstructured":"Shen Q, Li X, Meng F, Liang Y (2022) Context-aware visual tracking with joint meta-updating, arXiv preprint arXiv:2204.01513"},{"key":"1043_CR57","doi-asserted-by":"crossref","unstructured":"Dai K, Zhang Y, Wang D, Li J, Lu H, Yang X (2020) High-performance long-term tracking with meta- updater. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 6298\u20136307","DOI":"10.1109\/CVPR42600.2020.00633"},{"key":"1043_CR58","unstructured":"Zhu J, Chen X, Wang D, Zhao W, Lu H (2022) Srrt: Search region regulation tracking, arXiv preprint arXiv:2207.04438"},{"key":"1043_CR59","doi-asserted-by":"crossref","unstructured":"Du F, Liu P, Zhao W, Tang X (2020) Correlation-guided attention for corner detection based visual tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 6836\u20136845","DOI":"10.1109\/CVPR42600.2020.00687"},{"key":"1043_CR60","doi-asserted-by":"crossref","unstructured":"Shen Q, Qiao L, Guo J, Li P, Li X, Li B, Feng W, Gan W, Wu W, Ouyang W (2022) Unsupervised learning of accurate siamese tracking. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 8101\u20138110","DOI":"10.1109\/CVPR52688.2022.00793"},{"key":"1043_CR61","doi-asserted-by":"crossref","unstructured":"Wang G, Luo C, Sun X, Xiong Z, Zeng W (2020) Tracking by instance detection: A meta-learning approach. 
In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 6288\u20136297","DOI":"10.1109\/CVPR42600.2020.00632"},{"key":"1043_CR62","doi-asserted-by":"crossref","unstructured":"Liao B, Wang C, Wang Y, Wang Y, Yin J (2020) Pg-net: Pixel to global matching network for visual tracking. In: European Conference on Computer Vision, Springer, pp 429\u2013444","DOI":"10.1007\/978-3-030-58542-6_26"},{"key":"1043_CR63","doi-asserted-by":"crossref","unstructured":"Xu Y, Wang Z, Li Z, Yuan Y, Yu G (2020) Siamfc++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI Conference on Artificial Intelligence 34:12549\u201312556","DOI":"10.1609\/aaai.v34i07.6944"},{"issue":"11","key":"1043_CR64","doi-asserted-by":"publisher","first-page":"5596","DOI":"10.1109\/TIP.2019.2919201","volume":"28","author":"T Xu","year":"2019","unstructured":"Xu T, Feng Z-H, Wu X-J, Kittler J (2019) Learning adaptive discriminative correlation filters via temporal consistency preserving spatial feature selection for robust visual object tracking. IEEE Trans Image Process 28(11):5596\u20135609","journal-title":"IEEE Trans Image Process"},{"key":"1043_CR65","doi-asserted-by":"crossref","unstructured":"Bhat G, Johnander J, Danelljan M, Khan FS, Felsberg M (2018) Unveiling the power of deep tracking. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 483\u2013498","DOI":"10.1007\/978-3-030-01216-8_30"},{"key":"1043_CR66","doi-asserted-by":"crossref","unstructured":"Zheng L, Tang M, Chen Y, Wang J, Lu H (2020) Learning feature embeddings for discriminant model based tracking. 
In: European Conference on Computer Vision, Springer, pp 759\u2013775","DOI":"10.1007\/978-3-030-58555-6_45"},{"key":"1043_CR67","doi-asserted-by":"publisher","first-page":"8785","DOI":"10.1109\/TIP.2021.3120305","volume":"30","author":"F Tang","year":"2021","unstructured":"Tang F, Ling Q (2021) Learning to rank proposals for siamese visual tracking. IEEE Trans Image Process 30:8785\u20138796","journal-title":"IEEE Trans Image Process"},{"key":"1043_CR68","doi-asserted-by":"crossref","unstructured":"Fan H, Ling H (2019) Siamese cascaded region proposal networks for real-time visual tracking, in: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp. 7952\u20137961","DOI":"10.1109\/CVPR.2019.00814"},{"key":"1043_CR69","doi-asserted-by":"crossref","unstructured":"Fan H, Ling H (2020) Cract: Cascaded regression-align-classification for robust visual tracking, arXiv preprint arXiv:2011.12483","DOI":"10.1109\/IROS51168.2021.9636803"},{"key":"1043_CR70","doi-asserted-by":"crossref","unstructured":"Zhang Z, Peng H (2019) Deeper and wider siamese networks for real-time visual tracking, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4591\u20134600","DOI":"10.1109\/CVPR.2019.00472"},{"key":"1043_CR71","doi-asserted-by":"crossref","unstructured":"Derrac J, Garc\u00eda S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. 
Swarm Evol Comput 1(1):3\u201318","DOI":"10.1016\/j.swevo.2011.02.002"},{"key":"1043_CR72","doi-asserted-by":"crossref","unstructured":"Cui Y, Jiang C, Wang L, Wu G (2022) Mixformer: End-to-end tracking with iterative mixed attention, in: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp 13608\u201313618","DOI":"10.1109\/CVPR52688.2022.01324"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01043-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-023-01043-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-023-01043-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T17:26:15Z","timestamp":1695403575000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-023-01043-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,7]]},"references-count":72,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["1043"],"URL":"https:\/\/doi.org\/10.1007\/s40747-023-01043-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2023,4,7]]},"assertion":[{"value":"22 November 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 March 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 
2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}