{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:20:52Z","timestamp":1773843652315,"version":"3.50.1"},"reference-count":49,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T00:00:00Z","timestamp":1704844800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Visual tracking is a crucial task in computer vision that has been applied in diverse fields. Recently, transformer architecture has been widely applied in visual tracking and has become a mainstream framework instead of the Siamese structure. Although transformer-based trackers have demonstrated remarkable accuracy in general circumstances, their performance in occluded scenes remains unsatisfactory. This is primarily due to their inability to recognize incomplete target appearance information when the target is occluded. To address this issue, we propose a novel transformer tracking approach referred to as TATT, which integrates a target-aware transformer network and a hard occlusion instance generation module. The target-aware transformer network utilizes an encoder-decoder structure to facilitate interaction between template and search features, extracting target information in the template feature to enhance the unoccluded parts of the target in the search features. It can directly predict the boundary between the target region and the background to generate tracking results. The hard occlusion instance generation module employs multiple image similarity calculation methods to select an image pitch in video sequences that is most similar to the target and generate an occlusion instance mimicking real scenes without adding an extra network. Experiments on five benchmarks, including LaSOT, TrackingNet, Got10k, OTB100, and UAV123, demonstrate that our tracker achieves promising performance while running at approximately 41 fps on GPU. Specifically, our tracker achieves the highest AUC scores of 65.5 and 61.2% in partial and full occlusion evaluations on LaSOT, respectively.<\/jats:p>","DOI":"10.3389\/fnbot.2023.1323188","type":"journal-article","created":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T04:22:18Z","timestamp":1704860538000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Target-aware transformer tracking with hard occlusion instance generation"],"prefix":"10.3389","volume":"17","author":[{"given":"Dingkun","family":"Xiao","sequence":"first","affiliation":[]},{"given":"Zhenzhong","family":"Wei","sequence":"additional","affiliation":[]},{"given":"Guangjun","family":"Zhang","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,1,10]]},"reference":[{"key":"ref1","doi-asserted-by":"publisher","first-page":"993","DOI":"10.1007\/s00371-020-01848-y","article-title":"A survey on online learning for visual tracking","volume":"37","author":"Abbass","year":"2021","journal-title":"Vis. 
Comput."},{"key":"ref2","author":"Bertinetto","year":"2016"},{"key":"ref3","author":"Bhat","year":"2019"},{"key":"ref4","author":"Bhat","year":"2020"},{"key":"ref5","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-030-58452-8_13","article-title":"End-to-end object detection with transformers","volume-title":"European conference on computer vision","author":"Carion","year":"2020"},{"key":"ref6","author":"Chen","year":"2021"},{"key":"ref7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TPAMI.2022.3195759","article-title":"SiamBAN: target-aware tracking with Siamese box adaptive network","volume":"45","author":"Chen","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref8","author":"Cui","year":"2022"},{"key":"ref9","author":"Danelljan","year":"2019"},{"key":"ref10","author":"DeVries","year":"2017"},{"key":"ref11","author":"Fan","year":"2019"},{"key":"ref12","doi-asserted-by":"publisher","first-page":"1296","DOI":"10.1109\/TCSVT.2020.2987601","article-title":"Siamon: Siamese occlusion-aware network for visual tracking","volume":"31","author":"Fan","year":"2021","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref13","author":"Guo","year":"2020"},{"key":"ref14","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1109\/TPAMI.2022.3152247","article-title":"A survey on vision transformer","volume":"45","author":"Han","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref15","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1109\/TCYB.2016.2514714","article-title":"Robust object tracking via key patch sparse representation","volume":"47","author":"He","year":"2016","journal-title":"IEEE Transact Cybernet"},{"key":"ref16","doi-asserted-by":"publisher","first-page":"3072","DOI":"10.1109\/TPAMI.2022.3172932","article-title":"Siammask: a framework for fast online object tracking and segmentation","volume":"45","author":"Hu","year":"2023","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref17","doi-asserted-by":"publisher","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","article-title":"Got-10k: a large high-diversity benchmark for generic object tracking in the wild","volume":"43","author":"Huang","year":"2019","journal-title":"IEEE Transact Pattern Analys Machine Intell"},{"key":"ref18","doi-asserted-by":"publisher","first-page":"6552","DOI":"10.48550\/arXiv.2112.02838","article-title":"Visual object tracking with discriminative filters and siamese networks: a survey and outlook","volume":"45","author":"Javed","year":"2022","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref19","doi-asserted-by":"publisher","first-page":"5497","DOI":"10.1109\/TNNLS.2021.3136907","article-title":"Deep learning in visual tracking: a review","volume":"34","author":"Jiao","year":"2021","journal-title":"IEEE Transact Neural Netw Learn Syst"},{"key":"ref20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3505244","article-title":"Transformers in vision: a survey","volume":"54","author":"Khan","year":"2022","journal-title":"ACM Comput Surveys"},{"key":"ref21","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1016\/j.patcog.2017.11.007","article-title":"Deep visual tracking: review and experimental comparison","volume":"76","author":"Li","year":"2018","journal-title":"Pattern Recogn."},{"key":"ref22","doi-asserted-by":"publisher","first-page":"21002","DOI":"10.48550\/arXiv.2006.04388","article-title":"Generalized focal loss: learning qualified and distributed bounding boxes for dense object detection","volume":"33","author":"Li","year":"2020","journal-title":"Adv. Neural Inf. Proces. Syst."},{"key":"ref23","author":"Li","year":"2019"},{"key":"ref24","author":"Li","year":"2018"},{"key":"ref25","doi-asserted-by":"publisher","first-page":"16743","DOI":"10.48550\/arXiv.2112.00995","article-title":"Swintrack: a simple and strong baseline for transformer tracking","volume":"35","author":"Lin","year":"2022","journal-title":"Adv. Neural Inf. Proces. Syst."},{"key":"ref26","author":"Lin","year":"2014"},{"key":"ref27","author":"Lukezic","year":"2020"},{"key":"ref28","doi-asserted-by":"publisher","first-page":"3943","DOI":"10.1109\/TITS.2020.3046478","article-title":"Deep learning for visual tracking: a comprehensive survey","volume":"23","author":"Marvasti-Zadeh","year":"2021","journal-title":"IEEE Trans. Intell. Transp. Syst."},{"key":"ref29","author":"Mayer","year":"2022"},{"key":"ref30","author":"Mayer","year":"2021"},{"key":"ref31","author":"Mueller","year":"2016"},{"key":"ref32","author":"M\u00fcller","year":"2018"},{"key":"ref33","author":"Song","year":"2022"},{"key":"ref34","author":"Touvron","year":"2021"},{"key":"ref9001","first-page":"30","author":"Vaswani","year":"2017"},{"key":"ref35","author":"Wang","year":"2018"},{"key":"ref36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/TMM.2023.3264851","article-title":"CMAT: integrating convolution mixer and self-attention for visual tracking","author":"Wang","year":"2023","journal-title":"IEEE Trans. Multimed."},{"key":"ref37","doi-asserted-by":"publisher","first-page":"730","DOI":"10.1109\/TCSVT.2018.2816570","article-title":"Reliable re-detection for long-term tracking","volume":"29","author":"Wang","year":"2018","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref38","author":"Wang","year":"2021"},{"key":"ref39","author":"Wu","year":"2013"},{"key":"ref40","author":"Xie","year":"2022"},{"key":"ref41","author":"Xu","year":"2020"},{"key":"ref42","author":"Yan","year":"2021"},{"key":"ref43","doi-asserted-by":"publisher","first-page":"1235","DOI":"10.1109\/TCSVT.2016.2527358","article-title":"Part-based robust tracking using online latent structured learning","volume":"27","author":"Yao","year":"2016","journal-title":"IEEE Trans. Circuits Syst. 
Video Technol."},{"key":"ref44","author":"Ye","year":"2019"},{"key":"ref45","author":"Yu","year":"2020"},{"key":"ref46","author":"Zhang","year":"2020"},{"key":"ref47","doi-asserted-by":"publisher","first-page":"2536","DOI":"10.1007\/s11263-021-01487-3","article-title":"Learning regression and verification networks for robust long-term tracking","volume":"129","author":"Zhang","year":"2021","journal-title":"Int. J. Comput. Vis."},{"key":"ref48","author":"Zhao","year":"2021"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1323188\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,1,10]],"date-time":"2024-01-10T04:22:20Z","timestamp":1704860540000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1323188\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,10]]},"references-count":49,"alternative-id":["10.3389\/fnbot.2023.1323188"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2023.1323188","relation":{},"ISSN":["1662-5218"],"issn-type":[{"value":"1662-5218","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,10]]},"article-number":"1323188"}}
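The record above is a Crossref REST API work message ("message-type":"work"), so the same fields can be retrieved and read programmatically. Below is a minimal Python sketch, assuming the `requests` package and network access; the DOI and the field names (`message`, `title`, `container-title`, `published`, `reference`, `references-count`) are taken directly from the record above, and `https://api.crossref.org/works/{DOI}` is the standard Crossref works route. This is an illustrative sketch, not part of the record itself.

```python
# Minimal sketch: fetch the Crossref work record shown above and read a few
# of its fields. Assumes the `requests` package and network access.
import requests

DOI = "10.3389/fnbot.2023.1323188"

resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]                   # the "message" object shown above

title = work["title"][0]                        # "Target-aware transformer tracking ..."
journal = work["container-title"][0]            # "Frontiers in Neurorobotics"
published = work["published"]["date-parts"][0]  # [2024, 1, 10]
n_refs = work["references-count"]               # 49

# Reference entries that carry a publisher-asserted DOI; many entries in the
# record above list only an author and year, so they are skipped here.
cited_dois = [r["DOI"] for r in work.get("reference", []) if "DOI" in r]

print(title)
print(f"{journal}, published {published}, {n_refs} references, "
      f"{len(cited_dois)} with DOIs")
```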