{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T14:34:28Z","timestamp":1777127668080,"version":"3.51.4"},"reference-count":33,"publisher":"MDPI AG","issue":"19","license":[{"start":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T00:00:00Z","timestamp":1664409600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Natural Science Foundation of Heilongjiang Province","award":["LH2020F040"],"award-info":[{"award-number":["LH2020F040"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>To solve the insufficient ability of the current Thermal InfraRed (TIR) tracking methods to resist occlusion and interference from similar targets, we propose a TIR tracking method based on efficient global information perception. In order to efficiently obtain the global semantic information of images, we use the Transformer structure for feature extraction and fusion. In the feature extraction process, the Focal Transformer structure is used to improve the efficiency of remote information modeling, which is highly similar to the human attention mechanism. The feature fusion process supplements the relative position encoding to the standard Transformer structure, which allows the model to continuously consider the influence of positional relationships during the learning process. It can also generalize to capture the different positional information for different input sequences. Thus, it makes the Transformer structure model the semantic information contained in images more efficiently. To further improve the tracking accuracy and robustness, the heterogeneous bi-prediction head is utilized in the object prediction process. The fully connected sub-network is responsible for the classification prediction of the foreground or background. 
The convolutional sub-network is responsible for the regression prediction of the object bounding box. In order to alleviate the contradiction between the vast demand for training data of the Transformer model and the insufficient scale of the TIR tracking dataset, the LaSOT-TIR dataset is generated with the generative adversarial network for network training. Our method achieves the best performance compared with other state-of-the-art trackers on the VOT2015-TIR, VOT2017-TIR, PTB-TIR and LSOTB-TIR datasets, and performs outstandingly especially when dealing with severe occlusion or interference from similar objects.<\/jats:p>","DOI":"10.3390\/s22197408","type":"journal-article","created":{"date-parts":[[2022,9,29]],"date-time":"2022-09-29T23:09:29Z","timestamp":1664492969000},"page":"7408","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Thermal Infrared Tracking Method Based on Efficient Global Information Perception"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6271-5576","authenticated-orcid":false,"given":"Long","family":"Zhao","sequence":"first","affiliation":[{"name":"Big Data Institute, East University of Heilongjiang, Harbin 150066, China"},{"name":"College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiaoye","family":"Liu","sequence":"additional","affiliation":[{"name":"Big Data Institute, East University of Heilongjiang, Harbin 150066, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Honge","family":"Ren","sequence":"additional","affiliation":[{"name":"College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China"},{"name":"Forestry Intelligent Equipment Engineering Research Center, Harbin 150040, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lingjixuan","family":"Xue","sequence":"additional","affiliation":[{"name":"College of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,9,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1016\/j.patcog.2019.106977","article-title":"RGB-T Object Tracking: Benchmark and Baseline","volume":"96","author":"Li","year":"2019","journal-title":"J. Pattern Recognit."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liu, Q., Li, X., He, Z., Li, C., and Zheng, F. (2020, January 12\u201316). LSOTB-TIR:A Large-Scale High-Diversity Thermal Infrared Object Tracking Benchmark. Proceedings of the ACM International Conference on Multimedia, Seattle, WA, USA.","DOI":"10.1145\/3394171.3413922"},{"key":"ref_3","unstructured":"Felsberg, M., and Kristan, M. (2016, January 8\u201316). The Thermal Infrared Visual Object Tracking VOT 2016 Challenge Results. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Song, Y., Chao, M., Gong, L., Zhang, J., and Yang, M.H. (2017, January 22\u201329). CREST: Convolutional Residual Learning for Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","DOI":"10.1109\/ICCV.2017.279"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Li, X., Ma, C., and Wu, B. (2019, January 16\u201320). Target-Aware Deep Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00146"},{"key":"ref_6","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). 
Attention is all you need. Proceedings of the International Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_7","unstructured":"Yang, J., Li, C., and Zhang, P. (2021, January 6\u201314). Focal Self-attention for Local-Global Interactions in Vision Transformers. Proceedings of the International Conference on Neural Information Processing Systems, Online."},{"key":"ref_8","first-page":"1503","article-title":"Application of Improved Particle Filter Algorithm in Deep Space Infrared Small Target Tracking","volume":"43","author":"Ye","year":"2015","journal-title":"J. Acta Electron. Sin."},{"key":"ref_9","first-page":"2164","article-title":"Tracking of Infrared Small-Target Based on Improved Mean-Shift Algorithm","volume":"43","author":"Zhang","year":"2014","journal-title":"J. Infrared Laser Eng."},{"key":"ref_10","unstructured":"Comaniciu, D., Ramesh, V., and Meer, P. (2000, January 15). Real-time tracking of non-rigid objects using mean shift. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hilton Head, SC, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., and Khan, F.S. (2017, January 21\u201326). ECO: Efficient Convolution Operators for Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.knosys.2017.07.032","article-title":"Deep Convolutional Neural Networks for Thermal Infrared Object Tracking","volume":"134","author":"Liu","year":"2017","journal-title":"J. Knowl.-Based Syst."},{"key":"ref_13","unstructured":"Liu, Q., Li, X., He, Z., Fan, N., and Liang, Y. (February, January 27). Multi-Task Driven Feature Models for Thermal Infrared Tracking. 
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_14","unstructured":"Shi, J., Chen, R., and Wang, H. (2015, January 7\u201312). Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_15","first-page":"192","article-title":"Infrared Pedestrian Target Tracking Method Based on Video Prediction","volume":"52","author":"Liu","year":"2020","journal-title":"J. Harbin Inst. Technol."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Li, M., Peng, L., Chen, Y., Huang, S., Qin, F., and Peng, Z. (2019). Mask Sparse Representation Based on Semantic Features for Thermal Infrared Target Tracking. J. Remote Sens., 11.","DOI":"10.3390\/rs11171967"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wu, H., Li, W., Li, W., and Liu, G. (2020, January 14\u201319). A Real Time Robust Approach for Tracking UAVs in Infrared Videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA.","DOI":"10.1109\/CVPRW50498.2020.00524"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"32383","DOI":"10.1109\/ACCESS.2019.2903829","article-title":"Two streams multiple-model object tracker for thermal infrared video","volume":"7","author":"Zulkifley","year":"2019","journal-title":"J. IEEE Access"},{"key":"ref_19","unstructured":"Dosovitskiy, A., Beyer, L., and Kolesnikov, A. (2021, January 3\u20137). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Proceedings of the International Conference on Learning Representations (Virtual), Online."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chen, X., Yan, B., Zhu, J., Wang, D., Yang, X., and Lu, H. (2021, January 20\u201325). Transformer Tracking. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00803"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yan, B., Peng, H., and Fu, J. (2021, January 10\u201317). Learning Spatio-Temporal Transformer for Visual Tracking. Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01028"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., and Guo, B. (2021, January 10\u201317). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.00986"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Wu, Y., Chen, Y., and Yuan, L. (2020, January 13\u201319). Rethinking Classification and Localization for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01020"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Shaw, P., Uszkoreit, J., and Vaswani, A. (2018). Self-Attention with Relative Position Representations. arXiv.","DOI":"10.18653\/v1\/N18-2074"},{"key":"ref_25","unstructured":"Goodfellow, I., Pouget-Abadie, J., and Mirza, M. (2014, January 2\u20138). Generative Adversarial Nets. Proceedings of the International Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Isola, P., Zhu, J.Y., and Zhou, T. (2017, January 21\u201326). Image-to-Image Translation with Conditional Adversarial Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.632"},{"key":"ref_27","unstructured":"Ronneberger, O., Fischer, P., and Brox, T. (2017). 
U-Net: Convolutional Networks for Biomedical Image Segmentation, Springer International Publishing."},{"key":"ref_28","unstructured":"Loshchilov, I., and Hutter, F. (2019, January 6\u20139). Decoupled Weight Decay Regularization. Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H. (2016, January 3\u20138). Fully-Convolutional Siamese Networks for Object Tracking. Proceedings of the European Conference on Computer Vision Workshops, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_30","unstructured":"Felsberg, M., Berg, A., and Hager, G. (2015, January 7\u201313). The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Santiago, Chile."},{"key":"ref_31","unstructured":"Kristan, M., and Leonardis, A. (2017, January 22\u201329). The Visual Object Tracking vot2017 Challenge Results. Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"666","DOI":"10.1109\/TMM.2019.2932615","article-title":"PTB-TIR: A Thermal Infrared Pedestrian Tracking Benchmark","volume":"22","author":"Qiao","year":"2020","journal-title":"IEEE Trans. Multimed."},{"key":"ref_33","first-page":"2114","article-title":"Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking","volume":"99","author":"Liu","year":"2020","journal-title":"IEEE Trans. 
Multimed."}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/19\/7408\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:41:48Z","timestamp":1760143308000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/19\/7408"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,29]]},"references-count":33,"journal-issue":{"issue":"19","published-online":{"date-parts":[[2022,10]]}},"alternative-id":["s22197408"],"URL":"https:\/\/doi.org\/10.3390\/s22197408","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,29]]}}}