{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,17]],"date-time":"2026-01-17T08:36:08Z","timestamp":1768638968490,"version":"3.49.0"},"reference-count":51,"publisher":"MDPI AG","issue":"13","license":[{"start":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T00:00:00Z","timestamp":1687564800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Natural Science Foundation of China","award":["62076137"],"award-info":[{"award-number":["62076137"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Remote Sensing"],"abstract":"<jats:p>The tracking community is increasingly focused on RGBT tracking, which leverages the complementary strengths of corresponding visible light and thermal infrared images. The most well-known RGBT trackers, however, are unable to balance performance and speed at the same time for UAV tracking. In this paper, an innovative RGBT Siamese tracker named SiamCAF is proposed, which utilizes multi-modal features with a beyond-real-time running speed. Specifically, we used a dual-modal Siamese subnetwork to extract features. In addition, to extract similar features and reduce the modality differences for fusing features efficiently, we designed the Complementary Coupling Feature fusion module (CCF). Simultaneously, the Residual Channel Attention Enhanced module (RCAE) was designed to enhance the extracted features and representational power. Furthermore, the Maximum Fusion Prediction module (MFP) was constructed to boost performance in the response map fusion stage. Finally, comprehensive experiments on three real RGBT tracking datasets and one visible\u2013thermal UAV tracking dataset showed that SiamCAF outperforms other tracking methods, with a remarkable tracking speed of over 105 frames per second.<\/jats:p>","DOI":"10.3390\/rs15133252","type":"journal-article","created":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T03:14:56Z","timestamp":1687749296000},"page":"3252","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":24,"title":["SiamCAF: Complementary Attention Fusion-Based Siamese Network for RGBT Tracking"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-1867-8517","authenticated-orcid":false,"given":"Yingjian","family":"Xue","sequence":"first","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhoujin","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chenglong","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bihan","family":"Huo","sequence":"additional","affiliation":[{"name":"School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2001-2175","authenticated-orcid":false,"given":"Yan","family":"Zhang","sequence":"additional","affiliation":[{"name":"Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, Nanjing 210044, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1007\/s00138-013-0570-5","article-title":"Thermal cameras and applications: A survey","volume":"25","author":"Gade","year":"2013","journal-title":"Mach. Vis. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"5743","DOI":"10.1109\/TIP.2016.2614135","article-title":"Learning Collaborative Sparse Representation for Grayscale-Thermal Tracking","volume":"25","author":"Li","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"106977","DOI":"10.1016\/j.patcog.2019.106977","article-title":"RGB-T object tracking: Benchmark and baseline","volume":"96","author":"Li","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"392","DOI":"10.1109\/TIP.2021.3130533","article-title":"LasHeR: A Large-Scale High-Diversity Benchmark for RGBT Tracking","volume":"31","author":"Li","year":"2022","journal-title":"IEEE Trans. Image Process."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Zhang, P., Zhao, J., Wang, D., Lu, H., and Ruan, X. (2022). Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline. arXiv.","DOI":"10.1109\/CVPR52688.2022.00868"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"948","DOI":"10.1109\/TSP.2015.2493985","article-title":"The Accurate Continuous-Discrete Extended Kalman Filter for Radar Tracking","volume":"64","author":"Kulikov","year":"2016","journal-title":"IEEE Trans. Signal Process."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Changjiang, Y., Duraiswami, R., and Davis, L. (2005, January 17\u201321). Fast multiple object tracking via a hierarchical particle filter. Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV), Beijing, China.","DOI":"10.1109\/ICCV.2005.95"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"086402","DOI":"10.1117\/1.2969127","article-title":"Improved mean shift algorithm for multiple occlusion target tracking","volume":"47","author":"Li","year":"2008","journal-title":"Opt. Eng."},{"key":"ref_9","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201325). Histograms of oriented gradients for human detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2012","DOI":"10.1109\/TIP.2009.2024578","article-title":"n-SIFT: N-Dimensional Scale Invariant Feature Transform","volume":"18","author":"Cheung","year":"2009","journal-title":"IEEE Trans. Image Process."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Jia, C., Wang, Z., Wu, X., Cai, B., Huang, Z., Wang, G., Zhang, T., and Tong, D. (2015, January 6\u20139). A Tracking-Learning-Detection (TLD) method with local binary pattern improved. Proceedings of the 2015 IEEE International Conference on Robotics and Biomimetics (ROBIO), Zhuhai, China.","DOI":"10.1109\/ROBIO.2015.7419004"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Nam, H., and Han, B. (2016, January 27\u201330). Learning Multi-domain Convolutional Neural Networks for Visual Tracking. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.465"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H.S. (2016, January 8\u201316). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision Workshop (ECCVW), Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-48881-3_56"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2555","DOI":"10.1007\/s00371-021-02131-4","article-title":"Dual Siamese network for RGBT tracking via fusing predicted position maps","volume":"38","author":"Guo","year":"2021","journal-title":"Vis. Comput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bolme, D.S., Beveridge, J.R., and Draper, B.A. (2010, January 13\u201318). Visual object tracking using adaptive correlation filters. Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA.","DOI":"10.1109\/CVPR.2010.5539960"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1007\/978-3-642-33765-9_50","article-title":"Exploiting the Circulant Structure of Tracking-by-Detection with Kernels","volume":"Volume 7575","author":"Henriques","year":"2012","journal-title":"Proceedings of the Computer Vision\u2013ECCV 2012: 12th European Conference"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-Speed Tracking with Kernelized Correlation Filters","volume":"37","author":"Henriques","year":"2015","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Khan, F.S., Felsberg, M., and Weijer, J.V.D. (2014, January 23\u201328). Adaptive Color Attributes for Real-Time Visual Tracking. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.143"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"472","DOI":"10.1007\/978-3-319-46454-1_29","article-title":"Beyond Correlation Filters: Learning Continuous Convolution Operators for Visual Tracking","volume":"Volume 9909","author":"Danelljan","year":"2016","journal-title":"Proceedings of the Computer Vision\u2014ECCV 2016: 14th European Conference"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201323). High Performance Visual Tracking with Siamese Region Proposal Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00935"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Zhang, X., Ye, P., Qiao, D., Zhao, J., Peng, S., and Xiao, G. (2019, January 2\u20135). Object Fusion Tracking Based on Visible and Infrared Images Using Fully Convolutional Siamese Networks. Proceedings of the 2019 22th International Conference on Information Fusion (FUSION), Ottawa, ON, Canada.","DOI":"10.23919\/FUSION43075.2019.9011253"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"122122","DOI":"10.1109\/ACCESS.2019.2936914","article-title":"SiamFT: An RGB-Infrared Fusion Tracking Method via Fully Convolutional Siamese Networks","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"115756","DOI":"10.1016\/j.image.2019.115756","article-title":"DSiamMFT: An RGB-T fusion tracking method via dynamic Siamese networks using multi-layer feature fusion","volume":"84","author":"Zhang","year":"2020","journal-title":"Signal Process. Image Commun."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1403","DOI":"10.1109\/TCSVT.2021.3072207","article-title":"SiamCDA: Complementarity- and Distractor-Aware RGB-T Tracking Based on Siamese Network","volume":"32","author":"Zhang","year":"2022","journal-title":"IEEE Trans. Circuits Syst. Video Technol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"108945","DOI":"10.1016\/j.knosys.2022.108945","article-title":"Learning reliable modal weight with transformer for robust RGBT tracking","volume":"249","author":"Feng","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Peng, J., Zhao, H., Hu, Z., Zhuang, Y., and Wang, B. (2023). Siamese infrared and visible light fusion network for RGB-T tracking. Int. J. Mach. Learn. Cybern.","DOI":"10.1007\/s13042-023-01833-6"},{"key":"ref_27","unstructured":"Jaderberg, M., Simonyan, K., Zisserman, A., and Kavukcuoglu, K. (2015, January 7\u201312). Spatial transformer networks. Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Hu, J., Shen, L., and Sun, G. (2018, January 18\u201323). Squeeze-and-Excitation Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00745"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Woo, S., Park, J., Lee, J.-Y., and Kweon, I.S. (2018, January 8\u201314). CBAM: Convolutional Block Attention Module. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01234-2_1"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Li, X., Wang, W., Hu, X., and Yang, J. (2019, January 15\u201320). Selective Kernel Networks. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00060"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Zhang, L., Danelljan, M., Gonzalez-Garcia, A., Weijer, J.v.d., and Khan, F.S. (2019, January 27\u201328). Multi-Modal Fusion for End-to-End RGB-T Tracking. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00278"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.inffus.2019.12.014","article-title":"IVFuseNet: Fusion of infrared and visible light images for depth prediction","volume":"58","author":"Li","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Li, C., Zhao, N., Lu, Y., Zhu, C., and Tang, J. (2017, January 23\u201327). Weighted Sparse Representation Regularized Graph Learning for RGB-T Object Tracking. Proceedings of the 25th ACM international conference on Multimedia, Mountain View, CA, USA.","DOI":"10.1145\/3123266.3123289"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2017, January 21\u201326). ECO: Efficient Convolution Operators for Tracking. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.733"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1007\/978-3-030-58542-6_14","article-title":"Challenge-Aware RGBT Tracking","volume":"Volume 12367","author":"Li","year":"2020","journal-title":"Proceedings of the Computer Vision\u2013ECCV 2020: 16th European Conference"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, C.L., Lu, A., Zheng, A.H., Tu, Z., and Tang, J. (2019, January 27\u201328). Multi-Adapter RGBT Tracking. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00279"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Li, C., Luo, B., Tang, J., and Wang, X. (2019, January 21\u201325). Dense Feature Aggregation and Pruning for RGBT Tracking. Proceedings of the 27th ACM International Conference on Multimedia, Nice, France.","DOI":"10.1145\/3343031.3350928"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"16915","DOI":"10.1109\/JSEN.2021.3078455","article-title":"HDINet: Hierarchical Dual-Sensor Interaction Network for RGBT Tracking","volume":"21","author":"Mei","year":"2021","journal-title":"IEEE Sens. J."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Danelljan, M., H\u00e4ger, G., Khan, F.S., and Felsberg, M. (2015, January 7\u201313). Learning Spatially Regularized Correlation Filters for Visual Tracking. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.490"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Jung, I., Son, J., Baek, M., and Han, B. (2018, January 8\u201314). Real-Time MDNet. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01225-0_6"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Peng, H. (2019, January 15\u201320). Deeper and Wider Siamese Networks for Real-Time Visual Tracking. Proceedings of the 2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00472"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., and Torr, P.H.S. (2017, January 21\u201326). End-to-End Representation Learning for Correlation Filter Based Tracking. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.531"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Kim, H.U., Lee, D.Y., Sim, J.Y., and Kim, C.S. (2015, January 7\u201313). SOWP: Spatially Ordered and Weighted Patch Descriptor for Visual Tracking. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.345"},{"key":"ref_44","unstructured":"Wu, Y., Blasch, E., Chen, G., Bai, L., and Ling, H. (2011, January 5\u20138). Multiple source data fusion via sparse representation for robust visual tracking. Proceedings of the 14th International Conference on Information Fusion, Chicago, IL, USA."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Danelljan, M., H\u00e4ger, G., Khan, F.S., and Felsberg, M. (2014, January 1\u20135). Accurate Scale Estimation for Robust Visual Tracking. Proceedings of the British Machine Vision Conference, Nottingham, UK.","DOI":"10.5244\/C.28.65"},{"key":"ref_46","first-page":"11037","article-title":"GlobalTrack: A Simple and Strong Baseline for Long-Term Tracking","volume":"34","author":"Huang","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Yan, B., Zhao, H., Wang, D., Lu, H., and Yang, X. (November, January 27). \u2018Skimming-Perusal\u2019 Tracking: A Framework for Real-Time and Robust Long-Term Tracking. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea.","DOI":"10.1109\/ICCV.2019.00247"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Cao, Z., Fu, C., Ye, J., Li, B., and Li, Y. (2021, January 10\u201317). HiFT: Hierarchical Feature Transformer for Aerial Tracking. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01517"},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Luke\u017ei\u010d, A., Matas, J., and Kristan, M. (2020, January 13\u201319). D3S\u2014A Discriminative Single Shot Segmentation Tracker. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00716"},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Gao, Y., Li, C., Zhu, Y., Tang, J., He, T., and Wang, F. (, January 27\u201328). Deep Adaptive Fusion Network for High Performance RGBT Tracking. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea.","DOI":"10.1109\/ICCVW.2019.00017"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"2714","DOI":"10.1007\/s11263-021-01495-3","article-title":"Learning Adaptive Attribute-Driven Representation for Real-Time RGB-T Tracking","volume":"129","author":"Zhang","year":"2021","journal-title":"Int. J. Comput. Vis."}],"container-title":["Remote Sensing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/13\/3252\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:59:42Z","timestamp":1760126382000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2072-4292\/15\/13\/3252"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,24]]},"references-count":51,"journal-issue":{"issue":"13","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["rs15133252"],"URL":"https:\/\/doi.org\/10.3390\/rs15133252","relation":{},"ISSN":["2072-4292"],"issn-type":[{"value":"2072-4292","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,24]]}}}