{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,27]],"date-time":"2026-03-27T05:39:48Z","timestamp":1774589988014,"version":"3.50.1"},"reference-count":77,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation Program of China","award":["61976241"],"award-info":[{"award-number":["61976241"]}]},{"name":"Postgraduate Research and Practice Innovation Program of Jiangsu Province","award":["KYCX24_4129"],"award-info":[{"award-number":["KYCX24_4129"]}]},{"name":"International Science and technology cooperation plan project of Zhenjiang","award":["GJ2021008"],"award-info":[{"award-number":["GJ2021008"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            The visible-infrared person re-identification (VI-ReID) task is to retrieve the same pedestrian across the visible and infrared modalities. Existing transformer-based works are constrained by the inherent structure of the ViT, which suffers from feature collapse in deeper layers and over-globalization of extracted features, resulting in incomplete learning of local and low-level features. However, these features are instrumental in representing and identifying elements within visible-infrared images more comprehensively, which increases the accuracy and robustness of cross-modal pedestrian matching. 
To solve the above problem, we propose the Local-Aware Residual Attention Vision Transformer (LAReViT) to enhance the learning of fine-grained local and shallow-level information to reinforce the feature discrimination and comprehensiveness in ViT. Specifically, the Local-Aware Residual (LAR) Module, which uses a novel Local Residual Attention (LRA) mechanism, is proposed to increase the fine-grained local information contained in feature extraction. To exploit fine-grained local information lost in lower-level visual features, the LRA in the LAR module adopts novel attention residual connections. Additionally, we propose a Positional Channel Reconstruction (PCR) Module that takes advantage of the local receptive field benefits of convolution. PCR reweights features within patches at the channel level, further helping the network emphasize effective fine-grained local information. Finally, the novel Center Aggregation Loss (CAL) is designed to reduce modality discrepancies moderately and promote comprehensive feature extraction. Extensive experiments conducted on the SYSU-MM01, RegDB, and LLCM datasets demonstrate the state-of-the-art performance achieved by our proposed method. 
The code is available at\n            <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/Hua-XC\/LAReViT\">https:\/\/github.com\/Hua-XC\/LAReViT<\/jats:ext-link>\n            .\n          <\/jats:p>","DOI":"10.1145\/3723358","type":"journal-article","created":{"date-parts":[[2025,3,14]],"date-time":"2025-03-14T13:26:29Z","timestamp":1741958789000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Local-Aware Residual Attention Vision Transformer for Visible-Infrared Person Re-Identification"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-8077-0015","authenticated-orcid":false,"given":"Xuecheng","family":"Hua","sequence":"first","affiliation":[{"name":"Jiangsu University of Science and Technology, Zhenjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8956-8916","authenticated-orcid":false,"given":"Ke","family":"Cheng","sequence":"additional","affiliation":[{"name":"Jiangsu University of Science and Technology, Zhenjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-5136-023X","authenticated-orcid":false,"given":"Gege","family":"Zhu","sequence":"additional","affiliation":[{"name":"Jiangsu University of Science and Technology, Zhenjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0350-4055","authenticated-orcid":false,"given":"Hu","family":"Lu","sequence":"additional","affiliation":[{"name":"Jiangsu University, Zhenjiang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9232-5392","authenticated-orcid":false,"given":"Yuanquan","family":"Wang","sequence":"additional","affiliation":[{"name":"Hebei University of Technology, Tianjin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8393-6554","authenticated-orcid":false,"given":"Shitong","family":"Wang","sequence":"additional","affiliation":[{"name":"Wenzhou-Kean University, Wenzhou, China and Jiangnan 
University, Wuxi, China"}]}],"member":"320","published-online":{"date-parts":[[2025,5,22]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3268080"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3141868"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00065"},{"issue":"104","key":"e_1_3_1_5_2","doi-asserted-by":"crossref","first-page":"104265","DOI":"10.1016\/j.jvcir.2024.104265","article-title":"A visible-infrared person re-identification method based on meta-graph isomerization aggregation module","author":"Chongrui Shan","year":"2024","unstructured":"Shan Chongrui, Zhang Baohua, Gu Yu, Li Jianjun, Zhang Ming, and Wang Jingyu. 2024. A visible-infrared person re-identification method based on meta-graph isomerization aggregation module. Journal of Visual Communication and Image Representation 104 (2024), 104265.","journal-title":"Journal of Visual Communication and Image Representation"},{"key":"e_1_3_1_6_2","first-page":"6","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)","volume":"1","author":"Dai Pingyang","year":"2018","unstructured":"Pingyang Dai, Rongrong Ji, Haibin Wang, Qiong Wu, and Yuyu Huang. 2018. Cross-modality person re-identification with generative adversarial training. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Vol. 1, 6."},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2024.110853"},{"key":"e_1_3_1_8_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16 \u00d7 16 words: Transformers for image recognition at scale. arXiv:2010.11929. 
Retrieved from https:\/\/arxiv.org\/abs\/2010.11929"},{"key":"e_1_3_1_9_2","first-page":"2286","volume-title":"Proceedings of the International Conference on Machine Learning","author":"D\u2019Ascoli St\u00e9phane","year":"2021","unstructured":"St\u00e9phane D\u2019Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, and Levent Sagun. 2021. ConViT: Improving vision transformers with soft convolutional inductive biases. In Proceedings of the International Conference on Machine Learning. PMLR, 2286\u20132296."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.02179"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3224663"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01161"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475643"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3296680"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01186"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3152247"},{"key":"e_1_3_1_17_2","first-page":"16403","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Hao Xin","year":"2021","unstructured":"Xin Hao, Sanyuan Zhao, Mang Ye, and Jianbing Shen. 2021. Cross-modality person re-identification via modality confusion and center aggregation. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), 16403\u201316412."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01553"},{"key":"e_1_3_1_19_2","first-page":"15013","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"He Shuting","year":"2021","unstructured":"Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, and Wei Jiang. 2021. 
TransReID: Transformer-based object re-identification. In Proceedings of the IEEE\/CVF International Conference on Computer Vision (ICCV), 15013\u201315022."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2024.111090"},{"key":"e_1_3_1_22_2","first-page":"480","volume-title":"European Conference on Computer Vision","author":"Jiang Kongzhu","year":"2022","unstructured":"Kongzhu Jiang, Tianzhu Zhang, Xiang Liu, Bingqiao Qian, Yongdong Zhang, and Feng Wu. 2022. Cross-modality transformer for visible-infrared person re-identification. In European Conference on Computer Vision. Springer, 480\u2013496."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3505244"},{"key":"e_1_3_1_24_2","first-page":"4344","volume-title":"Proceedings of the Asian Conference on Computer Vision","author":"Kim Daeho","year":"2022","unstructured":"Daeho Kim and Jaeil Kim. 2022. Vision transformer compression and architecture exploration with efficient embedding space search. In Proceedings of the Asian Conference on Computer Vision, 4344\u20134360."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01786"},{"key":"e_1_3_1_26_2","first-page":"4150","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision","author":"Lai Shenqi","year":"2021","unstructured":"Shenqi Lai, Zhenhua Chai, and Xiaolin Wei. 2021. Transformer meets part model: Adaptive part division for person re-identification. 
In Proceedings of the IEEE\/CVF International Conference on Computer Vision, 4150\u20134157."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3607535"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5891"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00292"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2024.3503766"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2023.3338813"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3105702"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2020.3042080"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01876"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i2.25273"},{"key":"e_1_3_1_36_2","unstructured":"Hao Luo Pichao Wang Yi Xu Feng Ding Yanxin Zhou Fan Wang Hao Li and Rong Jin. 2021. Self-supervised pre-training for transformer-based person re-identification. arXiv:2111.12084. Retrieved from https:\/\/arxiv.org\/abs\/2111.12084"},{"key":"e_1_3_1_37_2","first-page":"23296","article-title":"Intriguing properties of vision transformers","volume":"34","author":"Naseer Muhammad Muzammal","year":"2021","unstructured":"Muhammad Muzammal Naseer, Kanchana Ranasinghe, Salman H. Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. 2021. Intriguing properties of vision transformers. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 
34, Curran Associates, Inc., 23296\u201323308.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.3390\/s17030605"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01036"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2024.3426335"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01183"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00042"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_1_44_2","first-page":"456","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Shi Jiangming","year":"2025","unstructured":"Jiangming Shi, Xiangbo Yin, Yeyun Chen, Yachao Zhang, Zhizhong Zhang, Yuan Xie, and Yanyun Qu. 2025. Multi-memory matching for unsupervised visible-infrared person re-identification. In Proceedings of the European Conference on Computer Vision. Springer, 456\u2013474."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547970"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_30"},{"key":"e_1_3_1_47_2","first-page":"15316","article-title":"Augmented shortcuts for vision transformers","volume":"34","author":"Tang Yehui","year":"2021","unstructured":"Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, and Yunhe Wang. 2021. Augmented shortcuts for vision transformers. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 
34, Curran Associates, Inc., 15316\u201315327.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_48_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30, Curran Associates, Inc.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00372"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20155"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.575"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00009"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3331569"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP49357.2023.10097170"},{"key":"e_1_3_1_55_2","first-page":"16870","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Yang Bin","year":"2024","unstructured":"Bin Yang, Jun Chen, and Mang Ye. 2024. Shallow-deep collaborative learning for unsupervised visible-infrared person re-identification. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 16870\u201316879."},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01391"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2024.3377139"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351043"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12293"},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01331"},{"key":"e_1_3_1_61_2","first-page":"229","volume-title":"Proceedings of the 16th European Conference on Computer Vision (ECCV \u201920), Part XVII 16","author":"Ye Mang","year":"2020","unstructured":"Mang Ye, Jianbing Shen, David J. Crandall, Ling Shao, and Jiebo Luo. 2020. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification. In Proceedings of the 16th European Conference on Computer Vision (ECCV \u201920), Part XVII 16. Springer, 229\u2013247."},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3054775"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2020.3001665"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3680951"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1145\/3674737"},{"key":"e_1_3_1_66_2","first-page":"14133","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Guiwei","year":"2023","unstructured":"Guiwei Zhang, Yongfei Zhang, Tianyu Zhang, Bo Li, and Shiliang Pu. 2023. PHA: Patch-wise high-frequency augmentation for transformer-based person re-identification. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 14133\u201314142."},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3085978"},{"key":"e_1_3_1_68_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00720"},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2022.3224853"},{"key":"e_1_3_1_70_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00214"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475250"},{"key":"e_1_3_1_72_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3163847"},{"key":"e_1_3_1_73_2","first-page":"3520","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","volume":"35","author":"Zhao Zhiwei","year":"2021","unstructured":"Zhiwei Zhao, Bin Liu, Qi Chu, Yan Lu, and Nenghai Yu. 2021. Joint color-irrelevant consistency learning and identity-aware modality adaptation for visible-infrared cross modality person re-identification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35, 3520\u20133528."},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.102128"},{"key":"e_1_3_1_75_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.7000"},{"key":"e_1_3_1_76_2","unstructured":"Daquan Zhou Bingyi Kang Xiaojie Jin Linjie Yang Xiaochen Lian Zihang Jiang Qibin Hou and Jiashi Feng. 2021. DeepViT: Towards deeper vision transformer. arXiv:2103.11886. Retrieved from https:\/\/arxiv.org\/abs\/2103.11886"},{"key":"e_1_3_1_77_2","first-page":"198","volume-title":"Proceedings of the European Conference on Computer Vision","author":"Zhu Kuan","year":"2022","unstructured":"Kuan Zhu, Haiyun Guo, Tianyi Yan, Yousong Zhu, Jinqiao Wang, and Ming Tang. 2022. PASS: Part-aware self-supervised pre-training for person re-identification. In Proceedings of the European Conference on Computer Vision. 
Springer, 198\u2013214."},{"key":"e_1_3_1_78_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.12.100"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3723358","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3723358","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:56:45Z","timestamp":1750298205000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3723358"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,22]]},"references-count":77,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3723358"],"URL":"https:\/\/doi.org\/10.1145\/3723358","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,22]]},"assertion":[{"value":"2024-09-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}