{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,11]],"date-time":"2026-05-11T12:39:45Z","timestamp":1778503185117,"version":"3.51.4"},"reference-count":58,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T00:00:00Z","timestamp":1741305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>Visible-infrared person re-identification (VI-ReID) aims to match persons across visible and infrared modalities; however, its performance is prone to complex dynamic scenes, such as occlusions, background shifts, and pose changes. In this paper, we propose a Multi-scale Dynamic Fusion Network (MDFN) to address these challenges in the VI-ReID task. Specifically, the proposed MDFN consists of the Dynamic Feature Fusion (DFF), Dynamic Perception Enhancement (DPE), and Feature Reweighting with Similarity (FRS) modules. The DFF module dynamically extracts local and long-range dependencies among features to obtain finer-grained discriminative features. The DPE module extracts multi-scale features from both visible and infrared modalities to generate diverse embeddings. The FRS module mitigates the impact of information imbalance between modalities, thereby further improving performance. Extensive experiments on the SYSU-MM01 and RegDB datasets show that our MDFN outperforms other state-of-the-art methods, especially in complex dynamic scenes with occlusions, background shifts, and pose changes.<\/jats:p>","DOI":"10.1145\/3715330","type":"journal-article","created":{"date-parts":[[2025,1,28]],"date-time":"2025-01-28T15:53:42Z","timestamp":1738079622000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Multi-Scale Dynamic Fusion for Visible-Infrared Person Re-Identification"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-8979-9774","authenticated-orcid":false,"given":"Shen","family":"Wang","sequence":"first","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-2621-2981","authenticated-orcid":false,"given":"Yu","family":"Wang","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7580-0553","authenticated-orcid":false,"given":"Renjie","family":"Qiao","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9859-9573","authenticated-orcid":false,"given":"Kejun","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9097-2318","authenticated-orcid":false,"given":"Chia-Wen","family":"Lin","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3475-6098","authenticated-orcid":false,"given":"Chengtao","family":"Cai","sequence":"additional","affiliation":[{"name":"College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China, Heilongjiang Provincial Key Laboratory of Environment Intelligent Perception, Harbin, China, and Key laboratory of Intelligent Technology and Application of Marine Equipment, Harbin Engineering University, Harbin, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,3,7]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2022.3141868"},{"key":"e_1_3_1_3_2","first-page":"677--683","article-title":"Cross-modality person re-identification with generative adversarial training","author":"Dai Pingyang","year":"2018","unstructured":"Pingyang Dai, Rongrong Ji, Haibin Wang, Qiong Wu, and Yuyu Huang. 2018. Cross-modality person re-identification with generative adversarial training. In IJCAI, 677--683.","journal-title":"IJCAI"},{"key":"e_1_3_1_4_2","unstructured":"Alexey Dosovitskiy Lucas Beyer Alexander Kolesnikov Dirk Weissenborn Xiaohua Zhai Thomas Unterthiner Mostafa Dehghani Matthias Minderer Georg Heigold Sylvain Gelly et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv:2010.11929."},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME57554.2024.10688271"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3422622"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01609"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3617375"},{"key":"e_1_3_1_11_2","unstructured":"Alexander Hermans Lucas Beyer and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. arXiv:1703.07737."},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/1391729.1391730"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jksuci.2022.07.002"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3505244"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"issue":"1","key":"e_1_3_1_16_2","first-page":"1","article-title":"Dynamic weighted gradient reversal network for visible-infrared person re-identification","volume":"20","author":"Li Chenghua","year":"2023","unstructured":"Chenghua Li, Zongze Li, Jing Sun, Yun Zhang, Xiaoping Jiang, and Fan Zhang. 2023. Dynamic weighted gradient reversal network for visible-infrared person re-identification. ACM Transactions on Multimedia Computing, Communications and Applications 20, 1 (2023), 1\u201323.","journal-title":"ACM Transactions on Multimedia Computing, Communications and Applications"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5891"},{"key":"e_1_3_1_18_2","volume-title":"Proceedings of the 16th Asian Conference on Machine Learning (Conference Track)","author":"Lin Ziyang","year":"2024","unstructured":"Ziyang Lin and Banghai Wang. 2024. Visible-Infrared Person Re-Indentification via Feature Fusion and Deep Mutual Learning. In Proceedings of the 16th Asian Conference on Machine Learning (Conference Track). Retrieved from https:\/\/openreview.net\/forum?id=NmoEsT5Rul"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i2.25250"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01876"},{"key":"e_1_3_1_21_2","first-page":"1","article-title":"ByteNet: Rethinking multimedia file fragment classification through visual perspectives","author":"Liu Wenyang","year":"2024","unstructured":"Wenyang Liu, Kejun Wu, Tianyi Liu, Yi Wang, Kim-Hui Yap, and Lap-Pui Chau. 2024. ByteNet: Rethinking multimedia file fragment classification through visual perspectives. IEEE Transactions on Multimedia (2024), 1\u201314.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i2.25273"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01339"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.3390\/s17030605"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2023.11.003"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2024.3426335"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.01030"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503161.3547970"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2024.3354377"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.3389\/fnagi.2022.908143"},{"issue":"11","key":"e_1_3_1_33_2","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Van der Maaten Laurens","year":"2008","unstructured":"Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008), 2579\u20132605.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_1_34_2","article-title":"Attention is all you need","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30.","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00372"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6894"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00071"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00029"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.575"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1364\/OE.504717"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3306072"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2022.3169055"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00431"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3652583.3658109"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01571"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME57554.2024.10687987"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2024.3379752"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-024-01997-w"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_14"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2021.3054775"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2024.3356233"},{"key":"e_1_3_1_52_2","first-page":"2","article-title":"Visible thermal person re-identification via dual-constrained top-ranking","volume":"1","author":"Ye Mang","year":"2018","unstructured":"Mang Ye, Zheng Wang, Xiangyuan Lan, and Pong C Yuen. 2018. Visible thermal person re-identification via dual-constrained top-ranking. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI \u201818), Vol. 1. 2.","journal-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI \u201818)"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00720"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00214"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475250"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.102128"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3159171"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.389"},{"key":"e_1_3_1_59_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.00024"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715330","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715330","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:18Z","timestamp":1750295898000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715330"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,7]]},"references-count":58,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3715330"],"URL":"https:\/\/doi.org\/10.1145\/3715330","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,7]]},"assertion":[{"value":"2024-08-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-01-15","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}