{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T15:26:56Z","timestamp":1772206016130,"version":"3.50.1"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,11,19]],"date-time":"2024-11-19T00:00:00Z","timestamp":1731974400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62302142"],"award-info":[{"award-number":["62302142"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Key Research and Development Programs of China","award":["2023YFC2506800 and 2019YFA0706203"],"award-info":[{"award-number":["2023YFC2506800 and 2019YFA0706203"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["JZ2023HGQA0097, JZ2024HGTA0178"],"award-info":[{"award-number":["JZ2023HGQA0097, JZ2024HGTA0178"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2024,12,31]]},"abstract":"<jats:p>RGB-D cross-modal person re-identification (re-id) aims to retrieve the person of interest across RGB and depth image modalities. To cope with the modal discrepancy, some existing methods generate an auxiliary mode using either the inherent properties of the input modes or extra deep networks. However, these approaches often overlook the useful intermediary role of the generated mode, leading to insufficient exploitation of crucial bridge knowledge. 
By contrast, in this article, we propose a novel approach that constructs an intermediary mode through the constraints of self-supervised intermediary learning and requires neither modal prior knowledge nor additional module parameters. We then design a bridge network that fully mines the intermediary role of the generated modality by carrying out multi-modal integration and decomposition. On the one hand, this network leverages a multi-modal transformer to integrate the information of the three modes by fully exploiting their heterogeneous relations, with the intermediary mode serving as the bridge, and it applies an identification consistency constraint to promote cross-modal associations. On the other hand, it employs circle contrastive learning to decompose the cross-modal constraint process into several subprocedures, which provides an intermediate relay while pulling the two original modalities closer. Experiments on two public datasets demonstrate that the proposed method outperforms state-of-the-art methods. The effectiveness of each component of the method is verified through numerous ablation studies. 
Additionally, we have demonstrated the generalization ability of the proposed method through experiments.<\/jats:p>","DOI":"10.1145\/3682066","type":"journal-article","created":{"date-parts":[[2024,7,29]],"date-time":"2024-07-29T14:06:56Z","timestamp":1722262016000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Intermediary-Generated Bridge Network for RGB-D Cross-Modal Re-Identification"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3818-4277","authenticated-orcid":false,"given":"Jingjing","family":"Wu","sequence":"first","affiliation":[{"name":"School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5461-3986","authenticated-orcid":false,"given":"Richang","family":"Hong","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei, China and Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6313-2543","authenticated-orcid":false,"given":"Shengeng","family":"Tang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei, China and Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China"}]}],"member":"320","published-online":{"date-parts":[[2024,11,19]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2020.115933"},{"key":"e_1_3_1_3_2","first-page":"1597","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. 
A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning. PMLR, 1597\u20131607."},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01027"},{"key":"e_1_3_1_5_2","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI \u201918)","volume":"1","author":"Dai Pingyang","year":"2018","unstructured":"Pingyang Dai, Rongrong Ji, Haibin Wang, Qiong Wu, and Yuyu Huang. 2018. Cross-modality person re-identification with generative adversarial training. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI \u201918), Vol. 1, 2."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3610298"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2023.103703"},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS \u201919).","author":"Hafner Frank M.","year":"2019","unstructured":"Frank M. Hafner, Amran Bhuiyan, Julian F. P. Kooij, and Eric Granger. 2019. RGB-depth cross-modal person re-identification. In Proceedings of the 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS \u201919). IEEE, 1\u20138."},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2021.103352"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00975"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_13_2","unstructured":"Alexander Hermans, Lucas Beyer, and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. arXiv:1703.07737. 
Retrieved from https:\/\/arxiv.org\/pdf\/1703.07737"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i1.19987"},{"key":"e_1_3_1_15_2","first-page":"3345","volume-title":"Proceedings of the IEEE International Conference on Image Processing.","author":"John Vijay","year":"2013","unstructured":"Vijay John, Gwenn Englebienne, and Ben Krose. 2013. Person re-identification using height-based gait in colour depth camera. In Proceedings of the IEEE International Conference on Image Processing. IEEE, 3345\u20133349."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.01786"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5891"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298832"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2369055"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.trit.2017.04.001"},{"key":"e_1_3_1_21_2","first-page":"19366","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Liu Jialun","year":"2022","unstructured":"Jialun Liu, Yifan Sun, Feng Zhu, Hongbin Pei, Yi Yang, and Wenhui Li. 2022. Learning memory-augmented unidirectional metrics for cross-modality person re-identification. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 19366\u201319375."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372121"},{"key":"e_1_3_1_23_2","first-page":"507","article-title":"Large-margin softmax loss for convolutional neural networks","volume":"48","author":"Liu Weiyang","year":"2016","unstructured":"Weiyang Liu, Yandong Wen, Zhiding Yu, and Meng Yang. 2016. Large-margin softmax loss for convolutional neural networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning (ICML \u201916), Vol. 
48, 507\u2013516.","journal-title":"Proceedings of the 33rd International Conference on International Conference on Machine Learning (ICML \u201916)"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00190"},{"key":"e_1_3_1_25_2","first-page":"1","volume-title":"Proceedings of the International Workshop on Biometrics and Forensics (IWBF \u201913).","author":"M\u00f8gelmose Andreas","year":"2013","unstructured":"Andreas M\u00f8gelmose, Thomas B. Moeslund, and Kamal Nasrollahi. 2013. Multimodal person re-identification using RGB-D sensors and a transient identification database. In Proceedings of the International Workshop on Biometrics and Forensics (IWBF \u201913). IEEE, 1\u20134."},{"key":"e_1_3_1_26_2","first-page":"161","volume-title":"Person Re-Identification. Advances in Computer Vision and Pattern Recognition","author":"Munaro Matteo","year":"2014","unstructured":"Matteo Munaro, Andrea Fossati, Alberto Basso, Emanuele Menegatti, and Luc Van Gool. 2014. One-shot person re-identification with a consumer depth camera. In Person Re-Identification. Advances in Computer Vision and Pattern Recognition. S. Gong, M. Cristani, S. Yan, and C. Loy (Eds.), Springer, London, 161\u2013181."},{"key":"e_1_3_1_27_2","first-page":"69","volume-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV \u201916)","author":"Noroozi Mehdi","year":"2016","unstructured":"Mehdi Noroozi and Paolo Favaro. 2016. Unsupervised learning of visual representations by solving jigsaw puzzles. In Proceedings of the 14th European Conference on Computer Vision (ECCV \u201916). Springer, 69\u201384."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2424056"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3394171.3413673"},{"key":"e_1_3_1_30_2","unstructured":"Joseph Redmon and Ali Farhadi. 2018. Yolov3: An incremental improvement. arXiv:1804.02767. 
Retrieved from https:\/\/arxiv.org\/pdf\/1804.02767"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00643"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3454130"},{"key":"e_1_3_1_33_2","first-page":"6438","volume-title":"Proceedings of the International Conference on Machine Learning.","author":"Verma Vikas","year":"2019","unstructured":"Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, David Lopez-Paz, and Yoshua Bengio. 2019. Manifold mixup: Better representations by interpolating hidden states. In Proceedings of the International Conference on Machine Learning. PMLR, 6438\u20136447."},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00372"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00071"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.575"},{"issue":"4","key":"e_1_3_1_37_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3506708","article-title":"An end-to-end heterogeneous restraint network for RGB-D cross-modal person re-identification","volume":"18","author":"Wu Jingjing","year":"2022","unstructured":"Jingjing Wu, Jianguo Jiang, Meibin Qi, Cuiqun Chen, and Jingjing Zhang. 2022. An end-to-end heterogeneous restraint network for RGB-D cross-modal person re-identification. 
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 18, 4 (2022), 1\u201322.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00393"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2405574"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_14"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2020.3001665"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00612"},{"key":"e_1_3_1_43_2","unstructured":"Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2017. mixup: Beyond empirical risk minimization. arXiv:1710.09412. Retrieved from https:\/\/arxiv.org\/pdf\/1710.09412"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2939564"},{"key":"e_1_3_1_45_2","first-page":"649","volume-title":"Proceedings of the 14th European Conference on Computer Vision (ECCV \u201916)","author":"Zhang Richard","year":"2016","unstructured":"Richard Zhang, Phillip Isola, and Alexei A. Efros. 2016. Colorful image colorization. In Proceedings of the 14th European Conference on Computer Vision (ECCV \u201916). Springer, 649\u2013666."},{"key":"e_1_3_1_46_2","first-page":"2153","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang Yukang","year":"2023","unstructured":"Yukang Zhang and Hanzi Wang. 2023. Diverse embedding expansion network and low-light cross-modality benchmark for visible-infrared person re-identification. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2153\u20132162."},{"key":"e_1_3_1_47_2","doi-asserted-by":"crossref","unstructured":"Yukang Zhang, Yan Yan, Jie Li, and Hanzi Wang. 2023. 
MRCN: A novel modality restitution and compensation network for visible-infrared person re-identification. arXiv:2303.14626. Retrieved from https:\/\/arxiv.org\/pdf\/2303.14626","DOI":"10.1609\/aaai.v37i3.25459"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3474085.3475250"},{"issue":"1","key":"e_1_3_1_49_2","doi-asserted-by":"crossref","first-page":"989","DOI":"10.1109\/JSEN.2021.3130181","article-title":"Visible-infrared person re-identification based on frequency-domain simulated multispectral modality for dual-mode cameras","volume":"22","author":"Zhao Zhenghui","year":"2021","unstructured":"Zhenghui Zhao, Rui Sun, Zi Yang, and Jun Gao. 2021. Visible-infrared person re-identification based on frequency-domain simulated multispectral modality for dual-mode cameras. IEEE Sensors Journal 22, 1 (2021), 989\u20131002.","journal-title":"IEEE Sensors Journal"},{"key":"e_1_3_1_50_2","first-page":"280","volume-title":"Proceedings of the CCF Chinese Conference on Computer Vision","author":"Zhuo Jiaxuan","year":"2017","unstructured":"Jiaxuan Zhuo, Junyong Zhu, Jianhuang Lai, and Xiaohua Xie. 2017. Person re-identification on heterogeneous camera network. In Proceedings of the CCF Chinese Conference on Computer Vision. 
Springer, 280\u2013291."}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3682066","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3682066","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:10:03Z","timestamp":1750295403000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3682066"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,19]]},"references-count":49,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,12,31]]}},"alternative-id":["10.1145\/3682066"],"URL":"https:\/\/doi.org\/10.1145\/3682066","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"value":"2157-6904","type":"print"},{"value":"2157-6912","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,19]]},"assertion":[{"value":"2023-11-10","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-18","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}