{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,18]],"date-time":"2026-04-18T16:41:00Z","timestamp":1776530460600,"version":"3.51.2"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,3,4]],"date-time":"2022-03-04T00:00:00Z","timestamp":1646352000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61876056 and 61771180"],"award-info":[{"award-number":["61876056 and 61771180"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2022,11,30]]},"abstract":"<jats:p>The RGB-D cross-modal person re-identification (re-id) task aims to identify the person of interest across the RGB and depth image modes. The tremendous discrepancy between these two modalities makes this task difficult to tackle. Few researchers pay attention to this task, and the deep networks of existing methods still cannot be trained in an end-to-end manner. Therefore, this article proposes an end-to-end module for RGB-D cross-modal person re-id. This network introduces a cross-modal relational branch to narrow the gaps between two heterogeneous images. It models the abundant correlations between any cross-modal sample pairs, which are constrained by heterogeneous interactive learning. The proposed network also exploits a dual-modal local branch, which aims to capture the common spatial contexts in two modalities. 
This branch adopts shared attentive pooling and mutual contextual graph networks to extract the spatial attention within each local region and the spatial relations between distinct local parts, respectively. Experimental results on two public benchmark datasets, that is, the BIWI and RobotPKU datasets, demonstrate that our method is superior to the state-of-the-art. In addition, we perform thorough experiments to prove the effectiveness of each component in the proposed method.<\/jats:p>","DOI":"10.1145\/3506708","type":"journal-article","created":{"date-parts":[[2022,3,4]],"date-time":"2022-03-04T10:31:58Z","timestamp":1646389918000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3818-4277","authenticated-orcid":false,"given":"Jingjing","family":"Wu","sequence":"first","affiliation":[{"name":"School of Computer Science and Information Engineering, Hefei University of Technology, Hefei city, Anhui Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianguo","family":"Jiang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, Anhui Province Key Laboratory of Industry Safety and Emergency Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Meibin","family":"Qi","sequence":"additional","affiliation":[{"name":"Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology) Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, Anhui Province Key Laboratory of Industry 
Safety and Emergency Technology, Hefei, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cuiqun","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Information Engineering, Hefei University of Technology, Hefei city, Anhui Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingjing","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Information Engineering, Hefei University of Technology, Hefei city, Anhui Province, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,3,4]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502147"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.145"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvcir.2019.01.010"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2928126"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00374"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/AVSS.2019.8909838"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_1_10_2","article-title":"In defense of the triplet loss for person re-identification","author":"Hermans Alexander","year":"2017","unstructured":"Alexander Hermans, Lucas Beyer, and Bastian Leibe. 2017. In defense of the triplet loss for person re-identification. 
arXiv:1703.07737","journal-title":"arXiv:1703.07737"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.03.109"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2018.10.002"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3412384"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3362988"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298832"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2019.06.006"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2014.2369055"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.trit.2017.04.001"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00304"},{"key":"e_1_3_1_20_2","unstructured":"Weiyang Liu Yandong Wen Zhiding Yu and Meng Yang. 2016. Large-margin Softmax loss for convolutional neural networks. In International Conference on Machine Learning New York City NY USA Vol. 2. 7. Microtome Publishing 507\u2013516."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00190"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2013.52"},{"key":"e_1_3_1_23_2","first-page":"1","volume-title":"2013 International Workshop on Biometrics and Forensics (IWBF)","author":"M\u00f8gelmose Andreas","year":"2013","unstructured":"Andreas M\u00f8gelmose, Thomas B. Moeslund, and Kamal Nasrollahi. 2013. Multimodal person re-identification using RGB-D sensors and a transient identification database. In 2013 International Workshop on Biometrics and Forensics (IWBF), Lisbon, Portugal. 
IEEE, 1\u20134."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-6296-4_8"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2424056"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.577"},{"key":"e_1_3_1_27_2","article-title":"YOLOv3: An incremental improvement","author":"Redmon Joseph","year":"2018","unstructured":"Joseph Redmon and Ali Farhadi. 2018. YOLOv3: An incremental improvement. arXiv:1804.02767","journal-title":"arXiv:1804.02767"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46448-0_44"},{"key":"e_1_3_1_29_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556","journal-title":"arXiv:1409.1556"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.427"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00643"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.410"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_30"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.144"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00372"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00087"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2675201"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.575"},{"key":"e_1_3_1_40_2","article-title":"Personnet: Person re-identification with deep convolutional neural networks","author":"Wu Lin","year":"2016","unstructured":"Lin Wu, Chunhua 
Shen, and Anton van den Hengel. 2016. Personnet: Person re-identification with deep convolutional neural networks. arXiv:1601.07255","journal-title":"arXiv:1601.07255"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2891895"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2015.2405574"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58520-4_14"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2020.3001665"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR48806.2021.9412576"},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2019.2939564"},{"key":"e_1_3_1_47_2","article-title":"AlignedReID: Surpassing human-level performance in person re-identification","author":"Zhang Xuan","year":"2017","unstructured":"Xuan Zhang, Hao Luo, Xing Fan, Weilai Xiang, Yixiao Sun, Qiqi Xiao, Wei Jiang, Chi Zhang, and Jian Sun. 2017. AlignedReID: Surpassing human-level performance in person re-identification. 
arXiv:1711.08184","journal-title":"arXiv:1711.08184"},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.349"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3159171"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-10-7305-2_25"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3506708","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3506708","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:11:50Z","timestamp":1750191110000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3506708"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,4]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,11,30]]}},"alternative-id":["10.1145\/3506708"],"URL":"https:\/\/doi.org\/10.1145\/3506708","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,3,4]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}