{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,17]],"date-time":"2026-07-17T23:17:13Z","timestamp":1784330233268,"version":"3.55.0"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"1s","license":[{"start":{"date-parts":[[2021,1,31]],"date-time":"2021-01-31T00:00:00Z","timestamp":1612051200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61702560, 62072166, 61836016, and 61672177"],"award-info":[{"award-number":["61702560, 62072166, 61836016, and 61672177"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Science and Technology Plan of Hunan Province","award":["2018JJ3691 and 2016JC2011"],"award-info":[{"award-number":["2018JJ3691 and 2016JC2011"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2021,1,31]]},"abstract":"<jats:p>The purpose of cross-modal retrieval is to find relationships between samples of different modalities and, given a sample of one modality, to retrieve semantically similar samples of other modalities. As data from different modalities present heterogeneous low-level features and semantic-related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and intra-modal pairs with the same classification label. 
Specifically, coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. A weight-sharing strategy is used between the two network branches to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of the same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over the state-of-the-art cross-modal retrieval techniques.<\/jats:p>","DOI":"10.1145\/3412847","type":"journal-article","created":{"date-parts":[[2021,4,27]],"date-time":"2021-04-27T00:31:07Z","timestamp":1619483467000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":32,"title":["HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval"],"prefix":"10.1145","volume":"17","author":[{"given":"Chengyuan","family":"Zhang","sequence":"first","affiliation":[{"name":"College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiayu","family":"Song","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, Changsha, Hunan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaofeng","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering at University of Electronic Science and Technology of China, Chengdu, Sichuan"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lei","family":"Zhu","sequence":"additional","affiliation":[{"name":"College of Information and Intelligence, Hunan Agricultural University School of Computer Science and Engineering, Central South 
University"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shichao","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Central South University, Changsha, Hunan"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2021,4,26]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.5555\/3042817.3043076"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/860435.860460"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2019.105428"},{"key":"e_1_2_1_4_1","volume-title":"Video-based recipe retrieval. Inf. Sci. 514","author":"Cao Da","year":"2020","unstructured":"Da Cao , Ning Han , Hao Chen , Xiaochi Wei , and Xiangnan He. 2020. Video-based recipe retrieval. Inf. Sci. 514 ( 2020 ). Da Cao, Ning Han, Hao Chen, Xiaochi Wei, and Xiangnan He. 2020. Video-based recipe retrieval. Inf. Sci. 514 (2020)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351067"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2014.04.001"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1646396.1646452"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2821921"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 3rd International Workshop on Pattern Recognition.","author":"Deng Guangwei","year":"2018","unstructured":"Guangwei Deng , Cheng Xu , XiaoHan Tu , Tao Li , and Nan Gao . 2018 . Rapid image retrieval with binary hash codes based on deep learning . In Proceedings of the 3rd International Workshop on Pattern Recognition. Guangwei Deng, Cheng Xu, XiaoHan Tu, Tao Li, and Nan Gao. 2018. Rapid image retrieval with binary hash codes based on deep learning. 
In Proceedings of the 3rd International Workshop on Pattern Recognition."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.267"},{"key":"e_1_2_1_11_1","article-title":"Deep hashing neural networks for hyperspectral image feature extraction. IEEE Geosci","volume":"16","author":"Fang Leyuan","year":"2019","unstructured":"Leyuan Fang , Zhiliang Liu , and Weiwei Song . 2019 . Deep hashing neural networks for hyperspectral image feature extraction. IEEE Geosci . Rem. Sens. Lett. 16 , 9 (2019). Leyuan Fang, Zhiliang Liu, and Weiwei Song. 2019. Deep hashing neural networks for hyperspectral image feature extraction. IEEE Geosci. Rem. Sens. Lett. 16, 9 (2019).","journal-title":"Rem. Sens. Lett."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654902"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/28.3-4.321"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2866771"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2018.2879846"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126524"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3323873.3325019"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2019.102104"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/2283516.2283623"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/3060832.3060864"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2556195.2556238"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2016.2608906"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.439"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11071-020-05654-y"},{"key":"e_1_2_1_26_1","first-page":"1072","article-title":
"A robust image hashing algorithm resistant against geometrical attacks","volume":"22","author":"Liu Yu Ling","year":"2013","unstructured":"Yu Ling Liu and Yong Xiao . 2013 . A robust image hashing algorithm resistant against geometrical attacks . Radioengineering 22 , 4 (2013), 1072 -- 1082 . Yu Ling Liu and Yong Xiao. 2013. A robust image hashing algorithm resistant against geometrical attacks. Radioengineering 22, 4 (2013), 1072--1082.","journal-title":"Radioengineering"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.13164\/re.2016.0556"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/2986459.2986562"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104482.3104569"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/3048787.3048830"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2978572"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.5555\/3061053.3061157"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2742704"},{"key":"e_1_2_1_34_1","volume-title":"CM-GANs: Cross-modal generative adversarial networks for common representation learning. CoRR abs\/1710.05106","author":"Peng Yuxin","year":"2017","unstructured":"Yuxin Peng , Jinwei Qi , and Yuxin Yuan . 2017. CM-GANs: Cross-modal generative adversarial networks for common representation learning. CoRR abs\/1710.05106 ( 2017 ). Yuxin Peng, Jinwei Qi, and Yuxin Yuan. 2017. CM-GANs: Cross-modal generative adversarial networks for common representation learning. 
CoRR abs\/1710.05106 (2017)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.142"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.466"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/1866696.1866717"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1873987"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2355114"},{"key":"e_1_2_1_40_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR.  Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2697059"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123266.3123326"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICTAI.2015.45"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2455415"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-017-4567-3"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2015.10.028"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.5555\/3045118.3045234"},{"key":"e_1_2_1_48_1","volume-title":"Commun. Applic.","author":"Wang Yang","year":"2020","unstructured":"Yang Wang . 2020. Survey on deep multi-modal data analytics: Collaboration, rivalry and fusion. ACM Trans. Multimedia Comput ., Commun. Applic. ( 2020 ). Yang Wang. 2020. Survey on deep multi-modal data analytics: Collaboration, rivalry and fusion. ACM Trans. Multimedia Comput., Commun. Applic. 
(2020)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806233"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2655449"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654901"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2017.2777489"},{"key":"e_1_2_1_53_1","first-page":"449","article-title":"Cross-modal retrieval with CNN visual features: A new baseline","volume":"47","author":"Wei Yunchao","year":"2017","unstructured":"Yunchao Wei , Yao Zhao , Canyi Lu , Shikui Wei , Luoqi Liu , Zhenfeng Zhu , and Shuicheng Yan . 2017 . Cross-modal retrieval with CNN visual features: A new baseline . IEEE Trans. Cybern. 47 , 2 (2017), 449 \u2013 460 . Yunchao Wei, Yao Zhao, Canyi Lu, Shikui Wei, Luoqi Liu, Zhenfeng Zhu, and Shuicheng Yan. 2017. Cross-modal retrieval with CNN visual features: A new baseline. IEEE Trans. Cybern. 47, 2 (2017), 449\u2013460.","journal-title":"IEEE Trans. Cybern."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2019.00089"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2291214"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2877886"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2979190"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2878970"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2019.2940684"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-019-08273-x"},{"key":"e_1_2_1_61_1","volume-title":"Honavar","author":"Yakhnenko Oksana","year":"2009","unstructured":"Oksana Yakhnenko and Vasant G . Honavar . 2009 . Multi-modal hierarchical Dirichlet process model for predicting image annotation and image-object label correspondence. In Proceedings of the SIAM International Conference on Data Mining (SDM\u201909). 
SIAM , 283\u2013293. Oksana Yakhnenko and Vasant G. Honavar. 2009. Multi-modal hierarchical Dirichlet process model for predicting image annotation and image-object label correspondence. In Proceedings of the SIAM International Conference on Data Mining (SDM\u201909). SIAM, 283\u2013293."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2013.2276704"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.5555\/2892753.2892854"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISECS.2009.118"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CISP.2008.422"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2018.2868826"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01064"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609610"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3412847","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3412847","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:47:01Z","timestamp":1750193221000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3412847"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,31]]},"references-count":68,"journal-issue":{"issue":"1s","published-print":{"date-parts":[[2021,1,31]]}},"alternative-id":["10.1145\/3412847"],"URL":"https:\/\/doi.org\/10.1145\/3412847","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,31]]},"assertion":[{"value":"2020
-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-07-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}