{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T21:36:13Z","timestamp":1777066573221,"version":"3.51.4"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,11,30]],"date-time":"2019-11-30T00:00:00Z","timestamp":1575072000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61771025"],"award-info":[{"award-number":["61771025"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,11,30]]},"abstract":"<jats:p>\n            Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space through hash functions, enabling fast and flexible cross-modal retrieval. Most existing cross-modal hashing methods learn hash functions by mining the correlations among multimedia data, but they ignore an important property of multimedia data: each modality has features of different scales, such as texture, object, and scene features in an image, which can provide complementary information for boosting retrieval. The correlations among multi-scale features are richer than the correlations between single features; they reveal finer underlying structures of the multimedia data and can be used for effective hash function learning. 
Therefore, we propose the <jats:bold>M<\/jats:bold>ulti-scale <jats:bold>C<\/jats:bold>orrelation <jats:bold>S<\/jats:bold>equential <jats:bold>C<\/jats:bold>ross-modal <jats:bold>H<\/jats:bold>ashing (<jats:bold>MCSCH<\/jats:bold>) approach, whose main contributions are summarized as follows: (1) <jats:bold>Multi-scale feature guided sequential hashing learning method<\/jats:bold> is proposed to share information among features of different scales through an RNN-based network and to generate the hash codes sequentially. The features of different scales guide hash code generation, which can enhance the diversity of the hash codes and weaken the influence of errors in specific features, such as false object features caused by occlusion. (2) <jats:bold>Multi-scale correlation mining strategy<\/jats:bold> is proposed to align the features of different scales across modalities and mine the correlations among the aligned features. These correlations reveal the finer underlying structure of multimedia data and can help boost hash function learning. (3) <jats:bold>Correlation evaluation network<\/jats:bold> evaluates the importance of the correlations to select the worthwhile ones and increases their impact on hash function learning. 
Experiments on two widely used 2-media datasets and a 5-media dataset demonstrate the effectiveness of our proposed MCSCH approach.\n          <\/jats:p>","DOI":"10.1145\/3356338","type":"journal-article","created":{"date-parts":[[2019,12,26]],"date-time":"2019-12-26T21:05:46Z","timestamp":1577394346000},"page":"1-20","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining"],"prefix":"10.1145","volume":"15","author":[{"given":"Zhaoda","family":"Ye","sequence":"first","affiliation":[{"name":"Peking University, Beijing, China"}]},{"given":"Yuxin","family":"Peng","sequence":"additional","affiliation":[{"name":"Peking University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,12,26]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"22","article-title":"Mixed image-keyword query adaptive hashing over multilabel images","volume":"10","author":"Liu Xianglong","year":"2014","unstructured":"Xianglong Liu, Yadong Mu, Bo Lang, and Shih-Fu Chang. 2014. Mixed image-keyword query adaptive hashing over multilabel images. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 10, 2 (2014), 22.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2749147"},{"key":"e_1_2_1_3_1","first-page":"2","article-title":"Image retrieval with query-adaptive hashing","volume":"9","author":"Liu Dong","year":"2013","unstructured":"Dong Liu, Shuicheng Yan, Rong-Rong Ji, Xian-Sheng Hua, and Hong-Jiang Zhang. 2013. 
Image retrieval with query-adaptive hashing. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 9, 1 (2013), 2.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2015.2467315"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.193"},{"key":"e_1_2_1_6_1","first-page":"2","article-title":"Image retrieval with query-adaptive hashing","volume":"9","author":"Liu Dong","year":"2013","unstructured":"Dong Liu, Shuicheng Yan, Rong-Rong Ji, Xian-Sheng Hua, and Hong-Jiang Zhang. 2013. Image retrieval with query-adaptive hashing. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 9, 1 (2013), 2.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2015.2487976"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/6138.6149"},{"key":"e_1_2_1_10_1","volume-title":"Database Systems for Advanced Applications (DASFAA)","author":"Yuwono Budi","year":"1997","unstructured":"Budi Yuwono and Dik L. Lee. 1997. 
Server ranking for distributed text retrieval systems on the internet. In Database Systems for Advanced Applications (DASFAA). World Scientific, 41--49."},{"key":"e_1_2_1_11_1","volume-title":"Modern Information Retrieval","author":"Baeza-Yates Ricardo","unstructured":"Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 1999. Modern Information Retrieval. Vol. 463. ACM Press New York."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2006.873157"},{"key":"e_1_2_1_13_1","first-page":"26","article-title":"Sparse transfer learning for interactive video search reranking","volume":"8","author":"Tian Xinmei","year":"2012","unstructured":"Xinmei Tian, Dacheng Tao, and Yong Rui. 2012. Sparse transfer learning for interactive video search reranking. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 8, 3 (2012), 26.","journal-title":"ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284750"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465274"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)","volume":"22","author":"Kumar Shaishav","year":"2011","unstructured":"Shaishav Kumar and Raghavendra Udupa. 2011. Learning hash functions for cross-view similarity search. 
In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Vol. 22. 1360."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 579--588","author":"Long Mingsheng","unstructured":"Mingsheng Long, Yue Cao, Jianmin Wang, and Philip S. Yu. 2016. Composite correlation quantization for efficient multimodal retrieval. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR). ACM, 579--588."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5539928"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)","volume":"1","author":"Zhang Dongqing","year":"2014","unstructured":"Dongqing Zhang and Wu-Jun Li. 2014. Large-scale supervised multimodal hashing with semantic correlation maximization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vol. 1. 
7."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2655059"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911996.2912000"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2842190"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240560"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1093\/biomet\/28.3-4.321"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS). 1753--1760","author":"Weiss Yair","year":"2009","unstructured":"Yair Weiss, Antonio Torralba, and Rob Fergus. 2009. Spectral hashing. In Proceedings of the Advances in Neural Information Processing Systems (NIPS). 1753--1760."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2016.2607421"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2890144"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 3890--3896","author":"Wang Di","year":"2015","unstructured":"Di Wang, Xinbo Gao, Xiumei Wang, and Lihuo He. 2015. Semantic topic multimodal hashing for cross-media retrieval. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 3890--3896."},{"key":"e_1_2_1_29_1","volume-title":"Unsupervised generative adversarial cross-modal hashing. arXiv preprint arXiv:1712.00358","author":"Zhang Jian","year":"2017","unstructured":"Jian Zhang, Yuxin Peng, and Mingkuan Yuan. 2017. 
Unsupervised generative adversarial cross-modal hashing. arXiv preprint arXiv:1712.00358 (2017)."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2623330.2623688"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299011"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2016.2606441"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2645565"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/349"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the Advances in Neural Information Processing Systems (NIPS)","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS). 1097--1105."},{"key":"e_1_2_1_36_1","volume-title":"Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2980--2988","author":"He Kaiming","year":"2017","unstructured":"Kaiming He, Georgia Gkioxari, Piotr Doll\u00e1r, and Ross Girshick. 2017. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2980--2988."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 
1445--1454","author":"Cao Yue","unstructured":"Yue Cao, Mingsheng Long, Jianmin Wang, Qiang Yang, and Philip S. Yu. 2016. Deep visual-semantic hashing for cross-modal retrieval. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 1445--1454."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.348"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2821921"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2877122"},{"key":"e_1_2_1_41_1","volume-title":"Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556","author":"Simonyan Karen","year":"2014","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)."},{"key":"e_1_2_1_42_1","volume-title":"Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882","author":"Kim Yoon","year":"2014","unstructured":"Yoon Kim. 2014. Convolutional neural networks for sentence classification. 
arXiv preprint arXiv:1408.5882 (2014)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the International Conference on Music Information Retrieval. 600--3.","author":"McKay Cory","year":"2005","unstructured":"Cory McKay, Ichiro Fujinaga, and Philippe Depalle. 2005. jAudio: A feature extraction library. In Proceedings of the International Conference on Music Information Retrieval. 600--3."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.00669"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. ACM, 39--43","author":"Mark","unstructured":"Mark J. Huiskes and Michael S. Lew. 2008. The MIR Flickr retrieval evaluation. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval. ACM, 39--43."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.142"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2015.2400779"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2013.2276704"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).","author":"Zhang Jian","year":"2018","unstructured":"
Jian Zhang, Yuxin Peng, and Mingkuan Yuan. 2018. Unsupervised generative adversarial cross-modal hashing. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00446"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356338","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3356338","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:51Z","timestamp":1750203891000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3356338"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,30]]},"references-count":52,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,11,30]]}},"alternative-id":["10.1145\/3356338"],"URL":"https:\/\/doi.org\/10.1145\/3356338","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,30]]},"assertion":[{"value":"2018-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}