{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T14:53:02Z","timestamp":1769007182459,"version":"3.49.0"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T00:00:00Z","timestamp":1641254400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61370205"],"award-info":[{"award-number":["61370205"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"science and technology project of Chongqing Education Commission of China","award":["KJQN201900520"],"award-info":[{"award-number":["KJQN201900520"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With the vigorous development of mobile Internet technology and the popularization of smart devices, the amount of multimedia data has exploded and its forms have become increasingly diversified. People\u2019s demand for information is no longer satisfied by single-modal data retrieval, and cross-modal retrieval has become a research hotspot in recent years. Owing to the strong feature learning ability of deep learning, cross-modal deep hashing has been extensively studied. However, the similarity between different modalities is difficult to measure directly because of their different distributions and representations. 
Therefore, it is urgent to eliminate the modality gap and improve retrieval accuracy. Some previous work has introduced GANs into cross-modal hashing to reduce the semantic differences between modalities. However, most existing GAN-based cross-modal hashing methods suffer from issues such as unstable network training and vanishing gradients, which hinder the elimination of modality differences. To address this issue, this paper proposes a novel Semantic-guided Autoencoder Adversarial Hashing method for cross-modal retrieval (SAAH). First, two kinds of adversarial autoencoder networks, guided by semantic multi-labels, maximize the semantic relevance of instances and preserve cross-modal invariance. Second, under semantic supervision, the adversarial module guides the feature learning process and maintains the relations between modalities. In addition, to preserve the inter-modal correlation of all similar pairs, this paper uses two types of loss functions to maintain similarity. 
To verify the effectiveness of the proposed method, extensive experiments were conducted on three widely used cross-modal datasets (MIRFLICKR, NUS-WIDE and MS COCO). Compared with several representative state-of-the-art cross-modal retrieval methods, SAAH achieved leading retrieval performance.<\/jats:p>","DOI":"10.1007\/s40747-021-00615-3","type":"journal-article","created":{"date-parts":[[2022,1,4]],"date-time":"2022-01-04T07:03:04Z","timestamp":1641279784000},"page":"1603-1617","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Semantic-guided autoencoder adversarial hashing for large-scale cross-modal retrieval"],"prefix":"10.1007","volume":"8","author":[{"given":"Mingyong","family":"Li","sequence":"first","affiliation":[]},{"given":"Qiqi","family":"Li","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5517-3633","authenticated-orcid":false,"given":"Yan","family":"Ma","sequence":"additional","affiliation":[]},{"given":"Degang","family":"Yang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,4]]},"reference":[{"key":"615_CR1","unstructured":"Gionis A, Indyk P, Motwani R et al (1999) Similarity search in high dimensions via hashing. In: VLDB, vol 99, pp 518\u2013529"},{"key":"615_CR2","doi-asserted-by":"crossref","unstructured":"Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality sensitive hashing scheme based on p-stable distributions. In: Proceedings of the twentieth annual symposium on Computational geometry. ACM, pp 253\u2013262","DOI":"10.1145\/997817.997857"},{"key":"615_CR3","doi-asserted-by":"crossref","unstructured":"Ji J, Li J, Yan S, Tian Q, Zhang B (2013) Min-max hash for Jaccard similarity. In: 2013 IEEE 13th international conference on data mining. 
IEEE, pp 301\u2013309","DOI":"10.1109\/ICDM.2013.119"},{"issue":"11","key":"615_CR4","doi-asserted-by":"publisher","first-page":"5427","DOI":"10.1109\/TIP.2016.2607421","volume":"25","author":"G Ding","year":"2016","unstructured":"Ding G, Guo Y, Zhou J et al (2016) Large-scale cross-modality search via collective matrix factorization hashing. IEEE Trans Image Process 25(11):5427\u20135440","journal-title":"IEEE Trans Image Process"},{"key":"615_CR5","doi-asserted-by":"crossref","unstructured":"Zhou J, Ding G, Guo Y (2014) Latent semantic sparse hashing for cross-modal similarity search. In: Proceedings of the 37th ACM SIGIR international conference on research and development in information retrieval, Gold Coast, QLD, Australia. ACM, pp 415\u2013424","DOI":"10.1145\/2600428.2609610"},{"key":"615_CR6","doi-asserted-by":"crossref","unstructured":"Xia R, Pan Y, Lai H, Liu C, Yan S (2014) Supervised hashing for image retrieval via image representation learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 2156\u20132162","DOI":"10.1609\/aaai.v28i1.8952"},{"issue":"1","key":"615_CR7","first-page":"22","volume":"15","author":"Y Peng","year":"2019","unstructured":"Peng Y, Qi J (2019) CM-GANs: cross-modal generative adversarial networks for common representation learning. ACM Trans Multimed Comput Commun Appl (TOMM) 15(1):22","journal-title":"ACM Trans Multimed Comput Commun Appl (TOMM)"},{"key":"615_CR8","unstructured":"Wang K, Yin Q, Wang W, Wu S, Wang L (2016) A comprehensive survey on cross-modal retrieval. arXiv preprint arXiv:1607.06215"},{"key":"615_CR9","doi-asserted-by":"crossref","unstructured":"Ding G, Guo Y, Zhou J (2014) Collective matrix factorization hashing for multimodal data. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2075\u20132082","DOI":"10.1109\/CVPR.2014.267"},{"key":"615_CR10","unstructured":"Kumar S, Udupa R (2011) Learning hash functions for cross-view similarity search. 
In: IJCAI proceedings-international joint conference on artificial intelligence, vol 22, p 1360"},{"key":"615_CR11","doi-asserted-by":"crossref","unstructured":"Zhang D, Li W-J (2014) Large-scale supervised multimodal hashing with semantic correlation maximization. In: AAAI, vol 1, p 7","DOI":"10.1609\/aaai.v28i1.8995"},{"key":"615_CR12","doi-asserted-by":"crossref","unstructured":"Cao Y, Long M, Wang J, Yang Q, Yu PS (2016) Deep visual-semantic hashing for cross-modal retrieval. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1445\u20131454","DOI":"10.1145\/2939672.2939812"},{"key":"615_CR13","doi-asserted-by":"crossref","unstructured":"Liu L, Shen F, Shen Y, Liu X, Shao L (2017) Deep sketch hashing: fast free-hand sketch-based image retrieval. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2298\u20132307","DOI":"10.1109\/CVPR.2017.247"},{"key":"615_CR14","doi-asserted-by":"crossref","unstructured":"Li C, Deng C, Li N, Liu W, Gao X, Tao D (2018) Self-supervised adversarial hashing networks for cross-modal retrieval. In: CVPR, pp 4242\u20134251","DOI":"10.1109\/CVPR.2018.00446"},{"key":"615_CR15","doi-asserted-by":"crossref","unstructured":"Gu W, Gu X, Gu J, Li B, Xiong Z, Wang W (2019) Adversary guided asymmetric hashing for cross-modal retrieval. ICMR 159\u2013167","DOI":"10.1145\/3323873.3325045"},{"key":"615_CR16","unstructured":"Kumar S, Udupa R (2011) Learning hash functions for cross-view similarity search. In: IJCAI, vol 22, p 1360"},{"key":"615_CR17","unstructured":"Wang D, Gao X, Wang X, He L (2015) Semantic topic multimodal hashing for cross-media retrieval. In: IJCAI, pp 3890\u20133896"},{"key":"615_CR18","doi-asserted-by":"crossref","unstructured":"Bronstein MM, Bronstein AM, Michel F, Paragios N (2010) Data fusion through cross-modality metric learning using similarity-sensitive hashing. 
In: CVPR, pp 3594\u20133601","DOI":"10.1109\/CVPR.2010.5539928"},{"key":"615_CR19","doi-asserted-by":"crossref","unstructured":"Zhang D, Li WJ (2014) Large-scale supervised multimodal hashing with semantic correlation maximization. In: AAAI, pp 2177\u20132183","DOI":"10.1609\/aaai.v28i1.8995"},{"key":"615_CR20","doi-asserted-by":"crossref","unstructured":"Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: CVPR","DOI":"10.1109\/CVPR.2015.7299011"},{"key":"615_CR21","doi-asserted-by":"crossref","unstructured":"Jiang Q-Y, Li W-J (2017) Deep cross-modal hashing. In: CVPR","DOI":"10.1109\/CVPR.2017.348"},{"key":"615_CR22","doi-asserted-by":"crossref","unstructured":"Yang E, Deng C, Liu W, Liu X, Tao D, Gao X (2017) Pairwise relationship guided deep hashing for cross-modal retrieval. In: AAAI, pp 1618\u20131625","DOI":"10.1609\/aaai.v31i1.10719"},{"key":"615_CR23","doi-asserted-by":"crossref","unstructured":"Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets. arXiv preprint arXiv:1405.3531","DOI":"10.5244\/C.28.6"},{"key":"615_CR24","doi-asserted-by":"crossref","unstructured":"Wang B, Yang Y, Xu X, Hanjalic A, Shen HT (2017) Adversarial cross-modal retrieval. In: ACMMM, pp 154\u2013162","DOI":"10.1145\/3123266.3123326"},{"key":"615_CR25","doi-asserted-by":"crossref","unstructured":"Chua T-S, Tang J, Hong R, Li H, Luo Z, Zheng Y (2009) NUS-WIDE: a real-world web image database from National University of Singapore. In: ACM CIVR, p 48","DOI":"10.1145\/1646396.1646452"},{"key":"615_CR26","doi-asserted-by":"crossref","unstructured":"Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In ACM: CIVR, pp 39\u201343","DOI":"10.1145\/1460096.1460104"},{"key":"615_CR27","doi-asserted-by":"crossref","unstructured":"Liu H, Wang R, Shan S, Chen X (2016) Deep supervised hashing for fast image retrieval. 
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2064\u20132072","DOI":"10.1109\/CVPR.2016.227"},{"key":"615_CR28","doi-asserted-by":"crossref","unstructured":"Liu W, Wang J, Ji R, Jiang Y-G, Chang S-F (2012) Supervised hashing with kernels. In: 2012 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 2074\u20132081","DOI":"10.1109\/CVPR.2012.6247912"},{"key":"615_CR29","doi-asserted-by":"crossref","unstructured":"Peng Y, Huang X, Zhao YZ (2017) An overview of cross-media retrieval: concepts, methodologies, benchmarks and challenges. IEEE Trans Circuits Syst Video Technol (2017)","DOI":"10.1109\/TCSVT.2017.2705068"},{"key":"615_CR30","doi-asserted-by":"crossref","unstructured":"Lin Z, Ding G, Hu M, Wang J (2015) Semantics-preserving hashing for cross-view retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3864\u20133872","DOI":"10.1109\/CVPR.2015.7299011"},{"key":"615_CR31","unstructured":"Cao Y, Long M, Wang J (2016) Correlation hashing network for efficient cross-modal retrieval (2016). CoRR abs\/1602.06697. arXiv:1602.06697"},{"key":"615_CR32","doi-asserted-by":"crossref","unstructured":"Liong VE, Lu J, Tan Y-P, Zhou J (2017) Cross-modal deep variational hashing. In: 2017 IEEE international conference on computer vision (ICCV). IEEE, pp 4097\u20134105","DOI":"10.1109\/ICCV.2017.439"},{"key":"615_CR33","doi-asserted-by":"crossref","unstructured":"Su S , Zhong Z , Zhang C (2019) Deep joint-semantics reconstructing hashing for large-scale unsupervised cross-modal retrieval. ICCV","DOI":"10.1109\/ICCV.2019.00312"},{"key":"615_CR34","doi-asserted-by":"crossref","unstructured":"Li C , Deng C , Wang L et al (2019) Coupled CycleGAN: unsupervised hashing network for cross-modal retrieval. AAAI","DOI":"10.1609\/aaai.v33i01.3301176"},{"key":"615_CR35","unstructured":"Chen Z-D, Yu W-J, Li C-X, Nie L, Xu X-S (2018) Dual deep neural networks cross-modal hashing. 
In: AAAI, pp 274\u2013281"},{"issue":"9","key":"615_CR36","doi-asserted-by":"publisher","first-page":"2400","DOI":"10.1109\/TMM.2018.2804763","volume":"20","author":"J Zhang","year":"2018","unstructured":"Zhang J, Peng Y (2018) Query-adaptive image retrieval by deep-weighted hashing. IEEE Trans Multimed 20(9):2400\u20132414","journal-title":"IEEE Trans Multimed"},{"issue":"4","key":"615_CR37","doi-asserted-by":"publisher","first-page":"1602","DOI":"10.1109\/TIP.2018.2878970","volume":"28","author":"L Wu","year":"2019","unstructured":"Wu L, Wang Y, Shao L (2019) Cycle-consistent deep generative hashing for cross-modal retrieval. IEEE Trans Image Process 28(4):1602\u20131612","journal-title":"IEEE Trans Image Process"},{"key":"615_CR38","doi-asserted-by":"crossref","unstructured":"Li K et al (2019) Visual semantic reasoning for image-text matching. ICCV","DOI":"10.1109\/ICCV.2019.00475"},{"key":"615_CR39","doi-asserted-by":"publisher","first-page":"3626","DOI":"10.1109\/TIP.2020.2963957","volume":"29","author":"D Xie","year":"2020","unstructured":"Xie D, Deng C, Li C, Liu X, Tao D (2020) Multi-task consistency-preserving adversarial hashing for cross-modal retrieval. IEEE Trans Image Process 29:3626\u20133637. https:\/\/doi.org\/10.1109\/TIP.2020.2963957","journal-title":"IEEE Trans Image Process"},{"issue":"1","key":"615_CR40","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1109\/TCSVT.2020.2974877","volume":"31","author":"X Nie","year":"2021","unstructured":"Nie X, Wang B, Li J, Hao F, Jian M, Yin YL (2021) Deep multiscale fusion hashing for cross-modal retrieval. IEEE Trans Circuits Syst Video Technol 31(1):401\u2013410","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"issue":"2","key":"615_CR41","doi-asserted-by":"publisher","first-page":"657","DOI":"10.1007\/s11280-018-0541-x","volume":"22","author":"X Xu","year":"2019","unstructured":"Xu X, He L, Lu HM et al (2019) Deep adversarial metric learning for cross-modal retrieval. 
World Wide Web 22(2):657\u2013672","journal-title":"World Wide Web"},{"key":"615_CR42","doi-asserted-by":"crossref","unstructured":"Feng F, Wang X, Li R (2014) Cross-modal retrieval with correspondence autoencoder. In: Proceedings of the 22nd ACM international conference on multimodal, New York, USA. ACM, pp 7\u201316","DOI":"10.1145\/2647868.2654902"},{"issue":"28","key":"615_CR43","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1016\/j.neucom.2018.11.042","volume":"331","author":"Y Wu","year":"2019","unstructured":"Wu Y, Wang S, Huang Q (2019) Multi-modal semantic autoencoder for cross-modal retrieval. Neurocomputing 331(28):165\u2013175","journal-title":"Neurocomputing"},{"issue":"3","key":"615_CR44","doi-asserted-by":"publisher","first-page":"1047","DOI":"10.1109\/TCYB.2018.2879846","volume":"50","author":"X Huang","year":"2020","unstructured":"Huang X, Peng Y, Yuan M (2020) MHTN: modal-adversarial hybrid transfer network for cross-modal retrieval. IEEE Trans Cybern 50(3):1047\u20131059","journal-title":"IEEE Trans Cybern"},{"key":"615_CR45","doi-asserted-by":"crossref","unstructured":"Zhang J, Peng YX, Yuan MK (2018) Unsupervised generative adversarial cross-modal hashing. In: Proceedings of the 32nd AAAI conference on artificial intelligence. AAAI","DOI":"10.1609\/aaai.v32i1.11263"},{"issue":"1","key":"615_CR46","doi-asserted-by":"publisher","first-page":"174","DOI":"10.1109\/TMM.2019.2922128","volume":"22","author":"J Zhang","year":"2020","unstructured":"Zhang J, Peng Y (2020) Multi-pathway generative adversarial hashing for unsupervised cross-modal retrieval. IEEE Trans Multimed 22(1):174\u2013187","journal-title":"IEEE Trans Multimed"},{"key":"615_CR47","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1016\/j.neucom.2020.03.019","volume":"400","author":"X Wang","year":"2020","unstructured":"Wang X, Zou X, Bakker EM, Wu S (2020) Self-constraining and attention-based hashing network for bit-scalable cross-modal retrieval. 
Neurocomputing 400:255\u2013271","journal-title":"Neurocomputing"},{"key":"615_CR48","doi-asserted-by":"publisher","first-page":"240","DOI":"10.1016\/j.neucom.2019.11.061","volume":"381","author":"M Zhang","year":"2020","unstructured":"Zhang M, Li J, Zhang HX, Liu L (2020) Deep semantic cross modal hashing with correlation alignment. Neurocomputing 381:240\u2013251","journal-title":"Neurocomputing"},{"key":"615_CR49","doi-asserted-by":"crossref","unstructured":"Wu F, Jing X, Wu Z, Ji Y, Dong X, Luo X, Huang QH, Wang R (2020) Modality-specific and shared generative adversarial network for cross-modal retrieval. Pattern Recognit 104:107335","DOI":"10.1016\/j.patcog.2020.107335"},{"key":"615_CR50","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollr P, Zitnick CL (2014) Microsoft coco: common objects in context. In: ECCV, pp 740\u2013755","DOI":"10.1007\/978-3-319-10602-1_48"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00615-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00615-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00615-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,21]],"date-time":"2023-01-21T18:43:06Z","timestamp":1674326586000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00615-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,4]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["615"],"URL":"https:\/
\/doi.org\/10.1007\/s40747-021-00615-3","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,4]]},"assertion":[{"value":"26 May 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 November 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}