{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T16:11:49Z","timestamp":1775837509146,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,8,8]],"date-time":"2019-08-08T00:00:00Z","timestamp":1565222400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Guangzhou Science and Technology Program, China","award":["201510010165"],"award-info":[{"award-number":["201510010165"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61672548, U1611461, 61173081"],"award-info":[{"award-number":["61672548, U1611461, 61173081"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,8,31]]},"abstract":"<jats:p>One fundamental problem in image search is to learn the ranking functions (i.e., the similarity between query and image). Recent progress on this topic has evolved through two paradigms: the text-based model and image ranker learning. The former relies on image surrounding texts, making the similarity sensitive to the quality of textual descriptions. The latter may suffer from the robustness problem when human-labeled query-image pairs cannot represent user search intent precisely. We demonstrate in this article that the preceding two limitations can be well mitigated by learning a cross-view embedding that leverages click data. 
Specifically, a novel click-based Deep Structure-Preserving Embeddings with visual Attention (DSPEA) model is presented, which consists of two components: deep convolutional neural networks followed by image embedding layers for learning visual embedding, and a deep neural network for generating query semantic embedding. Meanwhile, visual attention is incorporated at the top of the convolutional neural network to reflect the relevant regions of the image to the query. Furthermore, considering the high dimension of the query space, a new click-based representation on a query set is proposed for alleviating this sparsity problem. The whole network is end-to-end trained by optimizing a large margin objective that combines cross-view ranking constraints with in-view neighborhood structure preservation constraints. On a large-scale click-based image dataset with 11.7 million queries and 1 million images, our model is shown to be powerful for keyword-based image search with superior performance over several state-of-the-art methods and achieves, to date, the best reported NDCG@25 of 52.21%.<\/jats:p>","DOI":"10.1145\/3328994","type":"journal-article","created":{"date-parts":[[2019,8,8]],"date-time":"2019-08-08T12:30:31Z","timestamp":1565267431000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Learning Click-Based Deep Structure-Preserving Embeddings with Visual Attention"],"prefix":"10.1145","volume":"15","author":[{"given":"Yehao","family":"Li","sequence":"first","affiliation":[{"name":"Sun Yat-sen University, Guangzhou, China"}]},{"given":"Yingwei","family":"Pan","sequence":"additional","affiliation":[{"name":"JD AI Research, Beijing, China"}]},{"given":"Ting","family":"Yao","sequence":"additional","affiliation":[{"name":"JD AI Research, Beijing, China"}]},{"given":"Hongyang","family":"Chao","sequence":"additional","affiliation":[{"name":"Sun Yat-sen University, Guangzhou, 
China"}]},{"given":"Yong","family":"Rui","sequence":"additional","affiliation":[{"name":"Lenovo, Beijing, China"}]},{"given":"Tao","family":"Mei","sequence":"additional","affiliation":[{"name":"JD AI Research, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2019,8,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 2015 International Conference on Learning Representations.","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 2015 International Conference on Learning Representations."},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Corinna Cortes, and Mehryar Mohri. 2009. Polynomial semantic indexing. In Advances in Neural Information Processing Systems. 64--72.","DOI":"10.1145\/1645953.1645979"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2656402"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1756006.1756042"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2505515.2505532"},{"key":"e_1_2_1_6_1","volume-title":"Devise: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems. 2121--2129.","author":"Frome Andrea","year":"2013","unstructured":"Andrea Frome, Greg S. 
Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc\u2019Aurelio Ranzato, and Tomas Mikolov. 2013. Devise: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems. 2121--2129."},{"key":"e_1_2_1_7_1","first-page":"361","article-title":"Statistical consistency of kernel canonical correlation analysis","volume":"8","author":"Fukumizu Kenji","year":"2007","unstructured":"Kenji Fukumizu, Francis R. Bach, and Arthur Gretton. 2007. Statistical consistency of kernel canonical correlation analysis. Journal of Machine Learning Research 8 (2007), 361--383.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0658-4"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766042321814"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2016.2558463"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Ralf Herbrich, Thore Graepel, and Klaus Obermayer. 2000. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers. 
115--132.","DOI":"10.7551\/mitpress\/1113.003.0010"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502283"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1963405.1963447"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2015.2435740"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2015.2390499"},{"key":"e_1_2_1_18_1","volume-title":"Hinton","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097--1105."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2964320"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2733373.2806373"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2021038"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609568"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2656404"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2508128"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the 3rd Text REtrieval Conference.","author":"Robertson S. E.","unstructured":"S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. 1994. Okapi at TREC-3. 
In Proceedings of the 3rd Text REtrieval Conference."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1990.10474928"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/11752790_2"},{"key":"e_1_2_1_28_1","volume-title":"McGill","author":"Salton Gerard","year":"1986","unstructured":"Gerard Salton and Michael J. McGill. 1986. Introduction to Modern Information Retrieval. McGraw-Hill, New York, NY."},{"key":"e_1_2_1_29_1","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. 5998--6008."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Veli\u010dkovi\u0107 Petar","year":"2018","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Li\u00f2, and Yoshua Bengio. 2018. Graph attention networks. 
In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.29"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2433396.2433481"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2872898"},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the International Conference on Machine Learning. 2048--2057","author":"Xu Kelvin","year":"2015","unstructured":"Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning. 
2048--2057."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298966"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2457450.2457456"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.10"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/1816041.1816048"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.12"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502085"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2012.2236341"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298826"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/984321.984322"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2978656"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2014.2326010"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and 
Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3328994","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3328994","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:41Z","timestamp":1750204481000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3328994"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,8]]},"references-count":46,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,8,31]]}},"alternative-id":["10.1145\/3328994"],"URL":"https:\/\/doi.org\/10.1145\/3328994","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,8,8]]},"assertion":[{"value":"2018-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}