{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:28Z","timestamp":1750220608812,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,8]],"date-time":"2020-06-08T00:00:00Z","timestamp":1591574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61876062"],"award-info":[{"award-number":["61876062"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,8]]},"DOI":"10.1145\/3372278.3390681","type":"proceedings-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T04:35:27Z","timestamp":1591072527000},"page":"117-125","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Sentence-based and Noise-robust Cross-modal Retrieval on Cooking Recipes and Food Images"],"prefix":"10.1145","author":[{"given":"Zichen","family":"Zan","sequence":"first","affiliation":[{"name":"Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lin","family":"Li","sequence":"additional","affiliation":[{"name":"Wuhan University of Technology, Wuhan, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianquan","family":"Liu","sequence":"additional","affiliation":[{"name":"NEC Corporation, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dong","family":"Zhou","sequence":"additional","affiliation":[{"name":"Hunan University of Science and Technology, Hunan, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,6,8]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3194658.3194663"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2831627"},{"key":"e_1_3_2_1_3_1","volume-title":"Flavor network and the principles of food pairing. Scientific reports","author":"Ahn Yong-Yeol","year":"2011","unstructured":"Yong-Yeol Ahn, Sebastian E Ahnert, James P Bagrow, and Albert-L\u00e1szl\u00f3 Barab\u00e1si. 2011. Flavor network and the principles of food pairing. Scientific reports, Vol. 1 (2011), 196."},{"key":"e_1_3_2_1_4_1","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings.","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings."},{"key":"e_1_3_2_1_5_1","volume-title":"Appetitoso: A search engine for restaurant retrieval based on dishes. CLiC it","author":"Barlacchi Gianni","year":"2016","unstructured":"Gianni Barlacchi, Azad Abad, Emanuele Rossinelli, and Alessandro Moschitti. 2016. Appetitoso: A search engine for restaurant retrieval based on dishes. CLiC it (2016), Vol. 
46 (2016)."},{"volume-title":"The 41st ACM SIGIR. ACM, 35--44.","author":"Carvalho Micael","key":"e_1_3_2_1_6_1","unstructured":"Micael Carvalho, R\u00e9mi Cad\u00e8ne, David Picard, Laure Soulier, Nicolas Thome, and Matthieu Cord. 2018. Cross-modal retrieval in the cooking context: Learning semantic text-image embeddings. In The 41st ACM SIGIR. ACM, 35--44."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-51811-4_48"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240627"},{"key":"e_1_3_2_1_9_1","volume-title":"Attention-Based Models for Speech Recognition. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015","author":"Chorowski Jan","year":"2015","unstructured":"Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. 2015. Attention-Based Models for Speech Recognition. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7--12, 2015, Montreal, Quebec, Canada. 
577--585."},{"key":"e_1_3_2_1_10_1","volume-title":"Won Gu Lee, and Hyunwoo Bang","author":"Chung Jungman","year":"2017","unstructured":"Jungman Chung, Jungmin Chung, Wonjun Oh, Yongkyu Yoo, Won Gu Lee, and Hyunwoo Bang. 2017. A glasses-type wearable device for monitoring the patterns of food intake and facial activity. Scientific reports, Vol. 7 (2017), 41690."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3351147"},{"key":"e_1_3_2_1_12_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136.3080826"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451461"},{"key":"e_1_3_2_1_15_1","volume-title":"Retrieval and classification of food images. Computers in biology and medicine","author":"Farinella Giovanni Maria","year":"2016","unstructured":"Giovanni Maria Farinella, Dario Allegra, Marco Moltisanti, Filippo Stanco, and Sebastiano Battiato. 2016. 
Retrieval and classification of food images. Computers in biology and medicine, Vol. 77 (2016), 23--39."},{"key":"e_1_3_2_1_16_1","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in neural information processing systems. 2672--2680."},{"key":"e_1_3_2_1_17_1","unstructured":"Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. 2017. Improved training of wasserstein gans. In Advances in neural information processing systems. 5767--5777."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_1_19_1","unstructured":"Alexander Hermans, Lucas Beyer, and Bastian Leibe. 2017. In Defense of the Triplet Loss for Person Re-Identification. CoRR, Vol. abs\/1703.07737 (2017)."},{"key":"e_1_3_2_1_20_1","volume-title":"Long short-term memory. Neural computation","author":"Hochreiter Sepp","year":"1997","unstructured":"Sepp Hochreiter and J\u00fcrgen Schmidhuber. 1997. Long short-term memory. Neural computation, Vol. 9, 8 (1997), 1735--1780."},{"volume-title":"Breakthroughs in statistics","author":"Hotelling Harold","key":"e_1_3_2_1_21_1","unstructured":"Harold Hotelling. 1992. 
Relations between two sets of variates. In Breakthroughs in statistics. Springer, 162--190."},{"key":"e_1_3_2_1_22_1","volume-title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR","author":"Howard Andrew G.","year":"2017","unstructured":"Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR, Vol. abs\/1704.04861 (2017)."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1356"},{"volume-title":"Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings.","author":"Diederik","key":"e_1_3_2_1_24_1","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. 
In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2018.00068"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1111\/obr.12340"},{"key":"e_1_3_2_1_27_1","volume-title":"Proceedings of the Tenth International Conference on Web and Social Media","author":"Mejova Yelena","year":"2016","unstructured":"Yelena Mejova, Sofiane Abbar, and Hamed Haddadi. 2016. Fetishizing Food in Digital Age: #foodporn Around the World. In Proceedings of the Tenth International Conference on Web and Social Media, Cologne, Germany, May 17--20, 2016. 250--258."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2759499"},{"key":"e_1_3_2_1_29_1","volume-title":"A survey on food computing. arXiv preprint arXiv:1808.07202","author":"Min Weiqing","year":"2018","unstructured":"Weiqing Min, Shuqiang Jiang, Linhu Liu, Yong Rui, and Ramesh Jain. 2018. A survey on food computing. arXiv preprint arXiv:1808.07202 (2018)."},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343031.3350948"},{"key":"e_1_3_2_1_31_1","volume-title":"Food and health: individual, cultural, or scientific matters? Genes & nutrition","author":"Nordstr\u00f6m Karin","year":"2013","unstructured":"
Karin Nordstr\u00f6m, Christian Coff, H\u00e5kan J\u00f6nsson, Lennart Nordenfelt, and Ulf G\u00f6rman. 2013. Food and health: individual, cultural, or scientific matters? Genes & nutrition, Vol. 8, 4 (2013), 357."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052663"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2705068"},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3284750"},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3041021.3055137"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.327"},{"key":"e_1_3_2_1_37_1","volume-title":"3rd International Conference on Learning Representations, ICLR","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7--9, 2015, Conference Track Proceedings."},{"key":"e_1_3_2_1_38_1","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 
2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4--9 December 2017, Long Beach, CA, USA. 5998--6008."},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01184"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2017.2655449"},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2766462.2767825"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2964284.2967205"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2878970"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2018.2856253"},{"key":"e_1_3_2_1_45_1","volume-title":"Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval. TOMM","volume":"15","author":"Yu Yi","year":"2019","unstructured":"Yi Yu, Suhua Tang, Francisco Raposo, and Lei Chen. 2019 b. Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval. TOMM, Vol. 
15, 1 (2019), 20:1--20:16."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01174"}],"event":{"name":"ICMR '20: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"ICMR '20"},"container-title":["Proceedings of the 2020 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390681","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372278.3390681","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:32:10Z","timestamp":1750195930000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390681"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,8]]},"references-count":46,"alternative-id":["10.1145\/3372278.3390681","10.1145\/3372278"],"URL":"https:\/\/doi.org\/10.1145\/3372278.3390681","relation":{},"subject":[],"published":{"date-parts":[[2020,6,8]]},"assertion":[{"value":"2020-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}