{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T16:19:32Z","timestamp":1761581972268,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":28,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,6,8]],"date-time":"2020-06-08T00:00:00Z","timestamp":1591574400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Natural Science Foundation of China","award":["61876062"],"award-info":[{"award-number":["61876062"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,6,8]]},"DOI":"10.1145\/3372278.3390717","type":"proceedings-article","created":{"date-parts":[[2020,6,2]],"date-time":"2020-06-02T04:35:27Z","timestamp":1591072527000},"page":"571-577","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["A Coordinated Representation Learning Enhanced Multimodal Machine Translation Approach with Multi-Attention"],"prefix":"10.1145","author":[{"given":"Yifeng","family":"Han","sequence":"first","affiliation":[{"name":"Wuhan University of Technology, Wuhan, China"}]},{"given":"Lin","family":"Li","sequence":"additional","affiliation":[{"name":"Wuhan University of Technology, Wuhan, China"}]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"Iwate University, Morioka, Japan"}]}],"member":"320","published-online":{"date-parts":[[2020,6,8]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. 2017. Attention is All you Need. In Advances in neural information processing systems. 5998--6008."},{"volume-title":"Proceedings of the Third Conference on Machine Translation: Shared Task Papers. 603--611","author":"Gr\u00f6nroos S.-A.","key":"e_1_3_2_1_2_1","unstructured":"S.-A. Gr\u00f6nroos, B. Huet, M. Kurimo, J. Laaksonen, B. M\u00e9rialdo, P. Pham, M. Sj\u00f6berg, U. Sulubacak, J. Tiedemann, R. Troncy, and R. V\u00e1zquez. 2018. The MeMAD Submission to the WMT18 Multimodal Translation Task. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers. 603--611."},{"volume-title":"Proceedings of the Third Conference on Machine Translation: Shared Task Papers. 616--623","author":"Helcl J.","key":"e_1_3_2_1_3_1","unstructured":"J. Helcl, J. Libovick\u00fd, and D. Varis. 2018. CUNI System for the WMT18 Multimodal Translation Task. In Proceedings of the Third Conference on Machine Translation: Shared Task Papers. 616--623."},{"key":"e_1_3_2_1_4_1","unstructured":"R. Kiros, R. Salakhutdinov, and R. S. Zemel. 2014. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. arXiv preprint arXiv:1411.2539."},{"volume-title":"Proceedings of the First Conference on Machine Translation. 543--553","author":"Specia L.","key":"e_1_3_2_1_5_1","unstructured":"L. Specia, S. Frank, K. Sima'an, and D. Elliott. 2016. A Shared Task on Multimodal Machine Translation and Crosslingual Image Description. In Proceedings of the First Conference on Machine Translation. 543--553."},{"volume-title":"Proceedings of the Second Conference on Machine Translation. 215--233","author":"Elliott D.","key":"e_1_3_2_1_6_1","unstructured":"D. Elliott, S. Frank, L. Barrault, F. Bougares, and L. Specia. 2017. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description. In Proceedings of the Second Conference on Machine Translation. 215--233."},{"volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. 4792--4799","author":"Chen K.","key":"e_1_3_2_1_7_1","unstructured":"K. Chen, R. Wang, M. Utiyama, E. Sumita, and T. Zhao. 2018. Syntax-Directed Attention for Neural Machine Translation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. 4792--4799."},{"volume-title":"Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. 563--570","author":"Zhao S.","key":"e_1_3_2_1_8_1","unstructured":"S. Zhao and Z. Zhang. 2018. Attention-via-Attention Neural Machine Translation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence. 563--570."},{"key":"e_1_3_2_1_9_1","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 1799--1808","author":"Domhan T.","year":"2018","unstructured":"T. Domhan. 2018. How Much Attention Do You Need? A Granular Analysis of Neural Machine Translation Architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 1799--1808."},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Y. Yu, S. Tang, F. Raposo, and L. Chen. 2019. Cross-modal correlation learning for audio and lyrics in music retrieval. In ACM Transaction on Multimedia Computing Communication and Applications (TOMCCAP). 1--16.","DOI":"10.1145\/3281746"},{"volume-title":"Proceedings of the 22nd ACM international conference on Multimedia. 17--26","author":"Habibian A.","key":"e_1_3_2_1_11_1","unstructured":"A. Habibian, T. Mensink, and C. G. Snoek. 2014. Videostory: A new multimedia embedding for few-example recognition and translation of events. In Proceedings of the 22nd ACM international conference on Multimedia. 17--26."},{"volume-title":"3rd International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1409.0473","author":"Bahdanau D.","key":"e_1_3_2_1_12_1","unstructured":"D. Bahdanau, K. Cho, and Y. Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In 3rd International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1409.0473."},{"volume-title":"Proceedings of the 32nd International Conference on Machine Learning. 2048--2057","author":"Xu K.","key":"e_1_3_2_1_13_1","unstructured":"K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proceedings of the 32nd International Conference on Machine Learning. 2048--2057."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2358"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4746"},{"key":"e_1_3_2_1_16_1","year":"2017","author":"Libovick\u00fd J.","unstructured":"J. Libovick\u00fd and J. Helcl. 2017. Attention Strategies for Multi-Source Sequence-to-Sequence Learning. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 196--202.","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 196--202"},{"key":"e_1_3_2_1_17_1","volume-title":"Proceedings of the Eighth International Joint Conference on Natural Language Processing. 130--141","author":"Elliott D.","year":"2017","unstructured":"D. Elliott and \u00c1. K\u00e1d\u00e1r. 2017. Imagination Improves Multimodal Translation. In Proceedings of the Eighth International Joint Conference on Natural Language Processing. 130--141."},{"volume-title":"Proceedings of the 24th ACM International Conference on Information and Knowledge Management. 333--342","author":"Duan H.","key":"e_1_3_2_1_18_1","unstructured":"H. Duan and C. Zhai. 2015. Mining Coordinated Intent Representation for Entity Search and Recommendation. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management. 333--342."},{"key":"e_1_3_2_1_19_1","unstructured":"S. G. Finlayson, M. B. A. McDermott, A. V. Pickering, S. L. Lipnick, W. Yuan, and I. S. Kohane. 2019. Approaching Small Molecule Prioritization as a Cross-Modal Information Retrieval Task through Coordinated Representation Learning. arXiv preprint arXiv:1911.10241."},{"key":"e_1_3_2_1_20_1","year":"2019","author":"Wang D.","unstructured":"D. Wang and J. Xu. 2019. Differentially Private Empirical Risk Minimization with Smooth Non-Convex Loss Functions: A Non-Stationary View. In The Thirty-Third AAAI Conference on Artificial Intelligence. 1182--1189.","volume-title":"The Thirty-Third AAAI Conference on Artificial Intelligence. 1182--1189"},{"volume-title":"IEEE Conference on Computer Vision and Pattern Recognition. 12280--12289","author":"Gallego G.","key":"e_1_3_2_1_21_1","unstructured":"G. Gallego, M. Gehrig, and D. Scaramuzza. 2019. Focus Is All You Need: Loss Functions for Event-Based Vision. In IEEE Conference on Computer Vision and Pattern Recognition. 12280--12289."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"crossref","unstructured":"A. Semenov, V. Boginski, and E. L. Pasiliao. 2019. Neural Networks with Multidimensional Cross-Entropy Loss Functions. In Computational Data and Social Networks - 8th International Conference. 57--62.","DOI":"10.1007\/978-3-030-34980-6_5"},{"volume-title":"Proceedings of the 5th Workshop on Vision and Language. https:\/\/www.aclweb.org\/anthology\/W16-3210\/","author":"Elliott D.","key":"e_1_3_2_1_23_1","unstructured":"D. Elliott, S. Frank, K. Sima'an, and L. Specia. 2016. Multi30K: Multilingual English-German Image Descriptions. In Proceedings of the 5th Workshop on Vision and Language. https:\/\/www.aclweb.org\/anthology\/W16-3210\/."},{"volume-title":"Proceedings of the IEEE international conference on computer vision. 2641--2649","author":"Plummer B. A.","key":"e_1_3_2_1_24_1","unstructured":"B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE international conference on computer vision. 2641--2649."},{"volume-title":"2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. 1--5","author":"Saqib M.","key":"e_1_3_2_1_25_1","unstructured":"M. Saqib, S. D. Khan, N. Sharma, and M. Blumenstein. 2017. A study on detecting drones using deep convolutional neural networks. In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance. 1--5."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3175536.3175570"},{"key":"e_1_3_2_1_27_1","year":"2015","author":"Kingma D. P.","unstructured":"D. P. Kingma and J. Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1412.6980.","volume-title":"3rd International Conference on Learning Representations. http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Y. Yu, S. Tang, K. Aizawa, and A. Aizawa. 2019. Category-based deep CCA for fine-grained venue discovery from multimodal data. In IEEE Transaction on Neural Network and Learning System (TNNLS). 1250--1258.","DOI":"10.1109\/TNNLS.2018.2856253"}],"event":{"name":"ICMR '20: International Conference on Multimedia Retrieval","sponsor":["SIGMM ACM Special Interest Group on Multimedia"],"location":"Dublin Ireland","acronym":"ICMR '20"},"container-title":["Proceedings of the 2020 International Conference on Multimedia Retrieval"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390717","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3372278.3390717","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:33:25Z","timestamp":1750199605000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3372278.3390717"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,8]]},"references-count":28,"alternative-id":["10.1145\/3372278.3390717","10.1145\/3372278"],"URL":"https:\/\/doi.org\/10.1145\/3372278.3390717","relation":{},"subject":[],"published":{"date-parts":[[2020,6,8]]},"assertion":[{"value":"2020-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}