{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,3,3]],"date-time":"2024-03-03T01:26:20Z","timestamp":1709429180191},"reference-count":34,"publisher":"MIT Press - Journals","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Transactions of the Association for Computational Linguistics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:p> Neural machine translation (NMT) systems are usually trained on clean parallel data. They can perform very well for translating clean in-domain texts. However, as demonstrated by previous work, the translation quality significantly worsens when translating noisy texts, such as user-generated texts (UGT) from online social media. Given the lack of parallel data of UGT that can be used to train or adapt NMT systems, we synthesize parallel data of UGT, exploiting monolingual data of UGT through crosslingual language model pre-training and zero-shot NMT systems. This paper presents two different but complementary approaches: One alters given clean parallel data into UGT-like parallel data whereas the other generates translations from monolingual data of UGT. On the MTNT translation tasks, we show that our synthesized parallel data can lead to better NMT systems for UGT while making them more robust in translating texts from various domains and styles.<\/jats:p>",
"DOI":"10.1162\/tacl_a_00341","type":"journal-article","created":{"date-parts":[[2020,11,13]],"date-time":"2020-11-13T20:25:25Z","timestamp":1605299125000},"page":"710-725","source":"Crossref","is-referenced-by-count":1,"title":["Synthesizing Parallel Data of User-Generated Texts with Zero-Shot Neural Machine Translation"],"prefix":"10.1162","volume":"8","author":[{"given":"Benjamin","family":"Marie","sequence":"first","affiliation":[{"name":"National Institute of Information and Communications Technology 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan."}]},{"given":"Atsushi","family":"Fujita","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology 3-5 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0289, Japan."}]}],"member":"281","reference":[
{"key":"bib1","first-page":"597","volume-title":"Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics","author":"Bannard Colin","year":"2005"},
{"key":"bib2","volume-title":"Proceedings of the 6th International Conference on Learning Representations","author":"Belinkov Yonatan","year":"2018"},
{"key":"bib3","doi-asserted-by":"crossref","first-page":"168","DOI":"10.18653\/v1\/D19-5617","volume-title":"Proceedings of the 3rd Workshop on Neural Generation and Translation","author":"Berard Alexandre","year":"2019"},
{"key":"bib4","doi-asserted-by":"crossref","first-page":"526","DOI":"10.18653\/v1\/W19-5361","volume-title":"Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)","author":"Berard Alexandre","year":"2019"},
{"key":"bib5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/W15-30","volume-title":"Proceedings of the Tenth Workshop on Statistical Machine Translation","author":"Bojar Ond\u0159ej","year":"2015"},
{"key":"bib6","doi-asserted-by":"crossref","first-page":"53","DOI":"10.18653\/v1\/W19-5206","volume-title":"Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers)","author":"Caswell Isaac","year":"2019"},
{"key":"bib7","first-page":"176","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"Clark Jonathan H.","year":"2011"},
{"key":"bib8","first-page":"7057","volume-title":"Proceedings of Advances in Neural Information Processing Systems 32","author":"Conneau Alexis","year":"2019"},
{"key":"bib9","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019"},
{"key":"bib10","first-page":"45","volume-title":"Proceedings of MT Summit XIV Workshop on Post-editing Technology and Practice","author":"Gerlach Johanna","year":"2013"},
{"key":"bib11","first-page":"690","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Heafield Kenneth","year":"2013"},
{"issue":"2","key":"bib12","first-page":"24:1","volume":"19","author":"Imankulova Aizhan","year":"2019","journal-title":"ACM Transactions on Asian and Low-Resource Language Information Processing"},
{"key":"bib13","doi-asserted-by":"crossref","first-page":"116","DOI":"10.18653\/v1\/P18-4020","volume-title":"Proceedings of ACL 2018, System Demonstrations","author":"Junczys-Dowmunt Marcin","year":"2018"},
{"key":"bib14","doi-asserted-by":"crossref","first-page":"42","DOI":"10.18653\/v1\/D19-5506","volume-title":"Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)","author":"Karpukhin Vladimir","year":"2019"},
{"key":"bib15","volume":"1412","author":"Kingma Diederik P.","year":"2014","journal-title":"CoRR"},
{"key":"bib16","first-page":"177","volume-title":"Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions","author":"Koehn Philipp","year":"2007"},
{"key":"bib17","doi-asserted-by":"crossref","first-page":"5039","DOI":"10.18653\/v1\/D18-1549","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Lample Guillaume","year":"2018"},
{"key":"bib18","first-page":"91","volume-title":"Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)","author":"Li Xian","year":"2019"},
{"key":"bib19","first-page":"328","volume-title":"Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)","author":"Li Zhenhao","year":"2019"},
{"key":"bib20","first-page":"501","volume-title":"Proceedings of the 20th International Conference on Computational Linguistics","author":"Lin Chin-Yew","year":"2004"},
{"key":"bib21","first-page":"881","volume-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers","author":"Mallinson Jonathan","year":"2017"},
{"key":"bib22","doi-asserted-by":"crossref","first-page":"294","DOI":"10.18653\/v1\/W19-5330","volume-title":"Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)","author":"Marie Benjamin","year":"2019"},
{"key":"bib23","doi-asserted-by":"crossref","first-page":"275","DOI":"10.18653\/v1\/D19-5536","volume-title":"Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)","author":"Veliz Claudia Matos","year":"2019"},
{"key":"bib24","doi-asserted-by":"crossref","first-page":"543","DOI":"10.18653\/v1\/D18-1050","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Michel Paul","year":"2018"},
{"key":"bib25","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002"},
{"key":"bib26","doi-asserted-by":"crossref","first-page":"392","DOI":"10.18653\/v1\/W15-3049","volume-title":"Proceedings of the Tenth Workshop on Statistical Machine Translation","author":"Popovi\u0107 Maja","year":"2015"},
{"key":"bib27","doi-asserted-by":"crossref","first-page":"186","DOI":"10.18653\/v1\/W18-6319","volume-title":"Proceedings of the Third Conference on Machine Translation: Research Papers","author":"Post Matt","year":"2018"},
{"key":"bib28","doi-asserted-by":"crossref","first-page":"866","DOI":"10.18653\/v1\/P18-1080","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Prabhumoye Shrimai","year":"2018"},
{"key":"bib29","doi-asserted-by":"crossref","unstructured":"Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016a. Improving neural machine translation models with monolingual data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86\u201396. Berlin, Germany. Association for Computational Linguistics. DOI:\u00a0https:\/\/doi.org\/10.18653\/v1\/P16-1009","DOI":"10.18653\/v1\/P16-1009"},
{"key":"bib30","doi-asserted-by":"crossref","first-page":"1715","DOI":"10.18653\/v1\/P16-1162","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Sennrich Rico","year":"2016"},
{"key":"bib31","first-page":"1916","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Vaibhav Vaibhav","year":"2019"},
{"key":"bib32","first-page":"5998","volume-title":"Proceedings of Advances in Neural Information Processing Systems 30","author":"Vaswani Ashish","year":"2017"},
{"key":"bib33","author":"Zhang Zhirui","year":"2018","journal-title":"CoRR"},
{"key":"bib34","volume-title":"Proceedings of the 8th International Conference on Learning Representations","author":"Zheng Zaixiang","year":"2020"}
],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00341","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:39:46Z","timestamp":1615585186000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/96471"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":34,"alternative-id":["10.1162\/tacl_a_00341"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00341","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]}}}