{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T22:22:01Z","timestamp":1773786121599,"version":"3.50.1"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T00:00:00Z","timestamp":1572480000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,3,31]]},"abstract":"<jats:p>Large-scale parallel corpora are essential for training high-quality machine translation systems; however, such corpora are not freely available for many language translation pairs. Previously, training data has been augmented by pseudo-parallel corpora obtained by using machine translation models to translate monolingual corpora into the source language. However, in low-resource language pairs, in which only low-accurate machine translation systems can be used, translation quality degrades when a pseudo-parallel corpus is naively used. To improve machine translation performance with low-resource language pairs, we propose a method to effectively expand the training data via filtering the pseudo-parallel corpus using quality estimation based on sentence-level round-trip translation. For experiments with three language pairs that utilized small, medium, and large size parallel corpora, BLEU scores significantly improved for low-resource language pairs. 
Additionally, the effects of iterative bootstrapping on translation performance quality is investigated; resultingly, it is confirmed that bootstrapping can further improve the translation performance.<\/jats:p>","DOI":"10.1145\/3341726","type":"journal-article","created":{"date-parts":[[2019,10,31]],"date-time":"2019-10-31T12:23:34Z","timestamp":1572524614000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["Filtered Pseudo-parallel Corpus Improves Low-resource Neural Machine Translation"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4484-8543","authenticated-orcid":false,"given":"Aizhan","family":"Imankulova","sequence":"first","affiliation":[{"name":"Tokyo Metropolitan University, Tokyo, Japan"}]},{"given":"Takayuki","family":"Sato","sequence":"additional","affiliation":[{"name":"Tokyo Metropolitan University, Tokyo, Japan"}]},{"given":"Mamoru","family":"Komachi","sequence":"additional","affiliation":[{"name":"Tokyo Metropolitan University, Tokyo, Japan"}]}],"member":"320","published-online":{"date-parts":[[2019,10,31]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Artetxe Mikel","year":"2018"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.5555\/2145432.2145474"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3115\/1626431.1626468"},{"key":"e_1_2_1_4_1","volume-title":"Explaining and generalizing back-translation through wake-sleep. 
Retrieved from: CoRR abs\/1806.04402","author":"Cotterell Ryan","year":"2018"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1045"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages (CCURL\u201916)","author":"Goldhahn Dirk","year":"2016"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems. 820--828","author":"He Di","year":"2016"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-2703"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 2nd Workshop on Hybrid Approaches to Translation. 117--122","author":"Hsieh An-Chang","year":"2013"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-2707"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 4th Workshop on Asian Translation. 70--78","author":"Imankulova Aizhan","year":"2017"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0180885"},{"key":"e_1_2_1_13_1","volume-title":"Rush","author":"Klein Guillaume","year":"2017"},{"key":"e_1_2_1_14_1","volume-title":"Europarl: A Multilingual Corpus for Evaluation of Machine Translation","author":"Koehn Philipp","year":"2002"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-3204"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Lample Guillaume","year":"2018"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219032"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220427"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Conference on Learning Representations.","author":"Mikolov Tomas","year":"2013"},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the Meeting on Association for Computational Linguistics Conference Short Papers. 
220--224","author":"Robert"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-2710"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the 40th Meeting on Association for Computational Linguistics. 311--318","author":"Papineni Kishore","year":"2002"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of International Workshop on Spoken Language Translation. 182--189","author":"Schwenk Holger","year":"2008"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1009"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/N15-1138"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2627435.2670313"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the 45th Meeting of the Association of Computational Linguistics. 25--32","author":"Ueffing Nicola","year":"2007"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1147"},{"key":"e_1_2_1_29_1","doi-asserted-by":"crossref","unstructured":"Longyue Wang Derek F. Wong Lidia S. Chao Yi Lu and Junwen Xing. 2014. A systematic comparison of data selection criteria for SMT domain adaptation. The Scientific World Journal. 1--10.","DOI":"10.1155\/2014\/745485"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the 6th International Conference on Web services and Semantic Technology. 21--30","author":"Y\u0131ld\u0131z Eray","year":"2014"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1160"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence. 
555--562","author":"Zhang Zhirui","year":"2018"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3341726","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3341726","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:43:24Z","timestamp":1750207404000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3341726"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,31]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,3,31]]}},"alternative-id":["10.1145\/3341726"],"URL":"https:\/\/doi.org\/10.1145\/3341726","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,31]]},"assertion":[{"value":"2018-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-10-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}