{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,13]],"date-time":"2025-12-13T23:09:20Z","timestamp":1765667360202,"version":"3.41.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T00:00:00Z","timestamp":1628726400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2017YFB0202204"],"award-info":[{"award-number":["2017YFB0202204"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61925601, and 61772302"],"award-info":[{"award-number":["61925601, and 61772302"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2021,11,30]]},"abstract":"<jats:p>Data augmentation is an approach for several text generation tasks. Generally, in the machine translation paradigm, mainly in low-resource language scenarios, many data augmentation methods have been proposed. The most used approaches for generating pseudo data mainly lay in word omission, random sampling, or replacing some words in the text. However, previous methods barely guarantee the quality of augmented data. In this work, we try to build the data by using paraphrase embedding and POS-Tagging. 
Namely, we generate the fake monolingual corpus by replacing the main four POS-Tagging labels, such as noun, adjective, adverb, and verb, based on both the paraphrase table and their similarity. We select the bigger corpus size of the paraphrase table with word level and obtain the word embedding of each word in the table, then calculate the cosine similarity between these words and tagged words in the original sequence. In addition, we exploit the ranking algorithm to choose highly similar words to reduce semantic errors and leverage the POS-Tagging replacement to mitigate syntactic error to some extent. Experimental results show that our augmentation method consistently outperforms all previous SOTA methods on the low-resource language pairs in seven language pairs from four corpora by 1.16 to 2.39 BLEU points.<\/jats:p>","DOI":"10.1145\/3464427","type":"journal-article","created":{"date-parts":[[2021,8,12]],"date-time":"2021-08-12T16:57:58Z","timestamp":1628787478000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Improving Data Augmentation for Low-Resource NMT Guided by POS-Tagging and Paraphrase Embedding"],"prefix":"10.1145","volume":"20","author":[{"given":"Mieradilijiang","family":"Maimaiti","sequence":"first","affiliation":[{"name":"State Key Laboratory of Intelligent Technology and Systems, Institute for Artificial Intelligence, Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University, Beijing, China"}]},{"given":"Yang","family":"Liu","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Intelligent Technology and Systems, Institute for Artificial Intelligence, Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University, Beijing, 
China"}]},{"given":"Huanbo","family":"Luan","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Intelligent Technology and Systems, Institute for Artificial Intelligence, Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University, Beijing, China"}]},{"given":"Zegao","family":"Pan","sequence":"additional","affiliation":[{"name":"School of Software, Xinjiang University, Urumqi, China"}]},{"given":"Maosong","family":"Sun","sequence":"additional","affiliation":[{"name":"State Key Laboratory of Intelligent Technology and Systems, Institute for Artificial Intelligence, Tsinghua University, Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2021,8,12]]},"reference":[{"volume-title":"Proceedings of ICLR.","year":"2018","author":"Artetxe M.","key":"e_1_2_1_1_1"},{"volume-title":"Jamie Ryan Kiros, and Geoffrey E. 
Hinton","year":"2016","author":"Ba Jimmy Lei","key":"e_1_2_1_2_1"},{"volume-title":"Proceedings of ICLR.","year":"2015","author":"Bahdanau Dzmitry","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of IEEvaluation@ACL.","author":"Banerjee S.","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/1225403.1225421"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/972470.972474"},{"volume-title":"Li","year":"2017","author":"Chen Yun","key":"e_1_2_1_7_1"},{"volume-title":"Li","year":"2018","author":"Chen Yun","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1185"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.5555\/3171837.3171841"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219873"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_2_1_13_1","unstructured":"Chenhui Chu Raj Dabre and Sadao Kurohashi. 2017. An empirical comparison of simple domain adaptation methods for neural machine translation. arXiv:1701.03214."},{"volume-title":"Le","year":"2019","author":"Cubuk E.","key":"e_1_2_1_14_1"},{"volume-title":"BERT: Pre-training of deep bidirectional transformers for language understanding. 
In Proceedings of NAACL-HLT.","year":"2019","author":"Devlin J.","key":"e_1_2_1_15_1"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1166"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2090"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1101"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1026"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1555"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.60"},{"volume-title":"Proceedings of EMNLP.","author":"Gu Jiatao","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1162"},{"key":"e_1_2_1_26_1","unstructured":"Marcin Junczys-Dowmunt Tomasz Dwojak and Hieu Hoang. 2016. Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv:1610.01108v2."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1436"},{"volume-title":"Proceedings of NAACL.","year":"2017","author":"Kobayashi S.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073462"},{"volume-title":"Proceedings of ICLR.","year":"2018","author":"Lample Guillaume","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2009.35.4.35403"},{"volume-title":"ROUGE: A package for automatic evaluation of summaries. 
In Text Summarization Branches Out","year":"2004","author":"Lin Chin-Yew","key":"e_1_2_1_32_1"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1002"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314945"},{"volume-title":"Proceedings of ICIS.","year":"2018","author":"Mieradilijiang Maimaiti","key":"e_1_2_1_35_1"},{"key":"e_1_2_1_36_1","unstructured":"Tomas Mikolov Kai Chen G. S. Corrado and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv:1301.3781."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157290"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"volume-title":"Language models are unsupervised multitask learners. OpenAI blog 1, 8","year":"2019","author":"Radford A.","key":"e_1_2_1_39_1"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1009"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1021"},{"key":"e_1_2_1_43_1","unstructured":"Matthew Snover Bonnie J. Dorr R. Schwartz and L. Micciulla. 2006. A study of translation edit rate with targeted human annotation. In AMTA."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2017.09.070"},{"key":"e_1_2_1_45_1","doi-asserted-by":"crossref","unstructured":"Jinsong Su Jiali Zeng John Xie H. Wen Yongjing Yin and Y. Liu. 2021. Exploring discriminative word-level domain contexts for multi-domain neural machine translation. IEEE Transactions on Pattern Analysis and Machine Intelligence 43 5 (2021) 1530\u20131545.","DOI":"10.1109\/TPAMI.2019.2954406"},
{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-6504"},{"volume-title":"Proceedings of AAAI.","author":"Sun Yibo","key":"e_1_2_1_47_1"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/2969033.2969173"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/3298483.3298614"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1184"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295349"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2837223"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1100"},{"volume-title":"Proceedings of AAAI.","author":"Wang Yiren","key":"e_1_2_1_54_1"},{"volume-title":"Proceedings of EMNLP\/IJCNLP.","author":"Wei Jason","key":"e_1_2_1_55_1"},{"volume-title":"Proceedings of ICCS.","author":"Wu Xing","key":"e_1_2_1_56_1"},{"volume-title":"et\u00a0al","year":"2016","author":"Wu Yonghui","key":"e_1_2_1_57_1"},{"volume-title":"Proceedings of ICLR.","author":"Xie Ziang","key":"e_1_2_1_58_1"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1078"},{"volume-title":"Proceedings of EMNLP.","author":"Zeng Jiali","key":"e_1_2_1_60_1"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2883740"},{"volume-title":"Le","year":"2017","author":"Zoph Barret","key":"e_1_2_1_62_1"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1163"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464427","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3464427","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:10Z","timestamp":1750191430000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3464427"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,12]]},"references-count":63,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2021,11,30]]}},"alternative-id":["10.1145\/3464427"],"URL":"https:\/\/doi.org\/10.1145\/3464427","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2021,8,12]]},"assertion":[{"value":"2021-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}