{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T15:38:12Z","timestamp":1759333092493,"version":"3.41.0"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"9","license":[{"start":{"date-parts":[[2023,9,22]],"date-time":"2023-09-22T00:00:00Z","timestamp":1695340800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"TMU research fund for young scientists and JST","award":["JPMJFS2139"],"award-info":[{"award-number":["JPMJFS2139"]}]},{"name":"JSPS KAKENHI","award":["19K12099"],"award-info":[{"award-number":["19K12099"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,9,30]]},"abstract":"<jats:p>South and North Korea both use the Korean language. However, Korean natural language processing (NLP) research has mostly focused on South Korean language. Therefore, existing NLP systems in the Korean language, such as neural machine translation (NMT) systems, cannot properly process North Korean inputs. Training a model using North Korean data is the most straightforward approach to solving this problem, but the data to train NMT models are insufficient. To solve this problem, we constructed a parallel corpus to develop a North Korean NMT model using a comparable corpus. We manually aligned parallel sentences to create evaluation data and automatically aligned the remaining sentences to create training data. We trained a North Korean NMT model using our North Korean parallel data and improved North Korean translation quality using South Korean resources such as parallel data and a pre-trained model. 
In addition, we propose Korean-specific pre-processing methods, character tokenization, and phoneme decomposition to use the South Korean resources more efficiently. We demonstrate that the phoneme decomposition consistently improves the North Korean translation accuracy compared to other pre-processing methods.<\/jats:p>","DOI":"10.1145\/3608947","type":"journal-article","created":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T12:32:40Z","timestamp":1689769960000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["North Korean Neural Machine Translation through South Korean Resources"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-3146-4085","authenticated-orcid":false,"given":"Hwichan","family":"Kim","sequence":"first","affiliation":[{"name":"Tokyo Metropolitan University, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4657-8214","authenticated-orcid":false,"given":"Hirasawa","family":"Tosho","sequence":"additional","affiliation":[{"name":"Tokyo Metropolitan University, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9033-9304","authenticated-orcid":false,"given":"Sangwhan","family":"Moon","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7635-6175","authenticated-orcid":false,"given":"Naoaki","family":"Okazaki","sequence":"additional","affiliation":[{"name":"Tokyo Institute of Technology, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1166-1739","authenticated-orcid":false,"given":"Mamoru","family":"Komachi","sequence":"additional","affiliation":[{"name":"Tokyo Metropolitan University, 
Japan"}]}],"member":"320","published-online":{"date-parts":[[2023,9,22]]},"reference":[{"key":"e_1_3_5_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1309"},{"key":"e_1_3_5_3_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00288"},{"key":"e_1_3_5_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-2508"},{"key":"e_1_3_5_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3443279.3443310"},{"key":"e_1_3_5_6_2","doi-asserted-by":"publisher","DOI":"10.3115\/981344.981366"},{"key":"e_1_3_5_7_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2365"},{"key":"e_1_3_5_8_2","doi-asserted-by":"publisher","DOI":"10.3115\/981574.981576"},{"key":"e_1_3_5_9_2","first-page":"282","volume-title":"Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation","author":"Dabre Raj","year":"2017","unstructured":"Raj Dabre, Tetsuji Nakagawa, and Hideto Kazawa. 2017. An empirical study of language relatedness for transfer learning in neural machine translation. In Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation. 282\u2013286. 
Retrieved from https:\/\/aclanthology.org\/Y17-1038"},{"key":"e_1_3_5_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2366"},{"key":"e_1_3_5_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1128"},{"key":"e_1_3_5_12_2","doi-asserted-by":"publisher","DOI":"10.5555\/1858681.1858729"},{"key":"e_1_3_5_13_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1189"},{"key":"e_1_3_5_14_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00437"},{"key":"e_1_3_5_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/972450.972455"},{"key":"e_1_3_5_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.145"},{"key":"e_1_3_5_17_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2368"},{"key":"e_1_3_5_18_2","first-page":"2228","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Gomes Lu\u00eds","year":"2016","unstructured":"Lu\u00eds Gomes and Gabriel Pereira Lopes. 2016. First steps towards coverage-based sentence alignment. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916). European Language Resources Association (ELRA), 2228\u20132231. Retrieved from https:\/\/aclanthology.org\/L16-1354"},{"key":"e_1_3_5_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2017.08.003"},{"key":"e_1_3_5_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1286"},{"key":"e_1_3_5_21_2","first-page":"3477","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC\u201918)","author":"Karimi Akbar","year":"2018","unstructured":"Akbar Karimi, Ebrahim Ansari, and Bahram Sadeghi Bigham. 2018. Extracting an English-Persian parallel corpus from comparable corpora. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC\u201918). European Language Resources Association (ELRA), 3477\u20133482. 
Retrieved from https:\/\/aclanthology.org\/L18-1549"},{"key":"e_1_3_5_22_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00348"},{"key":"e_1_3_5_23_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.274"},{"key":"e_1_3_5_24_2","first-page":"235","volume-title":"Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation","author":"Kim Hwichan","year":"2021","unstructured":"Hwichan Kim and Mamoru Komachi. 2021. Can monolingual pre-trained encoder-decoder improve NMT for distant language pairs? In Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation. Association for Computational Linguistics, 235\u2013243. Retrieved from https:\/\/aclanthology.org\/2021.paclic-1.25"},{"key":"e_1_3_5_25_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2012"},{"key":"e_1_3_5_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-srw.34"},{"key":"e_1_3_5_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-6316"},{"key":"e_1_3_5_28_2","doi-asserted-by":"publisher","DOI":"10.1515\/ijsl.1990.82.71"},{"key":"e_1_3_5_29_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00067"},{"key":"e_1_3_5_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.acl-long.577"},{"key":"e_1_3_5_31_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_5_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1291"},{"key":"e_1_3_5_33_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2372"},{"key":"e_1_3_5_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1100"},{"key":"e_1_3_5_35_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2373"},{"key":"e_1_3_5_36_2","first-page":"129","volume-title":"Proceedings of the 15th Annual Conference of the European Association for Machine Translation","author":"Marujo Luis","year":"2011","unstructured":"Luis Marujo, Nuno Grazina, Tiago Luis, Wang Ling, Luisa Coheur, and Isabel 
Trancoso. 2011. BP2EP\u2014Adaptation of Brazilian Portuguese texts to European Portuguese. In Proceedings of the 15th Annual Conference of the European Association for Machine Translation. European Association for Machine Translation, 129\u2013136. Retrieved from https:\/\/aclanthology.org\/2011.eamt-1.19"},{"key":"e_1_3_5_37_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02295996"},{"key":"e_1_3_5_38_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2374"},{"key":"e_1_3_5_39_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.wat-1.1"},{"key":"e_1_3_5_40_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.755"},{"key":"e_1_3_5_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-4009"},{"key":"e_1_3_5_42_2","volume-title":"Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track","author":"Park Sungjoon","year":"2021","unstructured":"Sungjoon Park, Jihyung Moon, Sungdong Kim, Won Ik Cho, Ji Yoon Han, Jangwon Park, Chisung Song, Junseong Kim, Youngsook Song, Taehwan Oh, Joohong Lee, Juhyun Oh, Sungwon Lyu, Younghoon Jeong, Inkwon Lee, Sangwoo Seo, Dongjun Lee, Hyunwoo Kim, Myeonghwa Lee, Seongbo Jang, Seungwon Do, Sunkyoung Kim, Kyungtae Lim, Jongwon Lee, Kyumin Park, Jamin Shin, Seonghyun Kim, Lucy Park, Alice Oh, Jung-Woo Ha, and Kyunghyun Cho. 2021. KLUE: Korean language understanding evaluation. In Proceedings of the 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track. Advances in Neural Information Processing Systems. Retrieved from https:\/\/openreview.net\/forum?id=q-8h8-LZiUm"},{"key":"e_1_3_5_43_2","first-page":"43","volume-title":"Proceedings of the 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial\u201916)","author":"Popovi\u0107 Maja","year":"2016","unstructured":"Maja Popovi\u0107, Mihael Ar\u010dan, and Filip Klubi\u010dka. 2016. 
Language related issues for machine translation between closely related South Slavic languages. In Proceedings of the 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial\u201916). The COLING 2016 Organizing Committee, 43\u201352. Retrieved from https:\/\/aclanthology.org\/W16-4806"},{"key":"e_1_3_5_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.381"},{"key":"e_1_3_5_45_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-2037"},{"key":"e_1_3_5_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-2619"},{"key":"e_1_3_5_47_2","volume-title":"Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers","author":"Sennrich Rico","year":"2010","unstructured":"Rico Sennrich and Martin Volk. 2010. MT-based sentence alignment for OCR-generated parallel texts. In Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers. Association for Machine Translation in the Americas. Retrieved from https:\/\/aclanthology.org\/2010.amta-papers.14"},{"key":"e_1_3_5_48_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-srw.17"},{"key":"e_1_3_5_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/IALP.2012.14"},{"key":"e_1_3_5_50_2","doi-asserted-by":"publisher","DOI":"10.5555\/1873781.1873905"},{"key":"e_1_3_5_51_2","doi-asserted-by":"publisher","DOI":"10.3115\/1075096.1075106"},{"key":"e_1_3_5_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1190"},{"key":"e_1_3_5_53_2","first-page":"6000\u20136010","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems","volume":"30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. 
In Proceedings of the Conference on Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc., Red Hook, NY, 6000\u20136010. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_5_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1670"},{"key":"e_1_3_5_55_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.wocn.2019.100918"},{"key":"e_1_3_5_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1163"},{"key":"e_1_3_5_57_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-2512"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3608947","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3608947","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:29:46Z","timestamp":1750285786000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3608947"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,22]]},"references-count":56,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2023,9,30]]}},"alternative-id":["10.1145\/3608947"],"URL":"https:\/\/doi.org\/10.1145\/3608947","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2023,9,22]]},"assertion":[{"value":"2022-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-07-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2023-09-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}