{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T06:42:01Z","timestamp":1767854521489,"version":"3.49.0"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"8","license":[{"start":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T00:00:00Z","timestamp":1692835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"JSPS KAKENHI","award":["2123124"],"award-info":[{"award-number":["2123124"]}]},{"name":"Young Scientists","award":["#19K20343"],"award-info":[{"award-number":["#19K20343"]}]},{"name":"JSPS Research Fellow for Young Scientists","award":["DC1"],"award-info":[{"award-number":["DC1"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2023,8,31]]},"abstract":"<jats:p>Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT). Existing work has shown that neural sub-word segmenters are better than Byte-Pair Encoding (BPE); however, they are inefficient, as they require parallel corpora, days to train, and hours to decode. This article introduces SelfSeg, a self-supervised neural sub-word segmentation method that is much faster to train\/decode and requires only monolingual dictionaries instead of parallel corpora. SelfSeg takes as input a word in the form of a partially masked character sequence, optimizes the word generation probability, and generates the segmentation with the maximum posterior probability, which is calculated using a dynamic programming algorithm. The training time of SelfSeg depends on word frequencies, and we explore several word frequency normalization strategies to accelerate the training phase. 
Additionally, we propose a regularization mechanism that allows the segmenter to generate various segmentations for one word. To show the effectiveness of our approach, we conduct MT experiments in low-, middle-, and high-resource scenarios, where we compare the performance of using different segmentation methods. The experimental results demonstrate that, on the low-resource ALT dataset, our method achieves more than 1.2 BLEU score improvement compared with BPE and SentencePiece, and a 1.1 score improvement over Dynamic Programming Encoding (DPE) and Vocabulary Learning via Optimal Transport (VOLT), on average. The regularization method achieves approximately a 4.3 BLEU score improvement over BPE and a 1.2 BLEU score improvement over BPE-dropout, the regularized version of BPE. We also observed significant improvements on IWSLT15 Vi\u2192En, WMT16 Ro\u2192En, and WMT15 Fi\u2192En datasets and competitive results on the WMT14 De\u2192En and WMT14 Fr\u2192En datasets. Furthermore, our method is 17.8\u00d7 faster during training and up to 36.8\u00d7 faster during decoding in a high-resource scenario compared to DPE. 
We provide extensive analysis, including why monolingual word-level data is enough to train SelfSeg.<\/jats:p>","DOI":"10.1145\/3610611","type":"journal-article","created":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T12:00:16Z","timestamp":1690372816000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1159-0918","authenticated-orcid":false,"given":"Haiyue","family":"Song","sequence":"first","affiliation":[{"name":"Kyoto University, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0664-3421","authenticated-orcid":false,"given":"Raj","family":"Dabre","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9848-6384","authenticated-orcid":false,"given":"Chenhui","family":"Chu","sequence":"additional","affiliation":[{"name":"Kyoto University, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5398-8399","authenticated-orcid":false,"given":"Sadao","family":"Kurohashi","sequence":"additional","affiliation":[{"name":"Kyoto University, Japan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1028-4399","authenticated-orcid":false,"given":"Eiichiro","family":"Sumita","sequence":"additional","affiliation":[{"name":"National Institute of Information and Communications Technology, Japan"}]}],"member":"320","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1068"},{"key":"e_1_3_3_3_2","unstructured":"Dzmitry Bahdanau Kyunghyun Cho and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv e-prints Article arXiv:1409.0473 (Sept.2014). 
arXiv:1409.0473"},{"key":"e_1_3_3_4_2","first-page":"65","volume-title":"Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization","author":"Banerjee Satanjeev","year":"2005","unstructured":"Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization. Association for Computational Linguistics, 65\u201372. Retrieved from https:\/\/aclanthology.org\/W05-0909"},{"key":"e_1_3_3_5_2","unstructured":"Tom B. Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell Sandhini Agarwal Ariel Herbert-Voss Gretchen Krueger Tom Henighan Rewon Child Aditya Ramesh Daniel M. Ziegler Jeffrey Wu Clemens Winter Christopher Hesse Mark Chen Eric Sigler Mateusz Litwin Scott Gray Benjamin Chess Jack Clark Christopher Berner Sam McCandlish Alec Radford Ilya Sutskever and Dario Amodei. 2020. Language Models are Few-shot Learners. arXiv:arXiv:2005.14165"},{"key":"e_1_3_3_6_2","doi-asserted-by":"crossref","unstructured":"Kris Cao and Laura Rimell. 2021. You Should Evaluate Your Language Model on Marginal Likelihood over Tokenisations. arXiv:arXiv:2109.02550","DOI":"10.18653\/v1\/2021.emnlp-main.161"},{"key":"e_1_3_3_7_2","unstructured":"William Chan Yu Zhang Quoc Le and Navdeep Jaitly. 2016. Latent Sequence Decompositions. arXiv:arXiv:1610.03035"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1461"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-2058"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.167"},{"key":"e_1_3_3_12_2","doi-asserted-by":"crossref","unstructured":"C. M. 
Downey Fei Xia Gina-Anne Levow and Shane Steinert-Threlkeld. 2021. A Masked Segmental Language Model for Unsupervised Natural Language Segmentation. arXiv:arXiv:2104.07829","DOI":"10.18653\/v1\/2022.sigmorphon-1.5"},{"key":"e_1_3_3_13_2","unstructured":"Philip Gage. 1994. A new algorithm for data compression. C Users J. 12 2 (1994) 23\u201338."},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1141"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/3305381.3305510"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1143"},{"key":"e_1_3_3_17_2","unstructured":"Rohit Gupta Laurent Besacier Marc Dymetman and Matthias Gall\u00e9. 2019. Character-based NMT with Transformer. arXiv:arXiv:1911.04997"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.275"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1126\/science.1127647"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4706"},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1001"},{"key":"e_1_3_3_22_2","first-page":"1700","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Kalchbrenner Nal","year":"2013","unstructured":"Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1700\u20131709. Retrieved from https:\/\/aclanthology.org\/D13-1176"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1645"},{"key":"e_1_3_3_24_2","doi-asserted-by":"crossref","unstructured":"Yoon Kim Yacine Jernite David Sontag and Alexander Rush. 2016. Character-aware neural language models. Proc. AAAI Conf. Artif. Intell. 30 1 (Mar.2016). 
Retrieved from https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/10362","DOI":"10.1609\/aaai.v30i1.10362"},{"key":"e_1_3_3_25_2","unstructured":"Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv e-prints Article arXiv:1412.6980 (Dec.2014). arXiv:1412.6980"},{"key":"e_1_3_3_26_2","first-page":"388","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Koehn Philipp","year":"2004","unstructured":"Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 388\u2013395. Retrieved from https:\/\/www.aclweb.org\/anthology\/W04-3250"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.3115\/1557769.1557821"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067833"},{"key":"e_1_3_3_29_2","unstructured":"Julia Kreutzer and Artem Sokolov. 2018. Learning to Segment Inputs for NMT Favors Character-level Processing. arXiv:arXiv:1810.01480"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1007"},{"key":"e_1_3_3_31_2","doi-asserted-by":"crossref","unstructured":"Taku Kudo. 2018. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. arXiv:arXiv:1804.10959","DOI":"10.18653\/v1\/P18-1007"},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2012"},{"key":"e_1_3_3_33_2","unstructured":"Jimmy Lei Ba Jamie Ryan Kiros and Geoffrey E. Hinton. 2016. Layer Normalization. arXiv e-prints Article arXiv:1607.06450 (July2016). arXiv:1607.06450"},{"key":"e_1_3_3_34_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_3_35_2","unstructured":"Wang Ling Isabel Trancoso Chris Dyer and Alan W. Black. 2015. Character-based Neural Machine Translation. 
arXiv:arXiv:1511.04586"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00343"},{"key":"e_1_3_3_37_2","unstructured":"Yinhan Liu Myle Ott Naman Goyal Jingfei Du Mandar Joshi Danqi Chen Omer Levy Mike Lewis Luke Zettlemoyer and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv:arXiv:1907.11692"},{"key":"e_1_3_3_38_2","doi-asserted-by":"crossref","unstructured":"Minh-Thang Luong and Christopher D. Manning. 2016. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-character Models. arXiv:arXiv:1604.00788","DOI":"10.18653\/v1\/P16-1100"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1002"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-5010"},{"key":"e_1_3_3_41_2","unstructured":"Sabrina J. Mielke Zaid Alyafeai Elizabeth Salesky Colin Raffel Manan Dey Matthias Gall\u00e9 Arun Raja Chenglei Si Wilson Y. Lee Beno\u00eet Sagot and Samson Tan. 2021. Between Words and Characters: A Brief History of Open-vocabulary Modeling and Tokenization in NLP. arXiv:arXiv:2112.10508"},{"key":"e_1_3_3_42_2","doi-asserted-by":"crossref","unstructured":"Ishan Misra C. Lawrence Zitnick and Martial Hebert. 2016. Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification. arXiv:arXiv:1603.08561","DOI":"10.1007\/978-3-319-46448-0_32"},{"key":"e_1_3_3_43_2","unstructured":"Graham Neubig. 2011. The Kyoto Free Translation Task. Retrieved from http:\/\/www.phontron.com\/kftt"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-4009"},{"key":"e_1_3_3_45_2","doi-asserted-by":"crossref","unstructured":"Deepak Pathak Philipp Krahenbuhl Jeff Donahue Trevor Darrell and Alexei A. Efros. 2016. Context encoders: Feature learning by inpainting. In Proceedings of the Computer Vision and Pattern Recognition Conference. 
arXiv:arXiv:1604.07379","DOI":"10.1109\/CVPR.2016.278"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-6319"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.170"},{"key":"e_1_3_3_48_2","unstructured":"Colin Raffel Noam Shazeer Adam Roberts Katherine Lee Sharan Narang Michael Matena Yanqi Zhou Wei Li and Peter J. Liu. 2019. Exploring the Limits of Transfer Learning with a Unified Text-to-text Transformer. arXiv:arXiv:1910.10683"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-020-09258-6"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6289079"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.704"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2323"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.17"},{"key":"e_1_3_3_55_2","unstructured":"Kaitao Song Xu Tan Tao Qin Jianfeng Lu and Tie-Yan Liu. 2019. MASS: Masked Sequence to Sequence Pre-training for Language Generation. arXiv e-prints Article arXiv:1905.02450 (May2019). arXiv:1905.02450"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1531"},{"key":"e_1_3_3_57_2","first-page":"3104","volume-title":"Advances in Neural Information Processing Systems 27","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.). Curran Associates, Inc., 3104\u20133112. 
Retrieved from http:\/\/papers.nips.cc\/paper\/5346-sequence-to-sequence-learning-with-neural-networks.pdf"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220299"},{"key":"e_1_3_3_59_2","first-page":"1574","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Thu Ye Kyaw","year":"2016","unstructured":"Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, and Eiichiro Sumita. 2016. Introducing the Asian Language Treebank (ALT). In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916). European Language Resources Association (ELRA), 1574\u20131578. Retrieved from https:\/\/www.aclweb.org\/anthology\/L16-1249"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2010"},{"key":"e_1_3_3_61_2","volume-title":"Advances in Neural Information Processing Systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390294"},{"key":"e_1_3_3_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01261-8_24"},{"key":"e_1_3_3_64_2","unstructured":"Changhan Wang Kyunghyun Cho and Jiatao Gu. 2019. Neural Machine Translation with Byte-level Subwords. arXiv:arXiv:1909.03341"},{"key":"e_1_3_3_65_2","unstructured":"Chong Wang Yining Wang Po-Sen Huang Abdelrahman Mohamed Dengyong Zhou and Li Deng. 2017. Sequence Modeling via Segmentations. 
arXiv:arXiv:1702.07463"},{"key":"e_1_3_3_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00840"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.571"},{"key":"e_1_3_3_68_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46487-9_40"},{"key":"e_1_3_3_69_2","volume-title":"Morphological Zero-shot Neural Machine Translation","author":"Zhou Giulio","year":"2018","unstructured":"Giulio Zhou. 2018. Morphological Zero-shot Neural Machine Translation. University of Edinburgh."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610611","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3610611","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:00Z","timestamp":1750178160000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610611"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,24]]},"references-count":68,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,8,31]]}},"alternative-id":["10.1145\/3610611"],"URL":"https:\/\/doi.org\/10.1145\/3610611","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,24]]},"assertion":[{"value":"2023-07-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-08-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}