{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,28]],"date-time":"2026-03-28T16:51:06Z","timestamp":1774716666810,"version":"3.50.1"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2020,2,22]],"date-time":"2020-02-22T00:00:00Z","timestamp":1582329600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Department of Computer Science"},{"DOI":"10.13039\/501100022536","name":"Faculty of Science","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100022536","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004071","name":"Khon Kaen University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004071","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,3,31]]},"abstract":"<jats:p>In this study, we developed an Isarn Dharma word segmentation system. We mainly focused on solving the word ambiguity and unknown word problems in unsegmented Isarn Dharma text. Ambiguous Isarn Dharma words occur frequently in word construction due to the writing style without tone markers. Thus, words can be interpreted as having different tones and meanings in the same writing text. To overcome these problems, we developed an Isarn Dharma character cluster\u2013(IDCC) based statistical model and affixation and integrated it with the named entity recognition method (IDCC-C-based statistical model and affixation with named entity recognition (NER)). This method integrates the IDCC-based and character-based statistical models to distinguish the word boundaries. The IDCC-based statistical model utilizes the IDCC feature to disambiguate any ambiguous words. 
The unknown words are handled using the character-based statistical model, based on the character features. In addition, linguistic knowledge is employed to detect the boundaries of a new word based on the construction morphology and NER. In evaluations, we compared the proposed method with various word segmentation methods. The experimental results showed that the proposed method performed slightly better than the other methods when the corpus size increased. Using the test set, the proposed method obtained the best F-measure of 92.19, an F-measure that was better than the IDCC longest matching grouping at 2.85.<\/jats:p>","DOI":"10.1145\/3359990","type":"journal-article","created":{"date-parts":[[2020,2,24]],"date-time":"2020-02-24T18:12:51Z","timestamp":1582567971000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Isarn Dharma Word Segmentation Using a Statistical Approach with Named Entity Recognition"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9531-8307","authenticated-orcid":false,"given":"Sittichai","family":"Somsap","sequence":"first","affiliation":[{"name":"Natural Language and Speech Processing Laboratory (NLSP), Khon Kaen University, Khon Kaen, Thailand"}]},{"given":"Pusadee","family":"Seresangtakul","sequence":"additional","affiliation":[{"name":"Natural Language and Speech Processing Laboratory (NLSP), Khon Kaen University, Khon Kaen, Thailand"}]}],"member":"320","published-online":{"date-parts":[[2020,2,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W16-2404"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 5th SNLP and 5th Oriental COCOSDA Workshop. 68--75","author":"Aroonmanakun Wirote","year":"2002"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 
858--867","author":"Brants Thorsten","year":"2007"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2008.4777885"},{"key":"e_1_2_1_5_1","first-page":"22","article-title":"Word segmentation for Burmese (Myanmar)","volume":"15","author":"Ding Chenchen","year":"2016","journal-title":"ACM Transact. Asian Lang. Inf. Process."},{"key":"e_1_2_1_6_1","first-page":"381","article-title":"Chinese word segmentation by classification of characters","volume":"10","author":"Goh Chooi-Ling","year":"2005","journal-title":"International Journal of the Computational Linguistics and Chinese Language Processing."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the 8th International Symposium on Natural Language Processing. 1--5.","author":"Haruechaiyasak Choochart","year":"2009"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECTICON.2008.4600388"},{"key":"e_1_2_1_9_1","first-page":"15","article-title":"Preliminary notes on \u201cthe cultural region of tham script manuscripts","volume":"74","author":"Iijima Akiko","year":"2009","journal-title":"Senri Ethnol. Stud."},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the 11th International Conference on Lao Studies. 1--21","author":"Kourilsky Gregory J D","year":"2005"},{"key":"e_1_2_1_11_1","first-page":"644","article-title":"Improved isarn dharma alphabets to Thai language translation using longest syllable matching with named entities recognition","volume":"59","author":"Lakkhanawannakun Phoemporn","year":"2014","journal-title":"WIT Trans. Info. Comm. 
Technol."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 8th International Symposium on Natural Language Processing.","author":"Limcharoen Piya","year":"2009"},{"key":"e_1_2_1_13_1","volume-title":"Learn Fast to Read Tham Character in Lao Texts","author":"Maha Sena Phuy Phaya Luang"},{"key":"e_1_2_1_14_1","first-page":"549","article-title":"Bilingually motivated word segmentation for statistical machine translation","volume":"8","author":"Ma Yanjun","year":"2009","journal-title":"ACM Transact. Asian Lang. Inf. Process."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the IJCNLP\u201908 Workshop on NLP for Less Privileged Languages. 51--58","author":"Maung Zin Maung","year":"2008"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 13th National on Computer Science and Engineering Conference (NCSEC\u201909)","author":"Phaiboon Nongnud","year":"2009"},{"key":"e_1_2_1_17_1","first-page":"95","article-title":"An analysis of the contexts and the permutations of the lanna language in the \u201c5 chiang","volume":"36","year":"2016","journal-title":"Silpak. Univ. J."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the Asian Language Processing (IALP\u201912)","author":"Rabiya Rashid","year":"2012"},{"key":"e_1_2_1_19_1","volume-title":"Department of Thai and Oriental Languages","author":"Siriaksornsat Pojanee"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAIS.2013.6720529"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/239895.239900"},{"key":"e_1_2_1_22_1","unstructured":"Wat Srisawang 2010. Esan Literary Works. Retrieved from http:\/\/www.esansawang.in.th. [in Thai]"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECTICon.2013.6559585"},{"key":"e_1_2_1_24_1","volume-title":"Northeastern Thai Language and Scripts. 
Department of Thai and Oriental Languages","author":"Tapang Adul"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCIT.2010.5665124"},{"key":"e_1_2_1_26_1","volume-title":"Towards the Design of a Thai Text Syllable Analyzer. Master's thesis","author":"Thairatananond Yupin"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1177\/0165551507086258"},{"key":"e_1_2_1_28_1","first-page":"63","article-title":"Magical use of traditional scripts in northeastern Thai villages","volume":"74","author":"Tsumura Fumihiko","year":"2009","journal-title":"Senri Ethnol. Stud."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2184436.2184440"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3359990","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3359990","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:13:28Z","timestamp":1750202008000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3359990"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,22]]},"references-count":29,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,3,31]]}},"alternative-id":["10.1145\/3359990"],"URL":"https:\/\/doi.org\/10.1145\/3359990","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,22]]},"assertion":[{"value":"2018-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}