{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T19:40:07Z","timestamp":1745869207298,"version":"3.40.4"},"reference-count":42,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T00:00:00Z","timestamp":1745798400000},"content-version":"vor","delay-in-days":117,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,4,17]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Automatic dependency parsing of Thai sentences has been underexplored, as evidenced by the lack of large Thai dependency treebanks with complete dependency structures and the lack of a published evaluation of state-of-the-art models, especially transformer-based parsers. In this work, we addressed these gaps by introducing the Thai Universal Dependency Treebank (TUD), a new Thai treebank consisting of 3,627 trees annotated according to the Universal Dependencies (UD) framework. We then benchmarked 92 dependency parsing models that incorporate pretrained transformers on Thai-PUD and our TUD, achieving state-of-the-art results and shedding light on the optimal model components for Thai dependency parsing. 
Our error analysis of the models also reveals that polyfunctional words, serial verb construction, and lack of rich morphosyntactic features present main challenges for Thai dependency parsing.<\/jats:p>","DOI":"10.1162\/tacl_a_00745","type":"journal-article","created":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T19:04:53Z","timestamp":1745867093000},"page":"376-391","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":0,"title":["The Thai Universal Dependency Treebank"],"prefix":"10.1162","volume":"13","author":[{"given":"Panyut","family":"Sriwirote","sequence":"first","affiliation":[{"name":"Department of Linguistics, Chulalongkorn University, Thailand. panyutsriwirote@gmail.com"}]},{"given":"Wei Qi","family":"Leong","sequence":"additional","affiliation":[{"name":"AI Singapore, Singapore. weiqi@aisingapore.org"}]},{"given":"Charin","family":"Polpanumas","sequence":"additional","affiliation":[{"name":"Amazon, Japan. cebril@gmail.com"}]},{"given":"Santhawat","family":"Thanyawong","sequence":"additional","affiliation":[{"name":"Faculty of Humanities and Social Sciences, Prince of Songkla University, Thailand. santhawat.t@psu.ac.th"}]},{"given":"William Chandra","family":"Tjhi","sequence":"additional","affiliation":[{"name":"AI Singapore, Singapore. wtjhi@aisingapore.org"}]},{"given":"Wirote","family":"Aroonmanakun","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Chulalongkorn University, Thailand. awirote@chula.ac.th"}]},{"given":"Attapol T.","family":"Rutherford","sequence":"additional","affiliation":[{"name":"Department of Linguistics, Chulalongkorn University, Thailand attapol. 
attapol.t"}]}],"member":"281","published-online":{"date-parts":[[2025,4,17]]},"reference":[{"key":"2025042815044594700_bib1","doi-asserted-by":"publisher","first-page":"200190","DOI":"10.1016\/j.iswa.2023.200190","article-title":"Improving the performance of graph based dependency parsing by guiding bi-affine layer with augmented global and local features","volume":"18","author":"Alt\u0131nta\u015f","year":"2023","journal-title":"Intelligent Systems with Applications"},{"key":"2025042815044594700_bib2","doi-asserted-by":"publisher","first-page":"153","DOI":"10.3115\/1690299.1690321","article-title":"Thai National Corpus: A progress report","volume-title":"Proceedings of the 7th Workshop on Asian Language Resources","author":"Aroonmanakun","year":"2009"},{"key":"2025042815044594700_bib3","first-page":"6495","article-title":"Survey on Thai NLP language resources and tools","volume-title":"Proceedings of the Thirteenth Language Resources and Evaluation Conference","author":"Arreerard","year":"2022"},{"issue":"1","key":"2025042815044594700_bib4","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A coefficient of agreement for nominal scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educational and Psychological Measurement"},{"issue":"2","key":"2025042815044594700_bib5","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1162\/coli_a_00402","article-title":"Universal Dependencies","volume":"47","author":"de Marneffe","year":"2021","journal-title":"Computational Linguistics"},{"key":"2025042815044594700_bib6","doi-asserted-by":"publisher","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Devlin","year":"2019"},{"key":"2025042815044594700_bib7","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1611.01734","article-title":"Deep biaffine attention for neural dependency parsing","author":"Dozat","year":"2017"},{"key":"2025042815044594700_bib8","first-page":"38","article-title":"Cross-dialect social media dependency parsing for social scientific entity attribute analysis","volume-title":"Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022)","author":"Eggleston","year":"2022"},{"key":"2025042815044594700_bib9","doi-asserted-by":"publisher","first-page":"710","DOI":"10.18653\/v1\/N19-1076","article-title":"Left-to-right dependency parsing with pointer networks","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Fern\u00e1ndez-Gonz\u00e1lez","year":"2019"},{"key":"2025042815044594700_bib10","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1802.06893","article-title":"Learning word vectors for 157 languages","author":"Grave","year":"2018"},{"issue":"1","key":"2025042815044594700_bib11","doi-asserted-by":"publisher","first-page":"49","DOI":"10.6519\/TJL.201901_17(1).0002","article-title":"Is there a dichotomy between synthetic compounds and phrases in Thai?","volume":"17","author":"Hongthong","year":"2019","journal-title":"Taiwan Journal of Linguistics"},{"key":"2025042815044594700_bib12","doi-asserted-by":"publisher","first-page":"2475","DOI":"10.18653\/v1\/P19-1237","article-title":"Graph-based dependency parsing with graph neural networks","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Ji","year":"2019"},{"key":"2025042815044594700_bib13","doi-asserted-by":"publisher","first-page":"2779","DOI":"10.18653\/v1\/D19-1279","article-title":"75 languages, 1 model: Parsing Universal Dependencies 
universally","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Kondratyuk","year":"2019"},{"key":"2025042815044594700_bib14","first-page":"31","article-title":"A computational linguistics study of compound nouns in Thai","volume-title":"Proceedings of the Seventh International Symposium on Natural Language Processing (SNLP 2007)","author":"Kriengket","year":"2007"},{"key":"2025042815044594700_bib15","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1907.11692","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019"},{"key":"2025042815044594700_bib16","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2101.09635","article-title":"WangchanBERTa: Pretraining transformer-based Thai language models","author":"Lowphansirikul","year":"2021"},{"key":"2025042815044594700_bib17","doi-asserted-by":"publisher","first-page":"1403","DOI":"10.18653\/v1\/P18-1130","article-title":"Stack-pointer networks for dependency parsing","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ma","year":"2018"},{"key":"2025042815044594700_bib18","doi-asserted-by":"publisher","first-page":"7203","DOI":"10.18653\/v1\/2020.acl-main.645","article-title":"CamemBERT: A tasty French language model","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Martin","year":"2020"},{"key":"2025042815044594700_bib19","first-page":"1626","article-title":"Event extraction as dependency parsing","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"McClosky","year":"2011"},{"key":"2025042815044594700_bib20","first-page":"122","article-title":"Characterizing the errors of 
data-driven dependency parsing models","volume-title":"Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)","author":"McDonald","year":"2007"},{"key":"2025042815044594700_bib21","doi-asserted-by":"publisher","first-page":"120","DOI":"10.1162\/tacl_a_00358","article-title":"Recursive non-autoregressive graph-to-graph transformer for dependency parsing with iterative refinement","volume":"9","author":"Mohammadshahi","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2025042815044594700_bib22","doi-asserted-by":"publisher","first-page":"731","DOI":"10.18653\/v1\/2020.findings-emnlp.65","article-title":"Rethinking self-attention: Towards interpretability in neural parsing","volume-title":"Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Mrini","year":"2020"},{"key":"2025042815044594700_bib23","doi-asserted-by":"publisher","first-page":"80","DOI":"10.18653\/v1\/2021.eacl-demos.10","article-title":"Trankit: A light-weight transformer-based toolkit for multilingual natural language processing","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations","author":"Van Nguyen","year":"2021"},{"issue":"4","key":"2025042815044594700_bib24","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1162\/coli.07-056-R1-07-027","article-title":"Algorithms for deterministic incremental dependency parsing","volume":"34","author":"Nivre","year":"2008","journal-title":"Computational Linguistics"},{"key":"2025042815044594700_bib25","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.3519354","article-title":"PyThaiNLP: Thai natural language processing in 
Python","author":"Phatthiyaphaibun","year":"2016"},{"issue":"2","key":"2025042815044594700_bib26","doi-asserted-by":"publisher","first-page":"251","DOI":"10.1515\/lity.2000.4.2.251","article-title":"Adjectives as verbs in Thai","volume":"4","author":"Prasithrathsint","year":"2000","journal-title":"Linguistic Typology"},{"key":"2025042815044594700_bib27","doi-asserted-by":"publisher","first-page":"160","DOI":"10.18653\/v1\/K18-2016","article-title":"Universal Dependency parsing from scratch","volume-title":"Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies","author":"Qi","year":"2018"},{"key":"2025042815044594700_bib28","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2003.07082","article-title":"Stanza: A Python natural language processing toolkit for many human languages","author":"Qi","year":"2020"},{"key":"2025042815044594700_bib29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ICITEED.2019.8930002","article-title":"Thai dependency parsing with character embedding","volume-title":"2019 11th International Conference on Information Technology and Electrical Engineering (ICITEE)","author":"Singkul","year":"2019"},{"key":"2025042815044594700_bib30","volume-title":"Thai: An Essential Grammar","author":"Smyth","year":"2002","edition":"1st"},{"key":"2025042815044594700_bib31","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2311.12475","article-title":"PhayaThaiBERT: Enhancing a pretrained Thai language model with unassimilated loanwords","author":"Sriwirote","year":"2023"},{"key":"2025042815044594700_bib32","first-page":"197","article-title":"UDPipe 2.0 prototype at CoNLL 2018 UD shared task","volume-title":"Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies","author":"Straka","year":"2018"},{"key":"2025042815044594700_bib33","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1908.07448","article-title":"Evaluating contextualized 
embeddings on 54 languages in POS tagging, lemmatization and dependency parsing","author":"Straka","year":"2019"},{"key":"2025042815044594700_bib34","first-page":"215","article-title":"Basic serial verb constructions in Thai","volume":"1","author":"Takahashi","year":"2009","journal-title":"Journal of the Southeast Asian Linguistics Society"},{"key":"2025042815044594700_bib35","doi-asserted-by":"publisher","first-page":"4458","DOI":"10.18653\/v1\/2021.acl-long.344","article-title":"Dependency-driven relation extraction with attentive graph convolutional networks","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Tian","year":"2021"},{"key":"2025042815044594700_bib36","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1109\/ICBIR57571.2023.10147628","article-title":"Sequence-labeling RoBERTa model for dependency-parsing in Classical Chinese and its application to Vietnamese and Thai","volume-title":"2023 8th International Conference on Business and Industrial Research (ICBIR)","author":"Yasuoka","year":"2023"},{"key":"2025042815044594700_bib37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/K18-2001","article-title":"CoNLL 2018 shared task: Multilingual parsing from raw text to Universal Dependencies","volume-title":"Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies","author":"Zeman","year":"2018"},{"key":"2025042815044594700_bib38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18653\/v1\/K17-3001","article-title":"CoNLL 2017 shared task: Multilingual parsing from raw text to Universal Dependencies","volume-title":"Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal 
Dependencies","author":"Zeman","year":"2017"},{"key":"2025042815044594700_bib39","doi-asserted-by":"publisher","first-page":"3295","DOI":"10.18653\/v1\/2020.acl-main.302","article-title":"Efficient second-order TreeCRF for neural dependency parsing","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Zhang","year":"2020"},{"key":"2025042815044594700_bib40","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.218","article-title":"Entity relation extraction as dependency parsing in visually rich documents","author":"Zhang","year":"2021","journal-title":"arXiv preprint arXiv:2110.09915"},{"key":"2025042815044594700_bib41","doi-asserted-by":"publisher","first-page":"2396","DOI":"10.18653\/v1\/P19-1230","article-title":"Head-driven phrase structure grammar parsing on Penn Treebank","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zhou","year":"2019"},{"key":"2025042815044594700_bib42","doi-asserted-by":"publisher","first-page":"4809","DOI":"10.18653\/v1\/2020.emnlp-main.390","article-title":"Please mind the root: Decoding arborescences for dependency parsing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Zmigrod","year":"2020"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00745\/2514583\/tacl_a_00745.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00745\/2514583\/tacl_a_00745.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,28]],"date-time":"2025-04-28T19:04:58Z","timestamp":1745867098000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00745\/128939\/The-Thai-Universal-Dependency-Treebank"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":42,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00745","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025]]},"published":{"date-parts":[[2025]]}}}