{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T16:39:11Z","timestamp":1763570351869,"version":"3.45.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2025,11,30]]},"abstract":"<jats:p>Today, automatic speech recognition systems are widely used by individuals, institutions, and organizations. However, the lack of punctuation marks in the texts produced by these systems complicates the comprehensibility of the texts and hinders advanced text analysis. Consequently, there is an increasing need for automatic punctuation restoration models. A review of existing studies reveals that most research focuses on the English language, while languages like Turkish, which belong to the agglutinative language group, have been relatively underexplored. In this study, a unique dataset has been created for Turkish automatic punctuation restoration. Models developed using convolutional neural networks, transformer encoder, and FnetEncoder layers were trained and analyzed with this dataset. The hyper-parameters of the developed models were optimized using Bayesian optimization. The analysis results showed that the best performance was achieved by the transformer encoder-based model with an overall F-score of 90.10%. Additionally, all models were observed to be more successful in predicting periods and spaces compared to commas. This study contributes to the literature by focusing on the Turkish language and offers a novel approach to automatic punctuation restoration with the creation of a new dataset and the developed models.<\/jats:p>","DOI":"10.1145\/3772087","type":"journal-article","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T10:49:58Z","timestamp":1761130198000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Developing Deep Learning Models for Turkish Automatic Punctuation Restoration Using a Novel Dataset"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8276-2030","authenticated-orcid":false,"given":"Yasin","family":"G\u00f6rmez","sequence":"first","affiliation":[{"name":"Management Information Systems, Sivas Cumhuriyet University","place":["Sivas, Turkey"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3286-5159","authenticated-orcid":false,"given":"Halil","family":"Arslan","sequence":"additional","affiliation":[{"name":"Computer Engineering, Sivas Cumhuriyet University","place":["Sivas, Turkey"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8889-6590","authenticated-orcid":false,"given":"Mustafa","family":"Elyakan","sequence":"additional","affiliation":[{"name":"Detay Teknoloji Yaz\u0131l\u0131m Dan\u0131\u015fmanl\u0131k Bilgisayar Hizmetleri Tic. San. A.S. R&D Center","place":["\u0130stanbul, Turkey"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,11,19]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.21437\/interspeech.2016-1517"},{"key":"e_1_3_1_3_2","article-title":"Importance of punctuation marks for writing and reading comprehension skills","author":"Suliman F.","year":"2019","unstructured":"F. Suliman, M. B.-. Ahmeida, and S. Mahalla. 2019. Importance of punctuation marks for writing and reading comprehension skills. Faculty of Arts Journal 13 (2019).","journal-title":"Faculty of Arts Journal"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-021-09568-y"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3015854"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.2991\/ijcis.2010.3.5.12"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/UBMK59864.2023.10286690"},{"key":"e_1_3_1_8_2","unstructured":"Tr-Dizin \u2018TRDizin \u2013 TRDizin\u2019. Retrieved August 22 2024 from https:\/\/trdizin.gov.tr\/"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-2319"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.wnut-1.18"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.eacl-demos.37"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2018.8545470"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2024.125097"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.115740"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.393"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-50316-1_31"},{"key":"e_1_3_1_17_2","first-page":"168","volume-title":"Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022)","author":"Dyer L.","year":"2022","unstructured":"L. Dyer, A. Hughes, D. Shah, and B. Can. 2022. Comparison of Token-Level and Character-Level approaches to restoration of spaces, punctuation, and capitalization in various languages. In Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022), M. Abbas and A. A. Freihat (Eds.). Trento, Italy: Association for Computational Linguistics, 168\u2013178. Retrieved August 22, 2024 from https:\/\/aclanthology.org\/2022.icnlsp-1.19"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.iwslt-1.33"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/CogInfoCom.2018.8639876"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.3390\/app13031685"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCC51575.2020.9344889"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-35320-8_17"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414518"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.23919\/APSIPAASC55919.2022.9980338"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2023-664"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.findings-naacl.149"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.21437\/SLaTE.2023-30"},{"key":"e_1_3_1_28_2","unstructured":"Tr-Dizin. 2024. TRDizin Api. Retrieved April 23 2024 from https:\/\/development.trdizin.gov.tr\/"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1080\/08839514.2023.2175112"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3578707"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","unstructured":"M. Domingo M. Garc\u00eda-Mart\u00ednez A. Helle F. Casacuberta and M. Herranz. 2023. How much does tokenization affect neural machine translation?. In Computational Linguistics and Intelligent Text Processing A. Gelbukh (Ed.). Cham: Springer Nature Switzerland 545\u2013554. DOI:10.1007\/978-3-031-24337-0_38","DOI":"10.1007\/978-3-031-24337-0_38"},{"key":"e_1_3_1_32_2","unstructured":"K. Team. 2024. Keras documentation: KerasNLP. Retrieved August 23 2024 from https:\/\/keras.io\/keras_nlp\/"},{"key":"e_1_3_1_33_2","unstructured":"Team. 2024. Keras: Deep Learning for humans. Retrieved August 23 2024 from https:\/\/keras.io\/"},{"key":"e_1_3_1_34_2","unstructured":"TensorFlow. TensorFlow\u2019 TensorFlow. Retrieved August 23 2024 from https:\/\/www.tensorflow.org\/?hl=tr"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2023.107013"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.2991\/978-94-6463-238-5_106"},{"key":"e_1_3_1_37_2","unstructured":"P. K. Mandal and R. Mahto. 2022. An FNet based auto encoder for long sequence news story generation. arXiv:2211.08295. Retrieved from https:\/\/arxiv.org\/abs\/2211.08295"},{"key":"e_1_3_1_38_2","unstructured":"S. Bai J. Z. Kolter and V. Koltun. 2018. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271. Retrieved from https:\/\/arxiv.org\/abs\/1803.01271"},{"key":"e_1_3_1_39_2","unstructured":"A. Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems Curran Associates Inc. Retrieved September 5 2025 from https:\/\/proceedings.neurips.cc\/paper\/2017\/hash\/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html"},{"key":"e_1_3_1_40_2","first-page":"1243","volume-title":"Proceedings of the 34th International Conference on Machine Learning.","author":"Gehring J.","year":"2017","unstructured":"J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning. PMLR, 1243\u20131252. Retrieved September 5, 2025 from https:\/\/proceedings.mlr.press\/v70\/gehring17a.html"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-25566-3_40"},{"key":"e_1_3_1_42_2","unstructured":"J. Snoek H. Larochelle and R. P. Adams. 2012. Practical bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems. Curran Associates Inc. Retrieved August 26 2024 from https:\/\/proceedings.neurips.cc\/paper\/2012\/hash\/05311655a15b75fab86956663e1819cd-Abstract.html"},{"key":"e_1_3_1_43_2","unstructured":"J. Bergstra R. Bardenet Y. Bengio and B. K\u00e9gl. 2011. Algorithms for Hyper-Parameter Optimization. In Advances in Neural Information Processing Systems. Curran Associates Inc. Retrieved August 26 2024 from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2011\/hash\/86e8f7ab32cfd12577bc2619bc635690-Abstract.html"},{"key":"e_1_3_1_44_2","unstructured":"scikit-optimize scikit-optimize: Sequential model-based optimization toolbox. 2024. Python. Retrieved August 26 2024 from https:\/\/scikit-optimize.readthedocs.io\/en\/latest\/contents.html"},{"key":"e_1_3_1_45_2","unstructured":"scikit-learn. 2024. sklearn.metrics\u2019 scikit-learn. Retrieved August 26 2024 from https:\/\/scikit-learn\/stable\/api\/sklearn.metrics.html"},{"key":"e_1_3_1_46_2","unstructured":"scikit-learn. 2024. Multiclass Receiver Operating Characteristic (ROC)\u2019 scikit-learn. Retrieved August 26 2024 from https:\/\/scikit-learn\/stable\/auto_examples\/model_selection\/plot_roc.html"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3772087","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,19]],"date-time":"2025-11-19T16:38:15Z","timestamp":1763570295000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3772087"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,19]]},"references-count":45,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11,30]]}},"alternative-id":["10.1145\/3772087"],"URL":"https:\/\/doi.org\/10.1145\/3772087","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2025,11,19]]},"assertion":[{"value":"2025-03-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-11","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-19","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}