{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T23:43:12Z","timestamp":1726443792524},"reference-count":16,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,1,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Despite the modern boom in technology, people still write texts without diacritics. There are two main reasons for this. The first is historical: in the past, using diacritics was troublesome, so people wrote text without them. The second is speed: typing without diacritics is usually faster. People can easily understand text without diacritics, but in some types of documents, missing diacritics can cause problems. Missing diacritics are also an issue when computers process such text. In this paper, we propose an algorithm based on word n-grams (contiguous sequences of n words) that restores diacritics in text written in the Slovak language. We also evaluate our results and compare them with those of other algorithms developed for Slovak text.<\/jats:p>","DOI":"10.1515\/comp-2020-0143","type":"journal-article","created":{"date-parts":[[2021,2,16]],"date-time":"2021-02-16T17:36:16Z","timestamp":1613496976000},"page":"180-189","source":"Crossref","is-referenced-by-count":1,"title":["Diacritics restoration based on word n-grams for Slovak texts"],"prefix":"10.1515","volume":"11","author":[{"given":"\u0160tefan","family":"Toth","sequence":"first","affiliation":[{"name":"Department of Software Technologies, Faculty of Management Science and Informatics, University of \u017dilina, \u017dilina, Slovakia"}]},{"given":"Emanuel","family":"Zaymus","sequence":"additional","affiliation":[{"name":"Department of Software Technologies, Faculty of Management Science and Informatics, University of \u017dilina, \u017dilina, Slovakia"}]},{"given":"Michal","family":"\u010eura\u010d\u00edk","sequence":"additional","affiliation":[{"name":"Department of Software Technologies, Faculty of Management Science and Informatics, University of \u017dilina, \u017dilina, Slovakia"}]},{"given":"Patrik","family":"Hrk\u00fat","sequence":"additional","affiliation":[{"name":"Department of Software Technologies, Faculty of Management Science and Informatics, University of \u017dilina, \u017dilina, Slovakia"}]},{"given":"Matej","family":"Me\u0161ko","sequence":"additional","affiliation":[{"name":"Department of Software Technologies, Faculty of Management Science and Informatics, University of \u017dilina, \u017dilina, Slovakia"}]}],"member":"374","published-online":{"date-parts":[[2021,1,27]]},"reference":[{"key":"2022020121510180133_j_comp-2020-0143_ref_001","doi-asserted-by":"crossref","unstructured":"Federico M., Bertoldi N., Cettolo M., IRSTLM: an open source toolkit for handling large scale language models, Ninth Annual Conference of the International Speech Communication Association, 2008.","DOI":"10.21437\/Interspeech.2008-271"},{"key":"2022020121510180133_j_comp-2020-0143_ref_002","unstructured":"Gedera J., Dopl\u0148ova\u010d diakritiky (tool for diacritic restoration), http:\/\/text.fiit.stuba.sk:8081\/, Last accessed 24 June 2020."},{"key":"2022020121510180133_j_comp-2020-0143_ref_003","doi-asserted-by":"crossref","unstructured":"Hl\u00e1dek D., Sta\u0161 J., Juh\u00e1r J., Diacritics restoration in the Slovak texts using hidden Markov model, Language and Technology Conference, Springer, 2013, 29\u201340.","DOI":"10.1007\/978-3-319-43808-5_3"},{"key":"2022020121510180133_j_comp-2020-0143_ref_004","unstructured":"Hra\u0161ka R., Dopl\u0148a\u010d diakritiky (tool for diacritic restoration), https:\/\/diakritika.brm.sk\/, Last accessed 24 June 2020."},{"key":"2022020121510180133_j_comp-2020-0143_ref_005","doi-asserted-by":"crossref","unstructured":"Hucko A., Lacko P., Diacritics restoration using deep neural networks, 2018 World Symposium on Digital Intelligence for Systems and Machines (DISA), IEEE, 2018, 195\u2013200.","DOI":"10.1109\/DISA.2018.8490624"},{"key":"2022020121510180133_j_comp-2020-0143_ref_006","doi-asserted-by":"crossref","unstructured":"Jansen W., Delaitre A., Mobile forensic reference materials: A methodology and reification, 2009, US Department of Commerce, National Institute of Standards and Technology.","DOI":"10.6028\/NIST.IR.7617"},{"key":"2022020121510180133_j_comp-2020-0143_ref_007","doi-asserted-by":"crossref","unstructured":"Krchnavy R., Simko M., Sentiment analysis of social network posts in Slovak language, 2017 12th International Workshop on Semantic and Social Media Adaptation and Personalization (SMAP), IEEE, 2017, 20\u201325.","DOI":"10.1109\/SMAP.2017.8022661"},{"key":"2022020121510180133_j_comp-2020-0143_ref_008","unstructured":"Ljube\u0161i\u0107 N., Erjavec T., Fi\u0161er D., Corpus-based diacritic restoration for South Slavic languages, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916), 2016, 3612\u20133616."},{"key":"2022020121510180133_j_comp-2020-0143_ref_009","unstructured":"Merriam-Webster, Diacritic - definition of diacritic by Merriam-Webster, https:\/\/www.merriam-webster.com\/dictionary\/diacritic, Last accessed 24 June 2020."},{"key":"2022020121510180133_j_comp-2020-0143_ref_010","unstructured":"N\u00e1plava J., Straka M., Stra\u0148\u00e1k P., Haji\u010d J., Diacritics restoration using neural networks, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018."},{"key":"2022020121510180133_j_comp-2020-0143_ref_011","doi-asserted-by":"crossref","unstructured":"Nov\u00e1k A., Sikl\u00f3si B., Automatic diacritics restoration for Hungarian, Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015, 2286\u20132291.","DOI":"10.18653\/v1\/D15-1275"},{"key":"2022020121510180133_j_comp-2020-0143_ref_012","unstructured":"N\u00e1plava J., Straka M., Haji\u010d J., Stra\u0148\u00e1k P., Corpus for training and evaluating diacritics restoration systems, URL http:\/\/hdl.handle.net\/11234\/1-2607."},{"key":"2022020121510180133_j_comp-2020-0143_ref_013","unstructured":"Tufi\u015f D., Ceau\u015fu A., et al., Diac+: A professional diacritics recovering system, Proceedings of LREC 2008, 2008, 167\u2013174."},{"key":"2022020121510180133_j_comp-2020-0143_ref_014","unstructured":"\u013dudov\u00edt \u0160t\u00far Institute of Linguistics of the Slovak Academy of Sciences (J\u00da\u013d\u0160 SAV), Diakritik \u2013 n\u00e1stroj na rekon\u0161trukciu diakritiky (tool for diacritics reconstruction), https:\/\/www.juls.savba.sk\/diakritik.html, Last accessed 24 June 2020."},{"key":"2022020121510180133_j_comp-2020-0143_ref_015","unstructured":"\u013dudov\u00edt \u0160t\u00far Institute of Linguistics of the Slovak Academy of Sciences (J\u00da\u013d\u0160 SAV), Pravidl\u00e1 slovensk\u00e9ho pravopisu (Rules of Slovak orthography), 3rd revised and supplemented edition, Bratislava: Veda, 2000."},{"key":"2022020121510180133_j_comp-2020-0143_ref_016","doi-asserted-by":"crossref","unstructured":"Zitouni I., Sorensen J. S., Sarikaya R., Maximum entropy based restoration of Arabic diacritics, Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2006, 577\u2013584.","DOI":"10.3115\/1220175.1220248"}],"container-title":["Open Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2020-0143\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2020-0143\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,2,1]],"date-time":"2022-02-01T22:09:58Z","timestamp":1643753398000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2020-0143\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,1]]},"references-count":16,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1,13]]},"published-print":{"date-parts":[[2021,1,1]]}},"alternative-id":["10.1515\/comp-2020-0143"],"URL":"https:\/\/doi.org\/10.1515\/comp-2020-0143","relation":{},"ISSN":["2299-1093"],"issn-type":[{"type":"electronic","value":"2299-1093"}],"subject":[],"published":{"date-parts":[[2021,1,1]]}}}