{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:21:54Z","timestamp":1760145714841,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2024,8,20]],"date-time":"2024-08-20T00:00:00Z","timestamp":1724112000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Text simplification is crucial in bridging the comprehension gap in today\u2019s information-rich environment. Despite advancements in English text simplification, languages with intricate grammatical structures, such as Greek, often remain under-explored. The complexity of Greek grammar, characterized by its flexible syntactic ordering, presents unique challenges that hinder comprehension for native speakers, learners, tourists, and international students. This paper introduces a comprehensive dataset for Greek text simplification, containing over 7500 sentences across diverse topics such as history, science, and culture, tailored to address these challenges. We outline the methodology for compiling this dataset, including a collection of texts from Greek Wikipedia, their annotation with simplified versions, and the establishment of robust evaluation metrics. Additionally, the paper details the implementation of quality control measures and the application of machine learning techniques to analyze text complexity. Our experimental results demonstrate the dataset\u2019s initial effectiveness and potential in reducing linguistic barriers and enhancing communication, with initial machine learning models showing promising directions for future improvements in classifying text complexity. 
The development of this dataset marks a significant step toward improving accessibility and comprehension for a broad audience of Greek speakers and learners, fostering a more inclusive society.<\/jats:p>","DOI":"10.3390\/info15080500","type":"journal-article","created":{"date-parts":[[2024,8,20]],"date-time":"2024-08-20T09:13:48Z","timestamp":1724145228000},"page":"500","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Bridging Linguistic Gaps: Developing a Greek Text Simplification Dataset"],"prefix":"10.3390","volume":"15","author":[{"given":"Leonidas","family":"Agathos","sequence":"first","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-9230-3056","authenticated-orcid":false,"given":"Andreas","family":"Avgoustis","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]},{"given":"Xristiana","family":"Kryelesi","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]},{"given":"Aikaterini","family":"Makridou","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]},{"given":"Ilias","family":"Tzanis","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2844-5488","authenticated-orcid":false,"given":"Despoina","family":"Mouratidis","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3270-5078","authenticated-orcid":false,"given":"Katia Lida","family":"Kermanidis","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, 
Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9964-4134","authenticated-orcid":false,"given":"Andreas","family":"Kanavos","sequence":"additional","affiliation":[{"name":"Department of Informatics, Ionian University, 49100 Corfu, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2024,8,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Santucci, V., Santarelli, F., Forti, L., and Spina, S. (2020). Automatic Classification of Text Complexity. Appl. Sci., 10.","DOI":"10.3390\/app10207285"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Mouratidis, D., Mathe, E., Voutos, Y., Stamou, K., Kermanidis, K.L., Mylonas, P., and Kanavos, A. (2022, January 7\u20139). Domain-Specific Term Extraction: A Case Study on Greek Maritime Legal Texts. Proceedings of the 12th Hellenic Conference on Artificial Intelligence (SETN), Corfu, Greece.","DOI":"10.1145\/3549737.3549751"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kanavos, A., Theodoridis, E., and Tsakalidis, A.K. (2012, January 7\u20139). Extracting Knowledge from Web Search Engine Results. Proceedings of the 24th International Conference on Tools with Artificial Intelligence (ICTAI), Athens, Greece.","DOI":"10.1109\/ICTAI.2012.120"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Vonitsanos, G., Kanavos, A., and Mylonas, P. (2023, January 15\u201318). Decoding Gender on Social Networks: An In-depth Analysis of Language in Online Discussions Using Natural Language Processing and Machine Learning. Proceedings of the IEEE International Conference on Big Data, Sorrento, Italy.","DOI":"10.1109\/BigData59044.2023.10386655"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Siddharthan, A., Nenkova, A., and McKeown, K.R. (2004, January 23\u201327). Syntactic Simplification for Improving Content Selection in Multi-Document Summarization. 
Proceedings of the 20th International Conference on Computational Linguistics (COLING), Geneva, Switzerland.","DOI":"10.3115\/1220355.1220484"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Silveira, S.B., and Branco, A. (2012, January 8\u201310). Combining a double clustering approach with sentence simplification to produce highly informative multi-document summaries. Proceedings of the 13th International Conference on Information Reuse & Integration (IRI), Las Vegas, NV, USA.","DOI":"10.1109\/IRI.2012.6303047"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Narayan, S., and Gardent, C. (2014, January 23\u201324). Hybrid Simplification using Deep Semantics and Machine Translation. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL), Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-1041"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Qiang, J., Zhang, F., Li, Y., Yuan, Y., Zhu, Y., and Wu, X. (2023). Unsupervised Statistical Text Simplification using Pre-trained Language Modeling for Initialization. Front. Comput. Sci., 17.","DOI":"10.1007\/s11704-022-1244-0"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1007\/978-3-642-12320-7_5","article-title":"Translating from Complex to Simplified Sentences","volume":"Volume 6001","author":"Specia","year":"2010","journal-title":"Proceedings of the 9th International Conference on Computational Processing of the Portuguese Language (PROPOR)"},{"key":"ref_10","unstructured":"Wubben, S., van den Bosch, A., and Krahmer, E. (2012, January 8\u201314). Sentence Simplification by Monolingual Machine Translation. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Republic of Korea."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Zhang, X., and Lapata, M. (2017, January 9\u201311). Sentence Simplification with Deep Reinforcement Learning. 
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1062"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1093\/llc\/fqr034","article-title":"Comparing Methods for the Syntactic Simplification of Sentences in Information Extraction","volume":"26","author":"Evans","year":"2011","journal-title":"Lit. Linguist. Comput."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Lu, X., Qiang, J., Li, Y., Yuan, Y., and Zhu, Y. (2021). An Unsupervised Method for Building Sentence Simplification Corpora in Multiple Languages. arXiv.","DOI":"10.18653\/v1\/2021.findings-emnlp.22"},{"key":"ref_14","unstructured":"(2024, July 30). Newsela Data. Available online: https:\/\/newsela.com\/data."},{"key":"ref_15","unstructured":"Coster, W., and Kauchak, D. (2011, January 19\u201324). Simple English Wikipedia: A New Text Simplification Task. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Vajjala, S., and Lucic, I. (2018, January 5). OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification. Proceedings of the 13th Workshop on Innovative Use of NLP for Building Educational Applications@NAACL-HLT, New Orleans, LA, USA.","DOI":"10.18653\/v1\/W18-0535"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Sun, R., Jin, H., and Wan, X. (2021). Document-Level Text Simplification: Dataset, Criteria and Baseline. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.630"},{"key":"ref_18","unstructured":"Battisti, A., and Ebling, S. (2019). A Corpus for Automatic Readability Assessment and Text Simplification of German. arXiv."},{"key":"ref_19","unstructured":"Klaper, D., Ebling, S., and Volk, M. (2013, January 8). 
Building a German\/Simple German Parallel Corpus for Automatic Text Simplification. Proceedings of the 2nd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR@ACL), Sofia, Bulgaria."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Rios, A., Spring, N., Kew, T., Kostrzewa, M., S\u00e4uberli, A., M\u00fcller, M., and Ebling, S. (2021, January 10). A New Dataset and Efficient Baselines for Document-level Text Simplification in German. Proceedings of the 3rd Workshop on New Frontiers in Summarization, Hong Kong, China.","DOI":"10.18653\/v1\/2021.newsum-1.16"},{"key":"ref_21","unstructured":"Aluisio, S., Specia, L., Gasperin, C., and Scarton, C. (2010, January 5). Readability Assessment for Text Simplification. Proceedings of the NAACL HLT 5th Workshop on Innovative Use of NLP for Building Educational Applications, Los Angeles, CA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2738046","article-title":"Making It Simplext: Implementation and Evaluation of a Text Simplification System for Spanish","volume":"6","author":"Saggion","year":"2015","journal-title":"ACM Trans. Access. Comput."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Brunato, D., Dell\u2019Orletta, F., Venturi, G., and Montemagni, S. (2015, January 5). Design and Annotation of the First Italian Corpus for Text Simplification. Proceedings of the 9th Linguistic Annotation Workshop (LAW@NAACL-HLT), Denver, CO, USA.","DOI":"10.3115\/v1\/W15-1604"},{"key":"ref_24","unstructured":"Gala, N., Tack, A., Javourey-Drevet, L., Fran\u00e7ois, T., and Ziegler, J.C. (2020, January 11\u201316). Alector: A Parallel Corpus of Simplified French Texts with Alignments of Misreadings by Poor and Dyslexic Readers. Proceedings of the 12th Language Resources and Evaluation Conference (LREC), Marseille, France."},{"key":"ref_25","unstructured":"Holmer, D., and Rennes, E. (2023, January 22\u201324). 
Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification. Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), T\u00f3rshavn, Faroe Islands."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"106485","DOI":"10.1016\/j.engappai.2023.106485","article-title":"Monolingual, Multilingual and Cross-lingual Code Comment Classification","volume":"124","author":"Kostic","year":"2023","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Den Bercken, L.V., Sips, R., and Lofi, C. (2019, January 13\u201317). Evaluating Neural Text Simplification in the Medical Domain. Proceedings of the World Wide Web Conference (WWW), San Francisco, CA, USA.","DOI":"10.1145\/3308558.3313630"},{"key":"ref_28","unstructured":"Shardlow, M. (2014, January 26\u201331). Out in the Open: Finding and Categorising Errors in the Lexical Simplification Pipeline. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland."},{"key":"ref_29","unstructured":"Bott, S., Rello, L., Drndarevic, B., and Saggion, H. (2012, January 8\u201315). Can Spanish Be Simpler? LexSiS: Lexical Simplification for Spanish. Proceedings of the 24th International Conference on Computational Linguistics (COLING), Mumbai, India."},{"key":"ref_30","unstructured":"Biran, O., Brody, S., and Elhadad, N. (2011, January 19\u201324). Putting it Simply: A Context-Aware Approach to Lexical Simplification. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/S0950-7051(97)00029-4","article-title":"Automatic Induction of Rules for Text Simplification","volume":"10","author":"Chandrasekar","year":"1997","journal-title":"Knowl. 
Based Syst."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"3064","DOI":"10.1109\/TASLP.2021.3111589","article-title":"LSBert: Lexical Simplification Based on BERT","volume":"29","author":"Qiang","year":"2021","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Process."},{"key":"ref_33","unstructured":"Siddharthan, A. (2011, January 28\u201330). Text Simplification using Typed Dependencies: A Comparision of the Robustness of Different Generation Strategies. Proceedings of the 13th European Workshop on Natural Language Generation (ENLG), Nancy, France."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Siddharthan, A., and Mandya, A. (2014, January 26\u201330). Hybrid Text Simplification using Synchronous Dependency Grammars with Hand-written and Automatically Harvested Rules. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Gothenburg, Sweden.","DOI":"10.3115\/v1\/E14-1076"},{"key":"ref_35","unstructured":"Woodsend, K., and Lapata, M. (2011, January 27\u201329). Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Edinburgh, UK."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Garbacea, C., Guo, M., Carton, S., and Mei, Q. (2020). An Empirical Study on Explainable Prediction of Text Complexity: Preliminaries for Text Simplification. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.88"},{"key":"ref_37","unstructured":"Wang, T., Chen, P., Amaral, K.M., and Qiang, J. (2016). An Experimental Study of LSTM Encoder-Decoder Model for Text Simplification. arXiv."},{"key":"ref_38","unstructured":"Nisioi, S., Stajner, S., Ponzetto, S.P., and Dinu, L.P. (August, January 30). Exploring Neural Text Simplification Models. 
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, BC, Canada."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Sulem, E., Abend, O., and Rappoport, A. (2018, January 15\u201320). Simple and Effective Text Simplification Using Semantic and Neural Methods. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1016"},{"key":"ref_40","unstructured":"Arvan, M., Pina, L., and Parde, N. (2022, January 18\u201322). Reproducibility of Exploring Neural Text Simplification Models: A Review. Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, virtual."},{"key":"ref_41","unstructured":"(2024, July 30). Greek Wikipedia. Available online: https:\/\/en.wikipedia.org\/wiki\/Greek_Wikipedia."},{"key":"ref_42","unstructured":"(2024, July 30). HiLab Greek Text Simplification Dataset. Available online: https:\/\/hilab.di.ionio.gr\/wp-content\/uploads\/2024\/07\/HiLab_Greek_text_simplification_Wikipedia_Dataset.zip."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Lee, R.S.T. (2024). Natural Language Processing\u2014A Textbook with Python Implementation, Springer.","DOI":"10.1007\/978-981-99-1999-4"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1007\/s10579-010-9124-x","article-title":"Steven Bird, Ewan Klein and Edward Loper: Natural Language Processing with Python, Analyzing Text with the Natural Language Toolkit\u2014O\u2019Reilly Media","volume":"44","author":"Wagner","year":"2010","journal-title":"Lang. Resour. Eval."},{"key":"ref_45","unstructured":"Honnibal, M., Montani, I., Landeghem, S.V., and Boyd, A. (2020). 
spaCy: Industrial-Strength Natural Language Processing in Python, Zenodo."},{"key":"ref_46","first-page":"1","article-title":"Automated Text Simplification: A Survey","volume":"54","author":"Azmi","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Mouratidis, D., Kermanidis, K., and Kanavos, A. (2023, January 10\u201312). Comparative Study of Recurrent and Dense Neural Networks for Classifying Maritime Terms. Proceedings of the 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece.","DOI":"10.1109\/IISA59645.2023.10345925"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/8\/500\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:39:47Z","timestamp":1760110787000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/15\/8\/500"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8,20]]},"references-count":47,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2024,8]]}},"alternative-id":["info15080500"],"URL":"https:\/\/doi.org\/10.3390\/info15080500","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2024,8,20]]}}}