{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,22]],"date-time":"2026-06-22T17:27:37Z","timestamp":1782149257754,"version":"3.54.5"},"reference-count":53,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Princess Nourah bint Abdulrahman University Researchers Supporting Project","award":["PNURSP2023R104"],"award-info":[{"award-number":["PNURSP2023R104"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Systems"],"abstract":"<jats:p>Despite a few attempts to automatically crawl Ewe text from online news portals and magazines, the African Ewe language remains underdeveloped despite its rich morphology and complex \"unique\" structure. This is due to the poor quality, unbalanced, and religious-based nature of the crawled Ewe texts, thus making it challenging to preprocess and perform any NLP task with current transformer-based language models. In this study, we present a well-preprocessed Ewe dataset for low-resource text classification to the research community. Additionally, we have developed an Ewe-based word embedding to leverage the low-resource semantic representation. Finally, we have fine-tuned seven transformer-based models, namely BERT-based (cased and uncased), DistilBERT-based (cased and uncased), RoBERTa, DistilRoBERTa, and DeBERTa, using the preprocessed Ewe dataset that we have proposed. Extensive experiments indicate that the fine-tuned BERT-base-cased model outperforms all baseline models with an accuracy of 0.972, precision of 0.969, recall of 0.970, loss score of 0.021, and an F1-score of 0.970. This performance demonstrates the model\u2019s ability to comprehend the low-resourced Ewe semantic representation compared to all other models, thus setting the fine-tuned BERT-based model as the benchmark for the proposed Ewe dataset.<\/jats:p>","DOI":"10.3390\/systems12010001","type":"journal-article","created":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T11:17:24Z","timestamp":1702984644000},"page":"1","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Pre-Trained Transformer-Based Models for Text Classification Using Low-Resourced Ewe Language"],"prefix":"10.3390","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0723-5008","authenticated-orcid":false,"given":"Victor Kwaku","family":"Agbesi","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Wenyu","family":"Chen","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6462-5320","authenticated-orcid":false,"given":"Sophyani Banaamwini","family":"Yussif","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0730-2098","authenticated-orcid":false,"given":"Md Altab","family":"Hossin","sequence":"additional","affiliation":[{"name":"School of Innovation and Entrepreneurship, Chengdu University, No. 2025 Chengluo Avenue, Chengdu 610106, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4532-6026","authenticated-orcid":false,"given":"Chiagoziem C.","family":"Ukwuoma","sequence":"additional","affiliation":[{"name":"College of Nuclear Technology and Automation Engineering, Chengdu University of Technology, Chengdu 610059, China"},{"name":"Sichuan Engineering Technology Research Center for Industrial Internet Intelligent Monitoring and Application, Chengdu University of Technology, Chengdu 610059, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Noble A.","family":"Kuadey","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Colin Collinson","family":"Agbesi","sequence":"additional","affiliation":[{"name":"Faculty of Applied Science and Technology, Koforidua Technical University, Koforidua P.O. Box KF-981, Ghana"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5957-1383","authenticated-orcid":false,"given":"Nagwan","family":"Abdel Samee","sequence":"additional","affiliation":[{"name":"Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9149-2810","authenticated-orcid":false,"given":"Mona M.","family":"Jamjoom","sequence":"additional","affiliation":[{"name":"Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4457-4407","authenticated-orcid":false,"given":"Mugahed A.","family":"Al-antari","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence, College of Software & Convergence Technology, Daeyang AI Center, Sejong University, Seoul 05006, Republic of Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2023,12,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"10417","DOI":"10.1007\/s10489-022-03946-x","article-title":"BERT-based chinese text classification for emergency management with a novel loss function","volume":"53","author":"Wang","year":"2023","journal-title":"Appl. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"103011","DOI":"10.1016\/j.ipm.2022.103011","article-title":"A text classification approach to detect psychological stress combining a lexicon-based feature framework with distributional representations","volume":"59","author":"Iglesias","year":"2022","journal-title":"Inf. Process. Manag."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Borjali, A., Magn\u00e9li, M., Shin, D., Malchau, H., Muratoglu, O.K., and Varadarajan, K.M. (2021). Natural language processing with deep learning for medical adverse event detection from free-text medical narratives: A case study of detecting total hip replacement dislocation. Comput. Biol. Med., 129.","DOI":"10.1016\/j.compbiomed.2020.104140"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"918","DOI":"10.1093\/comjnl\/bxaa130","article-title":"Semantic Analysis to Identify Students\u2019 Feedback","volume":"65","author":"Masood","year":"2022","journal-title":"Comput. J."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Dogra, V., Alharithi, F.S., \u00c1lvarez, R.M., Singh, A., and Qahtani, A.M. (2022). NLP-Based Application for Analyzing Private and Public Banks Stocks Reaction to News Events in the Indian Stock Exchange. Systems, 10.","DOI":"10.3390\/systems10060233"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Abdelhady, N., Elsemman, I.E., Farghally, M.F., and Soliman, T.H.A. (2023). Developing Analytical Tools for Arabic Sentiment Analysis of COVID-19 Data. Algorithms, 16.","DOI":"10.3390\/a16070318"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Hayashi, T., Yoshimura, T., Inuzuka, M., Kuroyanagi, I., and Segawa, O. (2021, January 23\u201327). Spontaneous Speech Summarization: Transformers All The Way Through. Proceedings of the European Signal Processing Conference, Dublin, Ireland.","DOI":"10.23919\/EUSIPCO54536.2021.9615996"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Palanivinayagam, A., El-Bayeh, C.Z., and Dama\u0161evi\u010dius, R. (2023). Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review. Algorithms, 16.","DOI":"10.3390\/a16050236"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Agbesi, V.K., Chen, W., Gizaw, S.M., Ukwuoma, C.C., Ameneshewa, A.S., and Ejiyi, C.J. Attention Based BiGRU-2DCNN with Hunger Game Search Technique for Low-Resource Document-Level Sentiment Classification. In ACM International Conference Proceeding Series; 2023; pp. 48\u201354.","DOI":"10.1145\/3582177.3582186"},{"key":"ref_10","first-page":"1","article-title":"A Survey on Text Classification: From Traditional to Deep Learning","volume":"13","author":"Li","year":"2022","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1671","DOI":"10.1109\/LSP.2015.2420092","article-title":"Deep neural network approaches to speaker and language recognition","volume":"22","author":"Richardson","year":"2015","journal-title":"IEEE Signal Process. Lett."},{"key":"ref_12","unstructured":"Guggilla, C. Discrimination between Similar Languages, Varieties and Dialects using {CNN}- and {LSTM}-based Deep Neural Networks. Proceedings of the Third Workshop on {NLP} for Similar Languages, Varieties and Dialects ({V}ar{D}ial3), Osaka, Japan."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Agbesi, V.K., Chen, W., Odame, E., and Browne, J.A. (2023, January 23\u201324). Efficient Adaptive Convolutional Model Based on Label Embedding for Text Classification Using Low Resource Languages. Proceedings of the 2023 7th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence, Virtual.","DOI":"10.1145\/3596947.3596962"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Cho, K., van Merri\u00ebnboer, B., Bahdanau, D., and Bengio, Y. (2014, January 25). On the properties of neural machine translation: Encoder\u2013decoder approaches. Proceedings of the SSST 2014-8th Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar.","DOI":"10.3115\/v1\/W14-4012"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018, January 15\u201320). Universal language model fine-tuning for text classification. Proceedings of the ACL 2018-56th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers), Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_16","first-page":"399","article-title":"Improving Language Understanding by Generative Pre-Training","volume":"9","author":"Radford","year":"2007","journal-title":"Homol. Homotopy Appl."},{"key":"ref_17","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the NAACL HLT 2019-2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies-Proceedings of the Conference, Minneapolis, MN, USA."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"10602","DOI":"10.1007\/s10489-022-04052-8","article-title":"Transformer models used for text-based question answering systems","volume":"53","author":"Nassiri","year":"2023","journal-title":"Appl. Intell."},{"key":"ref_19","unstructured":"Cruz, J.C.B., and Cheng, C. (2020). Establishing Baselines for Text Classification in Low-Resource Languages. arXiv."},{"key":"ref_20","first-page":"6595","article-title":"Short text classification for Arabic social media tweets","volume":"34","author":"Alzanin","year":"2022","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"34046","DOI":"10.1109\/ACCESS.2022.3162614","article-title":"A Long-Text Classification Method of Chinese News Based on BERT and CNN","volume":"10","author":"Chen","year":"2022","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Islam, K.I., Islam, M.S., and Amin, M.R. (2020, January 19\u201321). Sentiment analysis in Bengali via transfer learning using multi-lingual BERT. Proceedings of the ICCIT 2020-23rd International Conference on Computer and Information Technology, Proceedings, Virtual.","DOI":"10.1109\/ICCIT51783.2020.9392653"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"e1413","DOI":"10.7717\/peerj-cs.1413","article-title":"A comprehensive survey of techniques for developing an Arabic question answering system","volume":"9","author":"Alkhurayyif","year":"2023","journal-title":"PeerJ Comput. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"102481","DOI":"10.1016\/j.ipm.2020.102481","article-title":"On the cost-effectiveness of neural and non-neural approaches and representations for text classification: A comprehensive comparative study","volume":"58","author":"Cunha","year":"2021","journal-title":"Inf. Process. Manag."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1016\/j.neucom.2019.10.033","article-title":"Text classification using capsules","volume":"376","author":"Kim","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Agbesi, V.K., Wenyu, C., Kuadey, N.A., and Maale, G.T. (2022, January 22\u201325). Multi-Topic Categorization in a Low-Resource Ewe Language: A Modern Transformer Approach. Proceedings of the 2022 7th International Conference on Computer and Communication Systems (ICCCS), Wuhan, China.","DOI":"10.1109\/ICCCS55155.2022.9846372"},{"key":"ref_27","unstructured":"Azunre, P., Osei, S., Addo, S., Adu-Gyamfi, L.A., Moore, S., Adabankah, B., Opoku, B., Asare-Nyarko, C., Nyarko, S., and Amoaba, C. (2021). NLP for Ghanaian Languages. arXiv."},{"key":"ref_28","unstructured":"Marivate, V., Sefara, T., Chabalala, V., Makhaya, K., Mokgonyane, T., Mokoena, R., and Modupe, A. (2020). Investigating an approach for low resource language dataset creation, curation and classification: Setswana and Sepedi. arXiv."},{"key":"ref_29","unstructured":"Cruz, J.C.B., and Cheng, C. (2019). Evaluating Language Model Finetuning Techniques for Low-resource Languages. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"5437","DOI":"10.1007\/s00521-020-05321-8","article-title":"Benchmarking performance of machine and deep learning-based methodologies for Urdu text document classification","volume":"33","author":"Asim","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_31","first-page":"404","article-title":"Improving Arabic Text Classification Using P-Stemmer","volume":"15","author":"Kanan","year":"2020","journal-title":"Recent Adv. Comput. Sci. Commun."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.21608\/ejle.2020.29313.1006","article-title":"Machine Learning and Feature Selection Approaches for Categorizing Arabic Text: Analysis, Comparison, and Proposal","volume":"7","author":"Elnahas","year":"2020","journal-title":"Egypt. J. Lang. Eng."},{"key":"ref_33","first-page":"363","article-title":"Vietnamese News Articles Classification Using Neural Networks","volume":"12","author":"Vinh","year":"2021","journal-title":"J. Adv. Inf. Technol."},{"key":"ref_34","first-page":"3412","article-title":"Classifying Arabic text using deep learning","volume":"97","author":"Galal","year":"2019","journal-title":"J. Theor. Appl. Inf. Technol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"102121","DOI":"10.1016\/j.ipm.2019.102121","article-title":"Arabic text classification using deep learning models","volume":"57","author":"Elnagar","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Liu, X., Zhou, G., Kong, M., Yin, Z., Li, X., Yin, L., and Zheng, W. (2023). Developing Multi-Labelled Corpus of Twitter Short Texts: A Semi-Automatic Method. Systems, 11.","DOI":"10.3390\/systems11080390"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Adjeisah, M., Liu, G., Nortey, R.N., Song, J., Lamptey, K.O., and Frimpong, F.N. (2020, January 17\u201319). Twi corpus: A massively Twi-to-handful languages parallel bible corpus. Proceedings of the 2020 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking, Exeter, UK.","DOI":"10.1109\/ISPA-BDCloud-SocialCom-SustainCom51426.2020.00157"},{"key":"ref_38","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_39","unstructured":"Sanh, V., Debut, L., Chaumond, J., and Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv."},{"key":"ref_40","first-page":"5753","article-title":"XLNet: Generalized autoregressive pretraining for language understanding","volume":"2019","author":"Yang","year":"2019","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Mohammad, S.M., Bravo-Marquez, F., Salameh, M., and Kiritchenko, S. (2018, January 5\u20136). SemEval-2018 Task 1: Affect in Tweets. Proceedings of the NAACL HLT 2018-International Workshop on Semantic Evaluation, SemEval 2018-Proceedings of the 12th Workshop, New Orleans, LA, USA.","DOI":"10.18653\/v1\/S18-1001"},{"key":"ref_42","unstructured":"Kuriyozov, E., Salaev, U., Matlatipov, S., and Matlatipov, G. (2023). Text classification dataset and analysis for Uzbek language. arXiv."},{"key":"ref_43","unstructured":"Javed, T.A., Shahzad, W., and Arshad, U. (2021). Hierarchical Text Classification of Urdu News using Deep Neural Network. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"124478","DOI":"10.1109\/ACCESS.2021.3110285","article-title":"The Impact of Translating Resource-Rich Datasets to Low-Resource Languages through Multi-Lingual Text Processing","volume":"9","author":"Ghafoor","year":"2021","journal-title":"IEEE Access"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1016\/j.future.2021.01.024","article-title":"Scalable multi-channel dilated CNN\u2013BiLSTM model with attention mechanism for Chinese textual sentiment analysis","volume":"118","author":"Gan","year":"2021","journal-title":"Futur. Gener. Comput. Syst."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020, January 16\u201320). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_47","unstructured":"He, P., Liu, X., Gao, J., and Chen, W. (2020). DeBERTa: Decoding-enhanced BERT with Disentangled Attention. arXiv."},{"key":"ref_48","unstructured":"Sun, C., Huang, L., and Qiu, X. (2019, January 2\u20137). Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. Proceedings of the NAACL HLT 2019-2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies-Proceedings of the Conference, Minneapolis, MN, USA."},{"key":"ref_49","first-page":"5999","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_50","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2019). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020, January 5\u201310). Unsupervised cross-lingual representation learning at scale. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1023\/A:1007614523901","article-title":"Improved boosting algorithms using confidence-rated predictions","volume":"37","author":"Schapire","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1023\/A:1010920819831","article-title":"A Simple Generalisation of the Area Under the ROC Curve for Multiple Class Classification Problems","volume":"45","author":"Hand","year":"2001","journal-title":"Mach. Learn."}],"container-title":["Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-8954\/12\/1\/1\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T21:41:37Z","timestamp":1760132497000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-8954\/12\/1\/1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,19]]},"references-count":53,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["systems12010001"],"URL":"https:\/\/doi.org\/10.3390\/systems12010001","relation":{},"ISSN":["2079-8954"],"issn-type":[{"value":"2079-8954","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,19]]}}}