{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,27]],"date-time":"2026-05-27T16:39:13Z","timestamp":1779899953382,"version":"3.53.1"},"reference-count":61,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,10,27]],"date-time":"2021-10-27T00:00:00Z","timestamp":1635292800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>The availability of different pre-trained semantic models has enabled the quick development of machine learning components for downstream applications. However, even if texts are abundant for low-resource languages, there are very few semantic models publicly available. Most of the publicly available pre-trained models are usually built as a multilingual version of semantic models that will not fit well with the need for low-resource languages. We introduce different semantic models for Amharic, a morphologically complex Ethio-Semitic language. After we investigate the publicly available pre-trained semantic models, we fine-tune two pre-trained models and train seven new different models. The models include Word2Vec embeddings, distributional thesaurus (DT), BERT-like contextual embeddings, and DT embeddings obtained via network embedding algorithms. Moreover, we employ these models for different NLP tasks and study their impact. We find that newly-trained models perform better than pre-trained multilingual models. Furthermore, models based on contextual embeddings from FLAIR and RoBERTa perform better than word2Vec models for the NER and POS tagging tasks. DT-based network embeddings are suitable for the sentiment classification task. We publicly release all the semantic models, machine learning components, and several benchmark datasets such as NER, POS tagging, sentiment classification, as well as Amharic versions of WordSim353 and SimLex999.<\/jats:p>","DOI":"10.3390\/fi13110275","type":"journal-article","created":{"date-parts":[[2021,10,27]],"date-time":"2021-10-27T22:00:23Z","timestamp":1635372023000},"page":"275","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8289-388X","authenticated-orcid":false,"given":"Seid Muhie","family":"Yimam","sequence":"first","affiliation":[{"name":"Language Technology Group, Universit\u00e4t Hamburg, Grindelallee 117, 20146 Hamburg, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Abinew Ali","family":"Ayele","sequence":"additional","affiliation":[{"name":"Language Technology Group, Universit\u00e4t Hamburg, Grindelallee 117, 20146 Hamburg, Germany"},{"name":"Faculty of Computing, Bahir Dar Institute of Technology, Bahir Dar University, Bahir Dar 6000, Ethiopia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Gopalakrishnan","family":"Venkatesh","sequence":"additional","affiliation":[{"name":"International Institute of Information Technology, Bangalore 560100, India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4734-9571","authenticated-orcid":false,"given":"Ibrahim","family":"Gashaw","sequence":"additional","affiliation":[{"name":"College of Informatics, University of Gondar, Gondar 6200, Ethiopia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chris","family":"Biemann","sequence":"additional","affiliation":[{"name":"Language Technology Group, Universit\u00e4t Hamburg, Grindelallee 117, 20146 Hamburg, Germany"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2021,10,27]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Camacho-Collados, J., and Pilehvar, M.T. (2020, January 12\u201313). Embeddings in Natural Language Processing. Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts, Online.","DOI":"10.18653\/v1\/2020.coling-tutorials.2"},{"key":"ref_2","unstructured":"Katharina Sien\u010dnik, S. (2015, January 11\u201313). Adapting word2vec to Named Entity Recognition. Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), Vilnius, Lithuania."},{"key":"ref_3","unstructured":"Joshi, M., Hart, E., Vogel, M., and Ruvini, J.D. (June, January 31). Distributed Word Representations Improve NER for e-Commerce. Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, Denver, CO, USA."},{"key":"ref_4","unstructured":"Hou, J., Koppatz, M., Quecedo, J.M.H., and Yangarber, R. (October, January 30). Projecting named entity recognizers without annotated or parallel corpora. Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Mbouopda, M.F., and Yonta, P.M. (2019, January 22\u201325). A Word Representation to Improve Named Entity Recognition in Low-resource Languages. Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain.","DOI":"10.1109\/SNAMS.2019.8931727"},{"key":"ref_6","unstructured":"Barhoumi, A., Camelin, N., Aloulou, C., Est\u00e8ve, Y., and Hadrich Belguith, L. (2020, January 13\u201315). Toward Qualitative Evaluation of Embeddings for Arabic Sentiment Analysis. Proceedings of the 12th Language Resources and Evaluation Conference, Marseille, France."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Al-Saqqa, S., and Awajan, A. (2019, January 14\u201316). The Use of Word2vec Model in Sentiment Analysis: A Survey. Proceedings of the 2019 International Conference on Artificial Intelligence, Robotics and Control, Cairo, Egypt.","DOI":"10.1145\/3388218.3388229"},{"key":"ref_8","unstructured":"Younes, A., and Weeds, J. (2020, January 12). Embed More Ignore Less (EMIL): Exploiting Enriched Representations for Arabic NLP. Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Thavareesan, S., and Mahesan, S. (2020, January 26\u201328). Word embedding-based Part of Speech tagging in Tamil texts. Proceedings of the 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), Rupnagar, India.","DOI":"10.1109\/ICIIS51140.2020.9342640"},{"key":"ref_10","unstructured":"Pickard, T. (2020, January 13). Comparing word2vec and GloVe for Automatic Measurement of MWE Compositionality. Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, Online."},{"key":"ref_11","unstructured":"Jadi, G., Claveau, V., Daille, B., and Monceaux, L. (2016, January 23\u201328). Evaluating Lexical Similarity to build Sentiment Similarity. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC\u201916), Portoro\u017e, Slovenia."},{"key":"ref_12","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2019, January 2\u20137). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA."},{"key":"ref_13","unstructured":"Tenney, I., Das, D., and Pavlick, E. (August, January 28). BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_14","unstructured":"Agerri, R., Vicente, I.S., Campos, J.A., Barrena, A., Saralegi, X., Soroa, A., and Agirre, E. (2020). Give your Text Representation Models some Love: The Case for Basque. arXiv."},{"key":"ref_15","unstructured":"Ul\u010dar, M., and Robnik-\u0160ikonja, M. (2019). High Quality ELMo Embeddings for Seven Less-Resourced Languages. arXiv."},{"key":"ref_16","unstructured":"Grave, E., Bojanowski, P., Gupta, P., Joulin, A., and Mikolov, T. (2018, January 7\u201312). Learning Word Vectors for 157 Languages. Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020, January 5\u201310). Unsupervised Cross-lingual Representation Learning at Scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_18","unstructured":"Schweter, S. (2021, October 24). Multilingual Flair Embeddings. Available online: https:\/\/github.com\/flairNLP\/flair-lms."},{"key":"ref_19","unstructured":"Gashaw, I., and Shashirekha, H.L. (2020). Machine Learning Approaches for Amharic Parts-of-speech Tagging. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yimam, S.M., Alemayehu, H.M., Ayele, A., and Biemann, C. (2020, January 8\u201313). Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models. Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain.","DOI":"10.18653\/v1\/2020.coling-main.91"},{"key":"ref_21","unstructured":"Gezmu, A.M., Seyoum, B.E., Gasser, M., and N\u00fcrnberger, A. (, January August). Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus. Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, Available online: https:\/\/aclanthology.org\/W18-3809.pdf."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1080\/02500167.2015.1018288","article-title":"Language policy, ideologies, power and the Ethiopian media","volume":"41","author":"Salawu","year":"2015","journal-title":"Communicatio"},{"key":"ref_23","unstructured":"Gasser, M. (2011, January 2\u20135). HornMorpho: A system for morphological processing of Amharic, Oromo, and Tigrinya. Proceedings of the Conference on Human Language Technology for Development, Alexandria, Egypt."},{"key":"ref_24","unstructured":"Suchomel, V., and Rychl\u00fd, P. (2016). Amharic Web Corpus, LINDAT\/CLARIN Digital Library at the Institute of Formal and Applied Linguistics (\u00daFAL), Faculty of Mathematics and Physics, Charles University."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1080\/00437956.1954.11659520","article-title":"Distributional Structure","volume":"10","author":"Harris","year":"1954","journal-title":"Word"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ruppert, E., Kaufmann, M., Riedl, M., and Biemann, C. (2015, January 26\u201331). JoBimViz: A Web-based Visualization for Graph-based Distributional Semantic Models. Proceedings of the ACL-IJCNLP 2015 System Demonstrations, Beijing, China.","DOI":"10.3115\/v1\/P15-4018"},{"key":"ref_27","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. arXiv."},{"key":"ref_28","unstructured":"\u0158eh\u00fa\u0159ek, R., and Sojka, P. Gensim\u2014Statistical Semantics in Python. Proceedings of the EuroScipy 2011, Available online: https:\/\/www.fi.muni.cz\/usr\/sojka\/posters\/rehurek-sojka-scipy2011.pdf."},{"key":"ref_29","unstructured":"Hamilton, W.L., Ying, R., and Leskovec, J. (2017). Representation Learning on Graphs: Methods and Applications. arXiv."},{"key":"ref_30","unstructured":"Sevgili, \u00d6., Panchenko, A., and Biemann, C. (August, January 29). Improving Neural Entity Disambiguation with Graph Embeddings. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, Florence, Italy."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1616","DOI":"10.1109\/TKDE.2018.2807452","article-title":"A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications","volume":"30","author":"Cai","year":"2018","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Perozzi, B., Al-Rfou, R., and Skiena, S. (2014, January 24\u201327). DeepWalk: Online Learning of Social Representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA.","DOI":"10.1145\/2623330.2623732"},{"key":"ref_33","unstructured":"Ahmed, N.K., Rossi, R.A., Lee, J.B., Willke, T.L., Zhou, R., Kong, X., and Eldardiry, H. (2021, October 24). role2vec: Role-Based Network Embeddings. Available online: http:\/\/ryanrossi.com\/pubs\/role2vec-DLG-KDD.pdf."},{"key":"ref_34","unstructured":"Rozemberczki, B., Kiss, O., and Sarkar, R. (2020). An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs. arXiv."},{"key":"ref_35","unstructured":"Akbik, A., Blythe, D., and Vollgraf, R. (2018, January 21\u201325). Contextual String Embeddings for Sequence Labeling. Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA."},{"key":"ref_36","unstructured":"Akbik, A., Bergmann, T., Blythe, D., Rasul, K., Schweter, S., and Vollgraf, R. (2019, January 2\u20137). FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, MN, USA."},{"key":"ref_37","unstructured":"Schweter, S., and Akbik, A. (2020). FLERT: Document-Level Features for Named Entity Recognition. arXiv."},{"key":"ref_38","unstructured":"Agi\u0107, \u017d., and Vuli\u0107, I. (August, January 28). JW300: A Wide-Coverage Parallel Corpus for Low-Resource Languages. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy."},{"key":"ref_39","unstructured":"Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (2017). Attention is All you Need. Advances in Neural Information Processing Systems, Curran Associates, Inc."},{"key":"ref_40","unstructured":"Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., and Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pa\u015fca, M., and Soroa, A. (June, January 31). A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Proceedings of the Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Boulder, CO, USA.","DOI":"10.3115\/1620754.1620758"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"142907","DOI":"10.1109\/ACCESS.2019.2944151","article-title":"Word similarity datasets for Thai: Construction and evaluation","volume":"7","author":"Netisopakul","year":"2019","journal-title":"IEEE Access"},{"key":"ref_43","unstructured":"Asr, F.T., Zinkov, R., and Jones, M. (2018, January 1\u20136). Querying word embeddings for similarity and relatedness. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1162\/COLI_a_00237","article-title":"Simlex-999: Evaluating semantic models with (genuine) similarity estimation","volume":"41","author":"Hill","year":"2015","journal-title":"Comput. Linguist."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Speer, R., Chin, J., and Havasi, C. (2017, January 4\u20139). ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA.","DOI":"10.1609\/aaai.v31i1.11164"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Recski, G., Ikl\u00f3di, E., Pajkossy, K., and Kornai, A. (2016, January 11). Measuring Semantic Similarity of Words Using Concept Networks. Proceedings of the 1st Workshop on Representation Learning for NLP, Berlin, Germany.","DOI":"10.18653\/v1\/W16-1622"},{"key":"ref_47","unstructured":"Getachew, M. (2001). Automatic Part-of-Speech Tagging for Amharic Language an Experiment Using Stochastic Hidden Markov Approach. [Master\u2019s Thesis, School of Graduate Studies, Addis Ababa University]."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Gamb\u00e4ck, B., Olsson, F., Alemu Argaw, A., and Asker, L. (2009, January 31). Methods for Amharic Part-of-Speech Tagging. Proceedings of the First Workshop on Language Technologies for African Languages, Athens, Greece.","DOI":"10.3115\/1564508.1564527"},{"key":"ref_49","unstructured":"Tachbelie, M.Y., and Menzel, W. (2009, January 14\u201316). Amharic Part-of-Speech Tagger for Factored Language Modeling. Proceedings of the International Conference RANLP-2009, Borovets, Bulgaria."},{"key":"ref_50","unstructured":"Tachbelie, M.Y., Abate, S.T., and Besacier, L. (2011). Part-of-speech tagging for underresourced and morphologically rich languages\u2014The case of Amharic. HLTD, 50\u201355. Available online: https:\/\/www.cle.org.pk\/hltd\/pdf\/HLTD201109.pdf."},{"key":"ref_51","unstructured":"Demeke, G., and Getachew, M. (2021, October 24). Manual Annotation of Amharic News Items with Part-of-Speech Tags and Its Challenges, Available online: https:\/\/www.bibsonomy.org\/bibtex\/d2fa6b0ccf8737fb4046c3d13f274894#export%7D%7B."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Zitouni, I. (2014). Natural Language Processing of Semitic Languages, Springer.","DOI":"10.1007\/978-3-642-45358-8"},{"key":"ref_53","unstructured":"Ahmed, M. (2010). Named Entity Recognition for Amharic Language. [Master\u2019s Thesis, Addis Ababa University]."},{"key":"ref_54","unstructured":"Alemu, B. (2013). A Named Entity Recognition for Amharic. [Master\u2019s Thesis, Addis Ababa University]."},{"key":"ref_55","unstructured":"Tadele, M. (2014). Amharic Named Entity Recognition Using a Hybrid Approach. [Master\u2019s Thesis, Addis Ababa University]."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Gamb\u00e4ck, B., and Sikdar, U.K. (June, January 31). Named entity recognition for Amharic using deep learning. Proceedings of the 2017 IST-Africa Week Conference (IST-Africa), Windhoek, Namibia.","DOI":"10.23919\/ISTAFRICA.2017.8102402"},{"key":"ref_57","doi-asserted-by":"crossref","unstructured":"Sikdar, U.K., and Gamb\u00e4ck, B. (2017, January 17\u201323). Named Entity Recognition for Amharic Using Stack-Based Deep Learning. Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing, Budapest, Hungary.","DOI":"10.1007\/978-3-319-77113-7_22"},{"key":"ref_58","unstructured":"Gangula, R.R.R., and Mamidi, R. (2018, January 7\u201312). Resource creation towards automated sentiment analysis in Telugu (a low resource language) and integrating multiple domain sources to enhance sentiment prediction. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Alemneh, G.N., Rauber, A., and Atnafu, S. (2019, January 22\u201324). Dictionary Based Amharic Sentiment Lexicon Generation. Proceedings of the International Conference on Information and Communication Technology for Development for Africa, Bahir Dar, Ethiopia.","DOI":"10.1007\/978-3-030-26630-1_27"},{"key":"ref_60","unstructured":"Gebremeskel, S. (2021, October 24). Sentiment Mining Model for Opinionated Amharic Texts. Available online: http:\/\/etd.aau.edu.et\/handle\/123456789\/3029."},{"key":"ref_61","doi-asserted-by":"crossref","unstructured":"Jana, A., and Goyal, P. (2018, January 1\u20136). Can Network Embedding of Distributional Thesaurus Be Combined with Word Vectors for Better Representation?. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1043"}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/13\/11\/275\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:21:37Z","timestamp":1760167297000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/13\/11\/275"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,10,27]]},"references-count":61,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["fi13110275"],"URL":"https:\/\/doi.org\/10.3390\/fi13110275","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,27]]}}}