{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T19:07:02Z","timestamp":1770836822285,"version":"3.50.1"},"reference-count":50,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T00:00:00Z","timestamp":1770768000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Informatics"],"abstract":"<jats:p>This paper presents LinguoNER, a practical and extensible framework for bootstrapping Named Entity Recognition (NER) in extremely low-resource languages, demonstrated on Yambeta, a Bantu language spoken by a minority community in Cameroon. Due to scarce digital resources and the absence of annotated corpora, Yambeta has remained largely underrepresented in Natural Language Processing (NLP). LinguoNER addresses this gap by providing a methodologically transparent end-to-end workflow that integrates corpus acquisition, gazetteer-driven automatic annotation, tokenizer training, transformer fine-tuning, and multi-level evaluation in settings where large-scale manual annotation is infeasible. Using a Bible-derived corpus as a linguistically stable starting point, we release the first publicly available Yambeta NER dataset (\u224825,000 tokens) annotated with the CoNLL BIO scheme and a restricted entity schema (PER\/LOC\/ORG). Because labels are generated via dictionary-based annotation, the corpus is best characterized as silver-standard; credibility is strengthened through recorded dictionaries, transparency logs, expert-in-the-loop validation on sampled subsets, and complementary qualitative error analysis. We additionally train a dedicated Yambeta WordPiece tokenizer that preserves tone markers and diacritics, and fine-tune a bert-base-cased transformer for token classification. On a held-out test split, LinguoNER achieves strong token-level performance (Precision = 0.989, Recall = 0.981, F1 = 0.985), substantially outperforming a dictionary-only gazetteer baseline (\u0394F1 \u2248 0.36). Per-entity-type evaluation further indicates improvements beyond surface-form matching, while remaining errors are linguistically motivated and primarily involve multi-word entity boundaries, agglutinative constructions, and tone-\/diacritic-sensitive tokenization. We emphasize that results are restricted to a Bible domain and a limited label space, and should be interpreted as proof-of-concept evidence rather than claims of broad out-of-domain generalization. Overall, LinguoNER provides a reproducible blueprint for bootstrapping NER resources in underrepresented languages and supports future work on broader corpora sources (e.g., news, OPUS, JW300), additional African languages (e.g., Yoruba, Igbo, Bassa), and the iterative creation of expert-refined datasets and gold-standard subsets.<\/jats:p>","DOI":"10.3390\/informatics13020031","type":"journal-article","created":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T17:45:36Z","timestamp":1770831936000},"page":"31","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["LinguoNER: A Language-Agnostic Framework for Named Entity Recognition in Low-Resource Languages with a Focus on Yambeta"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0786-4253","authenticated-orcid":false,"given":"Philippe","family":"Tamla","sequence":"first","affiliation":[{"name":"Faculty of Mathematics and Computer Science, University of Hagen, 58097 Hagen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-1808-4321","authenticated-orcid":false,"given":"Stephane","family":"Donna","sequence":"additional","affiliation":[{"name":"Faculty of Information and Communication Technology, ICT University USA, Yaounde P.O. Box 526, Cameroon"}]},{"given":"Tobias","family":"Bigala","sequence":"additional","affiliation":[{"name":"Faculty of Information and Communication Technology, ICT University USA, Yaounde P.O. Box 526, Cameroon"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9336-0707","authenticated-orcid":false,"given":"Dilan","family":"Nde","sequence":"additional","affiliation":[{"name":"Faculty of Information and Communication Technology, ICT University USA, Yaounde P.O. Box 526, Cameroon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1579-5232","authenticated-orcid":false,"given":"Maxime Yves Julien Manifi","family":"Abouh","sequence":"additional","affiliation":[{"name":"Higher Teacher Training College, University of Yaounde, Yaounde P.O. Box 47, Cameroon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7344-6869","authenticated-orcid":false,"given":"Florian","family":"Freund","sequence":"additional","affiliation":[{"name":"Faculty of Mathematics and Computer Science, University of Hagen, 58097 Hagen, Germany"}]}],"member":"1968","published-online":{"date-parts":[[2026,2,11]]},"reference":[{"key":"ref_1","unstructured":"Mous, M., and Breedveld, A. (1986). A dialectometrical study of some Bantu languages (A. 40\u2013A. 60) of Cameroon. La M\u00e9thode Dialectom\u00e9trique, Appliqu\u00e9e aux Langues Africaines, Dietrich Reimer Verlag."},{"key":"ref_2","unstructured":"Benbow, C. (2007). Endangered? Yambetta in its Speech Community, SIT Graduate Institute."},{"key":"ref_3","unstructured":"Shamsfard, M. (2019, January 5\u20136). Challenges and opportunities in processing low resource languages: A study on persian. Proceedings of the International Conference Language Technologies for All (LT4All), Paris, France."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Nekoto, W., Marivate, V., Matsila, T., Fasubaa, T., Kolawole, T., Fagbohungbe, T., Akinola, S.O., Muhammad, S.H., Kabongo, S., and Osei, S. (2020). Participatory research for low-resourced machine translation: A case study in African languages. arXiv.","DOI":"10.18653\/v1\/2020.findings-emnlp.195"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1162\/tacl_a_00416","article-title":"MasakhaNER: Named entity recognition for African languages","volume":"9","author":"Adelani","year":"2021","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_6","first-page":"20","article-title":"Named entity recognition and relation extraction: State-of-the-art","volume":"54","author":"Nasar","year":"2021","journal-title":"ACM Comput. Surv. (CSUR)"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"100017","DOI":"10.1016\/j.nlp.2023.100017","article-title":"A survey on Named Entity Recognition\u2014Datasets, tools, and methodologies","volume":"3","author":"Jehangir","year":"2023","journal-title":"Nat. Lang. Process. J."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Tang, X., Xia, D., Li, Y., Xu, T., and Xiong, N.N. (2023). A Survey of Low-Resource Named Entity Recognition. Proceedings of the International Conference on Information Science, Communication and Computing, Springer.","DOI":"10.1007\/978-981-99-7161-9_19"},{"key":"ref_9","unstructured":"Dube, M.W., and Wafula, R.S. (2017). Postcoloniality, Translation, and the Bible in Africa, Wipf and Stock Publishers."},{"key":"ref_10","unstructured":"Eberhard, D.M., Simons, G.F., and Fennig, C.D. (2020). Ethnologue: Languages of the World, SIL International. Available online: http:\/\/www.ethnologue.com."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Mamyrbayev, O., Alimhan, K., Zhumazhanov, B., Turdalykyzy, T., and Gusmanova, F. (2020). End-to-end speech recognition in agglutinative languages. Proceedings of the Asian Conference on Intelligent Information and Database Systems, Springer.","DOI":"10.1007\/978-3-030-42058-1_33"},{"key":"ref_12","unstructured":"Abouh, M.Y.J.M., and Sadembouo, E. (2014). Designing technical concepts in the elaboration of a thematic bilingual French\/Yambeta agricultural lexicon (De la d\u00e9nomination des concepts techniques dans l\u2019\u00e9laboration d\u2019un lexique th\u00e9matique agricole bilingue fran\u00e7ais yambetta). Proceedings of the TALN-RECITAL 2014 Workshop TALAf 2014: Traitement Automatique des Langues Africaines (TALAf 2014: African Language Processing), Association pour le Traitement Automatique des Langues. (In French)."},{"key":"ref_13","first-page":"6","article-title":"A computational look at oral history archives","volume":"15","author":"Pessanha","year":"2021","journal-title":"ACM J. Comput. Cult. Herit. (JOCCH)"},{"key":"ref_14","unstructured":"Wolf, T. (2019). Huggingface\u2019s transformers: State-of-the-art natural language processing. arXiv."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1080\/07421222.1990.11517898","article-title":"Systems Development in Information Systems Research","volume":"7","author":"Nunamaker","year":"1990","journal-title":"J. Manag. Inf. Syst."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"544","DOI":"10.1136\/amiajnl-2011-000464","article-title":"Natural language processing: An introduction","volume":"18","author":"Nadkarni","year":"2011","journal-title":"J. Am. Med. Inform. Assoc."},{"key":"ref_17","first-page":"1","article-title":"Named Entity Recognition in Low-resource Languages using Cross-lingual distributional word representation","volume":"33","author":"Mbouopda","year":"2020","journal-title":"Rev. Afr. Rech. Inform. Math. Appl."},{"key":"ref_18","unstructured":"Tamla, P. (2022). Supporting Access to Textual Resources Using Named Entity Recognition and Document Classification. [Ph.D. Thesis, University of Hagen]."},{"key":"ref_19","unstructured":"Eljasik-Swoboda, T. (2021). Bootstrapping Explainable Text Categorization in Emergent Knowledge-Domains. [Ph.D. Thesis, University of Hagen]."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yohannes, H.M., and Amagasa, T. (2022). Named-entity recognition for a low-resource language using pre-trained language model. Proceedings of the 37th ACM\/SIGAPP Symposium on Applied Computing, Association for Computing Machinery. SAC \u201922.","DOI":"10.1145\/3477314.3507066"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Chen, S., Pei, Y., Ke, Z., and Silamu, W. (2021). Low-resource named entity recognition via the pre-training model. Symmetry, 13.","DOI":"10.3390\/sym13050786"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"137","DOI":"10.3406\/aflin.2013.1020","article-title":"Verb tone in Bantu languages: Micro-typological patterns and research methods","volume":"19","author":"Marlo","year":"2013","journal-title":"Afr. Linguist."},{"key":"ref_23","unstructured":"Martinus, L., and Abbott, J.Z. (2019). A focus on neural machine translation for african languages. arXiv."},{"key":"ref_24","unstructured":"Caines, A. (2026, February 01). The Geographic Diversity of NLP Conferences. Available online: https:\/\/www.marekrei.com\/blog\/geographic-diversity-of-nlp-conferences\/."},{"key":"ref_25","unstructured":"(2026, February 01). Pr\u00e9cis d\u2019Orthographe pour la Langue Yambe\u2019ta. Available online: https:\/\/www.sil.org\/system\/files\/reapdata\/14\/86\/15\/148615017254278893052551067942322276399\/Yambetta_Orthography_Guide.pdf."},{"key":"ref_26","first-page":"19","article-title":"Preserving Linguistic Diversity in the Digital Age: The Role of Technology in Endangered Language Documentation","volume":"6","author":"Eswaran","year":"2023","journal-title":"Acta Sci. Comput. Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"356","DOI":"10.7202\/003694ar","article-title":"A history of translation and interpretation in Cameroon from precolonial times to present","volume":"35","author":"Nama","year":"1990","journal-title":"Meta"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1111\/j.1758-6631.1950.tb01687.x","article-title":"A Glance at Missions in Cameroon","volume":"39","author":"Brutsch","year":"1950","journal-title":"Int. Rev. Mission"},{"key":"ref_29","unstructured":"Aldridge, B. (2018). For the Gospel\u2019s Sake: The Rise of the Wycliffe Bible Translators and the Summer Institute of Linguistics, Wm. B. Eerdmans Publishing."},{"key":"ref_30","unstructured":"King, B.P. (2015). Practical Natural Language Processing for Low-Resource Languages. [Ph.D. Thesis, University of Michigan]."},{"key":"ref_31","unstructured":"Tadadjeu, M., and Sadembouo, E. (1984). Alphabet G\u00e9n\u00e9ral des Langues Camerounaises, University of Yaound\u00e9, Faculty of Letters and Social Sciences, Department of African Languages and Linguistics. Collection PROPELCA."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Adelani, D.I., Neubig, G., Ruder, S., Rijhwani, S., Beukman, M., Palen-Michel, C., Lignos, C., Alabi, J., Muhammad, S.H., and Nabende, P. (2022). Masakhaner 2.0: Africa-centric transfer learning for named entity recognition. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2022.emnlp-main.298"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Louis, A., De Waal, A., and Venter, C. (2006). Named entity recognition in a South African context. Proceedings of the 2006 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, South African Institute for Computer Scientists and Information Technologists.","DOI":"10.1145\/1216262.1216281"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Shaalan, K., and Raza, H. (2008). Arabic named entity recognition from diverse text types. Proceedings of the International Conference on Natural Language Processing, Springer.","DOI":"10.1007\/978-3-540-85287-2_42"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Agi\u0107, \u017d., and Vuli\u0107, I. (2019). JW300: A wide-coverage parallel corpus for low-resource languages. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics.","DOI":"10.18653\/v1\/P19-1310"},{"key":"ref_36","first-page":"2214","article-title":"Parallel data, tools and interfaces in OPUS","volume":"Volume 2012","author":"Tiedemann","year":"2012","journal-title":"Proceedings of the Eighth International Conference on Language Resources and Evaluation"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Conneau, A., Khandelwal, K., Goyal, N., Chaudhary, V., Wenzek, G., Guzm\u00e1n, F., Grave, E., Ott, M., Zettlemoyer, L., and Stoyanov, V. (2020). Unsupervised cross-lingual representation learning at scale. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics.","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"ref_38","unstructured":"Adelani, D.I., Masiak, M., Azime, I.A., Alabi, J., Tonja, A.L., Mwase, C., Ogundepo, O., Dossou, B.F., Oladipo, A., and Nixdorf, D. (2023). Masakhanews: News topic classification for african languages. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics."},{"key":"ref_39","first-page":"121","article-title":"Projecting named entity tags from a resource rich language to a resource poor language","volume":"12","author":"Zamin","year":"2013","journal-title":"J. Inf. Commun. Technol."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1023\/A:1007558221122","article-title":"An algorithm that learns what\u2019s in a name","volume":"34","author":"Bikel","year":"1999","journal-title":"Mach. Learn."},{"key":"ref_41","unstructured":"Ehrmann, M., Turchi, M., and Steinberger, R. (2011). Building a multilingual named entity-annotated corpus using annotation projection. Proceedings of the International Conference Recent Advances in Natural Language Processing 2011, Association for Computational Linguistics."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Lothritz, C., Allix, K., Veiber, L., Klein, J., and Bissyande, T.F.D.A. (2020, January 8\u201313). Evaluating pretrained transformer-based models on the task of fine-grained named entity recognition. Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain.","DOI":"10.18653\/v1\/2020.coling-main.334"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1109\/TCBB.2020.3035021","article-title":"Novel transformer networks for improved sequence labeling in genomics","volume":"19","author":"Clauwaert","year":"2020","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_44","unstructured":"Bouamor, H., Pino, J., and Bali, K. (2023). Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Association for Computational Linguistics."},{"key":"ref_45","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"655","DOI":"10.21105\/joss.00655","article-title":"Fast, Consistent Tokenization of Natural Language Text","volume":"3","author":"Mullen","year":"2018","journal-title":"J. Open Source Softw."},{"key":"ref_47","first-page":"569","article-title":"Critical Tokenization and its Properties","volume":"23","author":"Guo","year":"1997","journal-title":"Comput. Linguist."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Norman, D.A., and Draper, S.W. (1986). User Centered System Design: New Perspectives on Human-computer Interaction, CRC Press. [1st ed.].","DOI":"10.1201\/b15703"},{"key":"ref_49","unstructured":"Jacobson, L., and Booch, J.R.G. (2021). The Unified Modeling Language Reference Manual, Pearson Education."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"52","DOI":"10.37358\/RC.21.4.8456","article-title":"Biomedical Named Entity Recognition Using the SVM Methodologies and bio Tagging Schemes","volume":"72","author":"Meenachisundaram","year":"2021","journal-title":"Rev. Chim."}],"container-title":["Informatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/2\/31\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,11]],"date-time":"2026-02-11T18:06:49Z","timestamp":1770833209000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2227-9709\/13\/2\/31"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,11]]},"references-count":50,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2]]}},"alternative-id":["informatics13020031"],"URL":"https:\/\/doi.org\/10.3390\/informatics13020031","relation":{},"ISSN":["2227-9709"],"issn-type":[{"value":"2227-9709","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,11]]}}}