{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T20:44:09Z","timestamp":1769546649942,"version":"3.49.0"},"reference-count":28,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T00:00:00Z","timestamp":1769472000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>\n                    This article presents a hybrid named-entity recognition (NER) algorithm for Uzbek that combines rule-based modules (transliteration, dialect normalization, morphological analysis) with modern neural network models. The study is motivated by Uzbek\u2019s agglutinative morphology, its dialect diversity and the lack of specialized resources, which together hinder the direct application of NER methods developed for English or other high-resource languages. As part of the work, an annotated\n                    <jats:italic>corpus<\/jats:italic>\n                    of more than three thousand Uzbek sentences was compiled, covering legal documents, scientific articles, news materials and informal texts from social networks. The\n                    <jats:italic>corpus<\/jats:italic>\n                    is annotated with the BIOES scheme, taking into account the specific morphological and lexical features of Uzbek. The developed rule-based algorithms (transliteration, dialect standardization, morphological analysis) are integrated into a single post-processing system that complements the neural network models. 
Experiments assessing the proposed approach show that the hybrid system significantly improves NER precision and recall across different thematic domains. In practice, the system can serve as a basis for automatic processing of Uzbek texts in information retrieval and extraction, dialect normalization, annotation of large text collections and digitalization of document workflows. Theoretically, the work extends named entity recognition approaches for low-resource languages with methods that account for morphosyntactic and dialectal features.\n                  <\/jats:p>","DOI":"10.7717\/peerj-cs.3489","type":"journal-article","created":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T08:21:26Z","timestamp":1769502086000},"page":"e3489","source":"Crossref","is-referenced-by-count":0,"title":["Development of hybrid approach for named entity recognition in Uzbek language text"],"prefix":"10.7717","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3969-1710","authenticated-orcid":true,"given":"Davlatyor","family":"Mengliev","sequence":"first","affiliation":[{"name":"Scientific Department, Cyber University, Nurafshon, Tashkent, Uzbekistan"}]},{"given":"Vladimir","family":"Barakhnin","sequence":"additional","affiliation":[{"name":"Federal Research Center for Information and Computational Technologies, Novosibirsk, Novosibirsk, Russia"}]},{"given":"Bahodir","family":"Ibragimov","sequence":"additional","affiliation":[{"name":"Information Technologies, Urgench State University, Urgench, Khorezm, Uzbekistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6315-6561","authenticated-orcid":true,"given":"Mukhriddin","family":"Eshkulov","sequence":"additional","affiliation":[{"name":"Physics, Jizzakh Polytechnic Institute, Jizzakh, 
Uzbekistan"}]},{"given":"Oybek","family":"Allamov","sequence":"additional","affiliation":[{"name":"Information Technologies, Urgench State University, Urgench, Khorezm, Uzbekistan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-2128-8769","authenticated-orcid":true,"given":"Madirimov","family":"Shohrux","sequence":"additional","affiliation":[{"name":"Information Technologies, Tashkent Institute of Textile and Light Industry, Tashkent, Tashkent, Uzbekistan"}]},{"given":"Otabek","family":"Khujaev","sequence":"additional","affiliation":[{"name":"Information Technologies, Urgench State University, Urgench, Khorezm, Uzbekistan"}]},{"given":"Bakhtiyar","family":"Rakhimov","sequence":"additional","affiliation":[{"name":"Cyberphysics Department, Urgench Branch of Tashkent Medical Academy, Urgench, Uzbekistan"}]}],"member":"4443","published-online":{"date-parts":[[2026,1,27]]},"reference":[{"issue":"4","key":"10.7717\/peerj-cs.3489\/ref-1","doi-asserted-by":"publisher","first-page":"44","DOI":"10.5281\/zenodo.7834009","article-title":"Identifying ner (named entity recognition) objects in Uzbek language texts","volume":"2","author":"Elov","year":"2023","journal-title":"Science and Innovation International Scientific Journal"},{"issue":"3","key":"10.7717\/peerj-cs.3489\/ref-2","doi-asserted-by":"publisher","first-page":"1840","DOI":"10.3390\/make6030090","article-title":"A parallel approach to enhance the performance of supervised machine learning realized in a multicore environment","volume":"6","author":"Ghimire","year":"2024","journal-title":"Machine Learning and Knowledge Extraction"},{"key":"10.7717\/peerj-cs.3489\/ref-3","first-page":"39","article-title":"Structure of a pragmatically-oriented model of an agglutinative natural language exemplified with Tatar","author":"Gilmullin","year":"2024"},{"issue":"Suppl 2","key":"10.7717\/peerj-cs.3489\/ref-4","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1186\/s12911-019-0767-2","article-title":"A hybrid approach for 
named entity recognition in Chinese electronic medical record","volume":"19","author":"Ji","year":"2019","journal-title":"BMC Medical Informatics and Decision Making"},{"key":"10.7717\/peerj-cs.3489\/ref-5","article-title":"Analytical report, \u201cE-commerce in Uzbekistan\u201d","author":"KPMG","year":"2023"},{"key":"10.7717\/peerj-cs.3489\/ref-6","article-title":"Construction and evaluation of sentiment datasets for low-resource languages: the case of Uzbek","volume":"13212","author":"Kuriyozov","year":"2019"},{"key":"10.7717\/peerj-cs.3489\/ref-7","article-title":"Resolution President of the Republic of Uzbekistan On additional measures to accelerate the digitalization of the healthcare system and the implementation of advanced digital technologies. (in Russian)","author":"LexUz Online"},{"issue":"6","key":"10.7717\/peerj-cs.3489\/ref-8","doi-asserted-by":"publisher","first-page":"1953","DOI":"10.3390\/make6030096","article-title":"Assessing fine-tuned NER models with limited data in French: automating detection of new technologies, technological domains, and startup names in renewable energy","volume":"2024","author":"MacLean","year":"2024","journal-title":"Machine Learning and Knowledge Extraction"},{"key":"10.7717\/peerj-cs.3489\/ref-9","doi-asserted-by":"publisher","first-page":"108351","DOI":"10.1016\/j.dib.2022.108351","article-title":"Dataset of stopwords extracted from Uzbek texts","volume":"43","author":"Madatov","year":"2023","journal-title":"Data in Brief"},{"key":"10.7717\/peerj-cs.3489\/ref-10","doi-asserted-by":"publisher","first-page":"58","DOI":"10.1007\/978-3-030-28374-2_6","article-title":"Named entity extraction from semi-structured data using machine learning algorithms","volume-title":"Computational Collective Intelligence. ICCCI 2019. 
LNCS 11684","author":"Mansurova","year":"2019"},{"key":"10.7717\/peerj-cs.3489\/ref-11","article-title":"UzABSA: aspect-based sentiment analysis for the Uzbek language","author":"Matlatipov","year":"2024"},{"key":"10.7717\/peerj-cs.3489\/ref-12","doi-asserted-by":"publisher","DOI":"10.17632\/txrk7jm6x3.1","article-title":"Dataset of Khorezm dialect words of Uzbek language","volume":"V1","author":"Mengliev","year":"2025a","journal-title":"Mendeley Data"},{"key":"10.7717\/peerj-cs.3489\/ref-13","doi-asserted-by":"publisher","DOI":"10.17632\/7d59mk8xp5.1","article-title":"Dataset of Uzbek language NER (3000+)","volume":"V1","author":"Mengliev","year":"2025b","journal-title":"Mendeley Data"},{"key":"10.7717\/peerj-cs.3489\/ref-14","first-page":"2440","article-title":"Building a comprehensive Uzbek lexicon: bridging dialects for text standardization","author":"Mengliev","year":"2024a"},{"key":"10.7717\/peerj-cs.3489\/ref-15","first-page":"1440","article-title":"Automating the transition from dialectal to literary forms in Uzbek language texts: an algorithmic perspective","author":"Mengliev","year":"2023a"},{"issue":"19","key":"10.7717\/peerj-cs.3489\/ref-16","doi-asserted-by":"publisher","first-page":"9117","DOI":"10.3390\/app11199117","article-title":"Development of intellectual web system for morph analyzing of Uzbek words","volume":"11","author":"Mengliev","year":"2021","journal-title":"Applied Sciences"},{"issue":"109675","key":"10.7717\/peerj-cs.3489\/ref-17","doi-asserted-by":"publisher","first-page":"110413","DOI":"10.1016\/j.dib.2024.110413","article-title":"Developing named entity recognition algorithms for Uzbek: dataset insights and implementation","volume":"51","author":"Mengliev","year":"2024b","journal-title":"Data in Brief"},{"key":"10.7717\/peerj-cs.3489\/ref-18","first-page":"1500","article-title":"Developing rule-based and gazetteer lists for named entity recognition in Uzbek language: geographical 
names","author":"Mengliev","year":"2023b"},{"issue":"2","key":"10.7717\/peerj-cs.3489\/ref-19","doi-asserted-by":"publisher","first-page":"111249","DOI":"10.1016\/j.dib.2024.111249","article-title":"A comprehensive dataset and neural network approach for named entity recognition in the Uzbek language","volume":"58","author":"Mengliev","year":"2025","journal-title":"Data in Brief"},{"key":"10.7717\/peerj-cs.3489\/ref-20","first-page":"319","article-title":"A computational approach to recognizing poetry genres in Uzbek texts","author":"Mengliev","year":"2024c"},{"key":"10.7717\/peerj-cs.3489\/ref-21","first-page":"294","article-title":"Towards effective named entity recognition in Uzbek medical contexts","author":"Mengliev","year":"2024d"},{"issue":"1","key":"10.7717\/peerj-cs.3489\/ref-22","doi-asserted-by":"publisher","first-page":"658","DOI":"10.3390\/make6010031","article-title":"Why do tree ensemble approximators not outperform the recursive-rule eXtraction algorithm?","volume":"6","author":"Onishi","year":"2024","journal-title":"Machine Learning and Knowledge Extraction"},{"issue":"88","key":"10.7717\/peerj-cs.3489\/ref-23","first-page":"673","article-title":"Generality and specificity of dialectics and its reflection in the morphology of the Uzbek language","volume":"9","author":"Raxmatova","year":"2021","journal-title":"Economy and Society"},{"key":"10.7717\/peerj-cs.3489\/ref-24","first-page":"173","article-title":"Representing text chunks","author":"Sang","year":"1999"},{"key":"10.7717\/peerj-cs.3489\/ref-25","article-title":"UzbekTagger: the rule-based POS tagger for Uzbek language","author":"Sharipov","year":"2023"},{"key":"10.7717\/peerj-cs.3489\/ref-26","article-title":"Creating a morphological and syntactic tagged corpus for the Uzbek language","author":"Sharipov","year":"2022"},{"key":"10.7717\/peerj-cs.3489\/ref-27","doi-asserted-by":"publisher","first-page":"463","DOI":"10.5771\/0257-9774-2015-2-463","article-title":"Linguistic ambiguities of Uzbek 
and classification of Uzbek dialects","volume":"110","author":"Turaeva","year":"2015","journal-title":"Anthropos International Review of Anthropology and Linguistics"},{"key":"10.7717\/peerj-cs.3489\/ref-28","article-title":"UZNER: a benchmark for named entity recognition in Uzbek","volume":"14302","author":"Yusufu","year":"2023"}],"container-title":["PeerJ Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-3489.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3489.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3489.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3489.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,27]],"date-time":"2026-01-27T08:21:31Z","timestamp":1769502091000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-3489"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,27]]},"references-count":28,"alternative-id":["10.7717\/peerj-cs.3489"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.3489","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{},"ISSN":["2376-5992"],"issn-type":[{"value":"2376-5992","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,27]]},"article-number":"e3489"}}