{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T19:37:33Z","timestamp":1773776253387,"version":"3.50.1"},"reference-count":53,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T00:00:00Z","timestamp":1773705600000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP), focusing on identifying and extracting entities such as names, locations, organizations, and other specific labels from unstructured text data. It plays a crucial role in various NLP applications, including information retrieval, question answering, and sentiment analysis. However, while NER systems have been extensively developed for English, adapting them to languages like Urdu poses unique challenges due to linguistic differences and the scarcity of annotated data. In this research, we enhance data diversity and accessibility for Urdu NER by introducing the ZUNERA <jats:italic>corpus<\/jats:italic>, the most extensive Urdu NER dataset to date, comprising 1,189,614 tokens and 89,804 named entities. Additionally, we classify the entities into twenty-three different named entity types. We meticulously annotate the <jats:italic>corpus<\/jats:italic>, providing clear guidelines and employing the Kappa coefficient to ensure high-quality annotations. Furthermore, we propose the Urdu-Named Entity Recognition with BiGRU-based Deep Learning Architecture (NERD) framework, which facilitates efficient entity recognition in Urdu text. The proposed framework achieves an impressive F1-score of 94.6%. 
Comparing ZUNERA with the MK-PUCIT dataset underscores its robustness in accurately recognizing entities. Although this study centers on Urdu, the proposed NER framework and annotation pipeline are designed to be language-agnostic. They can be extended to other morphologically rich or low-resource languages, providing a replicable foundation for future cross-lingual research. Overall, our contributions significantly advance Urdu NER research by providing a comprehensive dataset, evaluating state-of-the-art techniques, and introducing a novel framework for efficient Urdu entity recognition.\n                  <\/jats:p>","DOI":"10.7717\/peerj-cs.3678","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T16:42:40Z","timestamp":1773765760000},"page":"e3678","source":"Crossref","is-referenced-by-count":0,"title":["Urdu-NERD: Urdu named entity recognition with BiGRU-based deep learning architecture"],"prefix":"10.7717","volume":"12","author":[{"given":"Zainab","family":"Rafiq","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of Management & Technology, Lahore (Sialkot Campus), Punjab, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Wasim","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of Management & Technology, Lahore (Sialkot Campus), Punjab, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4323-4627","authenticated-orcid":true,"given":"Fatema","family":"Sabeen Shaikh","sequence":"additional","affiliation":[{"name":"Computer Information Systems Department, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi 
Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0330-8436","authenticated-orcid":true,"given":"Nahier","family":"Aldhafferi","sequence":"additional","affiliation":[{"name":"Computer Information Systems Department, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Abdullah","family":"Alqahtani","sequence":"additional","affiliation":[{"name":"Computer Information Systems Department, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"4443","published-online":{"date-parts":[[2026,3,17]]},"reference":[{"issue":"4","key":"10.7717\/peerj-cs.3678\/ref-1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3383306","article-title":"Named entity recognition and classification for Punjabi Shahmukhi","volume":"19","author":"Ahmad","year":"2020","journal-title":"ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)"},{"key":"10.7717\/peerj-cs.3678\/ref-2","doi-asserted-by":"crossref","DOI":"10.1109\/ACCESS.2025.3576784","article-title":"A benchmark dataset and a framework for Urdu multimodal named entity recognition","author":"Ahmad","year":"2025"},{"key":"10.7717\/peerj-cs.3678\/ref-3","first-page":"1","article-title":"A proposed model for Bengali named entity recognition using maximum entropy Markov model incorporated with rich linguistic feature set","author":"Alam","year":"2020"},{"key":"10.7717\/peerj-cs.3678\/ref-4","first-page":"2953","article-title":"SiNER: a large dataset for Sindhi named entity recognition","author":"Ali","year":"2020"},{"issue":"1","key":"10.7717\/peerj-cs.3678\/ref-5","doi-asserted-by":"publisher","first-page":"471","DOI":"10.32604\/cmc.2021.016054","article-title":"Arabic named entity 
recognition: a BERT-BGRU approach","volume":"68","author":"Alsaaran","year":"2021","journal-title":"Computers, Materials & Continua"},{"key":"10.7717\/peerj-cs.3678\/ref-6","article-title":"Constructing corpora of South Asian languages","author":"Baker","year":"2003"},{"key":"10.7717\/peerj-cs.3678\/ref-7","doi-asserted-by":"crossref","DOI":"10.3115\/1118759.1118760","article-title":"A study in Urdu corpus construction","author":"Becker","year":"2002"},{"key":"10.7717\/peerj-cs.3678\/ref-8","first-page":"2524","article-title":"NoSta-D named entity annotation for German: guidelines and dataset","author":"Benikova","year":"2014"},{"key":"10.7717\/peerj-cs.3678\/ref-9","first-page":"135","volume-title":"Enriching word vectors with subword information","volume":"5","author":"Bojanowski","year":"2017"},{"key":"10.7717\/peerj-cs.3678\/ref-10","first-page":"279","volume-title":"Urdu language processing: a survey","volume":"47","author":"Daud","year":"2017"},{"key":"10.7717\/peerj-cs.3678\/ref-11","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1810.04805","article-title":"BERT: pre-training of deep bidirectional transformers for language understanding","author":"Devlin","year":"2018"},{"key":"10.7717\/peerj-cs.3678\/ref-12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2504.08792","article-title":"Enhancing NER performance in low-resource Pakistani languages using cross-lingual data augmentation","author":"Ehsan","year":"2025"},{"issue":"3","key":"10.7717\/peerj-cs.3678\/ref-13","doi-asserted-by":"publisher","first-page":"3","DOI":"10.11591\/ijece.v9i3.pp2025-2032","article-title":"Arabic named entity recognition using deep learning approach","volume":"9","author":"El Bazi","year":"2019","journal-title":"International Journal of Electrical and Computer Engineering (IJECE)"},{"key":"10.7717\/peerj-cs.3678\/ref-14","first-page":"179","article-title":"Overview of the transformer-based models for NLP 
tasks","author":"Gillioz","year":"2020"},{"key":"10.7717\/peerj-cs.3678\/ref-15","first-page":"1","volume-title":"Deep learning-based named entity recognition system using hybrid embedding","author":"Goyal","year":"2022"},{"issue":"10","key":"10.7717\/peerj-cs.3678\/ref-16","doi-asserted-by":"publisher","first-page":"6970","DOI":"10.1080\/03772063.2021.2006805","article-title":"Recurrent neural network-based model for named entity recognition with improved word embeddings","volume":"69","author":"Goyal","year":"2023","journal-title":"IETE Journal of Research"},{"key":"10.7717\/peerj-cs.3678\/ref-17","first-page":"56","article-title":"Biomedical named entity recognition with multilingual BERT","author":"Hakala","year":"2019"},{"key":"10.7717\/peerj-cs.3678\/ref-18","doi-asserted-by":"publisher","first-page":"45194","DOI":"10.1109\/ACCESS.2023.3267746","article-title":"B-NER: a novel Bangla named entity recognition dataset with largest entities and its baseline evaluation","volume":"11","author":"Haque","year":"2023","journal-title":"IEEE Access"},{"key":"10.7717\/peerj-cs.3678\/ref-19","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/s10462-019-09688-6","article-title":"Arabic named entity recognition via deep co-learning","volume":"52","author":"Helwe","year":"2019","journal-title":"Artificial Intelligence Review"},{"key":"10.7717\/peerj-cs.3678\/ref-20","first-page":"57","article-title":"OntoNotes: the 90% solution","volume-title":"Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers","author":"Hovy","year":"2006"},{"key":"10.7717\/peerj-cs.3678\/ref-21","first-page":"95","article-title":"N-gram and gazetteer list based named entity recognition for Urdu: a scarce resourced language","author":"Jahangir","year":"2012"},{"key":"10.7717\/peerj-cs.3678\/ref-22","first-page":"3585","article-title":"Improved differentiable architecture search for language modeling and named entity 
recognition","author":"Jiang","year":"2019"},{"issue":"2","key":"10.7717\/peerj-cs.3678\/ref-23","doi-asserted-by":"publisher","first-page":"1715","DOI":"10.1007\/s13369-022-06933-z","article-title":"BiLSTM-CRF Manipuri NER with character-level word representation","volume":"48","author":"Jimmy","year":"2023","journal-title":"Arabian Journal for Science and Engineering"},{"key":"10.7717\/peerj-cs.3678\/ref-24","first-page":"1","volume-title":"Urdu named entity recognition: corpus generation and deep learning applications","volume":"19","author":"Kanwal","year":"2019"},{"key":"10.7717\/peerj-cs.3678\/ref-25","first-page":"1","article-title":"A deep learning approach to building a framework for Urdu POS and NER","author":"Kazi","year":"2023"},{"key":"10.7717\/peerj-cs.3678\/ref-26","doi-asserted-by":"crossref","DOI":"10.1145\/3595861","volume-title":"Using data augmentation and bidirectional encoder representations from transformers for improving Punjabi named entity recognition","author":"Khalid","year":"2023"},{"issue":"1","key":"10.7717\/peerj-cs.3678\/ref-27","doi-asserted-by":"publisher","first-page":"90","DOI":"10.4218\/etrij.2018-0553","article-title":"Deep recurrent neural networks with word embeddings for Urdu named entity recognition","volume":"42","author":"Khan","year":"2020","journal-title":"ETRI Journal"},{"issue":"13","key":"10.7717\/peerj-cs.3678\/ref-28","doi-asserted-by":"publisher","first-page":"6391","DOI":"10.3390\/app12136391","article-title":"Named entity recognition using conditional random fields","volume":"12","author":"Khan","year":"2022","journal-title":"Applied Sciences"},{"key":"10.7717\/peerj-cs.3678\/ref-29","first-page":"1","volume-title":"Urdu named entity recognition and classification system using artificial neural network","volume":"17","author":"Malik","year":"2017"},{"key":"10.7717\/peerj-cs.3678\/ref-30","first-page":"339","volume-title":"Named entity recognition 
approaches","volume":"2","author":"Mansouri","year":"2008"},{"key":"10.7717\/peerj-cs.3678\/ref-31","first-page":"2","volume-title":"Improving NER tagging performance in low-resource languages via multilingual learning","volume":"18","author":"Murthy","year":"2018"},{"key":"10.7717\/peerj-cs.3678\/ref-32","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1109\/FIT57066.2022.00040","article-title":"Sequence-driven neural network models for NER tagging in Roman Urdu","volume-title":"2022 International Conference on Frontiers of Information Technology (FIT)","author":"Nadeem","year":"2022"},{"key":"10.7717\/peerj-cs.3678\/ref-33","first-page":"4348","article-title":"An open corpus for named entity recognition in historic newspapers","author":"Neudecker","year":"2016"},{"key":"10.7717\/peerj-cs.3678\/ref-34","first-page":"1946","article-title":"Cross-lingual name tagging and linking for 282 languages","author":"Pan","year":"2017"},{"key":"10.7717\/peerj-cs.3678\/ref-35","first-page":"1532","article-title":"Glove: global vectors for word representation","author":"Pennington","year":"2014"},{"key":"10.7717\/peerj-cs.3678\/ref-36","first-page":"101","article-title":"Support vector machine","author":"Pisner","year":"2020"},{"key":"10.7717\/peerj-cs.3678\/ref-37","first-page":"126","article-title":"Rulebased named entity recognition in Urdu","author":"Riaz","year":"2010"},{"key":"10.7717\/peerj-cs.3678\/ref-38","first-page":"1","article-title":"Maximum entropy based Urdu named entity recognition","author":"Riaz","year":"2020"},{"issue":"21","key":"10.7717\/peerj-cs.3678\/ref-39","doi-asserted-by":"publisher","first-page":"7557","DOI":"10.3390\/app10217557","article-title":"Delayed combination of feature embedding in bidirectional LSTM CRF for NER","volume":"10","author":"Ronran","year":"2020","journal-title":"Applied 
Sciences"},{"key":"10.7717\/peerj-cs.3678\/ref-40","doi-asserted-by":"crossref","first-page":"142","DOI":"10.3115\/1119176.1119195","article-title":"Introduction to the CoNLL-2003 shared task: language-independent named entity recognition","volume-title":"Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003","author":"Sang","year":"2003"},{"key":"10.7717\/peerj-cs.3678\/ref-41","article-title":"Extended named entity ontology with attribute information","author":"Sekine","year":"2008"},{"key":"10.7717\/peerj-cs.3678\/ref-42","article-title":"Named entity recognition for south and south east Asian languages: taking stock","author":"Singh","year":"2008"},{"key":"10.7717\/peerj-cs.3678\/ref-43","first-page":"2507","article-title":"Named entity recognition system for Urdu","author":"Singh","year":"2012"},{"key":"10.7717\/peerj-cs.3678\/ref-44","first-page":"37","article-title":"Beheshti-NER: Persian named entity recognition using BERT","volume-title":"Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019-Short Papers","author":"Taher","year":"2020"},{"key":"10.7717\/peerj-cs.3678\/ref-45","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2504.18142","article-title":"EDU-NER-2025: named entity recognition in Urdu educational texts using XLM-RoBERTa with X (formerly Twitter)","author":"Ullah","year":"2025"},{"key":"10.7717\/peerj-cs.3678\/ref-46","first-page":"3","article-title":"Urdu named entity recognition with attention Bi-LSTM-CRF model","author":"Ullah","year":"2022"},{"issue":"2","key":"10.7717\/peerj-cs.3678\/ref-47","doi-asserted-by":"publisher","first-page":"1435","DOI":"10.1007\/s11277-023-10339-x","article-title":"An attention based BI-LSTM DenseNet model for named entity recognition in English texts","volume":"130","author":"VeeraSekharReddy","year":"2023","journal-title":"Wireless Personal 
Communications"},{"key":"10.7717\/peerj-cs.3678\/ref-48","doi-asserted-by":"publisher","first-page":"73627","DOI":"10.1109\/ACCESS.2019.2920734","article-title":"Named entity recognition from biomedical texts using a fusion attention-based BiLSTM-CRF","volume":"7","author":"Wei","year":"2019","journal-title":"IEEE Access"},{"key":"10.7717\/peerj-cs.3678\/ref-49","first-page":"1285","article-title":"Research progress of RNN language model","author":"Xiao","year":"2020"},{"key":"10.7717\/peerj-cs.3678\/ref-50","first-page":"565","article-title":"An improved LSTM structure for natural language processing","author":"Yao","year":"2018"},{"key":"10.7717\/peerj-cs.3678\/ref-51","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1145\/3159652.3159703","article-title":"Dynamic word embeddings for evolving semantic discovery","volume-title":"Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining","author":"Yao","year":"2018"},{"key":"10.7717\/peerj-cs.3678\/ref-52","doi-asserted-by":"publisher","first-page":"7","DOI":"10.5121\/csit.2019.90706","article-title":"HMM-based dari named entity recognition for information extraction","volume":"9","author":"Zia","year":"2019","journal-title":"CS & IT Conference Proceedings"},{"issue":"4","key":"10.7717\/peerj-cs.3678\/ref-53","doi-asserted-by":"crossref","first-page":"377","DOI":"10.30630\/joiv.3.4.289","article-title":"Efficient processing of GRU based on word embedding for text classification","volume":"3","author":"Zulqarnain","year":"2019","journal-title":"JOIV: International Journal on Informatics Visualization"}],"container-title":["PeerJ Computer 
Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-3678.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3678.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3678.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3678.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T16:42:51Z","timestamp":1773765771000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-3678"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,17]]},"references-count":53,"alternative-id":["10.7717\/peerj-cs.3678"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.3678","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{},"ISSN":["2376-5992"],"issn-type":[{"value":"2376-5992","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,17]]},"article-number":"e3678"}}