{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,6]],"date-time":"2026-03-06T22:18:56Z","timestamp":1772835536385,"version":"3.50.1"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2019,6,20]],"date-time":"2019-06-20T00:00:00Z","timestamp":1560988800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,1,31]]},"abstract":"<jats:p>Named Entity Recognition (NER) plays a pivotal role in various natural language processing tasks, such as machine translation and automatic question-answering systems. Recognizing the importance of NER, a plethora of NER techniques for Western and Asian languages have been developed. However, despite having over 490 million Urdu language speakers worldwide, NER resources for Urdu are either non-existent or inadequate. To fill this gap, this article makes four key contributions. First, we have developed the largest Urdu NER corpus, which contains 926,776 tokens and 99,718 carefully annotated NEs. The developed corpus has at least doubled the number of manually tagged NEs as compared to any of the existing Urdu NER corpora. Second, we have generated six new word embeddings using three different techniques, fastText, Word2vec, and Glove, on two corpora of Urdu text. These are the only publicly available embeddings for the Urdu language, besides the recently released Urdu word embeddings by Facebook. Third, we have pioneered in the application of deep learning techniques, NN and RNN, for Urdu named entity recognition. 
Finally, we have performed 10-folds of 32 different experiments using the combinations of a traditional supervised learning and deep learning techniques, seven types of word embeddings, and two different Urdu NER datasets. Based on the analysis of the results, several valuable insights are provided about the effectiveness of deep learning techniques, the impact of word embeddings, and variations of datasets.<\/jats:p>","DOI":"10.1145\/3329710","type":"journal-article","created":{"date-parts":[[2019,6,21]],"date-time":"2019-06-21T12:38:56Z","timestamp":1561120736000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":33,"title":["Urdu Named Entity Recognition"],"prefix":"10.1145","volume":"19","author":[{"given":"Safia","family":"Kanwal","sequence":"first","affiliation":[{"name":"Punjab University College of Information Technology, Lahore, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1392-8866","authenticated-orcid":false,"given":"Kamran","family":"Malik","sequence":"additional","affiliation":[{"name":"Punjab University College of Information Technology, Lahore, Pakistan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8433-6705","authenticated-orcid":false,"given":"Khurram","family":"Shahzad","sequence":"additional","affiliation":[{"name":"Punjab University College of Information Technology, Lahore, Pakistan"}]},{"given":"Faisal","family":"Aslam","sequence":"additional","affiliation":[{"name":"Punjab University College of Information Technology, Lahore, Pakistan"}]},{"given":"Zubair","family":"Nawaz","sequence":"additional","affiliation":[{"name":"Punjab University College of Information Technology, Lahore, Pakistan"}]}],"member":"320","published-online":{"date-parts":[[2019,6,20]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Nita Patil Ajay S. Patil and B. V. Pawar. 2016. 
Survey of named entity recognition systems with respect to Indian and foreign languages. International Journal of Computer Applications (0975--8887) 134 16 (2016) 6.","DOI":"10.5120\/ijca2016908197"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/1967293.1967296"},{"key":"e_1_2_1_3_1","article-title":"Approaches to named entity recognition: A survey","volume":"3297","author":"Potey A.","year":"2015","journal-title":"International Journal of Innovative Research in Computer and Communication Engineering (An ISO"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijinfomgt.2014.10.007"},{"key":"e_1_2_1_5_1","volume-title":"https:\/\/en.wikipedia.org\/w\/index.php?title=Urdu&oldid=844110134 {Online","author":"Urdu Wikipedia","year":"2018"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-016-9482-x"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3129290"},{"key":"e_1_2_1_8_1","volume-title":"Conference on Language and Technology. National University of Computer and emerging Science","author":"Jawaid Bushra","year":"2009"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the 6th Workshop on Asian Language Resources.","author":"Hussain Sarmad","year":"2008"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/INMIC.2017.8289449"},{"key":"e_1_2_1_11_1","unstructured":"IITHyderabad. Urdu NER dataset Raw UTF-8. http:\/\/ltrc.iiit.ac.in\/ner-ssea-08\/index.cgi?topic=5. ({n.d.}). 
Online; accessed 19 Nov 2018."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the 9th Workshop on Asian Language Resources. 31--35","author":"Adeeba Farah","year":"2011"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 2010 Named Entities Workshop. Association for Computational Linguistics, 126--135","author":"Riaz Kashif","year":"2010"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1838751.1838754"},{"key":"e_1_2_1_15_1","volume-title":"Urdu word sense disambiguation using machine learning approach. Cluster Computing","author":"Abid Muhammad","year":"2017"},{"key":"e_1_2_1_16_1","volume-title":"Urdu named entity recognition system using hidden Markov model. Pakistan Journal of Engineering and Applied Sciences","author":"Malik Muhammad Kamran","year":"2017"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118759.1118760"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 10th Annual Workshop for South Asian Language Processing, EACL.","author":"Baker Paul"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119176.1119195"},{"key":"e_1_2_1_20_1","volume-title":"Annotated Corpus for Named Entity Recognition: Corpus Annotated with BIO and POS Tags. https:\/\/www.kaggle.com\/velavok\/nercorpus. ({n.d.}). Online","author":"Dmitriev Anton","year":"2018"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the 7th Message Understanding Conference (MUC-7), Appendices. 359--367","author":"Chinchor Nancy","year":"1998"},{"key":"e_1_2_1_22_1","volume-title":"Reijers","author":"Vanwersch Rob J. B.","year":"2016"},{"key":"e_1_2_1_23_1","unstructured":"Anton Dmitriev. Reuters-21578 Text Categorization Collection. http:\/\/kdd.ics.uci.edu\/databases\/reuters21578\/reuters21578.html. ({n.d.}). 
Online; accessed 10 June 2018."},{"key":"e_1_2_1_24_1","volume-title":"International Conference on Language Resources and Evaluation. 3886--3893","author":"Ploch Danuta","year":"2012"},{"key":"e_1_2_1_25_1","unstructured":"Darina Benikova Chris Biemann and Marc Reznicek. 2014. NoSta-D named entity annotation for German: Guidelines and dataset. In LREC. 2524--2531."},{"key":"e_1_2_1_26_1","volume-title":"Pradhan and Nianwen Xue","author":"Sameer","year":"2009"},{"key":"e_1_2_1_27_1","volume-title":"International Conference on Language Resources and Evaluation.","author":"Neudecker Clemens","year":"2016"},{"key":"e_1_2_1_28_1","volume-title":"The expressive power of word embeddings. arXiv preprint arXiv:1301.3226","author":"Chen Yanqing","year":"2013"},{"key":"e_1_2_1_29_1","unstructured":"Tomas Mikolov Ilya Sutskever Kai Chen Greg S. Corrado and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111--3119."},{"key":"e_1_2_1_30_1","volume-title":"word2vec explained: Deriving Mikolov et\u00a0al.\u2019s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722","author":"Goldberg Yoav","year":"2014"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00051"},{"key":"e_1_2_1_33_1","volume-title":"Word vectors for 157 languages. https:\/\/fasttext.cc\/docs\/en\/crawl-vectors.html. (2018). 
Online","year":"2018"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.58871"},{"key":"e_1_2_1_35_1","unstructured":"L. R. Medsker and L. C. Jain. 2001. Recurrent neural networks. Design and Applications 5 (2001)."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"2016","author":"Abhyuday"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3329710","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3329710","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:18Z","timestamp":1750204458000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3329710"}},"subtitle":["Corpus Generation and Deep Learning Applications"],"short-title":[],"issued":{"date-parts":[[2019,6,20]]},"references-count":36,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,1,31]]}},"alternative-id":["10.1145\/3329710"],"URL":"https:\/\/doi.org\/10.1145\/3329710","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,6,20]]},"assertion":[{"value":"2018-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2019-06-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}