{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,21]],"date-time":"2026-02-21T13:09:20Z","timestamp":1771679360075,"version":"3.50.1"},"reference-count":57,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2014,6,1]],"date-time":"2014-06-01T00:00:00Z","timestamp":1401580800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2014,6]]},"abstract":"<jats:p>The Kurdish language is an Indo-European language spoken in Kurdistan, a large geographical region in the Middle East. Despite having a large number of speakers, Kurdish is among the less-resourced languages and has not seen much attention from the IR and NLP research communities. This article reports on the outcomes of a project aimed at providing essential resources for processing Kurdish texts.<\/jats:p>\n          <jats:p>A principal output of this project is Pewan, the first standard Test Collection to evaluate Kurdish Information Retrieval systems. The other language resources that we have built include a lightweight stemmer and a list of stopwords.<\/jats:p>\n          <jats:p>Our second principal contribution is using these newly-built resources to conduct a thorough experimental study on Kurdish documents. 
Our experimental results show that normalization, and to a lesser extent, stemming, can greatly improve the performance of Kurdish IR systems.<\/jats:p>","DOI":"10.1145\/2556948","type":"journal-article","created":{"date-parts":[[2014,6,24]],"date-time":"2014-06-24T12:16:16Z","timestamp":1403612176000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":8,"title":["Towards Kurdish Information Retrieval"],"prefix":"10.1145","volume":"13","author":[{"given":"Kyumars Sheykh","family":"Esmaili","sequence":"first","affiliation":[{"name":"Technicolor, France"}]},{"given":"Shahin","family":"Salavati","sequence":"additional","affiliation":[{"name":"University of Kurdistan, Iran"}]},{"given":"Anwitaman","family":"Datta","sequence":"additional","affiliation":[{"name":"Nanyang Technological University, Singapore"}]}],"member":"320","published-online":{"date-parts":[[2014,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Hajir Abollahpour. 2013. Hajir Dictionary. http:\/\/kurmanj.ir\/news.php?readmore=76.  Hajir Abollahpour. 2013. Hajir Dictionary. http:\/\/kurmanj.ir\/news.php?readmore=76."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2009.05.002"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSPA.2007.4555345"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of DEXA. 791--801","author":"Ballesteros Lisa","unstructured":"Lisa Ballesteros and W. Bruce Croft . 1996. Dictionary methods for cross-lingual information retrieval . In Proceedings of DEXA. 791--801 . Lisa Ballesteros and W. Bruce Croft. 1996. Dictionary methods for cross-lingual information retrieval. In Proceedings of DEXA. 
791--801."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISSPIT.2009.5407540"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000011208.60754.a1"},{"key":"e_1_2_1_7_1","volume-title":"Conference and Labs of the Evaluation Forum. http:\/\/www.clef-initiative.eu\/.","author":"CLEF.","year":"2013","unstructured":"CLEF. 2013 . Conference and Labs of the Evaluation Forum. http:\/\/www.clef-initiative.eu\/. CLEF. 2013. Conference and Labs of the Evaluation Forum. http:\/\/www.clef-initiative.eu\/."},{"key":"e_1_2_1_8_1","unstructured":"Dictio. 2013. Dictio Online Dictionary. http:\/\/dictio.kurditgroup.org\/dictio\/.  Dictio. 2013. Dictio Online Dictionary. http:\/\/dictio.kurditgroup.org\/dictio\/."},{"key":"e_1_2_1_9_1","unstructured":"Kyumars Sheykh Esmaili. 2012. Challenges in Kurdish text processing. CoRR abs\/1212.0074.  Kyumars Sheykh Esmaili. 2012. Challenges in Kurdish text processing. CoRR abs\/1212.0074."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/AICCSA.2007.370697"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). 300--305","author":"Esmaili Kyumars Sheykh","year":"2013","unstructured":"Kyumars Sheykh Esmaili and Shahin Salavati . 2013 . Sorani Kurdish versus Kurmanji Kurdish: An empirical comparison . In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). 300--305 . Kyumars Sheykh Esmaili and Shahin Salavati. 2013. Sorani Kurdish versus Kurmanji Kurdish: An empirical comparison. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL). 300--305."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/AICCSA.2013.6616470"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1644879.1644881"},{"key":"e_1_2_1_14_1","unstructured":"FIRE. 2013. Forum for information retrieval evaluation. 
http:\/\/www.isical.ac.in\/~clia\/.  FIRE. 2013. Forum for information retrieval evaluation. http:\/\/www.isical.ac.in\/~clia\/."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of ICEMCO.","author":"Gautier G\u00e9rard","year":"1998","unstructured":"G\u00e9rard Gautier . 1998 . Building a Kurdish language corpus: An overview of the technical problems . In Proceedings of ICEMCO. G\u00e9rard Gautier. 1998. Building a Kurdish language corpus: An overview of the technical problems. In Proceedings of ICEMCO."},{"key":"e_1_2_1_16_1","unstructured":"Guardian. 2013. The Guardian. www.guardian.co.uk\/.  Guardian. 2013. The Guardian. www.guardian.co.uk\/."},{"key":"e_1_2_1_17_1","first-page":"1","article-title":"Kurdish linguistics: A brief overview","volume":"55","author":"Haig Geoffrey","year":"2002","unstructured":"Geoffrey Haig and Yaron Matras . 2002 . Kurdish linguistics: A brief overview . Lang. Typol. and Univ. 55 , 1 . Geoffrey Haig and Yaron Matras. 2002. Kurdish linguistics: A brief overview. Lang. Typol. and Univ. 55, 1.","journal-title":"Lang. Typol. and Univ."},{"key":"e_1_2_1_18_1","volume-title":"A Study of European, Persian and Arabic loans in Standard Sorani","author":"Hasanpoor Jafar","unstructured":"Jafar Hasanpoor . 1999. A Study of European, Persian and Arabic loans in Standard Sorani . Uppsala University . Jafar Hasanpoor. 1999. A Study of European, Persian and Arabic loans in Standard Sorani. Uppsala University."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1515\/ijsl-2012-0047"},{"key":"e_1_2_1_20_1","volume-title":"Information Retrieval: Computational and Theoretical Aspects","author":"Heaps Harold Stanley","year":"1978","unstructured":"Harold Stanley Heaps . 1978 . Information Retrieval: Computational and Theoretical Aspects . Academic Press, Inc. Orlando, FL . Harold Stanley Heaps. 1978. Information Retrieval: Computational and Theoretical Aspects. Academic Press, Inc. 
Orlando, FL."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199601)47:1%3C70::AID-ASI7%3E3.3.CO;2-Q"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10791-009-9093-0"},{"key":"e_1_2_1_23_1","volume-title":"Development of a Stemming Algorithm","author":"Lovins Julie B.","unstructured":"Julie B. Lovins . 1968. Development of a Stemming Algorithm . MIT Information Processing Group , Electronic Systems Laboratory. Julie B. Lovins. 1968. Development of a Stemming Algorithm. MIT Information Processing Group, Electronic Systems Laboratory."},{"key":"e_1_2_1_24_1","unstructured":"Lucene. 2013. Apache Lucene. http:\/\/lucene.apache.org  Lucene. 2013. Apache Lucene. http:\/\/lucene.apache.org"},{"key":"e_1_2_1_25_1","volume-title":"Kurdish Dialect Studies","author":"MacKenzie David N.","unstructured":"David N. MacKenzie . 1961. Kurdish Dialect Studies . Oxford University Press . David N. MacKenzie. 1961. Kurdish Dialect Studies. Oxford University Press."},{"key":"e_1_2_1_26_1","volume-title":"Introduction to Information Retrieval","author":"Manning Christopher D.","unstructured":"Christopher D. Manning , Prabhakar Raghavan , and Hinrich Sch\u00fctze . 2008. Introduction to Information Retrieval . Cambridge University Press , New York, NY . Christopher D. Manning, Prabhakar Raghavan, and Hinrich Sch\u00fctze. 2008. Introduction to Information Retrieval. Cambridge University Press, New York, NY."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000009441.78971.be"},{"key":"e_1_2_1_28_1","unstructured":"MG4J. 2013. Managing gigabytes for Java. http:\/\/mg4j.dsi.unimi.it\/.  MG4J. 2013. Managing gigabytes for Java. http:\/\/mg4j.dsi.unimi.it\/."},{"key":"e_1_2_1_29_1","unstructured":"Christian Middleton and Ricardo Baeza-Yates. 2007. A comparison of open source search engines. arXiv:1212.0074.  Christian Middleton and Ricardo Baeza-Yates. 2007. A comparison of open source search engines. 
arXiv:1212.0074."},{"key":"e_1_2_1_30_1","unstructured":"NTCIR. 2013. NII test collection for IR systems. http:\/\/research.nii.ac.jp\/ntcir\/index-en.html.  NTCIR. 2013. NII test collection for IR systems. http:\/\/research.nii.ac.jp\/ntcir\/index-en.html."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188499"},{"key":"e_1_2_1_32_1","unstructured":"Pewan. 2013. Pewan\u2019s download link. https:\/\/dl.dropbox.com\/u\/10883132\/Pewan.zip.  Pewan. 2013. Pewan\u2019s download link. https:\/\/dl.dropbox.com\/u\/10883132\/Pewan.zip."},{"key":"e_1_2_1_33_1","unstructured":"Peyamner. 2013. Peyamner news agency. http:\/\/www.peyamner.com\/.  Peyamner. 2013. Peyamner news agency. http:\/\/www.peyamner.com\/."},{"key":"e_1_2_1_34_1","volume-title":"Readings in Information Retrieval","author":"Porter M. F.","unstructured":"M. F. Porter . 1997. An algorithm for suffix stripping . In Readings in Information Retrieval , Karen Sparck Jones and Peter Willet, Eds., Morgan Kaufmann Publishers Inc ., 313--316. M. F. Porter. 1997. An algorithm for suffix stripping. In Readings in Information Retrieval, Karen Sparck Jones and Peter Willet, Eds., Morgan Kaufmann Publishers Inc., 313--316."},{"key":"e_1_2_1_35_1","volume-title":"Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP). 40--45","author":"Rehman Zobia","year":"2011","unstructured":"Zobia Rehman , Waqas Anwar , and Usama Ijaz Bajwa . 2011 . Challenges in Urdu text tokenization and sentence boundary disambiguation . In Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP). 40--45 . Zobia Rehman, Waqas Anwar, and Usama Ijaz Bajwa. 2011. Challenges in Urdu text tokenization and sentence boundary disambiguation. In Proceedings of the 2nd Workshop on South and Southeast Asian Natural Language Processing (WSSANLP). 
40--45."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 6th International Symposium on Electrical and Electronics Engineering and Computer Science. 557--562","author":"Motaz","unstructured":"Motaz K. Saad and Wesam Ashour. 2010. OSAC: Open source Arabic corpus . In Proceedings of the 6th International Symposium on Electrical and Electronics Engineering and Computer Science. 557--562 . Motaz K. Saad and Wesam Ashour. 2010. OSAC: Open source Arabic corpus. In Proceedings of the 6th International Symposium on Electrical and Electronics Engineering and Computer Science. 557--562."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/182.358466"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the International Conference on Head-Driven Phrase Structure Grammar. 235--249","author":"Samvelian Pollet","year":"2007","unstructured":"Pollet Samvelian . 2007 . A lexical account of Sorani Kurdish prepositions . In Proceedings of the International Conference on Head-Driven Phrase Structure Grammar. 235--249 . Pollet Samvelian. 2007. A lexical account of Sorani Kurdish prepositions. In Proceedings of the International Conference on Head-Driven Phrase Structure Grammar. 235--249."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of LTC.","author":"Shamsfard Mehrnoush","year":"2011","unstructured":"Mehrnoush Shamsfard . 2011 . Challenges and open problems in Persian text processing . In Proceedings of LTC. Mehrnoush Shamsfard. 2011. Challenges and open problems in Persian text processing. In Proceedings of LTC."},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of LREC.","author":"Shamsfard Mehrnoush","year":"2010","unstructured":"Mehrnoush Shamsfard , Hoda Sadat Jafari , and Mahdi Ilbeygi . 2010 . STeP-1: A set of fundamental tools for Persian text processing . In Proceedings of LREC. 
Mehrnoush Shamsfard, Hoda Sadat Jafari, and Mahdi Ilbeygi. 2010. STeP-1: A set of fundamental tools for Persian text processing. In Proceedings of LREC."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10993-010-9179-y"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026616"},{"key":"e_1_2_1_44_1","unstructured":"Terrier. 2013. Terrier IR platform. http:\/\/terrier.org\/.  Terrier. 2013. Terrier IR platform. http:\/\/terrier.org\/."},{"key":"e_1_2_1_45_1","volume-title":"Kurmanji Kurdish: A Reference Grammar with Selected Readings","author":"Thackston Wheeler M.","year":"2006","unstructured":"Wheeler M. Thackston . 2006 a. Kurmanji Kurdish: A Reference Grammar with Selected Readings . Harvard University . Wheeler M. Thackston. 2006a. Kurmanji Kurdish: A Reference Grammar with Selected Readings. Harvard University."},{"key":"e_1_2_1_46_1","volume-title":"Sorani Kurdish: A Reference Grammar with Selected Readings","author":"Thackston Wheeler M.","year":"2006","unstructured":"Wheeler M. Thackston . 2006 b. Sorani Kurdish: A Reference Grammar with Selected Readings . Harvard University . Wheeler M. Thackston. 2006b. Sorani Kurdish: A Reference Grammar with Selected Readings. Harvard University."},{"key":"e_1_2_1_47_1","volume-title":"Text REtrieval Conference. http:\/\/trec.nist.gov\/.","author":"TREC.","year":"2013","unstructured":"TREC. 2013 . Text REtrieval Conference. http:\/\/trec.nist.gov\/. TREC. 2013. Text REtrieval Conference. http:\/\/trec.nist.gov\/."},{"key":"e_1_2_1_48_1","unstructured":"VOA. 2013a. Voice of America - Kurdish (Kurmanji). http:\/\/www.dengeamerika.com\/.  VOA. 2013a. Voice of America - Kurdish (Kurmanji). http:\/\/www.dengeamerika.com\/."},{"key":"e_1_2_1_49_1","unstructured":"VOA. 2013b. Voice of America - Kurdish (Sorani). http:\/\/www.dengiamerika.com\/.  VOA. 2013b. Voice of America - Kurdish (Sorani). 
http:\/\/www.dengiamerika.com\/."},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.5555\/188490.188508"},{"key":"e_1_2_1_51_1","volume-title":"Overview of the TREC 2004 robust retrieval track. In Proceedings of TREC.","author":"Voorhees Ellen M.","year":"2004","unstructured":"Ellen M. Voorhees . 2004 . Overview of the TREC 2004 robust retrieval track. In Proceedings of TREC. Ellen M. Voorhees. 2004. Overview of the TREC 2004 robust retrieval track. In Proceedings of TREC."},{"key":"e_1_2_1_52_1","first-page":"1","article-title":"Overview of the eighth text retrieval conference","volume":"8","author":"Voorhees Ellen M.","year":"1999","unstructured":"Ellen M. Voorhees and Donna Harman . 1999 . Overview of the eighth text retrieval conference . In Proceedings of TREC , Vol. 8. 1 -- 24 . Ellen M. Voorhees and Donna Harman. 1999. Overview of the eighth text retrieval conference. In Proceedings of TREC, Vol. 8. 1--24.","journal-title":"Proceedings of TREC"},{"key":"e_1_2_1_53_1","volume-title":"The Proceedings of the Eighth Mediterranean Morphology Meeting.","author":"Walther G\u00e9raldine","year":"2011","unstructured":"G\u00e9raldine Walther . 2011 . Fitting into morphological structure: Accounting for Sorani Kurdish endoclitics . In The Proceedings of the Eighth Mediterranean Morphology Meeting. G\u00e9raldine Walther. 2011. Fitting into morphological structure: Accounting for Sorani Kurdish endoclitics. In The Proceedings of the Eighth Mediterranean Morphology Meeting."},{"key":"e_1_2_1_54_1","volume-title":"SaLTMiL\u2019s Workshop on Less-resourced Languages (LREC).","author":"Walther G\u00e9raldine","year":"2010","unstructured":"G\u00e9raldine Walther and Beno\u00eet Sagot . 2010 . Developing a large-scale Lexicon for a less-resourced language . In SaLTMiL\u2019s Workshop on Less-resourced Languages (LREC). G\u00e9raldine Walther and Beno\u00eet Sagot. 2010. Developing a large-scale Lexicon for a less-resourced language. 
In SaLTMiL\u2019s Workshop on Less-resourced Languages (LREC)."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/267954.267957"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/564376.564424"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291014"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2556948","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2556948","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:34:41Z","timestamp":1750232081000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2556948"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,6]]},"references-count":57,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2014,6]]}},"alternative-id":["10.1145\/2556948"],"URL":"https:\/\/doi.org\/10.1145\/2556948","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"value":"1530-0226","type":"print"},{"value":"1558-3430","type":"electronic"}],"subject":[],"published":{"date-parts":[[2014,6]]},"assertion":[{"value":"2013-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}