{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:30:25Z","timestamp":1750307425971,"version":"3.41.0"},"reference-count":54,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2010,9,1]],"date-time":"2010-09-01T00:00:00Z","timestamp":1283299200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001602","name":"Science Foundation Ireland","doi-asserted-by":"publisher","award":["Grant 07\/CE\/I1142"],"award-info":[{"award-number":["Grant 07\/CE\/I1142"]}],"id":[{"id":"10.13039\/501100001602","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2010,9]]},"abstract":"<jats:p>The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this article: 1) How to create a simple, language-independent corpus-based stemmer, 2) How to identify sub-words and which types of sub-words are suitable as indexing units, and 3) How to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections.<\/jats:p>\n          <jats:p>The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful in the case of few language-specific resources. 
For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR.<\/jats:p>\n          <jats:p>Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best, for Bengali and Marathi, overlapping 3-grams obtain the best result, and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP.<\/jats:p>\n          <jats:p>Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. 
Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.<\/jats:p>","DOI":"10.1145\/1838745.1838749","type":"journal-article","created":{"date-parts":[[2010,9,22]],"date-time":"2010-09-22T11:55:58Z","timestamp":1285156558000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Sub-Word Indexing and Blind Relevance Feedback for English, Bengali, Hindi, and Marathi IR"],"prefix":"10.1145","volume":"9","author":[{"given":"Johannes","family":"Leveling","sequence":"first","affiliation":[{"name":"Dublin City University"}]},{"given":"Gareth J. F.","family":"Jones","sequence":"additional","affiliation":[{"name":"Dublin City University"}]}],"member":"320","published-online":{"date-parts":[[2010,9]]},"reference":[{"volume-title":"Proceedings of the 15th Australasian Database Conference (ADC\u201904)","author":"Billerbeck B.","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"crossref","unstructured":"Braschler M. and Ripplinger B. 2003. Stemming and decompounding for German text retrieval. In Proceedings of the 25th European Conference on Information Retrieval (ECIR\u201903). F. Sebastiani Ed. Lecture Notes in Computer Science vol. 2633 177--192. 
Springer Berlin.","DOI":"10.1007\/3-540-36618-0_13"},{"volume-title":"Overview of the 3rd Text REtrieval Conference (TREC\u201994)","author":"Buckley C.","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000009444.89549.90"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/258525.258532"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10579-007-9031-y"},{"volume-title":"Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL-HLT\u201907)","author":"Dasgupta S.","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","unstructured":"Dolamic L. and Savoy J. 2008. UniNE at FIRE 2008: Hindi Bengali and Marathi IR. In Working Notes of the Forum for Information Retrieval Evaluation (FIRE\u201908)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(02)00079-1"},{"key":"e_1_2_1_10_1","unstructured":"Fox C. 1992. Lexical Analysis and Stoplists. Prentice-Hall New Jersey. 
102--130."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.3115\/1072133.1072172"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/133160.133194"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120101750300490"},{"volume-title":"Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC\u201906)","author":"Guthrie D.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P"},{"key":"e_1_2_1_16_1","first-page":"2","article-title":"Compounds in dictionary-based cross-language information retrieval","volume":"7","author":"Hedlund T.","year":"2002","journal-title":"Inform. Res."},{"volume-title":"Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words Into Morphemes (MorphoChallenge\u201905)","author":"Keshava S.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/160688.160718"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.291003"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2005.06.006"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/974740.974746"},{"volume-title":"Proceedings of the Lernen-Wissen-Adaption (LWA\u201909)","year":"2009","author":"Leveling J.","key":"e_1_2_1_22_1"},{"key":"e_1_2_1_23_1","unstructured":"Leveling J. Ganguly D. and Jones G. J. F. 2010. DCU@FIRE2010: Term conflation blind relevance feedback and cross-language IR with manual and automatic query translation. 
In Working Notes of the Forum for Information Retrieval Evaluation (FIRE\u201910)."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/11519645_28"},{"key":"e_1_2_1_25_1","first-page":"1","article-title":"Development of a stemming algorithm","volume":"11","author":"Lovins J. B.","year":"1968","journal-title":"Mechan. Trans. Comput."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/1031171.1031229"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1281485.1281489"},{"volume-title":"Proceedings of the 3rd NTCIR Workshop on Research in Information Retrieval, Automatic Text Summarization, and Question Answering (NTCIR\u201901)","year":"2001","author":"McNamee P.","key":"e_1_2_1_28_1"},{"key":"e_1_2_1_29_1","unstructured":"McNamee P. 2008. N-gram tokenization for Indian language text retrieval. In Working Notes of the Forum for Information Retrieval Evaluation (FIRE\u201908)."},{"key":"e_1_2_1_30_1","unstructured":"McNamee P. 2008. Textual representations for corpus-based bilingual retrieval. Ph.D. thesis University of Maryland Baltimore County."},{"volume-title":"Working Notes of the Workshop on Cross-Language Evaluation Forum (CLEF\u201907)","author":"McNamee P.","key":"e_1_2_1_31_1"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1571941.1571957"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","unstructured":"Ng K. 2000. Subword-based approaches for spoken document retrieval. Ph.D. 
thesis Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology.","DOI":"10.1016\/S0167-6393(00)00008-X"},{"volume":"2069","volume-title":"Proceedings of the Cross-Language Information Retrieval and Evaluation, Workshop of Cross-Language Evaluation Forum (CLEF\u201900)","author":"Oard D. W.","key":"e_1_2_1_34_1"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/258525.258576"},{"key":"e_1_2_1_36_1","unstructured":"Paik J. H. and Parui S. K. 2008. A simple stemmer for inflectional languages. In Working Notes of the Forum for Information Retrieval Evaluation (FIRE\u201908)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/1460027.1460044"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1135777.1135898"},{"key":"e_1_2_1_39_1","first-page":"2","article-title":"Targeted s-gram matching: A novel n-gram matching technique for cross- and monolingual word form variants","volume":"7","author":"Pirkola A.","year":"2002","journal-title":"Inform. Res."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046814"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb026866"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1002\/asi.4630270302"},{"volume-title":"Proceedings of the 7th Text Retrieval Conference (TREC98)","author":"Robertson S. E.","key":"e_1_2_1_43_1"},{"volume-title":"Overview of the 3rd Text Retrieval Conference (TREC\u201995)","author":"Robertson S. E.","key":"e_1_2_1_44_1"},{"volume-title":"The SMART Retrieval System -- Experiments in Automatic Document Processing","author":"Rocchio J. 
J.","key":"e_1_2_1_45_1"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(1999)50:10%3C944::AID-ASI9%3E3.3.CO;2-H"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1141277.1141523"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075812.1075897"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1076034.1076045"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00015-7"},{"key":"e_1_2_1_51_1","unstructured":"Udupa R. Jagarlamudi J. and Saravanan K. 2008. Microsoft research at FIRE2008: Hindi-English cross-language information retrieval. In Working Notes of the Forum for Information Retrieval Evaluation 2008 (FIRE\u201908)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/267954.267957"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/243199.243202"},{"volume-title":"Maryland: English-Hindi CLIR. 
In Working Notes of the Forum for Information Retrieval Evaluation 2008 (FIRE\u201908).","year":"2008","author":"Xu T.","key":"e_1_2_1_54_1"}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1838745.1838749","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1838745.1838749","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T11:39:49Z","timestamp":1750246789000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1838745.1838749"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9]]},"references-count":54,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2010,9]]}},"alternative-id":["10.1145\/1838745.1838749"],"URL":"https:\/\/doi.org\/10.1145\/1838745.1838749","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"type":"print","value":"1530-0226"},{"type":"electronic","value":"1558-3430"}],"subject":[],"published":{"date-parts":[[2010,9]]},"assertion":[{"value":"2009-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-09-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}