{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:26:06Z","timestamp":1750220766448,"version":"3.41.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,7,20]],"date-time":"2021-07-20T00:00:00Z","timestamp":1626739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2021,7,31]]},"abstract":"<jats:p>With Web 2.0, there has been exponential growth in the number of Web users and the volume of Web content. Most of these users are not only consumers of the information but also generators of it. People express themselves here in colloquial languages, but using Roman script (transliteration). These texts are mostly informal and casual, and therefore seldom follow grammar rules. Also, there does not exist any prescribed set of spelling rules in transliterated text. This freedom leads to large-scale spelling variations, which is a major challenge in mixed script information processing. This article studies different existing phonetic algorithms to handle the issue of spelling variation, points out the limitations of them, and proposes a novel phonetic encoding approach with two different flavors in the light of Hindi transliteration. 
Experiments performed over Hindi song lyrics retrieval in mixed script domain with three different retrieval models show that proposed approaches outperform the existing techniques in a majority of the cases (sometimes statistically significantly) for a number of metrics like nDCG@1, nDCG@5, nDCG@10, MAP, MRR, and Recall.<\/jats:p>","DOI":"10.1145\/3447649","type":"journal-article","created":{"date-parts":[[2021,7,20]],"date-time":"2021-07-20T21:15:32Z","timestamp":1626815732000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Query Expansion for Transliterated Text Retrieval"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1442-5451","authenticated-orcid":false,"given":"Dinesh Kumar","family":"Prabhakar","sequence":"first","affiliation":[{"name":"IIT (ISM), Dhanbad, Jharkhand, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sukomal","family":"Pal","sequence":"additional","affiliation":[{"name":"IIT (BHU), Varanasi, UP, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chiranjeev","family":"Kumar","sequence":"additional","affiliation":[{"name":"IIT (ISM), Dhanbad, Jharkhand, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,7,20]]},"reference":[{"volume-title":"Retrieved","year":"2015","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/2215676.2215678"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582416"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2824864.2824872"},{"volume-title":"Pre-proceedings 6th Workshop FIRE-2014","author":"Choudhury 
M.","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/980845.980888"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/266714.266721"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb046999"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1108\/eb047069"},{"volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016","year":"2016","author":"Gamb\u00e4ck Bj\u00f6rn","key":"e_1_2_1_10_1"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/2824864.2824888"},{"key":"e_1_2_1_12_1","unstructured":"Kanika Gupta Monojit Choudhury and Kalika Bali. 2012. Mining Hindi-English transliteration pairs from online Hindi lyrics. In LREC. 2459\u20132465."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2600428.2609622"},{"volume-title":"Pre-proceedings 5th Workshop FIRE-2013","author":"Gupta P.","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/1731035.1731036"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00016-9"},{"volume-title":"Pre-proceedings 5th workshop FIRE-2013","author":"Joshi H.","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1922649.1922654"},{"volume-title":"Abney","year":"2013","author":"King Ben","key":"e_1_2_1_19_1"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/1394399"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/290941.290995"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2824864.2824873"},{"key":"e_1_2_1_23_1","unstructured":"M. Odell and R. Russell. 1918. The soundex coding system. US Patent 1261167 (1918)."},
{"volume-title":"Pre-proceedings 5th Workshop FIRE-2013","author":"Pakray P.","key":"e_1_2_1_24_1"},{"volume-title":"Pre-proceedings 6th Workshop FIRE-2014","author":"Prakash A.","key":"e_1_2_1_25_1"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2145432.2145602"},{"volume-title":"Conference on Empirical Methods in Natural Language Processing.","year":"1996","author":"Ratnaparkhi Adwait","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2701336.2701636"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(88)90021-0"},{"volume-title":"Post Proceedings of the Workshops at the 7th Forum for Information Retrieval Evaluation. 19\u201325","year":"2015","author":"Sequiera Royal","key":"e_1_2_1_30_1"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498759.1498811"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/243199.243258"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447649","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447649","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:09Z","timestamp":1750200069000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447649"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,20]]},"references-count":32,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7,31]]}},"alternative-id":["10.1145\/3447649"],"URL":"https:\/\/doi.org\/10.1145\/3447649","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2021,7,20]]},"assertion":[{"value":"2020-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-07-20","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}