{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T11:20:06Z","timestamp":1775042406996,"version":"3.50.1"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2010,12,1]],"date-time":"2010-12-01T00:00:00Z","timestamp":1291161600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2010,12]]},"abstract":"<jats:p>Today, parallel corpus-based systems dominate the transliteration landscape. But resource-scarce languages do not enjoy the luxury of a large parallel transliteration corpus. For these languages, rule-based transliteration is the only viable option. In this article, we show that by properly harnessing monolingual resources in conjunction with a manually created rule base, one can achieve reasonable transliteration performance. We achieve this performance by exploiting the power of Character Sequence Modeling (CSM), which requires only monolingual resources. We present the results of our rule-based system for Hindi to English, English to Hindi, and Persian to English transliteration tasks. We also perform an extrinsic evaluation of transliteration systems in the context of Cross-Lingual Information Retrieval. 
Another important contribution of our work is to explain the widely varying accuracy numbers reported in the transliteration literature, in terms of the entropy of the language pairs and the datasets involved.<\/jats:p>","DOI":"10.1145\/1838751.1838753","type":"journal-article","created":{"date-parts":[[2010,12,20]],"date-time":"2010-12-20T15:55:04Z","timestamp":1292860504000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Transliteration for Resource-Scarce Languages"],"prefix":"10.1145","volume":"9","author":[{"given":"Manoj K.","family":"Chinnakotla","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Bombay"}]},{"given":"Om P.","family":"Damani","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Bombay"}]},{"given":"Avijit","family":"Satoskar","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Bombay"}]}],"member":"320","published-online":{"date-parts":[[2010,12]]},"reference":[
{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/956863.956890"},
{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073150"},
{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.382.0183"},
{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220441"},
{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCOM.1984.1096090"},
{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the Natural Language Pacific Rim Symposium (NLPRS\u201997)","author":"Collier N.","unstructured":"Collier, N., Kumano, A., and Hirakawa, H. 1997. Acquisition of English-Japanese proper nouns from noisy-parallel newswire articles using Katakana matching. In Proceedings of the Natural Language Pacific Rim Symposium (NLPRS\u201997). 309--314."},
{"key":"e_1_2_1_7_1","volume-title":"Graphology: Types of writing systems","author":"Crystal D.","year":"1992","unstructured":"Crystal, D. 1992. Graphology: Types of writing systems. In The Cambridge Encyclopedia of Language 2nd Ed. Cambridge University Press, 199--205."},
{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the 10th Text Retrieval Conference (TREC\u201901)","author":"Darwish K.","unstructured":"Darwish, K., Doermann, D., Jones, R., Oard, D., and Rautiainen, M. 2001. TREC-10 experiments at University of Maryland CLIR and Video. In Proceedings of the 10th Text Retrieval Conference (TREC\u201901). NIST."},
{"key":"e_1_2_1_9_1","volume-title":"Proceedings of the COLING\/ACL on Main Conference Poster Sessions (COLING\u201906)","author":"Ekbal A.","unstructured":"Ekbal, A., Naskar, S. K., and Bandyopadhyay, S. 2006. A modified joint source-channel model for transliteration. In Proceedings of the COLING\/ACL on Main Conference Poster Sessions (COLING\u201906), 191--198."},
{"key":"e_1_2_1_10_1","volume-title":"International Encyclopedia of Linguistics","author":"Frawley W. J.","unstructured":"Frawley, W. J. 1992. Writing and written language: Writing systems. In International Encyclopedia of Linguistics, vol. 4, 2nd Ed. Oxford University Press, 383--384.","edition":"2"},
{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the Workshop on CLIA, Addressing the Needs of Multilingual Societies (IJCNLP\u201908)","author":"Ganesh S.","unstructured":"Ganesh, S., Harsha, S., Pingali, P., and Verma, V. 2008. Statistical transliteration for cross language information retrieval using HMM alignment and CRF. In Proceedings of the Workshop on CLIA, Addressing the Needs of Multilingual Societies (IJCNLP\u201908)."},
{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-30211-7_12"},
{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1218976"},
{"key":"e_1_2_1_14_1","volume-title":"Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Demonstration Session (ACL\u201907)","author":"Hoang H.","unstructured":"Hoang, H., Birch, A., Callison-Burch, C., Zens, R., Aachen, R., Constantin, A., Federico, M., Bertoldi, N., Dyer, C., Cowan, B., Shen, W., Moran, C., and Bojar, O. 2007. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Demonstration Session (ACL\u201907), 177--180."},
{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220630"},
{"key":"e_1_2_1_16_1","unstructured":"Jurafsky D. and Martin J. H. 2008. Speech and Language Processing: An Introduction to Natural Language Processing Computational Linguistics and Speech Recognition 2nd Ed. Prentice Hall."},
{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL\u201907)","author":"Karimi S.","unstructured":"Karimi, S., Turpin, A., and Scholer, F. 2007. Corpus effects on the evaluation of automated transliteration systems. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL\u201907), 640--647."},
{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/APCCAS.1998.743882"},
{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220278"},
{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.3115\/976909.979634"},
{"key":"e_1_2_1_22_1","unstructured":"Kudo T. 2003. CRF++: Yet Another CRF Toolkit. http:\/\/crfpp.sourceforge.net."},
{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1277741.1277876"},
{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the 1st International Joint Conference on Natural Language Processing (IJCNLP\u201909)","author":"Li H.","unstructured":"Li, H., Kumaran, A., Zhang, M., and Pervouchine, V. 2009. Whitepaper of NEWS 2009 machine transliteration shared task. In Proceedings of the 1st International Joint Conference on Natural Language Processing (IJCNLP\u201909)."},
{"key":"e_1_2_1_25_1","volume-title":"Proceedings of International Conference on Speech Communication and Technology (EUROSPEECH\u201901)","author":"Llitjos A. F.","year":"2001","unstructured":"Llitjos, A. F. and Black, A. W. 2001. Knowledge of language origin improves pronunciation accuracy of proper names. In Proceedings of International Conference on Speech Communication and Technology (EUROSPEECH\u201901), 1919--1922."},
{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220318"},
{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220355.1220458"},
{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103321337421"},
{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072327"},
{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/11562214_40"},
{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1194936.1194938"},
{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1321440.1321536"},
{"key":"e_1_2_1_33_1","unstructured":"Padariya N. Chinnakotla M. Nagesh A. and Damani O. P. 2008. Evaluation of Hindi to English Marathi to English and English to Hindi CLIR at FIRE 2008. In Working Notes of Forum for Information Retrieval and Evaluation (FIRE\u201908)."},
{"key":"e_1_2_1_34_1","unstructured":"Rollings A. G. 2004. The Spelling Patterns of English. LINCOM GmbH."},
{"key":"e_1_2_1_35_1","volume-title":"Companion Volume: Demonstrations. Organizing Committee for the International Conference on Computer Linguistics (COLING\u201908)","author":"Saini T. S.","unstructured":"Saini, T. S., Lehal, G. S., and Kalra, V. S. 2008. Shahmukhi to Gurmukhi transliteration system. In Companion Volume: Demonstrations. Organizing Committee for the International Conference on Computer Linguistics (COLING\u201908) 1, 177--180."},
{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL\u201907)","author":"Sherif T.","unstructured":"Sherif, T. and Kondrak, G. 2007. Substring-based transliteration. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL\u201907), 944--951."},
{"key":"e_1_2_1_37_1","unstructured":"Shukla S. 2000. Hindi phonology. LINCOM GmbH."},
{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220175.1220185"},
{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP\u201902)","author":"Stolcke A.","year":"2002","unstructured":"Stolcke, A. 2002. SRILM - An extensible language modeling toolkit. In Proceedings of the International Conference on Spoken Language Processing (ICSLP\u201902)."},
{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP\u201908)","author":"Surana H.","unstructured":"Surana, H. and Singh, A. K. 2008. A more discerning and adaptable multilingual transliteration mechanism for Indian languages. In Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP\u201908)."},
{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL\u201909)","author":"Udupa R.","unstructured":"Udupa, R., Saravanan, K., Kumaran, A., and Jagarlamudi, J. 2009. MINT: A method for effective and scalable mining of named entity transliterations from large comparable corpora. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL\u201909), 799--807."},
{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.3115\/1119384.1119392"},
{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.3115\/980691.980789"},
{"key":"e_1_2_1_45_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing Association for Computational Linguistics (EMNLP\u201906)","author":"Zelenko D.","unstructured":"Zelenko, D. and Aone, C. 2006. Discriminative methods for transliteration. In Proceedings of the Conference on Empirical Methods in Natural Language Processing Association for Computational Linguistics (EMNLP\u201906)."}
],"container-title":["ACM Transactions on Asian Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1838751.1838753","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1838751.1838753","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T11:39:49Z","timestamp":1750246789000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1838751.1838753"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,12]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2010,12]]}},"alternative-id":["10.1145\/1838751.1838753"],"URL":"https:\/\/doi.org\/10.1145\/1838751.1838753","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"value":"1530-0226","type":"print"},{"value":"1558-3430","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010,12]]},"assertion":[{"value":"2009-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2010-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}