{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,11]],"date-time":"2025-12-11T03:01:53Z","timestamp":1765422113621,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T00:00:00Z","timestamp":1544745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2019,6,30]]},"abstract":"<jats:p>Machine translation is the core problem for several natural language processing research across the globe. However, building a translation system involving low-resource languages remains a challenge with respect to statistical machine translation (SMT). This work proposes and studies the effect of a phrase-induced hybrid machine translation system for translation from English to Tamil, under a low-resource setting. Unlike conventional hybrid MT systems, the free-word ordering feature of the target language Tamil is exploited to form a re-ordered target language model and to extend the parallel text corpus for training the SMT. In the current work, a novel rule-based phrase-extraction method, implemented using parts-of-speech (POS) and place-of-pause in both languages is proposed, which is used to pre-process the training corpus for developing the back-off phrase-induced SMT. Further, out-of-vocabulary (OOV) words are handled using speech-based transliteration and two-level thesaurus intersection techniques based on the POS tag of the OOV word. 
To ensure that the input with OOV words does not skip phrase-level translation in the hierarchical model, a phrase-level example-based machine translation approach is adopted to find the closest matching phrase and perform translation followed by OOV replacement. The proposed system results in a bilingual evaluation understudy score of 84.78 and a translation edit rate of 19.12. The performance of the system is compared in terms of adequacy and fluency, with existing translation systems for this specific language pair, and it is observed that the proposed system outperforms its counterparts.<\/jats:p>","DOI":"10.1145\/3265751","type":"journal-article","created":{"date-parts":[[2018,12,14]],"date-time":"2018-12-14T13:19:17Z","timestamp":1544793557000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Pause-Based Phrase Extraction and Effective OOV Handling for Low-Resource Machine Translation Systems"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4632-2518","authenticated-orcid":false,"given":"K.","family":"Mrinalini","sequence":"first","affiliation":[{"name":"SSN College of Engineering, India"}]},{"given":"T.","family":"Nagarajan","sequence":"additional","affiliation":[{"name":"SSN College of Engineering, India"}]},{"given":"P.","family":"Vijayalakshmi","sequence":"additional","affiliation":[{"name":"SSN College of Engineering, India"}]}],"member":"320","published-online":{"date-parts":[[2018,12,14]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"1045","article-title":"Factored statistical machine translation system for English to Tamil language","volume":"22","author":"Kumar M. Anand","year":"2014","journal-title":"Pertanika Journal of Social Science and Humanities"},{"volume-title":"Proceedings of the International Conference on Advances in Computer Science. Springer India, 287--297","author":"Kumar M. 
Anand","key":"e_1_2_1_2_1"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.5555\/2812892.2813409"},{"volume-title":"the First International Workshop on Spoken Languages Technologies for Under-Resourced Languages (SLTU). International Speech Communication Association (ISCA). 70--75","year":"2008","author":"Arora Karunesh","key":"e_1_2_1_4_1"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2015.2405751"},{"volume-title":"Retrieved","year":"2017","author":"EILMT.","key":"e_1_2_1_6_1"},{"volume-title":"Retrieved","year":"2017","key":"e_1_2_1_7_1"},{"volume-title":"Proceedings of the Conference of the Association for Machine Translation in the Americas (AMTA), Association for Computational Linguistics (ACL).","year":"2016","author":"Gujral Biman","key":"e_1_2_1_8_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/1557690.1557706"},{"key":"e_1_2_1_10_1","unstructured":"Krupakar Hans and R. S. Milton. 2016. Improving the performance of neural machine translation involving morphologically rich languages. arXiv preprint arXiv:1612.02482 (2016)."},{"volume-title":"Retrieved","year":"2017","author":"India ITY","key":"e_1_2_1_11_1"},{"key":"e_1_2_1_12_1","unstructured":"Ann Irvine. 2013. Statistical machine translation in low resource settings. In HLT-NAACL. 54--61."},{"volume-title":"Proceedings of the 8th Workshop on Statistical Machine Translation. 
262--270","year":"2013","author":"Irvine Ann","key":"e_1_2_1_13_1"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/1557769.1557821"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073445.1073462"},{"volume-title":"Proceedings of Workshop of the European Association for Machine Translation. 116--123","author":"Lavie Alon","key":"e_1_2_1_16_1"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3115\/980691.980696"},{"volume-title":"English-Tamil Text Corpus. Last accessed on","year":"2018","author":"Resource India Linguistic","key":"e_1_2_1_18_1"},{"volume-title":"V\u00e4rldens 100 st\u00f6rsta spr\u00e5k 2007 (The World\u2019s 100 Largest Languages)","year":"2007","author":"Mikael Parkvall","key":"e_1_2_1_19_1"},{"volume-title":"Proceedings of the 2016 IEEE Region 10 Conference (TENCON\u201916)","author":"Mrinalini K.","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/1699648.1699682"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1093\/ietisy\/e91-d.7.2051"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.3115\/1073083.1073135"},{"volume-title":"Retrieved","year":"2017","author":"Phan Xuan-Hieu","key":"e_1_2_1_24_1"},{"volume-title":"Anusha Prakash, S Aswin Shanmugam, Raghava Krishnan, S. Kishore Prahalad, K. 
Samudravijaya, et al.","year":"2013","author":"Ramani B.","key":"e_1_2_1_25_1"},{"volume-title":"Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL\u201912)","year":"2012","author":"Ramasamy Loganathan","key":"e_1_2_1_26_1"},{"volume-title":"Proceedings of the 2016 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC\u201916)","author":"Sangavi G.","key":"e_1_2_1_27_1"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of Association for Machine Translation in the Americas","volume":"200","author":"Snover Matthew","year":"2006"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/1626431.1626480"},{"key":"e_1_2_1_30_1","unstructured":"L. Sobha, G. Sindhuja, L. Gracy, N. Padmapriya, A. Gnanapriya, and N. H. Parimala. 2016. AUKBC Tamil part-of-speech corpus (AUKBC-TamilPOSCorpus2016v1)."},{"volume-title":"Le","year":"2014","author":"Sutskever Ilya","key":"e_1_2_1_31_1"},{"volume-title":"Retrieved","year":"2017","key":"e_1_2_1_32_1"},{"key":"e_1_2_1_33_1","first-page":"388","article-title":"Approaches to machine translation","volume":"57","author":"Tripathi Sneha","year":"2010","journal-title":"Annals of Library and Information Studies"},{"key":"e_1_2_1_34_1","unstructured":"Harrassowitz Verlag. 2010. Tamil Language for Europeans: Ziegenbalg\u2019s Grammatica Damulica. 
Hubert & Co."},{"volume-title":"Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American","author":"Xu Peng","key":"e_1_2_1_35_1"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3265751","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3265751","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:39:49Z","timestamp":1750210789000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3265751"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,14]]},"references-count":35,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2019,6,30]]}},"alternative-id":["10.1145\/3265751"],"URL":"https:\/\/doi.org\/10.1145\/3265751","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2018,12,14]]},"assertion":[{"value":"2018-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}