{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T11:01:13Z","timestamp":1758279673794,"version":"3.44.0"},"reference-count":29,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:p>Most legal text in the Indian judiciary is written in complex English due to historical reasons. However, only a small fraction of the Indian population is comfortable in reading English. Hence legal text needs to be made available in various Indian languages, possibly by translating the available legal text from English. Though there has been a lot of research on translation to and between Indian languages, to our knowledge, there has not been much prior work on such translation in the legal domain. In this work, we construct the first high-quality legal parallel corpus containing aligned text units in English and nine Indian languages, that includes several low-resource languages. We also benchmark the performance of a wide variety of Machine Translation (MT) systems over this corpus, including commercial MT systems, open-source MT systems, and Large Language Models. Through a comprehensive survey by Law practitioners, we check how satisfied they are with the translations by some of these MT systems, and how well automatic MT evaluation metrics agree with the opinions of Law practitioners.<\/jats:p>","DOI":"10.1145\/3748313","type":"journal-article","created":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T11:08:06Z","timestamp":1752232086000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["MILPaC: A Novel Benchmark for Evaluating Translation of Legal Text to Indian Languages"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-4024-0533","authenticated-orcid":false,"given":"Sayan","family":"Mahapatra","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur","place":["Kharagpur, India"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-9568-2062","authenticated-orcid":false,"given":"Debtanu","family":"Datta","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Indian Institute of Technology Kharagpur","place":["Kharagpur, India"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3615-405X","authenticated-orcid":false,"given":"Shubham","family":"Soni","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur","place":["Kharagpur, India"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4420-0077","authenticated-orcid":false,"given":"Adrijit","family":"Goswami","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Indian Institute of Technology Kharagpur","place":["Kharagpur, India"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2306-300X","authenticated-orcid":false,"given":"Saptarshi","family":"Ghosh","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur","place":["Kharagpur, India"]}]}],"member":"320","published-online":{"date-parts":[[2025,8,21]]},"reference":[{"key":"e_1_3_4_2_1","doi-asserted-by":"crossref","unstructured":"Sina Ahmadi Hossein Hassani and Daban Q. Jaff. 2022. Leveraging multilingual news websites for building a kurdish parallel corpus. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 21 5 Article 99 (April 2022) 11 pages.","DOI":"10.1145\/3511806"},{"key":"e_1_3_4_3_1","first-page":"2799","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Baisa V\u00edt","year":"2016","unstructured":"V\u00edt Baisa, Jan Michelfeit, Marek Medve\u010f, and Milo\u0161 Jakub\u00ed\u010dek. 2016. European union language resources in sketch engine. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916). European Language Resources Association (ELRA), 2799\u20132803."},{"key":"e_1_3_4_4_1","unstructured":"Marta R. Costa-juss\u00e0 James Cross Onur \u00c7elebi Maha Elbayad Kenneth Heafield Kevin Heffernan Elahe Kalbassi Janice Lam Daniel Licht Jean Maillard et\u00a0al. 2022. No language left behind: Scaling human-centered machine translation. arXiv:2207.04672. Retrieved from https:\/\/arxiv.org\/abs\/2207.04672"},{"key":"e_1_3_4_5_1","doi-asserted-by":"crossref","first-page":"22","DOI":"10.18653\/v1\/2022.findings-aacl.3","volume-title":"Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022","author":"Dabre Raj","year":"2022","unstructured":"Raj Dabre and Aneerav Sukhoo. 2022. KreolMorisienMT: A dataset for Mauritian Creole Machine Translation. In Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022. Yulan He, Heng Ji, Sujian Li, Yang Liu, and Chua-Hui Chang (Eds.), Association for Computational Linguistics, Online only, 22\u201329."},{"key":"e_1_3_4_6_1","first-page":"36","volume-title":"Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019)","author":"Escribe Marie","year":"2019","unstructured":"Marie Escribe. 2019. Human evaluation of neural machine translation: The case of deep learning. In Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019). 36\u201346."},{"key":"e_1_3_4_7_1","doi-asserted-by":"publisher","unstructured":"Naman Goyal Cynthia Gao Vishrav Chaudhary Peng-Jen Chen Guillaume Wenzek Da Ju Sanjana Krishnan Marc\u2019Aurelio Ranzato Francisco Guzm\u00e1n and Angela Fan. 2022. The Flores-101 evaluation benchmark for low-resource and multilingual machine translation. Transactions of the Association for Computational Linguistics 10 (2022) 522\u2013538. DOI:10.1162\/tacl_a_00474","DOI":"10.1162\/tacl_a_00474"},{"key":"e_1_3_4_8_1","unstructured":"Barry Haddow and Faheem Kirefu. 2020. PMIndia - A collection of parallel corpora of languages of India. arXiv:2001.09907. Retrieved from https:\/\/arxiv.org\/abs\/2001.09907"},{"key":"e_1_3_4_9_1","doi-asserted-by":"crossref","first-page":"2612","DOI":"10.18653\/v1\/2020.emnlp-main.207","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Hasan Tahmid","year":"2020","unstructured":"Tahmid Hasan, Abhik Bhattacharjee, Kazi Samin, Masum Hasan, Madhusudan Basak, M. Sohel Rahman, and Rifat Shahriyar. 2020. Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 2612\u20132623."},{"key":"e_1_3_4_10_1","doi-asserted-by":"crossref","unstructured":"Thangkhanhau Haulai and Jamal Hussain. 2023. Construction of mizo: English parallel corpus for machine translation. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22 8 Article 220 (August 2023) 12 pages.","DOI":"10.1145\/3610404"},{"key":"e_1_3_4_11_1","first-page":"175","volume-title":"Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914)","author":"H\u00f6fler Stefan","year":"2014","unstructured":"Stefan H\u00f6fler and Kyoko Sugisaki. 2014. Constructing and exploiting an automatically annotated resource of legislative texts. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914). European Language Resources Association (ELRA), Reykjavik, Iceland, 175\u2013180."},{"key":"e_1_3_4_12_1","first-page":"79","volume-title":"Proceedings of Machine Translation Summit X: Papers","author":"Koehn Philipp","year":"2005","unstructured":"Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of Machine Translation Summit X: Papers. 79\u201386."},{"key":"e_1_3_4_13_1","article-title":"IndoWordnet Parallel Corpus","author":"Kunchukuttan Anoop","year":"2020","unstructured":"Anoop Kunchukuttan. 2020. IndoWordnet Parallel Corpus. Retrieved 24th July 2025 from https:\/\/github.com\/anoopkunchukuttan\/indowordnet_parallel. (2020).","journal-title":"https:\/\/github.com\/anoopkunchukuttan\/indowordnet_parallel"},{"key":"e_1_3_4_14_1","volume-title":"Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018)","author":"Kunchukuttan Anoop","year":"2018","unstructured":"Anoop Kunchukuttan, Pratik Mehta, and Pushpak Bhattacharyya. 2018. The IIT Bombay English-Hindi parallel corpus. In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan."},{"key":"e_1_3_4_15_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00343"},{"key":"e_1_3_4_16_1","first-page":"3417","volume-title":"Proceedings of the 13th Language Resources and Evaluation Conference","author":"Mujadia Vandan","year":"2022","unstructured":"Vandan Mujadia and Dipti Sharma. 2022. The LTRC Hindi-Telugu parallel corpus. In Proceedings of the 13th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 3417\u20133424."},{"key":"e_1_3_4_17_1","first-page":"1","volume-title":"Proceedings of the 7th Workshop on Asian Translation","author":"Nakazawa Toshiaki","year":"2020","unstructured":"Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, et\u00a0al. 2020. Overview of the 7th workshop on Asian translation. In Proceedings of the 7th Workshop on Asian Translation. Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Anoop Kunchukuttan, Win Pa Pa, Ond\u0159ej Bojar, Shantipriya Parida, Isao Goto, Hidaya Mino, et\u00a0al. (Eds.), Association for Computational Linguistics, Suzhou, China, 1\u201344. Retrieved from https:\/\/aclanthology.org\/2020.wat-1.1"},{"key":"e_1_3_4_18_1","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, 311\u2013318."},{"key":"e_1_3_4_19_1","first-page":"14","volume-title":"Proceedings of the WILDRE5\u20135th Workshop on Indian Language Data: Resources and Evaluation","author":"Parida Shantipriya","year":"2020","unstructured":"Shantipriya Parida, Satya Ranjan Dash, Ond\u0159ej Bojar, Petr Motlicek, Priyanka Pattnaik, and Debasish Kumar Mallick. 2020. OdiEnCorp 2.0: Odia-English parallel corpus for machine translation. In Proceedings of the WILDRE5\u20135th Workshop on Indian Language Data: Resources and Evaluation. 14\u201319."},{"key":"e_1_3_4_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4770"},{"key":"e_1_3_4_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-6319"},{"key":"e_1_3_4_22_1","first-page":"113","volume-title":"Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012)","author":"Ramasamy Loganathan","year":"2012","unstructured":"Loganathan Ramasamy, Ond\u0159ej Bojar, and Zden\u011bk \u017dabokrtsk\u00fd. 2012. Morphological processing for English-Tamil statistical machine translation. In Proceedings of the Workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012). 113\u2013122."},{"key":"e_1_3_4_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00452"},{"key":"e_1_3_4_24_1","doi-asserted-by":"crossref","unstructured":"Said Salloum Tarek Gaber Sunil Vadera and Khaled Shaalan. 2023. A new English\/Arabic parallel corpus for phishing emails. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22 7 Article 201 (July 2023) 17 pages.","DOI":"10.1145\/3606031"},{"key":"e_1_3_4_25_1","first-page":"3743","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Siripragada Shashank","year":"2020","unstructured":"Shashank Siripragada, Jerin Philip, Vinay P. Namboodiri, and C. V. Jawahar. 2020. A multilingual parallel corpora collection effort for Indian languages. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 3743\u20133751."},{"key":"e_1_3_4_26_1","unstructured":"Yuqing Tang Chau Tran Xian Li Peng-Jen Chen Naman Goyal Vishrav Chaudhary Jiatao Gu and Angela Fan. 2020. Multilingual translation with extensible multilingual pretraining and finetuning. arXiv:2008.00401. Retrieved from https:\/\/arxiv.org\/abs\/2008.00401"},{"key":"e_1_3_4_27_1","volume-title":"Proceedings of the European Association for Machine Translation Conferences\/Workshops","author":"Tiedemann J\u00f6rg","year":"2020","unstructured":"J\u00f6rg Tiedemann and Santhosh Thottingal. 2020. OPUS-MT \u2013 building open translation services for the world. In Proceedings of the European Association for Machine Translation Conferences\/Workshops."},{"key":"e_1_3_4_28_1","unstructured":"Yonghui Wu Mike Schuster Z. Chen Quoc V. Le Mohammad Norouzi Wolfgang Macherey Maxim Krikun Yuan Cao Qin Gao Klaus Macherey et\u00a0al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from https:\/\/arxiv.org\/abs\/1609.08144"},{"key":"e_1_3_4_29_1","doi-asserted-by":"crossref","first-page":"577","DOI":"10.18653\/v1\/2020.emnlp-main.43","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Zhang Shiyue","year":"2020","unstructured":"Shiyue Zhang, Benjamin Frey, and Mohit Bansal. 2020. ChrEn: Cherokee-English machine translation for endangered language revitalization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 577\u2013595."},{"key":"e_1_3_4_30_1","first-page":"3530","volume-title":"Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916)","author":"Ziemski Micha\u0142","year":"2016","unstructured":"Micha\u0142 Ziemski, Marcin Junczys-Dowmunt, and Bruno Pouliquen. 2016. The United Nations Parallel Corpus v1.0. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC\u201916). European Language Resources Association (ELRA), Portoro\u017e, Slovenia, 3530\u20133534."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3748313","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T13:28:13Z","timestamp":1755782893000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3748313"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,21]]},"references-count":29,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2025,8,31]]}},"alternative-id":["10.1145\/3748313"],"URL":"https:\/\/doi.org\/10.1145\/3748313","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2025,8,21]]},"assertion":[{"value":"2023-12-18","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-11-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-08-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}