{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:04:01Z","timestamp":1750309441254,"version":"3.41.0"},"reference-count":24,"publisher":"Association for Computing Machinery (ACM)","issue":"10","license":[{"start":{"date-parts":[[2024,10,24]],"date-time":"2024-10-24T00:00:00Z","timestamp":1729728000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2024,10,31]]},"abstract":"<jats:p>Transfer learning from pre-trained language models to encoder-decoder translation models faces a challenge due to the mismatch between the tasks of pre-training and fine-tuning. Pre-trained models are not explicitly trained to understand the semantic interactions between different languages. To address this issue, a cross-lingual embedding space is used as an interface during the pre-training phase. This approach enables the decoder inputs to attend to the encoder outputs, similar to the fine-tuning process. However, the effectiveness of this transfer heavily relies on the quality of the pre-trained unsupervised cross-lingual embeddings, which introduces complexity and reduces reproducibility. In this study, we propose a pre-training method called Cross-lingual Interaction Transfer (XLIT), which does not depend on other embedding techniques. XLIT effectively reconciles the task discrepancy in machine translation fine-tuning. We conducted extensive experiments involving four low-resource and six very low-resource translation directions. The results of our experiments demonstrate that our method surpasses randomly initialized models and previous pre-training techniques by up to 9.4 BLEU. 
Furthermore, we demonstrate that our method achieves comparable performance when pre-trained with large-scale monolingual data from various languages.<\/jats:p>","DOI":"10.1145\/3689630","type":"journal-article","created":{"date-parts":[[2024,8,23]],"date-time":"2024-08-23T12:24:42Z","timestamp":1724415882000},"page":"1-13","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["XLIT: A Method to Bridge Task Discrepancy in Machine Translation Pre-training"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4419-0355","authenticated-orcid":false,"given":"Khang","family":"Pham","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, University of Science, Ho Chi Minh, Vietnam and Vietnam National University, Ho Chi Minh, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0884-1635","authenticated-orcid":false,"given":"Long","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of Science, Ho Chi Minh, Vietnam and Vietnam National University, Ho Chi Minh, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2069-1016","authenticated-orcid":false,"given":"Dien","family":"Dinh","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of Science, Ho Chi Minh, Vietnam and Vietnam National University, Ho Chi Minh City, Vietnam"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,10,24]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.5196577"},{"key":"e_1_3_3_3_2","first-page":"261","volume-title":"Proceedings of the Conference of European Association for Machine Translation (EAMT\u201912)","author":"Cettolo Mauro","year":"2012","unstructured":"Mauro Cettolo, 
Christian Girardi, and Marcello Federico. 2012. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the Conference of European Association for Machine Translation (EAMT\u201912). 261\u2013268."},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.747"},{"key":"e_1_3_3_5_2","volume-title":"Advances in Neural Information Processing Systems","author":"Conneau Alexis","year":"2019","unstructured":"Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.). Vol. 32. Curran Associates, Inc. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/c04c19c2c2474dbf5f7ac4372c5b9af1-Paper.pdf"},{"key":"e_1_3_3_6_2","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201918)","author":"Conneau Alexis","year":"2018","unstructured":"Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201918)."},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-1423"},{"key":"e_1_3_3_8_2","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). 
http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_3_3_9_2","volume-title":"International Conference on Learning Representations","author":"Lample Guillaume","year":"2018","unstructured":"Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc\u2019Aurelio Ranzato. 2018. Unsupervised machine translation using monolingual corpora only. In International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=rkYTTf-AZ"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.210"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.101"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-acl.373"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00343"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N19-4009"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.acl-long.21"},{"key":"e_1_3_3_17_2","first-page":"311","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 311\u2013318."},{"key":"e_1_3_3_18_2","article-title":"Improving language understanding by generative pre-training","author":"Radford Alec","year":"2018","unstructured":"Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. 
OpenAI (2018).","journal-title":"OpenAI"},{"key":"e_1_3_3_19_2","first-page":"4518","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Ren Shuo","year":"2021","unstructured":"Shuo Ren, Long Zhou, Shujie Liu, Furu Wei, Ming Zhou, and Shuai Ma. 2021. SemFace: Pre-training encoder and decoder with a semantic interface for neural machine translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 4518\u20134527."},{"key":"e_1_3_3_20_2","first-page":"1","volume-title":"2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA\u201916)","year":"2016","unstructured":"Hammam Riza, Michael Purwoadi, Teduh Uliniansyah, Aw Ai Ti, Sharifah Mahani Aljunied, Luong Chi Mai, Vu Tat Thang, Nguyen Phuong Thai, Vichet Chea, Rapid Sun, Sethserey Sam, Sopheap Seng, Khin Mar Soe, Khin Thandar Nwet, Masao Utiyama, and Chenchen Ding. 2016. Introduction of the Asian Language Treebank. In 2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA\u201916). IEEE, 1\u20136."},{"key":"e_1_3_3_21_2","volume-title":"International Conference on Machine Learning","author":"Song Kaitao","year":"2019","unstructured":"Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. MASS: Masked sequence to sequence pre-training for language generation. 
In International Conference on Machine Learning."},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_3_3_23_2","first-page":"4003","volume-title":"Proceedings of the Twelfth Language Resources and Evaluation Conference","author":"Wenzek Guillaume","year":"2020","unstructured":"Guillaume Wenzek, Marie-Anne Lachaux, Alexis Conneau, Vishrav Chaudhary, Francisco Guzm\u00e1n, Armand Joulin, and Edouard Grave. 2020. CCNet: Extracting high quality monolingual datasets from web crawl data. In Proceedings of the Twelfth Language Resources and Evaluation Conference. European Language Resources Association, Marseille, France, 4003\u20134012. https:\/\/aclanthology.org\/2020.lrec-1.494"},{"key":"e_1_3_3_24_2","first-page":"9378","volume-title":"The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7\u201312, 2020","author":"Yang Jiacheng","year":"2020","unstructured":"Jiacheng Yang, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Weinan Zhang, Yong Yu, and Lei Li. 2020. Towards making the most of BERT in neural machine translation. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7\u201312, 2020. AAAI Press, 9378\u20139385. 
https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/6479"},{"key":"e_1_3_3_25_2","doi-asserted-by":"crossref","first-page":"2624","DOI":"10.18653\/v1\/2020.emnlp-main.208","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201920)","author":"Yang Zhen","year":"2020","unstructured":"Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, and Qi Ju. 2020. CSP: Code-switching pre-training for neural machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201920). 2624\u20132636."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689630","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3689630","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:09:47Z","timestamp":1750295387000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3689630"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,24]]},"references-count":24,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,31]]}},"alternative-id":["10.1145\/3689630"],"URL":"https:\/\/doi.org\/10.1145\/3689630","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2024,10,24]]},"assertion":[{"value":"2023-11-26","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-08-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-10-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}