{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:02:06Z","timestamp":1750309326490,"version":"3.41.0"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2024,6,21]],"date-time":"2024-06-21T00:00:00Z","timestamp":1718928000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Institute of Science and Development, Chinese Academy of Sciences","award":["GHJ-ZLZX-2023-04"],"award-info":[{"award-number":["GHJ-ZLZX-2023-04"]}]},{"name":"Fundamental-Plus Plan","award":["2022-JX-0840-1"],"award-info":[{"award-number":["2022-JX-0840-1"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2024,6,30]]},"abstract":"<jats:p>Domain adaptation proves to be an effective solution for addressing inadequate translation performance within specific domains. However, the straightforward approach of mixing data from multiple domains to obtain the multi-domain neural machine translation (NMT) model can give rise to the parameter interference between domains problem, resulting in a degradation of overall performance. To address this, we introduce a multi-domain adaptive NMT method aimed at learning domain specific sub-layer latent variable and employ the Gumbel-Softmax reparameterization technique to concurrently train both model parameters and domain specific sub-layer latent variable. This approach facilitates learning private domain-specific knowledge while sharing common domain-invariant knowledge, effectively mitigating the parameter interference problem. The experimental results show that our proposed method significantly improved by up to 7.68 and 3.71 BLEU compared with the baseline model in English-German and Chinese-English public multi-domain datasets, respectively.<\/jats:p>","DOI":"10.1145\/3661305","type":"journal-article","created":{"date-parts":[[2024,4,29]],"date-time":"2024-04-29T11:23:07Z","timestamp":1714389787000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning Domain Specific Sub-layer Latent Variable for Multi-Domain Adaptation Neural Machine Translation"],"prefix":"10.1145","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-0271-0040","authenticated-orcid":false,"given":"Shuanghong","family":"Huang","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1691-1584","authenticated-orcid":false,"given":"Chong","family":"Feng","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China"},{"name":"Southeast Academy of Information Technology, Beijing Institute of Technology, Putian, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9296-9905","authenticated-orcid":false,"given":"Ge","family":"Shi","sequence":"additional","affiliation":[{"name":"Beijing University of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9518-5391","authenticated-orcid":false,"given":"Zhengjun","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8271-3236","authenticated-orcid":false,"given":"Xuan","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China"},{"name":"Southeast Academy of Information Technology, Beijing Institute of Technology, Putian, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0418-8538","authenticated-orcid":false,"given":"Xinyan","family":"Li","sequence":"additional","affiliation":[{"name":"China North Vehicle Research Institute, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9895-1511","authenticated-orcid":false,"given":"Xiaomei","family":"Wang","sequence":"additional","affiliation":[{"name":"Institute of Science and Development, Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2024,6,21]]},"reference":[{"key":"e_1_3_3_2_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyung Hyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)."},{"key":"e_1_3_3_3_2","first-page":"1538","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing","author":"Bapna Ankur","year":"2019","unstructured":"Ankur Bapna and Orhan Firat. 2019. Simple, scalable adaptation for neural machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing. 1538\u20131548."},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4712"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_3_3_6_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-2061"},{"key":"e_1_3_3_7_2","first-page":"1304","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Chu Chenhui","year":"2018","unstructured":"Chenhui Chu and Rui Wang. 2018. A survey of domain adaptation for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics. 1304\u20131319."},{"key":"e_1_3_3_8_2","doi-asserted-by":"crossref","first-page":"1799","DOI":"10.18653\/v1\/P18-1167","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Domhan Tobias","year":"2018","unstructured":"Tobias Domhan. 2018. How much attention do you need? A granular analysis of neural machine translation architectures. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1799\u20131808."},{"key":"e_1_3_3_9_2","first-page":"3081","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Gu Shuhao","year":"2019","unstructured":"Shuhao Gu, Yang Feng, and Qun Liu. 2019. Improving domain adaptation translation with domain invariant and specific information. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 3081\u20133091."},{"key":"e_1_3_3_10_2","first-page":"3942","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Gu Shuhao","year":"2021","unstructured":"Shuhao Gu, Yang Feng, and Wanying Xie. 2021. Pruning-then-expanding model for domain adaptation of neural machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 3942\u20133952."},{"key":"e_1_3_3_11_2","first-page":"351","volume-title":"Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)","author":"Hendy Amr","year":"2022","unstructured":"Amr Hendy, Mohamed Abdelghaffar, Mohamed Afify, and Ahmed Y. Tawfik. 2022. Domain specific sub-network for multi-domain neural machine translation. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 351\u2013356."},{"key":"e_1_3_3_12_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917)","author":"Jang Eric","year":"2017","unstructured":"Eric Jang, Shixiang Gu, and Ben Poole. 2017. Categorical reparameterization with Gumbel-Softmax. In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917). OpenReview.net."},{"key":"e_1_3_3_13_2","doi-asserted-by":"crossref","first-page":"36","DOI":"10.18653\/v1\/W18-2705","volume-title":"Proceedings of the 2nd Workshop on Neural Machine Translation and Generation","author":"Khayrallah Huda","year":"2018","unstructured":"Huda Khayrallah, Brian Thompson, Kevin Duh, and Philipp Koehn. 2018. Regularized training objective for continued training for domain adaptation in neural machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation. 36\u201344."},{"key":"e_1_3_3_14_2","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Kingma Diederik P.","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915), Yoshua Bengio and Yann LeCun (Eds.)."},{"key":"e_1_3_3_15_2","first-page":"372","volume-title":"Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP\u201917)","author":"Kobus Catherine","year":"2017","unstructured":"Catherine Kobus, Josep M. Crego, and Jean Senellart. 2017. Domain control for neural machine translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP\u201917). 372\u2013378."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-2012"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729694"},{"key":"e_1_3_3_18_2","first-page":"1736","article-title":"Deep transformers with latent depth","volume":"33","author":"Li Xian","year":"2020","unstructured":"Xian Li, Asa Cooper Stickland, Yuqing Tang, and Xiang Kong. 2020. Deep transformers with latent depth. Advan. Neural Inf. Process. Syst. 33 (2020), 1736\u20131746.","journal-title":"Advan. Neural Inf. Process. Syst."},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i15.17574"},{"key":"e_1_3_3_20_2","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919)","author":"Liu Zhuang","year":"2019","unstructured":"Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. 2019. Rethinking the value of network pruning. In Proceedings of the 7th International Conference on Learning Representations (ICLR\u201919). OpenReview.net."},{"key":"e_1_3_3_21_2","volume-title":"Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign","author":"Luong Minh-Thang","year":"2015","unstructured":"Minh-Thang Luong and Christopher D. Manning. 2015. Stanford neural machine translation systems for spoken language domains. In Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign."},{"key":"e_1_3_3_22_2","first-page":"109","volume-title":"Psychology of Learning and Motivation","author":"McCloskey Michael","year":"1989","unstructured":"Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. In Psychology of Learning and Motivation. Vol. 24. Elsevier, 109\u2013165."},{"key":"e_1_3_3_23_2","article-title":"Transformers without tears: Improving the normalization of self-attention","author":"Nguyen Toan Q.","year":"2019","unstructured":"Toan Q. Nguyen and Julian Salazar. 2019. Transformers without tears: Improving the normalization of self-attention. arXiv preprint arXiv:1910.05895 (2019).","journal-title":"arXiv preprint arXiv:1910.05895"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-6319"},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.97.2.285"},{"key":"e_1_3_3_26_2","article-title":"Domain adaptation and multi-domain adaptation for neural machine translation: A survey","author":"Saunders Danielle","year":"2021","unstructured":"Danielle Saunders. 2021. Domain adaptation and multi-domain adaptation for neural machine translation: A survey. arXiv preprint arXiv:2104.06951 (2021).","journal-title":"arXiv preprint arXiv:2104.06951"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_3_3_28_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Shazeer Noam","year":"2017","unstructured":"Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_29_2","first-page":"78","volume-title":"Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)","author":"Stewart Craig","year":"2020","unstructured":"Craig Stewart, Ricardo Rei, Catarina Farinha, and Alon Lavie. 2020. COMET-deploying a new state-of-the-art MT evaluation metric in production. In Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track). 78\u2013109."},{"issue":"5","key":"e_1_3_3_30_2","first-page":"1530","article-title":"Exploring discriminative word-level domain contexts for multi-domain neural machine translation","volume":"43","author":"Su Jinsong","year":"2019","unstructured":"Jinsong Su, Jiali Zeng, Jun Xie, Huating Wen, Yongjing Yin, and Yang Liu. 2019. Exploring discriminative word-level domain contexts for multi-domain neural machine translation. IEEE Trans. Pattern Anal. Mach. Intell. 43, 5 (2019), 1530\u20131545.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6424"},{"key":"e_1_3_3_32_2","article-title":"Multi-domain neural machine translation","author":"Tars Sander","year":"2018","unstructured":"Sander Tars and Mark Fishel. 2018. Multi-domain neural machine translation. arXiv preprint arXiv:1805.02282 (2018).","journal-title":"arXiv preprint arXiv:1805.02282"},{"key":"e_1_3_3_33_2","first-page":"2062","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Thompson Brian","year":"2019","unstructured":"Brian Thompson, Jeremy Gwinnup, Huda Khayrallah, Kevin Duh, and Philipp Koehn. 2019. Overcoming catastrophic forgetting during domain adaptation of neural machine translation. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2062\u20132068."},{"key":"e_1_3_3_34_2","first-page":"1837","volume-title":"Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Tian Liang","year":"2014","unstructured":"Liang Tian, Derek F. Wong, Lidia S. Chao, Paulo Quaresma, Francisco Oliveira, Yi Lu, Shuo Li, Yiming Wang, and Longyue Wang. 2014. UM-Corpus: A large English-Chinese parallel corpus for statistical machine translation. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC\u201914). 1837\u20131842."},{"key":"e_1_3_3_35_2","first-page":"2214","volume-title":"Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912)","author":"Tiedemann J\u00f6rg","year":"2012","unstructured":"J\u00f6rg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912), Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Ugur Dogan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis (Eds.). European Language Resources Association (ELRA), 2214\u20132218."},{"key":"e_1_3_3_36_2","first-page":"193","volume-title":"Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)","year":"2018","unstructured":"Ashish Vaswani, Samy Bengio, Eugene Brevdo, Francois Chollet, Aidan N. Gomez, Stephan Gouws, Llion Jones, \u0141ukasz Kaiser, Nal Kalchbrenner, Niki Parmar, Ryan Sepassi, Noam Shazeer, and Jakob Uszkoreit. 2018. Tensor2Tensor for neural machine translation. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). 193\u2013199. https:\/\/aclanthology.org\/W18-1819.pdf"},{"key":"e_1_3_3_37_2","first-page":"5998","volume-title":"Proceedings of the Conference on Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Conference on Advances in Neural Information Processing Systems. 5998\u20136008."},{"key":"e_1_3_3_38_2","first-page":"500","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Vilar David","year":"2018","unstructured":"David Vilar. 2018. Learning hidden unit contribution for adapting neural machine translation models. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 500\u2013505."},{"key":"e_1_3_3_39_2","article-title":"DeepNet: Scaling transformers to 1,000 layers","author":"Wang Hongyu","year":"2022","unstructured":"Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, and Furu Wei. 2022. DeepNet: Scaling transformers to 1,000 layers. arXiv preprint arXiv:2203.00555 (2022).","journal-title":"arXiv preprint arXiv:2203.00555"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.3233\/JIFS-212236"},{"key":"e_1_3_3_41_2","first-page":"447","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Zeng Jiali","year":"2018","unstructured":"Jiali Zeng, Jinsong Su, Huating Wen, Yang Liu, Jun Xie, Yongjing Yin, and Jianqiang Zhao. 2018. Multi-domain neural machine translation with word-level domain context discrimination. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. 447\u2013457."},{"key":"e_1_3_3_42_2","volume-title":"Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918)","author":"Zhu Michael","year":"2018","unstructured":"Michael Zhu and Suyog Gupta. 2018. To prune, or not to prune: Exploring the efficacy of pruning for model compression. In Proceedings of the 6th International Conference on Learning Representations (ICLR\u201918). OpenReview.net."}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661305","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3661305","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T00:04:02Z","timestamp":1750291442000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3661305"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,21]]},"references-count":41,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,6,30]]}},"alternative-id":["10.1145\/3661305"],"URL":"https:\/\/doi.org\/10.1145\/3661305","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2024,6,21]]},"assertion":[{"value":"2023-09-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-04-20","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}