{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T02:33:35Z","timestamp":1778812415206,"version":"3.51.4"},"reference-count":41,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2020,2,7]],"date-time":"2020-02-07T00:00:00Z","timestamp":1581033600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61525205 and 61876120"],"award-info":[{"award-number":["61525205 and 61876120"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,5,31]]},"abstract":"<jats:p>\n            In neural machine translation (NMT), the source and target words are at the two ends of a large deep neural network, normally mediated by a series of non-linear activations. The problem with such consequent non-linear activations is that they significantly decrease the magnitude of the gradient in a deep neural network, and thus gradually loosen the interaction between source words and their translations. As a result, a source word may be incorrectly translated into a target word out of its translational equivalents. In this article, we propose\n            <jats:italic>short-path units<\/jats:italic>\n            (SPUs) to strengthen the association of source and target words by allowing information flow over adjacent layers effectively via linear interpolation. 
In particular, we enrich three critical NMT components with SPUs: (1) an enriched encoding model with SPU, which interpolates source word embeddings linearly into source annotations; (2) an enriched decoding model with SPU, which enables the source context to flow linearly to target-side hidden states; and (3) an enriched output model with SPU, which further allows linear interpolation of target-side hidden states into output states. Experimentation on Chinese-to-English, English-to-German, and low-resource Tibetan-to-Chinese translation tasks demonstrates that the linear interpolation of SPUs significantly improves the overall translation quality by 1.88, 1.43, and 3.75 BLEU, respectively. Moreover, detailed analysis shows that our approaches substantially strengthen the association of source and target words. These results indicate that the proposed model is effective in both rich- and low-resource scenarios.\n          <\/jats:p>","DOI":"10.1145\/3377851","type":"journal-article","created":{"date-parts":[[2020,4,4]],"date-time":"2020-04-04T03:08:03Z","timestamp":1585969683000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Improving Neural Machine Translation with Linear Interpolation of a Short-Path Unit"],"prefix":"10.1145","volume":"19","author":[{"given":"Yachao","family":"Li","sequence":"first","affiliation":[{"name":"Soochow University 8 Northwest Minzu University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junhui","family":"Li","sequence":"additional","affiliation":[{"name":"Soochow University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Min","family":"Zhang","sequence":"additional","affiliation":[{"name":"Soochow University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yixin","family":"Li","sequence":"additional","affiliation":[{"name":"Harbin Institute of
Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Zou","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,2,7]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Jamie Ryan Kiros, and Geoffrey E. Hinton","author":"Ba Jimmy Lei","year":"2016","unstructured":"Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. 2016. Layer normalization. Computing Research Repository. arXiv:1607.06450."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1338"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/72.279181"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the Workshop on Syntax, Semantics, and Structure in Statistical Translation (SSST\u201914)","author":"Cho Kyunghyun","year":"2014","unstructured":"
Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of the Workshop on Syntax, Semantics, and Structure in Statistical Translation (SSST\u201914). 103--111."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1179"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1167"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.330186"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1012"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3342353"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_12_1","volume-title":"Salakhutdinov","author":"Hinton Geoffrey E.","year":"2012","unstructured":"Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. Computing Research Repository. arXiv:1207.0580."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-3014"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00300"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)","author":"Kalchbrenner Nal","year":"2016","unstructured":"
Nal Kalchbrenner, Ivo Danihelka, and Alex Graves. 2016. Grid long short-term memory. In Proceedings of the 4th International Conference on Learning Representations (ICLR\u201916)."},{"key":"e_1_2_1_17_1","unstructured":"Jaeyoung Kim, Mostafa El-Khamy, and Jungwon Lee. 2017. Residual LSTM: Design of a deep recurrent architecture for distant speech recognition. arXiv:1701.03360v3."},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)","author":"Diederik","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR\u201915)."},{"key":"e_1_2_1_19_1","first-page":"388","volume-title":"Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201904)","author":"Koehn Philipp","year":"2004","unstructured":"Philipp Koehn. 2004. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201904). 388--395.
https:\/\/www.aclweb.org\/anthology\/W04-3250"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1164"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1064"},{"key":"e_1_2_1_22_1","first-page":"3038","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics (COLING\u201918)","author":"Li Yachao","year":"2018","unstructured":"Yachao Li, Junhui Li, and Min Zhang. 2018. Adaptive weighting for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics (COLING\u201918). 3038--3048. https:\/\/www.aclweb.org\/anthology\/C18-1257"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI\u201915)","author":"Liu Yang","year":"2015","unstructured":"Yang Liu and Maosong Sun. 2015. Contrastive unsupervised word alignment with non-local features. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI\u201915). 857--868."},{"key":"e_1_2_1_24_1","first-page":"1412","volume-title":"Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201915)","author":"Luong Thang","year":"2015","unstructured":"
Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP\u201915). 1412--1421. DOI:https:\/\/doi.org\/10.18653\/v1\/D15-1166"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-4710"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1162\/089120103321337421"},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201902)","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201902). 311--318. DOI:https:\/\/doi.org\/10.3115\/1073083.1073135"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of the 30th International Conference on Machine Learning (ICML\u201913)","author":"Pascanu Razvan","year":"2013","unstructured":"Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. 2013. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML\u201913).
1310--1318."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the ICML 2015 Deep Learning Workshop.","author":"Srivastava Rupesh Kumar","year":"2015","unstructured":"Rupesh Kumar Srivastava, Klaus Greff, and Jurgen Schmidhuber. 2015. Highway networks. In Proceedings of the ICML 2015 Deep Learning Workshop."},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918)","author":"Su Jinsong","year":"2018","unstructured":"Jinsong Su, Shan Wu, Deyi Xiong, Yaojie Lu, Xianpei Han, and Biao Zhang. 2018. Variational recurrent neural machine translation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00048"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1008"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of the 2017 Conference on Neural Information Processing Systems (NIPS\u201917)","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need.
In Proceedings of the 2017 Conference on Neural Information Processing Systems (NIPS\u201917). 5998--6008."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P17-1013"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917)","author":"Wang Xing","year":"2017","unstructured":"Xing Wang, Zhengdong Lu, Zhaopeng Tu, Hang Li, Deyi Xiong, and Min Zhang. 2017. Neural machine translation advised by statistical machine translation. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI\u201917). 3330--3336."},{"key":"e_1_2_1_37_1","unstructured":"Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918)","author":"Xiong Hao","year":"2018","unstructured":"Hao Xiong, Zhongjun He, Xiaoguang Hu, and Hua Wu. 2018. Multi-channel encoder for neural machine translation.
In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI\u201918)."},{"key":"e_1_2_1_39_1","unstructured":"Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, and Chris Dyer. 2015. Depth-gated LSTM. arXiv:1508.03790v4."},{"key":"e_1_2_1_40_1","volume-title":"THUMT: An open source toolkit for neural machine translation. arXiv:1706.06415.","author":"Zhang Jiacheng","year":"2017","unstructured":"Jiacheng Zhang, Yanzhuo Ding, Shiqi Shen, Yong Cheng, Maosong Sun, Huanbo Luan, and Yang Liu. 2017. THUMT: An open source toolkit for neural machine translation. arXiv:1706.06415."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00105"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377851","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3377851","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:38:52Z","timestamp":1750199932000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3377851"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,7]]},"references-count":41,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,5,31]]}},"alternative-id":["10.1145\/3377851"],"URL":"https:\/\/doi.org\/10.1145\/3377851","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,7]]},"assertion":[{"value":"2019-04-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-07","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}