{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:27:30Z","timestamp":1750220850116,"version":"3.41.0"},"reference-count":23,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2019,11,1]],"date-time":"2019-11-01T00:00:00Z","timestamp":1572566400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61673289,61273319"],"award-info":[{"award-number":["61673289,61273319"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2016YFE0132100"],"award-info":[{"award-number":["2016YFE0132100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2020,3,31]]},"abstract":"<jats:p>The convolutional sequence-to-sequence (ConvS2S) machine translation system is one of the typical neural machine translation (NMT) systems. Training the ConvS2S model tends to get stuck in a local optimum in our pre-studies. To overcome this inferior behavior, we propose to de-train a trained ConvS2S model in a mild way and retrain to find a better solution globally. In particular, the trained parameters of one layer of the NMT network are abandoned by re-initialization while other layers\u2019 parameters are kept at the same time to kick off re-optimization from a new start point and safeguard the new start point not too far from the previous optimum. This procedure is executed layer by layer until all layers of the ConvS2S model are explored. Experiments show that when compared to various measures for escaping from the local optimum, including initialization with random seeds, adding perturbations to the baseline parameters, and continuing training (con-training) with the baseline models, our method consistently improves the ConvS2S translation quality across various language pairs and achieves better performance.<\/jats:p>","DOI":"10.1145\/3358414","type":"journal-article","created":{"date-parts":[[2019,11,1]],"date-time":"2019-11-01T12:18:31Z","timestamp":1572610711000},"page":"1-15","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Layer-Wise De-Training and Re-Training for ConvS2S Machine Translation"],"prefix":"10.1145","volume":"19","author":[{"given":"Hongfei","family":"Yu","sequence":"first","affiliation":[{"name":"Soochow University"}]},{"given":"Xiaoqing","family":"Zhou","sequence":"additional","affiliation":[{"name":"Soochow University"}]},{"given":"Xiangyu","family":"Duan","sequence":"additional","affiliation":[{"name":"Soochow University"}]},{"given":"Min","family":"Zhang","sequence":"additional","affiliation":[{"name":"Soochow University"}]}],"member":"320","published-online":{"date-parts":[[2019,11]]},"reference":[{"volume-title":"Proceedings of the 3rd International Conference on Learning Representations.","year":"2015","author":"Bahdanau Dzmitry","key":"e_1_2_1_1_1"},{"key":"e_1_2_1_2_1","first-page":"2","article-title":"2002. 
Learning long-term dependencies with gradient descent is difficult.","volume":"5","author":"Bengio Y.","year":"2002","journal-title":"IEEE Transactions on Neural Networks"},{"volume-title":"Findings of the 2017 conference on machine translation (WMT17). In Proceedings of the Conference on Machine Translation. 169--214","year":"2017","author":"Bojar Ondrej","key":"e_1_2_1_3_1"},{"key":"e_1_2_1_4_1","unstructured":"James Bradbury Stephen Merity Caiming Xiong and Richard Socher. 2016. Quasi-recurrent neural networks. arXiv:1611.01576.  James Bradbury Stephen Merity Caiming Xiong and Richard Socher. 2016. Quasi-recurrent neural networks. arXiv:1611.01576."},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Michael Denkowski and Graham Neubig. 2017. Stronger baselines for trustable results in neural machine translation. arXiv:1706.09733.  Michael Denkowski and Graham Neubig. 2017. Stronger baselines for trustable results in neural machine translation. arXiv:1706.09733.","DOI":"10.18653\/v1\/W17-3203"},{"volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201917)","author":"Gehring Jonas","key":"e_1_2_1_6_1"},{"volume-title":"Dauphin","year":"2017","author":"Gehring Jonas","key":"e_1_2_1_7_1"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_9_1","unstructured":"Sabastien Jean Kyunghyun Cho Roland Memisevic and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. arXiv:1412.2007.  Sabastien Jean Kyunghyun Cho Roland Memisevic and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. arXiv:1412.2007."},{"volume-title":"International Encyclopedia of Statistical Science","author":"Jolliffe Ian","key":"e_1_2_1_10_1"},{"volume-title":"Alex Graves, and Koray Kavukcuoglu.","year":"2016","author":"Kalchbrenner Nal","key":"e_1_2_1_11_1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611974973.60"},{"volume-title":"Proceedings of the 2015 Conference on Empirical Methods on Natural Language Processing (EMNLP\u201915)","author":"Luong Minh-Thang","key":"e_1_2_1_13_1"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P15-1003"},{"volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201902)","year":"2002","author":"Papineni Kishore","key":"e_1_2_1_15_1"},{"volume-title":"Proceedings of the International Conference on Machine Learning. 1310--1318","year":"2013","author":"Pascanu Razvan","key":"e_1_2_1_16_1"},{"volume-title":"Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS\u201917)","year":"2017","author":"Paszke Adam","key":"e_1_2_1_17_1"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1162"},{"volume-title":"Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15, 1","year":"2014","author":"Srivastava Nitish","key":"e_1_2_1_19_1"},{"volume-title":"Proceedings of the International Conference on Machine Learning. 1139--1147","year":"2013","author":"Sutskever Ilya","key":"e_1_2_1_20_1"},{"volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS\u201914)","author":"Sutskever Ilya","key":"e_1_2_1_21_1"},{"key":"e_1_2_1_22_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 
{"key":"e_1_2_1_22_1","unstructured":"Ashish Vaswani Noam Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan N. Gomez Lukasz Kaiser and Illia Polosukhin. 2017. Attention is all you need. arXiv:1706.03762."},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Biao Zhang Deyi Xiong and Jinsong Su. 2018. Accelerating neural transformer via an average attention network. arXiv:1805.00631.","DOI":"10.18653\/v1\/P18-1166"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358414","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3358414","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:23:12Z","timestamp":1750202592000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3358414"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11]]},"references-count":23,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,3,31]]}},"alternative-id":["10.1145\/3358414"],"URL":"https:\/\/doi.org\/10.1145\/3358414","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"type":"print","value":"2375-4699"},{"type":"electronic","value":"2375-4702"}],"subject":[],"published":{"date-parts":[[2019,11]]},"assertion":[{"value":"2018-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-11-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
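The abstract above outlines a layer-wise de-train-and-retrain loop. Below is a minimal illustrative PyTorch sketch of that loop, assuming a generic model whose top-level submodules stand in for the ConvS2S layers; the `train_epochs` and `validate` callables, the keep-if-better selection rule, and all names are assumptions for illustration, not the authors' actual implementation.

```python
# Illustrative sketch only: layer-wise de-training and re-training as described
# in the abstract. The training/validation callables and the keep-if-better
# selection rule are assumptions, not the paper's exact procedure.
import copy

import torch.nn as nn


def detrain_retrain(trained_model: nn.Module, train_epochs, validate) -> nn.Module:
    """Re-initialize one layer at a time, keep all other trained parameters,
    retrain from that new start point, and keep the result if it improves the
    validation score. Repeats until every top-level layer has been explored."""
    best = copy.deepcopy(trained_model)
    best_score = validate(best)
    for name, _ in trained_model.named_children():
        candidate = copy.deepcopy(best)  # start close to the previous optimum
        layer = getattr(candidate, name)
        # De-train: abandon this layer's trained parameters by re-initialization;
        # every other layer keeps its trained values.
        for module in layer.modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()
        train_epochs(candidate)  # re-optimize from the new start point
        score = validate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


if __name__ == "__main__":
    # Toy stand-in for a trained model; real use would plug in the NMT training
    # loop and a BLEU-based dev-set validator in place of these placeholders.
    toy = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
    no_op_train = lambda model: None  # placeholder for actual retraining
    param_norm_score = lambda model: -sum(
        p.abs().sum().item() for p in model.parameters()
    )
    detrain_retrain(toy, no_op_train, param_norm_score)
```

The deep copies keep the currently best parameters intact while a single layer is re-initialized, which matches the abstract's requirement that the new start point stay near the previous optimum; whether improvements are carried forward between layer iterations is an assumption here, as the abstract does not specify it.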