{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T14:29:34Z","timestamp":1776090574536,"version":"3.50.1"},"reference-count":68,"publisher":"MIT Press - Journals","issue":"4","license":[{"start":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T00:00:00Z","timestamp":1630540800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,23]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>In recent years, Neural Machine Translation (NMT) has achieved notable results in various translation tasks. However, the word-by-word generation manner determined by the autoregressive mechanism leads to high translation latency of the NMT and restricts its low-latency applications. Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup by generating target words independently and simultaneously. Nevertheless, NAT still takes the word-level cross-entropy loss as the training objective, which is not optimal because the output of NAT cannot be properly evaluated due to the multimodality problem. In this article, we propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality. First, we propose training NAT models to optimize sequence-level evaluation metrics (e.g., BLEU) based on several novel reinforcement algorithms customized for NAT, which outperform the conventional method by reducing the variance of gradient estimation. Second, we introduce a novel training objective for NAT models, which aims to minimize the Bag-of-N-grams (BoN) difference between the model output and the reference sentence. 
The BoN training objective is differentiable and can be calculated efficiently without doing any approximations. Finally, we apply a three-stage training strategy to combine these two methods to train the NAT model. We validate our approach on four translation tasks (WMT14 En\u2194De, WMT16 En\u2194Ro), which shows that our approach largely outperforms NAT baselines and achieves remarkable performance on all translation tasks. The source code is available at https:\/\/github.com\/ictnlp\/Seq-NAT.<\/jats:p>","DOI":"10.1162\/coli_a_00421","type":"journal-article","created":{"date-parts":[[2021,9,2]],"date-time":"2021-09-02T19:00:33Z","timestamp":1630609233000},"page":"891-925","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":12,"title":["Sequence-Level Training for Non-Autoregressive Neural Machine Translation"],"prefix":"10.1162","volume":"47","author":[{"given":"Chenze","family":"Shao","sequence":"first","affiliation":[{"name":"Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. shaochenze18z@ict.ac.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yang","family":"Feng","sequence":"additional","affiliation":[{"name":"Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences. fengyang@ict.ac.cn"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinchao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Pattern Recognition Center, WeChat AI, Tencent Inc. dayerzhang@tencent.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fandong","family":"Meng","sequence":"additional","affiliation":[{"name":"Pattern Recognition Center, WeChat AI, Tencent Inc. 
fandongmeng@tencent.com"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jie","family":"Zhou","sequence":"additional","affiliation":[{"name":"Pattern Recognition Center, WeChat AI, Tencent Inc. withtomzhou@tencent.com"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2021,12,23]]},"reference":[{"key":"2022010319055017300_bib1","doi-asserted-by":"publisher","first-page":"1269","DOI":"10.18653\/v1\/P19-1122","article-title":"Syntactically supervised transformers for faster neural machine translation","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Akoury","year":"2019"},{"key":"2022010319055017300_bib2","article-title":"An actor-critic algorithm for sequence prediction","volume-title":"5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24\u201326, 2017, Conference Track Proceedings","author":"Bahdanau","year":"2017"},{"key":"2022010319055017300_bib3","article-title":"Neural machine translation by jointly learning to align and translate","volume-title":"3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7\u20139, 2015, Conference Track Proceedings","author":"Bahdanau","year":"2015"},{"key":"2022010319055017300_bib4","article-title":"Non-autoregressive transformer by position learning","author":"Bao","year":"2019","journal-title":"arXiv preprint arXiv:1911.10677"},{"key":"2022010319055017300_bib5","first-page":"1171","article-title":"Scheduled sampling for sequence prediction with recurrent neural networks","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1","author":"Bengio","year":"2015"},{"key":"2022010319055017300_bib6","doi-asserted-by":"publisher","first-page":"1724","DOI":"10.3115\/v1\/D14-1179","article-title":"Learning phrase representations using RNN encoder\u2013decoder for 
statistical machine translation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Cho","year":"2014"},{"key":"2022010319055017300_bib7","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2022010319055017300_bib8","doi-asserted-by":"publisher","first-page":"355","DOI":"10.18653\/v1\/N18-1033","article-title":"Classical structured prediction losses for sequence to sequence learning","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Edunov","year":"2018"},{"key":"2022010319055017300_bib9","doi-asserted-by":"publisher","first-page":"2862","DOI":"10.18653\/v1\/2021.acl-long.223","article-title":"Guiding teacher forcing with seer forcing for neural machine translation","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Feng","year":"2021"},{"key":"2022010319055017300_bib10","first-page":"1243","article-title":"Convolutional sequence to sequence learning","volume-title":"Proceedings of the 34th International Conference on Machine Learning - Volume 70","author":"Gehring","year":"2017"},{"key":"2022010319055017300_bib11","first-page":"3515","article-title":"Aligned cross entropy for non-autoregressive machine translation","volume-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13\u201318 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning 
Research","author":"Ghazvininejad","year":"2020"},{"key":"2022010319055017300_bib12","doi-asserted-by":"publisher","first-page":"6112","DOI":"10.18653\/v1\/D19-1633","article-title":"Mask-predict: Parallel decoding of conditional masked language models","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ghazvininejad","year":"2019"},{"key":"2022010319055017300_bib13","first-page":"4601","article-title":"Professor forcing: A new algorithm for training recurrent networks","volume-title":"Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5\u201310, 2016, Barcelona, Spain","author":"Goyal","year":"2016"},{"key":"2022010319055017300_bib14","article-title":"Non-autoregressive neural machine translation","volume-title":"6th International Conference on Learning Representations, ICLR","author":"Gu","year":"2018"},{"key":"2022010319055017300_bib15","first-page":"11179","article-title":"Levenshtein transformer","volume-title":"Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8\u201314, 2019, Vancouver, BC, Canada","author":"Gu","year":"2019"},{"issue":"01","key":"2022010319055017300_bib16","doi-asserted-by":"publisher","first-page":"3723","DOI":"10.1609\/aaai.v33i01.33013723","article-title":"Non-autoregressive neural machine translation with enhanced decoder input","volume":"33","author":"Guo","year":"2019","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2022010319055017300_bib17","first-page":"820","article-title":"Dual learning for machine translation","volume-title":"Advances in Neural Information Processing Systems 29","author":"He","year":"2016"},{"key":"2022010319055017300_bib18","article-title":"Distilling the 
knowledge in a neural network","author":"Hinton","year":"2015","journal-title":"arXiv preprint arXiv:1503.02531"},{"key":"2022010319055017300_bib19","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1007\/BFb0026683","article-title":"Text categorization with support vector machines: Learning with many relevant features","volume-title":"Proceedings of the 10th European Conference on Machine Learning","author":"Joachims","year":"1998"},{"key":"2022010319055017300_bib20","doi-asserted-by":"publisher","first-page":"7839","DOI":"10.1609\/aaai.v34i05.6289","article-title":"Fine-tuning by curriculum learning for non-autoregressive neural machine translation","volume":"34","author":"Junliang","year":"2020","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"2022010319055017300_bib21","first-page":"2390","article-title":"Fast decoding in sequence models using discrete latent variables","volume-title":"Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research","author":"Kaiser","year":"2018"},{"key":"2022010319055017300_bib22","first-page":"5144","article-title":"Non-autoregressive machine translation with disentangled context transformer","volume-title":"Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13\u201318 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research","author":"Kasai","year":"2020"},{"key":"2022010319055017300_bib23","doi-asserted-by":"publisher","first-page":"1317","DOI":"10.18653\/v1\/D16-1139","article-title":"Sequence-level knowledge distillation","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Kim","year":"2016"},{"key":"2022010319055017300_bib24","article-title":"Adam: A method for stochastic optimization","author":"Kingma","year":"2014","journal-title":"arXiv preprint 
arXiv:1412.6980"},{"key":"2022010319055017300_bib25","doi-asserted-by":"publisher","first-page":"1173","DOI":"10.18653\/v1\/D18-1149","article-title":"Deterministic non-autoregressive neural sequence modeling by iterative refinement","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Lee","year":"2018"},{"key":"2022010319055017300_bib26","first-page":"1591","article-title":"Weighted neural bag-of-n-grams model: New baselines for text classification","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers","author":"Li","year":"2016"},{"key":"2022010319055017300_bib27","doi-asserted-by":"publisher","first-page":"5708","DOI":"10.18653\/v1\/D19-1573","article-title":"Hint-based training for non-autoregressive machine translation","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Li","year":"2019"},{"key":"2022010319055017300_bib28","doi-asserted-by":"publisher","first-page":"3016","DOI":"10.18653\/v1\/D18-1336","article-title":"End-to-end non-autoregressive neural machine translation with connectionist temporal classification","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Libovick\u00fd","year":"2018"},{"key":"2022010319055017300_bib29","first-page":"74","article-title":"ROUGE: A package for automatic evaluation of summaries","volume-title":"Text Summarization Branches Out","author":"Lin","year":"2004"},{"key":"2022010319055017300_bib30","doi-asserted-by":"publisher","first-page":"332","DOI":"10.18653\/v1\/P18-2053","article-title":"Bag-of-words as target for neural machine translation","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short 
Papers)","author":"Ma","year":"2018"},{"key":"2022010319055017300_bib31","doi-asserted-by":"publisher","first-page":"4282","DOI":"10.18653\/v1\/D19-1437","article-title":"FlowSeq: Non-autoregressive conditional sequence generation with generative flow","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ma","year":"2019"},{"key":"2022010319055017300_bib32","first-page":"278","article-title":"Policy invariance under reward transformations: Theory and application to reward shaping","volume-title":"Proceedings of the Sixteenth International Conference on Machine Learning","author":"Ng","year":"1999"},{"key":"2022010319055017300_bib33","first-page":"1731","article-title":"Reward augmented maximum likelihood for neural structured prediction","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems","author":"Norouzi","year":"2016"},{"key":"2022010319055017300_bib34","doi-asserted-by":"publisher","first-page":"79","DOI":"10.3115\/1118693.1118704","article-title":"Thumbs up? 
Sentiment classification using machine learning techniques","volume-title":"Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)","author":"Pang","year":"2002"},{"key":"2022010319055017300_bib35","doi-asserted-by":"publisher","first-page":"311","DOI":"10.3115\/1073083.1073135","article-title":"BLEU: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2022010319055017300_bib36","doi-asserted-by":"publisher","first-page":"3059","DOI":"10.18653\/v1\/2020.acl-main.277","article-title":"Learning to recover from multi-modality errors for non-autoregressive neural machine translation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ran","year":"2020"},{"key":"2022010319055017300_bib37","first-page":"13727","article-title":"Guiding non-autoregressive neural machine translation decoding with reordering information","volume-title":"Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual Event, February 2-9, 2021","author":"Ran","year":"2021"},{"key":"2022010319055017300_bib38","article-title":"Sequence level training with recurrent neural networks","volume-title":"4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings","author":"Ranzato","year":"2016"},{"key":"2022010319055017300_bib39","doi-asserted-by":"publisher","first-page":"1098","DOI":"10.18653\/v1\/2020.emnlp-main.83","article-title":"Non-autoregressive machine translation with latent alignments","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing 
(EMNLP)","author":"Saharia","year":"2020"},{"key":"2022010319055017300_bib40","doi-asserted-by":"publisher","first-page":"1715","DOI":"10.18653\/v1\/P16-1162","article-title":"Neural machine translation of rare words with subword units","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Sennrich","year":"2016"},{"key":"2022010319055017300_bib41","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1510","article-title":"Modeling coverage for non-autoregressive neural machine translation","author":"Shan","year":"2021","journal-title":"arXiv preprint arXiv:2104.11897"},{"key":"2022010319055017300_bib42","doi-asserted-by":"publisher","first-page":"4778","DOI":"10.18653\/v1\/D18-1510","article-title":"Greedy search with probabilistic n-gram matching for neural machine translation","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Shao","year":"2018"},{"key":"2022010319055017300_bib43","doi-asserted-by":"publisher","first-page":"3013","DOI":"10.18653\/v1\/P19-1288","article-title":"Retrieving sequential information for non-autoregressive neural machine translation","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Shao","year":"2019"},{"key":"2022010319055017300_bib44","doi-asserted-by":"publisher","first-page":"198","DOI":"10.1609\/aaai.v34i01.5351","article-title":"Minimizing the bag-of-ngrams difference for non-autoregressive neural machine translation","volume-title":"The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, February 7-12, 2020","author":"Shao","year":"2020"},{"key":"2022010319055017300_bib45","doi-asserted-by":"publisher","first-page":"1683","DOI":"10.18653\/v1\/P16-1159","article-title":"Minimum risk training for neural machine translation","volume-title":"Proceedings of the 54th Annual Meeting of 
the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Shen","year":"2016"},{"key":"2022010319055017300_bib46","doi-asserted-by":"publisher","first-page":"8846","DOI":"10.1609\/aaai.v34i05","article-title":"Latent-variable non-autoregressive neural machine translation with deterministic inference using a delta posterior","volume-title":"The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, New York, NY, USA, February 7\u201312, 2020","author":"Shu","year":"2020"},{"key":"2022010319055017300_bib47","first-page":"3016","article-title":"Fast structured decoding for sequence models","volume-title":"Advances in Neural Information Processing Systems 32","author":"Sun","year":"2019"},{"key":"2022010319055017300_bib48","first-page":"9249","article-title":"An EM approach to non-autoregressive conditional sequence generation","volume-title":"Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research","author":"Sun","year":"2020"},{"key":"2022010319055017300_bib49","first-page":"3104","article-title":"Sequence to sequence learning with neural networks","volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2","author":"Sutskever","year":"2014"},{"key":"2022010319055017300_bib50","first-page":"1057","article-title":"Policy gradient methods for reinforcement learning with function approximation","volume-title":"Proceedings of the 12th International Conference on Neural Information Processing Systems","author":"Sutton","year":"1999"},{"key":"2022010319055017300_bib51","unstructured":"Sutton, Richard Stuart. 1984. Temporal Credit Assignment in Reinforcement Learning. Ph.D. thesis. 
AAI8410337."},{"key":"2022010319055017300_bib52","doi-asserted-by":"crossref","first-page":"2819","DOI":"10.18653\/v1\/2020.acl-main.251","article-title":"ENGINE: Energy-based inference networks for non-autoregressive machine translation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Tu","year":"2020"},{"key":"2022010319055017300_bib53","first-page":"6000","article-title":"Attention is all you need","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022010319055017300_bib54","first-page":"3024","article-title":"Improving multi-step prediction of learned time series models","volume-title":"Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence","author":"Venkatraman","year":"2015"},{"key":"2022010319055017300_bib55","doi-asserted-by":"publisher","first-page":"479","DOI":"10.18653\/v1\/D18-1044","article-title":"Semi-autoregressive neural machine translation","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Wang","year":"2018"},{"key":"2022010319055017300_bib56","doi-asserted-by":"publisher","first-page":"5377","DOI":"10.1609\/aaai.v33i01.33015377","article-title":"Non-autoregressive machine translation with auxiliary regularization","volume-title":"The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 \u2013 February 1, 2019","author":"Wang","year":"2019"},{"key":"2022010319055017300_bib57","first-page":"538","article-title":"The optimal reward baseline for gradient-based reinforcement learning","volume-title":"Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence","author":"Weaver","year":"2001"},{"key":"2022010319055017300_bib58","doi-asserted-by":"publisher","first-page":"1304","DOI":"10.18653\/v1\/P19-1125","article-title":"Imitation 
learning for non-autoregressive neural machine translation","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Wei","year":"2019"},{"issue":"3\u20134","key":"2022010319055017300_bib59","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Machine Learning"},{"issue":"2","key":"2022010319055017300_bib60","doi-asserted-by":"publisher","first-page":"270","DOI":"10.1162\/neco.1989.1.2.270","article-title":"A learning algorithm for continually running fully recurrent neural networks","volume":"1","author":"Williams","year":"1989","journal-title":"Neural Computation"},{"key":"2022010319055017300_bib61","doi-asserted-by":"publisher","first-page":"3612","DOI":"10.18653\/v1\/D18-1397","article-title":"A study of reinforcement learning for neural machine translation","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Wu","year":"2018"},{"key":"2022010319055017300_bib62","first-page":"534","article-title":"Adversarial neural machine translation","volume-title":"Proceedings of the 10th Asian Conference on Machine Learning, volume 95 of Proceedings of Machine Learning Research","author":"Wu","year":"2018"},{"key":"2022010319055017300_bib63","article-title":"Google\u2019s neural machine translation system: Bridging the gap between human and machine translation","author":"Wu","year":"2016","journal-title":"arXiv preprint arXiv:1609.08144"},{"key":"2022010319055017300_bib64","doi-asserted-by":"publisher","first-page":"1346","DOI":"10.18653\/v1\/N18-1122","article-title":"Improving neural machine translation with conditional sequence generative adversarial nets","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for 
Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Yang","year":"2018"},{"key":"2022010319055017300_bib65","first-page":"2852","article-title":"SeqGAN: Sequence generative adversarial nets with policy gradient","volume-title":"Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence","author":"Yu","year":"2017"},{"key":"2022010319055017300_bib66","doi-asserted-by":"publisher","first-page":"4334","DOI":"10.18653\/v1\/P19-1426","article-title":"Bridging the gap between training and inference for neural machine translation","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zhang","year":"2019"},{"key":"2022010319055017300_bib67","article-title":"Understanding knowledge distillation in non-autoregressive machine translation","volume-title":"8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26\u201330, 2020","author":"Zhou","year":"2020"},{"key":"2022010319055017300_bib68","doi-asserted-by":"publisher","first-page":"1893","DOI":"10.18653\/v1\/2020.acl-main.171","article-title":"Improving non-autoregressive neural machine translation with monolingual data","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Zhou","year":"2020"}],"container-title":["Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/47\/4\/891\/1979393\/coli_a_00421.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/coli\/article-pdf\/47\/4\/891\/1979393\/coli_a_00421.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,1,3]],"date-time":"2022-01-03T19:06:23Z","timestamp":1641236783000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/coli\/article\/47\/4\/891\/107176\/Sequence-Level-Training-for-Non-Autoregressive"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12]]},"references-count":68,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,12,23]]},"published-print":{"date-parts":[[2021,12,23]]}},"URL":"https:\/\/doi.org\/10.1162\/coli_a_00421","relation":{},"ISSN":["0891-2017","1530-9312"],"issn-type":[{"value":"0891-2017","type":"print"},{"value":"1530-9312","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,12]]},"published":{"date-parts":[[2021,12]]}}}