{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,4]],"date-time":"2026-02-04T19:06:40Z","timestamp":1770232000488,"version":"3.49.0"},"reference-count":38,"publisher":"MIT Press - Journals","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Transactions of the Association for Computational Linguistics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:p> Recent progress in the task of Grammatical Error Correction (GEC) has been driven by addressing data sparsity, both through new methods for generating large and noisy pretraining data and through the publication of small and higher-quality finetuning data in the BEA-2019 shared task. Building upon recent work in Neural Machine Translation (NMT), we make use of both kinds of data by deriving example-level scores on our large pretraining data based on a smaller, higher-quality dataset. In this work, we perform an empirical study to discover how to best incorporate delta-log-perplexity, a type of example scoring, into a training schedule for GEC. In doing so, we perform experiments that shed light on the function and applicability of delta-log-perplexity. Models trained on scored data achieve state-of-the-art results on common GEC test sets. 
<\/jats:p>","DOI":"10.1162\/tacl_a_00336","type":"journal-article","created":{"date-parts":[[2020,10,15]],"date-time":"2020-10-15T16:13:15Z","timestamp":1602778395000},"page":"634-646","source":"Crossref","is-referenced-by-count":21,"title":["Data Weighted Training Strategies for Grammatical Error Correction"],"prefix":"10.1162","volume":"8","author":[{"given":"Jared","family":"Lichtarge","sequence":"first","affiliation":[{"name":"Google Research."}]},{"given":"Chris","family":"Alberti","sequence":"additional","affiliation":[{"name":"Google Research."}]},{"given":"Shankar","family":"Kumar","sequence":"additional","affiliation":[{"name":"Google Research."}]}],"member":"281","reference":[{"key":"bib1","first-page":"355","volume-title":"Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing","author":"Axelrod Amittai","year":"2011"},{"key":"bib2","doi-asserted-by":"crossref","first-page":"52","DOI":"10.18653\/v1\/W19-4406","volume-title":"Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Bryant Christopher","year":"2019"},{"key":"bib3","doi-asserted-by":"crossref","first-page":"793","DOI":"10.18653\/v1\/P17-1074","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Bryant Christopher","year":"2017"},{"key":"bib4","doi-asserted-by":"crossref","first-page":"213","DOI":"10.18653\/v1\/W19-4423","volume-title":"Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Choe Yo Joong","year":"2019"},{"key":"bib5","volume-title":"The Thirty-Second AAAI Conference on Artificial Intelligence","author":"Chollampatt Shamil","year":"2018"},{"key":"bib6","first-page":"166","volume-title":"Proceedings of the 3rd Workshop on Asian Translation (WAT2016)","author":"Cromieres 
Fabien","year":"2016"},{"key":"bib7","first-page":"568","volume-title":"Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Dahlmeier Daniel","year":"2012"},{"key":"bib8","first-page":"22","volume-title":"Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Dahlmeier Daniel","year":"2013"},{"key":"bib9","first-page":"1055","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ge Tao","year":"2018"},{"key":"bib10","author":"Ge Tao","year":"2018","journal-title":"CoRR"},{"key":"bib11","first-page":"3","volume-title":"Learner English on Computer","author":"Granger Sylviane","year":"1998"},{"key":"bib12","doi-asserted-by":"crossref","first-page":"478","DOI":"10.1007\/978-3-319-10888-9_47","volume-title":"Advances in Natural Language Processing \u2014 Lecture Notes in Computer Science","volume":"8686","author":"Grundkiewicz Roman","year":"2014"},{"key":"bib13","doi-asserted-by":"crossref","first-page":"252","DOI":"10.18653\/v1\/W19-4427","volume-title":"Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Grundkiewicz Roman","year":"2019"},{"key":"bib14","doi-asserted-by":"crossref","first-page":"174","DOI":"10.3115\/v1\/P14-2029","volume-title":"Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)","author":"Heilman Michael","year":"2014"},{"key":"bib15","volume-title":"Proceedings of the Third Conference on Machine Translation: Research Papers","author":"Junczys-Dowmunt Marcin","year":"2018"},{"key":"bib16","first-page":"595","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long 
Papers)","author":"Junczys-Dowmunt Marcin","year":"2018"},{"key":"bib17","doi-asserted-by":"crossref","first-page":"4977","DOI":"10.18653\/v1\/D18-1541","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Kasewa Sudhanshu","year":"2018"},{"key":"bib18","volume-title":"Proceedings of the Second Workshop on Neural Machine Translation and Generation","author":"Khayrallah Huda","year":"2018"},{"key":"bib19","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.18653\/v1\/D19-1119","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Kiyono Shun","year":"2019"},{"key":"bib20","first-page":"2054","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Kumar Gaurav","year":"2019"},{"key":"bib21","first-page":"3291","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Lichtarge Jared","year":"2019"},{"key":"bib22","first-page":"863","volume-title":"Proceedings of COLING 2012: Posters","author":"Mizumoto Tomoya","year":"2012"},{"key":"bib23","first-page":"220","volume-title":"Proceedings of the ACL 2010 Conference Short Papers","author":"Moore Robert C.","year":"2010"},{"key":"bib24","unstructured":"Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault. 2016. GLEU without tuning. 
arXiv:1605.02592."},{"key":"bib25","first-page":"229","volume-title":"Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers","author":"Napoles Courtney","year":"2017"},{"key":"bib26","volume-title":"CoNLL Shared Task","author":"Ng Hwee Tou","year":"2013"},{"key":"bib27","first-page":"1","volume-title":"Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task","author":"Ng Hwee Tou","year":"2014"},{"key":"bib28","volume-title":"Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Omelianchuk Kostiantyn","year":"2020"},{"key":"bib29","volume-title":"Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing","author":"Schuster Michael","year":"2012"},{"key":"bib30","author":"Shazeer Noam","year":"2018","journal-title":"arXiv:1804.04235"},{"key":"bib31","first-page":"6000","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017"},{"key":"bib32","first-page":"1400","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"van der Wees Marlies","year":"2017"},{"key":"bib33","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.18653\/v1\/P19-1123","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Wang Wei","year":"2019"},{"key":"bib34","doi-asserted-by":"crossref","first-page":"133","DOI":"10.18653\/v1\/W18-6314","volume-title":"Proceedings of the Third Conference on Machine Translation: Research Papers","author":"Wang Wei","year":"2018"},{"key":"bib35","unstructured":"Yu Wang, Yuelin Wang, Jie Liu, and Zhuo Liu. 2020. A comprehensive survey of grammar error correction. 
2005.06600."},{"key":"bib36","first-page":"149","volume-title":"Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications","author":"Xu Shuyao","year":"2019"},{"key":"bib37","first-page":"180","volume-title":"Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies","author":"Yannakoudakis Helen","year":"2011"},{"key":"bib38","first-page":"156","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Zhao Wei","year":"2019"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00336","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:39:44Z","timestamp":1615585184000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/96461"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":38,"alternative-id":["10.1162\/tacl_a_00336"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00336","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]}}}