{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,20]],"date-time":"2026-06-20T16:54:17Z","timestamp":1781974457194,"version":"3.54.5"},"reference-count":180,"publisher":"MIT Press","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Transactions of the Association for Computational Linguistics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:p> Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research. <\/jats:p>","DOI":"10.1162\/tacl_a_00349","type":"journal-article","created":{"date-parts":[[2021,1,5]],"date-time":"2021-01-05T15:52:59Z","timestamp":1609861979000},"page":"842-866","source":"Crossref","is-referenced-by-count":825,"title":["A Primer in BERTology: What We Know About How BERT Works"],"prefix":"10.1162","volume":"8","author":[{"given":"Anna","family":"Rogers","sequence":"first","affiliation":[{"name":"Center for Social Data Science, University of Copenhagen."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Olga","family":"Kovaleva","sequence":"additional","affiliation":[{"name":"Dept. of Computer Science, University of Massachusetts Lowell."}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Anna","family":"Rumshisky","sequence":"additional","affiliation":[{"name":"Dept. of Computer Science, University of Massachusetts Lowell."}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","reference":[{"key":"bib1","author":"Aguilar Gustavo","year":"2019","journal-title":"arXiv preprint arXiv:1910.03723"},{"key":"bib2","first-page":"724","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Akbik Alan","year":"2019"},{"key":"bib3","first-page":"5393","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Arase Yuki","year":"2019"},{"key":"bib4","author":"Arkhangelskaia Ekaterina","year":"2019","journal-title":"arXiv preprint arXiv:1910.06431"},{"key":"bib5","author":"Artetxe Mikel","year":"2019","journal-title":"arXiv:1911.03310 [cs]"},{"key":"bib6","author":"A\u00dfenmacher Matthias","year":"2020","journal-title":"arXiv:2001.00781 [cs, stat]"},{"key":"bib7","author":"Baan Joris","year":"2019","journal-title":"arXiv preprint arXiv:1911.03898"},{"key":"bib8","first-page":"5360","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Baevski Alexei","year":"2019"},{"key":"bib9","author":"He Bai","year":"2020","journal-title":"arXiv:2004. 14996 [cs]"},{"key":"bib10","doi-asserted-by":"crossref","first-page":"205","DOI":"10.18653\/v1\/2020.repl4nlp-1.24","volume-title":"Proceedings of the 5th Workshop on Representation Learning for NLP","author":"Balasubramanian Sriram","year":"2020"},{"key":"bib11","author":"Bao Hangbo","year":"2020","journal-title":"arXiv:2002.12804 [cs]"},{"key":"bib12","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00254"},{"key":"bib13","author":"Ben-David Eyal","year":"2020","journal-title":"arXiv:2006.09075 [cs]"},{"key":"bib14","doi-asserted-by":"crossref","first-page":"4758","DOI":"10.18653\/v1\/2020.acl-main.431","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Bommasani Rishi","year":"2020"},{"key":"bib15","author":"Bouraoui Zied","year":"2019","journal-title":"arXiv:1911.12753 [cs]"},{"key":"bib16","doi-asserted-by":"crossref","first-page":"677","DOI":"10.18653\/v1\/K19-1063","volume-title":"Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)","author":"Broscheit Samuel","year":"2019"},{"key":"bib17","author":"Brown Tom B.","year":"2020","journal-title":"arXiv:2005.14165 [cs]"},{"key":"bib18","volume-title":"International Conference on Learning Representations","author":"Brunner Gino","year":"2020"},{"key":"bib19","author":"Chen Tianlong","year":"2020","journal-title":"arXiv:2007.12223 [cs, stat]"},{"key":"bib20","author":"Cheng Xingyi","year":"2019","journal-title":"arXiv:1909.03405 [cs]"},{"key":"bib21","doi-asserted-by":"crossref","first-page":"276","DOI":"10.18653\/v1\/W19-4828","volume-title":"Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Clark Kevin","year":"2019"},{"key":"bib22","volume-title":"International Conference on Learning Representations","author":"Clark Kevin","year":"2020"},{"key":"bib23","doi-asserted-by":"crossref","first-page":"108","DOI":"10.18653\/v1\/D19-5611","volume-title":"Proceedings of the 3rd Workshop on Neural Generation and Translation","author":"Clinchant Stephane","year":"2019"},{"key":"bib24","author":"Conneau Alexis","year":"2019","journal-title":"arXiv:1911.02116 [cs]"},{"key":"bib25","doi-asserted-by":"crossref","first-page":"2174","DOI":"10.18653\/v1\/D19-1223","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Correia Gon\u00e7alo M.","year":"2019"},{"key":"bib26","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00018"},{"key":"bib27","author":"Cui Leyang","year":"2020","journal-title":"arXiv:2008.03945 [cs]"},{"key":"bib28","author":"Cui Yiming","year":"2019","journal-title":"arXiv:1906.08101 [cs]"},{"key":"bib29","first-page":"1","volume-title":"Proceedings of the First Workshop on Commonsense Inference in Natural Language Processing","author":"Da Jeff","year":"2019"},{"key":"bib30","doi-asserted-by":"crossref","first-page":"1173","DOI":"10.18653\/v1\/D19-1109","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Davison Joe","year":"2019"},{"key":"bib31","first-page":"4171","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin Jacob","year":"2019"},{"key":"bib32","author":"Dodge Jesse","year":"2020","journal-title":"arXiv:2002.06305 [cs]"},{"key":"bib33","author":"Elazar Yanai","year":"2020","journal-title":"arXiv:2006. 00995 [cs]"},{"key":"bib34","doi-asserted-by":"crossref","first-page":"55","DOI":"10.18653\/v1\/D19-1006","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ethayarajh Kawin","year":"2019"},{"key":"bib35","author":"Ettinger Allyson","year":"2019","journal-title":"arXiv: 1907.13528 [cs]"},{"key":"bib36","volume-title":"International Conference on Learning Representations","author":"Fan Angela","year":"2019"},{"key":"bib37","first-page":"7","volume-title":"Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci 2019)","author":"Forbes Maxwell","year":"2019"},{"key":"bib38","volume-title":"International Conference on Learning Representations","author":"Frankle Jonathan","year":"2019"},{"key":"bib39","author":"Ganesh Prakhar","year":"2020","journal-title":"arXiv preprint arXiv:2002.11985"},{"key":"bib40","volume-title":"AAAI","author":"Garg Siddhant","year":"2020"},{"key":"bib41","doi-asserted-by":"crossref","first-page":"2773","DOI":"10.18653\/v1\/2020.acl-main.247","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Glass Michael","year":"2020"},{"key":"bib42","author":"Glava\u0161 Goran","year":"2020","journal-title":"arXiv:2008.06788 [cs]"},{"key":"bib43","volume-title":"Constructions at Work: The Nature of Generalization in Language","author":"Goldberg Adele","year":"2006"},{"key":"bib44","author":"Goldberg Yoav","year":"2019","journal-title":"arXiv preprint arXiv:1901.05287"},{"key":"bib45","first-page":"2337","volume-title":"International Conference on Machine Learning","author":"Gong Linyuan","year":"2019"},{"key":"bib46","author":"Gordon Mitchell A.","year":"2020","journal-title":"arXiv preprint arXiv:2002.08307"},{"key":"bib47","author":"Goyal Saurabh","year":"2020","journal-title":"arXiv preprint arXiv:2001. 08950"},{"key":"bib48","author":"Guo Fu-Ming","year":"2019","journal-title":"arXiv:1909.12486 [cs, stat]"},{"key":"bib49","author":"Guu Kelvin","year":"2020","journal-title":"arXiv:2002.08909 [cs]"},{"key":"bib50","first-page":"4143","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Hao Yaru","year":"2019"},{"key":"bib51","first-page":"4129","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Hewitt John","year":"2019"},{"key":"bib52","volume-title":"Deep Learning and Representation Learning Workshop: NIPS 2014","author":"Hinton Geoffrey","year":"2014"},{"key":"bib53","author":"Hoover Benjamin","year":"2019","journal-title":"arXiv:1910. 05276 [cs]"},{"key":"bib54","author":"Houlsby Neil","year":"2019","journal-title":"arXiv: 1902.00751 [cs, stat]"},{"key":"bib55","author":"Htut Phu Mon","year":"2019","journal-title":"arXiv preprint arXiv:1911.12246"},{"key":"bib56","first-page":"3543","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Jain Sarthak","year":"2019"},{"key":"bib57","volume-title":"57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy","author":"Jawahar Ganesh","year":"2019"},{"key":"bib58","author":"Jiang Haoming","year":"2019","journal-title":"arXiv preprint arXiv:1911.03437"},{"key":"bib59","author":"Jiang Zhengbao","year":"2019","journal-title":"arXiv:1911. 12543 [cs]"},{"key":"bib60","author":"Jiao Xiaoqi","year":"2019","journal-title":"arXiv preprint arXiv:1909.10351"},{"key":"bib61","volume-title":"AAAI 2020","author":"Di Jin","year":"2020"},{"key":"bib62","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00300"},{"key":"bib63","author":"Kao Wei-Tsung","year":"2020","journal-title":"arXiv preprint arXiv:2001.09309"},{"key":"bib64","volume-title":"ICLR 2020","author":"Kim Taeuk","year":"2020"},{"key":"bib65","author":"Kobayashi Goro","year":"2020","journal-title":"arXiv:2004.10102 [cs]"},{"key":"bib66","doi-asserted-by":"crossref","first-page":"2779","DOI":"10.18653\/v1\/D19-1279","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Kondratyuk Dan","year":"2019"},{"key":"bib67","volume-title":"International Conference on Learning Representations","author":"Kong Lingpeng","year":"2019"},{"key":"bib68","first-page":"4356","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Kovaleva Olga","year":"2019"},{"key":"bib69","volume-title":"ICLR 2020","author":"Krishna Kalpesh","year":"2020"},{"key":"bib70","author":"Kumar Varun","year":"2020","journal-title":"arXiv:2003.02245 [cs]"},{"key":"bib71","author":"Kuznetsov Ilia","year":"2020","journal-title":"arXiv:2004.14999 [cs]"},{"key":"bib72","author":"Lample Guillaume","year":"2019","journal-title":"arXiv:1901.07291 [cs]"},{"key":"bib73","volume-title":"ICLR","author":"Lan Zhenzhong","year":"2020"},{"key":"bib74","author":"Lee Cheolhyoung","year":"2019","journal-title":"arXiv preprint arXiv:1909.11299"},{"key":"bib75","author":"Lewis Mike","year":"2019","journal-title":"arXiv: 1910.13461 [cs, stat]"},{"key":"bib76","first-page":"5709","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Li Changmao","year":"2020"},{"key":"bib77","author":"Li Zhuohan","year":"2020","journal-title":"arXiv preprint arXiv:2002.11794"},{"key":"bib78","first-page":"241","volume-title":"Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Lin Yongjie","year":"2019"},{"key":"bib79","first-page":"1073","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Liu Nelson F.","year":"2019"},{"key":"bib80","author":"Liu Yinhan","year":"2019","journal-title":"arXiv:1907.11692 [cs]"},{"key":"bib81","author":"Ma Xiaofei","year":"2019","journal-title":"arXiv:1910.07973 [cs]"},{"key":"bib82","first-page":"201907367","author":"Manning Christopher D.","year":"2020","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"bib83","first-page":"622","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"May Chandler","year":"2019"},{"key":"bib84","author":"McCarley J. S.","year":"2020","journal-title":"arXiv:1910.06360 [cs]"},{"key":"bib85","volume-title":"International Conference on Learning Representations","author":"Thomas McCoy R.","year":"2019"},{"key":"bib86","doi-asserted-by":"crossref","first-page":"3428","DOI":"10.18653\/v1\/P19-1334","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"McCoy Tom","year":"2019"},{"key":"bib87","doi-asserted-by":"crossref","first-page":"110","DOI":"10.18653\/v1\/2020.repl4nlp-1.15","volume-title":"Proceedings of the 5th Workshop on Representation Learning for NLP","author":"Miaschi Alessio","year":"2020"},{"key":"bib88","volume-title":"Advances in Neural Information Processing Systems 32 (NIPS 2019)","author":"Michel Paul","year":"2019"},{"key":"bib89","author":"Mickus Timothee","year":"2019","journal-title":"arXiv preprint arXiv:1911.05758"},{"key":"bib90","unstructured":"Microsoft. 2020. Turing-NLG: A 17-billion-parameter language model by microsoft."},{"key":"bib91","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems 26 (NIPS 2013)","author":"Mikolov Tomas","year":"2013"},{"key":"bib92","volume-title":"International Conference on Learning Representations","author":"Jiaqi Mu","year":"2018"},{"key":"bib93","doi-asserted-by":"crossref","first-page":"4658","DOI":"10.18653\/v1\/P19-1459","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Niven Timothy","year":"2019"},{"key":"bib94","doi-asserted-by":"crossref","first-page":"43","DOI":"10.18653\/v1\/D19-1005","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Peters Matthew E.","year":"2019"},{"key":"bib95","doi-asserted-by":"crossref","first-page":"7","DOI":"10.18653\/v1\/W19-4302","volume-title":"Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019)","author":"Peters Matthew E.","year":"2019"},{"key":"bib96","doi-asserted-by":"crossref","first-page":"2463","DOI":"10.18653\/v1\/D19-1250","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Petroni Fabio","year":"2019"},{"key":"bib97","author":"Phang Jason","year":"2019","journal-title":"arXiv:1811.01088 [cs]"},{"key":"bib98","author":"Pimentel Tiago","year":"2020","journal-title":"arXiv:2004. 03061 [cs]"},{"key":"bib99","author":"Poerner Nina","year":"2019","journal-title":"arXiv preprint arXiv: 1911.03681"},{"key":"bib100","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing","author":"Prasanna Sai","year":"2020"},{"key":"bib101","doi-asserted-by":"crossref","first-page":"2996","DOI":"10.18653\/v1\/2020.acl-main.270","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Press Ofir","year":"2020"},{"key":"bib102","doi-asserted-by":"crossref","first-page":"5231","DOI":"10.18653\/v1\/2020.acl-main.467","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Pruksachatkun Yada","year":"2020"},{"key":"bib103","author":"Raffel Colin","year":"2019","journal-title":"arXiv:1910.10683 [cs, stat]"},{"key":"bib104","author":"Raganato Alessandro","year":"2020","journal-title":"arXiv:2002.10260 [cs]"},{"key":"bib105","doi-asserted-by":"crossref","first-page":"287","DOI":"10.18653\/v1\/W18-5431","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Raganato Alessandro","year":"2018"},{"key":"bib106","doi-asserted-by":"crossref","first-page":"4902","DOI":"10.18653\/v1\/2020.acl-main.442","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ribeiro Marco Tulio","year":"2020"},{"key":"bib107","volume-title":"AAAI 2020","author":"Richardson Kyle","year":"2020"},{"key":"bib108","author":"Richardson Kyle","year":"2019","journal-title":"arXiv:1912.13337 [cs]"},{"key":"bib109","author":"Roberts Adam","year":"2020","journal-title":"arXiv preprint arXiv:2002.08910"},{"key":"bib110","first-page":"11","volume-title":"AAAI","author":"Rogers Anna","year":"2020"},{"key":"bib111","author":"Rosa Rudolf","year":"2019","journal-title":"arXiv preprint arXiv:1906.11511"},{"key":"bib112","volume-title":"5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019","author":"Sanh Victor","year":"2019"},{"key":"bib113","author":"Sanh Victor","year":"2020","journal-title":"arXiv:2005.07683 [cs]"},{"key":"bib114","doi-asserted-by":"crossref","first-page":"3996","DOI":"10.18653\/v1\/2020.acl-main.368","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Schick Timo","year":"2020"},{"key":"bib115","author":"Schmidt Florian","year":"2020","journal-title":"arXiv preprint arXiv:2003.02738"},{"key":"bib116","author":"Schwartz Roy","year":"2019","journal-title":"arXiv: 1907.10597 [cs, stat]"},{"key":"bib117","author":"Serrano Sofia","year":"2019","journal-title":"arXiv:1906.03731 [cs]"},{"key":"bib118","author":"Shen Sheng","year":"2019","journal-title":"arXiv preprint arXiv:1909. 05840"},{"key":"bib119","author":"Si Chenglei","year":"2019","journal-title":"arXiv:1910.12391 [cs]"},{"key":"bib120","author":"Song Kaitao","year":"2020","journal-title":"arXiv:2004.09297 [cs]"},{"key":"bib121","first-page":"5986","volume-title":"International Conference on Machine Learning","author":"Stickland Asa Cooper","year":"2019"},{"key":"bib122","volume-title":"ACL 2019","author":"Strubell Emma","year":"2019"},{"key":"bib123","author":"Ta-Chun Su","year":"2019","journal-title":"arXiv: 1910.03176 [cs]"},{"key":"bib124","volume-title":"AAAI","author":"Sugawara Saku","year":"2020"},{"key":"bib125","first-page":"4314","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Sun Siqi","year":"2019"},{"key":"bib126","author":"Sun Yu","year":"2019","journal-title":"arXiv:1904.09223 [cs]"},{"key":"bib127","author":"Sun Yu","year":"2019","journal-title":"arXiv:1907.12412 [cs]"},{"key":"bib128","unstructured":"Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, and Denny Zhou. 2020. MobileBERT: Task-Agnostic Compression of BERT for Resource Limited Devices."},{"key":"bib129","author":"Sundararaman Dhanasekar","year":"2019","journal-title":"arXiv:1911.06156 [cs, stat]"},{"key":"bib130","author":"Talmor Alon","year":"2019","journal-title":"arXiv:1912.13283 [cs]"},{"key":"bib131","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/978-981-15-6168-9_13","volume-title":"Computational Linguistics","author":"Tanaka Hirotaka","year":"2020"},{"key":"bib132","author":"Tang Raphael","year":"2019","journal-title":"arXiv preprint arXiv:1903.12136"},{"key":"bib133","doi-asserted-by":"crossref","first-page":"4593","DOI":"10.18653\/v1\/P19-1452","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Tenney Ian","year":"2019"},{"key":"bib134","volume-title":"International Conference on Learning Representations","author":"Tenney Ian","year":"2019"},{"key":"bib135","author":"Tian James Yi","year":"2019","journal-title":"arXiv preprint arXiv: 1912.06638"},{"key":"bib136","doi-asserted-by":"crossref","first-page":"166","DOI":"10.18653\/v1\/2020.repl4nlp-1.20","volume-title":"Proceedings of the 5th Workshop on Representation Learning for NLP","author":"Toshniwal Shubham","year":"2020"},{"key":"bib137","author":"Tsai Henry","year":"2019","journal-title":"arXiv preprint arXiv:1909.00100"},{"key":"bib138","author":"Turc Iulia","year":"2019","journal-title":"arXiv preprint arXiv:1908.08962"},{"key":"bib139","first-page":"5831","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Schijndel Marten van","year":"2019"},{"key":"bib140","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017"},{"key":"bib141","author":"Vig Jesse","year":"2019","journal-title":"arXiv:1904.02679 [cs, stat]"},{"key":"bib142","first-page":"63","volume-title":"Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Vig Jesse","year":"2019"},{"key":"bib143","volume-title":"Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)","author":"Vilares David","year":"2020"},{"key":"bib144","first-page":"4387","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Voita Elena","year":"2019"},{"key":"bib145","author":"Voita Elena","year":"2019","journal-title":"arXiv preprint arXiv:1905.09418"},{"key":"bib146","author":"Voita Elena","year":"2020","journal-title":"arXiv:2003.12298 [cs]"},{"key":"bib147","doi-asserted-by":"crossref","first-page":"2153","DOI":"10.18653\/v1\/D19-1221","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Wallace Eric","year":"2019"},{"key":"bib148","author":"Wallace Eric","year":"2019","journal-title":"arXiv preprint arXiv:1909. 07940"},{"key":"bib149","doi-asserted-by":"crossref","first-page":"353","DOI":"10.18653\/v1\/W18-5446","volume-title":"Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Wang Alex","year":"2018"},{"key":"bib150","author":"Wang Ruize","year":"2020","journal-title":"arXiv:2002.01808 [cs]"},{"key":"bib151","author":"Wang Wei","year":"2019","journal-title":"arXiv:1908.04577 [cs]"},{"key":"bib152","author":"Wang Wenhui","year":"2020","journal-title":"arXiv preprint arXiv:2002.10957"},{"key":"bib153","author":"Wang Xiaozhi","year":"2020","journal-title":"arXiv:1911.06136 [cs]"},{"key":"bib154","author":"Wang Yile","year":"2020","journal-title":"arXiv:1911.02929 [cs]"},{"key":"bib155","author":"Wang Zihan","year":"2019","journal-title":"arXiv preprint arXiv:1912.07840"},{"key":"bib156","volume-title":"Proceedings of the 42nd Annual Virtual Meeting of the Cognitive Science Society","author":"Warstadt Alex","year":"2020"},{"key":"bib157","first-page":"2870","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Warstadt Alex","year":"2019"},{"key":"bib158","author":"Wiedemann Gregor","year":"2019","journal-title":"arXiv preprint arXiv:1909.10430"},{"key":"bib159","doi-asserted-by":"crossref","first-page":"11","DOI":"10.18653\/v1\/D19-1002","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Wiegreffe Sarah","year":"2019"},{"key":"bib160","author":"Wolf Thomas","year":"2020","journal-title":"arXiv:1910.03771 [cs]"},{"key":"bib161","volume-title":"International Conference on Learning Representations","author":"Felix Wu","year":"2019"},{"key":"bib162","first-page":"84","volume-title":"ICCS 2019: Computational Science ICCS 2019","author":"Xing Wu","year":"2019"},{"key":"bib163","unstructured":"Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey,  2016. Google\u2019s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation."},{"key":"bib164","first-page":"4166","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Zhiyong Wu","year":"2020"},{"key":"bib165","author":"Canwen Xu","year":"2020","journal-title":"arXiv preprint arXiv:2002.02925"},{"key":"bib166","author":"Yang Junjie","year":"2019","journal-title":"arXiv:1911.01940 [cs]"},{"key":"bib167","author":"Yang Zhilin","year":"2019","journal-title":"arXiv:1906.08237 [cs]"},{"key":"bib168","first-page":"8413","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Yin Pengcheng","year":"2020"},{"key":"bib169","author":"Yogatama Dani","year":"2019","journal-title":"arXiv: 1901.11373 [cs, stat]"},{"issue":"5","key":"bib170","volume":"1","author":"You Yang","year":"2019","journal-title":"arXiv preprint arXiv:1904.00962"},{"key":"bib171","author":"Zadeh Ali Hadi","year":"2020","journal-title":"arXiv:2005.03842 [cs, stat]"},{"key":"bib172","author":"Zafrir Ofir","year":"2019","journal-title":"arXiv preprint arXiv:1910.06188"},{"key":"bib173","doi-asserted-by":"crossref","first-page":"4791","DOI":"10.18653\/v1\/P19-1472","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zellers Rowan","year":"2019"},{"key":"bib174","doi-asserted-by":"crossref","first-page":"1441","DOI":"10.18653\/v1\/P19-1139","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zhang Zhengyan","year":"2019"},{"key":"bib175","volume-title":"AAAI 2020","author":"Zhang Zhuosheng","year":"2020"},{"key":"bib176","author":"Zhao Sanqiang","year":"2019","journal-title":"arXiv preprint arXiv:1909.11687"},{"key":"bib177","doi-asserted-by":"crossref","first-page":"4729","DOI":"10.18653\/v1\/2020.acl-main.429","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Zhao Yiyun","year":"2020"},{"key":"bib178","author":"Zhou Wenxuan","year":"2019","journal-title":"arXiv preprint arXiv:1911.03918"},{"key":"bib179","volume-title":"AAAI 2020","author":"Zhou Xuhui","year":"2020"},{"key":"bib180","author":"Zhu Chen","year":"2019","journal-title":"arXiv:1909.11764 [cs]"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/tacl_a_00349","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T16:39:49Z","timestamp":1615567189000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/96482"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12]]},"references-count":180,"alternative-id":["10.1162\/tacl_a_00349"],"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00349","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12]]}}}