{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T00:51:53Z","timestamp":1775868713384,"version":"3.50.1"},"reference-count":59,"publisher":"Proceedings of the National Academy of Sciences","issue":"48","license":[{"start":{"date-parts":[[2020,12,3]],"date-time":"2020-12-03T00:00:00Z","timestamp":1606953600000},"content-version":"vor","delay-in-days":183,"URL":"https:\/\/www.pnas.org\/site\/aboutpnas\/licenses.xhtml"}],"funder":[{"name":"Tencent Corp.","award":["gift"],"award-info":[{"award-number":["gift"]}]},{"DOI":"10.13039\/100006785","name":"Google","doi-asserted-by":"publisher","award":["fellowship"],"award-info":[{"award-number":["fellowship"]}],"id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.pnas.org"],"crossmark-restriction":true},"short-container-title":["Proc. Natl. Acad. Sci. U.S.A."],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:p>This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. 
Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.<\/jats:p>","DOI":"10.1073\/pnas.1907367117","type":"journal-article","created":{"date-parts":[[2020,6,3]],"date-time":"2020-06-03T23:59:43Z","timestamp":1591228783000},"page":"30046-30054","update-policy":"https:\/\/doi.org\/10.1073\/pnas.cm10313","source":"Crossref","is-referenced-by-count":204,"title":["Emergent linguistic structure in artificial neural networks trained by self-supervision"],"prefix":"10.1073","volume":"117","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6155-649X","authenticated-orcid":false,"given":"Christopher D.","family":"Manning","sequence":"first","affiliation":[{"name":"Computer Science Department, Stanford University, Stanford, CA 94305;"}]},{"given":"Kevin","family":"Clark","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University, Stanford, CA 94305;"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1320-6633","authenticated-orcid":false,"given":"John","family":"Hewitt","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University, Stanford, CA 94305;"}]},{"given":"Urvashi","family":"Khandelwal","sequence":"additional","affiliation":[{"name":"Computer Science Department, Stanford University, Stanford, CA 94305;"}]},{"given":"Omer","family":"Levy","sequence":"additional","affiliation":[{"name":"Facebook Artificial Intelligence Research, Facebook Inc., Seattle, WA 98109"}]}],"member":"341","published-online":{"date-parts":[[2020,6,3]]},"reference":[{"key":"e_1_3_4_1_2","doi-asserted-by":"publisher","DOI":"10.1038\/nrn1533"},{"key":"e_1_3_4_2_2","first-page":"337","volume-title":"Proceedings of the Conference of 
the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Rambow O.","year":"2010","unstructured":"O. Rambow, \u201cThe simple truth about dependency and phrase structure representations: An opinion piece\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, R. Kaplan, J. Burstein, M. Harper, G. Penn, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2010), pp. 337\u2013340."},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0042-6989(01)00173-0"},{"key":"e_1_3_4_4_2","first-page":"313","article-title":"Building a large annotated corpus of English: The Penn treebank","volume":"19","author":"Marcus M. P.","year":"1993","unstructured":"M. P. Marcus, B. Santorini, M. A. Marcinkiewicz, Building a large annotated corpus of English: The Penn treebank. Comput. Ling. 19, 313\u2013330 (1993).","journal-title":"Comput. Ling."},{"key":"e_1_3_4_5_2","first-page":"1659","volume-title":"LREC International Conference on Language Resources and Evaluation","author":"Nivre J.","year":"2016","unstructured":"J. Nivre , \u201cUniversal dependencies V1: A multilingual treebank collection\u201d in LREC International Conference on Language Resources and Evaluation, N. Calzolari , Eds. (European Language Resources Association, Paris, France, 2016), pp. 1659\u20131666."},{"key":"e_1_3_4_6_2","doi-asserted-by":"publisher","DOI":"10.1162\/089120103322753356"},{"key":"e_1_3_4_7_2","first-page":"740","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing","author":"Chen D.","year":"2014","unstructured":"D. Chen, C. D. Manning, \u201cA fast and accurate dependency parser using neural networks\u201d in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, A. Moschitti, B. Pang, W. Daelemans, Eds. 
(Association for Computational Linguistics, Stroudsburg, PA, 2014), pp. 740\u2013750."},{"key":"e_1_3_4_8_2","unstructured":"T. Dozat C. D. Manning \u201cDeep biaffine attention for neural dependency parsing.\u201d https:\/\/openreview.net\/pdf?id=Hk95PK9le. Accessed 21 May 2020."},{"key":"e_1_3_4_9_2","first-page":"253","volume-title":"Proceedings of the International Joint Conference on Neural Networks (IJCNN)","author":"Schmidhuber J.","year":"1990","unstructured":"J. Schmidhuber, \u201cAn on-line algorithm for dynamic reinforcement learning and planning in reactive environments\u201d in Proceedings of the International Joint Conference on Neural Networks (IJCNN) (Institute of Electrical and Electronics Engineers, Piscataway, NJ, 1990), pp. 253\u2013258."},{"key":"e_1_3_4_10_2","first-page":"273","volume-title":"Proceedings of Robotics: Science and Systems (RSS)","author":"Lieb D.","year":"2005","unstructured":"D. Lieb, A. Lookingbill, S. Thrun, \u201cAdaptive road following using self-supervised learning and reverse optical flow\u201d in Proceedings of Robotics: Science and Systems (RSS), S. Thrun, G. S. Sukhatme, S. Schaal, Eds. (MIT Press, Cambridge, MA, 2005), pp. 273\u2013280."},{"key":"e_1_3_4_11_2","doi-asserted-by":"publisher","DOI":"10.1177\/107769905303000401"},{"key":"e_1_3_4_12_2","first-page":"3111","volume-title":"Advances in Neural Information Processing Systems 26","author":"Mikolov T.","year":"2013","unstructured":"T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, \u201cDistributed representations of words and phrases and their compositionality\u201d in Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, K. Q. Weinberger, Eds. (Curran Associates, Red Hook, NY, 2013), pp. 3111\u20133119."},{"key":"e_1_3_4_13_2","first-page":"1532","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing","author":"Pennington J.","year":"2014","unstructured":"J. 
Pennington, R. Socher, C. Manning, \u201cGlove: Global vectors for word representation\u201d in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, A. Moschitti, B. Pang, W. Daelemans, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2014), pp. 1532\u20131543."},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1561\/2200000006"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.1551-6709.2011.01189.x"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1100760108"},{"key":"e_1_3_4_17_2","first-page":"2227","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Peters M.","year":"2018","unstructured":"M. Peters , \u201cDeep contextualized word representations\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, M. Walker, H. Ji, A. Stent, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 2227\u20132237."},{"key":"e_1_3_4_18_2","first-page":"4171","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Devlin J.","year":"2019","unstructured":"J. Devlin, M. W. Chang, K. Lee, K. Toutanova, \u201cBERT: Pre-training of deep bidirectional transformers for language understanding\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, T. Solorio, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2019), pp. 4171\u20134186."},{"key":"e_1_3_4_19_2","volume-title":"Knowledge of Language: Its Nature, Origin, and Use","author":"Chomsky N.","year":"1986","unstructured":"N. 
Chomsky, Knowledge of Language: Its Nature, Origin, and Use (Praeger, New York, NY, 1986)."},{"key":"e_1_3_4_20_2","unstructured":"J. Devlin M.-W. Chang K. Lee K. Toutanova BERT. https:\/\/github.com\/google-research\/bert. Accessed 14 May 2020."},{"key":"e_1_3_4_21_2","first-page":"5998","volume-title":"Advances in Neural Information Processing Systems 30","author":"Vaswani A.","year":"2017","unstructured":"A. Vaswani , \u201cAttention is all you need\u201d in Advances in Neural Information Processing Systems 30, I. Guyon , Eds. (Curran Associates, Red Hook, NY, 2017), pp. 5998\u20136008."},{"key":"e_1_3_4_22_2","unstructured":"J. Chung C. Gulcehre K. Cho Y. Bengio Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv:1412.3555 (11 December 2014)."},{"key":"e_1_3_4_23_2","unstructured":"D. Bahdanau K. Cho Y. Bengio Neural machine translation by jointly learning to align and translate. arXiv:1409.0473 (16 January 2019)."},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00115"},{"key":"e_1_3_4_25_2","first-page":"1195","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Gulordava K.","year":"2018","unstructured":"K. Gulordava, P. Bojanowski, E. Grave, T. Linzen, M. Baroni, \u201cColorless green recurrent networks dream hierarchically\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, M. Walker, H. Ji, A. Stent, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 1195\u20131205."},{"key":"e_1_3_4_26_2","first-page":"1192","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Marvin R.","year":"2018","unstructured":"R. Marvin, T. Linzen, \u201cTargeted syntactic evaluation of language models\u201d in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, J. Tsujii, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 1192\u20131202."},{"key":"e_1_3_4_27_2","first-page":"1426","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Kuncoro A.","year":"2018","unstructured":"A. Kuncoro , \u201cLSTMs can learn syntax-sensitive dependencies well, but modeling structure makes them better\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, I. Gurevych, Y. Miyao, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 1426\u20131436."},{"key":"e_1_3_4_28_2","unstructured":"Y. Goldberg Assessing BERT\u2019s syntactic abilities. arXiv:1901.05287 (16 January 2019)."},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/0010-0285(91)90003-7"},{"key":"e_1_3_4_30_2","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1108\/S0092-4563(2011)0000037009","volume-title":"Experiments at the Interfaces, Syntax and Semantics","author":"Phillips C.","year":"2011","unstructured":"C. Phillips, M. W. Wagers, E. F. Lau, \u201cGrammatical illusions and selective fallibility in real-time language comprehension\u201d in Experiments at the Interfaces, Syntax and Semantics, J. Runner, Ed. (Emerald Group Publishing Limited, 2011), vol. 37, pp. 147\u2013180."},{"key":"e_1_3_4_31_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1166"},{"key":"e_1_3_4_32_2","unstructured":"S. Sharma R. Kiros R. Salakhutdinov Action recognition using visual attention. 
arXiv:1511.04119 (14 February 2016)."},{"key":"e_1_3_4_33_2","first-page":"2048","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Xu K.","year":"2015","unstructured":"K. Xu , \u201cShow, attend and tell: Neural image caption generation with visual attention\u201d in Proceedings of the International Conference on Machine Learning, F. Bach, D. Blei, Eds. (Proceedings of Machine Learning Research, Brookline, MA, 2015), pp. 2048\u20132057."},{"key":"e_1_3_4_34_2","first-page":"577","volume-title":"Advances in Neural Information Processing Systems 28","author":"Chorowski J. K.","year":"2015","unstructured":"J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, Y. Bengio, \u201cAttention-based models for speech recognition\u201d in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, Eds. (Curran Associates, Red Hook, NY, 2015), pp. 577\u2013585."},{"key":"e_1_3_4_35_2","unstructured":"M. P. Marcus B. Santorini M. A. Marcinkiewicz A. Taylor Treebank-3. Linguistic Data Consortium LDC99T42. https:\/\/catalog.ldc.upenn.edu\/LDC99T42. Accessed 14 May 2020."},{"key":"e_1_3_4_36_2","first-page":"449","volume-title":"LREC International Conference on Language Resources and Evaluation","author":"de Marneffe M. C.","year":"2006","unstructured":"M. C. de Marneffe, B. MacCartney, C. D. Manning, \u201cGenerating typed dependency parses from phrase structure parses\u201d in LREC International Conference on Language Resources and Evaluation, N. Calzolari , Eds. (European Language Resources Association, Paris, France, 2006), pp. 449\u2013454."},{"key":"e_1_3_4_37_2","first-page":"1","volume-title":"Joint Conference on EMNLP and CoNLL \u2013 Shared Task","author":"Pradhan S.","year":"2012","unstructured":"S. Pradhan, A. Moschitti, N. Xue, O. Uryupina, Y. 
Zhang, \u201cCoNLL-2012 shared task: Modeling multilingual unrestricted coreference in Ontonotes\u201d in Joint Conference on EMNLP and CoNLL \u2013 Shared Task, S. Pradhan, A. Moschitti, N. Xue, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2012), pp. 1\u201340."},{"key":"e_1_3_4_38_2","first-page":"28","volume-title":"Proceedings of the Conference on Computational Natural Language Learning: Shared Task","author":"Lee H.","year":"2011","unstructured":"H. Lee , \u201cStanford\u2019s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task\u201d in Proceedings of the Conference on Computational Natural Language Learning: Shared Task, S. Pradhan, Ed. (Association for Computational Linguistics, Stroudsburg, PA, 2011), pp. 28\u201334."},{"key":"e_1_3_4_39_2","first-page":"823","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Eriguchi A.","year":"2016","unstructured":"A. Eriguchi, K. Hashimoto, Y. Tsuruoka, \u201cTree-to-sequence attentional neural machine translation\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, K. Erk, N. A. Smith, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2016), pp. 823\u2013833."},{"key":"e_1_3_4_40_2","first-page":"4792","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Chen K.","year":"2018","unstructured":"K. Chen, R. Wang, M. Utiyama, E. Sumita, T. Zhao, \u201cSyntax-directed attention for neural machine translation\u201d in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI Press, Palo Alto, CA, 2018), pp. 
4792\u20134799."},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1548"},{"key":"e_1_3_4_42_2","first-page":"4129","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Hewitt J.","year":"2019","unstructured":"J. Hewitt, C. D. Manning, \u201cA structural probe for finding syntax in word representations\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, T. Solorio, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2019), pp. 4129\u20134138."},{"key":"e_1_3_4_43_2","first-page":"8594","volume-title":"Advances in Neural Information Processing Systems 32","author":"Reif E.","year":"2019","unstructured":"E. Reif , \u201cVisualizing and measuring the geometry of BERT\u201d in Advances in Neural Information Processing Systems 32, H. Wallach , Eds. (Curran Associates, Red Hook, NY, 2019), pp. 8594\u20138603."},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.104.2.211"},{"key":"e_1_3_4_45_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D16-1159"},{"key":"e_1_3_4_46_2","first-page":"14","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Blevins T.","year":"2018","unstructured":"T. Blevins, O. Levy, L. Zettlemoyer, \u201cDeep RNNs encode soft hierarchical syntax\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, I. Gurevych, Y. Miyao, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 
14\u201319."},{"key":"e_1_3_4_47_2","first-page":"1073","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Liu N. F.","year":"2019","unstructured":"N. F. Liu, M. Gardner, Y. Belinkov, M. E. Peters, N. A. Smith, \u201cLinguistic knowledge and transferability of contextual representations\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, T. Solorio, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2019), pp. 1073\u20131094."},{"key":"e_1_3_4_48_2","unstructured":"I. Tenney \u201cWhat do you learn from context? Probing for sentence structure in contextualized word representations.\u201d https:\/\/openreview.net\/pdf?id=SJzSgnRcKX. Accessed 21 May 2020."},{"key":"e_1_3_4_49_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1179"},{"key":"e_1_3_4_50_2","first-page":"3257","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Saphra N.","year":"2019","unstructured":"N. Saphra, A. Lopez, \u201cUnderstanding learning dynamics of language models with SVCCA\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, J. Burstein, C. Doran, T. Solorio, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2019), pp. 3257\u20133267."},{"key":"e_1_3_4_51_2","first-page":"359","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Zhang K. W.","year":"2018","unstructured":"K. W. Zhang, S. R. 
Bowman, \u201cLanguage modeling teaches you more syntax than translation does: Lessons learned through auxiliary task analysis\u201d in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, E. Riloff, D. Chiang, J. Hockenmaier, J. Tsujii, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 359\u2013361."},{"key":"e_1_3_4_52_2","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D15-1246"},{"key":"e_1_3_4_53_2","first-page":"2126","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Conneau A.","year":"2018","unstructured":"A. Conneau, G. Kruszewski, G. Lample, L. Barrault, M. Baroni, \u201cWhat you can cram into a single \\$&!#* vector: Probing sentence embeddings for linguistic properties\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, I. Gurevych, Y. Miyao, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2018), pp. 2126\u20132136."},{"key":"e_1_3_4_54_2","first-page":"861","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Belinkov Y.","year":"2017","unstructured":"Y. Belinkov, N. Durrani, F. Dalvi, H. Sajjad, J. Glass, \u201cWhat do neural machine translation models learn about morphology?\u201d in Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, R. Barzilay, M.-Y. Kan, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2017), pp. 861\u2013872."},{"key":"e_1_3_4_55_2","unstructured":"K. Clark BERT attention analysis. https:\/\/github.com\/clarkkev\/attention-analysis. Deposited 27 June 2019."},{"key":"e_1_3_4_56_2","unstructured":"J. Hewitt Structural probes. 
https:\/\/github.com\/john-hewitt\/structural-probes. Deposited 27 May 2019."},{"key":"e_1_3_4_57_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/S14-2008"},{"key":"e_1_3_4_58_2","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00152"},{"key":"e_1_3_4_59_2","first-page":"276","volume-title":"Proceedings of the Second BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP","author":"Clark K.","year":"2019","unstructured":"K. Clark, U. Khandelwal, O. Levy, C. D. Manning, \u201cWhat does BERT look at? An analysis of BERT\u2019s attention\u201d in Proceedings of the Second BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, T. Linzen, G. Chrupa\u0142a, Y. Belinkov, D. Hupkes, Eds. (Association for Computational Linguistics, Stroudsburg, PA, 2019), pp. 276\u2013286."}],"container-title":["Proceedings of the National Academy of Sciences"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/www.pnas.org\/syndication\/doi\/10.1073\/pnas.1907367117","content-type":"unspecified","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/pnas.org\/doi\/pdf\/10.1073\/pnas.1907367117","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,13]],"date-time":"2022-04-13T09:19:21Z","timestamp":1649841561000},"score":1,"resource":{"primary":{"URL":"https:\/\/pnas.org\/doi\/full\/10.1073\/pnas.1907367117"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,3]]},"references-count":59,"journal-issue":{"issue":"48","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["10.1073\/pnas.1907367117"],"URL":"https:\/\/doi.org\/10.1073\/pnas.1907367117","relation":{},"ISSN":["0027-8424","1091-6490"],"issn-type":[{"value":"0027-8424","type":"print"},{"value":"1091-6490","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,3]]},"assertion":[{"value":"2020-06-03","order":2,"name":"published","la
bel":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}