{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T04:14:23Z","timestamp":1773202463623,"version":"3.50.1"},"reference-count":91,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2022,9,21]],"date-time":"2022-09-21T00:00:00Z","timestamp":1663718400000},"content-version":"vor","delay-in-days":263,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,9,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>We investigate the extent to which modern neural language models are susceptible to structural priming, the phenomenon whereby the structure of a sentence makes the same structure more probable in a follow-up sentence. We explore how priming can be used to study the potential of these models to learn abstract structural information, which is a prerequisite for good performance on tasks that require natural language understanding skills. We introduce a novel metric and release Prime-LM, a large corpus where we control for various linguistic factors that interact with priming strength. We find that Transformer models indeed show evidence of structural priming, but also that the generalizations they learned are to some extent modulated by semantic information. Our experiments also show that the representations acquired by the models may not only encode abstract sequential structure but involve certain level of hierarchical syntactic information. 
More generally, our study shows that the priming paradigm is a useful, additional tool for gaining insights into the capacities of language models and opens the door to future priming-based investigations that probe the model\u2019s internal states.<\/jats:p>","DOI":"10.1162\/tacl_a_00504","type":"journal-article","created":{"date-parts":[[2022,9,21]],"date-time":"2022-09-21T18:02:44Z","timestamp":1663783364000},"page":"1031-1050","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":19,"title":["Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations"],"prefix":"10.1162","volume":"10","author":[{"given":"Arabella","family":"Sinclair","sequence":"first","affiliation":[{"name":"School of Natural and Computing Sciences, University of Aberdeen, United Kingdom. arabella.sinclair@abdn.ac.uk"},{"name":"Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands"}]},{"given":"Jaap","family":"Jumelet","sequence":"additional","affiliation":[{"name":"Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands. j.w.d.jumelet@uva.nl"}]},{"given":"Willem","family":"Zuidema","sequence":"additional","affiliation":[{"name":"Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands. zuidema@uva.nl"}]},{"given":"Raquel","family":"Fern\u00e1ndez","sequence":"additional","affiliation":[{"name":"Institute for Logic, Language and Computation, University of Amsterdam, The Netherlands. 
raquel.fernandez@uva.nl"}]}],"member":"281","published-online":{"date-parts":[[2022,9,19]]},"reference":[{"issue":"4","key":"2022092118021051200_bib1","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1017\/S135132491900024X","article-title":"Analyzing and interpreting neural networks for NLP: A report on the first BlackboxNLP Workshop","volume":"25","author":"Alishahi","year":"2019","journal-title":"Natural Language Engineering"},{"key":"2022092118021051200_bib2","article-title":"On the proper role of linguistically-oriented deep net analysis in linguistic theorizing","volume-title":"Algebraic Systems and the Representation of Linguistic Knowledge","author":"Baroni","year":"2022"},{"issue":"3","key":"2022092118021051200_bib3","doi-asserted-by":"publisher","first-page":"455","DOI":"10.1016\/j.cognition.2009.11.005","article-title":"Does verb bias modulate syntactic priming?","volume":"114","author":"Bernolet","year":"2010","journal-title":"Cognition"},{"key":"2022092118021051200_bib4","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2022.bigscience-1.9","article-title":"GPT-Neo: Large scale autoregressive language modeling with mesh-tensorflow","author":"Black","year":"2021"},{"issue":"3","key":"2022092118021051200_bib5","doi-asserted-by":"publisher","first-page":"355","DOI":"10.1016\/0010-0285(86)90004-6","article-title":"Syntactic persistence in language production","volume":"18","author":"Bock","year":"1986","journal-title":"Cognitive Psychology"},{"issue":"2","key":"2022092118021051200_bib6","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/0010-0277(89)90022-X","article-title":"Closed-class immanence in sentence production","volume":"31","author":"Bock","year":"1989","journal-title":"Cognition"},{"issue":"3","key":"2022092118021051200_bib7","doi-asserted-by":"publisher","first-page":"437","DOI":"10.1016\/j.cognition.2006.07.003","article-title":"Persistent structural priming from language comprehension to language 
production","volume":"104","author":"Bock","year":"2007","journal-title":"Cognition"},{"issue":"2","key":"2022092118021051200_bib8","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1037\/0096-3445.129.2.177","article-title":"The persistence of structural priming: Transient activation or implicit learning?","volume":"129","author":"Bock","year":"2000","journal-title":"Journal of Experimental Psychology: General"},{"issue":"1","key":"2022092118021051200_bib9","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1146\/annurev-linguistics-011619-030303","article-title":"Distributional semantics and linguistic theory","volume":"6","author":"Boleda","year":"2020","journal-title":"Annual Review of Linguistics"},{"issue":"4","key":"2022092118021051200_bib10","doi-asserted-by":"publisher","first-page":"635","DOI":"10.3758\/BF03212972","article-title":"Syntactic priming in written production: Evidence for rapid decay","volume":"6","author":"Branigan","year":"1999","journal-title":"Psychonomic Bulletin & Review"},{"issue":"7\u20138","key":"2022092118021051200_bib11","doi-asserted-by":"publisher","first-page":"974","DOI":"10.1080\/016909600824609","article-title":"The role of local and global syntactic structure in language production: Evidence from syntactic priming","volume":"21","author":"Branigan","year":"2006","journal-title":"Language and Cognitive Processes"},{"key":"2022092118021051200_bib12","first-page":"69","article-title":"Predicting the dative alternation","volume-title":"Cognitive Foundations of Interpretation","author":"Bresnan","year":"2007"},{"key":"2022092118021051200_bib13","first-page":"1877","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems","author":"Brown","year":"2020"},{"issue":"2","key":"2022092118021051200_bib14","doi-asserted-by":"publisher","first-page":"217","DOI":"10.1023\/A:1005101313330","article-title":"Structural priming as implicit learning: A comparison 
of models of sentence production","volume":"29","author":"Chang","year":"2000","journal-title":"Journal of Psycholinguistic Research"},{"key":"2022092118021051200_bib15","doi-asserted-by":"publisher","DOI":"10.1515\/9783112316009","volume-title":"Syntactic Structures","author":"Chomsky","year":"1957"},{"issue":"2","key":"2022092118021051200_bib16","doi-asserted-by":"publisher","first-page":"214","DOI":"10.1016\/S0749-596X(03)00060-3","article-title":"The use of lexical and syntactic information in language production: Evidence from the priming of noun-phrase structure","volume":"49","author":"Cleland","year":"2003","journal-title":"Journal of Memory and Language"},{"key":"2022092118021051200_bib17","volume-title":"Sampling Techniques","author":"Cochran","year":"1977","edition":"3rd edition"},{"issue":"2","key":"2022092118021051200_bib18","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1075\/ijcl.14.2.02dav","article-title":"The 385+ million word Corpus of Contemporary American English (1990\u20132008+): Design, architecture, and linguistic insights","volume":"14","author":"Davies","year":"2009","journal-title":"International Journal of Corpus Linguistics"},{"key":"2022092118021051200_bib19","doi-asserted-by":"publisher","first-page":"396","DOI":"10.18653\/v1\/2020.conll-1.32","article-title":"Discourse structure interacts with reference but not syntax in neural language models","volume-title":"Proceedings of the 24th Conference on Computational Natural Language Learning","author":"Davis","year":"2020"},{"key":"2022092118021051200_bib20","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional Transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short 
Papers)","author":"Devlin","year":"2019"},{"issue":"3","key":"2022092118021051200_bib21","doi-asserted-by":"publisher","first-page":"326","DOI":"10.1016\/j.cognition.2008.09.006","article-title":"A probabilistic corpus-based model of syntactic parallelism","volume":"109","author":"Dubey","year":"2008","journal-title":"Cognition"},{"key":"2022092118021051200_bib22","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1162\/tacl_a_00298","article-title":"What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models","volume":"8","author":"Ettinger","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022092118021051200_bib23","first-page":"271","article-title":"Word vectors, reuse, and replicability: Towards a community repository of large-text resources","volume-title":"Proceedings of the 21st Nordic Conference on Computational Linguistics","author":"Fares","year":"2017"},{"issue":"3","key":"2022092118021051200_bib24","doi-asserted-by":"publisher","first-page":"578","DOI":"10.1111\/cogs.12022","article-title":"Evidence for implicit learning in syntactic comprehension","volume":"37","author":"Fine","year":"2013","journal-title":"Cognitive Science"},{"key":"2022092118021051200_bib25","doi-asserted-by":"publisher","first-page":"33","DOI":"10.3115\/1610163.1610170","article-title":"Avoiding repetition in generated text","volume-title":"Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)","author":"Foster","year":"2007"},{"key":"2022092118021051200_bib26","first-page":"12848","article-title":"A theoretical analysis of the repetition problem in text generation","volume-title":"Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 
2021","author":"Zihao","year":"2021"},{"key":"2022092118021051200_bib27","doi-asserted-by":"publisher","first-page":"32","DOI":"10.18653\/v1\/N19-1004","article-title":"Neural language models as psycholinguistic subjects: Representations of syntactic state","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Futrell","year":"2019"},{"key":"2022092118021051200_bib28","doi-asserted-by":"publisher","first-page":"70","DOI":"10.18653\/v1\/2020.acl-demos.10","article-title":"Syntaxgym: An online platform for targeted evaluation of language models","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations","author":"Gauthier","year":"2020"},{"issue":"1","key":"2022092118021051200_bib29","doi-asserted-by":"publisher","first-page":"156","DOI":"10.1111\/j.1551-6709.2010.01150.x","article-title":"Structural priming as structure-mapping: Children use analogies from previous utterances to guide sentence production","volume":"35","author":"Goldwater","year":"2011","journal-title":"Cognitive Science"},{"issue":"4","key":"2022092118021051200_bib30","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/s10936-005-6139-3","article-title":"Syntactic priming: A corpus-based approach","volume":"34","author":"Gries","year":"2005","journal-title":"Journal of Psycholinguistic Research"},{"key":"2022092118021051200_bib31","doi-asserted-by":"publisher","first-page":"1195","DOI":"10.18653\/v1\/n18-1108","article-title":"Colorless green recurrent networks dream hierarchically","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 
(NAACL-HLT)","author":"Gulordava","year":"2018"},{"key":"2022092118021051200_bib32","doi-asserted-by":"publisher","first-page":"204","DOI":"10.18653\/v1\/2021.acl-short.27","article-title":"How effective is BERT without word ordering? Implications for language understanding and data privacy","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)","author":"Hessel","year":"2021"},{"key":"2022092118021051200_bib33","doi-asserted-by":"publisher","first-page":"4129","DOI":"10.18653\/v1\/n19-1419","article-title":"A structural probe for finding syntax in word representations","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Hewitt","year":"2019"},{"key":"2022092118021051200_bib34","first-page":"pages 1725\u2013pages 1744","article-title":"A systematic assessment of syntactic generalization in neural language models","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Jennifer","year":"2020"},{"key":"2022092118021051200_bib35","doi-asserted-by":"publisher","first-page":"907","DOI":"10.1613\/jair.1.11196","article-title":"Visualisation and \u2018diagnostic classifiers\u2019 reveal how recurrent and recursive neural networks process hierarchical structure","volume":"61","author":"Hupkes","year":"2018","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"2","key":"2022092118021051200_bib36","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1080\/23273798.2016.1236976","article-title":"Do you what I say? 
People reconstruct the syntax of anomalous utterances","volume":"32","author":"Ivanova","year":"2017","journal-title":"Language, Cognition and Neuroscience"},{"issue":"2","key":"2022092118021051200_bib37","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1016\/j.cognition.2011.10.013","article-title":"The comprehension of anomalous sentences: Evidence from structural priming","volume":"122","author":"Ivanova","year":"2012","journal-title":"Cognition"},{"issue":"1","key":"2022092118021051200_bib38","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/j.cognition.2012.10.013","article-title":"Alignment as a consequence of expectation adaptation: Syntactic priming is affected by the prime\u2019s prediction error given both prior and recent experience","volume":"127","author":"Florian Jaeger","year":"2013","journal-title":"Cognition"},{"key":"2022092118021051200_bib39","volume-title":"Theory of Probability","author":"Jeffreys","year":"1961","edition":"3rd"},{"key":"2022092118021051200_bib40","article-title":"Exploring the limits of language modeling","author":"J\u00f3zefowicz","year":"2016","journal-title":"CoRR"},{"key":"2022092118021051200_bib41","doi-asserted-by":"publisher","first-page":"342","DOI":"10.18653\/v1\/2020.blackboxnlp-1.32","article-title":"diagNNose: A library for neural activation analysis","volume-title":"Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP","author":"Jumelet","year":"2020"},{"key":"2022092118021051200_bib42","doi-asserted-by":"publisher","first-page":"4958","DOI":"10.18653\/v1\/2021.findings-acl.439","article-title":"Language models use monotonicity to assess NPI licensing","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 
2021","author":"Jumelet","year":"2021"},{"issue":"6","key":"2022092118021051200_bib43","doi-asserted-by":"publisher","first-page":"1133","DOI":"10.3758\/s13423-011-0157-y","article-title":"Structural priming as implicit learning: Cumulative priming effects and individual differences","volume":"18","author":"Kaschak","year":"2011","journal-title":"Psychonomic Bulletin & Review"},{"issue":"430","key":"2022092118021051200_bib44","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1080\/01621459.1995.10476572","article-title":"Bayes factors","volume":"90","author":"Kass","year":"1995","journal-title":"Journal of the American Statistical Association"},{"key":"2022092118021051200_bib45","doi-asserted-by":"publisher","first-page":"7811","DOI":"10.18653\/v1\/2020.acl-main.698","article-title":"Negated and misprimed probes for pretrained language models: Birds can talk, but cannot fly","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Kassner","year":"2020"},{"key":"2022092118021051200_bib46","doi-asserted-by":"publisher","first-page":"pages 1757\u2013pages 1762","DOI":"10.18653\/v1\/2020.acl-main.160","article-title":"Overestimation of syntactic representation in neural language models","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Kodner","year":"2020"},{"issue":"104699","key":"2022092118021051200_bib47","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2021.104699","article-title":"Mechanisms for handling nested dependencies in neural-network language models and humans","volume":"213","author":"Lakretz","year":"2021","journal-title":"Cognition"},{"key":"2022092118021051200_bib48","doi-asserted-by":"publisher","first-page":"11","DOI":"10.18653\/v1\/N19-1002","article-title":"The emergence of number and syntax units in LSTM language models","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the 
Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Lakretz","year":"2019"},{"key":"2022092118021051200_bib49","article-title":"ALBERT: A lite BERT for self-supervised learning of language representations","volume-title":"Proceedings of the 8th International Conference on Learning Representations (ICLR)","author":"Lan","year":"2020"},{"key":"2022092118021051200_bib50","doi-asserted-by":"publisher","first-page":"7410","DOI":"10.18653\/v1\/2022.acl-long.512","article-title":"Neural reality of argument structure constructions","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Li","year":"2022"},{"key":"2022092118021051200_bib51","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1162\/tacl_a_00115","article-title":"Assessing the ability of LSTMs to learn syntax-sensitive dependencies","volume":"4","author":"Linzen","year":"2016","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022092118021051200_bib52","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"CoRR"},{"key":"2022092118021051200_bib53","article-title":"Predicting inductive biases of pre-trained models","volume-title":"Proceedings of the 9th International Conference on Learning Representations (ICLR)","author":"Lovering","year":"2021"},{"key":"2022092118021051200_bib54","doi-asserted-by":"publisher","first-page":"4020","DOI":"10.18653\/v1\/2020.coling-main.355","article-title":"CxGBERT: BERT meets construction grammar","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics (COLING)","author":"Madabushi","year":"2020"},{"key":"2022092118021051200_bib55","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.jml.2016.03.009","article-title":"A meta-analysis of syntactic priming in language 
production","volume":"91","author":"Mahowald","year":"2016","journal-title":"Journal of Memory and Language"},{"key":"2022092118021051200_bib56","doi-asserted-by":"publisher","first-page":"1192","DOI":"10.1016\/j.jml.2016.03.009","article-title":"Targeted syntactic evaluation of language models","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Marvin","year":"2018"},{"key":"2022092118021051200_bib57","doi-asserted-by":"publisher","first-page":"3428","DOI":"10.18653\/v1\/P19-1334","article-title":"Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"McCoy","year":"2019"},{"key":"2022092118021051200_bib58","volume-title":"Webster\u2019s Dictionary of English Usage","author":"Merriam-Webster","year":"1989"},{"issue":"3","key":"2022092118021051200_bib59","doi-asserted-by":"publisher","first-page":"402","DOI":"10.3758\/BF03195588","article-title":"The University of South Florida free association, rhyme, and word fragment norms","volume":"36","author":"Nelson","year":"2004","journal-title":"Behavior Research Methods, Instruments, & Computers"},{"key":"2022092118021051200_bib60","doi-asserted-by":"publisher","first-page":"3710","DOI":"10.18653\/v1\/2021.naacl-main.290","article-title":"Refining targeted syntactic evaluation of language models","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Newman","year":"2021"},{"key":"2022092118021051200_bib61","doi-asserted-by":"publisher","first-page":"1145","DOI":"10.18653\/v1\/2021.findings-acl.98","article-title":"Out of order: How important is the sequential order of words in a sentence in natural language understanding tasks?","volume-title":"Findings of the Association for Computational 
Linguistics: ACL-IJCNLP 2021","author":"Pham","year":"2021"},{"issue":"4","key":"2022092118021051200_bib62","doi-asserted-by":"publisher","first-page":"633","DOI":"10.1006\/jmla.1998.2592","article-title":"The representation of verbs: Evidence from syntactic priming in language production","volume":"39","author":"Pickering","year":"1998","journal-title":"Journal of Memory and Language"},{"issue":"3","key":"2022092118021051200_bib63","doi-asserted-by":"publisher","first-page":"427","DOI":"10.1037\/0033-2909.134.3.427","article-title":"Structural priming: A critical review.","volume":"134","author":"Pickering","year":"2008","journal-title":"Psychological Bulletin"},{"issue":"3","key":"2022092118021051200_bib64","doi-asserted-by":"publisher","first-page":"890","DOI":"10.1037\/a0029181","article-title":"Persistent structural priming and frequency effects during comprehension.","volume":"39","author":"Pickering","year":"2013","journal-title":"Journal of Experimental Psychology: Learning, Memory, and Cognition"},{"key":"2022092118021051200_bib65","doi-asserted-by":"publisher","first-page":"66","DOI":"10.18653\/v1\/K19-1007","article-title":"Using priming to uncover the organization of syntactic representations in neural language models","volume-title":"Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)","author":"Prasad","year":"2019"},{"issue":"8","key":"2022092118021051200_bib66","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"2022092118021051200_bib67","article-title":"Against sequence priming: Evidence from constituents and distituents in corpus data","volume-title":"Proceedings of the 29th Annual Meeting of the Cognitive Science 
Society","author":"Reitter","year":"2007"},{"issue":"4","key":"2022092118021051200_bib68","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1111\/j.1551-6709.2010.01165.x","article-title":"A computational cognitive model of syntactic priming","volume":"35","author":"Reitter","year":"2011","journal-title":"Cognitive Science"},{"key":"2022092118021051200_bib69","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1016\/j.jml.2014.05.008","article-title":"Alignment and task success in spoken dialogue","volume":"76","author":"Reitter","year":"2014","journal-title":"Journal of Memory and Language"},{"key":"2022092118021051200_bib70","doi-asserted-by":"publisher","first-page":"842","DOI":"10.1162\/tacl_a_00349","article-title":"A primer in bertology: What we know about how BERT works","volume":"8","author":"Rogers","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2022092118021051200_bib71","doi-asserted-by":"publisher","first-page":"2699","DOI":"10.18653\/v1\/2020.acl-main.240","article-title":"Masked language model scoring","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Salazar","year":"2020"},{"key":"2022092118021051200_bib72","article-title":"DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter","volume-title":"Proceedings of the 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing (NeurIPS)","author":"Sanh","year":"2019"},{"issue":"3","key":"2022092118021051200_bib73","doi-asserted-by":"publisher","first-page":"179","DOI":"10.1016\/S0010-0277(03)00119-7","article-title":"Syntactic priming of relative clause attachments: Persistence of structural configuration in sentence production","volume":"89","author":"Scheepers","year":"2003","journal-title":"Cognition"},{"key":"2022092118021051200_bib74","article-title":"Artificial neural networks accurately predict language processing in the 
brain","author":"Schrimpf","year":"2020","journal-title":"bioRxiv"},{"key":"2022092118021051200_bib75","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1016\/j.jml.2016.03.011","article-title":"Unifying structural priming effects on syntactic choices and timing of sentence generation","volume":"91","author":"Segaert","year":"2016","journal-title":"Journal of Memory and Language"},{"key":"2022092118021051200_bib76","doi-asserted-by":"publisher","first-page":"2888","DOI":"10.18653\/v1\/2021.emnlp-main.230","article-title":"Masked language modeling and the distributional hypothesis: Order word matters pre-training for little","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Sinha","year":"2021"},{"key":"2022092118021051200_bib77","doi-asserted-by":"publisher","first-page":"7329","DOI":"10.18653\/v1\/2021.acl-long.569","article-title":"UnNatural Language Inference","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Sinha","year":"2021"},{"key":"2022092118021051200_bib78","doi-asserted-by":"publisher","first-page":"4593","DOI":"10.18653\/v1\/P19-1452","article-title":"BERT rediscovers the classical NLP pipeline","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Tenney","year":"2019"},{"key":"2022092118021051200_bib79","article-title":"What do you learn from context? 
Probing for sentence structure in contextualized word representations","volume-title":"Proceedings of the 7th International Conference on Learning Representations (ICLR)","author":"Tenney","year":"2019"},{"issue":"2","key":"2022092118021051200_bib80","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1016\/j.cognition.2014.04.002","article-title":"On the parity of structural persistence in language production and comprehension","volume":"132","author":"Tooley","year":"2014","journal-title":"Cognition"},{"issue":"1","key":"2022092118021051200_bib81","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1023\/A:1023239604158","article-title":"Building syntactic structure in speaking","volume":"28","author":"Fox Tree","year":"1999","journal-title":"Journal of Psycholinguistic Research"},{"key":"2022092118021051200_bib82","doi-asserted-by":"publisher","first-page":"58","DOI":"10.18653\/v1\/2021.blackboxnlp-1.5","article-title":"On the limits of minimal pairs in contrastive evaluation","volume-title":"Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP","author":"Vamvas","year":"2021"},{"key":"2022092118021051200_bib83","doi-asserted-by":"publisher","first-page":"4704","DOI":"10.18653\/v1\/D18-1499","article-title":"A neural model of adaptation in reading","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"van Schijndel","year":"2018"},{"key":"2022092118021051200_bib84","first-page":"5998","article-title":"Attention is all you need","volume-title":"Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems","author":"Vaswani","year":"2017"},{"key":"2022092118021051200_bib85","doi-asserted-by":"publisher","first-page":"183","DOI":"10.18653\/v1\/2020.emnlp-main.14","article-title":"Information-theoretic probing with minimum description 
length","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Voita","year":"2020"},{"key":"2022092118021051200_bib86","doi-asserted-by":"publisher","first-page":"2870","DOI":"10.18653\/v1\/D19-1286","article-title":"Investigating BERT\u2019s knowledge of language: Five analysis methods with NPIs","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Warstadt","year":"2019"},{"key":"2022092118021051200_bib87","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1162\/tacl_a_00321","article-title":"BLiMP: A benchmark of linguistic minimal pairs for English","volume-title":"Proceedings of the Society for Computation in Linguistics 2020","author":"Warstadt","year":"2020"},{"issue":"4","key":"2022092118021051200_bib88","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1080\/01690960244000063","article-title":"Phrase structure priming: A short-lived effect","volume":"18","author":"Wheeldon","year":"2003","journal-title":"Language and Cognitive Processes"},{"key":"2022092118021051200_bib89","first-page":"132","article-title":"A non-linear structural probe","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"White","year":"2021"},{"key":"2022092118021051200_bib90","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System 
Demonstrations","author":"Wolf","year":"2020"},{"key":"2022092118021051200_bib91","doi-asserted-by":"publisher","first-page":"270","DOI":"10.18653\/v1\/2020.acl-demos.30","article-title":"DIALOGPT: Large-scale generative pre-training for conversational response generation","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations","author":"Zhang","year":"2020"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00504\/2043729\/tacl_a_00504.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00504\/2043729\/tacl_a_00504.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,21]],"date-time":"2022-09-21T18:03:17Z","timestamp":1663783397000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00504\/113019\/Structural-Persistence-in-Language-Models-Priming"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":91,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00504","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}