{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T14:45:15Z","timestamp":1762094715707,"version":"build-2065373602"},"reference-count":64,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T00:00:00Z","timestamp":1697068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasizing the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits <jats:italic>constructions<\/jats:italic> as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step toward assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models' behavior in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs, as well as OPT, are able to recognize the structure of the CC but fail to use its meaning. 
While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.<\/jats:p>","DOI":"10.3389\/frai.2023.1225791","type":"journal-article","created":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T09:06:48Z","timestamp":1697101608000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Explaining pretrained language models' understanding of linguistic structures using construction grammar"],"prefix":"10.3389","volume":"6","author":[{"given":"Leonie","family":"Weissweiler","sequence":"first","affiliation":[]},{"given":"Valentin","family":"Hofmann","sequence":"additional","affiliation":[]},{"given":"Abdullatif","family":"K\u00f6ksal","sequence":"additional","affiliation":[]},{"given":"Hinrich","family":"Sch\u00fctze","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,10,12]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1139","DOI":"10.1016\/j.lingua.2008.02.001","article-title":"Comparative correlatives and parameters","volume":"118","author":"Abeill\u00e9","year":"2008","journal-title":"Lingua"},{"key":"B2","doi-asserted-by":"crossref","first-page":"861","DOI":"10.18653\/v1\/P17-1080","article-title":"\u201cWhat do neural machine translation models learn about morphology?,\u201d","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Belinkov","year":"2017"},{"key":"B3","doi-asserted-by":"publisher","first-page":"640","DOI":"10.1006\/jmla.2000.2757","article-title":"The contribution of argument structure constructions to sentence meaning","volume":"43","author":"Bencini","year":"2000","journal-title":"J. Mem. 
Lang"},{"volume-title":"Language","year":"1933","author":"Bloomfield","key":"B4"},{"article-title":"Generative grammar. Studies in English linguistics and literature","year":"1988","author":"Chomsky","key":"B5"},{"key":"B6","doi-asserted-by":"crossref","first-page":"2126","DOI":"10.18653\/v1\/P18-1198","article-title":"\u201cWhat you can cram into a single $&!#* vector: probing sentence embeddings for linguistic properties,\u201d","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Conneau","year":"2018"},{"key":"B7","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1162\/002438999554200","article-title":"The view from the periphery: the English comparative correlative","volume":"30","author":"Culicover","year":"1999","journal-title":"Linguist. Inq"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.naacl-main.184","article-title":"\u201cLearning to recognize dialect features,\u201d","author":"Demszky","year":"2021","journal-title":"Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) 2021"},{"key":"B9","doi-asserted-by":"publisher","first-page":"497","DOI":"10.1162\/002438905774464377","article-title":"Comparative correlatives comparatively","volume":"36","author":"Den Dikken","year":"2005","journal-title":"Linguist. 
Inq"},{"key":"B10","first-page":"4171","article-title":"\u201cBERT: pre-training of deep bidirectional transformers for language understanding,\u201d","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"B11","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1162\/tacl_a_00050","article-title":"Automatically tagging constructions of causation and their slot-fillers","volume":"5","author":"Dunietz","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist"},{"key":"B12","doi-asserted-by":"publisher","first-page":"254","DOI":"10.1017\/langcog.2016.7","article-title":"Computational learning of construction grammars","volume":"9","author":"Dunn","year":"2017","journal-title":"Lang. Cogn"},{"key":"B13","doi-asserted-by":"crossref","first-page":"117","DOI":"10.18653\/v1\/W19-2913","article-title":"\u201cFrequency vs. association for constraint selection in usage-based construction grammar,\u201d","volume-title":"Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics","author":"Dunn","year":"2019"},{"key":"B14","unstructured":"\u201cVarieties of conditional sentences,\u201d\n            FillmoreC. J.\n          Eastern States Conference on Linguistics, Vol. 
3, 1986"},{"key":"B15","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/B978-0-444-87144-2.50004-5","article-title":"\u201cGrammatical construction: theory and the familiar dichotomies,\u201d","volume-title":"Language Processing in Social Context","author":"Fillmore","year":"1989"},{"key":"B16","doi-asserted-by":"publisher","first-page":"501","DOI":"10.2307\/414531","article-title":"Regularity and idiomaticity in grammatical constructions: the case of let alone","volume":"64","author":"Fillmore","year":"1988","journal-title":"Language"},{"volume-title":"Constructions at Work: The Nature of Generalization in Language","year":"2006","author":"Goldberg","key":"B17"},{"volume-title":"Constructions: A Construction Grammar Approach to Argument Structure","year":"1995","author":"Goldberg","key":"B18"},{"key":"B19","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1016\/S1364-6613(03)00080-9","article-title":"Constructions: a new theoretical approach to language","volume":"7","author":"Goldberg","year":"2003","journal-title":"Trends Cogn. 
Sci"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1531","DOI":"10.1093\/oxfordhb\/9780195396683.013.0002","article-title":"\u201cChapter: 1415 Constructionist approaches,\u201d","author":"Goldberg","year":"2013","journal-title":"The Oxford Handbook of Construction Grammar"},{"key":"B21","article-title":"Assessing Bert's syntactic abilities","author":"Goldberg","year":"2019","journal-title":"arXiv preprint arXiv:1901.05287"},{"volume-title":"Constructions: A Construction Grammar Approach to Argument Structure","year":"1995","author":"Goldberg","key":"B22"},{"key":"B23","article-title":"\u201cDeberta: decoding-enhanced Bert with disentangled attention,\u201d","author":"He","year":"2020","journal-title":"International Conference on Learning Representations"},{"key":"B24","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1017\/S0332586506001569","article-title":"A synchronic perspective on the grammaticalization of Swedish future constructions","volume":"29","author":"Hilpert","year":"2006","journal-title":"Nordic J. Linguist"},{"key":"B25","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1515\/cog-2018-0036","article-title":"The more data, the better: a usage-based account of the English comparative correlative construction","volume":"30","author":"Hoffmann","year":"2019","journal-title":"Cogn. 
Linguist"},{"key":"B26","doi-asserted-by":"crossref","DOI":"10.1093\/oxfordhb\/9780195396683.001.0001","volume-title":"The Oxford Handbook of Construction Grammar","author":"Hoffmann","year":"2013"},{"key":"B27","doi-asserted-by":"crossref","first-page":"7038","DOI":"10.18653\/v1\/2021.emnlp-main.564","article-title":"\u201cSurface form competition: why the highest probability answer isn't always right,\u201d","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Holtzman","year":"2021"},{"article-title":"spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing","year":"2018","author":"Honnibal","key":"B28"},{"key":"B29","article-title":"\u201cComparative correlatives in English: a minimalist-cartographic analysis,\u201d","volume-title":"Essex Research Reports in Linguistics","author":"Iwasaki","year":"2009"},{"volume-title":"X Syntax: A Study of Phrase Structure","year":"1977","author":"Jackendoff","key":"B30"},{"key":"B31","doi-asserted-by":"crossref","first-page":"7811","DOI":"10.18653\/v1\/2020.acl-main.698","article-title":"\u201cNegated and misprimed probes for pretrained language models: Birds can talk, but cannot fly,\u201d","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Kassner","year":"2020"},{"key":"B32","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2307\/417472","article-title":"Grammatical constructions and linguistic generalizations: the What's X doing Y? 
Construction","volume":"75","author":"Kay","year":"1999","journal-title":"Language"},{"key":"B33","doi-asserted-by":"crossref","DOI":"10.7208\/chicago\/9780226471013.001.0001","volume-title":"Women, Fire, and Dangerous Things: What Categories Reveal About the Mind","author":"Lakoff","year":"1987"},{"volume-title":"Foundations of Cognitive Grammar: Theoretical Prerequisites","year":"1987","author":"Langacker","key":"B34"},{"key":"B35","doi-asserted-by":"crossref","first-page":"7410","DOI":"10.18653\/v1\/2022.acl-long.512","article-title":"\u201cNeural reality of argument structure constructions,\u201d","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Li","year":"2022"},{"key":"B36","article-title":"Roberta: a robustly optimized Bert pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"B37","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.eacl-main.20","article-title":"A discerning several thousand judgments: GPT-3 rates the article+ adjective+ numeral+ noun construction","author":"Mahowald","year":"2023","journal-title":"arXiv preprint: arXiv:2301.12564"},{"key":"B38","first-page":"1137","article-title":"\u201cEvaluation strategies for computational construction grammars,\u201d","volume-title":"Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers","author":"Marques","year":"2016"},{"key":"B39","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.18653\/v1\/D18-1151","article-title":"\u201cTargeted syntactic evaluation of language models,\u201d","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"Marvin","year":"2018"},{"key":"B40","first-page":"176","article-title":"\u201cThe comparative conditional construction in English, German, and Chinese,\u201d","volume-title":"Annual Meeting of the 
Berkeley Linguistics Society","author":"McCawley","year":"1988"},{"key":"B41","doi-asserted-by":"crossref","first-page":"1525","DOI":"10.18653\/v1\/P16-1144","article-title":"\u201cThe LAMBADA dataset: Word prediction requiring a broad discourse context,\u201d","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Paperno","year":"2016"},{"key":"B42","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res"},{"key":"B43","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/N18-1202","article-title":"\u201cDeep contextualized word representations,\u201d","volume-title":"Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT) 2018","author":"Peters","year":"2018"},{"key":"B44","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"B45","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"J. Mach. Learn. Res"},{"key":"B46","article-title":"\u201cVisualizing and measuring the geometry of Bert,\u201d","volume-title":"Advances in Neural Information Processing Systems, Vol. 
32","author":"Reif","year":"2019"},{"key":"B47","doi-asserted-by":"crossref","first-page":"4902","DOI":"10.18653\/v1\/2020.acl-main.442","article-title":"\u201cBeyond accuracy: behavioral testing of NLP models with CheckList,\u201d","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Ribeiro","year":"2020"},{"key":"B48","first-page":"64","article-title":"\u201cHow to encode arbitrarily complex morphology in word embeddings, no corpus needed,\u201d","volume-title":"Proceedings of the First Workshop on NLP Applications to Field Linguistics","author":"Schwartz","year":"2022"},{"key":"B49","first-page":"2835","article-title":"Beyond the imitation game: quantifying and extrapolating the capabilities of language models","author":"Srivastava","year":"2023","journal-title":"Trans. Mach. Learn. Res."},{"key":"B50","doi-asserted-by":"crossref","first-page":"4020","DOI":"10.18653\/v1\/2020.coling-main.355","article-title":"\u201cCxGBERT: BERT meets construction grammar,\u201d","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Tayyar Madabushi","year":"2020"},{"key":"B51","article-title":"\u201cWhat do you learn from context? Probing for sentence structure in contextualized word representations,\u201d","volume-title":"International Conference on Learning Representations (ICLR), Vol. 
7","author":"Tenney","year":"2019"},{"key":"B52","first-page":"6361","article-title":"\u201cCxLM: a construction and context-aware language model,\u201d","volume-title":"Proceedings of the Thirteenth Language Resources and Evaluation Conference","author":"Tseng","year":"2022"},{"key":"B53","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-main.586","article-title":"\u201cProbing pretrained language models for lexical semantics,\u201d","author":"Vuli\u0107","year":"2020","journal-title":"Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020"},{"key":"B54","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1162\/tacl_a_00321","article-title":"BLiMP: the benchmark of linguistic minimal pairs for English","volume":"8","author":"Warstadt","year":"","journal-title":"Trans. Assoc. Comput. Linguist"},{"key":"B55","doi-asserted-by":"publisher","first-page":"625","DOI":"10.1162\/tacl_a_00290","article-title":"Neural network acceptability judgments","volume":"7","author":"Warstadt","year":"2019","journal-title":"Trans. Assoc. Comput. 
Linguist"},{"key":"B56","first-page":"217","article-title":"\u201cLearning which features matter: RoBERTa acquires a preference for linguistic generalizations (eventually),\u201d","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Warstadt","year":""},{"key":"B57","doi-asserted-by":"crossref","first-page":"932","DOI":"10.18653\/v1\/2021.emnlp-main.72","article-title":"\u201cFrequency effects on syntactic rule learning in transformers,\u201d","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Wei","year":"2021"},{"key":"B58","first-page":"85","article-title":"\u201cConstruction grammar provides unique insight into neural language models,\u201d","volume-title":"Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT\/SyntaxFest 2023)","author":"Weissweiler","year":"2023"},{"key":"B59","doi-asserted-by":"crossref","first-page":"10859","DOI":"10.18653\/v1\/2022.emnlp-main.746","article-title":"\u201cThe better your syntax, the better your semantics? Probing pretrained language models for the English comparative correlative,\u201d","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing","author":"Weissweiler","year":"2022"},{"key":"B60","article-title":"Does Bert make any sense? 
Interpretable word sense disambiguation with contextualized embeddings","author":"Wiedemann","year":"2019","journal-title":"arXiv preprint: arXiv:1909.10430"},{"key":"B61","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Huggingface's transformers: State-of-the-art natural language processing","author":"Wolf","year":"2019","journal-title":"arXiv preprint: arXiv:1910.03771"},{"key":"B62","first-page":"230","article-title":"On theoretical issues in building a knowledge database of Chinese constructions","volume":"31","author":"Zhan","year":"2017","journal-title":"J. Chinese Inform. Process"},{"key":"B63","article-title":"Opt: Open pre-trained transformer language models","author":"Zhang","year":"2022","journal-title":"arXiv preprint arXiv:2205.01068"},{"key":"B64","first-page":"12697","article-title":"\u201cCalibrate before use: improving few-shot performance of language models,\u201d","volume-title":"Proceedings of the 38th International Conference on Machine Learning, Vol. 139 of Proceedings of Machine Learning Research","author":"Zhao","year":"2021"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1225791\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,12]],"date-time":"2023-10-12T09:07:23Z","timestamp":1697101643000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1225791\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,12]]},"references-count":64,"alternative-id":["10.3389\/frai.2023.1225791"],"URL":"https:\/\/doi.org\/10.3389\/frai.2023.1225791","relation":{},"ISSN":["2624-8212"],"issn-type":[{"type":"electronic","value":"2624-8212"}],"subject":[],"published":{"date-parts":[[2023,10,12]]},"article-number":"1225791"}}