{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T18:15:52Z","timestamp":1772043352996,"version":"3.50.1"},"reference-count":48,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T00:00:00Z","timestamp":1717718400000},"content-version":"vor","delay-in-days":158,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,6,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate how different versions of certain autoregressive language models\u2014GPT-2, GPT-3\/3.5, Llama 2, and GPT-4\u2014treat scope ambiguous sentences, and compare this with human judgments. We introduce novel datasets that contain a joint total of almost 1,000 unique scope-ambiguous sentences, containing interactions between a range of semantic operators, and annotated for human judgments. 
Using these datasets, we find evidence that several models (i) are sensitive to the meaning ambiguity in these sentences, in a way that patterns well with human judgments, and (ii) can successfully identify human-preferred readings at a high level of accuracy (over 90% in some cases).1<\/jats:p>","DOI":"10.1162\/tacl_a_00670","type":"journal-article","created":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T17:26:11Z","timestamp":1717781171000},"page":"738-754","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":16,"title":["Scope Ambiguities in Large Language Models"],"prefix":"10.1162","volume":"12","author":[{"given":"Gaurav","family":"Kamath","sequence":"first","affiliation":[{"name":"McGill University, Canada"},{"name":"Mila - Quebec AI Institute, Canada. gaurav.kamath@mail.mcgill.ca"}]},{"given":"Sebastian","family":"Schuster","sequence":"additional","affiliation":[{"name":"University College London, UK. s.schuster@ucl.ac.uk"}]},{"given":"Sowmya","family":"Vajjala","sequence":"additional","affiliation":[{"name":"National Research Council Canada. sowmya.vajjala@nrc-cnrc.gc.ca"}]},{"given":"Siva","family":"Reddy","sequence":"additional","affiliation":[{"name":"McGill University, Canada"},{"name":"Mila - Quebec AI Institute, Canada"},{"name":"Facebook CIFAR AI Chair, Canada. 
siva.reddy@mila.quebec"}]}],"member":"281","reference":[{"key":"2024060717254479400_bib1","article-title":"A review on language models as knowledge bases","author":"AlKhamissi","year":"2022","journal-title":"arXiv preprint arXiv:2204.06031"},{"key":"2024060717254479400_bib2","first-page":"15","article-title":"The pragmatics of quantifier scope: A corpus study","volume-title":"Proceedings of Sinn und Bedeutung","author":"AnderBois","year":"2012"},{"key":"2024060717254479400_bib3","volume-title":"The Structure and Real-time Comprehension of Quantifier Scope Ambiguity","author":"Anderson","year":"2004"},{"key":"2024060717254479400_bib4","article-title":"Statistical resolution of scope ambiguity in natural language","author":"Andrew","year":"2004","journal-title":"Unpublished Manuscript"},{"key":"2024060717254479400_bib5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1201\/9781003205388-1","article-title":"On the proper role of linguistically-oriented deep net analysis in linguistic theorizing","author":"Baroni","year":"2022","journal-title":"Algebraic Structures in Natural Language"},{"key":"2024060717254479400_bib6","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1162\/tacl_a_00254","article-title":"Analysis methods in neural language processing: A survey","volume":"7","author":"Belinkov","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024060717254479400_bib7","first-page":"1877","article-title":"Language models are few-shot learners","volume":"33","author":"Brown","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024060717254479400_bib8","first-page":"4875","article-title":"Generalized quantifiers as a source of error in multilingual NLU benchmarks","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language 
Technologies","author":"Cui","year":"2022"},{"key":"2024060717254479400_bib9","first-page":"30318","article-title":"GPT3.int8(): 8-bit matrix multiplication for transformers at scale","volume-title":"Advances in Neural Information Processing Systems","author":"Dettmers","year":"2022"},{"key":"2024060717254479400_bib10","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1162\/tacl_a_00298","article-title":"What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models","volume":"8","author":"Ettinger","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024060717254479400_bib11","first-page":"1790","article-title":"Assessing composition in sentence vector representations","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Ettinger","year":"2018"},{"key":"2024060717254479400_bib12","doi-asserted-by":"publisher","first-page":"1828","DOI":"10.18653\/v1\/2021.acl-long.144","article-title":"Causal analysis of syntactic agreement mechanisms in neural language models","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Finlayson","year":"2021"},{"key":"2024060717254479400_bib13","doi-asserted-by":"publisher","first-page":"32","DOI":"10.18653\/v1\/N19-1004","article-title":"Neural language models as psycholinguistic subjects: Representations of syntactic state","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Futrell","year":"2019"},{"key":"2024060717254479400_bib14","first-page":"7324","article-title":"Inducing causal structure for interpretable neural networks","volume-title":"Proceedings of the 39th International 
Conference on Machine Learning","author":"Geiger","year":"2022"},{"key":"2024060717254479400_bib15","doi-asserted-by":"publisher","first-page":"1772","DOI":"10.18653\/v1\/2021.eacl-main.153","article-title":"Language models as knowledge bases: On entity representations, storage capacity, and paraphrased queries","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"Heinzerling","year":"2021"},{"key":"2024060717254479400_bib16","first-page":"4129","article-title":"A structural probe for finding syntax in word representations","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Hewitt","year":"2019"},{"issue":"1","key":"2024060717254479400_bib17","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1162\/089120103321337449","article-title":"A machine learning approach to modeling scope preferences","volume":"29","author":"Higgins","year":"2003","journal-title":"Computational Linguistics"},{"key":"2024060717254479400_bib18","article-title":"Prompt-based methods may underestimate large language models\u2019 linguistic generalizations","author":"Hu","year":"2023","journal-title":"arXiv preprint arXiv:2305.13264"},{"key":"2024060717254479400_bib19","article-title":"Consistency analysis of ChatGPT","author":"Jang","year":"2023","journal-title":"arXiv preprint arXiv:2303.06273"},{"key":"2024060717254479400_bib20","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1356","article-title":"What does BERT learn about the structure of language?","volume-title":"ACL 2019-57th Annual Meeting of the Association for Computational Linguistics","author":"Jawahar","year":"2019"},{"key":"2024060717254479400_bib21","doi-asserted-by":"publisher","first-page":"235","DOI":"10.18653\/v1\/S19-1026","article-title":"Probing what 
different NLP tasks teach machines about function word comprehension","volume-title":"Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)","author":"Kim","year":"2019"},{"issue":"3","key":"2024060717254479400_bib22","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1016\/0010-0277(93)90042-T","article-title":"Resolution of quantifier scope ambiguities","volume":"48","author":"Kurtzman","year":"1993","journal-title":"Cognition"},{"key":"2024060717254479400_bib23","first-page":"3960","article-title":"Prepositions matter in quantifier scope disambiguation","volume-title":"Proceedings of the 29th International Conference on Computational Linguistics","author":"Leczkowski","year":"2022"},{"key":"2024060717254479400_bib24","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1146\/annurev-linguistics-032020-051035","article-title":"Syntactic structure from deep learning","volume":"7","author":"Linzen","year":"2021","journal-title":"Annual Review of Linguistics"},{"key":"2024060717254479400_bib25","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1162\/tacl_a_00115","article-title":"Assessing the ability of LSTMs to learn syntax-sensitive dependencies","volume":"4","author":"Linzen","year":"2016","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024060717254479400_bib26","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.51","article-title":"We\u2019re afraid language models aren\u2019t modeling ambiguity","author":"Liu","year":"2023","journal-title":"arXiv preprint arXiv: 2304.14399"},{"key":"2024060717254479400_bib27","article-title":"RoBERTa: A robustly optimized BERT pretraining approach","author":"Liu","year":"2019","journal-title":"arXiv preprint arXiv:1907.11692"},{"key":"2024060717254479400_bib28","first-page":"51","article-title":"Unrestricted quantifier scope disambiguation","volume-title":"Proceedings of TextGraphs-6: Graph-based Methods for 
Natural Language Processing","author":"Manshadi","year":"2011"},{"key":"2024060717254479400_bib29","article-title":"minicons: Enabling flexible behavioral and representational analyses of transformer language models","author":"Misra","year":"2022","journal-title":"arXiv preprint arXiv:2203.13112"},{"key":"2024060717254479400_bib30","unstructured":"OpenAI. 2023. GPT-4 technical report."},{"key":"2024060717254479400_bib31","first-page":"27730","article-title":"Training language models to follow instructions with human feedback","volume":"35","author":"Ouyang","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"1","key":"2024060717254479400_bib32","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1146\/annurev-linguistics-031120-122924","article-title":"Semantic structure in deep learning","volume":"8","author":"Pavlick","year":"2022","journal-title":"Annual Review of Linguistics"},{"issue":"8","key":"2024060717254479400_bib33","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"2024060717254479400_bib34","unstructured":"Nathan Ellis Rasmussen. 2022. Broad-domain Quantifier Scoping with RoBERTa. Ph.D. 
thesis, The Ohio State University."},{"key":"2024060717254479400_bib35","doi-asserted-by":"publisher","first-page":"8713","DOI":"10.1609\/aaai.v34i05.6397","article-title":"Probing natural language inference models through semantic fragments","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Richardson","year":"2020"},{"key":"2024060717254479400_bib36","doi-asserted-by":"publisher","first-page":"5418","DOI":"10.18653\/v1\/2020.emnlp-main.437","article-title":"How much knowledge can you pack into the parameters of a language model?","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Roberts","year":"2020"},{"key":"2024060717254479400_bib37","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1023\/A:1010503321412","article-title":"Plausible reasoning and the resolution of quantifier scope ambiguities","volume":"67","author":"Saba","year":"2001","journal-title":"Studia Logica"},{"key":"2024060717254479400_bib38","doi-asserted-by":"publisher","first-page":"969","DOI":"10.18653\/v1\/2022.naacl-main.71","article-title":"When a sentence does not introduce a discourse entity, transformer-based models still sometimes refer to it","volume-title":"Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Schuster","year":"2022"},{"key":"2024060717254479400_bib39","doi-asserted-by":"publisher","first-page":"403","DOI":"10.1162\/tacl_a_00277","article-title":"Still a pain in the neck: Evaluating text representations on lexical composition","volume":"7","author":"Shwartz","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024060717254479400_bib40","article-title":"Zero and few-shot semantic parsing with ambiguous inputs","author":"Stengel-Eskin","year":"2023","journal-title":"arXiv preprint 
arXiv:2306.00824"},{"key":"2024060717254479400_bib41","article-title":"Llama 2: Open foundation and fine-tuned chat models","author":"Touvron","year":"2023","journal-title":"arXiv preprint arXiv:2307.09288"},{"key":"2024060717254479400_bib42","article-title":"Quantifier scope disambiguation","author":"Tsiolis","year":"2020","journal-title":"Unpublished manuscript"},{"key":"2024060717254479400_bib43","first-page":"12388","article-title":"Investigating gender bias in language models using causal mediation analysis","volume":"33","author":"Vig","year":"2020","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024060717254479400_bib44","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.18653\/v1\/2023.findings-eacl.110","article-title":"Assessing monotonicity reasoning in Dutch through natural language inference","volume-title":"Findings of the Association for Computational Linguistics: EACL 2023","author":"Wijnholds","year":"2023"},{"key":"2024060717254479400_bib45","doi-asserted-by":"publisher","first-page":"6105","DOI":"10.18653\/v1\/2020.acl-main.543","article-title":"Do neural models learn systematicity of monotonicity inference in natural language?","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Yanaka","year":"2020"},{"key":"2024060717254479400_bib46","doi-asserted-by":"publisher","first-page":"31","DOI":"10.18653\/v1\/W19-4804","article-title":"Can neural networks understand monotonicity reasoning?","volume-title":"Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP","author":"Yanaka","year":"2019"},{"key":"2024060717254479400_bib47","doi-asserted-by":"publisher","first-page":"4896","DOI":"10.18653\/v1\/2020.emnlp-main.397","article-title":"Assessing phrasal representation and composition in transformers","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing 
(EMNLP)","author":"Lang","year":"2020"},{"key":"2024060717254479400_bib48","first-page":"2279","article-title":"On the interplay between fine-tuning and composition in transformers","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Lang","year":"2021"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00670\/2377773\/tacl_a_00670.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00670\/2377773\/tacl_a_00670.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,7]],"date-time":"2024-06-07T17:26:26Z","timestamp":1717781186000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00670\/121540\/Scope-Ambiguities-in-Large-Language-Models"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":48,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00670","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}