{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,12]],"date-time":"2026-06-12T04:02:28Z","timestamp":1781236948525,"version":"3.54.1"},"reference-count":67,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T00:00:00Z","timestamp":1702598400000},"content-version":"vor","delay-in-days":348,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,14]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Surprisal theory posits that less-predictable words should take more time to process, with word predictability quantified as surprisal, i.e., negative log probability in context. While evidence supporting the predictions of surprisal theory has been replicated widely, much of it has focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with surprisal theory: (i) whether surprisal is predictive of reading times, (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times, and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. 
By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.<\/jats:p>","DOI":"10.1162\/tacl_a_00612","type":"journal-article","created":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T18:58:46Z","timestamp":1702666726000},"page":"1451-1470","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":63,"title":["Testing the Predictions of Surprisal Theory in 11 Languages"],"prefix":"10.1162","volume":"11","author":[{"given":"Ethan G.","family":"Wilcox","sequence":"first","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland. ethan.wilcox@inf.ethz.ch"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Tiago","family":"Pimentel","sequence":"additional","affiliation":[{"name":"University of Cambridge, UK. tp472@cam.ac.uk"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Clara","family":"Meister","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland. clara.meister@inf.ethz.ch"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ryan","family":"Cotterell","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland. ryan.cotterell@inf.ethz.ch"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Roger P.","family":"Levy","sequence":"additional","affiliation":[{"name":"MIT, USA. 
rplevy@mit.edu"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2023,12,14]]},"reference":[{"key":"2023121518583196000_bib1","first-page":"4781","article-title":"Give your text representation models some love: The case for Basque","volume-title":"Proceedings of the Twelfth Language Resources and Evaluation Conference","author":"Agerri","year":"2020"},{"issue":"3","key":"2023121518583196000_bib2","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1016\/j.jml.2012.11.001","article-title":"Random effects structure for confirmatory hypothesis testing: Keep it maximal","volume":"68","author":"Barr","year":"2013","journal-title":"Journal of Memory and Language"},{"key":"2023121518583196000_bib3","doi-asserted-by":"publisher","first-page":"104082","DOI":"10.1016\/j.jml.2019.104082","article-title":"Maze made easy: Better and easier measurement of incremental processing difficulty","volume":"111","author":"Boyce","year":"2020","journal-title":"Journal of Memory and Language"},{"key":"2023121518583196000_bib4","article-title":"A-maze of natural stories: Texts are comprehensible using the maze task","volume-title":"Talk at 26th Architectures and Mechanisms for Language Processing conference (AMLaP 26)","author":"Boyce","year":"2020"},{"key":"2023121518583196000_bib5","doi-asserted-by":"publisher","first-page":"104174","DOI":"10.1016\/j.jml.2020.104174","article-title":"Word predictability effects are linear, not logarithmic: Implications for probabilistic models of sentence comprehension","volume":"116","author":"Brothers","year":"2021","journal-title":"Journal of Memory and Language"},{"issue":"6","key":"2023121518583196000_bib6","doi-asserted-by":"publisher","first-page":"211837","DOI":"10.1098\/rsos.211837","article-title":"Prediction as a basis for skilled reading: Insights from modern language models","volume":"9","author":"Cevoli","year":"2022","journal-title":"Royal Society Open 
Science"},{"key":"2023121518583196000_bib7","doi-asserted-by":"publisher","first-page":"341","DOI":"10.1016\/B978-008044980-7\/50017-3","article-title":"Eye movements in reading words and sentences","author":"Clifton","year":"2007","journal-title":"Eye Movements"},{"key":"2023121518583196000_bib8","doi-asserted-by":"publisher","first-page":"8440","DOI":"10.18653\/v1\/2020.acl-main.747","article-title":"Unsupervised cross-lingual representation learning at scale","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Conneau","year":"2020"},{"key":"2023121518583196000_bib9","doi-asserted-by":"publisher","first-page":"536","DOI":"10.18653\/v1\/N18-2085","article-title":"Are all languages equally hard to language-model?","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)","author":"Cotterell","year":"2018"},{"issue":"9","key":"2023121518583196000_bib10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1126\/sciadv.aaw2594","article-title":"Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche","volume":"5","author":"Coup\u00e9","year":"2019","journal-title":"Science Advances"},{"key":"2023121518583196000_bib11","first-page":"138","article-title":"The effects of surprisal across languages: Results from native and non-native reading","volume-title":"Findings of the Association for Computational Linguistics: AACL-IJCNLP 2022","author":"de Varda","year":"2022"},{"issue":"2","key":"2023121518583196000_bib12","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1016\/j.cognition.2008.07.008","article-title":"Data from eye-tracking corpora as evidence for theories of syntactic processing 
complexity","volume":"109","author":"Demberg","year":"2008","journal-title":"Cognition"},{"key":"2023121518583196000_bib13","doi-asserted-by":"publisher","first-page":"4171","DOI":"10.18653\/v1\/N19-1423","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2023121518583196000_bib14","article-title":"A primer on pretrained multilingual language models","author":"Doddapaneni","year":"2021","journal-title":"arXiv preprint arXiv:2107.00676"},{"issue":"1","key":"2023121518583196000_bib15","doi-asserted-by":"publisher","first-page":"163","DOI":"10.3758\/BRM.41.1.163","article-title":"The maze task: Measuring forced incremental sentence processing time","volume":"41","author":"Forster","year":"2009","journal-title":"Behavior Research Methods"},{"key":"2023121518583196000_bib16","first-page":"61","article-title":"Sequential vs. 
hierarchical syntactic models of human incremental sentence processing","volume-title":"Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2012)","author":"Fossum","year":"2012"},{"key":"2023121518583196000_bib17","article-title":"Speaking rationally: Uniform information density as an optimal strategy for language production","volume-title":"Proceedings of the Annual Meeting of the Cognitive Science Society","author":"Frank","year":"2008"},{"key":"2023121518583196000_bib18","first-page":"81","article-title":"Uncertainty reduction as a measure of cognitive processing effort","volume-title":"Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics","author":"Frank","year":"2010"},{"issue":"3","key":"2023121518583196000_bib19","doi-asserted-by":"publisher","first-page":"475","DOI":"10.1111\/tops.12025","article-title":"Uncertainty reduction as a measure of cognitive load in sentence comprehension","volume":"5","author":"Frank","year":"2013","journal-title":"Topics in Cognitive Science"},{"issue":"6","key":"2023121518583196000_bib20","doi-asserted-by":"publisher","first-page":"829","DOI":"10.1177\/0956797611409589","article-title":"Insensitivity of the human sentence-processing system to hierarchical structure","volume":"22","author":"Frank","year":"2011","journal-title":"Psychological Science"},{"key":"2023121518583196000_bib21","doi-asserted-by":"publisher","first-page":"10","DOI":"10.18653\/v1\/W18-0102","article-title":"Predictive power of word surprisal for reading times is a linear function of language model quality","volume-title":"Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)","author":"Goodkind","year":"2018"},{"issue":"3","key":"2023121518583196000_bib22","doi-asserted-by":"publisher","first-page":"424","DOI":"10.2307\/1912791","article-title":"Investigating causal relations by econometric models and cross-spectral 
methods","volume":"37","author":"Granger","year":"1969","journal-title":"Econometrica"},{"key":"2023121518583196000_bib23","first-page":"2440","article-title":"Wiki-40B: Multilingual language model dataset","volume-title":"Proceedings of the Twelfth Language Resources and Evaluation Conference","author":"Guo","year":"2020"},{"key":"2023121518583196000_bib24","doi-asserted-by":"publisher","DOI":"10.3115\/1073336.1073357","article-title":"A probabilistic Earley parser as a psycholinguistic model","volume-title":"Second Meeting of the North American Chapter of the Association for Computational Linguistics","author":"Hale","year":"2001"},{"key":"2023121518583196000_bib25","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1023\/A:1022492123056","article-title":"The information conveyed by words in sentences","volume":"32","author":"Hale","year":"2003","journal-title":"Journal of Psycholinguistic Research"},{"issue":"4","key":"2023121518583196000_bib26","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog0000_64","article-title":"Uncertainty about the rest of the sentence.","volume":"30","author":"Hale","year":"2006","journal-title":"Cognitive Science"},{"key":"2023121518583196000_bib27","volume-title":"Meaningful Differences in the Everyday Experience of Young American Children","author":"Hart","year":"1995"},{"key":"2023121518583196000_bib28","volume-title":"The World Atlas of Language Structures","author":"Haspelmath","year":"2005"},{"key":"2023121518583196000_bib29","doi-asserted-by":"publisher","DOI":"10.1163\/9780585492230","volume-title":"Sentence Processing: A Crosslinguistic Perspective","author":"Hillert","year":"1998"},{"key":"2023121518583196000_bib30","doi-asserted-by":"publisher","first-page":"106","DOI":"10.18653\/v1\/2021.naacl-main.10","article-title":"Multilingual language models predict human reading behavior","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational 
Linguistics: Human Language Technologies","author":"Hollenstein","year":"2021"},{"key":"2023121518583196000_bib31","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/qjnpv","article-title":"The plausibility of sampling as an algorithmic theory of sentence processing","author":"Hoover","year":"2022","journal-title":"PsyArXiv preprint"},{"key":"2023121518583196000_bib32","doi-asserted-by":"publisher","first-page":"36","DOI":"10.4324\/9780203123430","article-title":"Self-paced reading","volume-title":"Research Methods in Second Language Psycholinguistics","author":"Jegerski","year":"2013"},{"issue":"2","key":"2023121518583196000_bib33","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1037\/0096-3445.111.2.228","article-title":"Paradigms and processes in reading comprehension","volume":"111","author":"Just","year":"1982","journal-title":"Journal of Experimental Psychology: General"},{"key":"2023121518583196000_bib34","article-title":"The Dundee corpus","volume-title":"Proceedings of the 12th European Conference on Eye Movements","author":"Kennedy","year":"2003"},{"key":"2023121518583196000_bib35","article-title":"Adam: A method for stochastic optimization","volume-title":"International Conference on Learning Representations","author":"Kingma","year":"2015"},{"key":"2023121518583196000_bib36","doi-asserted-by":"publisher","first-page":"66","DOI":"10.18653\/v1\/P18-1007","article-title":"Subword regularization: Improving neural network translation models with multiple subword candidates","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Kudo","year":"2018"},{"key":"2023121518583196000_bib37","doi-asserted-by":"publisher","first-page":"10421","DOI":"10.18653\/v1\/2022.emnlp-main.712","article-title":"Context limitations make neural language models more human-like","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language 
Processing","author":"Kuribayashi","year":"2022"},{"key":"2023121518583196000_bib38","doi-asserted-by":"publisher","first-page":"5203","DOI":"10.18653\/v1\/2021.acl-long.405","article-title":"Lower perplexity is not always human-like","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Kuribayashi","year":"2021"},{"issue":"3","key":"2023121518583196000_bib39","doi-asserted-by":"publisher","first-page":"1126","DOI":"10.1016\/j.cognition.2007.05.006","article-title":"Expectation-based syntactic comprehension","volume":"106","author":"Levy","year":"2008","journal-title":"Cognition"},{"key":"2023121518583196000_bib40","doi-asserted-by":"publisher","DOI":"10.7551\/mitpress\/7503.003.0111","article-title":"Speakers optimize information density through syntactic reduction","volume":"19","author":"Levy","year":"2006","journal-title":"Advances in Neural Information Processing Systems"},{"issue":"6","key":"2023121518583196000_bib41","doi-asserted-by":"publisher","first-page":"1382","DOI":"10.1111\/cogs.12274","article-title":"Uncertainty and expectation in sentence processing: Evidence from subcategorization distributions","volume":"40","author":"Linzen","year":"2016","journal-title":"Cognitive Science"},{"key":"2023121518583196000_bib42","doi-asserted-by":"publisher","first-page":"826","DOI":"10.3758\/s13428-017-0908-4","article-title":"The Provo corpus: A large eye-tracking corpus with predictability norms","volume":"50","author":"Luke","year":"2018","journal-title":"Behavior Research Methods"},{"key":"2023121518583196000_bib43","doi-asserted-by":"publisher","first-page":"963","DOI":"10.18653\/v1\/2021.emnlp-main.74","article-title":"Revisiting the Uniform Information Density hypothesis","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language 
Processing","author":"Meister","year":"2021"},{"key":"2023121518583196000_bib44","doi-asserted-by":"publisher","first-page":"4975","DOI":"10.18653\/v1\/P19-1491","article-title":"What kind of language is hard to language-model?","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Mielke","year":"2019"},{"key":"2023121518583196000_bib45","doi-asserted-by":"publisher","first-page":"336","DOI":"10.1162\/tacl_a_00548","article-title":"Why does surprisal from larger transformer-based language models provide a poorer fit to human reading times?","volume":"11","author":"Byung-Doh","year":"2023","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2023121518583196000_bib46","doi-asserted-by":"publisher","first-page":"48","DOI":"10.18653\/v1\/N19-4009","article-title":"fairseq: A fast, extensible toolkit for sequence modeling","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)","author":"Ott","year":"2019"},{"issue":"3","key":"2023121518583196000_bib47","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1353\/lan.2011.0057","article-title":"A cross-language perspective on speech information rate","volume":"87","author":"Pellegrino","year":"2011","journal-title":"Language"},{"key":"2023121518583196000_bib48","doi-asserted-by":"publisher","first-page":"949","DOI":"10.18653\/v1\/2021.emnlp-main.73","article-title":"A surprisal\u2013duration trade-off across and within the world\u2019s languages","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Pimentel","year":"2021"},{"key":"2023121518583196000_bib49","doi-asserted-by":"crossref","DOI":"10.1162\/tacl_a_00603","article-title":"On the effect of anticipation on reading times","author":"Pimentel","year":"2023","journal-title":"Transactions of the Association for 
Computational Linguistics"},{"issue":"1","key":"2023121518583196000_bib50","first-page":"1","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","volume":"21","author":"Raffel","year":"2020","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"2023121518583196000_bib51","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1016\/0010-0285(75)90005-5","article-title":"The perceptual span and peripheral cues in reading","volume":"7","author":"Rayner","year":"1975","journal-title":"Cognitive Psychology"},{"issue":"3","key":"2023121518583196000_bib52","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1037\/0033-2909.124.3.372","article-title":"Eye movements in reading and information processing: 20 years of research","volume":"124","author":"Rayner","year":"1998","journal-title":"Psychological Bulletin"},{"key":"2023121518583196000_bib53","doi-asserted-by":"publisher","first-page":"324","DOI":"10.3115\/1699510.1699553","article-title":"Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing","volume-title":"Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing","author":"Roark","year":"2009"},{"key":"2023121518583196000_bib54","first-page":"29","article-title":"Is multilingual BERT fluent in language generation?","volume-title":"Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing","author":"R\u00f6nnqvist","year":"2019"},{"issue":"1","key":"2023121518583196000_bib55","doi-asserted-by":"publisher","first-page":"5","DOI":"10.3758\/s13414-011-0219-2","article-title":"Parafoveal processing in reading","volume":"74","author":"Schotter","year":"2012","journal-title":"Attention, Perception, & Psychophysics"},{"key":"2023121518583196000_bib56","doi-asserted-by":"publisher","first-page":"4086","DOI":"10.18653\/v1\/N19-1413","article-title":"A large-scale study of 
the effects of word frequency and predictability in naturalistic reading","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Shain","year":"2019"},{"key":"2023121518583196000_bib57","doi-asserted-by":"publisher","first-page":"3718","DOI":"10.18653\/v1\/2021.acl-long.288","article-title":"CDRNN: Discovering complex dynamics in human language processing","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Shain","year":"2021"},{"key":"2023121518583196000_bib58","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/4hyna","article-title":"Large-scale evidence for logarithmic effects of word predictability on reading time","author":"Shain","year":"2022","journal-title":"PsyArXiv preprint"},{"issue":"3","key":"2023121518583196000_bib59","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"The Bell System Technical Journal"},{"key":"2023121518583196000_bib60","article-title":"mGPT: Few-shot learners go multilingual","author":"Shliazhko","year":"2022","journal-title":"arXiv preprint arXiv:2204.07580"},{"issue":"6","key":"2023121518583196000_bib61","doi-asserted-by":"publisher","first-page":"2843","DOI":"10.3758\/s13428-021-01772-6","article-title":"Expanding horizons of cross-linguistic research on reading: The multilingual eye-movement corpus (MECO)","volume":"54","author":"Siegelman","year":"2022","journal-title":"Behavior Research Methods"},{"issue":"3","key":"2023121518583196000_bib62","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1016\/j.cognition.2013.02.013","article-title":"The 
effect of word predictability on reading time is logarithmic","volume":"128","author":"Smith","year":"2013","journal-title":"Cognition"},{"key":"2023121518583196000_bib63","author":"Speer","year":"2022"},{"key":"2023121518583196000_bib64","first-page":"1260","article-title":"Approximations of predictive entropy correlate with reading times","volume-title":"Proceedings of the Cognitive Science Society","author":"van Schijndel","year":"2017"},{"key":"2023121518583196000_bib65","article-title":"Multilingual is not enough: BERT for Finnish","author":"Virtanen","year":"2019","journal-title":"arXiv preprint arXiv:1912.07076"},{"key":"2023121518583196000_bib66","first-page":"1707","article-title":"On the predictive power of neural language models for human real-time comprehension behavior","volume-title":"Proceedings of the 2020 Meeting of the Cognitive Science Society","author":"Wilcox","year":"2020"},{"key":"2023121518583196000_bib67","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.18653\/v1\/2021.acl-long.90","article-title":"When do you need billions of words of pretraining data?","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Zhang","year":"2021"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00612\/2196877\/tacl_a_00612.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00612\/2196877\/tacl_a_00612.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T18:59:02Z","timestamp":1702666742000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00612\/118718\/Testing-the-Predictions-of-Surprisal-Theory-in-11"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":67,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00612","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}