{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T23:34:45Z","timestamp":1778715285988,"version":"3.51.4"},"reference-count":55,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T00:00:00Z","timestamp":1702598400000},"content-version":"vor","delay-in-days":348,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,14]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Over the past two decades, numerous studies have demonstrated how less-predictable (i.e., higher surprisal) words take more time to read. In general, these studies have implicitly assumed the reading process is purely responsive: Readers observe a new word and allocate time to process it as required. We argue that prior results are also compatible with a reading process that is at least partially anticipatory: Readers could make predictions about a future word and allocate time to process it based on their expectation. In this work, we operationalize this anticipation as a word\u2019s contextual entropy. We assess the effect of anticipation on reading by comparing how well surprisal and contextual entropy predict reading times on four naturalistic reading datasets: two self-paced and two eye-tracking. Experimentally, across datasets and analyses, we find substantial evidence for effects of contextual entropy over surprisal on a word\u2019s reading time (RT): In fact, entropy is sometimes better than surprisal in predicting a word\u2019s RT. Spillover effects, however, are generally not captured by entropy, but only by surprisal. Further, we hypothesize four cognitive mechanisms through which contextual entropy could impact RTs\u2014three of which we are able to design experiments to analyze. 
Overall, our results support a view of reading that is not just responsive, but also anticipatory.<\/jats:p>","DOI":"10.1162\/tacl_a_00603","type":"journal-article","created":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T18:58:45Z","timestamp":1702666725000},"page":"1624-1642","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":19,"title":["On the Effect of Anticipation on Reading Times"],"prefix":"10.1162","volume":"11","author":[{"given":"Tiago","family":"Pimentel","sequence":"first","affiliation":[{"name":"University of Cambridge, UK. tp472@cam.ac.uk"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Clara","family":"Meister","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland. clara.meister@inf.ethz.ch"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ethan G.","family":"Wilcox","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland. ethan.wilcox@inf.ethz.ch"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Roger P.","family":"Levy","sequence":"additional","affiliation":[{"name":"MIT, USA. rplevy@mit.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ryan","family":"Cotterell","sequence":"additional","affiliation":[{"name":"ETH Z\u00fcrich, Switzerland. ryan.cotterell@inf.ethz.ch"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2023,12,14]]},"reference":[{"key":"2023121518583129900_bib1","doi-asserted-by":"publisher","first-page":"76","DOI":"10.1016\/j.jml.2014.11.003","article-title":"Do successor effects in reading reflect lexical parafoveal processing? 
Evidence from corpus-based and experimental eye movement data","volume":"79\u201380","author":"Angele","year":"2015","journal-title":"Journal of Memory and Language"},{"key":"2023121518583129900_bib2","doi-asserted-by":"publisher","first-page":"107198","DOI":"10.1016\/j.neuropsychologia.2019.107198","article-title":"Evaluating information-theoretic measures of word prediction in naturalistic sentence reading","volume":"134","author":"Aurnhammer","year":"2019","journal-title":"Neuropsychologia"},{"issue":"1","key":"2023121518583129900_bib3","doi-asserted-by":"publisher","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: A practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"Journal of the Royal Statistical Society. Series B (Methodological)"},{"issue":"3","key":"2023121518583129900_bib4","doi-asserted-by":"publisher","first-page":"301","DOI":"10.1080\/01690965.2010.492228","article-title":"Parallel processing and sentence comprehension difficulty","volume":"26","author":"Boston","year":"2011","journal-title":"Language and Cognitive Processes"},{"issue":"6","key":"2023121518583129900_bib5","doi-asserted-by":"publisher","first-page":"211837","DOI":"10.1098\/rsos.211837","article-title":"Prediction as a basis for skilled reading: Insights from modern language models","volume":"9","author":"Cevoli","year":"2022","journal-title":"Royal Society Open Science"},{"issue":"2","key":"2023121518583129900_bib6","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1016\/j.cognition.2008.07.008","article-title":"Data from eye-tracking corpora as evidence for theories of syntactic processing complexity","volume":"109","author":"Demberg","year":"2008","journal-title":"Cognition"},{"issue":"6","key":"2023121518583129900_bib7","doi-asserted-by":"publisher","first-page":"641","DOI":"10.1016\/S0022-5371(81)90220-6","article-title":"Contextual effects 
on word perception and eye movements during reading","volume":"20","author":"Ehrlich","year":"1981","journal-title":"Journal of Verbal Learning and Verbal Behavior"},{"key":"2023121518583129900_bib8","first-page":"398","article-title":"Lexical surprisal as a general predictor of reading time","volume-title":"Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics","author":"Monsalve","year":"2012"},{"key":"2023121518583129900_bib9","article-title":"The natural stories corpus","volume-title":"Proceedings of the Eleventh International Conference on Language Resources and Evaluation","author":"Futrell","year":"2018"},{"issue":"3","key":"2023121518583129900_bib10","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1038\/s41593-022-01026-4","article-title":"Shared computational principles for language processing in humans and deep language models","volume":"25","author":"Goldstein","year":"2022","journal-title":"Nature Neuroscience"},{"key":"2023121518583129900_bib11","doi-asserted-by":"publisher","first-page":"10","DOI":"10.18653\/v1\/W18-0102","article-title":"Predictive power of word surprisal for reading times is a linear function of language model quality","volume-title":"Proceedings of the 8th Workshop on Cognitive Modeling and Computational Linguistics (CMCL 2018)","author":"Goodkind","year":"2018"},{"issue":"3","key":"2023121518583129900_bib12","doi-asserted-by":"publisher","first-page":"424","DOI":"10.2307\/1912791","article-title":"Investigating causal relations by econometric models and cross-spectral methods","volume":"37","author":"Granger","year":"1969","journal-title":"Econometrica"},{"key":"2023121518583129900_bib13","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3115\/1073336.1073357","article-title":"A probabilistic Earley parser as a psycholinguistic model","volume-title":"Second Meeting of the North American Chapter of the Association for Computational 
Linguistics","author":"Hale","year":"2001"},{"issue":"2","key":"2023121518583129900_bib14","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1023\/A:1022492123056","article-title":"The information conveyed by words in sentences","volume":"32","author":"Hale","year":"2003","journal-title":"Journal of Psycholinguistic Research"},{"issue":"4","key":"2023121518583129900_bib15","doi-asserted-by":"publisher","first-page":"643","DOI":"10.1207\/s15516709cog0000_64","article-title":"Uncertainty about the rest of the sentence","volume":"30","author":"Hale","year":"2006","journal-title":"Cognitive Science"},{"issue":"9","key":"2023121518583129900_bib16","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1111\/lnc3.12196","article-title":"Information-theoretical complexity metrics","volume":"10","author":"Hale","year":"2016","journal-title":"Language and Linguistics Compass"},{"key":"2023121518583129900_bib17","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/qjnpv","article-title":"The plausibility of sampling as an algorithmic theory of sentence processing","author":"Hoover","year":"2022","journal-title":"PsyArXiv preprint"},{"key":"2023121518583129900_bib18","first-page":"317","article-title":"The entropy rate principle as a predictor of processing effort: An evaluation against eye-tracking data","volume-title":"Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing","author":"Keller","year":"2004"},{"key":"2023121518583129900_bib19","article-title":"The Dundee corpus","volume-title":"Proceedings of the 12th European Conference on Eye Movements","author":"Kennedy","year":"2003"},{"key":"2023121518583129900_bib20","doi-asserted-by":"publisher","first-page":"10421","DOI":"10.18653\/v1\/2022.emnlp-main.712","article-title":"Context limitations make neural language models more human-like","volume-title":"Proceedings of the 2022 Conference on Empirical Methods in Natural Language 
Processing","author":"Kuribayashi","year":"2022"},{"key":"2023121518583129900_bib21","doi-asserted-by":"publisher","first-page":"5203","DOI":"10.18653\/v1\/2021.acl-long.405","article-title":"Lower perplexity is not always human-like","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Kuribayashi","year":"2021"},{"key":"2023121518583129900_bib22","unstructured":"Roger Levy. 2005. Probabilistic Models of Word Order and Syntactic Discontinuity. Ph.D. thesis, Stanford University, Stanford, CA, USA."},{"issue":"3","key":"2023121518583129900_bib23","doi-asserted-by":"publisher","first-page":"1126","DOI":"10.1016\/j.cognition.2007.05.006","article-title":"Expectation-based syntactic comprehension","volume":"106","author":"Levy","year":"2008","journal-title":"Cognition"},{"key":"2023121518583129900_bib24","doi-asserted-by":"publisher","first-page":"10","DOI":"10.3115\/v1\/W14-2002","article-title":"Investigating the role of entropy in sentence processing","volume-title":"Proceedings of the Fifth Workshop on Cognitive Modeling and Computational Linguistics","author":"Linzen","year":"2014"},{"issue":"2","key":"2023121518583129900_bib25","doi-asserted-by":"publisher","first-page":"826","DOI":"10.3758\/s13428-017-0908-4","article-title":"The Provo corpus: A large eye-tracking corpus with predictability norms","volume":"50","author":"Luke","year":"2018","journal-title":"Behavior Research Methods"},{"key":"2023121518583129900_bib26","doi-asserted-by":"publisher","first-page":"20","DOI":"10.18653\/v1\/2022.acl-short.3","article-title":"Analyzing wrap-up effects through an information-theoretic lens","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short 
Papers)","author":"Meister","year":"2022"},{"key":"2023121518583129900_bib27","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.emnlp-main.74","article-title":"Revisiting the uniform information density hypothesis","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Meister","year":"2021"},{"key":"2023121518583129900_bib28","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2212.12131","article-title":"Why does surprisal from larger transformer-based language models provide a poorer fit to human reading times?","author":"Oh","year":"2022","journal-title":"arXiv preprint arXiv:2212.12131"},{"key":"2023121518583129900_bib29","doi-asserted-by":"publisher","first-page":"726","DOI":"10.1037\/0096-1523.34.3.726","article-title":"Immediate and delayed effects of word frequency and word length on eye movements in reading: A reversed delayed effect of word length","volume":"34","author":"Pollatsek","year":"2008","journal-title":"Journal of Experimental Psychology: Human Perception and Performance"},{"issue":"8","key":"2023121518583129900_bib30","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"issue":"3","key":"2023121518583129900_bib31","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1037\/0033-2909.124.3.372","article-title":"Eye movements in reading and information processing: 20 years of research","volume":"124","author":"Rayner","year":"1998","journal-title":"Psychological Bulletin"},{"issue":"1","key":"2023121518583129900_bib32","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1016\/j.biopsycho.2008.05.002","article-title":"Language processing in reading and speech perception is fast and incremental: Implications for event-related potential research","volume":"80","author":"Rayner","year":"2009","journal-title":"Biological 
Psychology"},{"key":"2023121518583129900_bib33","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1002\/9780470757642.ch5","article-title":"Eye movements during reading","volume-title":"The Science of Reading: A Handbook","author":"Rayner","year":"2005"},{"issue":"2","key":"2023121518583129900_bib34","doi-asserted-by":"publisher","first-page":"514","DOI":"10.1037\/a0020990","article-title":"Eye movements and word skipping during reading: Effects of word length and predictability","volume":"37","author":"Rayner","year":"2011","journal-title":"Journal of Experimental Psychology: Human Perception and Performance"},{"issue":"1","key":"2023121518583129900_bib35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3758\/PBR.16.1.1","article-title":"Using E-Z Reader to model the effects of higher level language processing on eye movements during reading","volume":"16","author":"Reichle","year":"2009","journal-title":"Psychonomic Bulletin & Review"},{"key":"2023121518583129900_bib36","article-title":"On measures of entropy and information","volume-title":"Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics","author":"R\u00e9nyi","year":"1961"},{"key":"2023121518583129900_bib37","doi-asserted-by":"publisher","first-page":"324","DOI":"10.3115\/1699510.1699553","article-title":"Deriving lexical and syntactic expectation-based measures for psycholinguistic modeling via incremental top-down parsing","volume-title":"Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing","author":"Roark","year":"2009"},{"key":"2023121518583129900_bib38","doi-asserted-by":"publisher","first-page":"4704","DOI":"10.18653\/v1\/D18-1499","article-title":"A neural model of adaptation in reading","volume-title":"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing","author":"van 
Schijndel","year":"2018"},{"key":"2023121518583129900_bib39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.7275\/qtbb-9d05","article-title":"Can entropy explain successor surprisal effects in reading?","volume-title":"Proceedings of the Society for Computation in Linguistics (SCiL) 2019","author":"van Schijndel","year":"2019"},{"key":"2023121518583129900_bib40","first-page":"32","article-title":"Addressing surprisal deficiencies in reading time models","volume-title":"Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)","author":"van Schijndel","year":"2016"},{"key":"2023121518583129900_bib41","first-page":"1260","article-title":"Approximations of predictive entropy correlate with reading times.","volume-title":"Proceedings of the Cognitive Science Society","author":"van Schijndel","year":"2017"},{"issue":"1","key":"2023121518583129900_bib42","doi-asserted-by":"publisher","first-page":"5","DOI":"10.3758\/s13414-011-0219-2","article-title":"Parafoveal processing in reading","volume":"74","author":"Schotter","year":"2012","journal-title":"Attention, Perception, & Psychophysics"},{"key":"2023121518583129900_bib43","doi-asserted-by":"publisher","first-page":"4086","DOI":"10.18653\/v1\/N19-1413","article-title":"A large-scale study of the effects of word frequency and predictability in naturalistic reading","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Shain","year":"2019"},{"key":"2023121518583129900_bib44","doi-asserted-by":"publisher","first-page":"3718","DOI":"10.18653\/v1\/2021.acl-long.288","article-title":"CDRNN: Discovering complex dynamics in human language processing","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: 
Long Papers)","author":"Shain","year":"2021"},{"key":"2023121518583129900_bib45","doi-asserted-by":"publisher","DOI":"10.31234\/osf.io\/4hyna","article-title":"Large-scale evidence for logarithmic effects of word predictability on reading time","author":"Shain","year":"2022","journal-title":"PsyArXiv preprint"},{"key":"2023121518583129900_bib46","doi-asserted-by":"publisher","first-page":"104735","DOI":"10.1016\/j.cognition.2021.104735","article-title":"Continuous-time deconvolutional regression for psycholinguistic modeling","volume":"215","author":"Shain","year":"2021","journal-title":"Cognition"},{"key":"2023121518583129900_bib47","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2209.12128","article-title":"A deep learning approach to analyzing continuous-time systems","author":"Shain","year":"2022","journal-title":"arXiv preprint arXiv:2209.12128"},{"issue":"3","key":"2023121518583129900_bib48","doi-asserted-by":"publisher","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"The Bell System Technical Journal"},{"key":"2023121518583129900_bib49","first-page":"595","article-title":"Optimal processing times in reading: A formal model and empirical investigation","volume-title":"Proceedings of the Cognitive Science Society","author":"Smith","year":"2008"},{"key":"2023121518583129900_bib50","first-page":"1313","article-title":"Fixation durations in first-pass reading reflect uncertainty about word identity","volume-title":"Proceedings of the Cognitive Science Society","author":"Smith","year":"2010"},{"issue":"3","key":"2023121518583129900_bib51","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1016\/j.cognition.2013.02.013","article-title":"The effect of word predictability on reading time is 
logarithmic","volume":"128","author":"Smith","year":"2013","journal-title":"Cognition"},{"key":"2023121518583129900_bib52","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1016\/B978-0-444-70113-8.50007-2","article-title":"Stimulus-induced midflight modification of saccade trajectories","volume-title":"Eye Movements from Physiology to Cognition","author":"Van Gisbergen","year":"1987"},{"key":"2023121518583129900_bib53","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.01912","article-title":"On the predictive power of neural language models for human real-time comprehension behavior","volume-title":"Proceedings of the Cognitive Science Society","author":"Wilcox","year":"2020"},{"issue":"6","key":"2023121518583129900_bib54","doi-asserted-by":"publisher","first-page":"2506","DOI":"10.1093\/cercor\/bhv075","article-title":"Prediction during natural language comprehension","volume":"26","author":"Willems","year":"2015","journal-title":"Cerebral Cortex"},{"key":"2023121518583129900_bib55","doi-asserted-by":"publisher","first-page":"38","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Transformers: State-of-the-art natural language processing","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations","author":"Wolf","year":"2020"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00603\/2196892\/tacl_a_00603.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00603\/2196892\/tacl_a_00603.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,15]],"date-time":"2023-12-15T18:59:00Z","timestamp":1702666740000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00603\/118720\/On-the-Effect-of-Anticipation-on-Reading-Times"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023]]},"references-count":55,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00603","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023]]},"published":{"date-parts":[[2023]]}}}