{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T10:06:25Z","timestamp":1775815585632,"version":"3.50.1"},"reference-count":46,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T00:00:00Z","timestamp":1712880000000},"content-version":"vor","delay-in-days":102,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2024,4,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and if similar predictions for other subjects have not changed. Here we argue that such evaluation is limited, since injecting one fact (e.g., \u201cJack Depp is the son of Johnny Depp\u201d) introduces a \u201cripple effect\u201d in the form of additional facts that the model needs to update (e.g., \u201cJack Depp is the sibling of Lily-Rose Depp\u201d). To address this, we propose novel evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct RippleEdits, a diagnostic benchmark of 5K factual edits, capturing various types of ripple effects. We evaluate prominent editing methods on RippleEdits, showing that they fail to introduce consistent changes in the model\u2019s knowledge. 
In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.<\/jats:p>","DOI":"10.1162\/tacl_a_00644","type":"journal-article","created":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T19:02:45Z","timestamp":1712948565000},"page":"283-298","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":21,"title":["Evaluating the Ripple Effects of Knowledge Editing in Language Models"],"prefix":"10.1162","volume":"12","author":[{"given":"Roi","family":"Cohen","sequence":"first","affiliation":[{"name":"Blavatnik School of Computer Science, Tel Aviv University, Israel. roi1@mail.tau.ac.il"}]},{"given":"Eden","family":"Biran","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel Aviv University, Israel. edenbiran@mail.tau.ac.il"}]},{"given":"Ori","family":"Yoran","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel Aviv University, Israel. oriy@mail.tau.ac.il"}]},{"given":"Amir","family":"Globerson","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel Aviv University, Israel. gamir@tauex.tau.ac.il"},{"name":"Google Research, Israel"}]},{"given":"Mor","family":"Geva","sequence":"additional","affiliation":[{"name":"Blavatnik School of Computer Science, Tel Aviv University, Israel. 
morgeva@tauex.tau.ac.il"},{"name":"Google Research, Israel"}]}],"member":"281","published-online":{"date-parts":[[2024,4,9]]},"reference":[{"key":"2024041219023342000_bib1","doi-asserted-by":"publisher","first-page":"95","DOI":"10.18653\/v1\/2022.bigscience-1.9","article-title":"GPT-NeoX-20B: An open-source autoregressive language model","volume-title":"Proceedings of BigScience Episode #5 \u2013 Workshop on Challenges & Perspectives in Creating Large Language Models","author":"Black","year":"2022"},{"key":"2024041219023342000_bib2","article-title":"Language models are few-shot learners","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6\u201312, 2020, virtual","author":"Brown","year":"2020"},{"key":"2024041219023342000_bib3","article-title":"Evaluating large language models trained on code","author":"Chen","year":"2021","journal-title":"ArXiv preprint"},{"key":"2024041219023342000_bib4","doi-asserted-by":"publisher","first-page":"1856","DOI":"10.18653\/v1\/2023.findings-eacl.139","article-title":"Crawling the internal knowledge-base of language models","volume-title":"Findings of the Association for Computational Linguistics: EACL 2023","author":"Cohen","year":"2023"},{"key":"2024041219023342000_bib5","doi-asserted-by":"publisher","first-page":"8493","DOI":"10.18653\/v1\/2022.acl-long.581","article-title":"Knowledge neurons in pretrained transformers","volume-title":"Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Dai","year":"2022"},{"key":"2024041219023342000_bib6","doi-asserted-by":"publisher","first-page":"6491","DOI":"10.18653\/v1\/2021.emnlp-main.522","article-title":"Editing factual knowledge in language models","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"De 
Cao","year":"2021"},{"key":"2024041219023342000_bib7","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1162\/tacl_a_00459","article-title":"Time-aware language models as temporal knowledge bases","volume":"10","author":"Dhingra","year":"2022","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024041219023342000_bib8","article-title":"Formal representations of belief","volume-title":"The Stanford Encyclopedia of Philosophy","author":"Genin","year":"2022"},{"key":"2024041219023342000_bib9","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.751","article-title":"Dissecting recall of factual associations in auto-regressive language models","author":"Geva","year":"2023","journal-title":"arXiv preprint arXiv:2304.14767"},{"key":"2024041219023342000_bib10","doi-asserted-by":"publisher","first-page":"5484","DOI":"10.18653\/v1\/2021.emnlp-main.446","article-title":"Transformer feed-forward layers are key-value memories","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Geva","year":"2021"},{"key":"2024041219023342000_bib11","article-title":"Editing commonsense knowledge in gpt","author":"Gupta","year":"2023"},{"key":"2024041219023342000_bib12","doi-asserted-by":"publisher","first-page":"2714","DOI":"10.18653\/v1\/2023.eacl-main.199","article-title":"Methods for measuring, updating, and visualizing factual beliefs in language models","volume-title":"Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics","author":"Hase","year":"2023"},{"key":"2024041219023342000_bib13","doi-asserted-by":"publisher","first-page":"1772","DOI":"10.18653\/v1\/2021.eacl-main.153","article-title":"Language models as knowledge bases: On entity representations, storage capacity, and paraphrased queries","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational 
Linguistics: Main Volume","author":"Heinzerling","year":"2021"},{"key":"2024041219023342000_bib14","article-title":"Inspecting and editing knowledge representations in language models","author":"Hernandez","year":"2023"},{"key":"2024041219023342000_bib15","article-title":"Measuring and manipulating knowledge representations in language models","author":"Hernandez","year":"2023","journal-title":"ArXiv preprint"},{"key":"2024041219023342000_bib16","doi-asserted-by":"publisher","first-page":"11548","DOI":"10.18653\/v1\/2023.findings-acl.733","article-title":"Detecting edit failures in large language models: An improved specificity benchmark","volume-title":"Findings of the Association for Computational Linguistics: ACL 2023","author":"Hoelscher-Obermaier","year":"2023"},{"key":"2024041219023342000_bib17","article-title":"Towards continual knowledge learning of language models","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"Jang","year":"2022"},{"key":"2024041219023342000_bib18","article-title":"Language models (mostly) know what they know","author":"Kadavath","year":"2022","journal-title":"ArXiv preprint"},{"key":"2024041219023342000_bib19","doi-asserted-by":"publisher","first-page":"8849","DOI":"10.18653\/v1\/2021.emnlp-main.697","article-title":"BeliefBank: Adding memory to a pre-trained language model for a systematic notion of belief","volume-title":"Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing","author":"Kassner","year":"2021"},{"key":"2024041219023342000_bib20","first-page":"29348","article-title":"Mind the gap: Assessing temporal generalization in neural language models","volume-title":"Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6\u201314, 2021, 
virtual","author":"Lazaridou","year":"2021"},{"key":"2024041219023342000_bib21","doi-asserted-by":"publisher","first-page":"333","DOI":"10.18653\/v1\/K17-1034","article-title":"Zero-shot relation extraction via reading comprehension","volume-title":"Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)","author":"Levy","year":"2017"},{"key":"2024041219023342000_bib22","article-title":"Retrieval-augmented generation for knowledge-intensive NLP tasks","volume-title":"Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6\u201312, 2020, virtual","author":"Lewis","year":"2020"},{"issue":"9","key":"2024041219023342000_bib23","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3560815","article-title":"Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing","volume":"55","author":"Liu","year":"2023","journal-title":"ACM Computing Surveys"},{"key":"2024041219023342000_bib24","doi-asserted-by":"publisher","first-page":"9802","DOI":"10.18653\/v1\/2023.acl-long.546","article-title":"When not to trust language models: Investigating effectiveness of parametric and non-parametric memories","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Mallen","year":"2023"},{"key":"2024041219023342000_bib25","first-page":"17359","article-title":"Locating and editing factual associations in gpt","volume":"35","author":"Meng","year":"2022","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2024041219023342000_bib26","article-title":"Mass-editing memory in a transformer","volume-title":"The Eleventh International Conference on Learning Representations","author":"Meng","year":"2023"},{"key":"2024041219023342000_bib27","article-title":"Fast model editing at scale","volume-title":"The Tenth 
International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"Mitchell","year":"2022"},{"key":"2024041219023342000_bib28","doi-asserted-by":"publisher","first-page":"5469","DOI":"10.18653\/v1\/2023.acl-long.300","article-title":"Can LMs learn new entities from descriptions? Challenges in propagating injected knowledge","volume-title":"Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Onoe","year":"2023"},{"key":"2024041219023342000_bib29","article-title":"Training language models to follow instructions with human feedback","author":"Ouyang","year":"2022"},{"key":"2024041219023342000_bib30","doi-asserted-by":"publisher","first-page":"43","DOI":"10.18653\/v1\/D19-1005","article-title":"Knowledge enhanced contextual word representations","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Peters","year":"2019"},{"key":"2024041219023342000_bib31","doi-asserted-by":"publisher","first-page":"2463","DOI":"10.18653\/v1\/D19-1250","article-title":"Language models as knowledge bases?","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Petroni","year":"2019"},{"issue":"8","key":"2024041219023342000_bib32","first-page":"9","article-title":"Language models are unsupervised multitask learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI blog"},{"key":"2024041219023342000_bib33","article-title":"Language models as or for knowledge bases","author":"Razniewski","year":"2021","journal-title":"ArXiv 
preprint"},{"key":"2024041219023342000_bib34","doi-asserted-by":"publisher","first-page":"5418","DOI":"10.18653\/v1\/2020.emnlp-main.437","article-title":"How much knowledge can you pack into the parameters of a language model?","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Roberts","year":"2020"},{"key":"2024041219023342000_bib35","doi-asserted-by":"publisher","first-page":"4222","DOI":"10.18653\/v1\/2020.emnlp-main.346","article-title":"AutoPrompt: Eliciting knowledge from language models with automatically generated prompts","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Shin","year":"2020"},{"key":"2024041219023342000_bib36","article-title":"Prompting GPT-3 to be reliable","volume-title":"The Eleventh International Conference on Learning Representations","author":"Si","year":"2023"},{"key":"2024041219023342000_bib37","article-title":"Llama: Open and efficient foundation language models","author":"Touvron","year":"2023","journal-title":"ArXiv preprint"},{"key":"2024041219023342000_bib38","doi-asserted-by":"publisher","first-page":"1405","DOI":"10.18653\/v1\/2021.findings-acl.121","article-title":"K-Adapter: Infusing knowledge into pre-trained models with adapters","volume-title":"Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021","author":"Wang","year":"2021"},{"key":"2024041219023342000_bib39","doi-asserted-by":"publisher","first-page":"176","DOI":"10.1162\/tacl_a_00360","article-title":"KEPLER: A unified model for knowledge embedding and pre-trained language representation","volume":"9","author":"Wang","year":"2021","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2024041219023342000_bib40","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1007\/978-3-031-17120-8_11","article-title":"Kformer: Knowledge injection in transformer feed-forward 
layers","volume-title":"Natural Language Processing and Chinese Computing","author":"Yao","year":"2022"},{"key":"2024041219023342000_bib41","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2023.emnlp-main.632","article-title":"Editing large language models: Problems, methods, and opportunities","author":"Yao","year":"2023"},{"key":"2024041219023342000_bib42","doi-asserted-by":"publisher","first-page":"4007","DOI":"10.24963\/ijcai.2021\/552","article-title":"Drop redundant, shrink irrelevant: Selective knowledge injection for language pretraining","volume-title":"Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21","author":"Zhang","year":"2021"},{"key":"2024041219023342000_bib43","article-title":"Greaselm: Graph reasoning enhanced language models","volume-title":"The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25\u201329, 2022","author":"Zhang","year":"2022"},{"key":"2024041219023342000_bib44","doi-asserted-by":"publisher","first-page":"1441","DOI":"10.18653\/v1\/P19-1139","article-title":"ERNIE: Enhanced language representation with informative entities","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Zhang","year":"2019"},{"key":"2024041219023342000_bib45","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.296","article-title":"Can we edit factual knowledge by in-context learning?","author":"Zheng","year":"2023"},{"key":"2024041219023342000_bib46","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2023.emnlp-main.971","article-title":"Mquake: Assessing knowledge editing in language models via multi-hop questions","author":"Zhong","year":"2023"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00644\/2362212\/tacl_a_00644.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00644\/2362212\/tacl_a_00644.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,12]],"date-time":"2024-04-12T19:02:56Z","timestamp":1712948576000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00644\/120576\/Evaluating-the-Ripple-Effects-of-Knowledge-Editing"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024]]},"references-count":46,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00644","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024]]},"published":{"date-parts":[[2024]]}}}