{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T22:20:41Z","timestamp":1775686841673,"version":"3.50.1"},"reference-count":34,"publisher":"MIT Press - Journals","license":[{"start":{"date-parts":[[2021,4,28]],"date-time":"2021-04-28T00:00:00Z","timestamp":1619568000000},"content-version":"vor","delay-in-days":117,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Models for question answering, dialogue agents, and summarization often interpret the meaning of a sentence in a rich context and use that meaning in a new context. Taking excerpts of text can be problematic, as key pieces may not be explicit in a local window. We isolate and define the problem of sentence decontextualization: taking a sentence together with its context and rewriting it to be interpretable out of context, while preserving its meaning. We describe an annotation procedure, collect data on the Wikipedia corpus, and use the data to train models to automatically decontextualize sentences. We present preliminary studies that show the value of sentence decontextualization in a user-facing task, and as preprocessing for systems that perform document understanding. 
We argue that decontextualization is an important subtask in many downstream applications, and that the definitions and resources provided can benefit tasks that operate on sentences that occur in a richer context.<\/jats:p>","DOI":"10.1162\/tacl_a_00377","type":"journal-article","created":{"date-parts":[[2021,4,28]],"date-time":"2021-04-28T23:50:55Z","timestamp":1619653855000},"page":"447-461","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":21,"title":["Decontextualization: Making Sentences Stand-Alone"],"prefix":"10.1162","volume":"9","author":[{"given":"Eunsol","family":"Choi","sequence":"first","affiliation":[{"name":"Department of Computer Science, The University of Texas at Austin, United States. eunsol@cs.utexas.edu"}]},{"given":"Jennimaria","family":"Palomaki","sequence":"additional","affiliation":[{"name":"Google Research, United States. jpalomaki@google.com"}]},{"given":"Matthew","family":"Lamm","sequence":"additional","affiliation":[{"name":"Google Research, United States. mrlamm@google.com"}]},{"given":"Tom","family":"Kwiatkowski","sequence":"additional","affiliation":[{"name":"Google Research, United States. tomkwiat@google.com"}]},{"given":"Dipanjan","family":"Das","sequence":"additional","affiliation":[{"name":"Google Research, United States. dipanjand@google.com"}]},{"given":"Michael","family":"Collins","sequence":"additional","affiliation":[{"name":"Google Research, United States. 
mjcollins@google.com"}]}],"member":"281","published-online":{"date-parts":[[2021,4,26]]},"reference":[{"key":"2021060823410317800_bib1","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1609\/aimag.v36i1.2564","article-title":"Truth is a lie: Crowd truth and the seven myths of human annotation","volume":"36","author":"Aroyo","year":"2015","journal-title":"AI Magazine"},{"key":"2021060823410317800_bib2","first-page":"245","article-title":"Lexi: A tool for adaptive, personalized text simplification","volume-title":"Proceedings of the 27th International Conference on Computational Linguistics","author":"Bingel","year":"2018"},{"key":"2021060823410317800_bib3","volume-title":"Introduction to Pragmatics","author":"Birner","year":"2012","edition":"1st edition"},{"issue":"1","key":"2021060823410317800_bib4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v080.i01","article-title":"brms: An R package for Bayesian multilevel models using Stan","volume":"80","author":"B\u00fcrkner","year":"2017","journal-title":"Journal of Statistical Software"},{"key":"2021060823410317800_bib5","article-title":"Reading Wikipedia to answer open-domain questions","author":"Chen","year":"2017","journal-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)"},{"key":"2021060823410317800_bib6","first-page":"484","article-title":"Neural summarization by extracting sentences and words","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics","author":"Cheng","year":"2016"},{"key":"2021060823410317800_bib7","article-title":"Bridging","volume-title":"Theoretical issues in natural language processing","author":"Clark","year":"1975"},{"key":"2021060823410317800_bib8","article-title":"BERT: Pre-training of deep bidirectional transformers for language 
understanding","author":"Devlin","year":"2018"},{"key":"2021060823410317800_bib9","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P19-1102","article-title":"Multi-news: A large-scale multi-document summarization dataset and abstractive hierarchical model","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Fabbri","year":"2019"},{"key":"2021060823410317800_bib10","first-page":"41","article-title":"Logic and conversation","volume-title":"Speech Acts, volume 3 of Syntax and Semantics","author":"Paul Grice","year":"1975"},{"key":"2021060823410317800_bib11","article-title":"Realm: Retrieval-augmented language model pre-training","author":"Guu","year":"2020"},{"key":"2021060823410317800_bib12","doi-asserted-by":"crossref","DOI":"10.3115\/1598819.1598830","article-title":"Event coreference for information extraction","author":"Humphreys","year":"1997"},{"key":"2021060823410317800_bib13","article-title":"Leveraging passage retrieval with generative models for open domain question answering","author":"Izacard","year":"2020"},{"key":"2021060823410317800_bib14","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1162\/tacl_a_00300","article-title":"SpanBERT: Improving pre-training by representing and predicting spans","volume":"8","author":"Joshi","year":"2020","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021060823410317800_bib15","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.emnlp-main.550","article-title":"Dense passage retrieval for Open-Domain question answering","author":"Karpukhin","year":"2020","journal-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)"},{"key":"2021060823410317800_bib16","doi-asserted-by":"crossref","DOI":"10.1162\/tacl_a_00276","article-title":"Natural questions: A benchmark for question answering research","author":"Kwiatkowski","year":"2019","journal-title":"Transactions of the 
Association for Computational Linguistics"},{"key":"2021060823410317800_bib17","article-title":"Latent retrieval for weakly supervised open domain question answering","author":"Lee","year":"2019","journal-title":"arXiv preprint 1906.00300"},{"key":"2021060823410317800_bib18","article-title":"Improving the annotation of sentence specificity","volume-title":"LREC","author":"Li","year":"2016"},{"key":"2021060823410317800_bib19","doi-asserted-by":"crossref","DOI":"10.1145\/584792.584854","article-title":"Passage retrieval based on language models","volume-title":"CIKM \u201902","author":"Liu","year":"2002"},{"key":"2021060823410317800_bib20","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/D16-1261","article-title":"Improving information extraction by acquiring external evidence with reinforcement learning","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Narasimhan","year":"2016"},{"key":"2021060823410317800_bib21","doi-asserted-by":"crossref","DOI":"10.3115\/1118162.1118166","article-title":"Revisions that improve cohesion in multi-document summaries: a preliminary study","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Otterbacher","year":"2002"},{"key":"2021060823410317800_bib22","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2020.emnlp-main.89","article-title":"ToTTo: A controlled table-to-text generation dataset","author":"Parikh","year":"2020","journal-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)"},{"key":"2021060823410317800_bib23","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1162\/tacl_a_00293","article-title":"Inherent disagreements in human textual inferences","volume":"7","author":"Pavlick","year":"2019","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021060823410317800_bib24","article-title":"Automatic evaluation of 
linguistic quality in multi-document summarization","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Pitler","year":"2010"},{"key":"2021060823410317800_bib25","article-title":"CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes","volume-title":"EMNLP-CoNLL Shared Task","author":"Pradhan","year":"2012"},{"key":"2021060823410317800_bib26","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","author":"Raffel","year":"2019","journal-title":"arXiv preprint 1910.10683"},{"key":"2021060823410317800_bib27","article-title":"Exploring the limits of transfer learning with a unified text-to-text transformer","author":"Raffel","year":"2019","journal-title":"ArXiv, abs\/1910.10683"},{"key":"2021060823410317800_bib28","article-title":"Bridging resolution: Task definition, corpus resources and rule-based experiments","volume-title":"Proceedings of the International Conference on Computational Linguistics (COLING)","author":"R\u00f6siger","year":"2018"},{"key":"2021060823410317800_bib29","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/P18-1238","article-title":"Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)","author":"Sharma","year":"2018"},{"key":"2021060823410317800_bib30","volume-title":"Relevance: Communication and Cognition","author":"Sperber","year":"1986"},{"issue":"6","key":"2021060823410317800_bib31","doi-asserted-by":"crossref","first-page":"1663","DOI":"10.1016\/j.ipm.2007.01.010","article-title":"Two uses of anaphora resolution in summarization","volume":"43","author":"Steinberger","year":"2007","journal-title":"Information Processing and Management"},{"key":"2021060823410317800_bib32","article-title":"Anaphora resolution in machine 
translation","author":"Susanne","year":"1992"},{"key":"2021060823410317800_bib33","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1162\/tacl_a_00139","article-title":"Problems in current text simplification research: New data can help","volume":"3","author":"Wei","year":"2015","journal-title":"Transactions of the Association for Computational Linguistics"},{"key":"2021060823410317800_bib34","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1162\/tacl_a_00107","article-title":"Optimizing statistical machine translation for text simplification","volume":"4","author":"Wei","year":"2016","journal-title":"Transactions of the Association for Computational Linguistics"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00377\/1924234\/tacl_a_00377.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00377\/1924234\/tacl_a_00377.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,9]],"date-time":"2021-06-09T09:59:56Z","timestamp":1623232796000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00377\/100685\/Decontextualization-Making-Sentences-Stand-Alone"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":34,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00377","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}