{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,8]],"date-time":"2026-07-08T19:46:15Z","timestamp":1783539975589,"version":"3.55.0"},"reference-count":45,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T00:00:00Z","timestamp":1662595200000},"content-version":"vor","delay-in-days":250,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,9,7]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Keeping the performance of language technologies optimal as time passes is of great practical interest. We study temporal effects on model performance on downstream language tasks, establishing a nuanced terminology for such discussion and identifying factors essential to conduct a robust study. We present experiments for several tasks in English where the label correctness is not dependent on time and demonstrate the importance of distinguishing between temporal model deterioration and temporal domain adaptation for systems using pre-trained representations. We find that, depending on the task, temporal model deterioration is not necessarily a concern. Temporal domain adaptation, however, is beneficial in all cases, with better performance for a given time period possible when the system is trained on temporally more recent data. Therefore, we also examine the efficacy of two approaches for temporal domain adaptation without human annotations on new data. 
Self-labeling shows consistent improvement and notably, for named entity recognition, leads to better temporal adaptation than even human annotations.<\/jats:p>","DOI":"10.1162\/tacl_a_00497","type":"journal-article","created":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T13:57:24Z","timestamp":1662645444000},"page":"904-921","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":16,"title":["Temporal Effects on Pre-trained Models for Language Processing Tasks"],"prefix":"10.1162","volume":"10","author":[{"given":"Oshin","family":"Agarwal","sequence":"first","affiliation":[{"name":"University of Pennsylvania, USA. oagarwal@seas.upenn.edu"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ani","family":"Nenkova","sequence":"additional","affiliation":[{"name":"Adobe Research, USA. nenkova@adobe.com"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"281","published-online":{"date-parts":[[2022,9,7]]},"reference":[{"issue":"1","key":"2022090813571622500_bib1","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1162\/coli_a_00397","article-title":"Interpretability analysis for named entity recognition to understand system predictions and how they can improve","volume":"47","author":"Agarwal","year":"2021","journal-title":"Computational Linguistics"},{"key":"2022090813571622500_bib2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6240","article-title":"Back to the future - temporal adaptation of text representations","volume-title":"AAAI","author":"Bjerva","year":"2020"},{"key":"2022090813571622500_bib3","doi-asserted-by":"publisher","first-page":"146","DOI":"10.18653\/v1\/W19-4718","article-title":"Times are changing: Investigating the pace of language change in diachronic word embeddings","volume-title":"Proceedings of the 1st International Workshop on Computational Approaches to Historical Language 
Change","author":"Brandl","year":"2019"},{"key":"2022090813571622500_bib4","doi-asserted-by":"publisher","first-page":"163","DOI":"10.18653\/v1\/2021.socialnlp-1.14","article-title":"Mitigating temporal-drift: A simple approach to keep NER models crisp","volume-title":"Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media","author":"Chen","year":"2021"},{"key":"2022090813571622500_bib5","doi-asserted-by":"publisher","first-page":"307","DOI":"10.1145\/2488388.2488416","article-title":"No country for old members: User lifecycle and linguistic change in online communities","volume-title":"22nd International World Wide Web Conference, WWW \u201913, Rio de Janeiro, Brazil, May 13\u201317, 2013","author":"Danescu-Niculescu-Mizil","year":"2013"},{"key":"2022090813571622500_bib6","first-page":"4171","article-title":"BERT: Pre-training of deep bidirectional transformers for language understanding","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Devlin","year":"2019"},{"key":"2022090813571622500_bib7","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00459","article-title":"Time-aware language models as temporal knowledge bases","author":"Dhingra","year":"2021","journal-title":"arXiv preprint arXiv:2106.15110"},{"key":"2022090813571622500_bib8","doi-asserted-by":"publisher","first-page":"1383","DOI":"10.18653\/v1\/P18-1128","article-title":"The hitchhiker\u2019s guide to testing statistical significance in natural language processing","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Dror","year":"2018"},{"key":"2022090813571622500_bib9","first-page":"19","article-title":"When terms disappear from a specialized lexicon: A semi-automatic investigation into 
necrology","author":"Dury","year":"2011","journal-title":"ICAME Journal"},{"key":"2022090813571622500_bib10","first-page":"359","article-title":"What to do about bad language on the internet","volume-title":"Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Eisenstein","year":"2013"},{"key":"2022090813571622500_bib11","first-page":"9","article-title":"Measuring and modeling language change","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials","author":"Eisenstein","year":"2019"},{"key":"2022090813571622500_bib12","doi-asserted-by":"publisher","first-page":"2163","DOI":"10.18653\/v1\/D19-1222","article-title":"To annotate or not? Predicting performance drop under domain shift","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Elsahar","year":"2019"},{"issue":"2","key":"2022090813571622500_bib13","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1145\/980972.980990","article-title":"\u201cin vivo\u201d spam filtering: A challenge problem for KDD","volume":"5","author":"Fawcett","year":"2003","journal-title":"ACM SIGKDD Explorations Newsletter"},{"key":"2022090813571622500_bib14","first-page":"2544","article-title":"Crowdsourcing and annotating NER for Twitter #drift","volume-title":"Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC\u201914)","author":"Fromreide","year":"2014"},{"key":"2022090813571622500_bib15","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1007\/BF02030865","article-title":"Discrimination decisions for 100,000-dimensional spaces","volume":"55","author":"Gale","year":"1995","journal-title":"Annals of Operations 
Research"},{"key":"2022090813571622500_bib16","doi-asserted-by":"publisher","first-page":"8342","DOI":"10.18653\/v1\/2020.acl-main.740","article-title":"Don\u2019t stop pretraining: Adapt language models to domains and tasks","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Gururangan","year":"2020"},{"key":"2022090813571622500_bib17","doi-asserted-by":"publisher","first-page":"1489","DOI":"10.18653\/v1\/P16-1141","article-title":"Diachronic word embeddings reveal statistical laws of semantic change","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Hamilton","year":"2016"},{"key":"2022090813571622500_bib18","doi-asserted-by":"publisher","first-page":"2241","DOI":"10.24963\/ijcai.2018\/310","article-title":"Time-evolving text classification with deep neural networks","volume-title":"Proceedings of the 27th International Joint Conference on Artificial Intelligence","author":"He","year":"2018"},{"issue":"8","key":"2022090813571622500_bib19","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Computation"},{"key":"2022090813571622500_bib20","doi-asserted-by":"publisher","first-page":"6970","DOI":"10.18653\/v1\/2021.acl-long.542","article-title":"Dynamic contextualized word embeddings","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Hofmann","year":"2021"},{"key":"2022090813571622500_bib21","doi-asserted-by":"publisher","first-page":"694","DOI":"10.18653\/v1\/P18-2110","article-title":"Examining temporality in document classification","volume-title":"Proceedings of the 56th Annual Meeting of the Association 
for Computational Linguistics (Volume 2: Short Papers)","author":"Huang","year":"2018"},{"key":"2022090813571622500_bib22","doi-asserted-by":"publisher","first-page":"4113","DOI":"10.18653\/v1\/P19-1403","article-title":"Neural temporality adaptation for document classification: Diachronic word embeddings and domain adaptation models","volume-title":"Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics","author":"Huang","year":"2019"},{"key":"2022090813571622500_bib23","first-page":"282","article-title":"Conditional random fields: Probabilistic models for segmenting and labeling sequence data","volume-title":"Proceedings of the Eighteenth International Conference on Machine Learning","author":"Lafferty","year":"2001"},{"key":"2022090813571622500_bib24","doi-asserted-by":"publisher","first-page":"260","DOI":"10.18653\/v1\/N16-1030","article-title":"Neural architectures for named entity recognition","volume-title":"Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Lample","year":"2016"},{"key":"2022090813571622500_bib25","article-title":"Mind the gap: Assessing temporal generalization in neural language models","volume":"34","author":"Lazaridou","year":"2021","journal-title":"Advances in Neural Information Processing Systems"},{"key":"2022090813571622500_bib26","first-page":"152","article-title":"tRuEcasIng","volume-title":"Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics","author":"Lita","year":"2003"},{"key":"2022090813571622500_bib27","article-title":"Roberta: A robustly optimized bert pretraining approach","volume":"abs\/1907.11692","author":"Liu","year":"2019","journal-title":"ArXiv"},{"key":"2022090813571622500_bib28","doi-asserted-by":"publisher","first-page":"65","DOI":"10.18653\/v1\/W18-6210","article-title":"Sentiment analysis under temporal shift","volume-title":"Proceedings of the 
9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis","author":"Lukes","year":"2018"},{"key":"2022090813571622500_bib29","doi-asserted-by":"crossref","first-page":"1064","DOI":"10.18653\/v1\/P16-1101","article-title":"End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Ma","year":"2016"},{"key":"2022090813571622500_bib30","volume-title":"Because Internet: Understanding the New Rules of Language","author":"McCulloch","year":"2020"},{"key":"2022090813571622500_bib31","article-title":"Distributed representations of words and phrases and their compositionality","volume-title":"NIPS","author":"Mikolov","year":"2013"},{"key":"2022090813571622500_bib32","doi-asserted-by":"crossref","first-page":"188","DOI":"10.18653\/v1\/D19-1018","article-title":"Justifying recommendations using distantly-labeled reviews and fine-grained aspects","volume-title":"Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)","author":"Ni","year":"2019"},{"key":"2022090813571622500_bib33","doi-asserted-by":"publisher","first-page":"1532","DOI":"10.3115\/v1\/D14-1162","article-title":"GloVe: Global vectors for word representation","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Pennington","year":"2014"},{"key":"2022090813571622500_bib34","doi-asserted-by":"publisher","first-page":"2227","DOI":"10.18653\/v1\/N18-1202","article-title":"Deep contextualized word representations","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long 
Papers)","author":"Peters","year":"2018"},{"key":"2022090813571622500_bib35","doi-asserted-by":"publisher","first-page":"338","DOI":"10.18653\/v1\/D17-1035","article-title":"Reporting score distributions makes a difference: Performance study of LSTM-networks for sequence tagging","volume-title":"Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing","author":"Reimers","year":"2017"},{"key":"2022090813571622500_bib36","doi-asserted-by":"publisher","first-page":"7605","DOI":"10.18653\/v1\/2020.acl-main.680","article-title":"Temporally-informed analysis of named entity recognition","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Rijhwani","year":"2020"},{"key":"2022090813571622500_bib37","doi-asserted-by":"publisher","first-page":"474","DOI":"10.18653\/v1\/N18-1044","article-title":"Deep neural models of semantic shift","volume-title":"Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)","author":"Rosenfeld","year":"2018"},{"key":"2022090813571622500_bib38","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2022.findings-naacl.112","article-title":"Temporal attention for language models","volume-title":"Findings of the North American Chapter of the Association for Computational Linguistics: NAACL 2022","author":"Rosin","year":"2022"},{"key":"2022090813571622500_bib39","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2021.findings-emnlp.206","article-title":"Temporal adaptation of bert and performance on downstream document classification: Insights from social media","volume-title":"EMNLP","author":"R\u00f6ttger","year":"2021"},{"issue":"12","key":"2022090813571622500_bib40","first-page":"e26752","article-title":"The New York Times annotated corpus","volume":"6","author":"Sandhaus","year":"2008","journal-title":"Linguistic Data Consortium, 
Philadelphia"},{"key":"2022090813571622500_bib41","article-title":"Distilbert, a distilled version of BERT: Smaller, faster, cheaper and lighter","author":"Sanh","year":"2019","journal-title":"arXiv preprint arXiv:1910.01108"},{"key":"2022090813571622500_bib42","doi-asserted-by":"publisher","first-page":"1823","DOI":"10.18653\/v1\/2021.eacl-main.156","article-title":"We need to talk about random splits","volume-title":"Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume","author":"S\u00f8gaard","year":"2021"},{"key":"2022090813571622500_bib43","article-title":"Empirical foundations for a theory of language change","author":"Weinreich","year":"1968"},{"key":"2022090813571622500_bib44","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1145\/2064448.2064475","article-title":"Understanding semantic change of words over centuries","volume-title":"Proceedings of the 2011 International Workshop on DETecting and Exploiting Cultural DiversiTy on the Social Web","author":"Wijaya","year":"2011"},{"key":"2022090813571622500_bib45","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.emnlp-demos.6","article-title":"Huggingface\u2019s transformers: State-of-the-art natural language processing","volume":"abs\/1910.03771",
"author":"Wolf","year":"2019","journal-title":"ArXiv"}],"container-title":["Transactions of the Association for Computational Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00497\/2042578\/tacl_a_00497.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00497\/2042578\/tacl_a_00497.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,8]],"date-time":"2022-09-08T13:58:00Z","timestamp":1662645480000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00497\/112912\/Temporal-Effects-on-Pre-trained-Models-for"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"references-count":45,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00497","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022]]},"published":{"date-parts":[[2022]]}}}