{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T05:58:24Z","timestamp":1777528704624,"version":"3.51.4"},"reference-count":61,"publisher":"Springer Science and Business Media LLC","issue":"10","license":[{"start":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T00:00:00Z","timestamp":1697673600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T00:00:00Z","timestamp":1697673600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Nat Mach Intell"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The ability to generalize well is one of the primary desiderata for models of natural language processing (NLP), but what \u2018good generalization\u2019 entails and how it should be evaluated is not well understood. In this Analysis we present a taxonomy for characterizing and understanding generalization research in NLP. The proposed taxonomy is based on an extensive literature review and contains five axes along which generalization studies can differ: their main motivation, the type of generalization they aim to solve, the type of data shift they consider, the source by which this data shift originated, and the locus of the shift within the NLP modelling pipeline. We use our taxonomy to classify over 700 experiments, and we use the results to present an in-depth analysis that maps out the current state of generalization research in NLP and make recommendations for which areas deserve attention in the future.<\/jats:p>","DOI":"10.1038\/s42256-023-00729-y","type":"journal-article","created":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T16:02:20Z","timestamp":1697731340000},"page":"1161-1174","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":47,"title":["A taxonomy and review of generalization research in NLP"],"prefix":"10.1038","volume":"5","author":[{"given":"Dieuwke","family":"Hupkes","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-1281-9686","authenticated-orcid":false,"given":"Mario","family":"Giulianelli","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Verna","family":"Dankers","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mikel","family":"Artetxe","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanai","family":"Elazar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5159-4641","authenticated-orcid":false,"given":"Tiago","family":"Pimentel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7708-0051","authenticated-orcid":false,"given":"Christos","family":"Christodoulopoulos","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Karim","family":"Lasri","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Naomi","family":"Saphra","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Arabella","family":"Sinclair","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dennis","family":"Ulmer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Florian","family":"Schottmann","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6819-5444","authenticated-orcid":false,"given":"Khuyagbaatar","family":"Batsuren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kaiser","family":"Sun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Koustuv","family":"Sinha","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leila","family":"Khalatbari","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-5936-1380","authenticated-orcid":false,"given":"Maria","family":"Ryskina","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3921-3519","authenticated-orcid":false,"given":"Rita","family":"Frieske","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ryan","family":"Cotterell","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0238-9024","authenticated-orcid":false,"given":"Zhijing","family":"Jin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2023,10,19]]},"reference":[{"key":"729_CR1","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1006\/cogp.1998.0694","volume":"37","author":"GF Marcus","year":"1998","unstructured":"Marcus, G. F. Rethinking eliminative connectionism. Cogn. Psychol. 37, 243\u2013282 (1998).","journal-title":"Cogn. Psychol."},{"key":"729_CR2","doi-asserted-by":"publisher","unstructured":"Kirk, R., Zhang, A., Grefenstette, E. & Rockt\u00e4schel, T. A survey of generalisation in deep reinforcement learning. J. Artif. Intell. Res. https:\/\/doi.org\/10.1613\/jair.1.14174 (2023).","DOI":"10.1613\/jair.1.14174"},{"key":"729_CR3","first-page":"1","volume":"24","author":"A Chowdhery","year":"2023","unstructured":"Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. of Mach. Learn. Res. 24, 1\u2013113 (2023).","journal-title":"J. of Mach. Learn. Res."},{"key":"729_CR4","doi-asserted-by":"publisher","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Burstein, J. et al eds) 4171\u20134186 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/N19-1423","DOI":"10.18653\/v1\/N19-1423"},{"key":"729_CR5","doi-asserted-by":"publisher","unstructured":"Blodgett, S. L., Green, L. & O\u2019Connor, B. Demographic dialectal variation in social media: a case study of African-American English. Jian Su, Kevin Duh, Xavier Carreras (eds). In Proc. 2016 Conference on Empirical Methods in Natural Language Processing (Su, J. et al eds) 1119\u20131130 (Association for Computational Linguistics, 2016); https:\/\/doi.org\/10.18653\/v1\/D16-1120. https:\/\/aclanthology.org\/D16-1120","DOI":"10.18653\/v1\/D16-1120"},{"key":"729_CR6","doi-asserted-by":"publisher","unstructured":"Plank, B. What to do about non-standard (or non-canonical) language in NLP. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1608.07836 (2016).","DOI":"10.48550\/arXiv.1608.07836"},{"key":"729_CR7","unstructured":"Lake, B. & Baroni, M. Generalization without systematicity: on the compositional skills of sequence-to-sequence recurrent networks. In Proc. 35th International Conference on Machine Learning (ICML) 4487\u20134499 (International Machine Learning Society, 2018)."},{"key":"729_CR8","doi-asserted-by":"publisher","unstructured":"McCoy, T., Pavlick, E. & Linzen, T. Right for the wrong reasons: diagnosing syntactic heuristics in natural language inference. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (Korhonen, A. et al eds.) 3428\u20133448 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/P19-1334, https:\/\/aclanthology.org\/P19-1334","DOI":"10.18653\/v1\/P19-1334"},{"key":"729_CR9","doi-asserted-by":"publisher","unstructured":"Kim, N. & Linzen, T. COGS: a compositional generalization challenge based on semantic interpretation. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Webber, B. et al eds.) 9087\u20139105 (Association for Computational Linguistics, 2020); https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.731, https:\/\/aclanthology.org\/2020.emnlp-main.731","DOI":"10.18653\/v1\/2020.emnlp-main.731"},{"key":"729_CR10","unstructured":"Khishigsuren, T. et al. Using linguistic typology to enrich multilingual lexicons: the case of lexical gaps in kinship. In Proceedings of the Thirteenth Language Resources and Evaluation Conference 2798-2807 (European Language Resources Association, 2022); https:\/\/aclanthology.org\/2022.lrec-1.299"},{"key":"729_CR11","unstructured":"Kaushik, D., Hovy, E. & Lipton, Z. Learning the difference that makes a difference with counterfactually-augmented data. In International Conference on Learning Representations (2019)."},{"key":"729_CR12","doi-asserted-by":"publisher","unstructured":"Parrish, A. et al. BBQ: a hand-built bias benchmark for question answering. In Findings of the Association for Computational Linguistics (Muresan, S. et al eds.) 2086\u20132105 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.findings-acl.165, https:\/\/aclanthology.org\/2022.findings-acl.165","DOI":"10.18653\/v1\/2022.findings-acl.165"},{"key":"729_CR13","doi-asserted-by":"publisher","unstructured":"Srivastava, A. et al. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2206.04615 (2022).","DOI":"10.48550\/arXiv.2206.04615"},{"key":"729_CR14","doi-asserted-by":"crossref","unstructured":"Razeghi, Y., Logan, R. L. IV, Gardner, M. & Singh, S. Impact of pretraining term frequencies on few-shot reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022 840-854 (Association for Computational Linguistics, 2022); https:\/\/aclanthology.org\/2022.findings-emnlp.59.pdf","DOI":"10.18653\/v1\/2022.findings-emnlp.59"},{"key":"729_CR15","doi-asserted-by":"publisher","unstructured":"Lewis, P., Stenetorp, P. & Riedel, S. Question and answer test-train overlap in open-domain question answering datasets. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (Merlo, P. et al eds.) 1000\u20131008 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.eacl-main.86, https:\/\/aclanthology.org\/2021.eacl-main.86","DOI":"10.18653\/v1\/2021.eacl-main.86"},{"key":"729_CR16","doi-asserted-by":"publisher","unstructured":"Michel, P. & Neubig, G. MTNT: a testbed for machine translation of noisy text. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing (Riloff, E. et al eds.) 543\u2013553 (Association for Computational Linguistics, 2018); https:\/\/doi.org\/10.18653\/v1\/D18-1050, https:\/\/aclanthology.org\/D18-1050","DOI":"10.18653\/v1\/D18-1050"},{"key":"729_CR17","doi-asserted-by":"publisher","unstructured":"Dixon, L., Li, J., Sorensen, J., Thain, N. & Vasserman, L. Measuring and mitigating unintended bias in text classification. In Proc. 2018 AAAI\/ACM Conference on AI, Ethics and Society 67\u201373 (Association for Computing Machinery, 2018); https:\/\/doi.org\/10.1145\/3278721.3278729","DOI":"10.1145\/3278721.3278729"},{"key":"729_CR18","doi-asserted-by":"publisher","unstructured":"Dankers, V., Bruni, E. & Hupkes, D. The paradox of the compositionality of natural language: a neural machine translation case study. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (Muresan, S. et al eds.) 4154\u20134175 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.286, https:\/\/aclanthology.org\/2022.acl-long.286","DOI":"10.18653\/v1\/2022.acl-long.286"},{"key":"729_CR19","doi-asserted-by":"publisher","unstructured":"Wei, J., Garrette, D., Linzen, T. & Pavlick, E. Frequency effects on syntactic rule learning in transformers. In Proc. 2021 Conference on Empirical Methods in Natural Language Processing (Moens, M.-F. et al eds.) 932\u2013948 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.emnlp-main.72, https:\/\/aclanthology.org\/2021.emnlp-main.72","DOI":"10.18653\/v1\/2021.emnlp-main.72"},{"key":"729_CR20","doi-asserted-by":"publisher","unstructured":"Weber, L., Jumelet, J., Bruni, E. & Hupkes, D. Language modelling as a multi-task problem. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (Merlo, P. et al eds.) 2049\u20132060 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.eacl-main.176, https:\/\/aclanthology.org\/2021.eacl-main.176","DOI":"10.18653\/v1\/2021.eacl-main.176"},{"key":"729_CR21","unstructured":"Raunak, V., Kumar, V., Metze, F. & Callan, J. On compositionality in neural machine translation. In NeurIPS 2019 Context and Compositionality in Biological and Artificial Neural Systems Workshop (2019); https:\/\/arxiv.org\/abs\/1911.01497"},{"key":"729_CR22","doi-asserted-by":"publisher","unstructured":"Dubois, Y., Dagan, G., Hupkes, D. & Bruni, E. Location attention for extrapolation to longer sequences. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (Jurafsky, D. et al eds.) 403\u2013413 (Association for Computational Linguistics, 2020); https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.39, https:\/\/aclanthology.org\/2020.acl-main.39","DOI":"10.18653\/v1\/2020.acl-main.39"},{"key":"729_CR23","doi-asserted-by":"publisher","unstructured":"Chaabouni, R., Dess\u00ec, R. & Kharitonov, E. Can transformers jump around right in natural language? Assessing performance transfer from SCAN. In Proc. Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 136\u2013148 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.blackboxnlp-1.9, https:\/\/aclanthology.org\/2021.blackboxnlp-1.9","DOI":"10.18653\/v1\/2021.blackboxnlp-1.9"},{"key":"729_CR24","unstructured":"Sun, K., Williams, A. & Hupkes, D. A replication study of compositional generalization works on semantic parsing. In ML Reproducibility Challenge 2022. (2023); https:\/\/openreview.net\/pdf?id=MF9uv95psps"},{"key":"729_CR25","unstructured":"Marcus, G. F. The Algebraic Mind: Integrating Connectionism and Cognitive Science (Linzen, T. et al eds.) (MIT Press, 2003)."},{"key":"729_CR26","doi-asserted-by":"publisher","unstructured":"Zhou, X., Elfardy, H., Christodoulopoulos, C., Butler, T. & Bansal, M. Hidden biases in unreliable news detection datasets. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (Merlo, P. et al eds.) 2482\u20132492 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.eacl-main.211, https:\/\/aclanthology.org\/2021.eacl-main.211","DOI":"10.18653\/v1\/2021.eacl-main.211"},{"key":"729_CR27","doi-asserted-by":"publisher","first-page":"104699","DOI":"10.1016\/j.cognition.2021.104699","volume":"213","author":"Y Lakretz","year":"2021","unstructured":"Lakretz, Y. et al. Mechanisms for handling nested dependencies in neural-network language models and humans. Cognition 213, 104699 (2021).","journal-title":"Cognition"},{"key":"729_CR28","doi-asserted-by":"publisher","unstructured":"Talman, A. & Chatzikyriakidis, S. Testing the generalization power of neural network models across NLI benchmarks. In Proc. 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (Linzen, T. et al eds.) 85\u201394 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/W19-4810, https:\/\/aclanthology.org\/W19-4810","DOI":"10.18653\/v1\/W19-4810"},{"key":"729_CR29","doi-asserted-by":"publisher","unstructured":"Marcus, G. Deep learning: a critical appraisal. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.1801.00631 (2018).","DOI":"10.48550\/arXiv.1801.00631"},{"key":"729_CR30","doi-asserted-by":"publisher","unstructured":"Strubell, E., Ganesh, A. & McCallum, A. Energy and policy considerations for deep learning in NLP. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (Korhonen, A. et al eds.) 3645\u20133650 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/P19-1355, https:\/\/aclanthology.org\/P19-1355","DOI":"10.18653\/v1\/P19-1355"},{"key":"729_CR31","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1016\/0010-0277(88)90031-5","volume":"28","author":"JA Fodor","year":"1988","unstructured":"Fodor, J. A. & Pylyshyn, Z. W. Connectionism and cognitive architecture: a critical analysis. Cognition 28, 3\u201371 (1988).","journal-title":"Cognition"},{"key":"729_CR32","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1111\/j.1755-2567.1970.tb00434.x","volume":"36","author":"R Montague","year":"1970","unstructured":"Montague, R. Universal grammar. Theoria 36, 373\u2013398 (1970).","journal-title":"Theoria"},{"key":"729_CR33","unstructured":"Schmidhuber, J. Towards compositional learning in dynamic networks. Technical report (Istituto Dalle Molle di Studi sull\u2019Intelligenza Artificiale (IDSIA), 1990)."},{"key":"729_CR34","doi-asserted-by":"publisher","first-page":"757","DOI":"10.1613\/jair.1.11674","volume":"67","author":"D Hupkes","year":"2020","unstructured":"Hupkes, D., Dankers, V., Mul, M. & Bruni, E. Compositionality decomposed: how do neural networks generalise? J. Artif. Intell. Res. 67, 757\u2013795 (2020).","journal-title":"J. Artif. Intell. Res."},{"key":"729_CR35","doi-asserted-by":"publisher","unstructured":"Jumelet, J., Denic, M., Szymanik, J., Hupkes, D. & Steinert-Threlkeld, S. Language models use monotonicity to assess NPI licensing. In Findings of the Association for Computational Linguistics (Zong, C. et al eds.) 4958\u20134969 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.findings-acl.439, https:\/\/aclanthology.org\/2021.findings-acl.439","DOI":"10.18653\/v1\/2021.findings-acl.439"},{"key":"729_CR36","doi-asserted-by":"publisher","unstructured":"Pimentel, T. et al. SIGMORPHON 2021 shared task on morphological reinflection: generalization across languages. In Proc. 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology and Morphology (Nicolai, G. et al eds.) 229\u2013259 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.sigmorphon-1.25, https:\/\/aclanthology.org\/2021.sigmorphon-1.25","DOI":"10.18653\/v1\/2021.sigmorphon-1.25"},{"key":"729_CR37","doi-asserted-by":"publisher","unstructured":"Liu, L. & Hulden, M. Can a transformer pass the wug test? Tuning copying bias in neural morphological inflection models. In Proc. 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (Muresan, S. et al eds.) 739\u2013749 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.acl-short.84, https:\/\/aclanthology.org\/2022.acl-short.84","DOI":"10.18653\/v1\/2022.acl-short.84"},{"key":"729_CR38","doi-asserted-by":"crossref","unstructured":"Collobert, R. & Weston, J. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proc. Twenty-Fifth International Conference on Machine Learning (ICML 2008) Vol. 307 of ACM International Conference Proceeding Series (eds Cohen, W. W., McCallum, A. & Roweis, S. T.) 160\u2013167 (ACM, 2008).","DOI":"10.1145\/1390156.1390177"},{"key":"729_CR39","first-page":"9","volume":"1","author":"A Radford","year":"2019","unstructured":"Radford, A. et al. Language models are unsupervised multitask learners. OpenAI blog 1, 9 (2019).","journal-title":"OpenAI blog"},{"key":"729_CR40","doi-asserted-by":"publisher","unstructured":"Bender, E. M. On achieving and evaluating language-independence in NLP. Ling. Issues Lang. Technol. https:\/\/doi.org\/10.33011\/lilt.v6i.1239 (2011).","DOI":"10.33011\/lilt.v6i.1239"},{"key":"729_CR41","doi-asserted-by":"publisher","unstructured":"Wu, S. & Dredze, M. Beto, Bentz, Becas: the surprising cross-lingual effectiveness of BERT. In Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (Inui, K. et al eds.) 833\u2013844 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/D19-1077, https:\/\/aclanthology.org\/D19-1077","DOI":"10.18653\/v1\/D19-1077"},{"key":"729_CR42","doi-asserted-by":"publisher","unstructured":"Zhang, B., Williams, P., Titov, I. & Sennrich, R. Improving massively multilingual neural machine translation and zero-shot translation. In Proc. 58th Annual Meeting of the Association for Computational Linguistics (Jurafsky, D. et al eds.) 1628\u20131639 (Association for Computational Linguistics, 2020); https:\/\/doi.org\/10.18653\/v1\/2020.acl-main.148, https:\/\/aclanthology.org\/2020.acl-main.148","DOI":"10.18653\/v1\/2020.acl-main.148"},{"key":"729_CR43","first-page":"29348","volume":"34","author":"A Lazaridou","year":"2021","unstructured":"Lazaridou, A. et al. Mind the gap: assessing temporal generalization in neural language models. Adv. Neural Inf. Process. Syst. 34, 29348\u201329363 (2021).","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"729_CR44","unstructured":"Daum\u00e9, H. III. Frustratingly easy domain adaptation. In Proc. 45th Annual Meeting of the Association of Computational Linguistics (Zaenen, A. et al eds.) 256\u2013263 (Association for Computational Linguistics, 2007); https:\/\/aclanthology.org\/P07-1033"},{"key":"729_CR45","doi-asserted-by":"publisher","unstructured":"Poliak, A., Naradowsky, J., Haldar, A., Rudinger, R. & Van Durme, B. Hypothesis only baselines in natural language inference. In Proc. Seventh Joint Conference on Lexical and Computational Semantics (Nissim, M. et al eds.) 180\u2013191 (Association for Computational Linguistics, 2018); https:\/\/doi.org\/10.18653\/v1\/S18-2023, https:\/\/aclanthology.org\/S18-2023","DOI":"10.18653\/v1\/S18-2023"},{"key":"729_CR46","doi-asserted-by":"publisher","unstructured":"Gorman, K. & Bedrick, S. We need to talk about standard splits. In Proc. 57th Annual Meeting of the Association for Computational Linguistics (Korhonen, A. et al eds.) 2786\u20132791 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/P19-1267, https:\/\/aclanthology.org\/P19-1267","DOI":"10.18653\/v1\/P19-1267"},{"key":"729_CR47","first-page":"3","volume":"30","author":"A Storkey","year":"2009","unstructured":"Storkey, A. When training and test sets are different: characterizing learning transfer. Dataset Shift Mach. Learn. 30, 3\u201328 (2009).","journal-title":"Dataset Shift Mach. Learn."},{"key":"729_CR48","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1016\/j.patcog.2011.06.019","volume":"45","author":"JG Moreno-Torres","year":"2012","unstructured":"Moreno-Torres, J. G., Raeder, T., Alaiz-Rodr\u00edguez, Roc\u00edo, Chawla, N. V. & Herrera, F. A unifying view on dataset shift in classification. Pattern Recogn. 45, 521\u2013530 (2012).","journal-title":"Pattern Recogn."},{"key":"729_CR49","doi-asserted-by":"publisher","unstructured":"Kodner, J. et al. SIGMORPHON\u2013UniMorph 2022 shared task 0: generalization and typologically diverse morphological inflection. In Proc. 19th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology and Morphology (Nicolai, G. et al eds.) 176\u2013203 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.sigmorphon-1.19, https:\/\/aclanthology.org\/2022.sigmorphon-1.19","DOI":"10.18653\/v1\/2022.sigmorphon-1.19"},{"key":"729_CR50","doi-asserted-by":"publisher","unstructured":"Papadimitriou, I. & Jurafsky, D. Learning music helps you read: using transfer to study linguistic structure in language models. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (Webber, B. et al eds.) 6829\u20136839 (Association for Computational Linguistics, 2020); https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.554, https:\/\/aclanthology.org\/2020.emnlp-main.554","DOI":"10.18653\/v1\/2020.emnlp-main.554"},{"key":"729_CR51","doi-asserted-by":"publisher","unstructured":"De Varda, A. & Zamparelli, R. Multilingualism encourages recursion: a transfer study with mBERT. In Proc. 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP (Vylomova, E. et al eds.) 1\u201310 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.sigtyp-1.1, https:\/\/aclanthology.org\/2022.sigtyp-1.1","DOI":"10.18653\/v1\/2022.sigtyp-1.1"},{"key":"729_CR52","doi-asserted-by":"publisher","unstructured":"Li, B. et al. Quantifying adaptability in pre-trained language models with 500 tasks. In Proc. 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Carpuat, M. et al eds.) 4696\u20134715 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.naacl-main.346, https:\/\/aclanthology.org\/2022.naacl-main.346","DOI":"10.18653\/v1\/2022.naacl-main.346"},{"key":"729_CR53","doi-asserted-by":"publisher","unstructured":"Wang, B., Lapata, M. & Titov, I. Meta-learning for domain generalization in semantic parsing. In Proc. 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Toutanova, K. et al eds.) 366\u2013379 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.33, https:\/\/aclanthology.org\/2021.naacl-main.33","DOI":"10.18653\/v1\/2021.naacl-main.33"},{"key":"729_CR54","unstructured":"Lakretz, Y., Desbordes, T., Hupkes, D. & Dehaene, S. Causal transformers perform below chance on recursive nested constructions, unlike humans. In Proceedings of the 29th International Conference on Computational Linguistics 3226\u20133232 (International Committee on Computational Linguistics, 2022); https:\/\/aclanthology.org\/2022.coling-1.285"},{"key":"729_CR55","doi-asserted-by":"publisher","unstructured":"Kiela, D. et al. Dynabench: rethinking benchmarking in NLP. In Proc. 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Toutanova, K. et al eds.) 4110\u20134124 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.naacl-main.324, https:\/\/aclanthology.org\/2021.naacl-main.324","DOI":"10.18653\/v1\/2021.naacl-main.324"},{"key":"729_CR56","doi-asserted-by":"publisher","unstructured":"Zellers, R., Bisk, Y., Schwartz, R. & Choi, Y. SWAG: a large-scale adversarial dataset for grounded commonsense inference. In Proc. 2018 Conference on Empirical Methods in Natural Language Processing (Riloff, E. et al eds.) 93\u2013104 (Association for Computational Linguistics, 2018); https:\/\/doi.org\/10.18653\/v1\/D18-1009, https:\/\/aclanthology.org\/D18-1009","DOI":"10.18653\/v1\/D18-1009"},{"key":"729_CR57","doi-asserted-by":"publisher","unstructured":"Lakretz, Y. et al. The emergence of number and syntax units in LSTM language models. In Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (Burstein, J. et al eds.) 11\u201320 (Association for Computational Linguistics, 2019); https:\/\/doi.org\/10.18653\/v1\/N19-1002, https:\/\/aclanthology.org\/N19-1002","DOI":"10.18653\/v1\/N19-1002"},{"key":"729_CR58","doi-asserted-by":"publisher","unstructured":"Rae, J. W. et al. Scaling language models: methods, analysis and insights from training gopher. Preprint at arXiv https:\/\/doi.org\/10.48550\/arXiv.2112.11446 (2021).","DOI":"10.48550\/arXiv.2112.11446"},{"key":"729_CR59","doi-asserted-by":"publisher","unstructured":"Artetxe, M. et al. Efficient large scale language modeling with mixtures of experts. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (Goldberg, Y. et al eds.) 11699-11732 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.emnlp-main.804, https:\/\/aclanthology.org\/2022.emnlp-main.804\/","DOI":"10.18653\/v1\/2022.emnlp-main.804"},{"key":"729_CR60","doi-asserted-by":"publisher","unstructured":"Lin, Xi Victoria et al. Few-shot learning with multilingual generative language models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (Goldberg, Y. et al eds.) 9019\u20139052 (Association for Computational Linguistics, 2022); https:\/\/doi.org\/10.18653\/v1\/2022.emnlp-main.616, https:\/\/aclanthology.org\/2022.emnlp-main.616\/","DOI":"10.18653\/v1\/2022.emnlp-main.616"},{"key":"729_CR61","doi-asserted-by":"publisher","unstructured":"Yanaka, H., Mineshima, K. & Inui, K. Exploring transitivity in neural NLI models through veridicality. In Proc. 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (Merlo, P. et al eds.) 920\u2013934 (Association for Computational Linguistics, 2021); https:\/\/doi.org\/10.18653\/v1\/2021.eacl-main.78, https:\/\/aclanthology.org\/2021.eacl-main.78","DOI":"10.18653\/v1\/2021.eacl-main.78"}],"container-title":["Nature Machine Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00729-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00729-y","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00729-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,19]],"date-time":"2023-10-19T16:16:43Z","timestamp":1697732203000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s42256-023-00729-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,19]]},"references-count":61,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2023,10]]}},"alternative-id":["729"],"URL":"https:\/\/doi.org\/10.1038\/s42256-023-00729-y","relation":{},"ISSN":["2522-5839"],"issn-type":[{"value":"2522-5839","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,19]]},"assertion":[{"value":"22 December 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}]}}