{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,15]],"date-time":"2026-02-15T14:58:51Z","timestamp":1771167531807,"version":"3.50.1"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"7","license":[{"start":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T00:00:00Z","timestamp":1702339200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T00:00:00Z","timestamp":1702339200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Deep neural networks can yield good performance on various tasks but often require large amounts of data to train them. Meta-learning received considerable attention as one approach to improve the generalization of these networks from a limited amount of data. Whilst meta-learning techniques have been observed to be successful at this in various scenarios, recent results suggest that when evaluated on tasks from a different data distribution than the one used for training, a baseline that simply finetunes a pre-trained network may be more effective than more complicated meta-learning techniques such as MAML, which is one of the most popular meta-learning techniques. This is surprising as the learning behaviour of MAML mimics that of finetuning: both rely on re-using learned features. We investigate the observed performance differences between finetuning, MAML, and another meta-learning technique called Reptile, and show that MAML and Reptile specialize for fast adaptation in low-data regimes of similar data distribution as the one used for training. 
Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML. Lastly, we show that the pre-trained features as obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile. Due to this lack of diversity and distribution specialization, MAML and Reptile may fail to generalize to out-of-distribution tasks whereas finetuning can fall back on the diversity of the learned features.<\/jats:p>","DOI":"10.1007\/s10994-023-06387-w","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T20:01:34Z","timestamp":1702411294000},"page":"4113-4132","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Understanding transfer learning and gradient-based meta-learning techniques"],"prefix":"10.1007","volume":"113","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9215-2973","authenticated-orcid":false,"given":"Mike","family":"Huisman","sequence":"first","affiliation":[]},{"given":"Aske","family":"Plaat","sequence":"additional","affiliation":[]},{"given":"Jan N.","family":"van Rijn","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,12,12]]},"reference":[{"key":"6387_CR1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-67024-5","volume-title":"Metalearning: Applications to automated machine learning and data mining","author":"P Brazdil","year":"2022","unstructured":"Brazdil, P., van Rijn, J. N., Soares, C., & Vanschoren, J. (2022). Metalearning: Applications to automated machine learning and data mining (2nd ed.). Cham: Springer.","edition":"2"},{"key":"6387_CR2","unstructured":"Chen, W.-Y., Liu, Y.-C., Kira, Z., Wang, Y.-C.\u00a0F., & Huang, J.-B. (2019). A closer look at few-shot classification. 
In International conference on learning representations, ICLR\u201919."},{"key":"6387_CR3","unstructured":"Collins, L., Mokhtari, A., & Shakkottai, S. (2020). Why does MAML outperform ERM? An optimization perspective. arXiv preprint arXiv:2010.14672."},{"key":"6387_CR4","unstructured":"Deleu, T., W\u00fcrfl, T., Samiei, M., Cohen, J.\u00a0P., & Bengio, Y. (2019). Torchmeta: A meta-learning library for PyTorch, https:\/\/arxiv.org\/abs\/1909.06576. Available at: https:\/\/github.com\/tristandeleu\/pytorch-meta."},{"key":"6387_CR5","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 248\u2013255). IEEE.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"6387_CR6","unstructured":"Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th international conference on machine learning, ICML\u201917 (pp. 1126\u20131135). PMLR."},{"key":"6387_CR7","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE international conference on computer vision (pp. 1026\u20131034).","DOI":"10.1109\/ICCV.2015.123"},{"key":"6387_CR8","first-page":"5149","volume":"44","author":"TM Hospedales","year":"2021","unstructured":"Hospedales, T. M., Antoniou, A., Micaelli, P., & Storkey, A. J. (2021). Meta-learning in neural networks: A survey. 
IEEE Transactions on Pattern Analysis and Machine Intelligence, 44, 5149\u20135169.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"issue":"9","key":"6387_CR9","doi-asserted-by":"publisher","first-page":"3227","DOI":"10.1007\/s10994-022-06210-y","volume":"111","author":"M Huisman","year":"2022","unstructured":"Huisman, M., Plaat, A., & van Rijn, J. N. (2022). Stateless neural meta-learning using second-order gradients. Machine Learning, 111(9), 3227\u20133244.","journal-title":"Machine Learning"},{"issue":"6","key":"6387_CR10","doi-asserted-by":"publisher","first-page":"4483","DOI":"10.1007\/s10462-021-10004-4","volume":"54","author":"M Huisman","year":"2021","unstructured":"Huisman, M., van Rijn, J. N., & Plaat, A. (2021). A survey of deep meta-learning. Artificial Intelligence Review, 54(6), 4483\u20134541.","journal-title":"Artificial Intelligence Review"},{"key":"6387_CR11","unstructured":"Krizhevsky, A., Sutskever, I., & Hinton, G.\u00a0E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems 25, NIPS\u201912 (pp. 1097\u20131105)"},{"issue":"7553","key":"6387_CR12","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436\u2013444.","journal-title":"Nature"},{"key":"6387_CR13","doi-asserted-by":"crossref","unstructured":"Lee, K., Maji, S., Ravichandran, A., & Soatto, S. (2019). Meta-learning with differentiable convex optimization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 10657\u201310665).","DOI":"10.1109\/CVPR.2019.01091"},{"key":"6387_CR14","doi-asserted-by":"crossref","unstructured":"Mangla, P., Kumari, N., Sinha, A., Singh, M., Krishnamurthy, B., & Balasubramanian, V.\u00a0N. (2020). 
Charting the right manifold: Manifold mixup for few-shot learning. In Proceedings of the IEEE\/CVF winter conference on applications of computer vision (pp. 2218\u20132227).","DOI":"10.1109\/WACV45572.2020.9093338"},{"issue":"7540","key":"6387_CR15","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529\u2013533.","journal-title":"Nature"},{"key":"6387_CR16","unstructured":"Nichol, A., Achiam, J., & Schulman, J. (2018). On first-order meta-learning algorithms. arXiv preprint arXiv:1803.02999."},{"key":"6387_CR17","unstructured":"Raghu, A., Raghu, M., Bengio, S., & Vinyals, O. (2020). Rapid learning or feature reuse? Towards understanding the effectiveness of MAML. In International conference on learning representations, ICLR\u201920."},{"key":"6387_CR18","unstructured":"Ravi, S., & Larochelle, H. (2017). Optimization as a model for few-shot learning. In International conference on learning representations, ICLR\u201917."},{"key":"6387_CR19","unstructured":"Rusu, A.\u00a0A., Rao, D., Sygnowski, J., Vinyals, O., Pascanu, R., Osindero, S., & Hadsell, R. (2019). Meta-learning with latent embedding optimization. In International conference on learning representations, ICLR\u201919."},{"issue":"6","key":"6387_CR20","doi-asserted-by":"publisher","first-page":"4650","DOI":"10.4249\/scholarpedia.4650","volume":"5","author":"T Schaul","year":"2010","unstructured":"Schaul, T., & Schmidhuber, J. (2010). Metalearning. Scholarpedia, 5(6), 4650.","journal-title":"Scholarpedia"},{"key":"6387_CR21","unstructured":"Schmidhuber, J. (1987). Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. 
Master\u2019s thesis, Technische Universit\u00e4t M\u00fcnchen."},{"issue":"7587","key":"6387_CR22","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484\u2013489.","journal-title":"Nature"},{"key":"6387_CR23","unstructured":"Snell, J., Swersky, K., & Zemel, R. (2017). Prototypical networks for few-shot learning. In Advances in neural information processing systems 30, NIPS\u201917 (pp. 4077\u20134087). Curran Associates Inc."},{"key":"6387_CR24","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1007\/978-1-4615-5529-2_8","volume-title":"Learning to learn","author":"S Thrun","year":"1998","unstructured":"Thrun, S. (1998). Lifelong learning algorithms. Learning to learn (pp. 181\u2013209). Boston, MA: Springer."},{"key":"6387_CR25","doi-asserted-by":"crossref","unstructured":"Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J.\u00a0B., & Isola, P. (2020). Rethinking few-shot image classification: a good embedding is all you need? arXiv preprint arXiv:2003.11539.","DOI":"10.1007\/978-3-030-58568-6_16"},{"key":"6387_CR26","unstructured":"Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., & Wierstra, D. (2016). Matching networks for one shot learning. In Advances in neural information processing systems 29, NIPS\u201916 (pp. 3637\u20133645)."},{"key":"6387_CR27","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., & Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset. 
Technical Report CNS-TR-2011-001, California Institute of Technology."},{"key":"6387_CR28","unstructured":"Wu, Y., Schuster, M., Chen, Z., Le, Q.\u00a0V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, \u0141., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., & Dean, J. (2016). Google\u2019s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144."},{"key":"6387_CR29","unstructured":"Yang, S., Liu, L., & Xu, M. (2021). Free lunch for few-shot learning: distribution calibration. In International conference on learning representations, ICLR\u201921."}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-023-06387-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-023-06387-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-023-06387-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T18:02:34Z","timestamp":1717178554000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-023-06387-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,12]]},"references-count":29,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["6387"],"URL":"https:\/\/doi.org\/10.1007\/s10994-023-06387-w","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"
print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,12]]},"assertion":[{"value":"6 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 February 2023","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 August 2023","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 December 2023","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Not applicable: this research did not involve human participants, nor did it involve animals.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Human and animal rights"}},{"value":"All authors declare that there is no recent, present, or anticipated employment by any organization that may gain or lose financially through publication of this manuscript.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Employment"}},{"value":"Not applicable.","order":6,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable: this research does not involve personal data, and publishing of this manuscript will not result in the disruption of any individual\u2019s 
privacy.","order":7,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}