{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T09:31:43Z","timestamp":1771839103464,"version":"3.50.1"},"reference-count":49,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T00:00:00Z","timestamp":1771804800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T00:00:00Z","timestamp":1771804800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100000271","name":"Science and Technology Facilities Council","doi-asserted-by":"crossref","award":["ST\/T000694\/1"],"award-info":[{"award-number":["ST\/T000694\/1"]}],"id":[{"id":"10.13039\/501100000271","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["492175459"],"award-info":[{"award-number":["492175459"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2026,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Scaling laws offer valuable insights into the relationship between neural network (NN) performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how NNs behave under data and model scaling through the lens of the neural tangent kernel. This analysis establishes a link between performance scaling and the internal dynamics of NNs. 
Our findings of standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior. This demonstrates that performance scaling alone is insufficient for understanding the underlying mechanisms of NNs. We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models. To this end, we investigate how feature learning is lost as the model width increases and quantify the transition between kernel-driven and feature-driven scaling regimes. We identify the maximum model width that supports feature learning, which, in our setups, we find to be more than ten times smaller than typical large language model widths.<\/jats:p>","DOI":"10.1088\/2632-2153\/ae4442","type":"journal-article","created":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T22:54:07Z","timestamp":1770764047000},"page":"025005","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Beyond scaling curves: internal dynamics of neural networks through the NTK lens"],"prefix":"10.1088","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-9926-680X","authenticated-orcid":true,"given":"Konstantin","family":"Nikolaou","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6374-6828","authenticated-orcid":true,"given":"Sven","family":"Krippendorf","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9537-8361","authenticated-orcid":true,"given":"Samuel","family":"Tovey","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2739-310X","authenticated-orcid":true,"given":"Christian","family":"Holm","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2026,2,23]]},"reference":[{"key":"mlstae4442bib1","article-title":"Neural networks as kernel learners: the 
silent alignment effect","author":"Atanasov","year":"2021","type":"preprint"},{"key":"mlstae4442bib2","article-title":"The DeepMind JAX ecosystem","author":"Babuschkin","year":"2020","type":"other"},{"key":"mlstae4442bib3","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2311878121","type":"journal-article","article-title":"Explaining neural scaling laws","volume":"121","author":"Bahri","year":"2024","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstae4442bib4","first-page":"2269","type":"conference-proceedings","article-title":"Implicit regularization via neural feature alignment","author":"Baratin","year":"2021"},{"key":"mlstae4442bib5","article-title":"A foundation model for atomistic materials chemistry","author":"Batatia","year":"2024","type":"preprint"},{"key":"mlstae4442bib6","first-page":"32240","type":"conference-proceedings","article-title":"Self-consistent dynamical field theory of kernel evolution in wide neural networks","volume":"vol 35","author":"Bordelon","year":"2022"},{"key":"mlstae4442bib7","type":"conference-proceedings","article-title":"On lazy training in differentiable programming","volume":"vol 32","author":"Chizat","year":"2019"},{"key":"mlstae4442bib8","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1021\/ar040198i","type":"journal-article","article-title":"Metadynamics as a tool for exploring free energy landscapes of chemical reactions","volume":"39","author":"Ensing","year":"2006","journal-title":"Acc. Chem. 
Res."},{"key":"mlstae4442bib9","first-page":"7710","type":"conference-proceedings","article-title":"Spectra of the conjugate kernel and neural tangent kernel for linear-width neural networks","volume":"vol 33","author":"Fan","year":"2020"},{"key":"mlstae4442bib10","first-page":"5850","type":"conference-proceedings","article-title":"Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel","volume":"vol 33","author":"Fort","year":"2020"},{"key":"mlstae4442bib11","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/abc4de","type":"journal-article","article-title":"Disentangling feature and lazy training in deep neural networks","author":"Geiger","year":"2020","journal-title":"J. Stat. Mech."},{"key":"mlstae4442bib12","first-page":"249","type":"conference-proceedings","article-title":"Understanding the difficulty of training deep feedforward neural networks","author":"Glorot","year":"2010"},{"key":"mlstae4442bib13","doi-asserted-by":"crossref","DOI":"10.18653\/v1\/2025.acl-long.377","type":"preprint","article-title":"Enough coin flips can make LLMs act Bayesian","author":"Gupta","year":"2025"},{"key":"mlstae4442bib14","first-page":"1026","type":"other","article-title":"Delving deep into rectifiers: surpassing human-level performance on imagenet classification","author":"He","year":"2015"},{"key":"mlstae4442bib15","article-title":"Flax: a neural network library and ecosystem for JAX","author":"Heek","year":"2024","type":"other"},{"key":"mlstae4442bib16","article-title":"Scaling laws for autoregressive generative modeling","author":"Henighan","year":"2020","type":"preprint"},{"key":"mlstae4442bib17","first-page":"30016","type":"conference-proceedings","article-title":"An empirical analysis of compute-optimal large language model training","volume":"vol 
35","author":"Hoffmann","year":"2022"},{"key":"mlstae4442bib18","first-page":"4542","type":"conference-proceedings","article-title":"Dynamics of deep neural networks and neural tangent hierarchy","author":"Huang","year":"2020"},{"key":"mlstae4442bib19","article-title":"Learning curve theory","author":"Hutter","year":"2021","type":"preprint"},{"key":"mlstae4442bib20","type":"conference-proceedings","article-title":"Neural tangent kernel: convergence and generalization in neural networks","volume":"vol 31","author":"Jacot","year":"2018"},{"key":"mlstae4442bib21","article-title":"Scaling laws for neural language models","author":"Kaplan","year":"2020","type":"preprint"},{"key":"mlstae4442bib22","author":"Kingma","year":"2017","type":"preprint"},{"key":"mlstae4442bib23","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1007\/978-3-030-61616-8_14","type":"conference-proceedings","article-title":"Neural spectrum alignment: empirical study","author":"Kopitkov","year":"2020"},{"key":"mlstae4442bib24","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac87e9","type":"journal-article","article-title":"A duality connecting neural network and cosmological dynamics","volume":"3","author":"Krippendorf","year":"2022","journal-title":"Mach. Learn.: Sci. 
Technol."},{"key":"mlstae4442bib25","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","type":"journal-article","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"mlstae4442bib26","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1007\/3-540-49430-8_2","type":"book","article-title":"Efficient backProp","author":"LeCun","year":"1998"},{"key":"mlstae4442bib27","first-page":"15156","type":"conference-proceedings","article-title":"Finite versus infinite neural networks: an empirical study","volume":"vol 33","author":"Lee","year":"2020"},{"key":"mlstae4442bib28","first-page":"15954","type":"conference-proceedings","article-title":"On the linearity of large non-linear models: when and why the tangent kernel is constant","volume":"vol 33","author":"Liu","year":"2020"},{"key":"mlstae4442bib29","first-page":"2388","type":"conference-proceedings","article-title":"Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit","author":"Mei","year":"2019"},{"key":"mlstae4442bib30","doi-asserted-by":"publisher","DOI":"10.18419\/DARUS-5717","type":"dataset","article-title":"Replication Data and Scripts for: \u201cBeyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens\u201d","author":"Nikolaou","year":"2026","unstructured":"Nikolaou K Tovey S Krippendorf S Holm C 2026 Replication Data and Scripts for: \u201cBeyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens\u201d 10.18419\/DARUS-5717"},{"key":"mlstae4442bib31","article-title":"Neural tangents: fast and easy infinite neural networks in Python","author":"Novak","year":"2019","type":"preprint"},{"key":"mlstae4442bib32","first-page":"8998","type":"conference-proceedings","article-title":"What can linearized neural networks actually say about generalization?","volume":"vol 
34","author":"Ortiz-Jimenez","year":"2021"},{"key":"mlstae4442bib33","doi-asserted-by":"crossref","DOI":"10.1093\/mnras\/stae1450","type":"preprint","article-title":"AstroCLIP: a cross-modal foundation model for galaxies","author":"Parker","year":"2024"},{"key":"mlstae4442bib34","first-page":"28843","type":"conference-proceedings","article-title":"Neural networks trained with SGD learn distributions of increasing complexity","author":"Refinetti","year":"2023"},{"key":"mlstae4442bib35","article-title":"A constructive prediction of the generalization error across scales","author":"Rosenfeld","year":"2019","type":"preprint"},{"key":"mlstae4442bib36","first-page":"606","type":"conference-proceedings","article-title":"The effective rank: a measure of effective dimensionality","author":"Roy","year":"2007"},{"key":"mlstae4442bib37","first-page":"868","type":"conference-proceedings","article-title":"Analyzing finite neural networks: can we trust neural tangent kernel theory?","author":"Seleznova","year":"2022"},{"key":"mlstae4442bib38","article-title":"A theory of neural tangent kernel alignment and its influence on training","author":"Shan","year":"2022","type":"preprint"},{"key":"mlstae4442bib39","first-page":"1","type":"journal-article","article-title":"Scaling laws from the data manifold dimension","volume":"23","author":"Sharma","year":"2022","journal-title":"J. Mach. Learn. Res."},{"key":"mlstae4442bib40","doi-asserted-by":"publisher","DOI":"10.1098\/rsos.221454","type":"journal-article","article-title":"Astronomia ex machina: a history, primer and outlook on neural networks in astronomy","volume":"10","author":"Smith","year":"2023","journal-title":"R. Soc. Open Sci."},{"key":"mlstae4442bib41","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1007\/s10994-013-5422-z","type":"journal-article","article-title":"An instance level analysis of data complexity","volume":"95","author":"Smith","year":"2014","journal-title":"Mach. 
Learn."},{"key":"mlstae4442bib42","article-title":"Gemma 2: improving open language models at a practical size","author":"Team","year":"2024","type":"preprint"},{"key":"mlstae4442bib43","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/acf099","type":"journal-article","article-title":"Towards a phenomenological understanding of neural networks: data","volume":"4","author":"Tovey","year":"2023","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstae4442bib44","article-title":"Collective variables of neural networks: empirical time evolution and scaling laws","author":"Tovey","year":"2024","type":"preprint"},{"key":"mlstae4442bib45","doi-asserted-by":"publisher","first-page":"159","DOI":"10.1146\/annurev-physchem-040215-112229","type":"journal-article","article-title":"Enhancing important fluctuations: rare events and metadynamics from a conceptual viewpoint","volume":"67","author":"Valsson","year":"2016","journal-title":"Annu. Rev. Phys. Chem."},{"key":"mlstae4442bib46","article-title":"Tensor programs II: neural tangent kernel for any architecture","author":"Yang","year":"2020","type":"preprint"},{"key":"mlstae4442bib47","article-title":"Feature learning in infinite-width neural networks","author":"Yang","year":"2022","type":"preprint"},{"key":"mlstae4442bib48","article-title":"A fine-grained spectral perspective on neural networks","author":"Yang","year":"2020","type":"preprint"},{"key":"mlstae4442bib49","first-page":"11255","type":"conference-proceedings","article-title":"Instance regularization for discriminative language model pre-training","author":"Zhang","year":"2022"}],"container-title":["Machine Learning: Science and 
Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,23]],"date-time":"2026-02-23T08:46:38Z","timestamp":1771836398000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ae4442"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,2,23]]},"references-count":49,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2,23]]},"published-print":{"date-parts":[[2026,4,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ae4442","relation":{},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"ele
ctronic"}],"subject":[],"published":{"date-parts":[[2026,2,23]]},"assertion":[{"value":"Beyond scaling curves: internal dynamics of neural networks through the NTK lens","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2026 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2025-08-24","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2026-02-10","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2026-02-23","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}