{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T00:07:18Z","timestamp":1755216438040,"version":"3.43.0"},"reference-count":24,"publisher":"IOP Publishing","issue":"3","license":[{"start":{"date-parts":[[2025,8,4]],"date-time":"2025-08-04T00:00:00Z","timestamp":1754265600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,8,4]],"date-time":"2025-08-04T00:00:00Z","timestamp":1754265600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"DOI":"10.13039\/501100000271","name":"Science and Technology Facilities Council","doi-asserted-by":"crossref","award":["ST\/T000694\/1"],"award-info":[{"award-number":["ST\/T000694\/1"]}],"id":[{"id":"10.13039\/501100000271","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001659","name":"Deutsche Forschungsgemeinschaft","doi-asserted-by":"crossref","award":["492175459"],"award-info":[{"award-number":["492175459"]}],"id":[{"id":"10.13039\/501100001659","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This work presents a novel framework for understanding learning dynamics and scaling relations in neural networks. We show that certain measures on the spectrum of the empirical neural tangent kernel (NTK), specifically entropy and trace, provide insight into the representations learned by a neural network and how these can be improved through architecture scaling. These results are demonstrated first on test cases before being applied to more complex networks, including transformers, auto-encoders, graph neural networks, and reinforcement learning studies. In testing on a wide range of architectures, we highlight the universal nature of training dynamics and further discuss how it can be used to understand the mechanisms behind learning in neural networks. We identify two such dominant mechanisms present throughout machine learning training. The first, information compression, is seen through a reduction in the entropy of the NTK spectrum during training, and occurs predominantly in small neural networks. The second, coined structure formation, is seen through an increasing entropy and thus, the creation of structure in the neural network representations beyond the prior established by the network at initialization. Due to the ubiquity of the latter in deep neural network architectures and its flexibility in the creation of feature-rich representations, we argue that this network entropy evolution be considered the onset of a deep learning regime.<\/jats:p>","DOI":"10.1088\/2632-2153\/adee76","type":"journal-article","created":{"date-parts":[[2025,7,10]],"date-time":"2025-07-10T22:56:38Z","timestamp":1752188198000},"page":"035021","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Collective variables of neural networks: empirical time evolution and scaling laws"],"prefix":"10.1088","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9537-8361","authenticated-orcid":true,"given":"Samuel","family":"Tovey","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6374-6828","authenticated-orcid":true,"given":"Sven","family":"Krippendorf","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8362-0576","authenticated-orcid":false,"given":"Michael","family":"Spannowsky","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9926-680X","authenticated-orcid":true,"given":"Konstantin","family":"Nikolaou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2739-310X","authenticated-orcid":true,"given":"Christian","family":"Holm","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,8,4]]},"reference":[{"article-title":"Neural networks as kernel learners: the silent alignment effect","year":"2021","author":"Atanasov","key":"mlstadee76bib1"},{"key":"mlstadee76bib2","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1073\/pnas.2311878121","article-title":"Explaining neural scaling laws","volume":"121","author":"Bahri","year":"2024","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstadee76bib3","first-page":"pp 79","article-title":"The ancient Greek and Latin dependency treebanks","author":"Bamman","year":"2011"},{"key":"mlstadee76bib4","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1613\/jair.3912","article-title":"The arcade learning environment: an evaluation platform for general agents","volume":"47","author":"Bellemare","year":"2013","journal-title":"J. Artif. Intell. Res."},{"article-title":"JAX: composable transformations of Python+NumPy programs","year":"2018","author":"Bradbury","key":"mlstadee76bib5"},{"key":"mlstadee76bib6","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1007\/s10579-017-9388-5","article-title":"The PROIEL treebank family: a standard for early attestations of Indo-European languages","volume":"52","author":"Eckhoff","year":"2018","journal-title":"Lang. Resour. Eval."},{"key":"mlstadee76bib7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.physrep.2021.04.001","article-title":"Landscape and training regimes in deep learning","volume":"924","author":"Geiger","year":"2021","journal-title":"Phys. Rep."},{"article-title":"Flax: a neural network library and ecosystem for JAX","year":"2023","author":"Heek","key":"mlstadee76bib8"},{"article-title":"Neural tangent kernel: convergence and generalization in neural networks","year":"2018","author":"Jacot","key":"mlstadee76bib9"},{"article-title":"Scaling laws for neural language models","year":"2020","author":"Kaplan","key":"mlstadee76bib10"},{"key":"mlstadee76bib11","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac87e9","article-title":"A duality connecting neural network and cosmological dynamics","volume":"3","author":"Krippendorf","year":"2022","journal-title":"Mach. Learn. Sci. Technol."},{"article-title":"Learning multiple layers of features from tiny images","year":"2009","author":"Krizhevsky","key":"mlstadee76bib12"},{"article-title":"The two regimes of deep network training","year":"2020","author":"Leclerc","key":"mlstadee76bib13"},{"key":"mlstadee76bib14","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","article-title":"Gradient-based learning applied to document recognition","volume":"86","author":"Lecun","year":"1998"},{"article-title":"The large learning rate phase of deep learning: the catapult mechanism","year":"2020","author":"Lewkowycz","key":"mlstadee76bib15"},{"article-title":"Understanding the role of training regimes in continual learning","year":"2020","author":"Mirzadeh","key":"mlstadee76bib16"},{"key":"mlstadee76bib17","first-page":"pp 236","article-title":"Combining instance-based and model-based learning","author":"Quinlan","year":"1993"},{"article-title":"High-dimensional continuous control using generalized advantage estimation","year":"2018","author":"Schulman","key":"mlstadee76bib18"},{"key":"mlstadee76bib19","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791","article-title":"Towards a phenomenological understanding of neural networks: data","volume":"4","author":"Tovey","year":"2023","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadee76bib20","doi-asserted-by":"publisher","DOI":"10.18419\/DARUS-5175","article-title":"Replication Scripts for: \u201cCollective variables of neural networks: empirical time evolution and scaling laws\u201d","author":"Tovey","year":"2025"},{"article-title":"Open graph benchmark: datasets for machine learning on graphs","year":"2021","author":"Weihua","key":"mlstadee76bib21"},{"article-title":"Feature learning in infinite-width neural networks","year":"2022","author":"Yang","key":"mlstadee76bib22"},{"article-title":"Tensor programs V: tuning large neural networks via zero-shot hyperparameter transfer","year":"2022","author":"Yang","key":"mlstadee76bib23"},{"article-title":"Mean field residual networks: on the edge of chaos","year":"2017","author":"Yang","key":"mlstadee76bib24"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,4]],"date-time":"2025-08-04T10:06:12Z","timestamp":1754301972000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adee76"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,4]]},"references-count":24,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,8,4]]},"published-print":{"date-parts":[[2025,9,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/adee76","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2025,8,4]]},"assertion":[{"value":"Collective variables of neural networks: empirical time evolution and scaling laws","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2025-03-30","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-07-10","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-08-04","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}