{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:40:25Z","timestamp":1760143225348,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,1,25]],"date-time":"2024-01-25T00:00:00Z","timestamp":1706140800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Institutes of Health BRAIN Initiative","award":["R01EB026943"],"award-info":[{"award-number":["R01EB026943"]}]},{"name":"ITS-Simons Foundation fellowship","award":["R01EB026943"],"award-info":[{"award-number":["R01EB026943"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>We have formulated a family of machine learning problems as the time evolution of parametric probabilistic models (PPMs), inherently rendering a thermodynamic process. Our primary motivation is to leverage the rich toolbox of thermodynamics of information to assess the information-theoretic content of learning a probabilistic model. We first introduce two information-theoretic metrics, memorized information (M-info) and learned information (L-info), which trace the flow of information during the learning process of PPMs. 
Then, we demonstrate that the accumulation of L-info during the learning process is associated with entropy production, and the parameters serve as a heat reservoir in this process, capturing learned information in the form of M-info.<\/jats:p>","DOI":"10.3390\/e26020112","type":"journal-article","created":{"date-parts":[[2024,1,26]],"date-time":"2024-01-26T04:05:46Z","timestamp":1706241946000},"page":"112","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Stochastic Thermodynamics of Learning Parametric Probabilistic Models"],"prefix":"10.3390","volume":"26","author":[{"given":"Shervin S.","family":"Parsi","sequence":"first","affiliation":[{"name":"Physics Program, The Graduate Center, City University of New York, New York, NY 10016, USA"},{"name":"Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, New York, NY 10016, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,25]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1147\/rd.53.0183","article-title":"Irreversibility and heat generation in the computing process","volume":"5","author":"Landauer","year":"1961","journal-title":"IBM J. Res. Dev."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1007\/BF01341281","article-title":"On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings","volume":"53","author":"Szilard","year":"1929","journal-title":"Z. Phys."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1007\/BF02084158","article-title":"The thermodynamics of computation\u2014A review","volume":"21","author":"Bennett","year":"1982","journal-title":"Int. J. Theor. Phys."},{"key":"ref_4","unstructured":"Nielsen, M.A., and Chuang, I.L. (2010). 
Quantum Computation and Quantum Information: 10th Anniversary Edition, Cambridge University Press."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"035002","DOI":"10.1103\/RevModPhys.93.035002","article-title":"The entropy of Hawking radiation","volume":"93","author":"Almheiri","year":"2021","journal-title":"Rev. Mod. Phys."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1038\/nphys3230","article-title":"Thermodynamics of information","volume":"11","author":"Parrondo","year":"2015","journal-title":"Nat. Phys."},{"key":"ref_7","unstructured":"Peliti, L., and Pigolotti, S. (2021). Stochastic Thermodynamics: An Introduction, Princeton University Press."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"120604","DOI":"10.1103\/PhysRevLett.109.120604","article-title":"Thermodynamics of prediction","volume":"109","author":"Still","year":"2012","journal-title":"Phys. Rev. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"180602","DOI":"10.1103\/PhysRevLett.109.180602","article-title":"Fluctuation theorem with information exchange: Role of correlations in stochastic thermodynamics","volume":"109","author":"Sagawa","year":"2012","journal-title":"Phys. Rev. Lett."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"013013","DOI":"10.1088\/1367-2630\/12\/1\/013013","article-title":"Entropy production as correlation between system and reservoir","volume":"12","author":"Esposito","year":"2010","journal-title":"New J. Phys."},{"key":"ref_11","unstructured":"Song, Y., and Kingma, D.P. (2021). How to train your energy-based models. arXiv."},{"key":"ref_12","unstructured":"Kingma, D.P., and Welling, M. (2013). Auto-encoding variational bayes. arXiv."},{"key":"ref_13","unstructured":"Jeon, H.J., Zhu, Y., and Roy, B.V. (2022). An information-theoretic framework for supervised learning. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Yi, J., Zhang, Q., Chen, Z., Liu, Q., and Shao, W. (2022). 
Mutual information learned classifiers: An information-theoretic viewpoint of training deep learning classification systems. arXiv.","DOI":"10.1155\/2022\/2376888"},{"key":"ref_15","unstructured":"Shwartz-Ziv, R., and LeCun, Y. (2023). To compress or not to compress- self-supervised learning and information theory: A review. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yu, S., Giraldo, L.G.S., and Pr\u00edncipe, J.C. (2021, January 19\u201327). Information-theoretic methods in deep neural networks: Recent advances and emerging opportunities. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Montreal, QC, Canada.","DOI":"10.24963\/ijcai.2021\/633"},{"key":"ref_17","unstructured":"Geiger, B.C. (2021). On information plane analyses of neural network classifiers\u2014A review. arXiv."},{"key":"ref_18","unstructured":"Achille, A., Paolini, G., and Soatto, S. (2019). Where is the information in a deep neural network?. arXiv."},{"key":"ref_19","unstructured":"Shwartz-Ziv, R., and Tishby, N. (2017). Opening the black box of deep neural networks via information. arXiv."},{"key":"ref_20","first-page":"124020","article-title":"On the information bottleneck theory of deep learning","volume":"2018","author":"Andrew","year":"2018","journal-title":"J. Stat. Mech. Theory Exp."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Hinton, G.E., and Camp, D.V. (1993, January 26\u201328). Keeping the neural networks simple by minimizing the description length of the weights. Proceedings of the Sixth Annual Conference on Computational Learning Theory, Santa Cruz, CA, USA.","DOI":"10.1145\/168304.168306"},{"key":"ref_22","first-page":"1947","article-title":"Emergence of invariance and disentanglement in deep representations","volume":"19","author":"Achille","year":"2018","journal-title":"J. Mach. Learn. Res."},{"key":"ref_23","unstructured":"Cover, T.M., and Thomas, J.A. (1991). 
Elements of Information Theory, John Wiley and Sons."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1080","DOI":"10.1214\/aos\/1176350051","article-title":"Stochastic complexity and modeling","volume":"14","author":"Rissanen","year":"1986","journal-title":"Ann. Stat."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1109\/JSAIT.2020.2991139","article-title":"Tightening mutual information-based bounds on generalization error","volume":"1","author":"Bu","year":"2020","journal-title":"IEEE J. Sel. Areas Inf. Theory"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1016\/j.physa.2014.04.035","article-title":"Ensemble and trajectory thermodynamics: A brief introduction","volume":"418","author":"Esposito","year":"2015","journal-title":"Phys. A Stat. Mech. Its Appl."},{"key":"ref_27","unstructured":"Du, S.S., Zhai, X., Poczos, B., and Singh, A. (2018). Gradient descent provably optimizes over-parameterized neural networks. arXiv."},{"key":"ref_28","unstructured":"Li, Y., and Liang, Y. (2018). Learning overparameterized neural networks via stochastic gradient descent on structured data. Adv. Neural Inf. Process. Syst., 31."},{"key":"ref_29","first-page":"041003","article-title":"Information processing and the second law of thermodynamics: An inclusive, hamiltonian approach","volume":"3","author":"Deffner","year":"2013","journal-title":"Phys. Rev. X"},{"key":"ref_30","unstructured":"Du, Y., and Mordatch, I. (2019). Implicit generation and generalization in energy-based models. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Maes, C. (2021). Local Detailed Balance, SciPost Physics Lecture Notes; SciPost.","DOI":"10.21468\/SciPostPhysLectNotes.32"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Rao, R., and Esposito, M. (2018). Detailed fluctuation theorems: A unifying perspective. 
Entropy, 20.","DOI":"10.3390\/e20090635"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zwanzig, R. (2001). Nonequilibrium Statistical Mechanics, Oxford University Press.","DOI":"10.1093\/oso\/9780195140187.001.0001"},{"key":"ref_34","unstructured":"Wei, M., and Schwab, D.J. (2019). How noise affects the hessian spectrum in overparameterized neural networks. arXiv."},{"key":"ref_35","unstructured":"Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. Adv. Neural Inf. Process. Syst., 31."},{"key":"ref_36","unstructured":"K\u00fchn, M., and Rosenow, B. (2023). Correlated noise in epoch-based stochastic gradient descent: Implications for weight variances. arXiv."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"020601","DOI":"10.1103\/PhysRevLett.102.020601","article-title":"Langevin equation with colored noise for constant-temperature molecular dynamics simulations","volume":"102","author":"Ceriotti","year":"2009","journal-title":"Phys. Rev. Lett."},{"key":"ref_38","unstructured":"Ziyin, L., Li, H., and Ueda, M. (2023). Law of balance and stationary distribution of stochastic gradient descent. arXiv."},{"key":"ref_39","unstructured":"Adhikari, S., Kabak\u00e7\u0131o\u011flu, A., Strang, A., Yuret, D., and Hinczewski, M. (2023). Machine learning in and out of equilibrium. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"P03025","DOI":"10.1088\/1742-5468\/2014\/03\/P03025","article-title":"Thermodynamic and logical reversibilities revisited","volume":"2014","author":"Sagawa","year":"2014","journal-title":"J. Stat. Mech. 
Theory Exp."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/2\/112\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:49:32Z","timestamp":1760104172000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/26\/2\/112"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,25]]},"references-count":40,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,2]]}},"alternative-id":["e26020112"],"URL":"https:\/\/doi.org\/10.3390\/e26020112","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2024,1,25]]}}}