{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T23:59:15Z","timestamp":1740182355071,"version":"3.37.3"},"reference-count":43,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2021,7,15]],"date-time":"2021-07-15T00:00:00Z","timestamp":1626307200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,15]],"date-time":"2021-07-15T00:00:00Z","timestamp":1626307200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2021,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>This paper employs a formal connection of machine learning with thermodynamics to characterize the quality of learned representations for transfer learning. We discuss how information-theoretic functionals such as rate, distortion and classification loss of a model lie on a convex, so-called, equilibrium surface. We prescribe dynamical processes to traverse this surface under specific constraints; in particular we develop an iso-classification process that trades off rate and distortion to keep the classification loss unchanged. We demonstrate how this process can be used for transferring representations from a source task to a target task while keeping the classification loss constant. Experimental validation of the theoretical results is provided on image-classification datasets.<\/jats:p>","DOI":"10.1088\/2632-2153\/abf984","type":"journal-article","created":{"date-parts":[[2021,4,24]],"date-time":"2021-04-24T00:37:20Z","timestamp":1619224640000},"page":"045004","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["A free-energy principle for representation learning"],"prefix":"10.1088","volume":"2","author":[{"given":"Yansong","family":"Gao","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4590-1956","authenticated-orcid":false,"given":"Pratik","family":"Chaudhari","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2021,7,15]]},"reference":[{"article-title":"On the emergence of invariance and disentangling in deep representations","year":"2017","author":"Achille","key":"mlstabf984bib1"},{"article-title":"TherML: thermodynamics of machine learning","year":"2018","author":"Alemi","key":"mlstabf984bib2"},{"article-title":"Deep variational information bottleneck","year":"2016","author":"Alemi","key":"mlstabf984bib3"},{"article-title":"Fixing a broken ELBO","year":"2017","author":"Alemi","key":"mlstabf984bib4"},{"key":"mlstabf984bib5","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1613\/jair.731","article-title":"A model of inductive bias learning","volume":"12","author":"Baxter","year":"2000","journal-title":"J. Artif. Intell. Res."},{"key":"mlstabf984bib6","doi-asserted-by":"publisher","first-page":"151","DOI":"10.1007\/s10994-009-5152-4","article-title":"A theory of learning from different domains","volume":"79","author":"Ben-David","year":"2010","journal-title":"Mach. Learn."},{"key":"mlstabf984bib7","first-page":"pp 3884","article-title":"Exact rate-distortion in autoencoders via echo noise","author":"Brekelmans","year":"2019"},{"key":"mlstabf984bib8","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/ab39d9","article-title":"Entropy-sgd: biasing gradient descent into wide valleys","volume":"2019","author":"Chaudhari","year":"2019","journal-title":"J. Stat. Mech.: Theory Exp."},{"key":"mlstabf984bib9","doi-asserted-by":"crossref","DOI":"10.1109\/ITA.2018.8503224","article-title":"Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks","author":"Chaudhari","year":"2018"},{"key":"mlstabf984bib10","first-page":"pp 2292","article-title":"Sinkhorn distances: lightspeed computation of optimal transport","author":"Cuturi","year":"2013"},{"key":"mlstabf984bib11","first-page":"pp 2051","article-title":"Multi-task self-supervised visual learning","author":"Doersch","year":"2017"},{"key":"mlstabf984bib12","first-page":"pp 1716","article-title":"Wasserstein of Wasserstein loss for learning generative models","author":"Dukler","year":"2019"},{"article-title":"Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data","year":"2017","author":"Dziugaite","key":"mlstabf984bib13"},{"key":"mlstabf984bib14","first-page":"pp 3367","article-title":"A free-energy principle for representation learning","author":"Gao","year":"2020a"},{"article-title":"An information-geometric distance on the space of tasks","year":"2020b","author":"Gao","key":"mlstabf984bib15"},{"key":"mlstabf984bib16","first-page":"pp 580","article-title":"Rich feature hierarchies for accurate object detection and semantic segmentation","author":"Girshick","year":"2014"},{"key":"mlstabf984bib17","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-319-46493-0_38","article-title":"Identity mappings in deep residual networks","author":"He","year":"2016"},{"article-title":"Beta-VAE: learning basic visual concepts with a constrained variational framework","year":"2017","author":"Higgins","key":"mlstabf984bib18"},{"article-title":"Batch normalization: accelerating deep network training by reducing internal covariate shift","year":"2015","author":"Ioffe","key":"mlstabf984bib19"},{"key":"mlstabf984bib20","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1137\/S0036141096303359","article-title":"The variational formulation of the Fokker\u2013Planck equation","volume":"29","author":"Jordan","year":"1998","journal-title":"SIAM J. Math. Anal."},{"key":"mlstabf984bib21","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1007\/s10955-017-1805-z","article-title":"Acceleration of convergence to equilibrium in Markov chains by breaking detailed balance","volume":"168","author":"Kaiser","year":"2017","journal-title":"J. Stat. Phys."},{"article-title":"Auto-encoding variational Bayes","year":"2014","author":"Kingma","key":"mlstabf984bib22"},{"article-title":"Adam: a method for stochastic optimization","year":"2015","author":"Kingma","key":"mlstabf984bib23"},{"key":"mlstabf984bib24","doi-asserted-by":"publisher","first-page":"3521","DOI":"10.1073\/pnas.1611835114","article-title":"Overcoming catastrophic forgetting in neural networks","volume":"114","author":"Kirkpatrick","year":"2017","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstabf984bib25","doi-asserted-by":"publisher","first-page":"1181","DOI":"10.3390\/e21121181","article-title":"Nonlinear information bottleneck","volume":"21","author":"Kolchinsky","year":"2019","journal-title":"Entropy"},{"article-title":"Learning multiple layers of features from tiny images","year":"2009","author":"Krizhevsky","key":"mlstabf984bib26"},{"key":"mlstabf984bib27","first-page":"pp 396","article-title":"Handwritten digit recognition with a back-propagation network","author":"LeCun","year":"1990"},{"article-title":"A PAC-Bayesian tutorial with a dropout bound","year":"2013","author":"McAllester","key":"mlstabf984bib28"},{"year":"2009","author":"Mezard","key":"mlstabf984bib29"},{"key":"mlstabf984bib30","first-page":"pp 1520","article-title":"Learning deconvolution network for semantic segmentation","author":"Noh","year":"2015"},{"key":"mlstabf984bib31","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1162\/neco.1994.6.1.147","article-title":"Fast exact multiplication by the Hessian","volume":"6","author":"Pearlmutter","year":"1994","journal-title":"Neural Comput."},{"key":"mlstabf984bib32","doi-asserted-by":"crossref","DOI":"10.1561\/9781680835519","article-title":"Computational optimal transport","author":"Peyr\u00e9","year":"2019"},{"article-title":"The mutual autoencoder: controlling information in latent code representations","year":"2018","author":"Phuong","key":"mlstabf984bib33"},{"key":"mlstabf984bib34","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1007\/978-1-4612-0919-5_16","author":"Rao","year":"1945"},{"key":"mlstabf984bib35","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"mlstabf984bib36","first-page":"p 94","article-title":"Optimal Transport for Applied Mathematicians","volume":"vol 55","author":"Santambrogio","year":"2015"},{"volume":"vol 14","year":"2006","author":"Sethna","key":"mlstabf984bib37"},{"key":"mlstabf984bib38","first-page":"pp 806","article-title":"CNN features off-the-shelf: an astounding baseline for recognition","author":"Sharif Razavian","year":"2014"},{"key":"mlstabf984bib39","first-page":"pp 368","article-title":"The information bottleneck method","author":"Tishby","year":"1999"},{"article-title":"The information bottleneck method","year":"2000","author":"Tishby","key":"mlstabf984bib40"},{"key":"mlstabf984bib41","first-page":"pp 1004","article-title":"Maximally informative hierarchical representations of high-dimensional data","author":"Ver Steeg","year":"2015"},{"volume":"vol 338","year":"2009","author":"Villani","key":"mlstabf984bib42"},{"key":"mlstabf984bib43","first-page":"pp 3712","article-title":"Taskonomy: disentangling task transfer learning","author":"Zamir","year":"2018"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,13]],"date-time":"2021-12-13T16:11:11Z","timestamp":1639411871000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/abf984"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,15]]},"references-count":43,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2021,7,15]]},"published-print":{"date-parts":[[2021,12,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/abf984","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2021,7,15]]},"assertion":[{"value":"A free-energy principle for representation learning","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2021 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2020-12-31","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2021-04-19","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2021-07-15","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}