{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,10]],"date-time":"2025-06-10T09:40:09Z","timestamp":1749548409974,"version":"3.41.0"},"reference-count":46,"publisher":"IOP Publishing","issue":"2","license":[{"start":{"date-parts":[[2025,6,10]],"date-time":"2025-06-10T00:00:00Z","timestamp":1749513600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,6,10]],"date-time":"2025-06-10T00:00:00Z","timestamp":1749513600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>While deep learning has expanded the possibilities for highly expressive variational families, the practical benefits of these tools for variational inference (VI) are often limited by the minimization of the traditional Kullback\u2013Leibler objective, which can yield suboptimal solutions. A major challenge in this context is <jats:italic>mode collapse<\/jats:italic>: the phenomenon where a model concentrates on a few modes of the target distribution during training, despite being statistically capable of expressing them all. In this work, we carry out a theoretical investigation of mode collapse for the gradient flow on Gaussian mixture models. We identify the key low-dimensional statistics characterizing the flow, and derive a closed set of low-dimensional equations governing their evolution. Leveraging this compact description, we show that mode collapse is present even in statistically favorable scenarios, and identify two key mechanisms driving it: mean alignment and vanishing weight. Our theoretical findings are consistent with the implementation of VI using normalizing flows, a class of popular generative models, thereby offering practical insights.<\/jats:p>","DOI":"10.1088\/2632-2153\/adde2a","type":"journal-article","created":{"date-parts":[[2025,5,28]],"date-time":"2025-05-28T22:58:28Z","timestamp":1748473108000},"page":"025056","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A theoretical perspective on mode collapse in variational inference"],"prefix":"10.1088","volume":"6","author":[{"given":"Roman","family":"Soletskyi","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5989-1018","authenticated-orcid":true,"given":"Marylou","family":"Gabri\u00e9","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6327-4688","authenticated-orcid":true,"given":"Bruno","family":"Loureiro","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2025,6,10]]},"reference":[{"key":"mlstadde2abib1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.100.034515","article-title":"Flow-based generative models for Markov chain Monte Carlo in lattice field theory","volume":"100","author":"Albergo","year":"2019","journal-title":"Phys. Rev. D"},
{"key":"mlstadde2abib2","first-page":"pp 1730","article-title":"Online learning and information exponents: the importance of batch size & time\/complexity tradeoffs","author":"Arnaboldi","year":"2024"},{"article-title":"Escaping mediocrity: how two-layer networks learn hard generalized linear models with SGD","year":"2023a","author":"Arnaboldi","key":"mlstadde2abib3"},{"key":"mlstadde2abib4","first-page":"pp 1199","article-title":"From high-dimensional & mean-field dynamics to dimensionless ODEs: a unifying approach to SGD in two-layers networks","author":"Arnaboldi","year":"2023b"},{"article-title":"High-dimensional SGD aligns with emerging outlier eigenspaces","year":"2023","author":"Ben Arous","key":"mlstadde2abib5"},{"key":"mlstadde2abib6","first-page":"pp 25349","article-title":"High-dimensional limit theorems for SGD: effective dynamics and critical scaling","volume":"vol 35","author":"Ben Arous","year":"2022"},{"article-title":"Beyond ELBOs: a large-scale evaluation of variational methods for sampling","year":"2024","author":"Blessing","key":"mlstadde2abib7"},{"key":"mlstadde2abib8","doi-asserted-by":"publisher","first-page":"4218","DOI":"10.1109\/TIT.2024.3374716","article-title":"Local minima structures in Gaussian mixture models","volume":"70","author":"Chen","year":"2024","journal-title":"IEEE Trans. Inf. Theory"},{"article-title":"Hitting the high-dimensional notes: an ODE for SGD learning dynamics on GLMs and multi-index models","year":"2023","author":"Collins-Woodfin","key":"mlstadde2abib9"},{"article-title":"Density estimation using real NVP","year":"2017","author":"Dinh","key":"mlstadde2abib10"},{"key":"mlstadde2abib11","doi-asserted-by":"publisher","first-page":"1208","DOI":"10.1002\/cpa.22032","article-title":"Likelihood landscape and maximum likelihood estimation for the discrete orbit recovery model","volume":"76","author":"Fan","year":"2023","journal-title":"Commun. Pure Appl. Math."},
{"key":"mlstadde2abib12","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.2109420119","article-title":"Adaptive Monte Carlo augmented with normalizing flows","volume":"119","author":"Gabri\u00e9","year":"2022","journal-title":"Proc. Natl Acad. Sci."},{"key":"mlstadde2abib13","article-title":"Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup","volume":"vol 32","author":"Goldt","year":"2019"},{"article-title":"Flow-based sampling for multimodal distributions in lattice field theory","year":"2021","author":"Hackett","key":"mlstadde2abib14"},{"key":"mlstadde2abib15","first-page":"pp 20700","article-title":"Theoretical guarantees for variational inference with fixed-variance mixture of Gaussians","author":"Huix","year":"2024"},{"article-title":"Bias in motion: theoretical insights into the dynamics of bias in SGD training","year":"2024","author":"Jain","key":"mlstadde2abib16"},{"key":"mlstadde2abib17","first-page":"pp 1819","article-title":"Variational refinement for importance sampling using the forward Kullback-Leibler divergence","author":"Jerfel","year":"2021"},{"key":"mlstadde2abib18","article-title":"Local maxima in the likelihood of Gaussian mixture models: structural results and algorithmic consequences","volume":"vol 29","author":"Jin","year":"2016"},{"key":"mlstadde2abib19","doi-asserted-by":"publisher","first-page":"6243","DOI":"10.1088\/0305-4470\/25\/23\/020","article-title":"Optimal generalization in perceptrons","volume":"25","author":"Kinouchi","year":"1992","journal-title":"J. Phys. A: Math. Gen."},{"key":"mlstadde2abib20","doi-asserted-by":"publisher","first-page":"3964","DOI":"10.1109\/TPAMI.2020.2992934","article-title":"Normalizing flows: an introduction and review of current methods","volume":"43","author":"Kobyzev","year":"2021","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},
{"volume":"vol 13","year":"2006","author":"Krauth","key":"mlstadde2abib21"},{"key":"mlstadde2abib22","first-page":"pp 29","article-title":"The neural autoregressive distribution estimator","author":"Larochelle","year":"2011"},{"key":"mlstadde2abib23","first-page":"pp 8321","article-title":"Mode-seeking divergences: theory and applications to GANs","author":"Li","year":"2023"},{"key":"mlstadde2abib24","first-page":"pp 2062","article-title":"On convergence in Wasserstein distance and f-divergence minimization problems","author":"Li","year":"2024"},{"year":"2002","author":"MacKay","key":"mlstadde2abib25"},{"article-title":"Flow annealed importance sampling bootstrap","year":"2022","author":"Midgley","key":"mlstadde2abib26"},{"article-title":"Variational boosting: iteratively refining posterior approximations","year":"2017","author":"Miller","key":"mlstadde2abib27"},{"article-title":"Optimal protocols for continual learning via statistical physics and control theory","year":"2024","author":"Mori","key":"mlstadde2abib28"},{"key":"mlstadde2abib29","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.108.114501","article-title":"Detecting and mitigating mode-collapse for flow-based sampling of lattice field theories","volume":"108","author":"Nicoli","year":"2023","journal-title":"Phys. Rev. D"},{"key":"mlstadde2abib30","doi-asserted-by":"publisher","first-page":"338","DOI":"10.22323\/1.396.0338","article-title":"Machine learning of thermodynamic observables in the presence of mode collapse","volume":"396","author":"Nicoli","year":"2022","journal-title":"Proc. Sci."},
{"key":"mlstadde2abib31","doi-asserted-by":"publisher","first-page":"eaaw1147","DOI":"10.1126\/science.aaw1147","article-title":"Boltzmann generators: sampling equilibrium states of many-body systems with deep learning","volume":"365","author":"No\u00e9","year":"2019","journal-title":"Science"},{"key":"mlstadde2abib32","first-page":"1","article-title":"Normalizing flows for probabilistic modeling and inference","volume":"22","author":"Papamakarios","year":"2021","journal-title":"J. Mach. Learn. Res."},{"article-title":"The RL perceptron: generalisation dynamics of policy learning in high dimensions","year":"2023","author":"Patel","key":"mlstadde2abib33"},{"key":"mlstadde2abib34","first-page":"pp 8936","article-title":"Classifying high-dimensional Gaussian mixtures: where kernel methods fail and neural networks succeed","author":"Refinetti","year":"2021"},{"article-title":"Alpha-beta divergence for variational inference","year":"2018","author":"Regli","key":"mlstadde2abib35"},{"key":"mlstadde2abib36","first-page":"pp 1530","article-title":"Variational inference with normalizing flows","author":"Rezende","year":"2015"},{"key":"mlstadde2abib37","doi-asserted-by":"publisher","first-page":"4225","DOI":"10.1103\/PhysRevE.52.4225","article-title":"On-line learning in soft committee machines","volume":"52","author":"Saad","year":"1995","journal-title":"Phys. Rev. E"},{"key":"mlstadde2abib38","article-title":"Dynamics of on-line gradient descent learning for multilayer neural networks","volume":"vol 8","author":"Saad","year":"1996"},{"key":"mlstadde2abib39","first-page":"pp 628","article-title":"Are there local maxima in the infinite-sample likelihood of Gaussian mixture estimation?","author":"Srebro","year":"2007"},{"key":"mlstadde2abib40","doi-asserted-by":"publisher","first-page":"217","DOI":"10.4310\/CMS.2010.v8.n1.a11","article-title":"Density estimation by dual ascent of the log-likelihood","volume":"8","author":"Tabak","year":"2010","journal-title":"Commun. Math. Sci."},
{"key":"mlstadde2abib41","first-page":"1235","article-title":"Energy-based models for sparse overcomplete representations","volume":"4","author":"Teh","year":"2003","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadde2abib42","first-page":"1","article-title":"Neural autoregressive distribution estimation","volume":"17","author":"Uria","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"mlstadde2abib43","doi-asserted-by":"publisher","DOI":"10.1088\/2632-2153\/ac9455","article-title":"Gradients should stay on path: better estimators of the reverse- and forward KL divergence for normalizing flows","volume":"3","author":"Vaitl","year":"2022","journal-title":"Mach. Learn.: Sci. Technol."},{"key":"mlstadde2abib44","first-page":"pp 23244","article-title":"Phase diagram of stochastic gradient descent in high-dimensional two-layer neural networks","volume":"vol 35","author":"Veiga","year":"2022"},{"key":"mlstadde2abib45","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.122.080602","article-title":"Solving statistical mechanics using variational autoregressive networks","volume":"122","author":"Wu","year":"2019","journal-title":"Phys. Rev. Lett."},
{"key":"mlstadde2abib46","article-title":"Global analysis of expectation maximization for mixtures of two Gaussians","volume":"vol 29","author":"Xu","year":"2016"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,10]],"date-time":"2025-06-10T09:03:32Z","timestamp":1749546212000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/adde2a"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,10]]},"references-count":46,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,6,10]]},"published-print":{"date-parts":[[2025,6,30]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/adde2a","relation":{},"ISSN":["2632-2153"],"issn-type":[{"type":"electronic","value":"2632-2153"}],"subject":[],"published":{"date-parts":[[2025,6,10]]},"assertion":[{"value":"A theoretical perspective on mode collapse in variational inference","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2025 The Author(s). Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2025-02-17","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-05-28","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2025-06-10","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}