{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:31:57Z","timestamp":1772137917923,"version":"3.50.1"},"reference-count":45,"publisher":"IOP Publishing","issue":"4","license":[{"start":{"date-parts":[[2022,10,19]],"date-time":"2022-10-19T00:00:00Z","timestamp":1666137600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,10,19]],"date-time":"2022-10-19T00:00:00Z","timestamp":1666137600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/iopscience.iop.org\/info\/page\/text-and-data-mining"}],"funder":[{"name":"Berlin Institute for the Foundations of Learning and Data","award":["01IS18025A"],"award-info":[{"award-number":["01IS18025A"]}]}],"content-domain":{"domain":["iopscience.iop.org"],"crossmark-restriction":false},"short-container-title":["Mach. Learn.: Sci. Technol."],"published-print":{"date-parts":[[2022,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>We show how to use the path-wise derivative estimator for both the forward and reverse Kullback\u2013Leibler divergence for any practically invertible normalizing flow. The resulting path-gradient estimators are straightforward to implement, have lower variance, and lead not only to faster convergence of training but also to better overall approximation results compared to standard total gradient estimators. We also demonstrate that path-gradient training is less susceptible to mode-collapse. 
In light of our results, we expect that path-gradient estimators will become the new standard method to train normalizing flows for variational inference.<\/jats:p>","DOI":"10.1088\/2632-2153\/ac9455","type":"journal-article","created":{"date-parts":[[2022,9,22]],"date-time":"2022-09-22T18:46:16Z","timestamp":1663872376000},"page":"045006","update-policy":"https:\/\/doi.org\/10.1088\/crossmark-policy","source":"Crossref","is-referenced-by-count":9,"title":["Gradients should stay on path: better estimators of the reverse- and forward KL divergence for normalizing flows"],"prefix":"10.1088","volume":"3","author":[{"given":"Lorenz","family":"Vaitl","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5933-1822","authenticated-orcid":true,"given":"Kim A","family":"Nicoli","sequence":"additional","affiliation":[]},{"given":"Shinichi","family":"Nakajima","sequence":"additional","affiliation":[]},{"given":"Pan","family":"Kessel","sequence":"additional","affiliation":[]}],"member":"266","published-online":{"date-parts":[[2022,10,19]]},"reference":[{"key":"mlstac9455bib1","doi-asserted-by":"publisher","first-page":"eaaw1147","DOI":"10.1126\/science.aaw1147","article-title":"Boltzmann generators: sampling equilibrium states of many-body systems with deep learning","volume":"365","author":"No\u00e9","year":"2019","journal-title":"Science"},{"key":"mlstac9455bib2","article-title":"Stochastic normalizing flows","author":"Hao","year":"2020"},{"key":"mlstac9455bib3","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.100.034515","article-title":"Flow-based generative models for Markov chain Monte Carlo in lattice field theory","volume":"100","author":"Albergo","year":"2019","journal-title":"Phys. Rev. D"},{"key":"mlstac9455bib4","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.125.121601","article-title":"Equivariant flow-based sampling for lattice gauge theory","volume":"125","author":"Kanwar","year":"2020","journal-title":"Phys. Rev. 
Lett."},{"key":"mlstac9455bib5","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevD.103.074504","article-title":"Sampling using SUN gauge equivariant flows","volume":"103","author":"Boyda","year":"2021","journal-title":"Phys. Rev. D"},{"key":"mlstac9455bib6","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.122.08060","article-title":"Solving statistical mechanics using variational autoregressive networks","volume":"122","author":"Dian","year":"2019","journal-title":"Phys. Rev. Lett."},{"key":"mlstac9455bib7","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.126.032001","article-title":"Estimation of thermodynamic observables in lattice field theories with deep generative models","volume":"126","author":"Nicoli","year":"2021","journal-title":"Phys. Rev. Lett."},{"key":"mlstac9455bib8","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.101.023304","article-title":"Asymptotically unbiased estimation of physical observables with neural samplers","volume":"101","author":"Nicoli","year":"2020","journal-title":"Phys. Rev. E"},{"key":"mlstac9455bib9","article-title":"Comment on \u201cSolving statistical mechanics using vans\u201d: introducing savant-vans enhanced by importance and mcmc sampling","author":"Nicoli","year":"2019"},{"key":"mlstac9455bib10","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1145\/3341156","article-title":"Neural importance sampling","volume":"38","author":"M\u00fcller","year":"2019","journal-title":"ACM Trans. 
Graph."},{"key":"mlstac9455bib11","article-title":"Flow-based sampling for multimodal distributions in lattice field theory","author":"Hackett","year":"2021"},{"key":"mlstac9455bib12","doi-asserted-by":"publisher","DOI":"10.22323\/1.396.0338","article-title":"Machine learning of thermodynamic observables in the presence of mode collapse","author":"Nicoli","year":"2021"},{"key":"mlstac9455bib13","article-title":"Sticking the landing: simple, lower-variance gradient estimators for variational inference","author":"Roeder","year":"2017"},{"key":"mlstac9455bib14","first-page":"pp 21945","article-title":"Path-gradient estimators for continuous normalizing flows","author":"Vaitl","year":"2022"},{"key":"mlstac9455bib15","article-title":"Doubly reparameterized gradient estimators for Monte Carlo objectives","author":"Tucker","year":"2019"},{"key":"mlstac9455bib16","article-title":"Importance weighted autoencoders","author":"Burda","year":"2016"},{"key":"mlstac9455bib17","article-title":"Reweighted wake-sleep","author":"Bornschein","year":"2015"},{"key":"mlstac9455bib18","article-title":"Debiasing Evidence Approximations: On Importance-weighted Autoencoders and Jackknife Variational Inference","author":"Nowozin","year":"2018"},{"key":"mlstac9455bib19","doi-asserted-by":"publisher","first-page":"229","DOI":"10.1007\/BF00992696","article-title":"Simple statistical gradient-following algorithms for connectionist reinforcement learning","volume":"8","author":"Williams","year":"1992","journal-title":"Mach. 
Learn."},{"key":"mlstac9455bib20","article-title":"On importance-weighted autoencoders","author":"Finke","year":"2019"},{"key":"mlstac9455bib21","first-page":"pp 738","article-title":"Generalized doubly-reparameterized gradient estimators","author":"Bauer","year":"2021"},{"key":"mlstac9455bib22","article-title":"On the difficulty of unbiased alpha divergence minimization","author":"Geffner","year":"2020"},{"key":"mlstac9455bib23","article-title":"Empirical evaluation of biased methods for alpha divergence minimization","author":"Geffner","year":"2021"},{"key":"mlstac9455bib24","article-title":"Advances in black-box VI: normalizing flows, importance weighting and optimization","author":"Agrawal","year":"2020"},{"key":"mlstac9455bib25","article-title":"Auto-encoding variational bayes","author":"Kingma","year":"2014"},{"key":"mlstac9455bib26","article-title":"Implicit reparameterization gradients","author":"Figurnov","year":"2018"},{"key":"mlstac9455bib27","first-page":"pp 2796","article-title":"Smooth normalizing flows","author":"K\u00f6hler","year":"2021"},{"key":"mlstac9455bib28","article-title":"The generalized reparameterization gradient","author":"Ruiz","year":"2016"},{"key":"mlstac9455bib29","first-page":"pp 2235","article-title":"Pathwise derivatives beyond the reparameterization trick","author":"Jankowiak","year":"2018"},{"key":"mlstac9455bib30","first-page":"pp 17370","article-title":"F-divergence variational inference","author":"Wan","year":"2020"},{"key":"mlstac9455bib31","article-title":"Reparameterization gradients through acceptance-rejection sampling algorithms","author":"Naesseth","year":"2017"},{"key":"mlstac9455bib32","article-title":"Neural variational inference and learning in belief networks","author":"Mnih","year":"2014"},{"key":"mlstac9455bib33","first-page":"pp 13481","article-title":"VarGrad: a low-variance gradient estimator for variational inference","author":"Richter","year":"2020"},{"key":"mlstac9455bib34","article-title":"On using 
control variates with stochastic approximation for variational bayes and its connection to stochastic linear regression","author":"Salimans","year":"2014"},{"key":"mlstac9455bib35","article-title":"Advances in importance sampling","author":"Hesterberg","year":"1988"},{"key":"mlstac9455bib36","article-title":"Challenges and opportunities in high-dimensional variational inference","author":"Dhaka","year":"2021"},{"key":"mlstac9455bib37","article-title":"Monte Carlo theory, methods and examples","author":"Owen","year":"2013"},{"key":"mlstac9455bib38","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1109\/MSP.2017.2699226","article-title":"Adaptive importance sampling: the past, the present and the future","volume":"34","author":"Bugallo","year":"2017","journal-title":"IEEE Signal Process. Mag."},{"key":"mlstac9455bib39","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1023\/A:1008923215028","article-title":"Annealed importance sampling","volume":"11","author":"Neal","year":"2001","journal-title":"Stat. 
Comput."},{"key":"mlstac9455bib40","article-title":"Flow annealed importance sampling bootstrap","author":"Midgley","year":"2022"},{"key":"mlstac9455bib41","first-page":"pp 12020","article-title":"Marginal tail-adaptive normalizing flows","author":"Laszkiewicz","year":"2022"},{"key":"mlstac9455bib42","first-page":"pp 4673","article-title":"Tails of lipschitz triangular flows","author":"Jaini","year":"2020"},{"key":"mlstac9455bib43","first-page":"pp 3918","article-title":"Parallel WaveNet: fast high-fidelity speech synthesis","author":"Aaron","year":"2018"},{"key":"mlstac9455bib44","author":"Casella","year":"2021"},{"key":"mlstac9455bib45","author":"Small","year":"2010"}],"container-title":["Machine Learning: Science and Technology"],"original-title":[],"link":[{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455","content-type":"text\/html","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455\/pdf","content-type":"application\/pdf","content-version":"am","intended-application":"similarity-checking"},{"URL":"https:\
/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455\/pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,19]],"date-time":"2022-10-19T07:05:26Z","timestamp":1666163126000},"score":1,"resource":{"primary":{"URL":"https:\/\/iopscience.iop.org\/article\/10.1088\/2632-2153\/ac9455"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,19]]},"references-count":45,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2022,10,19]]},"published-print":{"date-parts":[[2022,12,1]]}},"URL":"https:\/\/doi.org\/10.1088\/2632-2153\/ac9455","relation":{"has-review":[{"id-type":"doi","id":"10.1088\/2632-2153\/AC9455\/v1\/review2","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/AC9455\/v1\/review1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/AC9455\/v2\/response1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/AC9455\/v1\/decision1","asserted-by":"object"},{"id-type":"doi","id":"10.1088\/2632-2153\/AC9455\/v2\/decision1","asserted-by":"object"}]},"ISSN":["2632-2153"],"issn-type":[{"value":"2632-2153","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,19]]},"assertion":[{"value":"Gradients should stay on path: better estimators of the reverse- and forward KL divergence for normalizing flows","name":"article_title","label":"Article Title"},{"value":"Machine Learning: Science and Technology","name":"journal_title","label":"Journal Title"},{"value":"paper","name":"article_type","label":"Article Type"},{"value":"\u00a9 2022 The Author(s). 
Published by IOP Publishing Ltd","name":"copyright_information","label":"Copyright Information"},{"value":"2022-07-17","name":"date_received","label":"Date Received","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2022-09-22","name":"date_accepted","label":"Date Accepted","group":{"name":"publication_dates","label":"Publication dates"}},{"value":"2022-10-19","name":"date_epub","label":"Online publication date","group":{"name":"publication_dates","label":"Publication dates"}}]}}