{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:57:11Z","timestamp":1760151431725,"version":"build-2065373602"},"reference-count":45,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2022,3,16]],"date-time":"2022-03-16T00:00:00Z","timestamp":1647388800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Nature Science Foundation of China","award":["62076096","62006078"],"award-info":[{"award-number":["62076096","62006078"]}]},{"name":"Shanghai Municipal Project","award":["20511100900"],"award-info":[{"award-number":["20511100900"]}]},{"name":"Shanghai Knowledge Service Platform Project","award":["ZF1213"],"award-info":[{"award-number":["ZF1213"]}]},{"name":"Shanghai Chenguang Program","award":["19CG25"],"award-info":[{"award-number":["19CG25"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Recently, flow models parameterized by neural networks have been used to design efficient Markov chain Monte Carlo (MCMC) transition kernels. However, inefficient utilization of gradient information of the target distribution or the use of volume-preserving flows limits their performance in sampling from multi-modal target distributions. In this paper, we treat the training procedure of the parameterized transition kernels in a different manner and exploit a novel scheme to train MCMC transition kernels. We divide the training process of transition kernels into the exploration stage and training stage, which can make full use of the gradient information of the target distribution and the expressive power of deep neural networks. The transition kernels are constructed with non-volume-preserving flows and trained in an adversarial form. 
The proposed method achieves significant improvement in effective sample size and mixes quickly to the target distribution. Empirical results validate that the proposed method is able to achieve low autocorrelation of samples and fast convergence rates, and outperforms other state-of-the-art parameterized transition kernels in varieties of challenging analytically described distributions and real world datasets.<\/jats:p>","DOI":"10.3390\/e24030415","type":"journal-article","created":{"date-parts":[[2022,3,16]],"date-time":"2022-03-16T22:09:58Z","timestamp":1647468598000},"page":"415","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Adversarially Training MCMC with Non-Volume-Preserving Flows"],"prefix":"10.3390","volume":"24","author":[{"given":"Shaofan","family":"Liu","sequence":"first","affiliation":[{"name":"School of Computer Science and Technology, East China Normal University, Shanghai 200062, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shiliang","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, East China Normal University, Shanghai 200062, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,3,16]]},"reference":[{"key":"ref_1","unstructured":"Robert, C., and Casella, G. (2013). Monte Carlo Statistical Methods, Springer Science & Business Media."},{"key":"ref_2","unstructured":"Neal, R.M. (1993). Probabilistic Inference Using Markov Chain Monte Carlo Methods, Department of Computer Science, University of Toronto. Technical Report."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2797","DOI":"10.1007\/s00180-013-0429-2","article-title":"On the flexibility of the design of multiple try Metropolis schemes","volume":"28","author":"Martino","year":"2013","journal-title":"Comput. 
Stat."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1093\/biomet\/57.1.97","article-title":"Monte Carlo sampling methods using Markov chains and their applications","volume":"57","author":"Hastings","year":"1970","journal-title":"Biometrika"},{"key":"ref_5","unstructured":"Wang, Z., Mohamed, S., and Freitas, N. (2013, January 16\u201321). Adaptive Hamiltonian and riemann manifold Monte Carlo. Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"107021","DOI":"10.1016\/j.patcog.2019.107021","article-title":"Decomposed slice sampling for factorized distributions","volume":"97","author":"Wang","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/0370-2693(87)91197-X","article-title":"Hybrid Monte Carlo","volume":"195","author":"Duane","year":"1987","journal-title":"Phys. Lett. B"},{"key":"ref_8","unstructured":"Simsekli, U., Yildiz, C., Nguyen, T.H., Richard, G., and Cemgil, A.T. (2018, January 10\u201315). Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Betancourt, M. (2017). A conceptual introduction to Hamiltonian Monte Carlo. arXiv.","DOI":"10.3150\/16-BEJ810"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1016\/j.patcog.2019.01.046","article-title":"Sample size for maximum-likelihood estimates of Gaussian model depending on dimensionality of pattern space","volume":"91","author":"Psutka","year":"2019","journal-title":"Pattern Recognit."},{"key":"ref_11","unstructured":"Betancourt, M., Byrne, S., and Girolami, M. (2014). Optimizing the integrator step size for Hamiltonian Monte Carlo. arXiv."},{"key":"ref_12","unstructured":"Zou, D., Xu, P., and Gu, Q. 
(2018, January 10\u201315). Stochastic Variance-Reduced Hamilton Monte Carlo Methods. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_13","unstructured":"Levy, D., Hoffman, M.D., and Sohl-Dickstein, J. (May, January 30). Generalizing Hamiltonian Monte Carlo with Neural Networks. Proceedings of the 6th International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_14","unstructured":"Liu, C., Zhuo, J., and Zhu, J. (2019). Understanding MCMC Dynamics as Flows on the Wasserstein Space. arXiv."},{"key":"ref_15","unstructured":"Song, J., Zhao, S., and Ermon, S. (2017, January 4\u20139). A-NICE-MC: Adversarial training for MCMC. Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_16","unstructured":"Azadi, S., Olsson, C., Darrell, T., Goodfellow, I., and Odena, A. (2018). Discriminator rejection sampling. arXiv."},{"key":"ref_17","unstructured":"Dinh, L., Krueger, D., and Bengio, Y. (2014). Nice: Non-linear independent components estimation. arXiv."},{"key":"ref_18","unstructured":"Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/j.patrec.2019.01.007","article-title":"Progressive generative adversarial networks with reliable sample identification","volume":"130","author":"Wei","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_20","first-page":"343","article-title":"Adaptively scaling the Metropolis algorithm using expected squared jumped distance","volume":"20","author":"Pasarica","year":"2010","journal-title":"Stat. Sin."},{"key":"ref_21","unstructured":"Yang, J., Roberts, G.O., and Rosenthal, J.S. (2019). Optimal scaling of Metropolis algorithms on general target distributions. 
arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1093\/biomet\/82.4.711","article-title":"Reversible jump Markov chain Monte Carlo computation and Bayesian model determination","volume":"82","author":"Green","year":"1995","journal-title":"Biometrika"},{"key":"ref_23","unstructured":"Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_24","unstructured":"Cong, Y., Chen, B., Liu, H., and Zhou, M. (2017, January 6\u201311). Deep Latent Dirichlet Allocation with Topic-Layer-Adaptive Stochastic Gradient Riemannian MCMC. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"2257","DOI":"10.3150\/16-BEJ810","article-title":"The geometric foundations of Hamiltonian Monte Carlo","volume":"23","author":"Betancourt","year":"2017","journal-title":"Bernoulli"},{"key":"ref_26","unstructured":"Tripuraneni, N., Rowland, M., Ghahramani, Z., and Turner, R. (2017, January 6\u201311). Magnetic Hamiltonian Monte Carlo. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"035105","DOI":"10.1103\/PhysRevB.95.035105","article-title":"Accelerated Monte Carlo simulations with restricted Boltzmann machines","volume":"95","author":"Huang","year":"2017","journal-title":"Phys. Rev."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Li, C., Chen, C., Carlson, D., and Carin, L. (2016, January 12\u201317). Preconditioned Stochastic Gradient Langevin Dynamics for deep neural networks. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10200"},{"key":"ref_29","unstructured":"Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. (2016, January 5\u201310). Improved variational inference with inverse autoregressive flow. 
Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_30","unstructured":"Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). Density estimation using Real NVP. arXiv."},{"key":"ref_31","unstructured":"Ma, F., Ayaz, U., and Karaman, S. (2018, January 3\u20138). Invertibility of convolutional generative networks from partial measurements. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_32","unstructured":"Dinh, V., Bilge, A., Zhang, C., and Matsen, F.A. (2017, January 6\u201311). Probabilistic path Hamiltonian Monte Carlo. Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia."},{"key":"ref_33","unstructured":"Zhang, Y., Ghahramani, Z., Storkey, A.J., and Sutton, C.A. (2012, January 3\u20136). Continuous relaxations for discrete Hamilton Monte Carlo. Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ichiki, A., and Ohzeki, M. (2013). Violation of detailed balance accelerates relaxation. arXiv.","DOI":"10.1103\/PhysRevE.88.020101"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Neal, R.M. (2011). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, Chapman & Hall\/CRC.","DOI":"10.1201\/b10905-6"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Efron, B., and Tibshirani, R.J. (1994). An Introduction to the Bootstrap, CRC Press.","DOI":"10.1201\/9780429246593"},{"key":"ref_37","unstructured":"Rezende, D.J., and Mohamed, S. (2015, January 6\u201311). Variational Inference with Normalizing Flows. Proceedings of the 32nd International Conference on Machine Learning, Lille, France."},{"key":"ref_38","first-page":"49","article-title":"Integrating structured biological data by Kernel Maximum Mean Discrepancy","volume":"22","author":"Borgwardt","year":"2006","journal-title":"IBM J. Res. 
Dev."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"720","DOI":"10.1162\/neco.1992.4.5.720","article-title":"The Evidence Framework Applied to Classification Networks","volume":"4","author":"MacKay","year":"1992","journal-title":"Neural Comput."},{"key":"ref_40","first-page":"1593","article-title":"The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo","volume":"15","author":"Hoffman","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Freedman, D.A. (2009). Statistical Models: Theory and Practice, Cambridge University Press.","DOI":"10.1017\/CBO9780511815867"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"107510","DOI":"10.1016\/j.patcog.2020.107510","article-title":"Efficient sampling-based energy function evaluation for ensemble optimization using simulated annealing","volume":"107","author":"Hajdu","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_44","unstructured":"Dua, D., and Graff, C. (2022, March 13). UCI Machine Learning Repository. 
Available online: https:\/\/archive.ics.uci.edu\/ml\/index.php."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"839","DOI":"10.1148\/radiology.148.3.6878708","article-title":"A method of comparing the areas under receiver operating characteristic curves derived from the same cases","volume":"148","author":"Hanley","year":"1983","journal-title":"Radiology"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/3\/415\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T22:37:42Z","timestamp":1760135862000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/24\/3\/415"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,16]]},"references-count":45,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2022,3]]}},"alternative-id":["e24030415"],"URL":"https:\/\/doi.org\/10.3390\/e24030415","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2022,3,16]]}}}