{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T06:47:00Z","timestamp":1776322020950,"version":"3.50.1"},"reference-count":35,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T00:00:00Z","timestamp":1760659200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The Hamiltonian Monte Carlo (HMC) method is effective for Bayesian inference but suffers from synchronization overhead in distributed settings. We propose two variants: a distributed HMC (DHMC) baseline with synchronized, globally exact gradient evaluations and a communication-avoiding leapfrog HMC (CALF-HMC) method that interleaves local surrogate micro-steps with a single\u2013global Metropolis\u2013Hastings correction per trajectory. Implemented on Apache Spark\/PySpark and evaluated on a large synthetic logistic regression (N=107, d=100, workers J\u2208{4,8,16,32}), DHMC attained an average acceptance of 0.986, mean ESS of 1200, and wall-clock of 64.1 s per evaluation run, yielding \u224818.7 ESS\/s; CALF-HMC achieved an acceptance of 0.942, mean ESS of 5.1, and 14.8 s, i.e., \u22480.34 ESS\/s under the tested surrogate configuration. While DHMC delivered higher ESS\/s due to robust mixing under conservative integration, CALF-HMC reduced the per-trajectory runtime and exhibited more favorable scaling as inter-worker latency increased. The study contributes (i) a systems-oriented communication cost model for distributed HMC, (ii) an exact, communication-avoiding leapfrog variant, and (iii) practical guidance for ESS\/s-optimized tuning on clusters.<\/jats:p>","DOI":"10.3390\/a18100661","type":"journal-article","created":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T16:13:03Z","timestamp":1760717583000},"page":"661","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["MCMC Methods: From Theory to Distributed Hamiltonian Monte Carlo over PySpark"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4253-7661","authenticated-orcid":false,"given":"Christos","family":"Karras","sequence":"first","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0891-6780","authenticated-orcid":false,"given":"Leonidas","family":"Theodorakopoulos","sequence":"additional","affiliation":[{"name":"Department of Management Science and Technology, University of Patras, 26334 Patras, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4632-6511","authenticated-orcid":false,"given":"Aristeidis","family":"Karras","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0008-547X","authenticated-orcid":false,"given":"George A.","family":"Krimpas","sequence":"additional","affiliation":[{"name":"Computer Engineering and Informatics Department, University of Patras, 26504 Patras, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8837-2248","authenticated-orcid":false,"given":"Charalampos-Panagiotis","family":"Bakalis","sequence":"additional","affiliation":[{"name":"Department of Management Science and Technology, University of Patras, 26334 Patras, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-6314-7795","authenticated-orcid":false,"given":"Alexandra","family":"Theodoropoulou","sequence":"additional","affiliation":[{"name":"Department of Management Science and Technology, University of Patras, 26334 Patras, Greece"}]}],"member":"1968","published-online":{"date-parts":[[2025,10,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1090\/S0273-0979-08-01238-X","article-title":"The Markov Chain Monte Carlo Revolution","volume":"46","author":"Diaconis","year":"2009","journal-title":"Bull. Am. Math. Soc."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"204","DOI":"10.63180\/jcsra.thestap.2025.4.2","article-title":"Model-based systems engineering cybersecurity risk assessment for industrial control systems leveraging NIST risk management framework methodology","volume":"2025","author":"Gampel","year":"2025","journal-title":"J. Cyber Secur. Risk Audit."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Fearnhead, P., Nemeth, C., Oates, C.J., and Sherlock, C. (2024). Scalable Monte Carlo for Bayesian Learning. arXiv.","DOI":"10.1017\/9781009288460"},{"key":"ref_4","unstructured":"Cornish, R., Vanetti, P., Bouchard-C\u00f4t\u00e9, A., Deligiannidis, G., and Doucet, A. (2019, January 10\u201315). Scalable Metropolis-Hastings for exact Bayesian inference with large datasets. Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Scott, S.L., Blocker, A.W., Bonassi, F.V., Chipman, H.A., George, E.I., and McCulloch, R.E. (2022). Bayes and big data: The consensus Monte Carlo algorithm. Big Data and Information Theory, Routledge.","DOI":"10.4324\/9781003289173-2"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Vlachou, E., Karras, A., Karras, C., Theodorakopoulos, L., Halkiopoulos, C., and Sioutas, S. (2023). Distributed Bayesian inference for large-scale IoT systems. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8010001"},{"key":"ref_7","unstructured":"Korattikara Balan, A. (2014). Approximate Markov Chain Monte Carlo Algorithms for Large Scale Bayesian Inference. [Ph.D. Thesis, UC Irvine]."},{"key":"ref_8","unstructured":"Strathmann, H., Sejdinovic, D., and Girolami, M. (2015). Unbiased Bayes for big data: Paths of partial posteriors. arXiv."},{"key":"ref_9","unstructured":"Pollock, M., Fearnhead, P., Johansen, A.M., and Roberts, G.O. (2016). The scalable Langevin exact algorithm: Bayesian inference for big data. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Johannes, M., and Polson, N. (2009). Markov Chain Monte Carlo. Handbook of Financial Time Series, Springer.","DOI":"10.1007\/978-3-540-71297-8_43"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1146\/annurev-statistics-022513-115540","article-title":"Bayesian computation via markov chain monte carlo","volume":"1","author":"Craiu","year":"2014","journal-title":"Annu. Rev. Stat. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/s11222-014-9524-7","article-title":"Scalable inference for Markov processes with intractable likelihoods","volume":"25","author":"Owen","year":"2015","journal-title":"Stat. Comput."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1087","DOI":"10.1063\/1.1699114","article-title":"Equation of state calculations by fast computing machines","volume":"21","author":"Metropolis","year":"1953","journal-title":"J. Chem. Phys."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1093\/biomet\/57.1.97","article-title":"Monte Carlo sampling methods using Markov chains and their applications","volume":"57","author":"Hastings","year":"1970","journal-title":"Biometrika"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1109\/TPAMI.1984.4767596","article-title":"Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images","volume":"PAMI-6","author":"Geman","year":"1984","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_16","unstructured":"Welling, M., and Teh, Y.W. (July, January 28). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11), Bellevue, WA, USA."},{"key":"ref_17","unstructured":"Korattikara, A., Chen, Y., and Welling, M. (2014, January 21\u201326). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1167","DOI":"10.1111\/rssb.12365","article-title":"Quasi-stationary Monte Carlo and the ScaLE algorithm","volume":"82","author":"Pollock","year":"2020","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"ref_19","unstructured":"Br\u00e9maud, P. (1991). Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Blitzstein, J.K., and Hwang, J. (2019). Introduction to Probability, CRC Press. [2nd ed.].","DOI":"10.1201\/9780429428357"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Karras, C., Karras, A., Avlonitis, M., and Sioutas, S. (2022, January 17\u201320). An overview of mcmc methods: From theory to applications. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Greece.","DOI":"10.1007\/978-3-031-08341-9_26"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Karras, C., Karras, A., Avlonitis, M., Giannoukou, I., and Sioutas, S. (2022, January 17\u201320). Maximum likelihood estimators on mcmc sampling algorithms for decision making. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Hersonissos, Greece.","DOI":"10.1007\/978-3-031-08341-9_28"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Brooks, S., Gelman, A., Jones, G., and Meng, X.L. (2011). Handbook of Markov Chain Monte Carlo, Chapman and Hall\/CRC.","DOI":"10.1201\/b10905"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. (2013). Bayesian Data Analysis, Chapman and Hall\/CRC. [3rd ed.].","DOI":"10.1201\/b16018"},{"key":"ref_25","unstructured":"Stephens, M. (2019, December 05). Simple Examples of Metropolis\u2013Hastings Algorithm. Available online: https:\/\/stephens999.github.io\/fiveMinuteStats\/MH-examples1.html."},{"key":"ref_26","unstructured":"Ross, S.M. (2007). Introduction to Probability Models, Academic Press. [9th ed.]."},{"key":"ref_27","unstructured":"Crow, J.F., and Kimura, M. (1970). An Introduction to Population Genetics Theory, Harper and Row."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Karras, C., Karras, A., Tsolis, D., Giotopoulos, K.C., and Sioutas, S. (2022, January 23\u201325). Distributed Gibbs sampling and LDA modelling for large scale big data management on PySpark. Proceedings of the 2022 7th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), Ioannina, Greece.","DOI":"10.1109\/SEEDA-CECNSM57760.2022.9932990"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Vlachou, E., Karras, C., Karras, A., Tsolis, D., and Sioutas, S. (2023). EVCA classifier: A MCMC-based classifier for analyzing high-dimensional big data. Information, 14.","DOI":"10.3390\/info14080451"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Karras, A., Karras, C., Schizas, N., Avlonitis, M., and Sioutas, S. (2023). AutoML with Bayesian optimizations for big data management. Information, 14.","DOI":"10.3390\/info14040223"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1080\/00031305.1995.10476177","article-title":"Understanding the Metropolis-Hastings Algorithm","volume":"49","author":"Chib","year":"1995","journal-title":"Am. Stat."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"167","DOI":"10.1080\/00031305.1992.10475878","article-title":"Explaining the Gibbs Sampler","volume":"46","author":"Casella","year":"1992","journal-title":"Am. Stat."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Kruschke, J. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan, Academic Press.","DOI":"10.1016\/B978-0-12-405888-0.00008-8"},{"key":"ref_34","unstructured":"Stock, J.H., and Watson, M.W. (2012). Introduction to Econometrics, Pearson. [3rd ed.]."},{"key":"ref_35","first-page":"1593","article-title":"The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo","volume":"15","author":"Hoffman","year":"2014","journal-title":"J. Mach. Learn. Res."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/10\/661\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,17]],"date-time":"2025-10-17T16:35:22Z","timestamp":1760718922000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/10\/661"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,17]]},"references-count":35,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2025,10]]}},"alternative-id":["a18100661"],"URL":"https:\/\/doi.org\/10.3390\/a18100661","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,17]]}}}