{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,23]],"date-time":"2026-04-23T10:10:48Z","timestamp":1776939048529,"version":"3.51.4"},"reference-count":28,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T00:00:00Z","timestamp":1607472000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Incomplete data are a common feature in many domains, from clinical trials to industrial applications. Bayesian networks (BNs) are often used in these domains because of their graphical and causal interpretations. BN parameter learning from incomplete data is usually implemented with the Expectation-Maximisation algorithm (EM), which computes the relevant sufficient statistics (\u201csoft EM\u201d) using belief propagation. Similarly, the Structural Expectation-Maximisation algorithm (Structural EM) learns the network structure of the BN from those sufficient statistics using algorithms designed for complete data. However, practical implementations of parameter and structure learning often impute missing data (\u201chard EM\u201d) to compute sufficient statistics instead of using belief propagation, for both ease of implementation and computational speed. In this paper, we investigate the question: what is the impact of using imputation instead of belief propagation on the quality of the resulting BNs? From a simulation study using synthetic data and reference BNs, we find that it is possible to recommend one approach over the other in several scenarios based on the characteristics of the data. We then use this information to build a simple decision tree to guide practitioners in choosing the EM algorithm best suited to their problem.<\/jats:p>","DOI":"10.3390\/a13120329","type":"journal-article","created":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T00:45:03Z","timestamp":1607561103000},"page":"329","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Hard and Soft EM in Bayesian Network Learning from Incomplete Data"],"prefix":"10.3390","volume":"13","author":[{"given":"Andrea","family":"Ruggieri","sequence":"first","affiliation":[{"name":"Department of Informatics, Systems and Communication, Universit\u00e0 degli Studi di Milano-Bicocca, 20126 Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5366-8499","authenticated-orcid":false,"given":"Francesco","family":"Stranieri","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, Universit\u00e0 degli Studi di Milano-Bicocca, 20126 Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1394-0507","authenticated-orcid":false,"given":"Fabio","family":"Stella","sequence":"additional","affiliation":[{"name":"Department of Informatics, Systems and Communication, Universit\u00e0 degli Studi di Milano-Bicocca, 20126 Milano, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marco","family":"Scutari","sequence":"additional","affiliation":[{"name":"Istituto Dalle Molle di Studi sull\u2019Intelligenza Artificiale (IDSIA), 6962 Viganello, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,9]]},"reference":[{"key":"ref_1","first-page":"1","article-title":"The Treatment of Missing Survey Data","volume":"12","author":"Kalton","year":"1986","journal-title":"Surv. Methodol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1146\/annurev.publhealth.25.102802.124410","article-title":"What Do We Do with Missing Data? Some Options for Analysis of Incomplete Data","volume":"25","author":"Raghunathan","year":"2004","journal-title":"Annu. Rev. Public Health"},{"key":"ref_3","unstructured":"Little, R.J.A., and Rubin, D.B. (1987). Statistical Analysis with Missing Data, Wiley."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1093\/biomet\/63.3.581","article-title":"Inference and Missing Data","volume":"63","author":"Rubin","year":"1976","journal-title":"Biometrika"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum Likelihood from Incomplete Data Via the EM Algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. (Ser. B)"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Beal, M.J., and Ghahramani, Z. (2003, January 3). The Variational Bayesian EM Algorithm for Incomplete Data: With Application to Scoring Graphical Model Structures. Proceedings of the 7th Valencia International Meeting, New York, NY, USA.","DOI":"10.1093\/oso\/9780198526155.003.0025"},{"key":"ref_7","unstructured":"Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques, MIT Press."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1111\/stan.12197","article-title":"Bayesian Network Models for Incomplete and Dynamic Data","volume":"74","author":"Scutari","year":"2020","journal-title":"Stat. Neerl."},{"key":"ref_9","unstructured":"Friedman, N. (1997, January 8\u201312). Learning Belief Networks in the Presence of Missing Values and Hidden Variables. Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA."},{"key":"ref_10","unstructured":"Friedman, N. (1998, January 24\u201326). The Bayesian Structural EM Algorithm. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, USA."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1250","DOI":"10.1093\/bioinformatics\/btw807","article-title":"bnstruct: An R Package for Bayesian Network Structure Learning in the Presence of Missing Data","volume":"33","author":"Franzin","year":"2017","journal-title":"Bioinformatics"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1016\/j.ijar.2018.02.004","article-title":"Efficient Learning of Bounded-Treewidth Bayesian Networks from Complete and Incomplete Data Sets","volume":"95","author":"Scanagatta","year":"2018","journal-title":"Int. J. Approx. Reason."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1177\/096228029900800102","article-title":"Multiple Imputation: A Primer","volume":"8","author":"Schafer","year":"1999","journal-title":"Stat. Methods Med Res."},{"key":"ref_14","unstructured":"R Development Core Team (2020). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00994016","article-title":"Learning Bayesian Networks: The Combination of Knowledge and Statistical Data","volume":"20","author":"Heckerman","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_16","unstructured":"Geiger, D., and Heckerman, D. (1994, January 29\u201331). Learning Gaussian Networks. Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, WA, USA."},{"key":"ref_17","first-page":"31","article-title":"Graphical Models for Associations Between Variables, Some of which are Qualitative and Some Quantitative","volume":"17","author":"Lauritzen","year":"1989","journal-title":"Ann. Stat."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1214\/aos\/1176344136","article-title":"Estimating the Dimension of a Model","volume":"6","author":"Schwarz","year":"1978","journal-title":"Ann. Stat."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Scutari, M., and Denis, J.B. (2014). Bayesian Networks with Examples in R, Chapman & Hall.","DOI":"10.1201\/b17065"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"913","DOI":"10.1080\/08839514.2019.1637138","article-title":"Comparison of Performance of Data Imputation Methods for Numeric Dataset","volume":"33","author":"Jadhav","year":"2019","journal-title":"Appl. Artif. Intell."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Beretta, L., and Santaniello, A. (2016). Nearest Neighbor Imputation Algorithms: A Critical Evaluation. BMC Med Inform. Decis. Mak., 16.","DOI":"10.1186\/s12911-016-0318-z"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing Value Estimation Methods for DNA Microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1002\/sim.4067","article-title":"Multiple Imputation Using Chained Equations: Issues and Guidance for Practice","volume":"30","author":"White","year":"2011","journal-title":"Stat. Med."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"464","DOI":"10.1111\/j.1467-842X.2001.tb00294.x","article-title":"How Can I Deal with Missing Data in My Study?","volume":"25","author":"Bennett","year":"2001","journal-title":"Aust. N. Z. J. Public Health"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Watanabe, M., and Yamaguchi, K. (2004). The EM Algorithm and Related Statistical Models, Marcel Dekker.","DOI":"10.1201\/9780203913055"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"McLachlan, G.J., and Krishnan, T. (2008). The EM Algorithm and Extensions, Wiley.","DOI":"10.1002\/9780470191613"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2909","DOI":"10.1080\/00949655.2018.1491577","article-title":"Generating missing values for simulation purposes: A multivariate amputation procedure","volume":"88","author":"Schouten","year":"2018","journal-title":"J. Stat. Comput. Simul."},{"key":"ref_28","unstructured":"Constantinou, A.C., Liu, Y., Chobtham, K., Guo, Z., and Kitson, N.K. (2020). The Bayesys Data and Bayesian Network Repository, Queen Mary University of London."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/12\/329\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:42:56Z","timestamp":1760179376000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/13\/12\/329"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,9]]},"references-count":28,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["a13120329"],"URL":"https:\/\/doi.org\/10.3390\/a13120329","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,9]]}}}