{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T02:42:20Z","timestamp":1762051340175,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T00:00:00Z","timestamp":1652745600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NIH","award":["SC3GM096948"],"award-info":[{"award-number":["SC3GM096948"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>Every year, biomedical data is increasing at an alarming rate and is being collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). This article presents and evaluates a practical causal discovery algorithm that uses modern statistical, machine learning, and informatics approaches that have been used in the learning of causal relationships from biomedical Big Data, which in turn integrates clinical, omics (genomic and proteomic), and environmental aspects. The learning of causal relationships from data using graphical models does not address the hidden (unknown or not measured) mechanisms that are inherent to most measurements and analyses. Also, many algorithms lack a practical usage since they do not incorporate current mechanistic knowledge. This paper proposes a practical causal discovery algorithm using causal Bayesian networks to gain a better understanding of the underlying mechanistic process that generated the data. 
The algorithm utilizes model averaging techniques such as searching through a relative order (e.g., if gene A is regulating gene B, then we can say that gene A is of a higher order than gene B) and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo search through the order. The algorithm was evaluated by testing its performance on datasets generated from the ALARM causal Bayesian network. Out of the 37 variables in the ALARM causal Bayesian network, two sets of nine were chosen and the observations for those variables were provided to the algorithm. The performance of the algorithm was evaluated by comparing its prediction with the generating causal mechanism. The 28 variables that were not in use are referred to as hidden variables and they allowed for the evaluation of the algorithm\u2019s ability to predict hidden confounded causal relationships. The algorithm\u2019s predicted performance was also compared with other causal discovery algorithms. The results show that incorporating order information provides a better mechanistic understanding even when hidden confounded causes are present. 
The prior mechanistic knowledge incorporated in the Markov chain Monte Carlo search led to the better discovery of causal relationships when hidden variables were involved in generating the simulated data.<\/jats:p>","DOI":"10.3390\/bdcc6020056","type":"journal-article","created":{"date-parts":[[2022,5,17]],"date-time":"2022-05-17T08:34:29Z","timestamp":1652776469000},"page":"56","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A Better Mechanistic Understanding of Big Data through an Order Search Using Causal Bayesian Networks"],"prefix":"10.3390","volume":"6","author":[{"given":"Changwon","family":"Yoo","sequence":"first","affiliation":[{"name":"Department of Biostatistics, Florida International University, Miami, FL 33199, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9803-6500","authenticated-orcid":false,"given":"Efrain","family":"Gonzalez","sequence":"additional","affiliation":[{"name":"Department of Mathematics & Statistics, University of South Florida, Tampa, FL 33620, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1203-2646","authenticated-orcid":false,"given":"Zhenghua","family":"Gong","sequence":"additional","affiliation":[{"name":"Department of Biostatistics, Florida International University, Miami, FL 33199, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6409-1341","authenticated-orcid":false,"given":"Deodutta","family":"Roy","sequence":"additional","affiliation":[{"name":"Department of Environmental Health Sciences, Florida International University, Miami, FL 33199, USA"}]}],"member":"1968","published-online":{"date-parts":[[2022,5,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Pearl, J. (2009). Causality: Models, Reasoning, and Inference, Cambridge University Press. [2nd ed.].","DOI":"10.1017\/CBO9780511803161"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Good, I.J. (1961). A causal calculus I & II. Br. J. Philos. 
Sci., 11\u201312.","DOI":"10.1093\/bjps\/XII.45.43"},{"key":"ref_3","unstructured":"Suppes, P. (1970). A Probabilistic Theory of Causality, North Holland."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Glymour, C., Scheines, R., Spirtes, P., and Kelley, K. (1987). Discovering Causal Structure, Academic Press.","DOI":"10.1207\/s15327906mbr2302_13"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Cooper, G.F., and Herskovits, E.H. (1991, January 15). A Bayesian method for constructing Bayesian belief networks from databases. Proceedings of the Uncertainty in Artificial Intelligence, Los Angeles, CA, USA.","DOI":"10.1016\/B978-1-55860-203-8.50015-2"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1177\/089443939100900106","article-title":"An algorithm for fast recovery of sparse causal graphs","volume":"9","author":"Spirtes","year":"1991","journal-title":"Soc. Sci. Comput. Rev."},{"key":"ref_7","unstructured":"Cooper, G.F., and Yoo, C. (1999). Causal Discovery from a Mixture of Experimental and Observational Data. arXiv."},{"key":"ref_8","unstructured":"Glymour, C., and Cooper, G.F. (1999). A Bayesian Approach to Causal Discovery, AAAI Press."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search, MIT Press. [2nd ed.].","DOI":"10.7551\/mitpress\/1754.001.0001"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1111\/j.1749-6632.2008.03749.x","article-title":"Local Causal Discovery Algorithm using Causal Bayesian networks","volume":"1158","author":"Yoo","year":"2009","journal-title":"Ann. N. Y. Acad. Sci."},{"key":"ref_11","unstructured":"Pearl, J., Glymour, M., and Jewell, N.P. (2016). Causal Inference in Statistics: A Primer, John Wiley & Sons."},{"key":"ref_12","unstructured":"Kuipers, J., Suter, P., and Moffa, G. (2018). Efficient Structure Learning and Sampling of Bayesian Networks. 
arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-021-84905-3","article-title":"Causal effects in microbiomes using interventional calculus","volume":"11","author":"Sazal","year":"2021","journal-title":"Sci. Rep."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1016\/j.procs.2021.12.216","article-title":"Predictive Big Data Analytics for Service Requests: A Framework","volume":"198","author":"Chauhan","year":"2022","journal-title":"Procedia Comput. Sci."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Binelli, C. (2021). Estimating Causal Effects When the Treatment Affects All Subjects Simultaneously: An Application. Big Data Cogn. Comput., 5.","DOI":"10.3390\/bdcc5020022"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"657","DOI":"10.1007\/s10585-020-10060-0","article-title":"Causal Bayesian gene networks associated with bone, brain and lung metastasis of breast cancer","volume":"37","author":"Park","year":"2020","journal-title":"Clin. Exp. Metastasis"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Chowdhury, D., Das, A., Dey, A., Sarkar, S., Dwivedi, A.D., Rao Mukkamala, R., and Murmu, L. (2022). ABCanDroid: A Cloud Integrated Android App for Noninvasive Early Breast Cancer Detection Using Transfer Learning. Sensors, 22.","DOI":"10.3390\/s22030832"},{"key":"ref_18","unstructured":"Ye, Q., Amini, A.A., and Zhou, Q. (2022). Distributed Learning of Generalized Linear Causal Networks. arXiv."},{"key":"ref_19","unstructured":"Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Spirtes, P., Glymour, C., and Scheines, R. (1993). Causation, Prediction, and Search, MIT Press. 
[1st ed.].","DOI":"10.1007\/978-1-4612-2748-9"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1016\/S0049-237X(06)80074-1","article-title":"A Theory of Inferred Causality","volume":"Volume 134","author":"Pearl","year":"1995","journal-title":"Studies in Logic and the Foundations of Mathematics"},{"key":"ref_22","unstructured":"Yoo, C., and Cooper, G. (2001). Causal Discovery of Latent-Variable Models from a Mixture of Experimental and Observational Data, Center for Biomedical Informatics."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"2183","DOI":"10.1016\/j.csda.2012.01.010","article-title":"Bayesian Method for Causal Discovery of Latent-Variable Models from a Mixture of Experimental and Observational Data","volume":"56","author":"Yoo","year":"2012","journal-title":"Comput. Stat. Data Anal."},{"key":"ref_24","unstructured":"Meek, C. (2013). Causal inference and causal explanation with background knowledge. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Druzdzel, M., and Simon, H. (1993). Causality in Bayesian Belief Networks. Uncertainty in Artificial Intelligence, Elsevier.","DOI":"10.1016\/B978-1-4832-1451-1.50005-6"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1023\/A:1009787925236","article-title":"A simple constraint-based algorithm for efficiently mining observational databases for causal relationships","volume":"1","author":"Cooper","year":"1997","journal-title":"J. Data Min. Knowl. Discov."},{"key":"ref_27","unstructured":"Meek, C. (1997). Selecting Graphical Models: Causal and Statistical Modeling, Department of Philosophy, Carnegie Mellon University."},{"key":"ref_28","unstructured":"Aliferis, C.F., and Cooper, G.F. (1998). Causal Modeling with Modifiable Temporal Belief Networks, Center for Biomedical Informatics."},{"key":"ref_29","unstructured":"Friedman, N., and Koller, D. (2013). Being Bayesian about network structure. 
arXiv."},{"key":"ref_30","first-page":"50","article-title":"Bayesian networks without tears","volume":"12","author":"Charniak","year":"1991","journal-title":"AI Mag."},{"key":"ref_31","unstructured":"Heckerman, D.E. (1989). A Tractable Inference Algorithm for Diagnosing Multiple Diseases, Elsevier."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Beinlich, I.A., Suermondt, H.J., Chavez, R.M., and Cooper, G.F. (1989). The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. Proceedings of the Second European Conference on Artificial Intelligence in Medical Care, Berlin, Germany.","DOI":"10.1007\/978-3-642-93437-7_28"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Heckerman, D. (1995). A Bayesian Approach to Learning Causal Networks. arXiv.","DOI":"10.1145\/203330.203336"},{"key":"ref_34","unstructured":"Chickering, D.M., Heckerman, D., and Meek, C. (2013). A Bayesian approach to learning Bayesian networks with local structure. arXiv."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"628","DOI":"10.1109\/TKDE.2007.190732","article-title":"Improving Bayesian Network Structure Learning with Mutual Information-Based Node Ordering in the K2 Algorithm","volume":"20","author":"Chen","year":"2008","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_36","unstructured":"Mani, S., Cooper, G., and Spirtes, P. (2006). A Theoretical Study of Y Structures for Causal Discovery. arXiv."},{"key":"ref_37","unstructured":"Silander, T., and Myllymaki, P. (2006, January 13\u201316). A simple approach for finding the globally optimal Bayesian network structure. Proceedings of the Uncertainty in Artificial Intelligence, Cambridge, MA, USA."},{"key":"ref_38","unstructured":"Hartemink, A.J., and Berger, H. (2022, April 07). Banjo: Banjo is licensed from Duke University. Copyright\u00a9 2005\u20132008 by Alexander J. Hartemink. All rights reserved. 
Available online: https:\/\/users.cs.duke.edu\/~amink\/software\/banjo\/."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1007\/BF00994110","article-title":"A Bayesian method for the induction of probabilistic networks from data","volume":"9","author":"Cooper","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Geiger, D., and Heckerman, D. (1995). A characterization of the Dirichlet distribution with application to learning Bayesian networks. Maximum Entropy and Bayesian Methods, Springer.","DOI":"10.1007\/978-94-011-5430-7_7"},{"key":"ref_41","unstructured":"Cooper, G.F. (1987). Probabilistic Inference Using Belief Networks Is NP-Hard, Stanford University. KSL8-727."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/2\/56\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:13:37Z","timestamp":1760138017000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/2\/56"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,17]]},"references-count":41,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["bdcc6020056"],"URL":"https:\/\/doi.org\/10.3390\/bdcc6020056","relation":{},"ISSN":["2504-2289"],"issn-type":[{"type":"electronic","value":"2504-2289"}],"subject":[],"published":{"date-parts":[[2022,5,17]]}}}