{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T20:52:25Z","timestamp":1772571145062,"version":"3.50.1"},"reference-count":50,"publisher":"SAGE Publications","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2022,9,5]]},"abstract":"<jats:p>Bayesian network (BN) is one of the most powerful probabilistic models in the field of uncertain knowledge representation and reasoning. During the past decade, numerous approaches have been proposed to build directed acyclic graph (DAG) as the structural specification of BN. However, for most Bayesian network classifiers (BNCs) the directed edges in DAG substantially represent assertions of conditional independence rather than causal relationships although the learned joint probability distributions may fit data well, thus they cannot be applied to causal reasoning. In this paper, conditional entropy is introduced to measure causal uncertainty due to its asymmetry characteristic, and heuristic search strategy is applied to build Bayesian causal tree (BCT) by identifying significant causalities. The resulting highly scalable topology can represent causal relationship in terms of causal science, and corresponding joint probability can fit training data in terms of data science. Then ensemble learning strategy is applied to build Bayesian causal forest (BCF) with a set of BCTs, each taking different attribute as the root node to represent root cause for causality analysis. Extensive experiments performed on 32 public datasets from the UCI machine learning repository show that BCF achieves outstanding classification performance compared to state-of-the-art single-model BNCs (e.g., CFWNB), ensemble BNCs (e.g., WATAN, IWAODE, WAODE-MI and TAODE) and non-Bayesian learners (e.g., SVM, k-NN, LR).<\/jats:p>","DOI":"10.3233\/ida-216114","type":"journal-article","created":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T15:47:11Z","timestamp":1662479231000},"page":"1275-1302","source":"Crossref","is-referenced-by-count":1,"title":["From undirected dependence to directed causality: A novel Bayesian learning approach"],"prefix":"10.1177","volume":"26","author":[{"given":"Limin","family":"Wang","sequence":"first","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Jilin, China"},{"name":"Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Jilin, China"}]},{"given":"Hangqi","family":"Fan","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Jilin, China"}]},{"given":"He","family":"Kong","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Jilin University, Jilin, China"}]}],"member":"179","reference":[{"key":"10.3233\/IDA-216114_ref1","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1007\/s10994-005-0473-4","article-title":"Learning Bayesian network classifiers: Searching in a space of partially directed acyclic graphs","volume":"59","author":"Acid","year":"2005","journal-title":"Machine Learning"},{"key":"10.3233\/IDA-216114_ref2","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1016\/j.ins.2016.08.051","article-title":"Collective data mining in the ant colony decision tree approach","volume":"372","author":"Kozak","year":"2016","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-216114_ref3","doi-asserted-by":"crossref","first-page":"425","DOI":"10.1016\/j.ins.2018.07.006","article-title":"Tolerance rough fuzzy decision tree","volume":"465","author":"Zhai","year":"2018","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-216114_ref4","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1109\/TKDE.2016.2608881","article-title":"Sample-Based Attribute Selective AnDE for Large Data","volume":"29","author":"Chen","year":"2016","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"10.3233\/IDA-216114_ref5","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.patcog.2018.11.032","article-title":"Class-specific attribute weighted naive Bayes","volume":"88","author":"Jiang","year":"2019","journal-title":"Pattern Recognition"},{"key":"10.3233\/IDA-216114_ref6","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Machine Learning"},{"key":"10.3233\/IDA-216114_ref7","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1109\/TEMC.2017.2749624","article-title":"Multiple objectives optimization for an EBG common mode filter by using an artificial neural network","volume":"2","author":"Orlandi","year":"2018","journal-title":"IEEE Transactions on Electromagnetic Compatibility"},{"key":"10.3233\/IDA-216114_ref8","first-page":"1","article-title":"Semi-supervised learning for k-dependence Bayesian classifiers","volume":"23","author":"Wang","year":"2021","journal-title":"Applied Intelligence"},{"key":"10.3233\/IDA-216114_ref9","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1016\/j.ins.2013.10.007","article-title":"Domains of competence of the semi-naive Bayesian network classifiers","volume":"260","author":"Flores","year":"2014","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-216114_ref10","doi-asserted-by":"crossref","first-page":"385","DOI":"10.3233\/IDA-194509","article-title":"Efficient heuristics for learning Bayesian network from labeled and unlabeled data","volume":"24","author":"Duan","year":"2020","journal-title":"Intelligent Data Analysis"},{"key":"10.3233\/IDA-216114_ref11","doi-asserted-by":"crossref","unstructured":"J. Pearl and T.S. Verma, A theory of inferred causation, in: Proceedings of the 2nd International Conference on the Principles of Knowledge Representation and Reasoning, Vol. 134, 1995, pp.\u00a0789\u2013811.","DOI":"10.1016\/S0049-237X(06)80074-1"},{"key":"10.3233\/IDA-216114_ref12","doi-asserted-by":"crossref","first-page":"106422","DOI":"10.1016\/j.knosys.2020.106422","article-title":"Learning semi-lazy Bayesian network classifier under the c.i.i.d assumption","volume":"208","author":"Liu","year":"2020","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/IDA-216114_ref13","first-page":"285","article-title":"A Bayesian Approach to Learning Causal Networks","volume":"150","author":"Heckerman","year":"2013","journal-title":"Advances in Decision Analysis: From Foundations to Applications"},{"key":"10.3233\/IDA-216114_ref15","doi-asserted-by":"crossref","unstructured":"D.D. Lewis, Naive (Bayes) at forty: The independence assumption in information retrieval, in: Proceedings of European Conference on Machine Learning, 1998, pp.\u00a04\u201315.","DOI":"10.1007\/BFb0026666"},{"key":"10.3233\/IDA-216114_ref16","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1023\/A:1007465528199","article-title":"Bayesian network classifiers","volume":"29","author":"Friedman","year":"1997","journal-title":"Machine Learning"},{"key":"10.3233\/IDA-216114_ref17","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1007\/s10994-005-4258-6","article-title":"Not so naive Bayes: Aggregating one-dependence estimators","volume":"58","author":"Webb","year":"2005","journal-title":"Machine Learning"},{"key":"10.3233\/IDA-216114_ref18","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1109\/TKDE.2018.2836440","article-title":"A correlation-based feature weighting filter for naive bayes","volume":"31","author":"Jiang","year":"2018","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"10.3233\/IDA-216114_ref19","doi-asserted-by":"crossref","first-page":"641","DOI":"10.3233\/IDA-205125","article-title":"Bagging k-dependence Bayesian network classifiers","volume":"25","author":"Wang","year":"2021","journal-title":"Intelligent Data Analysis"},{"key":"10.3233\/IDA-216114_ref20","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1016\/j.knosys.2011.08.010","article-title":"Improving tree augmented naive bayes for class probability estimation","volume":"26","author":"Jiang","year":"2012","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/IDA-216114_ref21","doi-asserted-by":"crossref","first-page":"106085","DOI":"10.1016\/j.knosys.2020.106085","article-title":"Instance-based weighting filter for superparent one-dependence estimators","volume":"203","author":"Duan","year":"2020","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/IDA-216114_ref22","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1080\/0952813X.2011.639092","article-title":"Weighted average of one-dependence estimators","volume":"24","author":"Jiang","year":"2012","journal-title":"Journal of Experimental and Theoretical Artificial Intelligence"},{"key":"10.3233\/IDA-216114_ref23","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1016\/j.tics.2006.05.009","article-title":"Theory-based Bayesian models of inductive learning and reasoning","volume":"10","author":"Tenenbaum","year":"2006","journal-title":"Trends in Cognitive Sciences"},{"key":"10.3233\/IDA-216114_ref24","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1109\/TIT.1987.1057325","article-title":"Measures of mutual and causal dependence between two time series (Corresp.)","volume":"33","author":"Rissanen","year":"1987","journal-title":"IEEE Transactions on Information Theory"},{"key":"10.3233\/IDA-216114_ref25","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1137\/140956166","article-title":"Causal network inference by optimal causation entropy","volume":"14","author":"Sun","year":"2015","journal-title":"SIAM Journal on Applied Dynamical Systems"},{"key":"10.3233\/IDA-216114_ref26","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1007\/BF00198091","article-title":"A new method of the description of the information flow in the brain structures","volume":"65","author":"Kami\u0144ski","year":"1991","journal-title":"Biological Cybernetics"},{"key":"10.3233\/IDA-216114_ref27","doi-asserted-by":"crossref","first-page":"1273","DOI":"10.1016\/S1053-8119(03)00202-7","article-title":"Dynamic causal modelling","volume":"19","author":"Friston","year":"2003","journal-title":"Neuroimage"},{"key":"10.3233\/IDA-216114_ref28","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.ins.2018.03.038","article-title":"Causal inference for multivariate stochastic process prediction","volume":"448","author":"Cabuz","year":"2018","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-216114_ref29","doi-asserted-by":"crossref","first-page":"204","DOI":"10.1016\/j.ins.2014.06.026","article-title":"Pattern-based causal relationships discovery from event sequences for modeling behavioral user profile in ubiquitous environments","volume":"285","author":"Chikhaoui","year":"2014","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-216114_ref31","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1215\/00318108-110-4-639","article-title":"Causality: Models, Reasoning and Inference","volume":"110","author":"Hitchcock","year":"2001","journal-title":"The Philosophical Review"},{"key":"10.3233\/IDA-216114_ref32","doi-asserted-by":"crossref","first-page":"27887","DOI":"10.1109\/ACCESS.2020.2971706","article-title":"Self-adaptive attribute value weighting for averaged one-dependence estimators","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Access"},{"key":"10.3233\/IDA-216114_ref33","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","article-title":"An introduction to kernel and nearest-neighbor nonparametric regression","volume":"46","author":"Altman","year":"1992","journal-title":"The American Statistician"},{"key":"10.3233\/IDA-216114_ref34","first-page":"267","article-title":"On discriminative Bayesian network classifiers and logistic regression","volume":"59","author":"Roos","year":"2005","journal-title":"Machine Learning"},{"key":"10.3233\/IDA-216114_ref35","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"The Bell System Technical Journal"},{"key":"10.3233\/IDA-216114_ref36","doi-asserted-by":"crossref","first-page":"35","DOI":"10.3233\/IDA-194959","article-title":"A novel approach to fully representing the diversity in conditional dependencies for learning Bayesian network classifier","volume":"25","author":"Wang","year":"2021","journal-title":"Intelligent Data Analysis"},{"key":"10.3233\/IDA-216114_ref37","doi-asserted-by":"crossref","first-page":"8471","DOI":"10.1016\/j.eswa.2010.05.030","article-title":"Automatically computed document dependent weighting factor facility for Na\u00efve Bayes classification","volume":"37","author":"Lee","year":"2010","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/IDA-216114_ref38","unstructured":"M. Sahami, Learning Limited Dependence Bayesian Classifiers, in: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Vol. 96, 1996, pp.\u00a0335\u2013338."},{"key":"10.3233\/IDA-216114_ref39","first-page":"1","article-title":"Scalable learning of Bayesian network classifiers","volume":"17","author":"Mart\u0131nez","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"10.3233\/IDA-216114_ref40","doi-asserted-by":"crossref","first-page":"1361","DOI":"10.1109\/TKDE.2008.234","article-title":"A novel Bayes model: Hidden naive Bayes","volume":"21","author":"Jiang","year":"2008","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"10.3233\/IDA-216114_ref41","unstructured":"E.J. Keogh and M.J. Pazzani, Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches, in: Proceedings of International Workshop on Artificial Intelligence, 1999, pp.\u00a0225\u2013230."},{"key":"10.3233\/IDA-216114_ref42","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1007\/s10994-011-5275-2","article-title":"Subsumption resolution: An efficient and effective technique for semi-naive Bayesian learning","volume":"87","author":"Zheng","year":"2012","journal-title":"Machine Learning"},{"key":"10.3233\/IDA-216114_ref43","unstructured":"Y. Freund, Schapire and R. E, Experiments with a new boosting algorithm, in: Proceedings of the 13th International Conference on Machine Learning, Vol. 96, 1996, pp.\u00a0148\u2013156."},{"key":"10.3233\/IDA-216114_ref44","unstructured":"P. Domingos, Bayesian averaging of classifiers and the overfitting problem, in: Proceedings of the 17th International Conference on Machine Learning, Vol. 747, 2000, pp.\u00a0223\u2013230."},{"key":"10.3233\/IDA-216114_ref46","unstructured":"U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, in: Proceedings of the 13th International Joint Conference on Artificial Intelligence, 1993, pp.\u00a01022\u20131029."},{"key":"10.3233\/IDA-216114_ref47","unstructured":"B. Cestnik, Estimating probabilities: a crucial task in machine learning, in: Proceedings of the 9th European Conference on Artificial Intelligence, Vol. 90, 1990, pp.\u00a0147\u2013149."},{"key":"10.3233\/IDA-216114_ref48","unstructured":"P. Domingos, A unified bias-variance decomposition for zero-one and squared loss, in: Proceedings of the 17th National Conference on Artificial Intelligence, Vol.\u00a034, 2000, pp.\u00a0564\u2013569."},{"key":"10.3233\/IDA-216114_ref49","doi-asserted-by":"crossref","first-page":"679","DOI":"10.1016\/j.ijforecast.2006.03.001","article-title":"Another look at measures of forecast accuracy","volume":"22","author":"Hyndman","year":"2006","journal-title":"International Journal of Forecasting"},{"key":"10.3233\/IDA-216114_ref50","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1016\/j.patcog.2016.08.008","article-title":"Designing multi-label classifiers that maximize F measures: State of the art","volume":"61","author":"Pillai","year":"2017","journal-title":"Pattern Recognition"},{"key":"10.3233\/IDA-216114_ref51","doi-asserted-by":"crossref","first-page":"106627","DOI":"10.1016\/j.knosys.2020.106627","article-title":"Hierarchical Independence Thresholding for learning Bayesian network classifiers","volume":"212","author":"Liu","year":"2021","journal-title":"Knowledge-Based Systems"},{"key":"10.3233\/IDA-216114_ref52","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Dem\u0161ar","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"10.3233\/IDA-216114_ref53","first-page":"2677","article-title":"An Extension on \u201cStatistical Comparisons of Classifiers over Multiple Data Sets\u201d for all Pairwise Comparisons","volume":"9","author":"Garcia","year":"2008","journal-title":"Journal of Machine Learning Research"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-216114","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T17:12:20Z","timestamp":1741626740000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IDA-216114"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,9,5]]},"references-count":50,"journal-issue":{"issue":"5"},"URL":"https:\/\/doi.org\/10.3233\/ida-216114","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"value":"1088-467X","type":"print"},{"value":"1571-4128","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,5]]}}}