{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:31:34Z","timestamp":1760243494332,"version":"build-2065373602"},"reference-count":31,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2013,7,12]],"date-time":"2013-07-12T00:00:00Z","timestamp":1373587200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>We propose a minimum variance unbiased approximation to the conditional relative entropy of the distribution induced by the observed frequency estimates, for multi-classification tasks. Such approximation is an extension of a decomposable scoring criterion, named approximate conditional log-likelihood (aCLL), primarily used for discriminative learning of augmented Bayesian network classifiers. Our contribution is twofold: (i) it addresses multi-classification tasks and not only binary-classification ones; and (ii) it covers broader stochastic assumptions than uniform distribution over the parameters. Specifically, we considered a Dirichlet distribution over the parameters, which was experimentally shown to be a very good approximation to CLL. 
In addition, for Bayesian network classifiers, a closed-form equation is found for the parameters that maximize the scoring criterion.<\/jats:p>","DOI":"10.3390\/e15072716","type":"journal-article","created":{"date-parts":[[2013,7,12]],"date-time":"2013-07-12T10:55:35Z","timestamp":1373626535000},"page":"2716-2735","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Efficient Approximation of the Conditional Relative Entropy with Applications to Discriminative Learning of Bayesian Network Classifiers"],"prefix":"10.3390","volume":"15","author":[{"given":"Alexandra","family":"Carvalho","sequence":"first","affiliation":[{"name":"Department of Electrical Engineering, IST, University of Lisbon, Lisbon 1049-001, Portugal"},{"name":"PIA, Instituto de Telecomunica\u00e7\u00f5es, Lisbon 1049-001, Portugal"}]},{"given":"Pedro","family":"Ad\u00e3o","sequence":"additional","affiliation":[{"name":"Department of Computer Science, IST, University of Lisbon, Lisbon 1049-001, Portugal"},{"name":"SQIG, Instituto de Telecomunica\u00e7\u00f5es, Lisbon 1049-001, Portugal"}]},{"given":"Paulo","family":"Mateus","sequence":"additional","affiliation":[{"name":"SQIG, Instituto de Telecomunica\u00e7\u00f5es, Lisbon 1049-001, Portugal"},{"name":"Department of Mathematics, IST, University of Lisbon, Lisbon 1049-001, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2013,7,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann.","DOI":"10.1016\/B978-0-08-051489-5.50008-4"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1023\/A:1007465528199","article-title":"Bayesian network classifiers","volume":"29","author":"Friedman","year":"1997","journal-title":"Mach. 
Learn."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Grossman, D., and Domingos, P. (2004, January 4\u20138). Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood. Proceedings of the Twenty-first International Conference on Machine Learning, Banff, Alberta, Canada.","DOI":"10.1145\/1015330.1015339"},{"key":"ref_4","unstructured":"Cohen, W.W., and Moore, A. (2006, January 25\u201329). Full Bayesian Network Classifiers. Proceedings of the Twenty-third International Conference on Machine Learning, Pittsburgh, PA, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1023\/A:1007413511361","article-title":"On the optimality of the simple Bayesian classifier under zero-one loss","volume":"29","author":"Domingos","year":"1997","journal-title":"Mach. Learn."},{"key":"ref_6","unstructured":"Dechter, R., and Sutton, R.S. (August, January 28). Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers. Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, Edmonton, Alberta, Canada."},{"key":"ref_7","unstructured":"Cohen, W.W., McCallum, A., and Roweis, S.T. (2008, January 5\u20139). Discriminative Parameter Learning for Bayesian Networks. Proceedings of the Twenty-fifth International Conference on Machine Learning, Helsinki, Finland."},{"key":"ref_8","first-page":"2181","article-title":"Discriminative learning of Bayesian networks via factorized conditional log-likelihood","volume":"12","author":"Carvalho","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Barash, Y., Elidan, G., Friedman, N., and Kaplan, T. (2003, January 10\u201313). Modeling Dependencies in Protein-DNA Binding Sites. 
Proceedings of the Seventh Annual International Conference on Computational Biology, Berlin, Germany.","DOI":"10.1145\/640075.640079"},{"key":"ref_10","first-page":"16","article-title":"Efficient Learning of Bayesian Network Classifiers: An Extension to the TAN Classifier","volume":"Volume 4830","author":"Orgun","year":"2007","journal-title":"Proceedings of the 20th Australian Joint Conference on Artificial Intelligence"},{"key":"ref_11","unstructured":"Carvalho, A.M. (2009). Scoring Functions for Learning Bayesian Networks, INESC-ID. Technical Report."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1109\/TSMCA.2002.803772","article-title":"Comparison of score metrics for Bayesian network learning","volume":"32","author":"Yang","year":"2002","journal-title":"IEEE Trans. Syst. Man. Cybern. A"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00994016","article-title":"Learning Bayesian networks: The combination of knowledge and statistical data","volume":"20","author":"Heckerman","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_14","first-page":"2149","article-title":"A scoring function for learning Bayesian networks based on mutual information and conditional independence tests","volume":"7","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"ref_15","unstructured":"Silander, T., Roos, T., Kontkanen, P., and Myllym\u00e4ki, P. (2008, January 17\u201319). Bayesian Network Structure Learning using Factorized NML Universal Models. Proceedings of the Fourth European Workshop on Probabilistic Graphical Models, Hirtshals, Denmark."},{"key":"ref_16","first-page":"1287","article-title":"Large-sample learning of Bayesian networks is NP-hard","volume":"5","author":"Chickering","year":"2004","journal-title":"J. Mach. Learn. 
Res."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1016\/0004-3702(93)90036-B","article-title":"Approximating probabilistic inference in Bayesian belief networks is NP-hard","volume":"60","author":"Dagum","year":"1993","journal-title":"Artif. Intell."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"233","DOI":"10.6028\/jres.071B.032","article-title":"Optimum branchings","volume":"71","author":"Edmonds","year":"1967","journal-title":"J. Res. Nat. Bur. Stand."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1109\/TIT.1968.1054142","article-title":"Approximating discrete probability distributions with dependence trees","volume":"14","author":"Chow","year":"1968","journal-title":"IEEE Trans. Inform. Theory"},{"key":"ref_20","unstructured":"Chickering, D.M. (1996). Learning from Data: AI and Statistics V, Springer."},{"key":"ref_21","unstructured":"Boutilier, C., and Goldszmidt, M. (July, January 30). Dynamic Bayesian Multinets. Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, CA, USA."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Johnson, R.A., and Wichern, D.W. (2007). Applied Multivariate Statistical Analysis, Prentice Hall.","DOI":"10.1002\/9780470061572.eqr239"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Heckerman, D. (1995). A Tutorial on Learning Bayesian Networks, Microsoft. Technical Report MSR-TR-95-06, Microsoft Research.","DOI":"10.1016\/B978-1-55860-377-6.50079-7"},{"key":"ref_24","unstructured":"Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques, MIT Press."},{"key":"ref_25","unstructured":"Bonissone, P.P., Henrion, M., Kanal, L.N., and Lemmer, J.F. (1990, January 27\u201329). Equivalence and Synthesis of Causal Models. 
Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence, Cambridge, MA, USA."},{"key":"ref_26","first-page":"445","article-title":"Learning equivalence classes of Bayesian-network structures","volume":"2","author":"Chickering","year":"2002","journal-title":"J. Mach. Learn. Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1016\/S0019-9958(59)90207-4","article-title":"Approximating probability distributions to reduce storage requirements","volume":"2","author":"Lewis","year":"1959","journal-title":"Inform. Control"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Cover, T., and Thomas, J. (2006). Elements of Information Theory, John Wiley & Sons.","DOI":"10.1002\/047174882X"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Pemmaraju, S.V., and Skiena, S.S. (2003). Computational Discrete Mathematics: Combinatorics and Graph Theory with Mathematica, Cambridge University Press.","DOI":"10.1017\/CBO9781139164849"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1093\/nar\/29.1.281","article-title":"The TRANSFAC system on gene expression regulation","volume":"29","author":"Wingender","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"ref_31","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Demsar","year":"2006","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/15\/7\/2716\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T21:47:55Z","timestamp":1760219275000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/15\/7\/2716"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2013,7,12]]},"references-count":31,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2013,7]]}},"alternative-id":["e15072716"],"URL":"https:\/\/doi.org\/10.3390\/e15072716","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2013,7,12]]}}}