{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:38:15Z","timestamp":1760243895536,"version":"build-2065373602"},"reference-count":52,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2010,5,5]],"date-time":"2010-05-05T00:00:00Z","timestamp":1273017600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Considerable research efforts have been devoted to probabilistic modeling of genetic population structures within the past decade. In particular, a wide spectrum of Bayesian models have been proposed for unlinked molecular marker data from diploid organisms. Here we derive a theoretical framework for learning genetic population structure of a haploid organism from bi-allelic markers for which potential patterns of dependence are a priori unknown and to be explicitly incorporated in the model. Our framework is based on the principle of minimizing stochastic complexity of an unsupervised classification under tree augmented factorization of the predictive data distribution. We discuss a fast implementation of the learning framework using deterministic algorithms.<\/jats:p>","DOI":"10.3390\/e12051102","type":"journal-article","created":{"date-parts":[[2010,5,5]],"date-time":"2010-05-05T11:38:16Z","timestamp":1273059496000},"page":"1102-1124","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning Genetic Population Structures Using Minimization of Stochastic Complexity"],"prefix":"10.3390","volume":"12","author":[{"given":"Jukka","family":"Corander","sequence":"first","affiliation":[{"name":"Department of Mathematics and statistics, University of Helsinki, P.O.Box 68, FIN-00014 University of Helsinki, Finland"},{"name":"Department of Mathematics, \u00c5bo Akademi University, FIN-20500 \u00c5bo, Finland"}]},{"given":"Mats","family":"Gyllenberg","sequence":"additional","affiliation":[{"name":"Department of Mathematics and statistics, University of Helsinki, P.O.Box 68, FIN-00014 University of Helsinki, Finland"}]},{"given":"Timo","family":"Koski","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Royal Institute of Technology, S-100 44 Stockholm, Sweden"}]}],"member":"1968","published-online":{"date-parts":[[2010,5,5]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Ewens, W.J. (2004). Mathematical Population Genetics, Springer-Verlag. [2nd ed.].","DOI":"10.1007\/978-0-387-21822-9"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Nagylaki, T. (1992). Theoretical Population Genetics, Springer-Verlag.","DOI":"10.1007\/978-3-642-76214-7"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"945","DOI":"10.1093\/genetics\/155.2.945","article-title":"Inference of population structure using multilocus genotype data","volume":"155","author":"Pritchard","year":"2000","journal-title":"Genetics"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1017\/S001667230100502X","article-title":"A Bayesian approach to the identification of panmictic populations and the assignment of individuals","volume":"78","author":"Dawson","year":"2001","journal-title":"Genet. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"367","DOI":"10.1093\/genetics\/163.1.367","article-title":"Bayesian analysis of genetic differentiation between populations","volume":"163","author":"Corander","year":"2003","journal-title":"Genetics"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2833","DOI":"10.1111\/j.1365-294X.2006.02994.x","article-title":"Bayesian identification of admixture events using multi-locus molecular markers","volume":"15","author":"Corander","year":"2006","journal-title":"Mol. Ecol."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"797","DOI":"10.1007\/s11538-006-9161-1","article-title":"Random Partition models and Exchangeability for Bayesian Identification of Population Structure","volume":"69","author":"Corander","year":"2007","journal-title":"Bull. Math. Biol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1567","DOI":"10.1093\/genetics\/164.4.1567","article-title":"Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies","volume":"164","author":"Falush","year":"2003","journal-title":"Genetics"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"1261","DOI":"10.1534\/genetics.104.033803","article-title":"A spatial statistical model for landscape genetics","volume":"170","author":"Guillot","year":"2005","journal-title":"Genetics"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"4734","DOI":"10.1111\/j.1365-294X.2009.04410.x","article-title":"Statistical methods in spatial genetics","volume":"18","author":"Guillot","year":"2010","journal-title":"Mol. Ecol."},{"key":"ref_11","unstructured":"Capasso, V. (2003). Mathematical Modelling and Computing in Biology and Medicine, Progetto Leonardo."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1109\/TIT.1968.1054142","article-title":"Approximating Discrete Probability Distributions with Dependence Trees","volume":"14","author":"Chow","year":"1968","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1023\/A:1007465528199","article-title":"Bayesian Network Classifiers","volume":"29","author":"Friedman","year":"1997","journal-title":"Mach. Learn."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/j.mbs.2006.09.015","article-title":"Bayesian analysis of population structure based on linked molecular information","volume":"205","author":"Corander","year":"2007","journal-title":"Math. Biosci."},{"key":"ref_15","unstructured":"Cowell, R.G., Dawid, A.P., Lauritzen, S.L., and Spiegelhalter, D.J. (1999). Probabilistic Networks and Expert Systems, Springer-Verlag."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Koski, T., and Noble, J.N. (2009). Bayesian Networks: an Introduction, Wiley.","DOI":"10.1002\/9780470684023"},{"key":"ref_17","first-page":"1","article-title":"Learning with Mixtures of Trees","volume":"1","author":"Meil","year":"2000","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann."},{"key":"ref_19","unstructured":"Becker, A., Geiger, D., and Meek, C. (2000). Perfect Tree-like Markovian Distributions. Proc. 16th Conf. Uncertainty in Artificial Intelligence, 19\u201323."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"197","DOI":"10.1007\/BF00994016","article-title":"Learning Bayesian Networks: The combination of knowledge and statistical data","volume":"20","author":"Heckerman","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_21","unstructured":"Heckerman, D., Geiger, D., and Chickering, D.M. Likelihoods and Parameter Priors for Bayesian Networks. Microsoft Res. Tech. Rep., MSR-TR-95-54."},{"key":"ref_22","unstructured":"Duda, R.O., and Hart, P.E. (1973). Pattern Classification and Scene Analysis, Wiley."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/0378-3758(94)90153-8","article-title":"Jeffreys\u2019 prior is asymptotically least favorable under entropy risk","volume":"41","author":"Clarke","year":"1994","journal-title":"J. Stat. Planning Inference"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/18.481776","article-title":"Fisher Information and Stochastic Complexity","volume":"42","author":"Rissanen","year":"1996","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_25","unstructured":"Cover, T.M., and Thomas, J.A. (1991). Elements of Information Theory, Wiley."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/S0025-5564(01)00096-7","article-title":"Bayesian Predictiveness, Exchangeability and Sufficientness in Bacterial Taxonomy","volume":"177 & 178","author":"Gyllenberg","year":"2002","journal-title":"Math. Biosci."},{"key":"ref_27","unstructured":"DeGroot, M.H. (1970). Optimal Statistical Decisions, McGraw-Hill."},{"key":"ref_28","first-page":"2237","article-title":"Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties","volume":"82","author":"Suzuki","year":"1999","journal-title":"IEICE Trans. Fundamentals"},{"key":"ref_29","unstructured":"Ku\u010dera, L. (1990). Combinatorial Algorithms, Adam Hilger."},{"key":"ref_30","first-page":"461","article-title":"Estimating the Dimension of a Model","volume":"6","author":"Schwartz","year":"1978","journal-title":"Ann. Statist."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1006\/jmva.1997.1687","article-title":"Classification of Binary Vectors by Stochastic Complexity","volume":"63","author":"Gyllenberg","year":"1997","journal-title":"J. Multiv. Analysis"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"928","DOI":"10.1080\/01621459.1995.10476592","article-title":"A Reference Bayesian Test for Nested Hypotheses and Its Relationship to the Schwartz criterion","volume":"90","author":"Kass","year":"1995","journal-title":"J. Amer. Stat. Assoc."},{"key":"ref_33","unstructured":"Drton, M., Sturmfels, B., and Sullivant, S. (2005). Lectures on Algebraic Statistics, Birkh\u00e4user."},{"key":"ref_34","first-page":"1","article-title":"Asymptotic Model Selection for Naive Bayesian Networks","volume":"6","author":"Rusakov","year":"2005","journal-title":"J. Mach. Learn. Res."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1109\/34.865189","article-title":"Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood","volume":"28","author":"Biernacki","year":"2000","journal-title":"IEEE Trans. Patt. Anal. Mach. Intel."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1214\/aos\/1176350709","article-title":"On the Choice of the Model to Fit Data from an Exponential Family","volume":"16","author":"Haughton","year":"1988","journal-title":"Ann. Statist."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1109\/34.21803","article-title":"Comments on the Approximating Discrete Probability Distributions with Dependence Trees","volume":"11","author":"Wong","year":"1989","journal-title":"IEEE Trans. Patt. Anal. Mach. Intel."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1866","DOI":"10.1109\/TPAMI.2007.1184","article-title":"On the Relationship between Dependence Tree Classification Error and Bayes Error Rate","volume":"29","author":"Balagani","year":"2007","journal-title":"IEEE Trans. Patt. Anal. Mach. Intel."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1006\/jcss.1997.1501","article-title":"Stochastic Complexity in Learning","volume":"55","author":"Rissanen","year":"1997","journal-title":"J. Comp. System Sci."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1109\/18.825807","article-title":"Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity","volume":"46","author":"Li","year":"2000","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"1424","DOI":"10.1109\/18.681319","article-title":"A decision-theoretic extension of stochastic complexity and its applications to learning","volume":"44","author":"Yamanishi","year":"1998","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1006\/bulm.1998.0076","article-title":"Bayesian Predictive Identification and Cumulative Classification of Bacteria","volume":"61","author":"Gyllenberg","year":"1999","journal-title":"Bull. Math. Biol."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Jordan, M. (1997). Learning in Graphical Models, MIT Press.","DOI":"10.1007\/978-94-011-5014-9"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1111\/j.1467-8640.1994.tb00166.x","article-title":"Learning Bayesian Belief Networks: An Approach Based on the MDL Principle","volume":"10","author":"Lam","year":"1994","journal-title":"Comput. Intel."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1109\/69.494161","article-title":"A guide to the literature on learning probabilistic networks from data","volume":"8","author":"Buntine","year":"1996","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_46","first-page":"31","article-title":"Learning causal networks from data: a survey and a new algorithm for recovering possibilistic causal networks","volume":"10","year":"1997","journal-title":"AI Commun."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1109\/TIT.1973.1055013","article-title":"Consistency of an estimate of tree-dependent probability distributions","volume":"19","author":"Chow","year":"1973","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1007\/BF00994110","article-title":"A Bayesian Method for the Induction of Probabilistic Networks from Data","volume":"9","author":"Cooper","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"V. Fisher, D., and Lenz, H-J. (1996). Learning from Data. Artificial Intelligence and Statistics, Springer-Verlag.","DOI":"10.1007\/978-1-4612-2404-4"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1007\/s11222-006-9391-y","article-title":"Bayesian model learning based on a parallel MCMC strategy","volume":"16","author":"Corander","year":"2006","journal-title":"Stat. Comput."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1007\/s10618-008-0099-9","article-title":"Parallell interacting MCMC for learning of topologies of graphical models","volume":"17","author":"Corander","year":"2008","journal-title":"Data Mining Knowl. Discovery"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/s11634-009-0036-9","article-title":"Bayesian unsupervised classification framework based on stochastic partitions of data and a parallel search strategy","volume":"3","author":"Corander","year":"2009","journal-title":"Adv. Data Anal. Classification"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/12\/5\/1102\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T22:02:21Z","timestamp":1760220141000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/12\/5\/1102"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,5,5]]},"references-count":52,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2010,5]]}},"alternative-id":["e12051102"],"URL":"https:\/\/doi.org\/10.3390\/e12051102","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2010,5,5]]}}}