{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,12]],"date-time":"2026-04-12T17:12:20Z","timestamp":1776013940859,"version":"3.50.1"},"reference-count":24,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The study of complex biological relationships is aided by large and high-dimensional data sets whose analysis often involves dimension reduction to highlight representative or informative directions of variation. In principle, information theory provides a general framework for quantifying complex statistical relationships for dimension reduction. Unfortunately, direct estimation of high-dimensional information theoretic quantities, such as entropy and mutual information (MI), is often unreliable given the relatively small sample sizes available for biological problems. Here, we develop and evaluate a hierarchy of approximations for high-dimensional information theoretic statistics from associated low-order terms, which can be more reliably estimated from limited samples. Due to a relationship between this metric and the minimum spanning tree over a graph representation of the system, we refer to these approximations as MIST (Maximum Information Spanning Trees).<\/jats:p>\n               <jats:p>Results: The MIST approximations are examined in the context of synthetic networks with analytically computable entropies and using experimental gene expression data as a basis for the classification of multiple cancer types. 
The approximations result in significantly more accurate estimates of entropy and MI, and also correlate better with biological classification error than direct estimation and another low-order approximation, minimum-redundancy\u2013maximum-relevance (mRMR).<\/jats:p>\n               <jats:p>Availability: Software to compute the entropy approximations described here is available as Supplementary Material.<\/jats:p>\n               <jats:p>Contact: \u00a0tidor@mit.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp109","type":"journal-article","created":{"date-parts":[[2009,3,5]],"date-time":"2009-03-05T01:56:49Z","timestamp":1236218209000},"page":"1165-1172","source":"Crossref","is-referenced-by-count":90,"title":["MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets"],"prefix":"10.1093","volume":"25","author":[{"given":"Bracken M.","family":"King","sequence":"first","affiliation":[{"name":"1 Computer Science and Artificial Intelligence Laboratory, 2Department of Biological Engineering and 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA"},{"name":"1 Computer Science and Artificial Intelligence Laboratory, 2Department of Biological Engineering and 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA"}]},{"given":"Bruce","family":"Tidor","sequence":"additional","affiliation":[{"name":"1 Computer Science and Artificial Intelligence Laboratory, 2Department of Biological Engineering and 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA"},{"name":"1 Computer Science and Artificial Intelligence Laboratory, 2Department of 
Biological Engineering and 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA"},{"name":"1 Computer Science and Artificial Intelligence Laboratory, 2Department of Biological Engineering and 3Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,3,4]]},"reference":[{"key":"2023013110275633500_B1","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110275633500_B2","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1093\/bioinformatics\/btl008","article-title":"Genetic test bed for feature selection","volume":"22","author":"Choudhary","year":"2006","journal-title":"Bioinformatics"},{"key":"2023013110275633500_B3","volume-title":"Introduction to Algorithms.","author":"Cormen","year":"2001","edition":"2"},{"key":"2023013110275633500_B4","volume-title":"Elements of Information Theory.","author":"Cover","year":"2006","edition":"2"},{"key":"2023013110275633500_B5","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1142\/S0219720005001004","article-title":"Minimum redundancy feature selection from microarray gene expression data","volume":"3","author":"Ding","year":"2005","journal-title":"J. Bioinform. Comput. 
Biol."},{"key":"2023013110275633500_B6","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1093\/bioinformatics\/btm486","article-title":"Monte Carlo feature selection for supervised classification","volume":"24","author":"Draminski","year":"2008","journal-title":"Bioinformatics"},{"key":"2023013110275633500_B7","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1142\/S0219720005001533","article-title":"An integrated feature selection and classification method to select minimum number of variables on the case study of gene expression data","volume":"3","author":"Goh","year":"2005","journal-title":"J. Bioinform. Comput. Biol."},{"key":"2023013110275633500_B8","first-page":"104","article-title":"Advances in Information Systems","volume-title":"Lecture Notes in Computer Science.","author":"Gokcen","year":"2002"},{"key":"2023013110275633500_B9","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1126\/science.286.5439.531","article-title":"Molecular classification of cancer: class discovery and class prediction by gene expression monitoring","volume":"286","author":"Golub","year":"1999","journal-title":"Science"},{"key":"2023013110275633500_B10","doi-asserted-by":"crossref","first-page":"1646","DOI":"10.1126\/science.1116598","article-title":"A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis","volume":"310","author":"Janes","year":"2005","journal-title":"Science"},{"key":"2023013110275633500_B11","doi-asserted-by":"crossref","first-page":"e4","DOI":"10.1371\/journal.pcbi.0030004","article-title":"Modeling HER2 effects on cell behavior from mass spectrometry phosphotyrosine data","volume":"3","author":"Kumar","year":"2007","journal-title":"PLoS Comput. Biol."},{"key":"2023013110275633500_B12","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/s11030-006-9042-4","article-title":"JEDA: joint entropy diversity analysis. 
An information-theoretic method for choosing diverse and representative subsets from combinatorial libraries","volume":"10","author":"Landon","year":"2006","journal-title":"Mol. Divers."},{"key":"2023013110275633500_B13","first-page":"18","article-title":"REVEAL, a general reverse engineering algorithm for inference of genetic network architectures","volume":"3","author":"Liang","year":"1998","journal-title":"Pac. Symp. Biocomput."},{"key":"2023013110275633500_B14","doi-asserted-by":"crossref","first-page":"2691","DOI":"10.1093\/bioinformatics\/bti419","article-title":"Multiclass cancer classification and biomarker discovery using GA-based algorithms","volume":"21","author":"Liu","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110275633500_B15","volume-title":"Information Theory, Inference, and Learning Algorithms.","author":"MacKay","year":"2003"},{"key":"2023013110275633500_B16","doi-asserted-by":"crossref","first-page":"79879","DOI":"10.1155\/2007\/79879","article-title":"Information-theoretic inference of large transcriptional regulatory networks","volume":"2007","author":"Meyer","year":"2007","journal-title":"EURASIP J. Bioinform. Syst. 
Biol."},{"key":"2023013110275633500_B17","doi-asserted-by":"crossref","first-page":"636","DOI":"10.1007\/978-3-540-44871-6_74","article-title":"On the relationship between classification error bounds and training criteria in statistical pattern recognition","volume-title":"Pattern Recognition and Image Analysis.","author":"Ney","year":"2003"},{"key":"2023013110275633500_B18","doi-asserted-by":"crossref","first-page":"1191","DOI":"10.1162\/089976603321780272","article-title":"Estimation of entropy and mutual information","volume":"15","author":"Paninski","year":"2003","journal-title":"Neural Comput."},{"key":"2023013110275633500_B19","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"2023013110275633500_B20","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1002\/j.1538-7305.1948.tb01338.x","article-title":"A mathematical theory of communication","volume":"27","author":"Shannon","year":"1948","journal-title":"Bell Syst. Tech. J."},{"key":"2023013110275633500_B21","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1016\/S1535-6108(02)00030-2","article-title":"Gene expression correlates of clinical prostate cancer behavior","volume":"1","author":"Singh","year":"2002","journal-title":"Cancer Cell"},{"key":"2023013110275633500_B22","doi-asserted-by":"crossref","first-page":"18297","DOI":"10.1073\/pnas.0507432102","article-title":"Information-based clustering","volume":"102","author":"Slonim","year":"2005","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023013110275633500_B23","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1021\/pr0606530","article-title":"The art and practice of systems biology in medicine: mapping patterns of relationships","volume":"6","author":"van der","year":"2007","journal-title":"J. Proteome Res"},{"key":"2023013110275633500_B24","doi-asserted-by":"crossref","first-page":"1999","DOI":"10.1056\/NEJMoa021967","article-title":"A gene-expression signature as a predictor of survival in breast cancer","volume":"347","author":"van de","year":"2002","journal-title":"N. Engl. J. Med."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1165\/48983661\/bioinformatics_25_9_1165.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/9\/1165\/48983661\/bioinformatics_25_9_1165.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T20:33:05Z","timestamp":1675197185000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/9\/1165\/203654"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,3,4]]},"references-count":24,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2009,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp109","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,5,1]]},"published":{"date-parts":[[2009,3,4]]}}}