{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T16:22:56Z","timestamp":1770135776224,"version":"3.49.0"},"reference-count":22,"publisher":"Oxford University Press (OUP)","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Mutual information (MI) is a quantity that measures the dependence between two arbitrary random variables and has been repeatedly used to solve a wide variety of bioinformatic problems. Recently, when attempting to quantify the effects of sampling variance on computed values of MI in proteins, we encountered striking differences among various novel estimates of MI. These differences revealed that estimating the \u2018true\u2019 value of MI is not a straightforward procedure, and minor variations of assumptions yielded remarkably different estimates.<\/jats:p>\n               <jats:p>Results: We describe four formally equivalent estimates of MI, three of which explicitly account for sampling variance, that yield non-equal values of MI given exact frequencies. These MI estimates are essentially non-predictive of each other, converging only in the limit of implausibly large datasets. Lastly, we show that all four estimates are biologically reasonable estimates of MI, despite their disparity, since each is actually the Kullback\u2013Leibler divergence between random variables conditioned on equally plausible hypotheses.<\/jats:p>\n               <jats:p>Conclusions: For sparse contingency tables of the type universally observed in protein coevolution studies, our results show that estimates of MI, and hence inferences about physical phenomena such as coevolution, are critically dependent on at least three prior assumptions. 
These assumptions are: (i) how observation counts relate to expected frequencies; (ii) the relationship between joint and marginal frequencies; and (iii) how non-observed categories are interpreted. In any biologically relevant data, these assumptions will affect the MI estimate as much as, or more so than, observed data, and are independent of uncertainty in frequency parameters.<\/jats:p>\n               <jats:p>Contact: andrew@fernandes.org<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq111","type":"journal-article","created":{"date-parts":[[2010,3,18]],"date-time":"2010-03-18T00:14:28Z","timestamp":1268871268000},"page":"1135-1139","source":"Crossref","is-referenced-by-count":19,"title":["Mutual information is critically dependent on prior assumptions: would the <i>correct<\/i> estimate of mutual information please identify itself?"],"prefix":"10.1093","volume":"26","author":[{"given":"Andrew D.","family":"Fernandes","sequence":"first","affiliation":[{"name":"1 Department of Biochemistry, The University of Western Ontario, London, ON N6A 5C1 and 2 Department of Applied Mathematics, The University of Western Ontario, London, ON N6A 5B7, Canada"}]},{"given":"Gregory B.","family":"Gloor","sequence":"additional","affiliation":[{"name":"1 Department of Biochemistry, The University of Western Ontario, London, ON N6A 5C1 and 2 Department of Applied Mathematics, The University of Western Ontario, London, ON N6A 5B7, Canada"}]}],"member":"286","published-online":{"date-parts":[[2010,3,17]]},"reference":[{"key":"2023012508074648000_B1","volume-title":"The statistical analysis of compositional data. 
Monographs on statistics and applied probability.","author":"Aitchison","year":"1986"},{"key":"2023012508074648000_B2","doi-asserted-by":"crossref","first-page":"164","DOI":"10.1093\/oxfordjournals.molbev.a026229","article-title":"Correlations among amino acid sites in bhlh protein domains: an information theoretic analysis","volume":"17","author":"Atchley","year":"2000","journal-title":"Mol. Biol. Evol."},{"key":"2023012508074648000_B3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1093\/biomet\/79.1.25","article-title":"Ordered group reference priors with application to the multinomial problem","volume":"79","author":"Berger","year":"1992","journal-title":"Biometrika"},{"key":"2023012508074648000_B4","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1214\/07-AOS587","article-title":"The formal definition of reference priors","volume":"37","author":"Berger","year":"2009","journal-title":"Ann. Stat."},{"key":"2023012508074648000_B5","doi-asserted-by":"crossref","first-page":"342","DOI":"10.1261\/rna.2164906","article-title":"RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers","volume":"12","author":"Bindewald","year":"2006","journal-title":"RNA"},{"key":"2023012508074648000_B6","doi-asserted-by":"crossref","first-page":"1125","DOI":"10.1093\/bioinformatics\/btp135","article-title":"Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information","volume":"25","author":"Buslje","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508074648000_B7","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1186\/1471-2148-8-106","article-title":"Reducing the false positive rate in the non-parametric analysis of molecular coevolution","volume":"8","author":"Codo\u00f1er","year":"2008","journal-title":"BMC Evol. 
Biol."},{"key":"2023012508074648000_B8","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1093\/bioinformatics\/btm604","article-title":"Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction","volume":"24","author":"Dunn","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508074648000_B9","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological sequence analysis: Probabilistic models of proteins and nucleic acids.","author":"Durbin","year":"1998"},{"key":"2023012508074648000_B10","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1023\/A:1023818214614","article-title":"Isometric logratio transformations for compositional data analysis","volume":"35","author":"Egozcue","year":"2003","journal-title":"Math. Geol."},{"key":"2023012508074648000_B11","first-page":"135","article-title":"Using substitution probabilities to improve position-specific scoring matrices","volume":"12","author":"Henikoff","year":"1996","journal-title":"Comput. Appl. Biosci."},{"key":"2023012508074648000_B12","doi-asserted-by":"crossref","first-page":"633","DOI":"10.1016\/j.csda.2004.03.010","article-title":"Distribution of mutual information from complete and incomplete data","volume":"48","author":"Hutter","year":"2005","journal-title":"Comput. Stat. Data Anal."},{"key":"2023012508074648000_B13","doi-asserted-by":"crossref","first-page":"7176","DOI":"10.1073\/pnas.90.15.7176","article-title":"Covariation of mutations in the v3 loop of human immunodeficiency virus type 1 envelope protein: an information theoretic analysis","volume":"90","author":"Korber","year":"1993","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023012508074648000_B14","volume-title":"Information theory and statistics.","author":"Kullback","year":"1978"},{"key":"2023012508074648000_B15","doi-asserted-by":"crossref","first-page":"4116","DOI":"10.1093\/bioinformatics\/bti671","article-title":"Using information theory to search for co-evolving residues in proteins","volume":"21","author":"Martin","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012508074648000_B16","doi-asserted-by":"crossref","first-page":"10938","DOI":"10.1073\/pnas.0701900104","article-title":"An empirical test of the concomitantly variable codon hypothesis","volume":"104","author":"Merlo","year":"2007","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508074648000_B17","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1186\/1471-2105-9-461","article-title":"minet: A R\/Bioconductor package for inferring large transcriptional networks using mutual information","volume":"9","author":"Meyer","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508074648000_B18","doi-asserted-by":"crossref","first-page":"939","DOI":"10.1093\/nar\/gkn1019","article-title":"Pseudocounts for transcription factor binding sites","volume":"37","author":"Nishida","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012508074648000_B19","article-title":"R: A Language and Environment for Statistical Computing","volume-title":"R Foundation for Statistical Computing","author":"R Development Core Team","year":"2009"},{"key":"2023012508074648000_B20","doi-asserted-by":"crossref","first-page":"933","DOI":"10.1093\/bioinformatics\/btm055","article-title":"Position dependencies in transcription factor binding sites","volume":"23","author":"Tomovic","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508074648000_B21","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1093\/oxfordjournals.molbev.a003851","article-title":"A general empirical model of protein evolution derived from multiple 
protein families using a maximum-likelihood approach","volume":"18","author":"Whelan","year":"2001","journal-title":"Mol. Biol. Evol."},{"key":"2023012508074648000_B22","doi-asserted-by":"crossref","first-page":"3288","DOI":"10.1073\/pnas.97.7.3288","article-title":"Separation of phylogenetic and functional associations in biological sequences by using the parametric bootstrap","volume":"97","author":"Wollenberg","year":"2000","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/9\/1135\/48855570\/bioinformatics_26_9_1135.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/9\/1135\/48855570\/bioinformatics_26_9_1135.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:11:26Z","timestamp":1674634286000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/9\/1135\/199846"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,3,17]]},"references-count":22,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2010,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq111","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,5,1]]},"published":{"date-parts":[[2010,3,17]]}}}