{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T04:22:10Z","timestamp":1772252530408,"version":"3.50.1"},"reference-count":61,"publisher":"MDPI AG","issue":"10","license":[{"start":{"date-parts":[[2019,10,12]],"date-time":"2019-10-12T00:00:00Z","timestamp":1570838400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01 GM088344"],"award-info":[{"award-number":["R01 GM088344"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["F32 GM130113"],"award-info":[{"award-number":["F32 GM130113"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Homologous sequence alignments contain important information about the constraints that shape protein family evolution. Correlated changes between different residues, for instance, can be highly predictive of physical contacts within three-dimensional structures. Detecting such co-evolutionary signals via direct coupling analysis is particularly challenging given the shared phylogenetic history and uneven sampling of different lineages from which protein sequences are derived. Current best practices for mitigating such effects include sequence-identity-based weighting of input sequences and post-hoc re-scaling of evolutionary coupling scores. However, numerous weighting schemes have been previously developed for other applications, and it is unknown whether any of these schemes may better account for phylogenetic artifacts in evolutionary coupling analyses. 
Here, we show across a dataset of 150 diverse protein families that the current best practices out-perform several alternative sequence- and tree-based weighting methods. Nevertheless, we find that sequence weighting in general provides only a minor benefit relative to post-hoc transformations that re-scale the derived evolutionary couplings. While our findings do not rule out the possibility that an as-yet-untested weighting method may show improved results, the similar predictive accuracies that we observe across conceptually distinct weighting methods suggest that there may be little room for further improvement on top of existing strategies.<\/jats:p>","DOI":"10.3390\/e21101000","type":"journal-article","created":{"date-parts":[[2019,10,14]],"date-time":"2019-10-14T03:54:13Z","timestamp":1571025253000},"page":"1000","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["Phylogenetic Weighting Does Little to Improve the Accuracy of Evolutionary Coupling Analyses"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9476-0104","authenticated-orcid":false,"given":"Adam J.","family":"Hockenberry","sequence":"first","affiliation":[{"name":"Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7470-9261","authenticated-orcid":false,"given":"Claus O.","family":"Wilke","sequence":"additional","affiliation":[{"name":"Department of Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1002\/prot.340180402","article-title":"Correlated Mutations and Residue Contacts in 
Proteins","volume":"18","author":"Gobel","year":"1994","journal-title":"Proteins"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.7554\/eLife.03430","article-title":"Sequence co-evolution gives 3D contacts and structures of protein complexes","volume":"3","author":"Hopf","year":"2014","journal-title":"eLife"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation effects predicted from sequence co-variation","volume":"35","author":"Hopf","year":"2017","journal-title":"Nat. Biotechnol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"15674","DOI":"10.1073\/pnas.1314045110","article-title":"Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era","volume":"110","author":"Kamisetty","year":"2013","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"294","DOI":"10.1126\/science.aah4043","article-title":"Protein structure determination using metagenome sequence data","volume":"355","author":"Ovchinnikov","year":"2017","journal-title":"Science"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1214\/lnms\/1215455556","article-title":"Correlated mutations in models of protein sequences: Phylogenetic and structural effects","volume":"33","author":"Lapedes","year":"1999","journal-title":"Stat. Mol. Biol. Genet."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Burger, L., and Van Nimwegen, E. (2008). Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol. Syst. Biol., 4.","DOI":"10.1038\/msb4100203"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1073\/pnas.0805923106","article-title":"Identification of direct residue contacts in protein-protein interaction by message passing","volume":"106","author":"Weigt","year":"2009","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Burger, L., and Van Nimwegen, E. (2010). Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol., 6.","DOI":"10.1371\/journal.pcbi.1000633"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Marks, D.S., Colwell, L.J., Sheridan, R., Hopf, T.A., Pagnani, A., Zecchina, R., and Sander, C. (2011). Protein 3D structure computed from evolutionary sequence variation. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0028766"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"Morcos","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1103\/PhysRevE.87.012707","article-title":"Improved contact prediction in proteins: Using pseudolikelihoods to infer Potts models","volume":"87","author":"Ekeberg","year":"2013","journal-title":"Phys. Rev. 
E"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"3128","DOI":"10.1093\/bioinformatics\/btu500","article-title":"CCMpred - Fast and precise prediction of protein residue-residue contacts from correlated mutations","volume":"30","author":"Seemayer","year":"2014","journal-title":"Bioinformatics"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"999","DOI":"10.1093\/bioinformatics\/btu791","article-title":"MetaPSICOV: Combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins","volume":"31","author":"Jones","year":"2015","journal-title":"Bioinformatics"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1018","DOI":"10.1093\/molbev\/msy007","article-title":"How pairwise coevolutionary models capture the collective residue variability in proteins?","volume":"35","author":"Figliuzzi","year":"2018","journal-title":"Mol. Biol. Evol."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Vorberg, S., Seemayer, S., and S\u00f6ding, J. (2018). Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput. Biol., 14.","DOI":"10.1101\/344333"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"1582","DOI":"10.1093\/bioinformatics\/bty862","article-title":"The EVcouplings Python framework for coevolutionary sequence analysis","volume":"35","author":"Hopf","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_18","first-page":"1","article-title":"Evolutionary couplings detect side-chain interactions","volume":"e7280","author":"Hockenberry","year":"2019","journal-title":"PeerJ"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"20533","DOI":"10.1073\/pnas.1315625110","article-title":"Coevolutionary signals across protein lineages help capture multiple protein conformations","volume":"110","author":"Morcos","year":"2013","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"12180","DOI":"10.1073\/pnas.1606762113","article-title":"Inferring interaction partners from protein sequences","volume":"113","author":"Bitbol","year":"2016","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"E2662","DOI":"10.1073\/pnas.1615068114","article-title":"Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis","volume":"114","author":"Uguzzoni","year":"2017","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1126\/science.aaw6718","article-title":"Protein interaction networks revealed by proteome coevolution","volume":"365","author":"Cong","year":"2019","journal-title":"Science"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0169-5347(01)02381-3","article-title":"Taxonomic chauvinism","volume":"17","author":"Bonnet","year":"2002","journal-title":"Trends Ecol. Evol."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chen, C., Natale, D.A., Finn, R.D., Huang, H., Zhang, J., Wu, C.H., and Mazumder, R. (2011). Representative Proteomes: A Stable, Scalable and Unbiased proteome set for sequence analysis and functional annotation. PLoS ONE, 6.","DOI":"10.1371\/journal.pone.0018910"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1038\/nature12352","article-title":"Insights into the phylogeny and coding potential of microbial dark matter","volume":"499","author":"Rinke","year":"2013","journal-title":"Nature"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-017-09084-6","article-title":"Taxonomic bias in biodiversity data and societal preferences","volume":"7","author":"Troudet","year":"2017","journal-title":"Sci. 
Rep."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0189577","article-title":"Scientific research on animal biodiversity is systematically biased towards vertebrates and temperate regions","volume":"12","author":"Titley","year":"2017","journal-title":"PLoS ONE"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1086\/284325","article-title":"Phylogenies and the comparative method","volume":"125","author":"Felsenstein","year":"1985","journal-title":"Am. Nat."},{"key":"ref_29","first-page":"119","article-title":"The phylogenetic regression","volume":"326","author":"Grafen","year":"1989","journal-title":"Philos. Trans. R. Soc. B"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"877","DOI":"10.1038\/44766","article-title":"Inferring historical patterns of biological evolution","volume":"401","author":"Pagel","year":"1999","journal-title":"Nature"},{"key":"ref_31","first-page":"2143","article-title":"Comparative methods for the analysis of continuous variables: geometric interpretations","volume":"55","author":"Rohlf","year":"2001","journal-title":"Evolution"},{"key":"ref_32","first-page":"717","article-title":"Testing for phylogenetic signal in comparative data: Behavioral traits are more labile","volume":"57","author":"Blomberg","year":"2003","journal-title":"Evolution"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"252","DOI":"10.1080\/10635150701313830","article-title":"Within-species variation and measurement error in phylogenetic comparative methods","volume":"56","author":"Ives","year":"2007","journal-title":"Syst. Biol."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1093\/sysbio\/syp074","article-title":"Phylogenetic Regression for Binary Dependent Variables","volume":"59","author":"Ives","year":"2010","journal-title":"Syst. 
Biol."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"3258","DOI":"10.1111\/j.1558-5646.2009.00804.x","article-title":"Size-correction and principal components for interspecific comparative studies","volume":"63","author":"Revell","year":"2009","journal-title":"Evolution"},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1111\/j.2041-210X.2010.00044.x","article-title":"Phylogenetic signal and linear regression on species data","volume":"1","author":"Revell","year":"2010","journal-title":"Methods Ecol. Evol."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1091","DOI":"10.1093\/sysbio\/syy031","article-title":"Rethinking phylogenetic comparative methods","volume":"67","author":"Uyeda","year":"2018","journal-title":"Syst. Biol."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"647","DOI":"10.1016\/0022-2836(89)90234-9","article-title":"Weights for data related by a tree","volume":"207","author":"Altschul","year":"1989","journal-title":"J. Mol. Biol."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1093\/bioinformatics\/5.2.115","article-title":"A fast and multiple sequence alignment algorithm","volume":"5","author":"Vingron","year":"1989","journal-title":"Bioinformatics"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1016\/S0022-2836(99)80003-5","article-title":"Weighting aligned protein or nucleic acid sequences to correct for unequal representation","volume":"216","author":"Sibbald","year":"1990","journal-title":"J. Mol. Biol."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"8777","DOI":"10.1073\/pnas.90.19.8777","article-title":"Weighting in sequence space: A comparison of methods in terms of generalized sequences","volume":"90","author":"Vingron","year":"1993","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1093\/bioinformatics\/10.1.19","article-title":"Improved sensitivity of profile searches through the use of sequence weights and gap excision","volume":"10","author":"Thompson","year":"1994","journal-title":"Bioinformatics"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1016\/0022-2836(94)90012-4","article-title":"Volume changes in protein evolution","volume":"236","author":"Gerstein","year":"1994","journal-title":"J. Mol. Biol."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"574","DOI":"10.1016\/0022-2836(94)90032-9","article-title":"Position-based sequence weights","volume":"243","author":"Henikoff","year":"1994","journal-title":"J. Mol. Biol."},{"key":"ref_45","first-page":"215","article-title":"Maximum entropy weighting of aligned sequences of proteins or DNA","volume":"3","author":"Krogh","year":"1995","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2105-8-222","article-title":"Constructing a meaningful evolutionary average at the phylogenetic center of mass","volume":"8","author":"Stone","year":"2007","journal-title":"BMC Bioinform."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: A new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1093\/bioinformatics\/btm604","article-title":"Mutual information without the influence of phylogeny or entropy dramatically improves 
residue contact prediction","volume":"24","author":"Dunn","year":"2008","journal-title":"Bioinformatics"},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"19333","DOI":"10.1073\/pnas.1213199109","article-title":"Estimating divergence times in large molecular phylogenies","volume":"109","author":"Tamura","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1368","DOI":"10.1093\/oxfordjournals.molbev.a025583","article-title":"Modeling residue usage in aligned protein sequences via maximum likelihood","volume":"13","author":"Bruno","year":"1996","journal-title":"Mol. Biol. Evol."},{"key":"ref_52","doi-asserted-by":"crossref","unstructured":"Newberg, L.A., McCue, L.A., and Lawrence, C.E. (2005). The Relative Inefficiency of Sequence Weights Approaches in Determining a Nucleotide Position Weight Matrix. Stat. Appl. Genet. Mol. Biol., 4.","DOI":"10.2202\/1544-6115.1135"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Patterson, N., Price, A.L., and Reich, D. (2006). Population Structure and Eigenanalysis. PLoS Genet., 2.","DOI":"10.1371\/journal.pgen.0020190"},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Cocco, S., Monasson, R., and Weigt, M. (2013). From Principal Component to Direct Coupling Analysis of Coevolution in Proteins: Low-Eigenvalue Modes are Needed for Structure Prediction. PLoS Comput. Biol., 9.","DOI":"10.1371\/journal.pcbi.1003176"},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1073\/pnas.1711913115","article-title":"Power law tails in phylogenetic systems","volume":"115","author":"Qin","year":"2018","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Anishchenko, I., Ovchinnikov, S., Kamisetty, H., and Baker, D. (2017). Origins of coevolution between residues distant in protein 3D structures. Proc. Natl. Acad. Sci. 
USA.","DOI":"10.1073\/pnas.1702664114"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1093\/bioinformatics\/btr638","article-title":"PSICOV: Precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments","volume":"28","author":"Jones","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"3308","DOI":"10.1093\/bioinformatics\/bty341","article-title":"High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features","volume":"34","author":"Jones","year":"2018","journal-title":"Bioinformatics"},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree 2 - Approximately maximum-likelihood trees for large alignments. PLoS ONE, 5.","DOI":"10.1371\/journal.pone.0009490"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msu300","article-title":"IQ-TREE: A fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies","volume":"32","author":"Nguyen","year":"2015","journal-title":"Mol. Biol. 
Evol."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: Freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/10\/1000\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:25:49Z","timestamp":1760189149000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/10\/1000"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,12]]},"references-count":61,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2019,10]]}},"alternative-id":["e21101000"],"URL":"https:\/\/doi.org\/10.3390\/e21101000","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/736173","asserted-by":"object"}]},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,12]]}}}