{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:46:13Z","timestamp":1740185173125,"version":"3.37.3"},"reference-count":54,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2021,5,13]],"date-time":"2021-05-13T00:00:00Z","timestamp":1620864000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01 GM125878"],"award-info":[{"award-number":["R01 GM125878"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["BIO MCB 1817942"],"award-info":[{"award-number":["BIO MCB 1817942"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,25]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Detecting subtle biologically relevant patterns in protein sequences often requires the construction of a large and accurate multiple sequence alignment (MSA). Methods for constructing MSAs are usually evaluated using benchmark alignments, which, however, typically contain very few sequences and are therefore inappropriate when dealing with large numbers of proteins.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>eCOMPASS addresses this problem using a statistical measure of relative alignment quality based on direct coupling analysis (DCA): to maintain protein structural integrity over evolutionary time, substitutions at one residue position typically result in compensating substitutions at other positions. eCOMPASS computes the statistical significance of the congruence between high scoring directly coupled pairs and 3D contacts in corresponding structures, which depends upon properly aligned homologous residues. We illustrate eCOMPASS using both simulated and real MSAs.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The eCOMPASS executable, C++ open source code and input data sets are available at https:\/\/www.igs.umaryland.edu\/labs\/neuwald\/software\/compass<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab374","type":"journal-article","created":{"date-parts":[[2021,5,12]],"date-time":"2021-05-12T19:22:57Z","timestamp":1620847377000},"page":"3456-3463","source":"Crossref","is-referenced-by-count":1,"title":["eCOMPASS: evaluative comparison of multiple protein alignments by statistical score"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0086-5755","authenticated-orcid":false,"given":"Andrew F","family":"Neuwald","sequence":"first","affiliation":[{"name":"Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine , Baltimore, MD 21201, USA"}]},{"given":"Bryan D","family":"Kolaczkowski","sequence":"additional","affiliation":[{"name":"Department of Microbiology and Cell Science, University of Florida , Gainesville, FL 32611, USA"}]},{"given":"Stephen F","family":"Altschul","sequence":"additional","affiliation":[{"name":"Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health , Bethesda, MD 20894, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,5,13]]},"reference":[{"key":"2023051609050273700_btab374-B1","doi-asserted-by":"crossref","first-page":"2165","DOI":"10.1093\/bioinformatics\/btn414","article-title":"Model-based prediction of sequence alignment quality","volume":"24","author":"Ahola","year":"2008","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B2","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B3","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1089\/cmb.2017.0050","article-title":"Initial Cluster Analysis","volume":"25","author":"Altschul","year":"2018","journal-title":"J. Comput. Biol"},{"key":"2023051609050273700_btab374-B4","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1093\/sysbio\/syy036","article-title":"Multiple sequence alignment averaging improves phylogeny reconstruction","volume":"68","author":"Ashkenazy","year":"2019","journal-title":"Syst. Biol"},{"key":"2023051609050273700_btab374-B5","doi-asserted-by":"crossref","first-page":"e92721","DOI":"10.1371\/journal.pone.0092721","article-title":"Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners","volume":"9","author":"Baldassi","year":"2014","journal-title":"PLoS One"},{"key":"2023051609050273700_btab374-B6","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1093\/bioinformatics\/btm604","article-title":"Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction","volume":"24","author":"Dunn","year":"2008","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B7","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/1471-2105-5-113","article-title":"MUSCLE: a multiple sequence alignment method with reduced time and space complexity","volume":"5","author":"Edgar","year":"2004","journal-title":"BMC Bioinform"},{"key":"2023051609050273700_btab374-B8","doi-asserted-by":"crossref","first-page":"2145","DOI":"10.1093\/nar\/gkp1196","article-title":"Quality measures for protein alignment benchmarks","volume":"38","author":"Edgar","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B9","doi-asserted-by":"crossref","first-page":"D427","DOI":"10.1093\/nar\/gky995","article-title":"The Pfam protein families database in 2019","volume":"47","author":"El-Gebali","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B10","doi-asserted-by":"crossref","first-page":"2257","DOI":"10.1093\/molbev\/msq115","article-title":"The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection","volume":"27","author":"Fletcher","year":"2010","journal-title":"Mol. Biol. Evol"},{"key":"2023051609050273700_btab374-B11","doi-asserted-by":"crossref","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B12","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1016\/j.sbi.2009.04.003","article-title":"Advances and pitfalls of protein structural alignment","volume":"19","author":"Hasegawa","year":"2009","journal-title":"Curr. Opin. Struct. Biol"},{"key":"2023051609050273700_btab374-B13","doi-asserted-by":"crossref","first-page":"W545","DOI":"10.1093\/nar\/gkq366","article-title":"Dali server: conservation mapping in 3D","volume":"38","author":"Holm","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B14","doi-asserted-by":"crossref","first-page":"2780","DOI":"10.1093\/bioinformatics\/btn507","article-title":"Searching protein structure databases with DaliLite v.3","volume":"24","author":"Holm","year":"2008","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B15","doi-asserted-by":"crossref","first-page":"1607","DOI":"10.1016\/j.cell.2012.04.012","article-title":"Three-dimensional structures of membrane proteins from genomic sequencing","volume":"149","author":"Hopf","year":"2012","journal-title":"Cell"},{"key":"2023051609050273700_btab374-B16","doi-asserted-by":"crossref","first-page":"431","DOI":"10.1186\/1471-2105-11-431","article-title":"Hidden Markov model speed heuristic and iterative HMM search procedure","volume":"11","author":"Johnson","year":"2010","journal-title":"BMC Bioinform"},{"key":"2023051609050273700_btab374-B17","doi-asserted-by":"crossref","first-page":"184","DOI":"10.1093\/bioinformatics\/btr638","article-title":"PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments","volume":"28","author":"Jones","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B18","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1007\/978-1-62703-646-7_8","article-title":"MAFFT: iterative refinement and additional methods","volume":"1079","author":"Katoh","year":"2014","journal-title":"Methods Mol. Biol"},{"key":"2023051609050273700_btab374-B19","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1186\/1471-2105-8-355","article-title":"Accuracy of structure-based sequence alignment of automatic methods","volume":"8","author":"Kim","year":"2007","journal-title":"BMC Bioinform"},{"key":"2023051609050273700_btab374-B20","doi-asserted-by":"crossref","first-page":"1928","DOI":"10.1093\/bioinformatics\/btz795","article-title":"Kalign 3: multiple sequence alignment of large data sets","volume":"36","author":"Lassmann","year":"2020","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B21","doi-asserted-by":"crossref","first-page":"7120","DOI":"10.1093\/nar\/gki1020","article-title":"Automatic assessment of alignment quality","volume":"33","author":"Lassmann","year":"2005","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B22","doi-asserted-by":"crossref","first-page":"3057","DOI":"10.1093\/molbev\/msu231","article-title":"Alignment errors strongly impact likelihood-based tests for comparing topologies","volume":"31","author":"Levy Karin","year":"2014","journal-title":"Mol. Biol. Evol"},{"key":"2023051609050273700_btab374-B23","doi-asserted-by":"crossref","first-page":"17","DOI":"10.1016\/S0076-6879(10)71002-8","article-title":"Inference of direct residue contacts in two-component signaling","volume":"471","author":"Lunt","year":"2010","journal-title":"Methods Enzymol"},{"key":"2023051609050273700_btab374-B24","doi-asserted-by":"crossref","first-page":"e28766","DOI":"10.1371\/journal.pone.0028766","article-title":"Protein 3D structure computed from evolutionary sequence variation","volume":"6","author":"Marks","year":"2011","journal-title":"PLoS One"},{"key":"2023051609050273700_btab374-B25","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1038\/nbt.2419","article-title":"Protein structure prediction from sequence variation","volume":"30","author":"Marks","year":"2012","journal-title":"Nat. Biotechnol"},{"key":"2023051609050273700_btab374-B26","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-coupling analysis of residue coevolution captures native contacts across many protein families","volume":"108","author":"Morcos","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051609050273700_btab374-B27","doi-asserted-by":"crossref","first-page":"062409","DOI":"10.1103\/PhysRevE.102.062409","article-title":"Aligning biological sequences by exploiting residue conservation and coevolution","volume":"102","author":"Muntoni","year":"2020","journal-title":"Phys. Rev. E"},{"key":"2023051609050273700_btab374-B28","doi-asserted-by":"crossref","first-page":"1869","DOI":"10.1093\/bioinformatics\/btp342","article-title":"Rapid detection, classification and accurate alignment of up to a million or more related protein sequences","volume":"25","author":"Neuwald","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B29","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1515\/sagmb-2014-0008","article-title":"Protein domain hierarchy Gibbs sampling strategies","volume":"13","author":"Neuwald","year":"2014","journal-title":"Stat. Appl. Genet. Mol. Biol"},{"key":"2023051609050273700_btab374-B30","doi-asserted-by":"crossref","first-page":"e1004936","DOI":"10.1371\/journal.pcbi.1004936","article-title":"Bayesian top-down protein sequence alignment with inferred position-specific gap penalties","volume":"12","author":"Neuwald","year":"2016","journal-title":"PLoS Comput. Biol"},{"key":"2023051609050273700_btab374-B31","doi-asserted-by":"crossref","first-page":"e1006237","DOI":"10.1371\/journal.pcbi.1006237","article-title":"Statistical investigations of protein residue direct couplings","volume":"14","author":"Neuwald","year":"2018","journal-title":"PLoS Comput. Biol"},{"key":"2023051609050273700_btab374-B32","doi-asserted-by":"crossref","first-page":"1445","DOI":"10.1101\/gr.147400","article-title":"HEAT repeats associated with condensins, cohesins, and other complexes involved in chromosome-related functions","volume":"10","author":"Neuwald","year":"2000","journal-title":"Genome Res"},{"key":"2023051609050273700_btab374-B33","doi-asserted-by":"crossref","first-page":"3570","DOI":"10.1093\/nar\/28.18.3570","article-title":"PSI-BLAST searches using hidden Markov models of structural repeats: prediction of an unusual sliding DNA clamp and of beta-propellers in UV-damaged DNA-binding protein","volume":"28","author":"Neuwald","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B34","doi-asserted-by":"crossref","first-page":"e29880","DOI":"10.7554\/eLife.29880","article-title":"Inferring joint sequence-structural determinants of protein functional specificity","volume":"7","author":"Neuwald","year":"2018","journal-title":"Elife"},{"key":"2023051609050273700_btab374-B35","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/1471-2105-13-144","article-title":"Automated hierarchical classification of protein domain subfamilies based on functionally-divergent residue signatures","volume":"13","author":"Neuwald","year":"2012","journal-title":"BMC Bioinform"},{"key":"2023051609050273700_btab374-B36","doi-asserted-by":"crossref","first-page":"baaa042","DOI":"10.1093\/database\/baaa042","article-title":"Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments","volume":"2020","author":"Neuwald","year":"2020","journal-title":"Database"},{"key":"2023051609050273700_btab374-B37","doi-asserted-by":"crossref","first-page":"E1540","DOI":"10.1073\/pnas.1120036109","article-title":"Accurate de novo structure prediction of large transmembrane protein domains using fragment-assembly and correlated mutation analysis","volume":"109","author":"Nugent","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051609050273700_btab374-B38","doi-asserted-by":"crossref","first-page":"i215","DOI":"10.1093\/bioinformatics\/btg1029","article-title":"APDB: a novel measure for benchmarking sequence alignment methods without reference alignments","volume":"19","author":"O'Sullivan","year":"2003","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B39","doi-asserted-by":"crossref","first-page":"700","DOI":"10.1093\/bioinformatics\/17.8.700","article-title":"AL2CO: calculation of positional conservation in a protein sequence alignment","volume":"17","author":"Pei","year":"2001","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B40","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Sch\u00e4ffer","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B41","doi-asserted-by":"crossref","first-page":"3128","DOI":"10.1093\/bioinformatics\/btu500","article-title":"CCMpred\u2013fast and precise prediction of protein residue-residue contacts from correlated mutations","volume":"30","author":"Seemayer","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B42","doi-asserted-by":"crossref","first-page":"2326","DOI":"10.1093\/bioinformatics\/btl398","article-title":"ARCS: an aggregated related column scoring scheme for aligned sequences","volume":"22","author":"Song","year":"2006","journal-title":"Bioinformatics"},{"key":"2023051609050273700_btab374-B43","doi-asserted-by":"crossref","first-page":"320","DOI":"10.1093\/nar\/26.1.320","article-title":"Pfam: multiple sequence alignments and HMM-profiles of protein domains","volume":"26","author":"Sonnhammer","year":"1998","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B44","article-title":"ComPotts: optimal alignment of coevolutionary models for protein sequences","volume":"2020","author":"Talibart","year":"2020","journal-title":"bioRxiv"},{"key":"2023051609050273700_btab374-B45","article-title":"PPalign: optimal alignment of Potts models representing proteins with direct coupling information","volume":"2020","author":"Talibart","year":"2020","journal-title":"bioRxiv"},{"key":"2023051609050273700_btab374-B46","doi-asserted-by":"crossref","first-page":"937","DOI":"10.1006\/jmbi.2001.5187","article-title":"Towards a reliable objective function for multiple sequence alignments","volume":"314","author":"Thompson","year":"2001","journal-title":"J. Mol. Biol"},{"key":"2023051609050273700_btab374-B47","doi-asserted-by":"crossref","first-page":"e18093","DOI":"10.1371\/journal.pone.0018093","article-title":"A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives","volume":"6","author":"Thompson","year":"2011","journal-title":"PLoS One"},{"key":"2023051609050273700_btab374-B48","doi-asserted-by":"crossref","first-page":"1691","DOI":"10.1038\/s41598-019-55118-6","article-title":"Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity","volume":"10","author":"Tondnevis","year":"2020","journal-title":"Sci. Rep"},{"key":"2023051609050273700_btab374-B49","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1007\/s00251-020-01157-7","article-title":"A survey of TIR domain sequence and structure divergence","volume":"72","author":"Toshchakov","year":"2020","journal-title":"Immunogenetics"},{"key":"2023051609050273700_btab374-B50","doi-asserted-by":"crossref","first-page":"e1006526","DOI":"10.1371\/journal.pcbi.1006526","article-title":"Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction","volume":"14","author":"Vorberg","year":"2018","journal-title":"PLoS Comput. Biol"},{"key":"2023051609050273700_btab374-B51","doi-asserted-by":"crossref","first-page":"W296","DOI":"10.1093\/nar\/gky427","article-title":"SWISS-MODEL: homology modelling of protein structures and complexes","volume":"46","author":"Waterhouse","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023051609050273700_btab374-B52","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1073\/pnas.0805923106","article-title":"Identification of direct residue contacts in protein-protein interaction by message passing","volume":"106","author":"Weigt","year":"2009","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023051609050273700_btab374-B53","doi-asserted-by":"crossref","first-page":"e1008085","DOI":"10.1371\/journal.pcbi.1008085","article-title":"Remote homology search with hidden Potts models","volume":"16","author":"Wilburn","year":"2020","journal-title":"PLoS Comput. Biol"},{"key":"2023051609050273700_btab374-B54","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.1002\/cpbi.90","article-title":"NCBI's conserved domain database and tools for protein domain analysis","volume":"69","author":"Yang","year":"2020","journal-title":"Curr. Protoc. Bioinform"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab374\/39737931\/btab374.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/20\/3456\/50338737\/btab374.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/20\/3456\/50338737\/btab374.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,3]],"date-time":"2023-11-03T17:17:52Z","timestamp":1699031872000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/20\/3456\/6275262"}},"subtitle":[],"editor":[{"given":"Yann","family":"Ponty","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,5,13]]},"references-count":54,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2021,10,25]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab374","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,10,15]]},"published":{"date-parts":[[2021,5,13]]}}}