{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T03:25:44Z","timestamp":1775273144422,"version":"3.50.1"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T00:00:00Z","timestamp":1697500800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Spanish Agency of Research","award":["PID2019-109041GB-C22\/10.13039\/501100011033"],"award-info":[{"award-number":["PID2019-109041GB-C22\/10.13039\/501100011033"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Evolutionary inference depends crucially on the quality of multiple sequence alignments (MSA), which is problematic for distantly related proteins. Since protein structure is more conserved than sequence, it seems natural to use structure alignments for distant homologs. However, structure alignments may not be suitable for inferring evolutionary relationships.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Here we examined four protein similarity measures that depend on sequence and structure (fraction of aligned residues, sequence identity, fraction of superimposed residues, and contact overlap), finding that they are intimately correlated but none of them provides a complete and unbiased picture of conservation in proteins. Therefore, we propose the new hybrid protein sequence and structure similarity score PC_sim based on their main principal component. The corresponding divergence measure PC_div shows the strongest correlation with divergences obtained from individual similarities, suggesting that it infers accurate evolutionary divergences. We developed the program PC_ali that constructs protein MSAs either de novo or modifying an input MSA, using a similarity matrix based on PC_sim. The program constructs a starting MSA based on the maximal cliques of the graph of these PAs and it refines it through progressive alignments along the tree reconstructed with PC_div. Compared with eight state-of-the-art multiple structure or sequence alignment tools, PC_ali achieves higher or equal aligned fraction and structural scores, sequence identity higher than structure aligners although lower than sequence aligners, highest score PC_sim, and highest similarity with the MSAs produced by other tools and with the reference MSA Balibase.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>https:\/\/github.com\/ugobas\/PC_ali.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btad630","type":"journal-article","created":{"date-parts":[[2023,10,17]],"date-time":"2023-10-17T19:48:13Z","timestamp":1697572093000},"source":"Crossref","is-referenced-by-count":6,"title":["PC_ali: a tool for improved multiple alignments and evolutionary inference based on a hybrid protein sequence and structure similarity score"],"prefix":"10.1093","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9342-4678","authenticated-orcid":false,"given":"Ugo","family":"Bastolla","sequence":"first","affiliation":[{"name":"Centro de Biologia Molecular \u201cSevero Ochoa\u201d (CBMSO), CSIC-UAM Cantoblanco , 28049 Madrid, Spain"}]},{"given":"David","family":"Abia","sequence":"additional","affiliation":[{"name":"Bioinformatics Facility CBMSO, CSIC-UAM Cantoblanco , 28049 Madrid, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1472-3213","authenticated-orcid":false,"given":"Oscar","family":"Piette","sequence":"additional","affiliation":[{"name":"Centro de Biologia Molecular \u201cSevero Ochoa\u201d (CBMSO), CSIC-UAM Cantoblanco , 28049 Madrid, Spain"}]}],"member":"286","published-online":{"date-parts":[[2023,10,17]]},"reference":[{"key":"2023110706585333200_btad630-B1","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1002\/wcms.1186","article-title":"Computing protein dynamics from protein structure with elastic network models","volume":"4","author":"Bastolla","year":"2014","journal-title":"WIREs Comput Mol Sci"},{"key":"2023110706585333200_btad630-B2","doi-asserted-by":"crossref","first-page":"291","DOI":"10.3390\/biom4010291","article-title":"Detecting selection on protein stability through statistical mechanical models of folding and evolution","volume":"4","author":"Bastolla","year":"2014","journal-title":"Biomolecules"},{"key":"2023110706585333200_btad630-B3","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res"},{"key":"2023110706585333200_btad630-B4","doi-asserted-by":"crossref","first-page":"3970","DOI":"10.1093\/bioinformatics\/btz236","article-title":"Protein multiple alignments: sequence-based versus structure-based programs","volume":"35","author":"Carpentier","year":"2019","journal-title":"Bioinformatics"},{"key":"2023110706585333200_btad630-B5","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1745-6150-8-3","article-title":"Next-generation phylogenomics","volume":"8","author":"Chan","year":"2013","journal-title":"Biol Direct"},{"key":"2023110706585333200_btad630-B6","doi-asserted-by":"crossref","first-page":"823","DOI":"10.1002\/j.1460-2075.1986.tb04288.x","article-title":"The relation between the divergence of sequence and structure in proteins","volume":"5","author":"Chothia","year":"1986","journal-title":"EMBO J"},{"key":"2023110706585333200_btad630-B7","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1038\/nrg3414","article-title":"Emerging methods in protein co-evolution","volume":"14","author":"De Juan","year":"2013","journal-title":"Nat Rev Genet"},{"key":"2023110706585333200_btad630-B8","doi-asserted-by":"crossref","first-page":"1719","DOI":"10.1093\/bioinformatics\/btx828","article-title":"mTM-align: an algorithm for fast and accurate multiple protein structure alignment","volume":"34","author":"Dong","year":"2018","journal-title":"Bioinformatics"},{"key":"2023110706585333200_btad630-B11","doi-asserted-by":"crossref","first-page":"1792","DOI":"10.1093\/nar\/gkh340","article-title":"MUSCLE: multiple sequence alignment with high accuracy and high throughput","volume":"32","author":"Edgar","year":"2004","journal-title":"Nucleic Acids Res"},{"key":"2023110706585333200_btad630-B12","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1002\/prot.340180402","article-title":"Correlated mutations and residue contacts in proteins","volume":"18","author":"G\u00f6bel","year":"1994","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B13","doi-asserted-by":"crossref","first-page":"1868","DOI":"10.1002\/prot.23011","article-title":"Revisiting gap locations in amino acid sequence alignments and a proposal for a method to improve them by introducing solvent accessibility","volume":"79","author":"Hijikata","year":"2011","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B14","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1006\/jmbi.1993.1489","article-title":"Protein structure comparison by alignment of distance matrices","volume":"233","author":"Holm","year":"1993","journal-title":"J Mol Biol"},{"key":"2023110706585333200_btad630-B15","doi-asserted-by":"crossref","first-page":"499","DOI":"10.1002\/prot.22458","article-title":"Structure is three to ten times more conserved than sequence - a study of structural response in protein cores","volume":"77","author":"Illergard","year":"2009","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B16","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1093\/protein\/14.4.227","article-title":"An approach to improving multiple alignments of protein sequences using predicted secondary structure","volume":"14","author":"Jennings","year":"2001","journal-title":"Protein Eng"},{"key":"2023110706585333200_btad630-B17","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1038\/s41586-021-03819-2","article-title":"Highly accurate protein structure prediction with AlphaFold","volume":"596","author":"Jumper","year":"2021","journal-title":"Nature"},{"key":"2023110706585333200_btad630-B18","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1093\/molbev\/mst010","article-title":"MAFFT multiple sequence alignment software version 7: improvements in performance and usability","volume":"30","author":"Katoh","year":"2013","journal-title":"Mol Biol Evol"},{"key":"2023110706585333200_btad630-B19","doi-asserted-by":"crossref","first-page":"3057","DOI":"10.1093\/molbev\/msu231","article-title":"Alignment errors strongly impact likelihood-based tests for comparing topologies","volume":"31","author":"Levy Karin","year":"2014","journal-title":"Mol Biol Evol"},{"key":"2023110706585333200_btad630-B20","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1101\/gr.6725608","article-title":"Uncertainty in homology inferences: assessing and improving genomic sequence alignment","volume":"18","author":"Lunter","year":"2008","journal-title":"Genome Res"},{"key":"2023110706585333200_btad630-B21","doi-asserted-by":"crossref","first-page":"3255","DOI":"10.1093\/bioinformatics\/bti527","article-title":"A new progressive-iterative algorithm for multiple structure alignment","volume":"21","author":"Lupyan","year":"2005","journal-title":"Bioinformatics"},{"key":"2023110706585333200_btad630-B23","doi-asserted-by":"crossref","first-page":"2102","DOI":"10.1093\/gbe\/evv127","article-title":"Evidence of statistical inconsistency of phylogenetic methods in the presence of multiple sequence alignment uncertainty","volume":"7","author":"Mukarram","year":"2015","journal-title":"Genome Biol Evol"},{"key":"2023110706585333200_btad630-B24","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J Mol Biol"},{"key":"2023110706585333200_btad630-B25","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1006\/jmbi.2000.4042","article-title":"T-Coffee: a novel method for fast and accurate multiple sequence alignment","volume":"302","author":"Notredame","year":"2000","journal-title":"J Mol Biol"},{"key":"2023110706585333200_btad630-B26","doi-asserted-by":"crossref","first-page":"314","DOI":"10.1080\/10635150500541730","article-title":"Multiple sequence alignment accuracy and phylogenetic inference","volume":"55","author":"Ogden","year":"2006","journal-title":"Syst Biol"},{"key":"2023110706585333200_btad630-B27","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","article-title":"CATH\u2013a hierarchic classification of protein domain structures","volume":"5","author":"Orengo","year":"1997","journal-title":"Structure"},{"key":"2023110706585333200_btad630-B28","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1002\/prot.22616","article-title":"Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation","volume":"78","author":"Pascual-Garc\u00eda","year":"2010","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B29","doi-asserted-by":"crossref","first-page":"987","DOI":"10.1093\/sysbio\/syz022","article-title":"The molecular clock in the evolution of protein structures","volume":"68","author":"Pascual-Garc\u00eda","year":"2019","journal-title":"Syst Biol"},{"key":"2023110706585333200_btad630-B30","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/nmeth.2340","article-title":"A large-scale evaluation of computational protein function prediction","volume":"10","author":"Radivojac","year":"2013","journal-title":"Nat Methods"},{"key":"2023110706585333200_btad630-B31","doi-asserted-by":"crossref","first-page":"S19","DOI":"10.1016\/S1359-0278(97)00059-X","article-title":"Protein structures sustain evolutionary drift","volume":"2","author":"Rost","year":"1997","journal-title":"Fold Des"},{"key":"2023110706585333200_btad630-B32","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1002\/prot.340230306","article-title":"Evaluation of comparative protein modeling by MODELLER","volume":"23","author":"Sali","year":"1995","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B33","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/msb.2011.75","article-title":"Fast, scalable generation of high-quality protein multiple sequence alignments using clustal omega","volume":"7","author":"Sievers","year":"2011","journal-title":"Mol Sys Biol"},{"key":"2023110706585333200_btad630-B34","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J Mol Biol"},{"key":"2023110706585333200_btad630-B6834716","first-page":"269","article-title":"Estimation of evolutionary distance between nucleotide sequences","volume":"1","author":"Tajima","year":"1984","journal-title":"Mol Biol Evol"},{"key":"2023110706585333200_btad630-B35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1093\/protein\/14.1.1","article-title":"Conformational change of proteins arising from normal mode calculations","volume":"14","author":"Tama","year":"2001","journal-title":"Protein Eng"},{"key":"2023110706585333200_btad630-B36","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1002\/prot.10016","article-title":"Why are proteins marginally stable?","volume":"46","author":"Taverna","year":"2002","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B37","doi-asserted-by":"crossref","first-page":"2682","DOI":"10.1093\/nar\/27.13.2682","article-title":"A comprehensive comparison of multiple sequence alignment programs","volume":"27","author":"Thompson","year":"1999","journal-title":"Nucleic Acids Res"},{"key":"2023110706585333200_btad630-B38","doi-asserted-by":"crossref","first-page":"1905","DOI":"10.1103\/PhysRevLett.77.1905","article-title":"Large amplitude elastic motions in proteins from a single-parameter, atomic analysis","volume":"77","author":"Tirion","year":"1996","journal-title":"Phys Rev Lett"},{"key":"2023110706585333200_btad630-B39","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1002\/prot.24746","article-title":"Refinement by shifting secondary structure elements improves sequence alignments","volume":"83","author":"Tong","year":"2015","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B40","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1073\/pnas.0805923106","article-title":"Identification of direct residue contacts in protein-protein interaction by message passing","volume":"106","author":"Weigt","year":"2009","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2023110706585333200_btad630-B41","doi-asserted-by":"crossref","first-page":"473","DOI":"10.1126\/science.1151532","article-title":"Alignment uncertainty and genomic analysis","volume":"319","author":"Wong","year":"2008","journal-title":"Science"},{"key":"2023110706585333200_btad630-B42","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1002\/prot.10508","article-title":"Gaps in structurally similar proteins: towards improvement of multiple sequence alignment","volume":"54","author":"Wrabl","year":"2004","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B43","doi-asserted-by":"crossref","first-page":"322","DOI":"10.1186\/s12859-015-0749-z","article-title":"DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment","volume":"16","author":"Wright","year":"2015","journal-title":"BMC Bioinformatics"},{"key":"2023110706585333200_btad630-B44","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1002\/prot.20264","article-title":"Scoring function for automated assessment of protein structure template quality","volume":"57","author":"Zhang","year":"2004","journal-title":"Proteins"},{"key":"2023110706585333200_btad630-B45","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btad630\/52191162\/btad630.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad630\/52771890\/btad630.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/39\/11\/btad630\/52771890\/btad630.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,7]],"date-time":"2023-11-07T06:59:35Z","timestamp":1699340375000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btad630\/7320008"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,10,17]]},"references-count":43,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btad630","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2023,11,1]]},"published":{"date-parts":[[2023,10,17]]},"article-number":"btad630"}}