{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:11:20Z","timestamp":1772172680261,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008085","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T00:00:00Z","timestamp":1607558400000}}],"reference-count":56,"publisher":"Public Library of Science (PLoS)","issue":"11","license":[{"start":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T00:00:00Z","timestamp":1606694400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000011","name":"Howard Hughes Medical Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000011","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000011","name":"Howard Hughes Medical Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000011","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","award":["R01-HG009116"],"award-info":[{"award-number":["R01-HG009116"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Science Foundation","award":["PHY-1066293"],"award-info":[{"award-number":["PHY-1066293"]}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1008085","type":"journal-article","created":{"date-parts":[[2020,11,30]],"date-time":"2020-11-30T14:21:26Z","timestamp":1606746086000},"page":"e1008085","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":26,"title":["Remote homology search with hidden Potts models"],"prefix":"10.1371","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4634-7707","authenticated-orcid":true,"given":"Grey W.","family":"Wilburn","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6676-4706","authenticated-orcid":true,"given":"Sean R.","family":"Eddy","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2020,11,30]]},"reference":[{"key":"pcbi.1008085.ref001","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511790492","volume-title":"Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids","author":"R Durbin","year":"1998"},{"key":"pcbi.1008085.ref002","doi-asserted-by":"crossref","unstructured":"Weisman CM, Murray AW, Eddy SR. Many but Not All Lineage-Specific Genes Can Be Explained by Homology Detection Failure. biorXiv 968420v2 [Preprint]. 2020 [Cited 11 June 2020]. Available from: https:\/\/www.biorxiv.org\/content\/10.1101\/2020.02.27.968420v2","DOI":"10.1101\/2020.02.27.968420"},{"key":"pcbi.1008085.ref003","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic Local Alignment Search Tool","volume":"215","author":"SF Altschul","year":"1990","journal-title":"J Mol Biol"},{"key":"pcbi.1008085.ref004","unstructured":"Haussler D, Krogh A, Mian IS, Sjolander K. Protein Modeling Using Hidden Markov Models: Analysis of Globins. In: Proceedings of the Twenty-Sixth Hawaii International Conference on System Sciences; 1993. p. 792\u2013802."},{"key":"pcbi.1008085.ref005","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile Hidden Markov Models","volume":"14","author":"SR Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"pcbi.1008085.ref006","doi-asserted-by":"crossref","first-page":"2079","DOI":"10.1093\/nar\/22.11.2079","article-title":"RNA Sequence Analysis Using Covariance Models","volume":"22","author":"SR Eddy","year":"1994","journal-title":"Nucl Acids Res"},{"key":"pcbi.1008085.ref007","doi-asserted-by":"crossref","first-page":"2933","DOI":"10.1093\/bioinformatics\/btt509","article-title":"Infernal 1.1: 100-fold Faster RNA Homology Searches","volume":"29","author":"EP Nawrocki","year":"2013","journal-title":"Bioinformatics"},{"key":"pcbi.1008085.ref008","first-page":"236","article-title":"A Maximum Entropy Formalism for Disentangling Chains of Correlated Sequence Positions","volume":"33","author":"AS Lapedes","year":"1999","journal-title":"Lecture Notes-Monograph Series, Statistics in Molecular Biology and Genetics"},{"key":"pcbi.1008085.ref009","doi-asserted-by":"crossref","first-page":"67","DOI":"10.1073\/pnas.0805923106","article-title":"Identification of Direct Residue Contacts in Protein\u2013Protein Interaction by Message Passing","volume":"106","author":"M Weigt","year":"2009","journal-title":"Proc Natl Acad Sci USA"},{"key":"pcbi.1008085.ref010","doi-asserted-by":"crossref","first-page":"E1293","DOI":"10.1073\/pnas.1111471108","article-title":"Direct-Coupling Analysis of Residue Coevolution Captures Native Contacts Across Many Protein Families","volume":"108","author":"F Morcos","year":"2011","journal-title":"Proc Natl Acad Sci USA"},{"issue":"39","key":"pcbi.1008085.ref011","doi-asserted-by":"crossref","first-page":"15674","DOI":"10.1073\/pnas.1314045110","article-title":"Assessing the Utility of Coevolution-based Residue\u2013Residue Contact Predictions in a Sequence-and Structure-Rich Era","volume":"110","author":"H Kamisetty","year":"2013","journal-title":"Proc Natl Acad Sci USA"},{"issue":"1","key":"pcbi.1008085.ref012","doi-asserted-by":"crossref","first-page":"012707","DOI":"10.1103\/PhysRevE.87.012707","article-title":"Improved Contact Prediction in Proteins: Using Pseudolikelihoods to Infer Potts Models","volume":"87","author":"M Ekeberg","year":"2013","journal-title":"Physical Review E"},{"key":"pcbi.1008085.ref013","first-page":"10444","article-title":"Direct-Coupling Analysis of Nucleotide Coevolution Facilitates RNA Secondary and Tertiary Structure Prediction","volume":"43","author":"E De Leonardis","year":"2015","journal-title":"Nucl Acids Res"},{"key":"pcbi.1008085.ref014","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1016\/j.cell.2016.03.030","article-title":"3D RNA and Functional Interactions from Evolutionary Couplings","volume":"165","author":"C Weinreb","year":"2016","journal-title":"Cell"},{"key":"pcbi.1008085.ref015","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/0025-5564(94)90041-8","article-title":"Modeling Protein Cores with Markov Random Fields","volume":"124","author":"JV White","year":"1994","journal-title":"Math Biosci"},{"key":"pcbi.1008085.ref016","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1006\/jmbi.1996.0053","article-title":"Global Optimum Protein Threading with Gapped Alignment and Empirical Pair Score Functions","volume":"255","author":"RH Lathrop","year":"1996","journal-title":"J Mol Biol"},{"key":"pcbi.1008085.ref017","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1109\/TCBB.2007.70225","article-title":"Graphical Models of Residue Coupling in Protein Families","volume":"5","author":"J Thomas","year":"2008","journal-title":"IEEE\/ACM Trans Comp Biol Bioinf"},{"key":"pcbi.1008085.ref018","first-page":"641","article-title":"Conditional Graphical Models for Protein Structural Motif Recognition","volume":"255","author":"Y Liu","year":"1996","journal-title":"J Comput Biol"},{"key":"pcbi.1008085.ref019","doi-asserted-by":"crossref","first-page":"4069","DOI":"10.1073\/pnas.0909950107","article-title":"Markov Random Fields Reveal an N-Terminal Double Beta-Propeller Motif as Part of a Bacterial Hybrid Two-Component Sensor System","volume":"107","author":"M Menke","year":"2010","journal-title":"Proc Natl Acad Sci USA"},{"key":"pcbi.1008085.ref020","doi-asserted-by":"crossref","first-page":"1930","DOI":"10.1002\/prot.23016","article-title":"A Multiple-Template Approach to Protein Threading","volume":"79","author":"J Peng","year":"2010","journal-title":"Proteins"},{"key":"pcbi.1008085.ref021","doi-asserted-by":"crossref","first-page":"1216","DOI":"10.1093\/bioinformatics\/bts110","article-title":"SMURFLite: Combining Simplified Markov Random Fields with Simulated Evolution Improves Remote Homology Detection for Beta-Structural Proteins into the Twilight Zone","volume":"28","author":"NM Daniels","year":"2012","journal-title":"Bioinformatics"},{"key":"pcbi.1008085.ref022","doi-asserted-by":"crossref","first-page":"e02030","DOI":"10.7554\/eLife.02030","article-title":"Robust and Accurate Prediction of Residue-Residue Interactions across Protein Interfaces Using Evolutionary Information","volume":"113","author":"S Ovchinnikov","year":"2014","journal-title":"eLife"},{"key":"pcbi.1008085.ref023","first-page":"67","article-title":"Inferring Interaction Partners from Protein Sequences","volume":"106","author":"AF Bitbol","year":"2016","journal-title":"Proc Natl Acad Sci USA"},{"key":"pcbi.1008085.ref024","doi-asserted-by":"crossref","first-page":"12185","DOI":"10.1073\/pnas.1607570113","article-title":"Simultaneous Identification of Specifically Interacting Paralogs and Interprotein Contacts by Direct Coupling Analysis","volume":"113","author":"T Gueudre","year":"2016","journal-title":"Proc Natl Acad Sci USA"},{"key":"pcbi.1008085.ref025","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1126\/science.aaw6718","article-title":"Protein Interaction Networks Revealed by Proteome Coevolution","volume":"365","author":"Q Cong","year":"2019","journal-title":"Science"},{"key":"pcbi.1008085.ref026","doi-asserted-by":"crossref","first-page":"3054","DOI":"10.1093\/molbev\/msw188","article-title":"Connecting the Sequence-Space of Bacterial Signaling Proteins to Phenotypes Using Coevolutionary Landscapes","volume":"33","author":"RR Cheng","year":"2016","journal-title":"Mol Biol Evol"},{"key":"pcbi.1008085.ref027","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1093\/molbev\/msv211","article-title":"Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1","volume":"33","author":"M Figliuzzi","year":"2016","journal-title":"Mol Biol Evol"},{"key":"pcbi.1008085.ref028","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1016\/j.sbi.2016.11.004","article-title":"Potts Hamiltonian Models of Protein Co-variation, Free Energy Landscapes, and Evolutionary Fitness","volume":"43","author":"RM Levy","year":"2017","journal-title":"Curr Opin Struct Biol"},{"key":"pcbi.1008085.ref029","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1038\/nbt.3769","article-title":"Mutation Effects Predicted from Sequence Co-variation","volume":"35","author":"TA Hopf","year":"2017","journal-title":"Nature Biotechnology"},{"key":"pcbi.1008085.ref030","doi-asserted-by":"crossref","first-page":"e34300","DOI":"10.7554\/eLife.34300","article-title":"Coevolution-Based Inference of Amino Acid Interactions Underlying Protein Function","volume":"7","author":"VH Salinas","year":"2018","journal-title":"eLife"},{"key":"pcbi.1008085.ref031","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1103\/PhysRevLett.69.2013","article-title":"Simulation of Biological Cell Sorting Using a Two-Dimensional Extended Potts Model","volume":"69","author":"F Graner","year":"1992","journal-title":"Physical Review Letters"},{"key":"pcbi.1008085.ref032","doi-asserted-by":"crossref","first-page":"1007","DOI":"10.1038\/nature04701","article-title":"Weak Pairwise Correlations Imply Strongly Correlated Network States in a Neural Population","volume":"440","author":"E Schneidmann","year":"2007","journal-title":"Nature"},{"key":"pcbi.1008085.ref033","first-page":"347","article-title":"Inferring Consensus Structure from Nucleic Acid Sequences","volume":"7","author":"DKY Chiu","year":"1991","journal-title":"Comput Applic Biosci"},{"key":"pcbi.1008085.ref034","doi-asserted-by":"crossref","first-page":"5785","DOI":"10.1093\/nar\/20.21.5785","article-title":"Identifying Constraints on the Higher-Order Structure of RNA: Continued Development and Application of Comparative Sequence Analysis Methods","volume":"20","author":"RR Gutell","year":"1992","journal-title":"Nucl Acids Res"},{"key":"pcbi.1008085.ref035","doi-asserted-by":"crossref","first-page":"D222","DOI":"10.1093\/nar\/gkt1223","article-title":"Pfam: The Protein Families Database","volume":"42","author":"RD Finn","year":"2014","journal-title":"Nucl Acids Res"},{"key":"pcbi.1008085.ref036","doi-asserted-by":"crossref","first-page":"D335","DOI":"10.1093\/nar\/gkx1038","article-title":"Rfam 13.0: Shifting to a Genome-Centric Resource for Non-Coding RNA Families","volume":"46","author":"I Kalvari","year":"2018","journal-title":"Nucl Acids Res"},{"key":"pcbi.1008085.ref037","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1093\/biomet\/64.3.616","article-title":"Efficiency of Pseudolikelihood Estimation for Simple Gaussian Fields","volume":"64","author":"J Besag","year":"1977","journal-title":"Biometrika"},{"key":"pcbi.1008085.ref038","unstructured":"Eddy SR. Multiple Alignment Using Hidden Markov Models. In: Rawlings C, Clark D, Altman R, Hunter L, Lengauer T, Wodak S, editors. Proc. Third Int. Conf. Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press; 1995. p. 114\u2013120."},{"key":"pcbi.1008085.ref039","doi-asserted-by":"crossref","first-page":"415","DOI":"10.1016\/0022-2836(86)90165-8","article-title":"Information Content of Binding Sites on Nucleotide Sequences","volume":"188","author":"TD Schneider","year":"1986","journal-title":"J Mol Biol"},{"key":"pcbi.1008085.ref040","doi-asserted-by":"crossref","first-page":"e1000069","DOI":"10.1371\/journal.pcbi.1000069","article-title":"A Probabilistic Model of Local Sequence Alignment that Simplifies Statistical Significance Estimation","volume":"4","author":"SR Eddy","year":"2008","journal-title":"PLOS Comput Biol"},{"key":"pcbi.1008085.ref041","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1038\/nmeth.4066","article-title":"A Statistical Test for Conserved RNA Structure Shows Lack of Evidence for Structure in lncRNAs","volume":"14","author":"E Rivas","year":"2017","journal-title":"Nature Methods"},{"key":"pcbi.1008085.ref042","doi-asserted-by":"crossref","unstructured":"Rivas E. RNA Structure Prediction Using Positive and Negative Evolutionary Information. biorXiv 933952v2 [Preprint]. 2020 [Cited 11 June 2020]. Available from: https:\/\/www.biorxiv.org\/content\/10.1101\/2020.02.04.933952v2","DOI":"10.1101\/2020.02.04.933952"},{"key":"pcbi.1008085.ref043","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1093\/nar\/26.1.148","article-title":"Compilation of tRNA Sequences and Sequences of tRNA Genes","volume":"26","author":"M Sprinzl","year":"1998","journal-title":"Nucl Acids Res"},{"key":"pcbi.1008085.ref044","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nchembio.1386","article-title":"A Widespread Self-Cleaving Ribozyme Class is Revealed by Bioinformatics","volume":"10","author":"A Roth","year":"2014","journal-title":"Nat Chem Biol"},{"key":"pcbi.1008085.ref045","doi-asserted-by":"crossref","first-page":"e56","DOI":"10.1371\/journal.pcbi.0030056","article-title":"Query-Dependent Banding (QDB) for Faster RNA Similarity Searches","volume":"3","author":"EP Nawrocki","year":"2007","journal-title":"PLOS Comput Biol"},{"key":"pcbi.1008085.ref046","doi-asserted-by":"crossref","first-page":"1172","DOI":"10.1038\/nature04819","article-title":"Structure of the S-adenosylmethionine Riboswitch Regulatory mRNA Element","volume":"441","author":"R Montange","year":"2006","journal-title":"Nature"},{"key":"pcbi.1008085.ref047","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"SR Eddy","year":"2011","journal-title":"PLOS Comp Biol"},{"key":"pcbi.1008085.ref048","doi-asserted-by":"crossref","first-page":"4868","DOI":"10.1021\/bi00365a022","article-title":"Restrained Refinement of the Monoclinic Form of Yeast Phenylalanine Transfer RNA. Temperature Factors and Dynamics, Coordinated Waters, and Base-Pair Propeller Twist Angles","volume":"25","author":"E Westhof","year":"1986","journal-title":"Biochemistry"},{"key":"pcbi.1008085.ref049","doi-asserted-by":"crossref","first-page":"3063","DOI":"10.1073\/pnas.69.10.3063","article-title":"Is There a Discriminator Site in tRNA?","volume":"69","author":"DM Crothers","year":"1972","journal-title":"Proc Natl Acad Sci USA"},{"key":"pcbi.1008085.ref050","doi-asserted-by":"crossref","first-page":"3089","DOI":"10.1093\/bioinformatics\/btw328","article-title":"ACE: Adaptive Cluster Expansion for Maximum Entropy Graphical Model Inference","volume":"32","author":"JP Barton","year":"2016","journal-title":"Bioinformatics"},{"issue":"3","key":"pcbi.1008085.ref051","doi-asserted-by":"crossref","first-page":"032601","DOI":"10.1088\/1361-6633\/aa9965","article-title":"Inverse Statistical Physics of Protein Sequences: A Key Issues Review","volume":"81","author":"S Cocco","year":"2018","journal-title":"Reports on Progress in Physics"},{"key":"pcbi.1008085.ref052","doi-asserted-by":"crossref","first-page":"45","DOI":"10.2142\/biophysico.13.0_45","article-title":"A Unified Statistical Model of Protein Multiple Sequence Alignment Integrating Direct Coupling and Insertions","volume":"13","author":"AR Kinjo","year":"2016","journal-title":"Biophysics and Physicobiology"},{"issue":"3-1","key":"pcbi.1008085.ref053","doi-asserted-by":"crossref","first-page":"032405","DOI":"10.1103\/PhysRevE.99.032405","article-title":"Influence of Multiple-Sequence-Alignment Depth on Potts Statistical Models of Protein Covariation","volume":"99","author":"A Haldane","year":"2019","journal-title":"Physical Review E"},{"key":"pcbi.1008085.ref054","doi-asserted-by":"crossref","unstructured":"Muntoni AP, Pagnani A, Weigt M, Zamponi F. Aligning Biological Sequences by Exploiting Residue Conservation and Coevolution. biorXiv 101295v1 [Preprint]. 2020 [Cited 15 June 2020]. Available from: https:\/\/www.biorxiv.org\/content\/10.1101\/2020.05.18.101295v1","DOI":"10.1101\/2020.05.18.101295"},{"key":"pcbi.1008085.ref055","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1006\/geno.1994.1018","article-title":"Protein Family Classification Based on Searching a Database of Blocks","volume":"19","author":"S Henikoff","year":"1994","journal-title":"Genomics"},{"key":"pcbi.1008085.ref056","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/bioinformatics\/bth489","article-title":"RALEE\u2013RNA ALignment Editor in Emacs","volume":"21","author":"S Griffiths-Jones","year":"2005","journal-title":"Bioinformatics"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008085","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T00:00:00Z","timestamp":1607558400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008085","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,1]],"date-time":"2022-12-01T01:22:47Z","timestamp":1669857767000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008085"}},"subtitle":[],"editor":[{"given":"Sushmita","family":"Roy","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,30]]},"references-count":56,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2020,11,30]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008085","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.06.23.168153","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,11,30]]}}}