{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,5]],"date-time":"2024-08-05T09:52:48Z","timestamp":1722851568126},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1668,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: One of the most successful methods to date for recognizing protein sequences that are evolutionarily related has been profile hidden Markov models (HMMs). However, these models do not capture pairwise statistical preferences of residues that are hydrogen bonded in beta sheets. These dependencies have been partially captured in the HMM setting by simulated evolution in the training phase and can be fully captured by Markov random fields (MRFs). However, the MRFs can be computationally prohibitive when beta strands are interleaved in complex topologies. We introduce SMURFLite, a method that combines both simplified MRFs and simulated evolution to substantially improve remote homology detection for beta structures. Unlike previous MRF-based methods, SMURFLite is computationally feasible on any beta-structural motif.<\/jats:p>\n               <jats:p>Results: We test SMURFLite on all propeller and barrel folds in the mainly-beta class of the SCOP hierarchy in stringent cross-validation experiments. 
We show a mean 26% (median 16%) improvement in area under curve (AUC) for beta-structural motif recognition as compared with HMMER (a well-known HMM method) and a mean 33% (median 19%) improvement as compared with RAPTOR (a well-known threading method) and even a mean 18% (median 10%) improvement in AUC over HHpred (a profile\u2013profile HMM method), despite HHpred's use of extensive additional training data. We demonstrate SMURFLite's ability to scale to whole genomes by running a SMURFLite library of 207 beta-structural SCOP superfamilies against the entire genome of Thermotoga maritima, and make over 100 new fold predictions.<\/jats:p>\n               <jats:p>Availability and implementation: A webserver that runs SMURFLite is available at: http:\/\/smurf.cs.tufts.edu\/smurflite\/<\/jats:p>\n               <jats:p>Contact: \u00a0lenore.cowen@tufts.edu; bab@mit.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts110","type":"journal-article","created":{"date-parts":[[2012,3,10]],"date-time":"2012-03-10T01:39:55Z","timestamp":1331343595000},"page":"1216-1222","source":"Crossref","is-referenced-by-count":24,"title":["SMURFLite: combining simplified Markov random fields with simulated evolution improves remote homology detection for beta-structural proteins into the twilight zone"],"prefix":"10.1093","volume":"28","author":[{"given":"Noah M.","family":"Daniels","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, Tufts University, Medford, MA 02155 and 2Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"given":"Raghavendra","family":"Hosur","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Tufts University, Medford, MA 02155 and 2Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[{"name":"1 
Department of Computer Science, Tufts University, Medford, MA 02155 and 2Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]},{"given":"Lenore J.","family":"Cowen","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Tufts University, Medford, MA 02155 and 2Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,3,9]]},"reference":[{"key":"2023012512232091700_B1","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1093\/nar\/28.1.235","article-title":"The protein data bank","volume":"28","author":"Berman","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012512232091700_B2","doi-asserted-by":"crossref","first-page":"14819","DOI":"10.1073\/pnas.251267298","article-title":"Betawrap: successful prediction of parallel \u03b2-helices from primary sequence reveals an association with many microbial pathogens","volume":"98","author":"Bradley","year":"2001","journal-title":"Proc. Natl Acad. Sci."},{"key":"2023012512232091700_B3","doi-asserted-by":"crossref","first-page":"261","DOI":"10.1089\/10665270252935458","article-title":"Predicting the beta-helix fold from protein sequence data","volume":"9","author":"Cowen","year":"2002","journal-title":"J. Comput. Biol."},{"key":"2023012512232091700_B4","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1109\/TCBB.2011.70","article-title":"Touring protein space with Matt","volume":"9","author":"Daniels","year":"2012","journal-title":"IEEE\/ACM Trans. Comput. Biol. 
Bioinform."},{"key":"2023012512232091700_B5","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023012512232091700_B6","doi-asserted-by":"crossref","first-page":"D281","DOI":"10.1093\/nar\/gkm960","article-title":"The Pfam protein families database","volume":"36","author":"Finn","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012512232091700_B7","doi-asserted-by":"crossref","first-page":"324","DOI":"10.1007\/BF00409880","article-title":"Thermotoga maritima sp. nov. represents a new genus of unique extremely thermophilic eubacteria growing up to 90c","volume":"144","author":"Huber","year":"1986","journal-title":"Arch. Microbiol."},{"key":"2023012512232091700_B8","article-title":"Remote protein homology detection using hidden Markov models","volume-title":"PhD Thesis","author":"Johnson","year":"2006"},{"key":"2023012512232091700_B9","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1093\/bioinformatics\/17.8.713","article-title":"Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set","volume":"17","author":"Karplus","year":"2001","journal-title":"Bioinformatics"},{"key":"2023012512232091700_B10","doi-asserted-by":"crossref","first-page":"1602","DOI":"10.1093\/bioinformatics\/btp265","article-title":"Augmented training of hidden Markov models to recognize remote homologs via simulated evolution","volume":"25","author":"Kumar","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512232091700_B11","doi-asserted-by":"crossref","first-page":"i287","DOI":"10.1093\/bioinformatics\/btq199","article-title":"Recognition of beta-structural motifs using hidden Markov models trained with simulated 
evolution","volume":"26","author":"Kumar","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512232091700_B12","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1006\/jmbi.1996.0053","article-title":"Global optimum protein threading with gapped alignment and empirical pair scores","volume":"255","author":"Lathrop","year":"1996","journal-title":"J. Mol. Biol."},{"key":"2023012512232091700_B13","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1016\/0022-2836(80)90052-2","article-title":"Specific recognition in the tertiary structure of \u03b2-sheets of proteins","volume":"139","author":"Lifson","year":"1980","journal-title":"J. Mol. Biol."},{"key":"2023012512232091700_B14","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1089\/cmb.2008.0176","article-title":"Conditional graphical models for protein structural motif recognition","volume":"16","author":"Liu","year":"2009","journal-title":"J. Comput. Biol."},{"key":"2023012512232091700_B15","doi-asserted-by":"crossref","first-page":"e10","DOI":"10.1371\/journal.pcbi.0040010","article-title":"Matt: local flexibility aids protein multiple structure alignment","volume":"4","author":"Menke","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012512232091700_B16","doi-asserted-by":"crossref","first-page":"4069","DOI":"10.1073\/pnas.0909950107","article-title":"Markov random fields reveal an n-terminal double beta-propeller motif as part of a bacterial hybrid two-component sensor system","volume":"107","author":"Menke","year":"2010","journal-title":"Proc. Natl Acad. 
Sci."},{"key":"2023012512232091700_B17","article-title":"Computational approaches to modeling the conserved structural core among distantly homologous proteins","volume-title":"PhD Thesis","author":"Menke","year":"2009"},{"key":"2023012512232091700_B18","first-page":"467","article-title":"Loopy belief propagation for approximate inference: an empirical study","volume-title":"Proceedings of Uncertainty in AI.","author":"Murphy","year":"1999"},{"key":"2023012512232091700_B19","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012512232091700_B20","doi-asserted-by":"crossref","first-page":"1221","DOI":"10.1006\/jmbi.1999.3208","article-title":"Effective use of sequence correlation and conservation in fold recognition","volume":"293","author":"Olmea","year":"1999","journal-title":"J. Mol. Biol."},{"key":"2023012512232091700_B21","doi-asserted-by":"crossref","first-page":"1930","DOI":"10.1002\/prot.23016","article-title":"A multiple-template approach to protein threading","volume":"79","author":"Peng","year":"2011","journal-title":"Prot. Struct. Func. Bioinformatics"},{"key":"2023012512232091700_B22","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1016\/S0968-0004(00)89080-5","article-title":"RASMOL: biomolecular graphics for all","volume":"20","author":"Sayle","year":"1995","journal-title":"Trends Biochem. Sci."},{"key":"2023012512232091700_B23","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1162\/neco.1997.9.2.227","article-title":"Probabilistic independence networks for hidden markov probability models","volume":"9","author":"Smyth","year":"1997","journal-title":"Neural comput."},{"issue":"Suppl. 
2","key":"2023012512232091700_B24","doi-asserted-by":"crossref","first-page":"W244","DOI":"10.1093\/nar\/gki408","article-title":"The HHpred interactive server for protein homology detection and structure prediction","volume":"33","author":"S\u00f6ding","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512232091700_B25","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1093\/bib\/bbm064","article-title":"ROC analysis: applications to the classification of biological sequences and 3D structures","volume":"9","author":"Sonego","year":"2008","journal-title":"Brief. Bioinformatics"},{"key":"2023012512232091700_B26","doi-asserted-by":"crossref","first-page":"178","DOI":"10.1002\/prot.10152","article-title":"Prediction of strand pairing in antiparallel and parallel \u03b2-sheets using information theory","volume":"48","author":"Steward","year":"2002","journal-title":"Prot. Struct. Func. Bioinformatics"},{"key":"2023012512232091700_B27","doi-asserted-by":"crossref","first-page":"8767","DOI":"10.1021\/bi00152a012","article-title":"Primary structure of barwin: a barley seed protein closely related to the C-terminal domain of proteins encoded by wound-induced plant genes","volume":"31","author":"Svensson","year":"1992","journal-title":"Biochemistry"},{"key":"2023012512232091700_B28","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1109\/TCBB.2007.70225","article-title":"Graphical models of residue coupling in protein families","volume":"5","author":"Thomas","year":"2008","journal-title":"IEEE\/ACM Trans. Comp. Biol. Bioinf."},{"key":"2023012512232091700_B29","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1109\/TIT.1967.1054010","article-title":"Error bounds for convolutional codes and an asymptotically optimum decoding algorithm","volume":"13","author":"Viterbi","year":"1967","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"2023012512232091700_B30","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1016\/0025-5564(94)90041-8","article-title":"Modeling protein cores with Markov random fields","volume":"124","author":"White","year":"1994","journal-title":"Math. Biosci."},{"key":"2023012512232091700_B31","doi-asserted-by":"crossref","first-page":"D308","DOI":"10.1093\/nar\/gkl910","article-title":"The SUPERFAMILY database in 2007: families and functions","volume":"35","author":"Wilson","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023012512232091700_B32","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1142\/S0219720003000186","article-title":"Raptor: optimal protein threading by linear programming","volume":"1","author":"Xu","year":"2003","journal-title":"J. Bioinformatics Comput. Biol."},{"key":"2023012512232091700_B33","doi-asserted-by":"crossref","first-page":"2302","DOI":"10.1093\/nar\/gki524","article-title":"TM-align: a protein structure alignment algorithm based on the TM-score","volume":"33","author":"Zhang","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023012512232091700_B34","doi-asserted-by":"crossref","first-page":"1544","DOI":"10.1126\/science.1174671","article-title":"Three-dimensional structural view of the central metabolic network of Thermotoga maritima","volume":"325","author":"Zhang","year":"2009","journal-title":"Science"},{"key":"2023012512232091700_B35","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1110\/ps.8.2.326","article-title":"Sequence specificity, statistical potentials and 3D structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins","volume":"8","author":"Zhu","year":"1999","journal-title":"Protein 
Sci."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/9\/1216\/48877985\/bioinformatics_28_9_1216.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/9\/1216\/48877985\/bioinformatics_28_9_1216.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T15:53:34Z","timestamp":1674662014000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/9\/1216\/311484"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3,9]]},"references-count":35,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2012,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts110","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,5,1]]},"published":{"date-parts":[[2012,3,9]]}}}