{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,21]],"date-time":"2025-09-21T16:56:31Z","timestamp":1758473791727},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power.<\/jats:p>\n            <jats:p>In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-288","type":"journal-article","created":{"date-parts":[[2006,6,7]],"date-time":"2006-06-07T18:29:05Z","timestamp":1149704945000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["High throughput profile-profile based fold recognition for the entire human proteome"],"prefix":"10.1186","volume":"7","author":[{"given":"Liam J","family":"McGuffin","sequence":"first","affiliation":[]},{"given":"Richard T","family":"Smith","sequence":"additional","affiliation":[]},{"given":"Kevin","family":"Bryson","sequence":"additional","affiliation":[]},{"given":"S\u00f8ren-Aksel","family":"S\u00f8rensen","sequence":"additional","affiliation":[]},{"given":"David T","family":"Jones","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,6,7]]},"reference":[{"key":"1027_CR1","volume-title":"Nucleic Acids Res","author":"K Fleming","year":"2004","unstructured":"Fleming K, Muller A, MacCallum RM, Sternberg MJ: 3D-GENOMICS: a database to compare structural and functional annotations of proteins between sequenced genomes. Nucleic Acids Res 2004, (32 Database):D245\u201350. 10.1093\/nar\/gkh064"},{"key":"1027_CR2","volume-title":"Nucleic Acids Res","author":"C Yeats","year":"2006","unstructured":"Yeats C, Maibaum M, Marsden R, Dibley M, Lee D, Addou S, Orengo CA: Gene3D: modelling protein structure, function and evolution. Nucleic Acids Res 2006, (34 Database):D281\u20134. 10.1093\/nar\/gkj057"},{"key":"1027_CR3","doi-asserted-by":"publisher","first-page":"903","DOI":"10.1006\/jmbi.2001.5080","volume":"313","author":"J Gough","year":"2001","unstructured":"Gough J, Karplus K, Hughey R, Chothia C: Assignment of homology to genome sequences using a library of Hidden Markov Models that represent all proteins of known structure. J Mol Biol 2001, 313: 903\u2013919. 10.1006\/jmbi.2001.5080","journal-title":"J Mol Biol"},{"key":"1027_CR4","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1093\/bioinformatics\/btg387","volume":"20","author":"LJ McGuffin","year":"2004","unstructured":"McGuffin LJ, Street S, S\u00f8rensen SA, Jones DT: Genomic Threading Database. Bioinformatics 2004, 20: 131\u20132. 10.1093\/bioinformatics\/btg387","journal-title":"Bioinformatics"},{"key":"1027_CR5","volume-title":"Nucleic Acids Res","author":"LJ McGuffin","year":"2004","unstructured":"McGuffin LJ, Street SA, Bryson K, Sorensen SA, Jones DT: The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms. Nucleic Acids Res 2004, (32 Database):D196\u20139. 10.1093\/nar\/gkh043"},{"key":"1027_CR6","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","volume":"28","author":"HM Berman","year":"2000","unstructured":"Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res 2000, 28: 235\u2013242. 10.1093\/nar\/28.1.235","journal-title":"Nucleic Acids Res"},{"key":"1027_CR7","volume-title":"Nucleic Acids Res","author":"CH Wu","year":"2006","unstructured":"Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Mazumder R, O'Donovan C, Redaschi N, Suzek B: The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res 2006, (34 Database):D187\u201391. 10.1093\/nar\/gkj161"},{"key":"1027_CR8","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25: 3389\u20133402. 10.1093\/nar\/25.17.3389","journal-title":"Nucleic Acids Res"},{"key":"1027_CR9","doi-asserted-by":"publisher","first-page":"240","DOI":"10.1110\/ps.04888805","volume":"14","author":"L Rychlewski","year":"2005","unstructured":"Rychlewski L, Fischer D: LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci 2005, 14: 240\u20135. 10.1110\/ps.04888805","journal-title":"Protein Sci"},{"key":"1027_CR10","doi-asserted-by":"publisher","first-page":"188","DOI":"10.1002\/prot.20184","volume":"57","author":"T Ohlson","year":"2004","unstructured":"Ohlson T, Wallner B, Elofsson A: Profile-profile methods provide improved fold-recognition: a study of different profile-profile alignment methods. Proteins 2004, 57: 188\u201397. 10.1002\/prot.20184","journal-title":"Proteins"},{"key":"1027_CR11","unstructured":"Sun Grid Engine Project Homepage[http:\/\/gridengine.sunsource.net]"},{"key":"1027_CR12","unstructured":"Condor Project Homepage[http:\/\/www.cs.wisc.edu\/condor]"},{"key":"1027_CR13","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1002\/prot.20731","volume-title":"Proteins","author":"DT Jones","year":"2005","unstructured":"Jones DT, Bryson K, Coleman A, McGuffin LJ, Sadowski MI, Sodhi JS, Ward JJ: Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins 2005, (Suppl 7):143\u201351. 10.1002\/prot.20731"},{"key":"1027_CR14","volume-title":"Nucleic Acids Res","author":"E Birney","year":"2006","unstructured":"Birney E, Andrews D, Caccamo M, Chen Y, Clarke L, Coates G, Cox T, Cunningham F, Curwen V, Cutts T, Down T, Durbin R, Fernandez-Suarez XM, Flicek P, Graf S, Hammond M, Herrero J, Howe K, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Kokocinski F, Kulesha E, London D, Longden I, Melsopp C, Meidl P, Overduin B, Parker A, Proctor G, Prlic A, Rae M, Rios D, Redmond S, Schuster M, Sealy I, Searle S, Severin J, Slater G, Smedley D, Smith J, Stabenau A, Stalker J, Trevanion S, Ureta-Vidal A, Vogel J, White S, Woodwark C, Hubbard TJ: Ensembl 2006. Nucleic Acids Res 2006, (34 Database):D556\u201361. 10.1093\/nar\/gkj133"},{"key":"1027_CR15","doi-asserted-by":"publisher","first-page":"2444","DOI":"10.1073\/pnas.85.8.2444","volume":"85","author":"WR Pearson","year":"1988","unstructured":"Pearson WR, Lipman DJ: Improved tools for biological sequence comparison. Proc Natl Acad Sci U S A 1988, 85: 2444\u20138. 10.1073\/pnas.85.8.2444","journal-title":"Proc Natl Acad Sci U S A"},{"key":"1027_CR16","doi-asserted-by":"publisher","first-page":"874","DOI":"10.1093\/bioinformatics\/btg097","volume":"19","author":"LJ McGuffin","year":"2003","unstructured":"McGuffin LJ, Jones DT: Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics 2003, 19: 874\u201381. 10.1093\/bioinformatics\/btg097","journal-title":"Bioinformatics"},{"key":"1027_CR17","doi-asserted-by":"publisher","first-page":"2302","DOI":"10.1093\/nar\/gki524","volume":"33","author":"Y Zhang","year":"2005","unstructured":"Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 2005, 33: 2302\u20139. 10.1093\/nar\/gki524","journal-title":"Nucleic Acids Res"},{"key":"1027_CR18","unstructured":"The R Project for Statistical Computing[http:\/\/www.r-project.org\/]"},{"key":"1027_CR19","doi-asserted-by":"publisher","first-page":"797","DOI":"10.1006\/jmbi.1999.2583","volume":"287","author":"DT Jones","year":"1999","unstructured":"Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol 1999, 287: 797\u2013815. 10.1006\/jmbi.1999.2583","journal-title":"J Mol Biol"},{"key":"1027_CR20","doi-asserted-by":"publisher","first-page":"536","DOI":"10.1006\/jmbi.1995.0159","volume":"247","author":"AG Murzin","year":"1995","unstructured":"Murzin AG, Brenner SE, Hubbard T, Chothia C: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995, 247: 536\u2013540. 10.1006\/jmbi.1995.0159","journal-title":"J Mol Biol"},{"key":"1027_CR21","doi-asserted-by":"publisher","first-page":"1093","DOI":"10.1016\/S0969-2126(97)00260-8","volume":"5","author":"CA Orengo","year":"1997","unstructured":"Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM: CATH \u2013 A hierarchic classification of protein domain structures. Structure 1997, 5: 1093\u20131108. 10.1016\/S0969-2126(97)00260-8","journal-title":"Structure"},{"key":"1027_CR22","doi-asserted-by":"publisher","first-page":"478","DOI":"10.1016\/S0968-0004(00)89105-7","volume":"11","author":"L Holm","year":"1995","unstructured":"Holm L, Sander C: Dali: a network tool for protein structure comparison. Trends Biochem Sci 1995, 11: 478\u201380. 10.1016\/S0968-0004(00)89105-7","journal-title":"Trends Biochem Sci"},{"key":"1027_CR23","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1110\/ps.9.2.232","volume":"9","author":"L Rychlewski","year":"2000","unstructured":"Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 2000, 9: 232\u201341.","journal-title":"Protein Sci"},{"key":"1027_CR24","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1002\/prot.10029","volume":"46","author":"D Przybylski","year":"2002","unstructured":"Przybylski D, Rost B: Alignments grow, secondary structure prediction improves. Proteins 2002, 46: 197\u2013205. 10.1002\/prot.10029","journal-title":"Proteins"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-288.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:05:33Z","timestamp":1630494333000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,6,7]]},"references-count":24,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["1027"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-288","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,6,7]]},"assertion":[{"value":"22 February 2006","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 June 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 June 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"288"}}