{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,17]],"date-time":"2025-04-17T11:28:08Z","timestamp":1744889288028},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"21","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,11,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe.<\/jats:p>\n               <jats:p>Results: We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.<\/jats:p>\n               <jats:p>Contact: \u00a0daniel.chubb01@imperial.ac.uk<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq527","type":"journal-article","created":{"date-parts":[[2010,9,16]],"date-time":"2010-09-16T00:51:32Z","timestamp":1284598292000},"page":"2664-2671","source":"Crossref","is-referenced-by-count":20,"title":["Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe"],"prefix":"10.1093","volume":"26","author":[{"given":"Daniel","family":"Chubb","sequence":"first","affiliation":[{"name":"Department of Life Science, Imperial College London, London, UK"}]},{"given":"Benjamin R.","family":"Jefferys","sequence":"additional","affiliation":[{"name":"Department of Life Science, Imperial College London, London, UK"}]},{"given":"Michael J. E.","family":"Sternberg","sequence":"additional","affiliation":[{"name":"Department of Life Science, Imperial College London, London, UK"}]},{"given":"Lawrence A.","family":"Kelley","sequence":"additional","affiliation":[{"name":"Department of Life Science, Imperial College London, London, UK"}]}],"member":"286","published-online":{"date-parts":[[2010,9,15]]},"reference":[{"key":"2023012507541822300_B1","doi-asserted-by":"crossref","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","article-title":"Gapped BLAST and PSI-BLAST: a new generation of protein database search programs","volume":"25","author":"Altschul","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023012507541822300_B2","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1093\/nar\/28.1.254","article-title":"The ASTRAL compendium for protein structure and sequence analysis","volume":"28","author":"Brenner","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012507541822300_B3","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1186\/1471-2148-4-33","article-title":"Reconstruction of ancestral protein sequences and its applications","volume":"4","author":"Cai","year":"2004","journal-title":"BMC Evol. Biol."},{"key":"2023012507541822300_B4","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1186\/1471-2105-5-200","article-title":"Analysis of superfamily specific profile-profile recognition accuracy","volume":"5","author":"Casbon","year":"2004","journal-title":"BMC Bioinformatics"},{"key":"2023012507541822300_B5","doi-asserted-by":"crossref","first-page":"18","DOI":"10.1002\/prot.22561","article-title":"Evaluation of template-based models in CASP8 with standard measures","volume":"77","author":"Cozzetto","year":"2009","journal-title":"Proteins Struct. Funct. Bioinformatics"},{"key":"2023012507541822300_B6","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1126\/science.1106198","article-title":"EVOLUTION: genomic databases and the tree of life","volume":"306","author":"Crandall","year":"2004","journal-title":"Science"},{"key":"2023012507541822300_B7","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1038\/nprot.2009.2","article-title":"Protein structure prediction on the Web: a case study using the Phyre server","volume":"4","author":"Kelley","year":"2009","journal-title":"Nat. Protoc."},{"key":"2023012507541822300_B8","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1186\/gb-2003-4-2-401","article-title":"Myriads of protein families, and still counting","volume":"4","author":"Kunin","year":"2003","journal-title":"Genome Biol."},{"key":"2023012507541822300_B9","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/nbt.1552","article-title":"Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream","volume":"27","author":"Kyrpides","year":"2009","journal-title":"Nat. Biotechnol."},{"key":"2023012507541822300_B10","doi-asserted-by":"crossref","first-page":"686","DOI":"10.1016\/S0959-437X(96)80021-9","article-title":"Biodiversity, genomes, and DNA sequence databases","volume":"6","author":"Leipe","year":"1996","journal-title":"Curr. Opin. Genet. Dev."},{"key":"2023012507541822300_B11","doi-asserted-by":"crossref","first-page":"11079","DOI":"10.1073\/pnas.0905029106","article-title":"Nature of the protein universe","volume":"106","author":"Levitt","year":"2009","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507541822300_B12","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B13","doi-asserted-by":"crossref","first-page":"643","DOI":"10.1093\/protein\/15.8.643","article-title":"Sequence clustering strategies improve remote homology recognitions while reducing search times","volume":"15","author":"Li","year":"2002","journal-title":"Protein Eng."},{"key":"2023012507541822300_B14","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1093\/bioinformatics\/18.1.77","article-title":"Tolerating some redundancy significantly speeds up clustering of large protein databases","volume":"18","author":"Li","year":"2002","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B15","doi-asserted-by":"crossref","first-page":"e3375","DOI":"10.1371\/journal.pone.0003375","article-title":"Probing metagenomics by rapid cluster analysis of very large datasets","volume":"3","author":"Li","year":"2008","journal-title":"PLoS ONE"},{"key":"2023012507541822300_B16","doi-asserted-by":"crossref","first-page":"D475","DOI":"10.1093\/nar\/gkm884","article-title":"The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata","volume":"36","author":"Liolios","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012507541822300_B17","doi-asserted-by":"crossref","first-page":"1761","DOI":"10.1093\/bioinformatics\/btp302","article-title":"pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination","volume":"25","author":"Lobley","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B18","doi-asserted-by":"crossref","first-page":"1066","DOI":"10.1093\/nar\/gkj494","article-title":"Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space","volume":"34","author":"Marsden","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012507541822300_B19","doi-asserted-by":"crossref","first-page":"536","DOI":"10.1016\/S0022-2836(05)80134-2","article-title":"SCOP: a structural classification of proteins database for the investigation of sequences and structures","volume":"247","author":"Murzin","year":"1995","journal-title":"J. Mol. Biol."},{"key":"2023012507541822300_B20","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1093\/bioinformatics\/16.5.458","article-title":"RSDB: representative protein sequence databases have high information content","volume":"16","author":"Park","year":"2000","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B21","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1016\/j.sbi.2005.05.005","article-title":"The limits of protein sequence comparison?","volume":"15","author":"Pearson","year":"2005","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012507541822300_B22","doi-asserted-by":"crossref","first-page":"11361","DOI":"10.1073\/pnas.2034878100","article-title":"Using protein design for homology detection and active site searches","volume":"100","author":"Pei","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507541822300_B23","doi-asserted-by":"crossref","first-page":"i294","DOI":"10.1093\/bioinformatics\/btq192","article-title":"Low-homology protein threading","volume":"26","author":"Peng","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B24","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1093\/bioinformatics\/btg485","article-title":"Quality of alignment comparison by COMPASS improves with inclusion of diverse confident homologs","volume":"20","author":"Sadreyev","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B25","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1016\/j.sbi.2009.04.009","article-title":"Discrete-continuous duality of protein structure space","volume":"19","author":"Sadreyev","year":"2009","journal-title":"Curr. Opin. Struct. Biol."},{"key":"2023012507541822300_B26","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/S0014-5793(03)00929-3","article-title":"Effective detection of remote homologues by searching in sequence dataset of a protein domain fold","volume":"552","author":"Sandhya","year":"2003","journal-title":"FEBS Lett."},{"key":"2023012507541822300_B27","doi-asserted-by":"crossref","first-page":"951","DOI":"10.1093\/bioinformatics\/bti125","article-title":"Protein homology detection by HMM-HMM comparison","volume":"21","author":"S\u00f6ding","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B28","doi-asserted-by":"crossref","first-page":"1282","DOI":"10.1093\/bioinformatics\/btm098","article-title":"UniRef: comprehensive and non-redundant UniProt reference clusters","volume":"23","author":"Suzek","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012507541822300_B29","doi-asserted-by":"crossref","first-page":"D169","DOI":"10.1093\/nar\/gkn664","article-title":"The Universal Protein Resource (UniProt) 2009","volume":"37","author":"The UniProt Consortium","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2023012507541822300_B30","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1186\/1471-2105-7-213","article-title":"An analysis of the Sargasso Sea resource and the consequences for database composition","volume":"7","author":"Tress","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012507541822300_B31","doi-asserted-by":"crossref","first-page":"897","DOI":"10.1006\/jmbi.2000.3786","article-title":"Estimating the number of protein folds and families from complete genome data","volume":"299","author":"Wolf","year":"2000","journal-title":"J. Mol. Biol."},{"key":"2023012507541822300_B32","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The Universal Protein Resource (UniProt): an expanding universe of protein information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012507541822300_B33","doi-asserted-by":"crossref","first-page":"e16","DOI":"10.1371\/journal.pbio.0050016","article-title":"The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families","volume":"5","author":"Yooseph","year":"2007","journal-title":"PLoS Biol."},{"key":"2023012507541822300_B34","doi-asserted-by":"crossref","first-page":"2605","DOI":"10.1073\/pnas.0509379103","article-title":"On the origin and highly likely completeness of single-domain protein structures","volume":"103","author":"Zhang","year":"2006","journal-title":"Proc. Natl Acad. Sci. USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2664\/48852124\/bioinformatics_26_21_2664.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/21\/2664\/48852124\/bioinformatics_26_21_2664.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T07:54:46Z","timestamp":1674633286000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/21\/2664\/214603"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,9,15]]},"references-count":34,"journal-issue":{"issue":"21","published-print":{"date-parts":[[2010,11,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq527","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,11,1]]},"published":{"date-parts":[[2010,9,15]]}}}