{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T06:43:24Z","timestamp":1776408204684,"version":"3.51.2"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Redundant protein sequences in biological databases hinder sequence similarity searches and make interpretation of search results difficult. Clustering of protein sequence space based on sequence similarity helps organize all sequences into manageable datasets and reduces sampling bias and overrepresentation of sequences.<\/jats:p><jats:p>Results: The UniRef (UniProt Reference Clusters) provide clustered sets of sequences from the UniProt Knowledgebase (UniProtKB) and selected UniProt Archive records to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences. Currently covering &gt;4 million source sequences, the UniRef100 database combines identical sequences and subfragments from any source organism into a single UniRef entry. UniRef90 and UniRef50 are built by clustering UniRef100 sequences at the 90 or 50% sequence identity levels. UniRef100, UniRef90 and UniRef50 yield a database size reduction of \u223c10, 40 and 70%, respectively, from the source sequence set. The reduced redundancy increases the speed of similarity searches and improves detection of distant relationships. UniRef entries contain summary cluster and membership information, including the sequence of a representative protein, member count and common taxonomy of the cluster, the accession numbers of all the merged entries and links to rich functional annotation in UniProtKB to facilitate biological discovery. 
UniRef has already been applied to broad research areas ranging from genome annotation to proteomics data analysis.<\/jats:p><jats:p>Availability: UniRef is updated biweekly and is available for online search and retrieval at http:\/\/www.uniprot.org, as well as for download at ftp:\/\/ftp.uniprot.org\/pub\/databases\/uniprot\/uniref<\/jats:p><jats:p>Contact: \u00a0bes23@georgetown.edu<\/jats:p><jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btm098","type":"journal-article","created":{"date-parts":[[2007,3,23]],"date-time":"2007-03-23T00:18:49Z","timestamp":1174609129000},"page":"1282-1288","source":"Crossref","is-referenced-by-count":1464,"title":["UniRef: comprehensive and non-redundant UniProt reference clusters"],"prefix":"10.1093","volume":"23","author":[{"given":"Baris E.","family":"Suzek","sequence":"first","affiliation":[{"name":"Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA"}]},{"given":"Hongzhan","family":"Huang","sequence":"additional","affiliation":[{"name":"Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA"}]},{"given":"Peter","family":"McGarvey","sequence":"additional","affiliation":[{"name":"Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA"}]},{"given":"Raja","family":"Mazumder","sequence":"additional","affiliation":[{"name":"Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center, Washington, DC 20007, USA"}]},{"given":"Cathy H.","family":"Wu","sequence":"additional","affiliation":[{"name":"Protein Information Resource, Department of Biochemistry and Molecular & Cellular 
Biology, Georgetown University Medical Center, Washington, DC 20007, USA"}]}],"member":"286","published-online":{"date-parts":[[2007,3,22]]},"reference":[{"key":"2023041104480638100_","unstructured":"Barnosa D \u00a0et al. Divergent paralogous in Uniref50 enriched-COG clusters depicted by Phylip neighbor trees rooted with Taxbrowser tables Abstract ISMB2006 2006 Retrieved September 30, 2006 from http:\/\/ismb2006.cbi.cnptia.embrapa.br\/poster_abstract_lb.php?id=LB-56"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1104\/pp.104.059204","article-title":"Databases and information integration for the Medicago truncatula genome and transcriptome","volume":"138","author":"Cannon","year":"2005","journal-title":"Plant Physiol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1186\/1471-2105-7-48","article-title":"On single and multiple models of protein families for the detection of remote sequence relationships","volume":"7","author":"Casbon","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"3135","DOI":"10.1021\/pr060363j","article-title":"Proteomic and bioinformatic characterization of the biogenesis and function of melanosomes","volume":"5","author":"Chi","year":"2006","journal-title":"J. 
Proteome Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D846","DOI":"10.1093\/nar\/gkl785","article-title":"The TIGR Plant Transcript Assemblies database","volume":"35","author":"Childs","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"451","DOI":"10.1093\/bioinformatics\/16.5.451","article-title":"GeneRAGE: a robust algorithm for sequence clustering and domain detection","volume":"16","author":"Enright","year":"2000","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/1472-6807-6-15","article-title":"Saturating representation of loop conformational fragments in structure databanks","volume":"6","author":"Fernandez-Fuentes","year":"2006","journal-title":"BMC Struct. Biol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"2887","DOI":"10.1093\/nar\/gkl295","article-title":"Identification of multiple distinct Snf2 subfamilies with conserved structural motifs","volume":"34","author":"Flaus","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"e52","DOI":"10.1371\/journal.pgen.0020052","article-title":"The abundance of short proteins in the mammalian proteome","volume":"2","author":"Frith","year":"2006","journal-title":"PLoS Genet."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s11010-005-7556-1","article-title":"Proteome profiling of human epithelial ovarian cancer cell line TOV-112D","volume":"275","author":"Gagne","year":"2005","journal-title":"Mol. Cell. 
Biochem."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"409","DOI":"10.1002\/pro.5560010313","article-title":"Selection of representative protein data sets","volume":"1","author":"Hobohm","year":"1992","journal-title":"Protein Sci."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/j.ijms.2006.09.024","article-title":"Comparative bioinformatics analyses and profiling of lysosome-related organelle proteomes","volume":"259","author":"Hu","year":"2007","journal-title":"Int. J. Mass Spectrom."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"21","DOI":"10.2174\/138920207780076910","article-title":"Challenges and solutions in proteomics","volume":"8","author":"Huang","year":"2007","journal-title":"Curr. Genomics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D610","DOI":"10.1093\/nar\/gkl996","article-title":"Ensembl 2007","volume":"35","author":"Hubbard","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"1550","DOI":"10.1107\/S0907444905028805","article-title":"Structure of human semicarbazide-sensitive amine oxidase\/vascular adhesion protein-1","volume":"61","author":"Jakobsson","year":"2005","journal-title":"Acta Crystallogr. D. Biol. 
Crystallogr."},{"key":"2023041104480638100_","first-page":"4","article-title":"A conserved supergene locus controls colour pattern diversity in heliconius butterflies","author":"Joron","year":"2006","journal-title":"PLoS Biol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1186\/1471-2105-7-401","article-title":"CRNPRED: highly accurate prediction of one-dimensional protein structures by large-scale critical random networks","volume":"7","author":"Kinjo","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1186\/1471-2105-6-151","article-title":"AutoFACT: an automatic functional annotation and classification tool","volume":"6","author":"Koski","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D302","DOI":"10.1093\/nar\/gkj120","article-title":"The RCSB PDB information portal for structural genomics","volume":"34","author":"Kouranov","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"3236","DOI":"10.1093\/bioinformatics\/bth191","article-title":"UniProt archive","volume":"20","author":"Leinonen","year":"2004","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"1658","DOI":"10.1093\/bioinformatics\/btl158","article-title":"Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences","volume":"22","author":"Li","year":"2006","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1093\/bioinformatics\/17.3.282","article-title":"Clustering of highly homologous sequences to reduce the size of large protein 
databases","volume":"17","author":"Li","year":"2001","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","DOI":"10.1186\/gb-2002-3-8-research0040","article-title":"The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties","volume":"3","author":"Luscombe","year":"2002","journal-title":"Genome Biol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"R55","DOI":"10.1186\/gb-2005-6-6-r55","article-title":"Refinement and prediction of protein prenylation motifs","volume":"6","author":"Maurer-Stroh","year":"2005","journal-title":"Genome Biol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"288","DOI":"10.1186\/1471-2105-7-288","article-title":"High throughput profile-profile based fold recognition for the entire human proteome","volume":"7","author":"McGuffin","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"3789","DOI":"10.1093\/nar\/gkg620","article-title":"UniqueProt: creating representative protein sequence sets","volume":"31","author":"Mika","year":"2003","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/1471-2229-5-15","article-title":"Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana","volume":"5","author":"Mudge","year":"2005","journal-title":"BMC Plant Biol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D527","DOI":"10.1093\/nar\/gkj044","article-title":"pSTIING: a \u2018systems\u2019 approach towards integrating signalling pathways, interaction and transcriptional regulatory networks in inflammation and cancer","volume":"34","author":"Ng","year":"2006","journal-title":"Nucleic Acids 
Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"W214","DOI":"10.1093\/nar\/gkl332","article-title":"DOUTfinder \u2013 identification of distant domain outliers using subsignificant sequence similarity","volume":"34","author":"Novatchkova","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"4005","DOI":"10.1016\/j.febslet.2006.06.015","article-title":"A normalised scale for structural genomics target ranking: the OB-Score","volume":"580","author":"Overton","year":"2006","journal-title":"FEBS Lett."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"1571","DOI":"10.1093\/nar\/gkj515","article-title":"Spectral clustering of protein sequences","volume":"34","author":"Paccanaro","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"458","DOI":"10.1093\/bioinformatics\/16.5.458","article-title":"RSDB: representative protein sequence databases have high information content","volume":"16","author":"Park","year":"2000","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/1471-2164-6-144","article-title":"Generation, annotation, analysis and database integration of 16 500 white spruce EST clusters","volume":"6","author":"Pavy","year":"2005","journal-title":"BMC Genomics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/1471-2164-7-174","article-title":"Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs","volume":"7","author":"Pavy","year":"2006","journal-title":"BMC Genomics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1186\/1471-2105-7-208","article-title":"Length-dependent prediction of protein intrinsic 
disorder","volume":"7","author":"Peng","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041104480638100_","unstructured":"Perkins DN \u00a0et al. Mascot online help manual 2006 Retrieved November 28, 2006, from http:\/\/www.matrixscience.com\/help\/seq_db_setup_uniref.html"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"3604","DOI":"10.1093\/bioinformatics\/bti542","article-title":"The predictive power of the CluSTr database","volume":"21","author":"Petryszak","year":"2005","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"S182","DOI":"10.1093\/bioinformatics\/18.suppl_2.S182","article-title":"ProClust: improved clustering of protein sequences with an extended graph-based approach","volume":"18","author":"Pipenbacher","year":"2002","journal-title":"Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D61","DOI":"10.1093\/nar\/gkl842","article-title":"NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins","volume":"35","author":"Pruitt","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"1211","DOI":"10.1104\/pp.104.054999","article-title":"Sequencing and analysis of common bean ESTs. Building a foundation for functional genomics","volume":"137","author":"Ramirez","year":"2005","journal-title":"Plant Physiol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"3505","DOI":"10.1111\/j.1742-4658.2005.04759.x","article-title":"Death inducer obliterator protein 1 in the context of DNA regulation. 
Sequence analyses of distant homologues point to a novel functional role","volume":"272","author":"Rojas","year":"2005","journal-title":"FEBS J."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1093\/dnares\/dsi018","article-title":"Comprehensive structural analysis of the genome of red clover (Trifolium pratense L.)","volume":"12","author":"Sato","year":"2005","journal-title":"DNA Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1104\/pp.105.060079","article-title":"Genome organization of more than 300 defensin-like genes in Arabidopsis","volume":"138","author":"Silverstein","year":"2005","journal-title":"Plant Physiol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D500","DOI":"10.1093\/nar\/gkj054","article-title":"Tetrahymena Genome Database (TGD): a new genomic resource for Tetrahymena thermophila research","volume":"34","author":"Stover","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D193","DOI":"10.1093\/nar\/gkl929","article-title":"The Universal Protein Resource (UniProt)","volume":"35","author":"The UniProt Consortium","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"3264","DOI":"10.1128\/JB.188.9.3264-3272.2006","article-title":"Specific modification of a Na+ binding site in NADH:quinone oxidoreductase from Klebsiella pneumoniae with dicyclohexylcarbodiimide","volume":"188","author":"Vgenopoulou","year":"2006","journal-title":"J. 
Bacteriol."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1186\/1471-2105-7-385","article-title":"Incorporating background frequency improves entropy-based residue conservation measures","volume":"7","author":"Wang","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D5","DOI":"10.1093\/nar\/gkl1031","article-title":"Database resources of the national center for biotechnology information","volume":"35","author":"Wheeler","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"D187","DOI":"10.1093\/nar\/gkj161","article-title":"The Universal Protein Resource (UniProt): an expanding universe of protein information","volume":"34","author":"Wu","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023041104480638100_","doi-asserted-by":"crossref","first-page":"2123","DOI":"10.1105\/tpc.106.043794","article-title":"Genomic and genetic characterization of rice Cen3 reveals extensive transcription and evolutionary implications of a complex centromere","volume":"18","author":"Yan","year":"2006","journal-title":"Plant 
Cell"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/10\/1282\/49812789\/bioinformatics_23_10_1282.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/10\/1282\/49812789\/bioinformatics_23_10_1282.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,11]],"date-time":"2023-05-11T04:28:54Z","timestamp":1683779334000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/10\/1282\/197795"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2007,3,22]]},"references-count":50,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2007,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btm098","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,5,15]]},"published":{"date-parts":[[2007,3,22]]}}}