{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,6]],"date-time":"2026-05-06T07:30:56Z","timestamp":1778052656709,"version":"3.51.4"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"D1","funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,1,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The Reference Sequence (RefSeq) project at the National Center for Biotechnology Information (NCBI) contains nearly 200 000 bacterial and archaeal genomes and 150 million proteins with up-to-date annotation. Changes in the Prokaryotic Genome Annotation Pipeline (PGAP) since 2018 have resulted in a substantial reduction in spurious annotation. The hierarchical collection of protein family models (PFMs) used by PGAP as evidence for structural and functional annotation was expanded to over 35 000 protein profile hidden Markov models (HMMs), 12 300 BlastRules and 36 000 curated CDD architectures. As a result, &gt;122 million or 79% of RefSeq proteins are now named based on a match to a curated PFM. Gene symbols, Enzyme Commission numbers or supporting publication attributes are available on over 40% of the PFMs and are inherited by the proteins and features they name, facilitating multi-genome analyses and connections to the literature. In adherence with the principles of FAIR (findable, accessible, interoperable, reusable), the PFMs are available in the Protein Family Models Entrez database to any user. Finally, the reference and representative genome set, a taxonomically diverse subset of RefSeq prokaryotic genomes, is now recalculated regularly and available for download and homology searches with BLAST. 
RefSeq is found at\u00a0https:\/\/www.ncbi.nlm.nih.gov\/refseq\/.<\/jats:p>","DOI":"10.1093\/nar\/gkaa1105","type":"journal-article","created":{"date-parts":[[2020,11,3]],"date-time":"2020-11-03T12:10:39Z","timestamp":1604405439000},"page":"D1020-D1028","source":"Crossref","is-referenced-by-count":1034,"title":["RefSeq: expanding the Prokaryotic Genome Annotation Pipeline reach with protein family model curation"],"prefix":"10.1093","volume":"49","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3460-6695","authenticated-orcid":false,"given":"Wenjun","family":"Li","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Kathleen R","family":"O\u2019Neill","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Daniel H","family":"Haft","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Michael","family":"DiCuccio","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Vyacheslav","family":"Chetvernin","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Azat","family":"Badretdin","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, 
USA"}]},{"given":"George","family":"Coulouris","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Farideh","family":"Chitsaz","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Myra\u00a0K","family":"Derbyshire","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"A Scott","family":"Durkin","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Noreen R","family":"Gonzales","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Marc","family":"Gwadz","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Christopher\u00a0J","family":"Lanczycki","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"James S","family":"Song","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, 
USA"}]},{"given":"Narmada","family":"Thanki","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Jiyao","family":"Wang","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Roxanne\u00a0A","family":"Yamashita","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Mingzhang","family":"Yang","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Chanjuan","family":"Zheng","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Aron","family":"Marchler-Bauer","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]},{"given":"Fran\u00e7oise","family":"Thibaud-Nissen","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 45 Center Drive, Bethesda, MD 20892-6511, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,12,3]]},"reference":[{"key":"2021010313115883900_B1","doi-asserted-by":"crossref","first-page":"6614","DOI":"10.1093\/nar\/gkw569","article-title":"NCBI prokaryotic genome annotation 
pipeline","volume":"44","author":"Tatusova","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B2","doi-asserted-by":"crossref","first-page":"D851","DOI":"10.1093\/nar\/gkx1068","article-title":"RefSeq: an update on prokaryotic genome annotation and curation","volume":"46","author":"Haft","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B3","doi-asserted-by":"crossref","first-page":"D372","DOI":"10.1093\/nar\/gkv1103","article-title":"The Transporter Classification Database (TCDB): recent advances","volume":"44","author":"Saier","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B4","doi-asserted-by":"crossref","first-page":"D687","DOI":"10.1093\/nar\/gky1080","article-title":"VFDB 2019: a comparative pathogenomic platform with an interactive web interface","volume":"47","author":"Liu","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B5","doi-asserted-by":"crossref","first-page":"2386","DOI":"10.1099\/ijsem.0.002809","article-title":"Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI","volume":"68","author":"Ciufo","year":"2018","journal-title":"Int. J. Syst. Evol. 
Microbiol."},{"key":"2021010313115883900_B6","doi-asserted-by":"crossref","first-page":"978","DOI":"10.1016\/j.tim.2018.06.007","article-title":"Evolution of plasmid-mediated antibiotic resistance in the clinical context","volume":"26","author":"San\u00a0Millan","year":"2018","journal-title":"Trends Microbiol."},{"key":"2021010313115883900_B7","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1101\/gr.230615.117","article-title":"Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes","volume":"28","author":"Lomsadze","year":"2018","journal-title":"Genome Res."},{"key":"2021010313115883900_B8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-1-4939-9173-0_1","article-title":"tRNAscan-SE: searching for tRNA genes in genomic sequences","volume":"1962","author":"Chan","year":"2019","journal-title":"Methods Mol. Biol."},{"key":"2021010313115883900_B9","doi-asserted-by":"crossref","first-page":"D130","DOI":"10.1093\/nar\/gku1063","article-title":"Rfam 12.0: updates to the RNA families database","volume":"43","author":"Nawrocki","year":"2015","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B10","doi-asserted-by":"crossref","first-page":"D265","DOI":"10.1093\/nar\/gkz991","article-title":"CDD\/SPARCLE: the conserved domain database in 2020","volume":"48","author":"Lu","year":"2020","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B11","doi-asserted-by":"crossref","first-page":"D216","DOI":"10.1093\/nar\/gkn734","article-title":"The National Center for Biotechnology Information's Protein Clusters Database","volume":"37","author":"Klimke","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B12","doi-asserted-by":"crossref","first-page":"D387","DOI":"10.1093\/nar\/gks1234","article-title":"TIGRFAMs and genome properties in 2013","volume":"41","author":"Haft","year":"2013","journal-title":"Nucleic Acids 
Res."},{"key":"2021010313115883900_B13","doi-asserted-by":"crossref","first-page":"D427","DOI":"10.1093\/nar\/gky995","article-title":"The Pfam protein families database in 2019","volume":"47","author":"El-Gebali","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B14","doi-asserted-by":"crossref","first-page":"e00039-17","DOI":"10.1128\/mSystems.00039-17","article-title":"PaperBLAST: text mining papers for information about homologs","volume":"2","author":"Price","year":"2017","journal-title":"mSystems"},{"key":"2021010313115883900_B15","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.tibs.2015.09.005","article-title":"Type III secretion: building and operating a remarkable nanomachine","volume":"41","author":"Portaliou","year":"2016","journal-title":"Trends Biochem. Sci."},{"key":"2021010313115883900_B16","doi-asserted-by":"crossref","first-page":"2994","DOI":"10.1093\/nar\/29.14.2994","article-title":"Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements","volume":"29","author":"Sch\u00e4ffer","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B17","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput. Biol."},{"key":"2021010313115883900_B18","doi-asserted-by":"crossref","first-page":"e90","DOI":"10.1002\/cpbi.90","article-title":"NCBI's conserved domain database and tools for protein domain analysis","volume":"69","author":"Yang","year":"2020","journal-title":"Curr. Protoc. 
Bioinformatics"},{"key":"2021010313115883900_B19","doi-asserted-by":"crossref","first-page":"1073","DOI":"10.1093\/bioinformatics\/btm076","article-title":"COBALT: constraint-based alignment tool for multiple protein sequences","volume":"23","author":"Papadopoulos","year":"2007","journal-title":"Bioinformatics"},{"key":"2021010313115883900_B20","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1186\/s13059-018-1554-6","article-title":"RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification","volume":"19","author":"Nasko","year":"2018","journal-title":"Genome Biol."},{"key":"2021010313115883900_B21","doi-asserted-by":"crossref","first-page":"i12","DOI":"10.1093\/bioinformatics\/btaa458","article-title":"ganon: precise metagenomics classification against large and up-to-date sets of reference sequences","volume":"36","author":"Piro","year":"2020","journal-title":"Bioinformatics"},{"key":"2021010313115883900_B22","doi-asserted-by":"crossref","first-page":"ESP-0009-2013","DOI":"10.1128\/ecosalplus.ESP-0006-2018","article-title":"The EcoCyc database","volume":"8","author":"Karp","year":"2018","journal-title":"EcoSal Plus"},{"key":"2021010313115883900_B23","doi-asserted-by":"crossref","first-page":"D73","DOI":"10.1093\/nar\/gkv1226","article-title":"Assembly: a resource for assembled genomes at NCBI","volume":"44","author":"Kitts","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B24","doi-asserted-by":"crossref","first-page":"4643","DOI":"10.1093\/bioinformatics\/btaa485","article-title":"UniRule: a unified rule resource for automatic annotation in the UniProt Knowledgebase","author":"MacDougall","year":"2020","journal-title":"Bioinformatics"},{"key":"2021010313115883900_B25","doi-asserted-by":"crossref","first-page":"D564","DOI":"10.1093\/nar\/gky1013","article-title":"Genome properties in 2019: a new companion database to InterPro for the inference of complete functional 
attributes","volume":"47","author":"Richardson","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"2021010313115883900_B26","doi-asserted-by":"crossref","first-page":"D206","DOI":"10.1093\/nar\/gkt1226","article-title":"The SEED and the rapid annotation of microbial genomes using subsystems technology (RAST)","volume":"42","author":"Overbeek","year":"2014","journal-title":"Nucleic Acids Res."}],"container-title":["Nucleic Acids Research"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/nar\/article-pdf\/49\/D1\/D1020\/35364279\/gkaa1105.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/nar\/article-pdf\/49\/D1\/D1020\/35364279\/gkaa1105.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,10,11]],"date-time":"2023-10-11T08:29:51Z","timestamp":1697012991000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/nar\/article\/49\/D1\/D1020\/6018440"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,3]]},"references-count":26,"journal-issue":{"issue":"D1","published-online":{"date-parts":[[2020,12,3]]},"published-print":{"date-parts":[[2021,1,8]]}},"URL":"https:\/\/doi.org\/10.1093\/nar\/gkaa1105","relation":{},"ISSN":["0305-1048","1362-4962"],"issn-type":[{"value":"0305-1048","type":"print"},{"value":"1362-4962","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,1,8]]},"published":{"date-parts":[[2020,12,3]]}}}