{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,24]],"date-time":"2026-06-24T06:31:26Z","timestamp":1782282686050,"version":"3.54.5"},"reference-count":17,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2018,4,14]],"date-time":"2018-04-14T00:00:00Z","timestamp":1523664000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000060","name":"National Institute of Allergy and Infectious Diseases","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000060","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Department of Health and Human Services","award":["U19AI110819"],"award-info":[{"award-number":["U19AI110819"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>The vast number of available sequenced bacterial genomes occasionally exceeds the facilities of comparative genomic methods or is dominated by a single outbreak strain, and thus a diverse and representative subset is required. 
Generation of the reduced subset currently requires a priori supervised clustering and sequence-only selection of medoid genomic sequences, independent of any additional genome metrics or strain attributes.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The Gaussian Genome Representative Selector with Prioritization (GGRaSP) R-package described below generates a reduced subset of genomes that prioritizes maintaining genomes of interest to the user as well as minimizing the loss of genetic variation. The package also allows for unsupervised clustering by modeling the genomic relationships using a Gaussian mixture model to select an appropriate cluster threshold. We demonstrate the capabilities of GGRaSP by generating a reduced list of 315 genomes from a genomic dataset of 4600 Escherichia coli genomes, prioritizing selection by type strain and by genome completeness.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>GGRaSP is available at https:\/\/github.com\/JCVenterInstitute\/ggrasp\/.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty300","type":"journal-article","created":{"date-parts":[[2018,4,12]],"date-time":"2018-04-12T19:32:51Z","timestamp":1523561571000},"page":"3032-3034","source":"Crossref","is-referenced-by-count":20,"title":["GGRaSP: a R-package for selecting representative genomes using Gaussian mixture models"],"prefix":"10.1093","volume":"34","author":[{"given":"Thomas H","family":"Clarke","sequence":"first","affiliation":[{"name":"J. 
Craig Venter Institute, Rockville, MD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lauren M","family":"Brinkac","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, Rockville, MD, USA"},{"name":"Department of Biotechnology and Food Technology, Durban University of Technology, Durban, South Africa"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Granger","family":"Sutton","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, Rockville, MD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Derrick E","family":"Fouts","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, Rockville, MD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2018,4,14]]},"reference":[{"key":"2023061313383144400_bty300-B1","doi-asserted-by":"crossref","first-page":"1144","DOI":"10.1038\/nmeth.3103","article-title":"Binning metagenomic contigs by coverage and composition","volume":"11","author":"Alneberg","year":"2014","journal-title":"Nat. Methods"},{"key":"2023061313383144400_bty300-B2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v032.i06","article-title":"mixtools: an R package for analyzing mixture models","volume":"32","author":"Benaglia","year":"2009","journal-title":"J. Stat. Softw"},{"key":"2023061313383144400_bty300-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v047.i03","article-title":"The R package bgmm: mixture modeling with uncertain knowledge","volume":"47","author":"Biecek","year":"2012","journal-title":"J. Stat. 
Softw"},{"key":"2023061313383144400_bty300-B4","doi-asserted-by":"crossref","first-page":"1725","DOI":"10.1093\/bioinformatics\/btx045","article-title":"LOCUST: a custom sequence locus typer for classifying microbial isolates","volume":"33","author":"Brinkac","year":"2017","journal-title":"Bioinformatics"},{"key":"2023061313383144400_bty300-B5","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1186\/s13059-015-0701-6","article-title":"A novel method of consensus pan-chromosome assembly and large-scale comparative analysis reveal the highly flexible pan-genome of Acinetobacter baumannii","volume":"16","author":"Chan","year":"2015","journal-title":"Genome Biol"},{"key":"2023061313383144400_bty300-B6","doi-asserted-by":"crossref","first-page":"e02093-16","DOI":"10.1128\/mBio.02093-16","article-title":"Comprehensive genome analysis of carbapenemase-producing Enterobacter spp.: new insights into phylogeny, population structure, and resistance mechanisms","volume":"7","author":"Chavda","year":"2016","journal-title":"mBio"},{"key":"2023061313383144400_bty300-B7","doi-asserted-by":"crossref","first-page":"738","DOI":"10.1101\/gr.4825606","article-title":"Widespread genome duplications throughout the history of flowering plants","volume":"16","author":"Cui","year":"2006","journal-title":"Genome Res"},{"key":"2023061313383144400_bty300-B8","author":"Ihaka","year":"2016"},{"key":"2023061313383144400_bty300-B9","doi-asserted-by":"crossref","first-page":"14306","DOI":"10.1038\/ncomms14306","article-title":"MetaSort untangles metagenome assembly by reducing microbial community complexity","volume":"8","author":"Ji","year":"2017","journal-title":"Nat. 
Commun"},{"key":"2023061313383144400_bty300-B10","doi-asserted-by":"crossref","first-page":"W242","DOI":"10.1093\/nar\/gkw290","article-title":"Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees","volume":"44","author":"Letunic","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023061313383144400_bty300-B11","doi-asserted-by":"crossref","first-page":"947","DOI":"10.1093\/bioinformatics\/btt064","article-title":"Phylogenomic clustering for selecting non-redundant genomes for comparative genomics","volume":"29","author":"Moreno-Hagelsieb","year":"2013","journal-title":"Bioinformatics"},{"key":"2023061313383144400_bty300-B12","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023061313383144400_bty300-B13","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1093\/bioinformatics\/btg412","article-title":"APE: analyses of Phylogenetics and Evolution in R language","volume":"20","author":"Paradis","year":"2004","journal-title":"Bioinformatics"},{"key":"2023061313383144400_bty300-B14","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1186\/s12915-017-0399-x","article-title":"The house spider genome reveals an ancient whole-genome duplication during arachnid evolution","volume":"15","author":"Schwager","year":"2017","journal-title":"BMC Biol"},{"key":"2023061313383144400_bty300-B15","doi-asserted-by":"crossref","first-page":"6761","DOI":"10.1093\/nar\/gkv657","article-title":"Microbial species delineation using whole genome sequences","volume":"43","author":"Varghese","year":"2015","journal-title":"Nucleic Acids 
Res"},{"key":"2023061313383144400_bty300-B16","author":"Wickham","year":"2009"},{"key":"2023061313383144400_bty300-B17","doi-asserted-by":"crossref","first-page":"276","DOI":"10.1186\/s12859-016-1112-8","article-title":"Clustering analysis of proteins from microbial genomes at multiple levels of resolution","volume":"17","author":"Zaslavsky","year":"2016","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/17\/3032\/50581981\/bioinformatics_34_17_3032.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/17\/3032\/50581981\/bioinformatics_34_17_3032.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,13]],"date-time":"2023-06-13T13:39:53Z","timestamp":1686663593000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/17\/3032\/4970513"}},"subtitle":[],"editor":[{"given":"John","family":"Hancock","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2018,4,14]]},"references-count":17,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2018,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty300","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,9,1]]},"published":{"date-parts":[[2018,4,14]]}}}