{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:15Z","timestamp":1772138055443,"version":"3.50.1"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2022,7,8]],"date-time":"2022-07-08T00:00:00Z","timestamp":1657238400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["U54GM115677"],"award-info":[{"award-number":["U54GM115677"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Microbiome datasets are often constrained by sequencing limitations. GenBank is the largest collection of publicly available DNA sequences, which is maintained by the National Center for Biotechnology Information (NCBI). The metadata of GenBank records are a largely understudied resource and may be uniquely leveraged to access the sum of prior studies focused on microbiome composition. Here, we developed a computational pipeline to analyze GenBank metadata, containing data on hosts, microorganisms and their place of origin. 
This work provides the first opportunity to leverage the totality of GenBank to shed light on compositional data practices that shape how microbiome datasets are formed as well as examine host\u2013microbiome relationships.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>The collected dataset contains multiple kingdoms of microorganisms, consisting of bacteria, viruses, archaea, protozoa, fungi, and invertebrate parasites, and hosts of multiple taxonomical classes, including mammals, birds and fish. A human data subset of this dataset provides insights into gaps in current microbiome data collection, which is biased towards clinically relevant pathogens. Clustering and phylogenic analysis reveals the potential to use these data to model host taxonomy and evolution, revealing groupings formed by host diet, environment and coevolution.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>GenBank Host-Microbiome Pipeline is available at https:\/\/github.com\/bcbi\/genbank_holobiome. 
The GenBank loader is available at https:\/\/github.com\/bcbi\/genbank_loader.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac487","type":"journal-article","created":{"date-parts":[[2022,7,8]],"date-time":"2022-07-08T09:27:41Z","timestamp":1657272461000},"page":"4172-4177","source":"Crossref","is-referenced-by-count":2,"title":["GenBank as a source to monitor and analyze Host-Microbiome data"],"prefix":"10.1093","volume":"38","author":[{"given":"Vivek","family":"Ramanan","sequence":"first","affiliation":[{"name":"Center of Computational Molecular Biology Brown University , Providence, RI, USA"},{"name":"Center for Biomedical Informatics Brown University , Providence, RI, USA"}]},{"given":"Shanti","family":"Mechery","sequence":"additional","affiliation":[{"name":"Center for Biomedical Informatics Brown University , Providence, RI, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2054-7356","authenticated-orcid":false,"given":"Indra Neil","family":"Sarkar","sequence":"additional","affiliation":[{"name":"Center of Computational Molecular Biology Brown University , Providence, RI, USA"},{"name":"Center for Biomedical Informatics Brown University , Providence, RI, USA"},{"name":"Rhode Island Quality Institute , Providence, RI, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,7,8]]},"reference":[{"key":"2023041408443842300_","first-page":"17","article-title":"Effective mapping of biomedical text to the UMLS metathesaurus: the MetaMap program","author":"Aronson","year":"2001","journal-title":"Proc. 
AMIA Symp"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"e33","DOI":"10.1093\/nar\/gkx1313","article-title":"HipMCL: a high-performance parallel implementation of the markov clustering algorithm for large-scale networks","volume":"46","author":"Azad","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"572","DOI":"10.4049\/jimmunol.1601247","article-title":"Microbiome-modulated metabolites at the interface of host immunity","volume":"198","author":"Blacher","year":"2017","journal-title":"J. Immunol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"488","DOI":"10.1186\/1471-2105-7-488","article-title":"Evaluation of clustering algorithms for protein-protein interaction networks","volume":"7","author":"Brohee","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"1926","DOI":"10.1038\/s41598-018-20414-0","article-title":"Taxonomy of anaerobic digestion microbiome reveals biases associated with the applied high throughput sequencing strategies","volume":"8","author":"Campanaro","year":"2018","journal-title":"Sci. Rep"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"925","DOI":"10.1016\/j.pt.2017.08.005","article-title":"Gut protozoa: friends or foes of the human gut microbiota?","volume":"33","author":"Chabe","year":"2017","journal-title":"Trends Parasitol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1890\/13-0133.1","article-title":"Rarefaction and extrapolation with hill numbers: a framework for sampling and estimation in species diversity studies","volume":"84","author":"Chao","year":"2014","journal-title":"Ecol. 
Monogr"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/j.jbi.2009.10.003","article-title":"MeSHing molecular sequences and clinical trials: a feasibility study","volume":"43","author":"Chen","year":"2010","journal-title":"J. Biomed. Inform"},{"key":"2023041408443842300_","first-page":"6","article-title":"Towards structuring unstructured GenBank metadata for enhancing comparative biological studies","volume":"2011","author":"Chen","year":"2011","journal-title":"AMIA Jt. Summits Transl. Sci. Proc"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"2594","DOI":"10.1038\/s41598-017-02995-4","article-title":"Fiber-utilizing capacity varies in prevotella- versus bacteroides-dominated gut microbiota","volume":"7","author":"Chen","year":"2017","journal-title":"Sci. Rep"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1038\/nrg3182","article-title":"The human microbiome: at the interface of health and disease","volume":"13","author":"Cho","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"713","DOI":"10.1007\/s001220051343","article-title":"Diversity of microsatellites derived from genomic libraries and GenBank sequences in rice (Oryza sativa L.)","volume":"100","author":"Cho","year":"2000","journal-title":"Theor. Appl. Genet"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1111\/j.1365-2982.2010.01664.x","article-title":"The microbiome-gut-brain axis: from bowel to behavior","volume":"23","author":"Cryan","year":"2011","journal-title":"Neurogastroenterol. Motil"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.cca.2015.01.003","article-title":"The role of the gut microbiome in the healthy adult status","volume":"451","author":"D'Argenio","year":"2015","journal-title":"Clin. Chim. 
Acta"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1038\/nature12820","article-title":"Diet rapidly and reproducibly alters the human gut microbiome","volume":"505","author":"David","year":"2014","journal-title":"Nature"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"1635","DOI":"10.1126\/science.1110591","article-title":"Diversity of the human intestinal microbial flora","volume":"308","author":"Eckburg","year":"2005","journal-title":"Science"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"123","DOI":"10.3389\/fmicb.2020.00123","article-title":"Extending burk dehority's perspectives on the role of ciliate protozoa in the rumen","volume":"11","author":"Firkins","year":"2020","journal-title":"Front. Microbiol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"84","DOI":"10.1186\/s12915-019-0704-y","article-title":"Studying the gut virome in the metagenomic era: challenges and perspectives","volume":"17","author":"Garmaeva","year":"2019","journal-title":"BMC Biol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"619287","DOI":"10.3389\/fmicb.2021.619287","article-title":"Editorial: advances in the understanding of the commensal eukaryota and viruses of the herbivore gut","volume":"12","author":"Gilbert","year":"2021","journal-title":"Front. Microbiol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.medmal.2015.01.007","article-title":"Digestive tract mycobiota: a source of infection","volume":"45","author":"Gouba","year":"2015","journal-title":"Med. Mal. Infect"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1038\/s41591-019-0377-7","article-title":"The microbiome, cancer, and cancer therapy","volume":"25","author":"Helmink","year":"2019","journal-title":"Nat. 
Med"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"1836","DOI":"10.3389\/fphys.2018.01836","article-title":"Modeling the role of the microbiome in evolution","volume":"9","author":"Huitzil","year":"2018","journal-title":"Front. Physiol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"1812","DOI":"10.1093\/molbev\/msx116","article-title":"TimeTree: a resource for timelines, timetrees, and divergence times","volume":"34","author":"Kumar","year":"2017","journal-title":"Mol. Biol. Evol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"423","DOI":"10.1186\/s12864-019-5699-9","article-title":"MiCoP: microbial community profiling method for detecting viral and fungal organisms in metagenomic samples","volume":"20","author":"LaPierre","year":"2019","journal-title":"BMC Genomics"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"e0179017","DOI":"10.1371\/journal.pone.0179017","article-title":"Microbiome restoration diet improves digestion, cognition and physical and emotional wellbeing","volume":"12","author":"Lawrence","year":"2017","journal-title":"PLoS One"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1097\/MOG.0b013e328333d751","article-title":"Obesity and the human microbiome","volume":"26","author":"Ley","year":"2010","journal-title":"Curr. Opin. 
Gastroenterol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"5120","DOI":"10.1093\/bioinformatics\/btaa647","article-title":"GeoBoost2: a natural language processing pipeline for GenBank metadata enrichment for virus phylogeography","volume":"36","author":"Magge","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"1474","DOI":"10.3390\/nu12051474","article-title":"The firmicutes\/bacteroidetes ratio: a relevant marker of gut dysbiosis in obese patients?","volume":"12","author":"Magne","year":"2020","journal-title":"Nutrients"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"2868","DOI":"10.3389\/fimmu.2018.02868","article-title":"Exploring the human microbiome: the potential future role of Next-Generation sequencing in disease diagnosis and treatment","volume":"9","author":"Malla","year":"2018","journal-title":"Front. Immunol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1186\/1471-2180-9-123","article-title":"The firmicutes\/bacteroidetes ratio of the human microbiota changes with age","volume":"9","author":"Mariat","year":"2009","journal-title":"BMC Microbiol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"126106","DOI":"10.1016\/j.syapm.2020.126106","article-title":"Plant microbiota modified by plant domestication","volume":"43","author":"Martinez-Romero","year":"2020","journal-title":"Syst. Appl. 
Microbiol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"970","DOI":"10.1126\/science.1198719","article-title":"Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans","volume":"332","author":"Muegge","year":"2011","journal-title":"Science"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"1756284819836620","DOI":"10.1177\/1756284819836620","article-title":"The gut virome: the \u2018missing link\u2019 between gut bacteria and host immunity?","volume":"12","author":"Mukhopadhya","year":"2019","journal-title":"Therap. Adv. Gastroenterol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1186\/s40168-021-01059-0","article-title":"Identifying biases and their potential solutions in human microbiome studies","volume":"9","author":"Nearing","year":"2021","journal-title":"Microbiome"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"7759","DOI":"10.1073\/pnas.92.17.7759","article-title":"Polymorphic simple sequence repeat regions in chloroplast genomes: applications to the population genetics of pines","volume":"92","author":"Powell","year":"1995","journal-title":"Proc. Natl. Acad. Sci. U S A"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"133","DOI":"10.3389\/fmicb.2020.00133","article-title":"Consequences of domestication on gut microbiome: a comparative study between wild gaur and domestic mithun","volume":"11","author":"Prabhu","year":"2020","journal-title":"Front. Microbiol"},{"key":"2023041408443842300_","author":"Reese","year":"2021"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"9351507","DOI":"10.1155\/2017\/9351507","article-title":"Proteobacteria: a common factor in human diseases","volume":"2017","author":"Rizzatti","year":"2017","journal-title":"Biomed Res. 
Int"},{"key":"2023041408443842300_","first-page":"717","article-title":"Leveraging biomedical ontologies and annotation services to organize microbiome data from mammalian hosts","volume":"2010","author":"Sarkar","year":"2010","journal-title":"AMIA Annu. Symp. Proc"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"D20","DOI":"10.1093\/nar\/gkab1112","article-title":"Database resources of the national center for biotechnology information","volume":"50","author":"Sayers","year":"2022","journal-title":"Nucleic Acids Res"},{"issue":"Suppl 1","key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"S44","DOI":"10.1016\/j.jbi.2011.06.005","article-title":"Enhancing phylogeography by improving geographical information from GenBank","volume":"44","author":"Scotch","year":"2011","journal-title":"J. Biomed. Inform"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/s40168-021-01199-3","article-title":"Performance determinants of unsupervised clustering methods for microbiome data","volume":"10","author":"Shi","year":"2022","journal-title":"Microbiome"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1016\/j.tibtech.2015.06.011","article-title":"Proteobacteria: microbial signature of dysbiosis in gut microbiota","volume":"33","author":"Shin","year":"2015","journal-title":"Trends Biotechnol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1186\/s12967-017-1175-y","article-title":"Influence of diet on the gut microbiome and implications for human health","volume":"15","author":"Singh","year":"2017","journal-title":"J. Transl. 
Med"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/1757-4749-5-12","article-title":"Emerging importance of holobionts in evolution and in probiotics","volume":"5","author":"Singh","year":"2013","journal-title":"Gut Pathog"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"934","DOI":"10.1093\/jamia\/ocv172","article-title":"A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records","volume":"23","author":"Tahsin","year":"2016","journal-title":"J. Am. Med. Inform. Assoc"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1111\/1758-2229.12438","article-title":"Fungal identification biases in microbiome projects","volume":"8","author":"Tedersoo","year":"2016","journal-title":"Environ. Microbiol. Rep"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1016\/j.cell.2018.10.029","article-title":"US immigration westernizes the human gut microbiome","volume":"175","author":"Vangay","year":"2018","journal-title":"Cell"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"599","DOI":"10.1093\/molbev\/msz240","article-title":"Treeio: an R package for phylogenetic tree input and output with richly annotated and associated data","volume":"37","author":"Wang","year":"2020","journal-title":"Mol. Biol. 
Evol"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"i348","DOI":"10.1093\/bioinformatics\/btv259","article-title":"Knowledge-driven geospatial location resolution for phylogeographic models of virus migration","volume":"31","author":"Weissenbacher","year":"2015","journal-title":"Bioinformatics"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"2200","DOI":"10.1038\/s41467-019-10191-3","article-title":"Host diet and evolutionary history explain different aspects of gut microbiome diversity among vertebrate clades","volume":"10","author":"Youngblut","year":"2019","journal-title":"Nat. Commun"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"e96","DOI":"10.1002\/cpbi.96","article-title":"Using ggtree to visualize data on Tree-Like structures","volume":"69","author":"Yu","year":"2020","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023041408443842300_","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/s41586-019-1291-3","article-title":"Mapping human microbiome drug metabolism by gut bacteria and their 
genes","volume":"570","author":"Zimmermann","year":"2019","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac487\/45026025\/btac487.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/17\/4172\/49889733\/btac487.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/17\/4172\/49889733\/btac487.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T05:51:04Z","timestamp":1700805064000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/17\/4172\/6633928"}},"subtitle":[],"editor":[{"given":"Zhiyong","family":"Lu","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,8]]},"references-count":53,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2022,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac487","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.10.14.464420","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,7,8]]}}}