{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T22:27:43Z","timestamp":1774909663085,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2018,8,31]],"date-time":"2018-08-31T00:00:00Z","timestamp":1535673600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Klaus Tschira Stiftung gGmbH in Heidelberg, Germany"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>In most metagenomic sequencing studies, the initial analysis step consists in assessing the evolutionary provenance of the sequences. Phylogenetic (or Evolutionary) Placement methods can be employed to determine the evolutionary position of sequences with respect to a given reference phylogeny. These placement methods do however face certain limitations: The manual selection of reference sequences is labor-intensive; the computational effort to infer reference phylogenies is substantially larger than for methods that rely on sequence similarity; the number of taxa in the reference phylogeny should be small enough to allow for visually inspecting the results.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present algorithms to overcome the above limitations. First, we introduce a method to automatically construct representative sequences from databases to infer reference phylogenies. Second, we present an approach for conducting large-scale phylogenetic placements on nested phylogenies. Third, we describe a preprocessing pipeline that allows for handling huge sequence datasets. Our experiments on empirical data show that our methods substantially accelerate the workflow and yield highly accurate placement results.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Freely available under GPLv3 at http:\/\/github.com\/lczech\/gappa.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty767","type":"journal-article","created":{"date-parts":[[2018,8,30]],"date-time":"2018-08-30T15:51:16Z","timestamp":1535644276000},"page":"1151-1158","source":"Crossref","is-referenced-by-count":38,"title":["Methods for automatic reference trees and multilevel phylogenetic placement"],"prefix":"10.1093","volume":"35","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1340-9644","authenticated-orcid":false,"given":"Lucas","family":"Czech","sequence":"first","affiliation":[{"name":"Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3437-150X","authenticated-orcid":false,"given":"Pierre","family":"Barbera","sequence":"additional","affiliation":[{"name":"Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany"}]},{"given":"Alexandros","family":"Stamatakis","sequence":"additional","affiliation":[{"name":"Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany"},{"name":"Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany"}]}],"member":"286","published-online":{"date-parts":[[2018,8,31]]},"reference":[{"key":"2023013107272554300_bty767-B1","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1111\/j.1469-8137.2009.03160.x","article-title":"The UNITE database for molecular identification of fungi\u2013recent updates and future perspectives","volume":"186","author":"Abarenkov","year":"2010","journal-title":"New Phytol"},{"key":"2023013107272554300_bty767-B2","doi-asserted-by":"crossref","first-page":"114.","DOI":"10.1186\/s12864-017-3501-4","article-title":"SILVA, RDP, Greengenes, NCBI and OTT\u2014how do these taxonomies compare?","volume":"18","author":"Balvo\u010di\u016bt\u0117","year":"2017","journal-title":"BMC Genom"},{"key":"2023013107272554300_bty767-B3","article-title":"EPA-ng: massively parallel evolutionary placement of genetic sequences","author":"Barbera","year":"2018","journal-title":"bioRxiv"},{"key":"2023013107272554300_bty767-B4","volume-title":"PaPaRa 2.0: A Vectorized Algorithm for Probabilistic Phylogeny-Aware Alignment Extension. Technical Report","author":"Berger","year":"2012"},{"key":"2023013107272554300_bty767-B5","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/sysbio\/syr010","article-title":"Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood","volume":"60","author":"Berger","year":"2011","journal-title":"Syst. Biol"},{"key":"2023013107272554300_bty767-B42","volume-title":"Modern Multidimensional Scaling: Theory and Applications","author":"Borg","year":"2005"},{"key":"2023013107272554300_bty767-B6","doi-asserted-by":"crossref","DOI":"10.1128\/mSystems.00103-18","article-title":"Critical assessment of metagenome interpretation enters the second round","volume":"3","author":"Bremges","year":"2018","journal-title":"mSystems"},{"key":"2023013107272554300_bty767-B7","doi-asserted-by":"crossref","first-page":"D633","DOI":"10.1093\/nar\/gkt1244","article-title":"Ribosomal database project: data and tools for high throughput rRNA analysis","volume":"42","author":"Cole","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023013107272554300_bty767-B8","article-title":"Scalable methods for post-processing, visualizing, and analyzing phylogenetic placements","author":"Czech","year":"2018","journal-title":"bioRxiv"},{"key":"2023013107272554300_bty767-B9","doi-asserted-by":"crossref","first-page":"1261605.","DOI":"10.1126\/science.1261605","article-title":"Eukaryotic plankton diversity in the sunlit ocean","volume":"348","author":"de Vargas","year":"2015","journal-title":"Science"},{"key":"2023013107272554300_bty767-B10","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1128\/AEM.03006-05","article-title":"Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB","volume":"72","author":"DeSantis","year":"2006","journal-title":"Appl. Environ. Microbiol"},{"key":"2023013107272554300_bty767-B11","doi-asserted-by":"crossref","first-page":"993","DOI":"10.1093\/molbev\/msu055","article-title":"Placing environmental next-generation sequencing amplicons from microbial eukaryotes into a phylogenetic context","volume":"31","author":"Dunthorn","year":"2014","journal-title":"Mol. Biol. Evol"},{"key":"2023013107272554300_bty767-B12","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023013107272554300_bty767-B13","doi-asserted-by":"crossref","first-page":"2.","DOI":"10.1186\/2042-5783-3-2","article-title":"Beginner\u2019s guide to comparative bacterial genome analysis using next-generation sequence data","volume":"3","author":"Edwards","year":"2013","journal-title":"Microb. Inform. Exp"},{"key":"2023013107272554300_bty767-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3389\/fgene.2015.00348","article-title":"The road to metagenomics: from microbiology to DNA sequencing technologies and bioinformatics","volume":"6","author":"Escobar-Zepeda","year":"2015","journal-title":"Front. Genet"},{"key":"2023013107272554300_bty767-B15","doi-asserted-by":"crossref","first-page":"D597","DOI":"10.1093\/nar\/gks1160","article-title":"The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy","volume":"41","author":"Guillou","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023013107272554300_bty767-B16","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nature11234","article-title":"Structure, function and diversity of the healthy human microbiome","volume":"486","author":"Huttenhower","year":"2012","journal-title":"Nature"},{"key":"2023013107272554300_bty767-B17","doi-asserted-by":"crossref","first-page":"2761","DOI":"10.1128\/JCM.01228-07","article-title":"16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls","volume":"45","author":"Janda","year":"2007","journal-title":"J. Clin. Microbiol"},{"key":"2023013107272554300_bty767-B18","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1099\/ijs.0.038075-0","article-title":"Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species","volume":"62","author":"Kim","year":"2012","journal-title":"Int. J. Syst. Evol. Microbiol"},{"key":"2023013107272554300_bty767-B19","doi-asserted-by":"crossref","first-page":"540","DOI":"10.1007\/s002390010184","article-title":"The closest BLAST hit is often not the nearest neighbor","volume":"52","author":"Koski","year":"2001","journal-title":"J. Mol. Evol"},{"key":"2023013107272554300_bty767-B20","doi-asserted-by":"crossref","first-page":"5022","DOI":"10.1093\/nar\/gkw396","article-title":"Phylogeny-aware identification and correction of taxonomically mislabeled sequences","volume":"44","author":"Kozlov","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023013107272554300_bty767-B21","doi-asserted-by":"crossref","first-page":"2659","DOI":"10.1111\/1462-2920.12250","article-title":"Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities","volume":"16","author":"Logares","year":"2014","journal-title":"Environ. Microbiol"},{"key":"2023013107272554300_bty767-B41","doi-asserted-by":"crossref","first-page":"8228","DOI":"10.1128\/AEM.71.12.8228-8235.2005","article-title":"UniFrac: a new phylogenetic method for comparing microbial communities","volume":"71","author":"Lozupone","year":"2005","journal-title":"Appl. Environ. Microbiol."},{"key":"2023013107272554300_bty767-B22","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1038\/s41559-017-0091","article-title":"Parasites dominate hyperdiverse soil protist communities in Neotropical rainforests","volume":"1","author":"Mah\u00e9","year":"2017","journal-title":"Nat. Ecol. Evol"},{"key":"2023013107272554300_bty767-B23","first-page":"1","article-title":"Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample comparison","volume":"8","author":"Matsen","year":"2011","journal-title":"PLoS One"},{"key":"2023013107272554300_bty767-B24","doi-asserted-by":"crossref","first-page":"538.","DOI":"10.1186\/1471-2105-11-538","article-title":"pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree","volume":"11","author":"Matsen","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023013107272554300_bty767-B25","doi-asserted-by":"crossref","first-page":"680","DOI":"10.2307\/1907651","article-title":"A set of independent necessary and sufficient conditions for simple majority decision","volume":"20","author":"May","year":"1952","journal-title":"Econometrica"},{"key":"2023013107272554300_bty767-B26","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1038\/nature11209","article-title":"A framework for human microbiome research","volume":"486","author":"Meth\u00e9","year":"2012","journal-title":"Nature"},{"key":"2023013107272554300_bty767-B27","first-page":"247","article-title":"SEPP: SAT\u00e9-enabled phylogenetic placement","author":"Mirarab","year":"2012","journal-title":"Proceedings of the Conference Pacific Symposium on Biocomputing. World Scientific"},{"key":"2023013107272554300_bty767-B28","article-title":"A proposal for a standardized bacterial taxonomy based on genome phylogeny","author":"Parks","year":"2018","journal-title":"bioRxiv"},{"key":"2023013107272554300_bty767-B29","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023013107272554300_bty767-B30","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1006\/jtbi.1997.0493","article-title":"Estimating the entropy of DNA sequences","volume":"188","author":"Schmitt","year":"1997","journal-title":"J. Theor. Biol"},{"key":"2023013107272554300_bty767-B31","doi-asserted-by":"crossref","first-page":"1063","DOI":"10.1038\/nmeth.4458","article-title":"Critical Assessment of Metagenome Interpretation a benchmark of metagenomics software","volume":"14","author":"Sczyrba","year":"2017","journal-title":"Nat. Methods"},{"key":"2023013107272554300_bty767-B32","volume-title":"The Mathematical Theory of Communication","author":"Shannon","year":"1951"},{"key":"2023013107272554300_bty767-B33","doi-asserted-by":"crossref","first-page":"e37818.","DOI":"10.1371\/journal.pone.0037818","article-title":"Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria","volume":"7","author":"Srinivasan","year":"2012","journal-title":"PLoS One"},{"key":"2023013107272554300_bty767-B34","doi-asserted-by":"crossref","first-page":"1196.","DOI":"10.1038\/nmeth.2693","article-title":"Metagenomic species profiling using universal phylogenetic marker genes","volume":"10","author":"Sunagawa","year":"2013","journal-title":"Nat. Methods"},{"key":"2023013107272554300_bty767-B35","doi-asserted-by":"crossref","first-page":"1256688.","DOI":"10.1126\/science.1256688","article-title":"Global diversity and geography of soil fungi","volume":"346","author":"Tedersoo","year":"2014","journal-title":"Science"},{"key":"2023013107272554300_bty767-B36","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/nature24621","article-title":"A communal catalogue reveals Earth\u2019s multiscale microbial diversity","volume":"551","author":"Thompson","year":"2017","journal-title":"Nature"},{"key":"2023013107272554300_bty767-B37","doi-asserted-by":"crossref","first-page":"376","DOI":"10.1093\/bib\/bbt068","article-title":"Information theory applications for biological sequence analysis","volume":"15","author":"Vinga","year":"2014","journal-title":"Brief. Bioinform"},{"key":"2023013107272554300_bty767-B38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S1055-7903(02)00326-3","article-title":"An index of substitution saturation and its application","volume":"26","author":"Xia","year":"2003","journal-title":"Mol. Phylogenet. Evol"},{"key":"2023013107272554300_bty767-B39","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1093\/sysbio\/43.3.329","article-title":"Statistical properties of the maximum likelihood method of phylogenetic estimation and comparison with distance matrix methods","volume":"43","author":"Yang","year":"1994","journal-title":"Syst. Biol"},{"key":"2023013107272554300_bty767-B40","doi-asserted-by":"crossref","first-page":"D643","DOI":"10.1093\/nar\/gkt1209","article-title":"The SILVA and \u201cAll-species Living Tree Project (LTP)\u201d taxonomic frameworks","volume":"42","author":"Yilmaz","year":"2014","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1151\/48967648\/bioinformatics_35_7_1151.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/7\/1151\/48967648\/bioinformatics_35_7_1151.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T05:28:57Z","timestamp":1675142937000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/7\/1151\/5088318"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,8,31]]},"references-count":42,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2019,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty767","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/299792","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,4,1]]},"published":{"date-parts":[[2018,8,31]]}}}