{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:52Z","timestamp":1772138092112,"version":"3.50.1"},"reference-count":53,"publisher":"Oxford University Press (OUP)","issue":"18","license":[{"start":{"date-parts":[[2019,1,29]],"date-time":"2019-01-29T00:00:00Z","timestamp":1548720000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"European Union\u2019s Horizon 2020","award":["634650"],"award-info":[{"award-number":["634650"]}]},{"name":"Labex: Labex Agro","award":["ANR-10-LABX-0001\u201301"],"award-info":[{"award-number":["ANR-10-LABX-0001\u201301"]}]},{"DOI":"10.13039\/100017605","name":"Labex CeMEB","doi-asserted-by":"crossref","award":["ANR-10-LABX-0004"],"award-info":[{"award-number":["ANR-10-LABX-0004"]}],"id":[{"id":"10.13039\/100017605","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Labex NUMEV","award":["ANR-10-LABX-20"],"award-info":[{"award-number":["ANR-10-LABX-20"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Taxonomic classification is at the core of environmental DNA analysis. When a phylogenetic tree can be built as a prior hypothesis to such classification, phylogenetic placement (PP) provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. This is useful whenever precision is sought (e.g. in diagnostics). 
However, likelihood-based PP algorithms struggle to scale with the ever-increasing throughput of DNA sequencing.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We have developed RAPPAS (Rapid Alignment-free Phylogenetic Placement via Ancestral Sequences) which uses an alignment-free approach, removing the hurdle of query sequence alignment as a preliminary step to PP. Our approach relies on the precomputation of a database of k-mers that may be present with non-negligible probability in relatives of the reference sequences. The placement is performed by inspecting the stored phylogenetic origins of the k-mers in the query, and their probabilities. The database can be reused for the analysis of several different metagenomes. Experiments show that the first implementation of RAPPAS is already faster than competing likelihood-based PP algorithms, while keeping similar accuracy for short reads. 
RAPPAS scales PP for the era of routine metagenomic diagnostics.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Program and sources freely available for download at https:\/\/github.com\/blinard-BIOINFO\/RAPPAS.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btz068","type":"journal-article","created":{"date-parts":[[2019,1,24]],"date-time":"2019-01-24T11:19:46Z","timestamp":1548328786000},"page":"3303-3312","source":"Crossref","is-referenced-by-count":48,"title":["Rapid alignment-free phylogenetic identification of metagenomic sequences"],"prefix":"10.1093","volume":"35","author":[{"given":"Benjamin","family":"Linard","sequence":"first","affiliation":[{"name":"LIRMM, University of Montpellier, CNRS , Montpellier, France"},{"name":"ISEM, University of Montpellier, CNRS, IRD, EPHE, CIRAD, INRAP , Montpellier, France"},{"name":"AGAP, University of Montpellier, CIRAD, INRA, Montpellier Supagro , Montpellier, France"}]},{"given":"Krister","family":"Swenson","sequence":"additional","affiliation":[{"name":"LIRMM, University of Montpellier, CNRS , Montpellier, France"},{"name":"Institut de Biologie Computationnelle , Montpellier, France"}]},{"given":"Fabio","family":"Pardi","sequence":"additional","affiliation":[{"name":"LIRMM, University of Montpellier, CNRS , Montpellier, France"},{"name":"Institut de Biologie Computationnelle , Montpellier, France"}]}],"member":"286","published-online":{"date-parts":[[2019,1,29]]},"reference":[{"key":"2023013108051506100_btz068-B1","doi-asserted-by":"crossref","first-page":"2253","DOI":"10.1093\/bioinformatics\/btt389","article-title":"Scalable 
metagenomic taxonomy classification using a reference genome database","volume":"29","author":"Ames","year":"2013","journal-title":"Bioinformatics"},{"key":"2023013108051506100_btz068-B2","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/sysbio\/syy054","article-title":"EPA-ng: massively parallel evolutionary placement of genetic sequences","volume":"68","author":"Barbera","year":"2019","journal-title":"Syst. Biol."},{"key":"2023013108051506100_btz068-B4","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/sysbio\/syr010","article-title":"Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood","volume":"60","author":"Berger","year":"2011","journal-title":"Syst. Biol."},{"key":"2023013108051506100_btz068-B5","doi-asserted-by":"crossref","first-page":"2068","DOI":"10.1093\/bioinformatics\/btr320","article-title":"Aligning short reads to reference alignments and trees","volume":"27","author":"Berger","year":"2011","journal-title":"Bioinformatics"},{"key":"2023013108051506100_btz068-B56","doi-asserted-by":"crossref","first-page":"3584","DOI":"10.1093\/bioinformatics\/btv419","article-title":"Spaced seeds improve k-mer-based metagenomic classification","volume":"31","author":"B\u0159inda","year":"2015","journal-title":"Bioinformatics"},{"key":"2023013108051506100_btz068-B6","first-page":"310","article-title":"LSHPlace: fast phylogenetic placement using locality-sensitive hashing","volume":"2013","author":"Brown","year":"2013","journal-title":"Pac. Symp. Biocomput."},{"key":"2023013108051506100_btz068-B7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.medmal.2013.10.002","article-title":"Probiotics, gut microbiota and health","volume":"44","author":"Butel","year":"2014","journal-title":"M\u00e9d. Mal. 
Infect."},{"key":"2023013108051506100_btz068-B8","doi-asserted-by":"crossref","first-page":"D633","DOI":"10.1093\/nar\/gkt1244","article-title":"Ribosomal Database Project: data and tools for high throughput rRNA analysis","volume":"42","author":"Cole","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023013108051506100_btz068-B9","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1111\/1755-0998.12401","article-title":"PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy","volume":"15","author":"Decelle","year":"2015","journal-title":"Mol. Ecol. Resour."},{"key":"2023013108051506100_btz068-B10","doi-asserted-by":"crossref","first-page":"5872","DOI":"10.1111\/mec.14350","article-title":"Environmental DNA metabarcoding: transforming how we survey animal and plant communities","volume":"26","author":"Deiner","year":"2017","journal-title":"Mol. Ecol."},{"key":"2023013108051506100_btz068-B11","doi-asserted-by":"crossref","first-page":"e2005849","DOI":"10.1371\/journal.pbio.2005849","article-title":"EukRef: phylogenetic curation of ribosomal RNA to enhance understanding of eukaryotic diversity and distribution","volume":"16","author":"Del Campo","year":"2018","journal-title":"PLoS Biol."},{"key":"2023013108051506100_btz068-B12","doi-asserted-by":"crossref","first-page":"5069","DOI":"10.1128\/AEM.03006-05","article-title":"Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB","volume":"72","author":"De Santis","year":"2006","journal-title":"Appl. Environ. 
Microbiol."},{"key":"2023013108051506100_btz068-B13","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1093\/bioinformatics\/14.9.755","article-title":"Profile hidden Markov models","volume":"14","author":"Eddy","year":"1998","journal-title":"Bioinformatics"},{"key":"2023013108051506100_btz068-B14","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comput. Biol."},{"key":"2023013108051506100_btz068-B15","article-title":"Extreme metagenomics using nanopore DNA sequencing: a field report from Svalbard, 78\u2009N","author":"Edwards","year":"2016"},{"key":"2023013108051506100_btz068-B16","doi-asserted-by":"crossref","first-page":"569","DOI":"10.1111\/j.1467-9868.2011.01018.x","article-title":"The phylogenetic Kantorovich\u2013Rubinstein metric for environmental sequence samples","volume":"74","author":"Evans","year":"2012","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"2023013108051506100_btz068-B17","volume-title":"Inferring Phylogenies. 2003","author":"Felsenstein","year":"2004"},{"key":"2023013108051506100_btz068-B18","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1186\/1471-2164-16-S1-S13","article-title":"Phylogenetic placement of metagenomic reads using the minimum evolution principle","volume":"16","author":"Filipski","year":"2015","journal-title":"BMC Genomics"},{"key":"2023013108051506100_btz068-B19","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1038\/nrg.2017.88","article-title":"Towards a genomics-informed, real-time, global pathogen surveillance system","volume":"19","author":"Gardy","year":"2017","journal-title":"Nat. Rev. 
Genet."},{"key":"2023013108051506100_btz068-B20","doi-asserted-by":"crossref","first-page":"969","DOI":"10.1111\/mec.13944","article-title":"Documenting DNA in the dust","volume":"26","author":"Gilbert","year":"2017","journal-title":"Mol. Ecol."},{"key":"2023013108051506100_btz068-B21","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1128\/CMR.00075-13","article-title":"Whole-genome sequencing in outbreak analysis","volume":"28","author":"Gilchrist","year":"2015","journal-title":"Clin. Microbiol. Rev."},{"key":"2023013108051506100_btz068-B55","doi-asserted-by":"crossref","first-page":"759","DOI":"10.1111\/j.1755-0998.2011.03024.x","article-title":"Field guide to next-generation DNA sequencers","volume":"11","author":"Glenn","year":"2011","journal-title":"Mol. Ecol. Resour."},{"key":"2023013108051506100_btz068-B22","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1093\/sysbio\/syq010","article-title":"New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0","volume":"59","author":"Guindon","year":"2010","journal-title":"Syst. 
Biol."},{"key":"2023013108051506100_btz068-B23","volume-title":"Mason \u2013 A Read Simulator for Second Generation Sequencing Data","author":"Holtgrewe","year":"2010"},{"key":"2023013108051506100_btz068-B24","doi-asserted-by":"crossref","first-page":"W7","DOI":"10.1093\/nar\/gku398","article-title":"Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches","volume":"42","author":"Horwege","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023013108051506100_btz068-B25","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.5969107","article-title":"MEGAN analysis of metagenomic data","volume":"17","author":"Huson","year":"2007","journal-title":"Genome Res."},{"key":"2023013108051506100_btz068-B26","doi-asserted-by":"crossref","first-page":"e1004957","DOI":"10.1371\/journal.pcbi.1004957","article-title":"MEGAN community edition - interactive exploration and analysis of large-scale microbiome sequencing data","volume":"12","author":"Huson","year":"2016","journal-title":"PLoS Comput. Biol."},{"key":"2023013108051506100_btz068-B27","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1186\/1471-2105-12-470","article-title":"Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees","volume":"12","author":"Izquierdo-Carrasco","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023013108051506100_btz068-B28","doi-asserted-by":"crossref","first-page":"121","DOI":"10.1016\/j.jcv.2011.03.006","article-title":"An automated genotyping tool for enteroviruses and noroviruses","volume":"51","author":"Kroneman","year":"2011","journal-title":"J. Clin. 
Virol."},{"key":"2023013108051506100_btz068-B29","volume-title":"R software package not associated to a published manuscript","author":"Lefeuvre","year":"2018"},{"key":"2023013108051506100_btz068-B30","doi-asserted-by":"crossref","first-page":"W242","DOI":"10.1093\/nar\/gkw290","article-title":"Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees","volume":"44","author":"Letunic","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"2023013108051506100_btz068-B31","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1093\/bioinformatics\/btx432","article-title":"A novel data structure to support ultra-fast taxonomic classification of metagenomic sequences with k-mer signatures","volume":"34","author":"Liu","year":"2018","journal-title":"Bioinformatics"},{"key":"2023013108051506100_btz068-B33","doi-asserted-by":"crossref","first-page":"e593","DOI":"10.7717\/peerj.593","article-title":"Swarm: robust and fast clustering method for amplicon-based studies","volume":"2","author":"Mah\u00e9","year":"2014","journal-title":"PeerJ"},{"key":"2023013108051506100_btz068-B34","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1186\/1471-2105-11-538","article-title":"pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree","volume":"11","author":"Matsen","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023013108051506100_btz068-B35","doi-asserted-by":"crossref","first-page":"e31009","DOI":"10.1371\/journal.pone.0031009","article-title":"A format for phylogenetic placements","volume":"7","author":"Matsen","year":"2012","journal-title":"PLoS One"},{"key":"2023013108051506100_btz068-B36","doi-asserted-by":"crossref","first-page":"e56859","DOI":"10.1371\/journal.pone.0056859","article-title":"Edge principal components and squash clustering: using the special structure of phylogenetic placement data for sample 
comparison","volume":"8","author":"Matsen","year":"2013","journal-title":"PLoS One"},{"key":"2023013108051506100_btz068-B37","doi-asserted-by":"crossref","first-page":"e157","DOI":"10.7717\/peerj.157","article-title":"Abundance-weighted phylogenetic diversity measures distinguish microbial community states and are robust to sampling depth","volume":"1","author":"McCoy","year":"2013","journal-title":"PeerJ"},{"key":"2023013108051506100_btz068-B38","doi-asserted-by":"crossref","first-page":"3740","DOI":"10.1093\/bioinformatics\/btx520","article-title":"MetaCache: context-aware classification of metagenomic reads using minhashing","volume":"33","author":"M\u00fcller","year":"2017","journal-title":"Bioinformatics"},{"key":"2023013108051506100_btz068-B39","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using MinHash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol."},{"key":"2023013108051506100_btz068-B40","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1186\/s12864-015-1419-2","article-title":"CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2023013108051506100_btz068-B41","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1111\/mec.14478","article-title":"Scaling up: a guide to high-throughput genomic approaches for biodiversity analysis","volume":"27","author":"Porter","year":"2018","journal-title":"Mol. 
Ecol."},{"key":"2023013108051506100_btz068-B42","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2012","journal-title":"Nucleic Acids Res."},{"key":"2023013108051506100_btz068-B44","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1016\/j.csbj.2016.11.005","article-title":"Bioinformatics strategies for taxonomy independent binning and visualization of sequences in shotgun metagenomics","volume":"15","author":"Sedlar","year":"2017","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"2023013108051506100_btz068-B45","doi-asserted-by":"crossref","first-page":"S9","DOI":"10.1186\/1471-2164-15-S10-S9","article-title":"HyDA-Vista: towards optimal guided selection of k-mer size for sequence assembly","volume":"15","author":"Shariat","year":"2014","journal-title":"BMC Genomics"},{"key":"2023013108051506100_btz068-B46","doi-asserted-by":"crossref","first-page":"1489","DOI":"10.1128\/JVI.02027-14","article-title":"Unraveling the web of viroinformatics: computational tools and databases in virus research","volume":"89","author":"Sharma","year":"2015","journal-title":"J. 
Virol."},{"key":"2023013108051506100_btz068-B47","doi-asserted-by":"crossref","first-page":"D36","DOI":"10.1093\/nar\/gkx1125","article-title":"The European Nucleotide Archive in 2017","volume":"46","author":"Silvester","year":"2018","journal-title":"Nucleic Acids Res."},{"key":"2023013108051506100_btz068-B48","doi-asserted-by":"crossref","first-page":"457","DOI":"10.1038\/nature24621","article-title":"A communal catalogue reveals Earth\u2019s multiscale microbial diversity","volume":"551","year":"2017","journal-title":"Nature"},{"key":"2023013108051506100_btz068-B49","doi-asserted-by":"crossref","first-page":"341","DOI":"10.3109\/10408363.2016.1163663","article-title":"Hepatitis C virus whole genome sequencing: current methods\/issues and future challenges","volume":"53","author":"Tr\u00e9meaux","year":"2016","journal-title":"Crit. Rev. Clin. Lab. Sci."},{"key":"2023013108051506100_btz068-B50","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol."},{"key":"2023013108051506100_btz068-B51","doi-asserted-by":"crossref","first-page":"1586","DOI":"10.1093\/molbev\/msm088","article-title":"PAML 4: phylogenetic analysis by maximum likelihood","volume":"24","author":"Yang","year":"2007","journal-title":"Mol. Biol. 
Evol."},{"key":"2023013108051506100_btz068-B52","doi-asserted-by":"crossref","first-page":"1641","DOI":"10.1093\/genetics\/141.4.1641","article-title":"A new method of inference of ancestral nucleotide and amino acid sequences","volume":"141","author":"Yang","year":"1995","journal-title":"Genetics"},{"key":"2023013108051506100_btz068-B54","doi-asserted-by":"crossref","first-page":"D643","DOI":"10.1093\/nar\/gkt1209","article-title":"The SILVA and \u201cAll-species Living Tree Project (LTP)\u201d taxonomic frameworks","volume":"42","author":"Yilmaz","year":"2014","journal-title":"Nucleic Acids Res."},{"key":"2023013108051506100_btz068-B53","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.14258","article-title":"Using mobile sequencers in an academic classroom","volume":"5","author":"Zaaijer","year":"2016","journal-title":"eLife"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/18\/3303\/48975268\/bioinformatics_35_18_3303.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/18\/3303\/48975268\/bioinformatics_35_18_3303.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,31]],"date-time":"2023-01-31T08:37:53Z","timestamp":1675154273000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/18\/3303\/5303992"}},"subtitle":[],"editor":[{"given":"Russell","family":"Schwartz","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2019,1,29]]},"references-count":53,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2019,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btz068","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/328740","asserted-by":"object"}]},"ISSN":
["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,9,15]]},"published":{"date-parts":[[2019,1,29]]}}}