{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T07:20:01Z","timestamp":1774423201768,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2017,2,21]],"date-time":"2017-02-21T00:00:00Z","timestamp":1487635200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01 HG006129"],"award-info":[{"award-number":["R01 HG006129"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["R01 DK094699"],"award-info":[{"award-number":["R01 DK094699"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. 
We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>Pipeline and analysis code can be downloaded from http:\/\/github.com\/pachterlab\/metakallisto<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx106","type":"journal-article","created":{"date-parts":[[2017,2,17]],"date-time":"2017-02-17T20:13:51Z","timestamp":1487362431000},"page":"2082-2088","source":"Crossref","is-referenced-by-count":86,"title":["Pseudoalignment for metagenomic read assignment"],"prefix":"10.1093","volume":"33","author":[{"given":"L","family":"Schaeffer","sequence":"first","affiliation":[{"name":"Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA"}]},{"given":"H","family":"Pimentel","sequence":"additional","affiliation":[{"name":"Department of Genetics, Stanford University, Stanford, CA, USA"}]},{"given":"N","family":"Bray","sequence":"additional","affiliation":[{"name":"Department of Molecular and Cell Biology and Innovative Genomics Institute, UC Berkeley, Berkeley, CA, 
USA"}]},{"given":"P","family":"Melsted","sequence":"additional","affiliation":[{"name":"Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavik, Iceland"}]},{"given":"L","family":"Pachter","sequence":"additional","affiliation":[{"name":"Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA"},{"name":"Departments of Mathematics and Computer Science, UC Berkeley, Berkeley, CA, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,2,21]]},"reference":[{"key":"2023051601055265600_btx106-B1","doi-asserted-by":"crossref","first-page":"R106.","DOI":"10.1186\/gb-2010-11-10-r106","article-title":"Differential expression analysis for sequence count data","volume":"11","author":"Anders","year":"2010","journal-title":"Genome Biol"},{"key":"2023051601055265600_btx106-B2","first-page":"btu170.","article-title":"Trimmomatic: a flexible trimmer for illumina sequence data","author":"Bolger","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051601055265600_btx106-B3","doi-asserted-by":"crossref","first-page":"10063","DOI":"10.1038\/ncomms10063","article-title":"Rapid antibiotic resistance predictions from genome sequence data for S. aureus and M. tuberculosis","volume":"6","author":"Bradley","year":"2015","journal-title":"Nat. Commun."},{"key":"2023051601055265600_btx106-B4","first-page":"525","author":"Bray","year":"2015"},{"key":"2023051601055265600_btx106-B5","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1371\/journal.pcbi.0010024","article-title":"Bioinformatics for whole-genome shotgun sequencing of microbial communities","volume":"1","author":"Chen","year":"2005","journal-title":"PLoS Comput. Biol"},{"key":"2023051601055265600_btx106-B6","doi-asserted-by":"crossref","first-page":"613","DOI":"10.1038\/nmeth.1223","article-title":"Stem cell transcriptome profiling via massive-scale mRNA sequencing","volume":"5","author":"Cloonan","year":"2008","journal-title":"Nat. 
Methods"},{"key":"2023051601055265600_btx106-B7","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.5969107","article-title":"MEGAN analysis of metagenomic data","volume":"17","author":"Huson","year":"2007","journal-title":"Genome Res"},{"key":"2023051601055265600_btx106-B8","doi-asserted-by":"crossref","first-page":"S12.","DOI":"10.1186\/1471-2105-10-S1-S12","article-title":"Methods for comparative metagenomics","volume":"10","author":"Huson","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023051601055265600_btx106-B9","doi-asserted-by":"crossref","first-page":"D574","DOI":"10.1093\/nar\/gkv1209","article-title":"Ensembl genomes 2016: more genomes, more complexity","volume":"44","author":"Kersey","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023051601055265600_btx106-B10","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1007\/s10142-015-0433-4","article-title":"Insights from 20 years of bacterial genome sequencing","volume":"15","author":"Land","year":"2015","journal-title":"Funct. Integr. Genomics"},{"key":"2023051601055265600_btx106-B11","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/nmeth.1923","article-title":"Fast gapped-read alignment with Bowtie 2","volume":"9","author":"Langmead","year":"2012","journal-title":"Nat. 
Methods"},{"key":"2023051601055265600_btx106-B12","doi-asserted-by":"crossref","first-page":"323.","DOI":"10.1186\/1471-2105-12-323","article-title":"RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome","volume":"12","author":"Li","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023051601055265600_btx106-B13","author":"Lindgreen","year":"2015"},{"key":"2023051601055265600_btx106-B14","doi-asserted-by":"crossref","first-page":"e10.","DOI":"10.1093\/nar\/gks803","article-title":"GASiC: Metagenomic abundance estimation and diagnostic testing on species level","volume":"41","author":"Lindner","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023051601055265600_btx106-B15","doi-asserted-by":"crossref","first-page":"523","DOI":"10.1016\/j.cell.2008.03.029","article-title":"Highly integrated single-base resolution maps of the epigenome in Arabidopsis","volume":"133","author":"Lister","year":"2008","journal-title":"Cell"},{"key":"2023051601055265600_btx106-B16","author":"Lu","year":"2016"},{"key":"2023051601055265600_btx106-B17","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1093\/bioinformatics\/bts714","article-title":"Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments","volume":"29","author":"McDavid","year":"2013","journal-title":"Bioinformatics"},{"key":"2023051601055265600_btx106-B18","doi-asserted-by":"crossref","first-page":"e31386.","DOI":"10.1371\/journal.pone.0031386","article-title":"Assessment of metagenomic assembly using simulated next generation sequencing data","volume":"7","author":"Mende","year":"2012","journal-title":"PLoS ONE"},{"key":"2023051601055265600_btx106-B19","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1038\/nmeth.1226","article-title":"Mapping and quantifying mammalian transcriptomes by RNA-Seq","volume":"5","author":"Mortazavi","year":"2008","journal-title":"Nat. 
Methods"},{"key":"2023051601055265600_btx106-B20","doi-asserted-by":"crossref","first-page":"1344","DOI":"10.1126\/science.1158441","article-title":"The transcriptional landscape of the yeast genome defined by RNA sequencing","volume":"320","author":"Nagalakshmi","year":"2008","journal-title":"Science"},{"key":"2023051601055265600_btx106-B21","doi-asserted-by":"crossref","first-page":"9.","DOI":"10.1186\/1748-7188-6-9","article-title":"Estimation of alternative splicing isoform frequencies from RNA-Seq data","volume":"6","author":"Nicolae","year":"2011","journal-title":"Algorithms Mol. Biol"},{"key":"2023051601055265600_btx106-B22","doi-asserted-by":"crossref","first-page":"14.","DOI":"10.1186\/s13059-016-0997-x","article-title":"Mash: fast genome and metagenome distance estimation using minhash","volume":"17","author":"Ondov","year":"2016","journal-title":"Genome Biol"},{"key":"2023051601055265600_btx106-B23","doi-asserted-by":"crossref","first-page":"1200","DOI":"10.1038\/nmeth.2658","article-title":"Differential abundance analysis for microbial marker-gene surveys","volume":"10","author":"Paulson","year":"2013","journal-title":"Nat. Methods"},{"key":"2023051601055265600_btx106-B24","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1038\/nmeth.2251","article-title":"Streaming fragment assignment for real-time analysis of sequencing experiments","volume":"10","author":"Roberts","year":"2013","journal-title":"Nat. 
Methods"},{"key":"2023051601055265600_btx106-B25","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051601055265600_btx106-B26","doi-asserted-by":"crossref","first-page":"162.","DOI":"10.1186\/1471-2105-7-162","article-title":"An application of statistics to comparative metagenomics","volume":"7","author":"Rodriguez-Brito","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023051601055265600_btx106-B27","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.copbio.2011.11.013","article-title":"Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis","volume":"23","author":"Scholz","year":"2012","journal-title":"Curr. Opin. Biotechnol"},{"key":"2023051601055265600_btx106-B28","doi-asserted-by":"crossref","first-page":"511","DOI":"10.1038\/nbt.1621","article-title":"Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation","volume":"28","author":"Trapnell","year":"2010","journal-title":"Nat. 
Biotechnol"},{"key":"2023051601055265600_btx106-B29","doi-asserted-by":"crossref","first-page":"554","DOI":"10.1126\/science.1107851","article-title":"Comparative metagenomics of microbial communities","volume":"308","author":"Tringe","year":"2005","journal-title":"Science"},{"key":"2023051601055265600_btx106-B30","doi-asserted-by":"crossref","first-page":"R46.","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"},{"key":"2023051601055265600_btx106-B31","doi-asserted-by":"crossref","first-page":"e27992","DOI":"10.1371\/journal.pone.0027992","article-title":"Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads","volume":"6","author":"Xia","year":"2011","journal-title":"PLoS ONE"},{"key":"2023051601055265600_btx106-B32","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.gpb.2012.11.002","article-title":"Shigella strains are not clones of Escherichia coli but sister species in the genus Escherichia","volume":"11","author":"Zuo","year":"2013","journal-title":"Genomics Proteomics 
Bioinf"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/2082\/50314736\/bioinformatics_33_14_2082.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/2082\/50314736\/bioinformatics_33_14_2082.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T01:06:26Z","timestamp":1684199186000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/14\/2082\/3038398"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,2,21]]},"references-count":32,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2017,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx106","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,7,15]]},"published":{"date-parts":[[2017,2,21]]}}}