{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T00:53:57Z","timestamp":1771462437576,"version":"3.50.1"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: When targeted to a barcoding region, high-throughput sequencing can be used to identify species or operational taxonomical units from environmental samples, and thus to study the diversity and structure of species communities. Although there are many methods which provide confidence scores for assigning taxonomic affiliations, it is not straightforward to translate these values to unbiased probabilities. We present a probabilistic method for taxonomical classification (PROTAX) of DNA sequences. Given a pre-defined taxonomical tree structure that is partially populated by reference sequences, PROTAX decomposes the probability of one to the set of all possible outcomes. PROTAX accounts for species that are present in the taxonomy but that do not have reference sequences, the possibility of unknown taxonomical units, as well as mislabeled reference sequences. PROTAX is based on a statistical multinomial regression model, and it can utilize any kind of sequence similarity measures or the outputs of other classifiers as predictors.<\/jats:p>\n               <jats:p>Results: We demonstrate the performance of PROTAX by using as predictors the output from BLAST, the phylogenetic classification software TIPP, and the RDP classifier. We show that PROTAX improves the predictions of the baseline implementations of TIPP and RDP classifiers, and that it is able to combine complementary information provided by BLAST and TIPP, resulting in accurate and unbiased classifications even with very challenging cases such as 50% mislabeling of reference sequences.<\/jats:p>\n               <jats:p>Availability and implementation: Perl\/R implementation of PROTAX is available at http:\/\/www.helsinki.fi\/science\/metapop\/Software.htm.<\/jats:p>\n               <jats:p>Contact: \u00a0panu.somervuo@helsinki.fi<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw346","type":"journal-article","created":{"date-parts":[[2016,6,14]],"date-time":"2016-06-14T02:00:59Z","timestamp":1465869659000},"page":"2920-2927","source":"Crossref","is-referenced-by-count":92,"title":["Unbiased probabilistic taxonomic classification for DNA barcoding"],"prefix":"10.1093","volume":"32","author":[{"given":"Panu","family":"Somervuo","sequence":"first","affiliation":[{"name":"1 Department of Biosciences, University of Helsinki, Helsinki, Finland"}]},{"given":"Sonja","family":"Koskela","sequence":"additional","affiliation":[{"name":"1 Department of Biosciences, University of Helsinki, Helsinki, Finland"}]},{"given":"Juho","family":"Pennanen","sequence":"additional","affiliation":[{"name":"1 Department of Biosciences, University of Helsinki, Helsinki, Finland"}]},{"given":"R.","family":"Henrik Nilsson","sequence":"additional","affiliation":[{"name":"2 Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden"}]},{"given":"Otso","family":"Ovaskainen","sequence":"additional","affiliation":[{"name":"1 Department of Biosciences, University of Helsinki, Helsinki, Finland"},{"name":"3 Centre for Biodiversity Dynamics, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway"}]}],"member":"286","published-online":{"date-parts":[[2016,6,13]]},"reference":[{"key":"2023020113462231400_btw346-B1","doi-asserted-by":"crossref","first-page":"S10","DOI":"10.1186\/1471-2105-10-S14-S10","article-title":"DNA barcode analysis: a comparison of phylogenetic and statistical classification methods","volume":"10","author":"Austerlitz","year":"2009","journal-title":"BMC Bioinformatics"},{"key":"2023020113462231400_btw346-B2","doi-asserted-by":"crossref","first-page":"1471","DOI":"10.1186\/1471-2105-13-92","article-title":"A comparative evaluation of sequence classification programs","volume":"13","author":"Bazinet","year":"2012","journal-title":"BMC Bioinformatics"},{"key":"2023020113462231400_btw346-B3","doi-asserted-by":"crossref","first-page":"673","DOI":"10.1038\/nmeth.1358","article-title":"Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models","volume":"6","author":"Brady","year":"2009","journal-title":"Nat. Methods"},{"key":"2023020113462231400_btw346-B4","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1038\/nmeth.f.303","article-title":"QIIME allows analysis of high-throughput community sequencing data","volume":"7","author":"Caporaso","year":"2010","journal-title":"Nat. Methods"},{"key":"2023020113462231400_btw346-B5","doi-asserted-by":"crossref","first-page":"817","DOI":"10.1093\/bioinformatics\/btu745","article-title":"Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods","volume":"31","author":"Dr\u00f6ge","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020113462231400_btw346-B6","doi-asserted-by":"crossref","first-page":"e1002195","DOI":"10.1371\/journal.pcbi.1002195","article-title":"Accelerated profile HMM searches","volume":"7","author":"Eddy","year":"2011","journal-title":"PLoS Comp. Biol"},{"key":"2023020113462231400_btw346-B7","doi-asserted-by":"crossref","first-page":"611","DOI":"10.1093\/bioinformatics\/btq725","article-title":"Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering","volume":"27","author":"Hao","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020113462231400_btw346-B8","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1098\/rspb.2002.2218","article-title":"Biological identifications through DNA barcodes","volume":"270","author":"Hebert","year":"2003","journal-title":"Proc. R. Soc. Lond. B Biol. Sci"},{"key":"2023020113462231400_btw346-B9","doi-asserted-by":"crossref","first-page":"12794","DOI":"10.1073\/pnas.0905845106","article-title":"A DNA barcode for land plants","volume":"106","author":"Hollingsworth","year":"2009","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020113462231400_btw346-B10","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1101\/gr.5969107","article-title":"MEGAN analysis of metagenomic data","volume":"17","author":"Huson","year":"2007","journal-title":"Genome Res"},{"key":"2023020113462231400_btw346-B11","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1101\/gr.113985.110","article-title":"Adaptive seeds tame genomic sequence comparison","volume":"21","author":"Kielbasa","year":"2011","journal-title":"Genome Res"},{"key":"2023020113462231400_btw346-B12","doi-asserted-by":"crossref","first-page":"5271","DOI":"10.1111\/mec.12481","article-title":"Towards a unified paradigm for sequence-based identification of fungi","volume":"22","author":"K\u014dljalg","year":"2013","journal-title":"Mol. Ecol"},{"key":"2023020113462231400_btw346-B13","doi-asserted-by":"crossref","first-page":"e73","DOI":"10.1093\/nar\/gku169","article-title":"MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences","volume":"42","author":"Luo","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023020113462231400_btw346-B14","doi-asserted-by":"crossref","first-page":"37","DOI":"10.3897\/mycokeys.4.3606","article-title":"Five simple guidelines for establishing basic authenticity and reliability of newly generated fungal ITS sequences","volume":"4","author":"Nilsson","year":"2012","journal-title":"MycoKeys"},{"key":"2023020113462231400_btw346-B15","doi-asserted-by":"crossref","first-page":"3548","DOI":"10.1093\/bioinformatics\/btu721","article-title":"TIPP: taxonomic identification and phylogenetic profiling","volume":"30","author":"Nguyen","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020113462231400_btw346-B16","doi-asserted-by":"crossref","first-page":"236","DOI":"10.1186\/s12864-015-1419-2","article-title":"CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers","volume":"16","author":"Ounit","year":"2015","journal-title":"BMC Genomics"},{"key":"2023020113462231400_btw346-B17","doi-asserted-by":"crossref","first-page":"274","DOI":"10.1016\/j.funeco.2010.01.001","article-title":"Identifying wood-inhabiting fungi with 454 sequencing \u2013 what is the probability that BLAST gives the correct species","volume":"3","author":"Ovaskainen","year":"2010","journal-title":"Fungal Ecol"},{"key":"2023020113462231400_btw346-B18","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1080\/10635150802032990","article-title":"Testing the reliability of genetic methods of species identification via simulation","volume":"57","author":"Ross","year":"2008","journal-title":"Syst. Biol"},{"key":"2023020113462231400_btw346-B19","doi-asserted-by":"crossref","first-page":"e14689","DOI":"10.1371\/journal.pone.0014689","article-title":"The Barcode of Life data portal: bridging the biodiversity informatics divide for DNA barcoding","volume":"6","author":"Sarkar","year":"2011","journal-title":"PLoS ONE"},{"key":"2023020113462231400_btw346-B20","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"Schloss","year":"2009","journal-title":"Appl. Environ. Microbiol"},{"key":"2023020113462231400_btw346-B21","doi-asserted-by":"crossref","first-page":"6241","DOI":"10.1073\/pnas.1117018109","article-title":"Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for fungi","volume":"109","author":"Schoch","year":"2012","journal-title":"Proc. Natl. Acad. Sci. U. S. A"},{"key":"2023020113462231400_btw346-B22","doi-asserted-by":"crossref","first-page":"811","DOI":"10.1038\/nmeth.2066","article-title":"Metagenomic microbial community profiling using unique clade-specific marker genes","volume":"9","author":"Segata","year":"2012","journal-title":"Nat. Methods"},{"key":"2023020113462231400_btw346-B23","doi-asserted-by":"crossref","first-page":"1196","DOI":"10.1038\/nmeth.2693","article-title":"Metagenomic species profiling using universal phylogenetic marker genes","volume":"10","author":"Sunagawa","year":"2013","journal-title":"Nat. Methods"},{"key":"2023020113462231400_btw346-B24","doi-asserted-by":"crossref","first-page":"347","DOI":"10.1186\/1471-2164-10-347","article-title":"Comparison of next generation sequencing technologies for transcriptome characterization","volume":"10","author":"Wall","year":"2009","journal-title":"BMC Genomics"},{"key":"2023020113462231400_btw346-B25","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Na\u00efve Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Wang","year":"2007","journal-title":"Appl. Environ. Microbiol"},{"key":"2023020113462231400_btw346-B26","doi-asserted-by":"crossref","first-page":"R46","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/19\/2920\/49021302\/bioinformatics_32_19_2920.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/19\/2920\/49021302\/bioinformatics_32_19_2920.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T23:51:00Z","timestamp":1675295460000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/19\/2920\/2196481"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,13]]},"references-count":26,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2016,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw346","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,10,1]]},"published":{"date-parts":[[2016,6,13]]}}}