{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,25]],"date-time":"2026-02-25T11:45:35Z","timestamp":1772019935579,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1013558","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T00:00:00Z","timestamp":1763337600000}}],"reference-count":48,"publisher":"Public Library of Science (PLoS)","issue":"11","license":[{"start":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T00:00:00Z","timestamp":1762473600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004063","name":"Knut och Alice Wallenbergs Stiftelse","doi-asserted-by":"publisher","award":["KAW 2020.0239"],"award-info":[{"award-number":["KAW 2020.0239"]}],"id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004063","name":"Knut och Alice Wallenbergs Stiftelse","doi-asserted-by":"publisher","award":["KAW 2017.0003"],"award-info":[{"award-number":["KAW 
2017.0003"]}],"id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e5det","doi-asserted-by":"publisher","award":["2018-04620"],"award-info":[{"award-number":["2018-04620"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e5det","doi-asserted-by":"publisher","award":["2021-04830"],"award-info":[{"award-number":["2021-04830"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004359","name":"Vetenskapsr\u00e5det","doi-asserted-by":"publisher","award":["2021-05563"],"award-info":[{"award-number":["2021-05563"]}],"id":[{"id":"10.13039\/501100004359","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Bioinformatics Infrastructure Sweden"},{"DOI":"10.13039\/100007633","name":"Stiftelsen f\u00f6r Milj\u00f6strategisk Forskning","doi-asserted-by":"publisher","award":["DIA 2020\/10"],"award-info":[{"award-number":["DIA 2020\/10"]}],"id":[{"id":"10.13039\/100007633","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Deep metabarcoding offers an efficient and reproducible approach to biodiversity monitoring, but noisy data and incomplete reference databases challenge accurate diversity estimation and taxonomic annotation. Here, we introduce a novel algorithm, NEEAT, for removing spurious operational taxonomic units (OTUs) originating from nuclear-embedded mitochondrial DNA sequences (NUMTs) or sequencing errors. It integrates \u2018echo\u2019 signals across samples with the identification of unusual evolutionary patterns among similar DNA sequences. 
We also extensively benchmark current tools for chimera removal, taxonomic annotation and OTU clustering of deep metabarcoding data. The best performing tools\/parameter settings are integrated into HAPP, a high-accuracy pipeline for processing deep metabarcoding data. Tests using CO1 data from BOLD and large-scale metabarcoding data on insects demonstrate that HAPP significantly outperforms existing methods, while enabling efficient analysis of extensive datasets by parallelizing computations across taxonomic groups.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1013558","type":"journal-article","created":{"date-parts":[[2025,11,7]],"date-time":"2025-11-07T18:43:03Z","timestamp":1762540983000},"page":"e1013558","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":2,"title":["HAPP: High-accuracy pipeline for processing deep metabarcoding data"],"prefix":"10.1371","volume":"21","author":[{"given":"John","family":"Sundh","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1513-1674","authenticated-orcid":true,"given":"Emma","family":"Granqvist","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1412-1711","authenticated-orcid":true,"given":"Ela","family":"Iwaszkiewicz-Eggebrecht","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9751-5745","authenticated-orcid":true,"given":"Lokeshwaran","family":"Manoharan","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Laura J. 
A.","family":"van Dijk","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Robert","family":"Goodsell","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1669-6124","authenticated-orcid":true,"given":"Nerivania N.","family":"Godeiro","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bruno C.","family":"Bellini","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Johanna","family":"Orsholm","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Piotr","family":"\u0141ukasik","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andreia","family":"Miraldo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tomas","family":"Roslin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ayco J. 
M.","family":"Tack","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3627-6899","authenticated-orcid":true,"given":"Anders F.","family":"Andersson","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3929-251X","authenticated-orcid":true,"given":"Fredrik","family":"Ronquist","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2025,11,7]]},"reference":[{"issue":"8","key":"pcbi.1013558.ref001","doi-asserted-by":"crossref","first-page":"2045","DOI":"10.1111\/j.1365-294X.2012.05470.x","article-title":"Towards next-generation biodiversity assessment using DNA metabarcoding","volume":"21","author":"P Taberlet","year":"2012","journal-title":"Mol Ecol"},{"issue":"3","key":"pcbi.1013558.ref002","doi-asserted-by":"crossref","first-page":"490","DOI":"10.1111\/1755-0998.12751","article-title":"Sorting specimen-rich invertebrate samples with cost-effective NGS barcodes: Validating a reverse workflow for specimen processing","volume":"18","author":"WY Wang","year":"2018","journal-title":"Mol Ecol Resour"},{"issue":"1","key":"pcbi.1013558.ref003","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1111\/1755-0998.13056","article-title":"A validated workflow for rapid taxonomic assignment and monitoring of a national fauna of bees (Apiformes) using high throughput DNA barcoding","volume":"20","author":"TJ Creedy","year":"2020","journal-title":"Mol Ecol Resour"},{"key":"pcbi.1013558.ref004","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.4644","article-title":"Estimating intraspecific genetic diversity from community DNA metabarcoding data","volume":"6","author":"V 
Elbrecht","year":"2018","journal-title":"PeerJ"},{"issue":"2","key":"pcbi.1013558.ref005","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1007\/BF00163806","article-title":"Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat","volume":"39","author":"JV Lopez","year":"1994","journal-title":"J Mol Evol"},{"issue":"1","key":"pcbi.1013558.ref006","doi-asserted-by":"crossref","first-page":"835","DOI":"10.1038\/s41597-025-05151-0","article-title":"Data of the Insect Biome Atlas: a metabarcoding survey of the terrestrial arthropods of Sweden and Madagascar","volume":"12","author":"A Miraldo","year":"2025","journal-title":"Sci Data"},{"issue":"2046","key":"pcbi.1013558.ref007","first-page":"20242974","article-title":"High-throughput biodiversity surveying sheds new light on the brightest of insect taxa","volume":"292","author":"E Iwaszkiewicz-Eggebrecht","year":"2025","journal-title":"Proc Biol Sci"},{"key":"pcbi.1013558.ref008","doi-asserted-by":"crossref","first-page":"108221","DOI":"10.1016\/j.ympev.2024.108221","article-title":"Genomic and transcriptomic perspectives on the origin and evolution of NUMTs in Orthoptera","volume":"201","author":"X Liu","year":"2024","journal-title":"Mol Phylogenet Evol"},{"key":"pcbi.1013558.ref009","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.bbagrm.2011.11.005","article-title":"Mitochondrial DNA nucleoid structure","volume":"1819","author":"DF Bogenhagen","year":"2012","journal-title":"Biochim Biophys Acta"},{"key":"pcbi.1013558.ref010","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1002\/cpmo.21","article-title":"Analysis of mtDNA\/nDNA Ratio in Mice","volume":"7","author":"PM Quiros","year":"2017","journal-title":"Curr Protoc Mouse Biol"},{"issue":"6","key":"pcbi.1013558.ref011","doi-asserted-by":"crossref","first-page":"1755","DOI":"10.1111\/1755-0998.13414","article-title":"Towards eradicating the nuisance of numts and noise in 
molecular biodiversity assessment","volume":"21","author":"NR Graham","year":"2021","journal-title":"Mol Ecol Resour"},{"issue":"1","key":"pcbi.1013558.ref012","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1186\/s12859-021-04180-x","article-title":"Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets","volume":"22","author":"TM Porter","year":"2021","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"pcbi.1013558.ref013","doi-asserted-by":"crossref","first-page":"1188","DOI":"10.1038\/s41467-017-01312-x","article-title":"Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates","volume":"8","author":"TG Fr\u00f8slev","year":"2017","journal-title":"Nat Commun"},{"issue":"6","key":"pcbi.1013558.ref014","doi-asserted-by":"crossref","first-page":"1772","DOI":"10.1111\/1755-0998.13337","article-title":"Validated removal of nuclear pseudogenes and sequencing artefacts from mitochondrial metabarcode data","volume":"21","author":"C And\u00fajar","year":"2021","journal-title":"Mol Ecol Resour"},{"issue":"6","key":"pcbi.1013558.ref015","doi-asserted-by":"crossref","first-page":"1076","DOI":"10.1111\/j.1755-0998.2010.02850.x","article-title":"An open source chimera checker for the fungal ITS region","volume":"10","author":"RH Nilsson","year":"2010","journal-title":"Mol Ecol Resour"},{"issue":"3","key":"pcbi.1013558.ref016","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1101\/gr.112730.110","article-title":"Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons","volume":"21","author":"BJ Haas","year":"2011","journal-title":"Genome Res"},{"key":"pcbi.1013558.ref017","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1186\/1471-2105-12-38","article-title":"Removing noise from pyrosequenced amplicons","volume":"12","author":"C Quince","year":"2011","journal-title":"BMC 
Bioinformatics"},{"issue":"16","key":"pcbi.1013558.ref018","doi-asserted-by":"crossref","first-page":"2194","DOI":"10.1093\/bioinformatics\/btr381","article-title":"UCHIME improves sensitivity and speed of chimera detection","volume":"27","author":"RC Edgar","year":"2011","journal-title":"Bioinformatics"},{"issue":"3","key":"pcbi.1013558.ref019","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1093\/sysbio\/syr010","article-title":"Performance, accuracy, and Web server for evolutionary placement of short sequence reads under maximum likelihood","volume":"60","author":"SA Berger","year":"2011","journal-title":"Syst Biol"},{"issue":"7","key":"pcbi.1013558.ref020","doi-asserted-by":"crossref","first-page":"581","DOI":"10.1038\/nmeth.3869","article-title":"DADA2: High-resolution sample inference from Illumina amplicon data","volume":"13","author":"BJ Callahan","year":"2016","journal-title":"Nat Methods"},{"issue":"16","key":"pcbi.1013558.ref021","doi-asserted-by":"crossref","first-page":"5261","DOI":"10.1128\/AEM.00062-07","article-title":"Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy","volume":"73","author":"Q Wang","year":"2007","journal-title":"Appl Environ Microbiol"},{"key":"pcbi.1013558.ref022","article-title":"SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences","author":"RC Edgar","year":"2016","journal-title":"bioRxiv"},{"key":"pcbi.1013558.ref023","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.2584","article-title":"VSEARCH: a versatile open source tool for metagenomics","volume":"4","author":"T Rognes","year":"2016","journal-title":"PeerJ"},{"issue":"8","key":"pcbi.1013558.ref024","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1038\/s41587-019-0209-9","article-title":"Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2","volume":"37","author":"E Bolyen","year":"2019","journal-title":"Nat 
Biotechnol"},{"issue":"2","key":"pcbi.1013558.ref025","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1093\/sysbio\/syy054","article-title":"EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences","volume":"68","author":"P Barbera","year":"2019","journal-title":"Syst Biol"},{"issue":"10","key":"pcbi.1013558.ref026","doi-asserted-by":"crossref","first-page":"3263","DOI":"10.1093\/bioinformatics\/btaa070","article-title":"Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data","volume":"36","author":"L Czech","year":"2020","journal-title":"Bioinformatics"},{"issue":"7","key":"pcbi.1013558.ref027","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0066213","article-title":"A DNA-based registry for all animal species: the barcode index number (BIN) system","volume":"8","author":"S Ratnasingham","year":"2013","journal-title":"PLoS One"},{"issue":"3","key":"pcbi.1013558.ref028","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1111\/j.1471-8286.2007.01678.x","article-title":"bold: The Barcode of Life Data System (http:\/\/www.barcodinglife.org)","volume":"7","author":"S Ratnasingham","year":"2007","journal-title":"Mol Ecol Notes"},{"key":"pcbi.1013558.ref029","doi-asserted-by":"crossref","DOI":"10.7717\/peerj.593","article-title":"Swarm: robust and fast clustering method for amplicon-based studies","volume":"2","author":"F Mah\u00e9","year":"2014","journal-title":"PeerJ"},{"issue":"1","key":"pcbi.1013558.ref030","doi-asserted-by":"crossref","first-page":"267","DOI":"10.1093\/bioinformatics\/btab493","article-title":"Swarm v3: towards tera-scale amplicon clustering","volume":"38","author":"F Mah\u00e9","year":"2021","journal-title":"Bioinformatics"},{"issue":"2","key":"pcbi.1013558.ref031","doi-asserted-by":"crossref","DOI":"10.1128\/mSphereDirect.00073-17","article-title":"OptiClust, an Improved Method for Assigning Amplicon-Based Sequence Data to Operational Taxonomic 
Units","volume":"2","author":"SL Westcott","year":"2017","journal-title":"mSphere"},{"issue":"21","key":"pcbi.1013558.ref032","doi-asserted-by":"crossref","first-page":"6593","DOI":"10.1128\/AEM.00342-13","article-title":"Distribution-based clustering: using ecology to refine the operational taxonomic unit","volume":"79","author":"SP Preheim","year":"2013","journal-title":"Appl Environ Microbiol"},{"issue":"5","key":"pcbi.1013558.ref033","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0176335","article-title":"dbOTU3: A new implementation of distribution-based OTU calling","volume":"12","author":"SW Olesen","year":"2017","journal-title":"PLoS One"},{"issue":"3","key":"pcbi.1013558.ref034","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pone.0228561","article-title":"Completing Linnaeus\u2019s inventory of the Swedish insect fauna: Only 5,000 species left?","volume":"15","author":"F Ronquist","year":"2020","journal-title":"PLoS One"},{"issue":"2","key":"pcbi.1013558.ref035","doi-asserted-by":"crossref","first-page":"803","DOI":"10.1111\/1755-0998.13510","article-title":"A molecular-based identification resource for the arthropods of Finland","volume":"22","author":"T Roslin","year":"2022","journal-title":"Mol Ecol Resour"},{"issue":"3","key":"pcbi.1013558.ref036","first-page":"426","article-title":"Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling","volume":"66","author":"D Chesters","year":"2017","journal-title":"Syst Biol"},{"key":"pcbi.1013558.ref037","doi-asserted-by":"crossref","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with Snakemake","volume":"10","author":"F M\u00f6lder","year":"2021","journal-title":"F1000Res"},{"key":"pcbi.1013558.ref038","doi-asserted-by":"crossref","DOI":"10.1111\/1755-0998.14023","article-title":"Upscaling biodiversity monitoring: metabarcoding estimates 31,846 insect species from Malaise traps across 
Germany","volume":"25","author":"D Buchner","year":"2025","journal-title":"Mol Ecol Resour"},{"key":"pcbi.1013558.ref039","doi-asserted-by":"crossref","first-page":"550420","DOI":"10.3389\/fmicb.2020.550420","article-title":"Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline","volume":"11","author":"D Straub","year":"2020","journal-title":"Front Microbiol"},{"key":"pcbi.1013558.ref040","unstructured":"Sundh J. COI reference sequences from BOLD DB. 2022."},{"issue":"23","key":"pcbi.1013558.ref041","doi-asserted-by":"crossref","first-page":"7537","DOI":"10.1128\/AEM.01541-09","article-title":"Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities","volume":"75","author":"PD Schloss","year":"2009","journal-title":"Appl Environ Microbiol"},{"issue":"1","key":"pcbi.1013558.ref042","doi-asserted-by":"crossref","first-page":"90","DOI":"10.1186\/s40168-018-0470-z","article-title":"Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2\u2019s q2-feature-classifier plugin","volume":"6","author":"NA Bokulich","year":"2018","journal-title":"Microbiome"},{"issue":"1","key":"pcbi.1013558.ref043","doi-asserted-by":"crossref","first-page":"7","DOI":"10.3390\/d15010007","article-title":"The Evolution of Collembola Higher Taxa (Arthropoda, Hexapoda) Based on Mitogenome Data","volume":"15","author":"BC Bellini","year":"2022","journal-title":"Diversity"},{"issue":"4","key":"pcbi.1013558.ref044","doi-asserted-by":"crossref","first-page":"772","DOI":"10.1093\/molbev\/mst010","article-title":"MAFFT multiple sequence alignment software version 7: improvements in performance and usability","volume":"30","author":"K Katoh","year":"2013","journal-title":"Mol Biol Evol"},{"key":"pcbi.1013558.ref045","doi-asserted-by":"crossref","DOI":"10.1093\/nar\/gkl315","article-title":"PAL2NAL: robust 
conversion of protein sequence alignments into the corresponding codon alignments","volume":"34","author":"M Suyama","year":"2006","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"pcbi.1013558.ref046","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1093\/sysbio\/sys029","article-title":"MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space","volume":"61","author":"F Ronquist","year":"2012","journal-title":"Syst Biol"},{"issue":"3","key":"pcbi.1013558.ref047","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1093\/bioinformatics\/bty633","article-title":"ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R","volume":"35","author":"E Paradis","year":"2019","journal-title":"Bioinformatics"},{"issue":"4154","key":"pcbi.1013558.ref048","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1126\/science.185.4154.862","article-title":"Amino acid difference formula to help explain protein evolution","volume":"185","author":"R Grantham","year":"1974","journal-title":"Science"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1013558","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T00:00:00Z","timestamp":1763337600000}}],"container-title":["PLOS Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013558","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,17]],"date-time":"2025-11-17T18:44:31Z","timestamp":1763405071000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1013558"}},"subtitle":[],"editor":[{"given":"Tobias","family":"Bollenbach","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,11,7]]},"references-count":48,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2025,11,7]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1013558","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,7]]}}}