{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,11]],"date-time":"2026-03-11T18:15:43Z","timestamp":1773252943359,"version":"3.50.1"},"reference-count":12,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T00:00:00Z","timestamp":1771891200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NIH ACDC\/AViDD","award":["64903"],"award-info":[{"award-number":["64903"]}]},{"name":"Department of Laboratory Medicine and Pathology at the University of Washington Medical Center"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Unique molecular identifiers (UMIs) are widely used in next-generation sequencing to enable accurate molecular counting and error correction. However, challenges remain in accurately collapsing UMI clusters, especially when read counts are low or sparse read clusters arise from barcode sequencing errors.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present RUMINA, a Rust-based pipeline for UMI-aware deduplication and error correction, optimized for both amplicon and shotgun sequencing. RUMINA supports multiple UMI cluster strategies, alongside majority-rule read selection independent of mapping quality, as well as discrete handling of 1\u20132 read clusters, paired-end merging, and read-length stratification. Benchmarking using simulated HIV population sequencing data and real-world iCLIP and TCR datasets showed that RUMINA improves ultra-low frequency SNV detection (0.01%\u20131%), reduces false positives, enhances reproducibility, and processes sequencing data up to 10-fold faster than existing tools. By integrating UMI- and sequence-level correction in a high-performance framework, RUMINA offers a fast, scalable, and robust solution for UMI-enabled sequencing workflows.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>RUMINA is implemented in Rust and distributed as open-source code and precompiled binaries. Source code and installation instructions are available at https:\/\/github.com\/greninger-lab\/rumina. Documentation associated with this manuscript is available at https:\/\/github.com\/greninger-lab\/rumina_paper.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btag097","type":"journal-article","created":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T12:39:12Z","timestamp":1771936752000},"source":"Crossref","is-referenced-by-count":0,"title":["RUMINA: high-throughput deduplication of unique molecular identifiers for amplicon and whole-genome sequencing with enhanced error correction"],"prefix":"10.1093","volume":"42","author":[{"given":"Eli","family":"Piliper","sequence":"first","affiliation":[{"name":"Department of Laboratory Medicine and Pathology, University of Washington Medical Center , Seattle, 98109, Washington,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7479-3064","authenticated-orcid":false,"given":"Stephanie","family":"Goya","sequence":"additional","affiliation":[{"name":"Department of Laboratory Medicine and Pathology, University of Washington Medical Center , Seattle, 98109, Washington,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7443-0527","authenticated-orcid":false,"given":"Alexander L","family":"Greninger","sequence":"additional","affiliation":[{"name":"Department of Laboratory Medicine and Pathology, University of Washington Medical Center , Seattle, 98109, Washington,","place":["United States"]},{"name":"Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center , Seattle, 98109, Washington,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2026,2,24]]},"reference":[{"key":"2026031019335492700_btag097-B1","doi-asserted-by":"crossref","first-page":"148","DOI":"10.3390\/v10040148","article-title":"Applying unique molecular identifiers in next generation sequencing reveals a constrained viral quasispecies evolution under cross-reactive antibody pressure targeting long alpha helix of hemagglutinin","volume":"10","author":"Hauck","year":"2018","journal-title":"Viruses"},{"key":"2026031019335492700_btag097-B2","doi-asserted-by":"crossref","first-page":"644","DOI":"10.3389\/fimmu.2015.00644","article-title":"Dynamic perturbations of the T-cell receptor repertoire in chronic HIV infection and following antiretroviral therapy","volume":"6","author":"Heather","year":"2016","journal-title":"Front Immunol"},{"key":"2026031019335492700_btag097-B3","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1038\/nmeth.1778","article-title":"Counting absolute numbers of molecules using unique molecular identifiers","volume":"9","author":"Kivioja","year":"2012","journal-title":"Nat Methods"},{"key":"2026031019335492700_btag097-B4","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1038\/s41421-023-00522-6","article-title":"High-throughput and high-sensitivity full-length single-cell RNA-seq analysis on third-generation sequencing platform","volume":"9","author":"Liao","year":"2023","journal-title":"Cell Discov"},{"key":"2026031019335492700_btag097-B5","doi-asserted-by":"crossref","first-page":"pgae411","DOI":"10.1093\/pnasnexus\/pgae411","article-title":"High accuracy meets high throughput for near full-length 16S ribosomal RNA amplicon sequencing on the nanopore platform","volume":"3","author":"Lin","year":"2024","journal-title":"PNAS Nexus"},{"key":"2026031019335492700_btag097-B6","doi-asserted-by":"crossref","first-page":"e8275","DOI":"10.7717\/peerj.8275","article-title":"Algorithms for efficiently collapsing reads with unique molecular identifiers","volume":"7","author":"Liu","year":"2019","journal-title":"PeerJ"},{"key":"2026031019335492700_btag097-B7","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1101\/gad.276477.115","article-title":"SR proteins are NXF1 adaptors that link alternative RNA processing to mRNA export","volume":"30","author":"M\u00fcller-McNicoll","year":"2016","journal-title":"Genes Dev"},{"key":"2026031019335492700_btag097-B8","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1200\/JGO.2019.5.suppl.55","article-title":"An optimized ultra-deep massively parallel sequencing with unique molecular identifier tagging for detection and quantification of circulating tumor DNA from lung cancer patients","volume":"5","author":"Pham","year":"2019","journal-title":"JGO"},{"key":"2026031019335492700_btag097-B9","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1101\/gr.209601.116","article-title":"UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy","volume":"27","author":"Smith","year":"2017","journal-title":"Genome Res"},{"key":"2026031019335492700_btag097-B10","doi-asserted-by":"crossref","first-page":"lqab019","DOI":"10.1093\/nargab\/lqab019","article-title":"Sequencing error profiles of Illumina sequencing instruments","volume":"3","author":"Stoler","year":"2021","journal-title":"NAR Genom Bioinform"},{"key":"2026031019335492700_btag097-B11","doi-asserted-by":"crossref","first-page":"102459","DOI":"10.1016\/j.fsigen.2020.102459","article-title":"Reducing noise and stutter in short tandem repeat loci with unique molecular identifiers","volume":"51","author":"Woerner","year":"2021","journal-title":"Forensic Sci Int Genet"},{"key":"2026031019335492700_btag097-B12","doi-asserted-by":"crossref","first-page":"e3938","DOI":"10.21769\/BioProtoc.3938","article-title":"Primer ID next-generation sequencing for the analysis of a broad spectrum antiviral induced transition mutations and errors rates in a coronavirus genome","volume":"11","author":"Zhou","year":"2021","journal-title":"Bio Protoc"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btag097\/67093893\/btag097.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/3\/btag097\/67093893\/btag097.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/42\/3\/btag097\/67093893\/btag097.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,10]],"date-time":"2026-03-10T23:34:05Z","timestamp":1773185645000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btag097\/8496272"}},"subtitle":[],"editor":[{"given":"Xin","family":"Gao","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2026,2,24]]},"references-count":12,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,2,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btag097","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2026,3]]},"published":{"date-parts":[[2026,2,24]]},"article-number":"btag097"}}