{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,26]],"date-time":"2026-04-26T04:45:01Z","timestamp":1777178701365,"version":"3.51.4"},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T00:00:00Z","timestamp":1758067200000},"content-version":"vor","delay-in-days":16,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>One of the key applications of Unique Molecular Identifiers (UMIs) in high-throughput sequencing is to correct for PCR amplification bias and removal of PCR duplicates, thereby improving quantification in DNA-seq and RNA-seq applications. Accurately grouping error-bearing UMIs that originate from the same input molecule through a UMI deduplication method is a critical step in this process. However, many existing UMI deduplication tools rely on simple Hamming distance comparisons or suboptimal clustering algorithms, often resulting in erroneous UMI groupings, particularly in error-prone long-read sequencing or ultra-high-depth short-read sequencing.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce UMI-nea, a tool that utilizes Levenshtein distance comparisons and a novel clustering approach to optimize multithreading workflows. Compared against three other indel-aware UMI deduplication tools, UMI-nea achieves more accurate UMI groupings with efficient run time. It demonstrates robust performance across diverse sequencing platforms, depths, and UMI lengths. Additionally, UMI-nea incorporates a data-guided adaptive UMI filter, further enhancing quantification accuracy.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>UMI-nea is available on github https:\/\/github.com\/Qiaseq-research\/UMI-nea.git or Zenodo https:\/\/doi.org\/10.5281\/zenodo.16745758. Sequencing data are stored at https:\/\/qiagenpublic.blob.core.windows.net\/umi-nea-datasets\/.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf514","type":"journal-article","created":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T23:18:06Z","timestamp":1758323886000},"source":"Crossref","is-referenced-by-count":1,"title":["UMI-nea: a fast, robust tool for reference-free UMI deduplication and accurate quantification"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0009-0000-1088-6632","authenticated-orcid":false,"given":"Jixin","family":"Deng","sequence":"first","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jingxiao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Song","family":"Tian","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"John","family":"DiCarlo","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hong","family":"Xu","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samuel J","family":"Rulli","sequence":"additional","affiliation":[{"name":"Product Management Genomics, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jonathan M","family":"Shaffer","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vikas","family":"Gupta","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Toeresin","family":"Karakoyun","sequence":"additional","affiliation":[{"name":"Research and Development, QIAGEN Sciences Inc. , Frederick, MD, 21703,","place":["United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,9,17]]},"reference":[{"key":"2025092219522960300_btaf514-B1","doi-asserted-by":"crossref","first-page":"101253","DOI":"10.1016\/j.mam.2024.101253","article-title":"Principles of digital sequencing using unique molecular identifiers","volume":"96","author":"Andersson","year":"2024","journal-title":"Mol Aspects Med"},{"key":"2025092219522960300_btaf514-B2","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1038\/nmeth.3364","article-title":"MiXCR: software for comprehensive adaptive immunity profiling","volume":"12","author":"Bolotin","year":"2015","journal-title":"Nat Methods"},{"key":"2025092219522960300_btaf514-B3","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1038\/nmeth.1778","article-title":"Counting absolute numbers of molecules using unique molecular identifiers","volume":"9","author":"Kivioja","year":"2011","journal-title":"Nat Methods"},{"key":"2025092219522960300_btaf514-B4","doi-asserted-by":"crossref","first-page":"2963","DOI":"10.1093\/bioinformatics\/btv309","article-title":"IMSEQ\u2014a fast and error aware approach to immunogenetic sequence analysis","volume":"31","author":"Kuchenbecker","year":"2015","journal-title":"Bioinformatics"},{"key":"2025092219522960300_btaf514-B5","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1093\/bib\/bbv029","article-title":"Denoising DNA deep sequencing data\u2014high-throughput sequencing errors and their correction","volume":"17","author":"Laehnemann","year":"2016","journal-title":"Brief Bioinform"},{"key":"2025092219522960300_btaf514-B6","doi-asserted-by":"crossref","first-page":"e8275","DOI":"10.7717\/peerj.8275","article-title":"Algorithms for efficiently collapsing reads with unique molecular identifiers","volume":"7","author":"Liu","year":"2019","journal-title":"PeerJ"},{"key":"2025092219522960300_btaf514-B7","doi-asserted-by":"crossref","first-page":"1394","DOI":"10.1093\/bioinformatics\/btw753","article-title":"Edlib: a C\/C++ library for fast, exact sequence alignment using edit distance","volume":"33","author":"Martin","year":"2017","journal-title":"Bioinformatics"},{"key":"2025092219522960300_btaf514-B8","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1186\/s13059-025-03504-x","article-title":"Digital sequencing is improved by using structured unique molecular identifiers","volume":"26","author":"Micallef","year":"2025","journal-title":"Genome Biol"},{"key":"2025092219522960300_btaf514-B9","doi-asserted-by":"crossref","first-page":"1829","DOI":"10.1093\/bioinformatics\/bty888","article-title":"Alignment-free clustering of UMI tagged DNA molecules","volume":"35","author":"Orabi","year":"2019","journal-title":"Bioinformatics"},{"key":"2025092219522960300_btaf514-B10","doi-asserted-by":"crossref","first-page":"btac787","DOI":"10.1093\/bioinformatics\/btad002","article-title":"Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers","volume":"39","author":"Peng","year":"2023","journal-title":"Bioinformatics"},{"key":"2025092219522960300_btaf514-B11","first-page":"410","author":"Rosenberg","year":"2007"},{"key":"2025092219522960300_btaf514-B12","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1101\/gr.209601.116","article-title":"UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy","volume":"27","author":"Smith","year":"2017","journal-title":"Genome Res"},{"key":"2025092219522960300_btaf514-B13","doi-asserted-by":"crossref","first-page":"e87","DOI":"10.1093\/nar\/gkz474","article-title":"High efficiency error suppression for accurate detection of low-frequency variants","volume":"47","author":"Wang","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025092219522960300_btaf514-B14","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single cells","volume":"8","author":"Zheng","year":"2017","journal-title":"Nat Commun"},{"key":"2025092219522960300_btaf514-B15","doi-asserted-by":"crossref","first-page":"6023","DOI":"10.1038\/s41467-020-19687-9","article-title":"UMI-linked consensus sequencing enables phylogenetic analysis of directed evolution","volume":"11","author":"Zurek","year":"2020","journal-title":"Nat Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf514\/64302274\/btaf514.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/9\/btaf514\/64302274\/btaf514.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/9\/btaf514\/64302274\/btaf514.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T23:52:35Z","timestamp":1758585155000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf514\/8256683"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2025,9,1]]},"references-count":15,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2025,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf514","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,9]]},"published":{"date-parts":[[2025,9,1]]},"article-number":"btaf514"}}