{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T21:49:44Z","timestamp":1777499384037,"version":"3.51.4"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"5","license":[{"start":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T00:00:00Z","timestamp":1758499200000},"content-version":"vor","delay-in-days":22,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"Scientific and Technological Research Program of Chongqing Education Committee","award":["KJQN202300627"],"award-info":[{"award-number":["KJQN202300627"]}]},{"DOI":"10.13039\/501100005230","name":"Natural Science Foundation of Chongqing","doi-asserted-by":"publisher","award":["CSTB2024NSCQ-KJFZMSX0036"],"award-info":[{"award-number":["CSTB2024NSCQ-KJFZMSX0036"]}],"id":[{"id":"10.13039\/501100005230","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005230","name":"Natural Science Foundation of Chongqing","doi-asserted-by":"publisher","award":["CSTC2021JCYJ-MSXMX0848"],"award-info":[{"award-number":["CSTC2021JCYJ-MSXMX0848"]}],"id":[{"id":"10.13039\/501100005230","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,8,31]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Accurate detection of low-frequency DNA variants (below 1%) is essential in diverse biological and clinical contexts, yet remains fundamentally constrained by the high intrinsic error rates of next-generation sequencing technologies. Although unique molecular identifiers (UMIs) have significantly mitigated these errors by uniquely indexing original template molecules, their efficacy is compromised by UMI collisions and by artifacts introduced during polymerase chain reaction (PCR) amplification and sequencing, which collectively engender false-positive variant calls. Here, we present AFUMIC, an alignment-free UMI clustering framework that systematically addresses these limitations through collision-resilient UMI grouping and a consensus quality score (CQS)\u2013guided strategy for high-fidelity consensus sequence generation. AFUMIC reduces singleton families, enhances clustering precision, and maximizes data retention, yielding 7.27-fold and 3.84-fold increases in single-strand consensus sequence and duplex consensus sequence output, respectively, compared to Du Novo. It further decreases the per-base error rate from $3.01 \\times 10^{-4}$ to $2.10 \\times 10^{-5}$ and raises the proportion of error-free positions from 45.27% to 99.85%, enabling confident detection of variants at variant allele frequencies as low as $1.0 \\times 10^{-5}$. Notably, AFUMIC exhibits superior computational efficiency, rendering it well-suited for high-throughput analysis of UMI-tagged libraries in large-scale genomic studies. Collectively, AFUMIC represents an efficient methodology for ultrasensitive variant detection and establishes a broadly applicable and computationally efficient framework for error-corrected sequencing that can be readily deployed in both clinical diagnostics and large-scale genomic research.<\/jats:p>","DOI":"10.1093\/bib\/bbaf483","type":"journal-article","created":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T15:06:22Z","timestamp":1758553582000},"source":"Crossref","is-referenced-by-count":1,"title":["Alignment-free unique molecular identifier clustering suppresses sequencing errors for accurate detection of low-frequency DNA variants"],"prefix":"10.1093","volume":"26","author":[{"given":"Fei","family":"Yu","sequence":"first","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"given":"Haojie","family":"Xiao","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"given":"Dongyang","family":"Song","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"given":"Xiao","family":"Yang","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"given":"Shiyue","family":"Huang","sequence":"additional","affiliation":[{"name":"Chongqing Yangjiaping Middle School , Chongqing 400050 ,","place":["China"]}]},{"given":"Yu","family":"Wang","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"given":"Mingze","family":"Bai","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"given":"Xiaoming","family":"Yao","sequence":"additional","affiliation":[{"name":"Caronos Inc. , Anji, Zhejiang 313300 ,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4487-2727","authenticated-orcid":false,"given":"Kunxian","family":"Shu","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0156-898X","authenticated-orcid":false,"given":"Dan","family":"Pu","sequence":"additional","affiliation":[{"name":"Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications , No. 2 Chongwen Road, Nan'an District, Chongqing 400065 ,","place":["China"]}]}],"member":"286","published-online":{"date-parts":[[2025,9,22]]},"reference":[{"key":"2025092211061869100_ref1","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/nrg.2016.49","article-title":"Coming of age: ten years of next-generation sequencing technologies","volume":"17","author":"Goodwin","year":"2016","journal-title":"Nat Rev Genet"},{"key":"2025092211061869100_ref2","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1038\/nrg.2017.117","article-title":"Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations","volume":"19","author":"Salk","year":"2018","journal-title":"Nat Rev Genet"},{"key":"2025092211061869100_ref3","doi-asserted-by":"publisher","first-page":"1115","DOI":"10.1038\/s41587-021-00857-z","article-title":"Evaluating the analytical validity of circulating tumor DNA sequencing assays for precision oncology","volume":"39","author":"Deveson","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025092211061869100_ref4","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1186\/s12943-021-01371-1","article-title":"Liquid biopsy for therapy monitoring in early-stage non-small cell lung cancer","volume":"20","author":"Nagasaka","year":"2021","journal-title":"Mol Cancer"},{"key":"2025092211061869100_ref5","doi-asserted-by":"publisher","first-page":"1667","DOI":"10.1002\/1878-0261.12983","article-title":"Mutated circulating tumor DNA as a liquid biopsy in lung cancer detection and treatment","volume":"15","author":"Filipska","year":"2021","journal-title":"Mol Oncol"},{"key":"2025092211061869100_ref6","doi-asserted-by":"publisher","first-page":"e0042923","DOI":"10.1128\/jcm.00429-23","article-title":"Use of next-generation sequencing to detect mutations associated with antiviral drug resistance in cytomegalovirus","volume":"61","author":"Streck","year":"2023","journal-title":"J Clin Microbiol"},{"key":"2025092211061869100_ref7","doi-asserted-by":"publisher","first-page":"591","DOI":"10.1038\/s41581-021-00428-0","article-title":"Liquid biopsies: donor-derived cell-free DNA for the detection of kidney allograft injury","volume":"17","author":"Oellerich","year":"2021","journal-title":"Nat Rev Nephrol"},{"key":"2025092211061869100_ref8","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1016\/j.healun.2021.01.1564","article-title":"Clinical utility of donor-derived cell-free DNA testing in cardiac transplantation","volume":"40","author":"Khush","year":"2021","journal-title":"J Heart Lung Transplant"},{"key":"2025092211061869100_ref9","doi-asserted-by":"publisher","first-page":"2743","DOI":"10.1093\/jac\/dkad297","article-title":"Impact of pretreatment low-abundance HIV-1 drug resistance on virological failure after 1 year of antiretroviral therapy in China","volume":"78","author":"Li","year":"2023","journal-title":"J Antimicrob Chemother"},{"key":"2025092211061869100_ref10","doi-asserted-by":"publisher","first-page":"397","DOI":"10.1146\/annurev-biodatasci-020722-094144","article-title":"Noninvasive prenatal testing using circulating DNA and RNA: advances, challenges, and possibilities","volume":"6","author":"Moufarrej","year":"2023","journal-title":"Annu Rev Biomed Data Sci"},{"key":"2025092211061869100_ref11","doi-asserted-by":"publisher","first-page":"1320","DOI":"10.1002\/pd.6421","article-title":"Non-invasive prenatal testing (NIPT): combination of copy number variant and gene analyses using an \u201cin-house\u201d target enrichment next generation sequencing-solution for non-centralized NIPT laboratory?","volume":"43","author":"Faldynova","year":"2023","journal-title":"Prenat Diagn"},{"key":"2025092211061869100_ref12","doi-asserted-by":"publisher","first-page":"102870","DOI":"10.1016\/j.fsigen.2023.102870","article-title":"Recent advances in forensic DNA phenotyping of appearance, ancestry and age","volume":"65","author":"Kayser","year":"2023","journal-title":"Forensic Sci Int Genet"},{"key":"2025092211061869100_ref13","doi-asserted-by":"publisher","first-page":"314","DOI":"10.1038\/s41587-019-0368-8","article-title":"Accurate detection of mosaic variants in sequencing data without matched controls","volume":"38","author":"Dou","year":"2020","journal-title":"Nat Biotechnol"},{"key":"2025092211061869100_ref14","doi-asserted-by":"publisher","first-page":"1928","DOI":"10.1038\/s41591-019-0652-7","article-title":"High-intensity sequencing reveals the sources of plasma circulating cell-free DNA variants","volume":"25","author":"Razavi","year":"2019","journal-title":"Nat Med"},{"key":"2025092211061869100_ref15","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1186\/s13059-023-02920-1","article-title":"DREAMS: deep read-level error model for sequencing data applied to low-frequency variant calling and circulating tumor DNA detection","volume":"24","author":"Christensen","year":"2023","journal-title":"Genome Biol"},{"key":"2025092211061869100_ref16","doi-asserted-by":"publisher","first-page":"9530","DOI":"10.1073\/pnas.1105422108","article-title":"Detection and quantification of rare mutations with massively parallel sequencing","volume":"108","author":"Kinde","year":"2011","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025092211061869100_ref17","doi-asserted-by":"publisher","first-page":"14508","DOI":"10.1073\/pnas.1208715109","article-title":"Detection of ultra-rare mutations by next-generation sequencing","volume":"109","author":"Schmitt","year":"2012","journal-title":"Proc Natl Acad Sci U S A"},{"key":"2025092211061869100_ref18","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1016\/j.gpb.2020.02.003","article-title":"SinoDuplex: an improved duplex sequencing approach to detect low-frequency variants in plasma cfDNA samples","volume":"18","author":"Ren","year":"2020","journal-title":"Genomics Proteomics Bioinf"},{"key":"2025092211061869100_ref19","doi-asserted-by":"publisher","first-page":"547","DOI":"10.1038\/nbt.3520","article-title":"Integrated digital error suppression for improved detection of circulating tumor DNA","volume":"34","author":"Newman","year":"2016","journal-title":"Nat Biotechnol"},{"key":"2025092211061869100_ref20","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1126\/scitranslmed.aav4772","article-title":"A multimodality test to guide the management of patients with a pancreatic cyst","volume":"11","author":"Springer","year":"2019","journal-title":"Sci Transl Med"},{"key":"2025092211061869100_ref21","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1101\/gr.275695.121","article-title":"Discovery of an unusually high number of de novo mutations in sperm of older men using duplex sequencing","volume":"32","author":"Salazar","year":"2022","journal-title":"Genome Res"},{"key":"2025092211061869100_ref22","doi-asserted-by":"publisher","first-page":"23134","DOI":"10.1038\/s41598-024-73587-2","article-title":"Frequency and spectrum of mutations in human sperm measured using duplex sequencing correlate with trio-based de novo mutation analyses","volume":"14","author":"Axelsson","year":"2024","journal-title":"Sci Rep"},{"key":"2025092211061869100_ref23","doi-asserted-by":"publisher","first-page":"e785","DOI":"10.1097\/HS9.0000000000000785","article-title":"Duplex sequencing uncovers recurrent low-frequency cancer-associated mutations in infant and childhood KMT2A-rearranged acute leukemia","volume":"6","author":"Pilheden","year":"2022","journal-title":"Hemasphere."},{"key":"2025092211061869100_ref24","doi-asserted-by":"publisher","first-page":"2245","DOI":"10.1007\/s00204-023-03527-y","article-title":"Duplex sequencing provides detailed characterization of mutation frequencies and spectra in the bone marrow of MutaMouse males exposed to procarbazine hydrochloride","volume":"97","author":"Dodge","year":"2023","journal-title":"Arch Toxicol"},{"key":"2025092211061869100_ref25","doi-asserted-by":"publisher","first-page":"i202","DOI":"10.1093\/bioinformatics\/bty264","article-title":"AmpUMI: design and analysis of unique molecular identifiers for deep amplicon sequencing","volume":"34","author":"Clement","year":"2018","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref26","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1101\/gr.209601.116","article-title":"UMI-tools: modeling sequencing errors in unique molecular identifiers to improve quantification accuracy","volume":"27","author":"Smith","year":"2017","journal-title":"Genome Res"},{"key":"2025092211061869100_ref27","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btad002","article-title":"Accurate estimation of molecular counts from amplicon sequence data with unique molecular identifiers","volume":"39","author":"Peng","year":"2023","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref28","doi-asserted-by":"publisher","first-page":"180","DOI":"10.1186\/s13059-016-1039-4","article-title":"Streamlined analysis of duplex sequencing data with Du novo","volume":"17","author":"Stoler","year":"2016","journal-title":"Genome Biol"},{"key":"2025092211061869100_ref29","doi-asserted-by":"publisher","first-page":"3150","DOI":"10.1093\/bioinformatics\/bts565","article-title":"CD-HIT: accelerated for clustering the next-generation sequencing data","volume":"28","author":"Fu","year":"2012","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref30","doi-asserted-by":"publisher","first-page":"2732","DOI":"10.1093\/bioinformatics\/bts482","article-title":"Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads","volume":"28","author":"Chong","year":"2012","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref31","doi-asserted-by":"publisher","first-page":"1829","DOI":"10.1093\/bioinformatics\/bty888","article-title":"Alignment-free clustering of UMI tagged DNA molecules","volume":"35","author":"Orabi","year":"2019","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref32","doi-asserted-by":"publisher","first-page":"5151","DOI":"10.1093\/bioinformatics\/btaa648","article-title":"AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data","volume":"36","author":"Peng","year":"2021","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref33","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1038\/nmeth.3869","article-title":"DADA2: high-resolution sample inference from Illumina amplicon data","volume":"13","author":"Callahan","year":"2016","journal-title":"Nat Methods"},{"key":"2025092211061869100_ref34","doi-asserted-by":"publisher","first-page":"2270","DOI":"10.1016\/j.csbj.2020.08.011","article-title":"UMI-gen: a UMI-based read simulator for variant calling evaluation in paired-end sequencing NGS libraries","volume":"18","author":"Sater","year":"2020","journal-title":"Comput Struct Biotechnol J"},{"key":"2025092211061869100_ref35","doi-asserted-by":"publisher","first-page":"e164","DOI":"10.1093\/nar\/gkq603","article-title":"ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data","volume":"38","author":"Wang","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2025092211061869100_ref36","doi-asserted-by":"publisher","first-page":"1913","DOI":"10.1093\/bioinformatics\/btv053","article-title":"Starcode: sequence clustering based on all-pairs search","volume":"31","author":"Zorita","year":"2015","journal-title":"Bioinformatics."},{"key":"2025092211061869100_ref37","doi-asserted-by":"publisher","first-page":"e8275","DOI":"10.7717\/peerj.8275","article-title":"Algorithms for efficiently collapsing reads with unique molecular identifiers","volume":"7","author":"Liu","year":"2019","journal-title":"PeerJ."},{"key":"2025092211061869100_ref38","doi-asserted-by":"publisher","first-page":"e0204265","DOI":"10.1371\/journal.pone.0204265","article-title":"Duplex proximity sequencing (pro-seq): a method to improve DNA sequencing accuracy without the cost of molecular barcoding redundancy","volume":"13","author":"Pel","year":"2018","journal-title":"PLoS One"},{"key":"2025092211061869100_ref39","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12859-020-3419-8","article-title":"Family Reunion via error correction: an efficient analysis of duplex sequencing data","volume":"21","author":"Stoler","year":"2020","journal-title":"BMC Bioinf"},{"key":"2025092211061869100_ref40","doi-asserted-by":"publisher","first-page":"871","DOI":"10.1038\/s41588-023-01376-0","article-title":"Single duplex DNA sequencing with CODEC detects mutations with high sensitivity","volume":"55","author":"Bae","year":"2023","journal-title":"Nat Genet"},{"key":"2025092211061869100_ref41","doi-asserted-by":"publisher","first-page":"752","DOI":"10.1126\/science.aai8690","article-title":"DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification","volume":"355","author":"Chen","year":"2017","journal-title":"Science."},{"key":"2025092211061869100_ref42","doi-asserted-by":"publisher","first-page":"e67","DOI":"10.1093\/nar\/gks1443","article-title":"Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation","volume":"41","author":"Costello","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2025092211061869100_ref43","doi-asserted-by":"publisher","first-page":"587","DOI":"10.1007\/s40291-014-0115-2","article-title":"Cytosine deamination is a major cause of baseline noise in next-generation sequencing","volume":"18","author":"Chen","year":"2014","journal-title":"Mol Diagn Ther"},{"key":"2025092211061869100_ref44","doi-asserted-by":"publisher","first-page":"1220","DOI":"10.1038\/s41587-021-00900-z","article-title":"Detection of low-frequency DNA variants by targeted sequencing of the Watson and Crick strands","volume":"39","author":"Cohen","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025092211061869100_ref45","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkz474","article-title":"High efficiency error suppression for accurate detection of low-frequency variants","volume":"47","author":"Wang","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025092211061869100_ref46","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1002\/elps.202400202","article-title":"Enhanced error suppression for accurate detection of low-frequency variants","volume":"46","author":"Chen","year":"2025","journal-title":"Electrophoresis."},{"key":"2025092211061869100_ref47","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13059-019-1659-6","article-title":"Analysis of error profiles in deep next-generation sequencing data","volume":"20","author":"Ma","year":"2019","journal-title":"Genome Biol"},{"key":"2025092211061869100_ref48","doi-asserted-by":"publisher","first-page":"e1005480","DOI":"10.1371\/journal.pcbi.1005480","article-title":"MAGERI: computational pipeline for molecular-barcoded targeted resequencing","volume":"13","author":"Shugay","year":"2017","journal-title":"PLoS Comput Biol"},{"key":"2025092211061869100_ref49","doi-asserted-by":"publisher","first-page":"1299","DOI":"10.1093\/bioinformatics\/bty790","article-title":"smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers","volume":"35","author":"Xu","year":"2019","journal-title":"Bioinformatics."}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/5\/bbaf483\/64345463\/bbaf483.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/5\/bbaf483\/64345463\/bbaf483.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,22]],"date-time":"2025-09-22T15:06:23Z","timestamp":1758553583000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf483\/8261483"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,31]]},"references-count":49,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,8,31]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf483","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,9]]},"published":{"date-parts":[[2025,8,31]]},"article-number":"bbaf483"}}