{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:51Z","timestamp":1772138091310,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2018,9,6]],"date-time":"2018-09-06T00:00:00Z","timestamp":1536192000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"QIAGEN Sciences"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Low-frequency DNA mutations are often confounded with technical artifacts from sample preparation and sequencing. With unique molecular identifiers (UMIs), most of the sequencing errors can be corrected. However, errors before UMI tagging, such as DNA polymerase errors during end repair and the first PCR cycle, cannot be corrected with single-strand UMIs and impose fundamental limits to UMI-based variant calling.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We developed smCounter2, a UMI-based variant caller for targeted sequencing data and an upgrade from the current version of smCounter. Compared to smCounter, smCounter2 features lower detection limit that decreases from 1 to 0.5%, better overall accuracy (particularly in non-coding regions), a consistent threshold that can be applied to both deep and shallow sequencing runs, and easier use via a Docker image and code for read pre-processing. We benchmarked smCounter2 against several state-of-the-art UMI-based variant calling methods using multiple datasets and demonstrated smCounter2\u2019s superior performance in detecting somatic variants. At the core of smCounter2 is a statistical test to determine whether the allele frequency of the putative variant is significantly above the background error rate, which was carefully modeled using an independent dataset. The improved accuracy in non-coding regions was mainly achieved using novel repetitive region filters that were specifically designed for UMI data.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The entire pipeline is available at https:\/\/github.com\/qiaseq\/qiaseq-dna under MIT license.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty790","type":"journal-article","created":{"date-parts":[[2018,9,5]],"date-time":"2018-09-05T15:51:18Z","timestamp":1536162678000},"page":"1299-1309","source":"Crossref","is-referenced-by-count":87,"title":["smCounter2: an accurate low-frequency variant caller for targeted sequencing data with unique molecular identifiers"],"prefix":"10.1093","volume":"35","author":[{"given":"Chang","family":"Xu","sequence":"first","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]},{"given":"Xiujing","family":"Gu","sequence":"additional","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]},{"given":"Raghavendra","family":"Padmanabhan","sequence":"additional","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]},{"given":"Zhong","family":"Wu","sequence":"additional","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]},{"given":"Quan","family":"Peng","sequence":"additional","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]},{"given":"John","family":"DiCarlo","sequence":"additional","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]},{"given":"Yexun","family":"Wang","sequence":"additional","affiliation":[{"name":"Life Science Research and Foundation, QIAGEN Sciences Inc., Frederick, MD, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,9,6]]},"reference":[{"key":"2023012808204062400_bty790-B1","doi-asserted-by":"crossref","first-page":"50","DOI":"10.1016\/j.ajhg.2017.05.013","article-title":"Ultra-sensitive sequencing identifies high prevalence of clonal hematopoiesis-associated mutations throughout adult life","volume":"101","author":"Acuna-Hidalgo","year":"2017","journal-title":"Am. J. Hum. Genet"},{"key":"2023012808204062400_bty790-B2","doi-asserted-by":"crossref","first-page":"e2074.","DOI":"10.7717\/peerj.2074","article-title":"Deepsnvminer: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations","volume":"4","author":"Andrews","year":"2016","journal-title":"PeerJ"},{"key":"2023012808204062400_bty790-B3","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1136\/jmedgenet-2016-104295","article-title":"A novel somatic mutation achieves partial rescue in a child with hutchinson-gilford progeria syndrome","volume":"54","author":"Bar","year":"2017","journal-title":"J. Med. Genet"},{"key":"2023012808204062400_bty790-B4","author":"Blumenstiel","year":"2017"},{"key":"2023012808204062400_bty790-B5","doi-asserted-by":"crossref","first-page":"37032.","DOI":"10.18632\/oncotarget.16144","article-title":"Lolopicker: detecting low allelic-fraction variants from low-quality cancer samples","volume":"8","author":"Carrot-Zhang","year":"2017","journal-title":"Oncotarget"},{"key":"2023012808204062400_bty790-B6","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1038\/nbt.2514","article-title":"Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples","volume":"31","author":"Cibulskis","year":"2013","journal-title":"Nat. Biotechnol"},{"key":"2023012808204062400_bty790-B7","doi-asserted-by":"crossref","first-page":"80","DOI":"10.4161\/fly.19695","article-title":"A program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w1118; iso-2; iso-3","volume":"6","author":"Cingolani","year":"2012","journal-title":"Fly"},{"key":"2023012808204062400_bty790-B8","doi-asserted-by":"crossref","first-page":"35","DOI":"10.3389\/fgene.2012.00035","article-title":"Using drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, snpsift","volume":"3","author":"Cingolani","year":"2012","journal-title":"Frontiers in Genetics"},{"key":"2023012808204062400_bty790-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v064.i04","article-title":"fitdistrplus: an r package for fitting distributions","volume":"64","author":"Delignette-Muller","year":"2015","journal-title":"J. Stat. Softw"},{"key":"2023012808204062400_bty790-B10","doi-asserted-by":"crossref","first-page":"491.","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation dna sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023012808204062400_bty790-B11","doi-asserted-by":"crossref","first-page":"1198","DOI":"10.1093\/bioinformatics\/btt750","article-title":"Subclonal variant calling with multiple samples and prior knowledge","volume":"30","author":"Gerstung","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012808204062400_bty790-B12","doi-asserted-by":"crossref","first-page":"20166","DOI":"10.1073\/pnas.1110064108","article-title":"Accurate sampling and deep sequencing of the hiv-1 protease gene using a primer id","volume":"108","author":"Jabara","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012808204062400_bty790-B13","doi-asserted-by":"crossref","first-page":"2586","DOI":"10.1038\/nprot.2014.170","article-title":"Detecting ultralow-frequency mutations by duplex sequencing","volume":"9","author":"Kennedy","year":"2014","journal-title":"Nat. Protoc"},{"key":"2023012808204062400_bty790-B14","doi-asserted-by":"crossref","first-page":"93.","DOI":"10.1038\/nrg.2015.17","article-title":"Role of non-coding sequence variants in cancer","volume":"17","author":"Khurana","year":"2016","journal-title":"Nat. Rev. Genet"},{"key":"2023012808204062400_bty790-B15","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1093\/dnares\/dsv010","article-title":"High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free dna from cancer patients","volume":"22","author":"Kukita","year":"2015","journal-title":"DNA Res"},{"key":"2023012808204062400_bty790-B16","doi-asserted-by":"crossref","first-page":"e108","DOI":"10.1093\/nar\/gkw227","article-title":"Vardict: a novel and versatile variant caller for next-generation sequencing in cancer research","volume":"44","author":"Lai","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2023012808204062400_bty790-B17","doi-asserted-by":"crossref","first-page":"e98.","DOI":"10.1093\/nar\/gku355","article-title":"Theoretical and experimental assessment of degenerate primer tagging in ultra-deep applications of next-generation sequencing","volume":"42","author":"Liang","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023012808204062400_bty790-B18","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v061.i08","article-title":"Optimalcutpoints: an r package for selecting optimal cutpoints in diagnostic tests","volume":"61","author":"L\u00f3pez-Rat\u00f3n","year":"2014","journal-title":"J. Stat. Softw"},{"key":"2023012808204062400_bty790-B19","doi-asserted-by":"crossref","first-page":"547","DOI":"10.1038\/nbt.3520","article-title":"Integrated digital error suppression for improved detection of circulating tumor dna","volume":"34","author":"Newman","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023012808204062400_bty790-B20","doi-asserted-by":"crossref","first-page":"136.","DOI":"10.1186\/s13059-017-1275-2","article-title":"Characterization of background noise in capture-based targeted sequencing data","volume":"18","author":"Park","year":"2017","journal-title":"Genome Biol"},{"key":"2023012808204062400_bty790-B21","doi-asserted-by":"crossref","first-page":"589.","DOI":"10.1186\/s12864-015-1806-8","article-title":"Reducing amplification artifacts in high multiplex amplicon sequencing by using molecular barcodes","volume":"16","author":"Peng","year":"2015","journal-title":"BMC Genomics"},{"key":"2023012808204062400_bty790-B22","doi-asserted-by":"crossref","first-page":"e0169774.","DOI":"10.1371\/journal.pone.0169774","article-title":"Examining sources of error in pcr by single-molecule sequencing","volume":"12","author":"Potapov","year":"2017","journal-title":"PLoS One"},{"key":"2023012808204062400_bty790-B23","doi-asserted-by":"crossref","first-page":"1811","DOI":"10.1093\/bioinformatics\/bts271","article-title":"Strelka: accurate somatic small-variant calling from sequenced tumor\u2013normal sample pairs","volume":"28","author":"Saunders","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012808204062400_bty790-B24","doi-asserted-by":"crossref","first-page":"14508","DOI":"10.1073\/pnas.1208715109","article-title":"Detection of ultra-rare mutations by next-generation sequencing","volume":"109","author":"Schmitt","year":"2012","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012808204062400_bty790-B25","doi-asserted-by":"crossref","first-page":"2718.","DOI":"10.1038\/s41598-017-02727-8","article-title":"A high-throughput assay for quantitative measurement of pcr errors","volume":"7","author":"Shagin","year":"2017","journal-title":"Sci. Rep"},{"key":"2023012808204062400_bty790-B26","doi-asserted-by":"crossref","first-page":"e89","DOI":"10.1093\/nar\/gkt126","article-title":"An empirical bayesian framework for somatic mutation detection from cancer genome sequencing data","volume":"41","author":"Shiraishi","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2023012808204062400_bty790-B27","doi-asserted-by":"crossref","first-page":"e1005480.","DOI":"10.1371\/journal.pcbi.1005480","article-title":"Mageri: computational pipeline for molecular-barcoded targeted resequencing","volume":"13","author":"Shugay","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023012808204062400_bty790-B28","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.csbj.2018.01.003","article-title":"A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data","volume":"16","author":"Xu","year":"2018","journal-title":"Comput. Struct. Biotechnol. J"},{"key":"2023012808204062400_bty790-B29","doi-asserted-by":"crossref","first-page":"5.","DOI":"10.1186\/s12864-016-3425-4","article-title":"Detecting very low allele fraction variants using targeted dna sequencing and a novel molecular barcode-aware variant caller","volume":"18","author":"Xu","year":"2017","journal-title":"BMC Genomics"},{"key":"2023012808204062400_bty790-B30","doi-asserted-by":"crossref","first-page":"12484.","DOI":"10.1038\/ncomms12484","article-title":"Clonal haematopoiesis harbouring aml-associated mutations is ubiquitous in healthy adults","volume":"7","author":"Young","year":"2016","journal-title":"Nat. Commun"},{"key":"2023012808204062400_bty790-B31","doi-asserted-by":"crossref","first-page":"246.","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark snp and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/8\/1299\/48941590\/bioinformatics_35_8_1299.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/35\/8\/1299\/48941590\/bioinformatics_35_8_1299.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,28]],"date-time":"2023-01-28T03:39:21Z","timestamp":1674877161000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/35\/8\/1299\/5091498"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,9,6]]},"references-count":31,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2019,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty790","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/281659","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2019,4,15]]},"published":{"date-parts":[[2018,9,6]]}}}