{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,18]],"date-time":"2026-02-18T22:59:24Z","timestamp":1771455564286,"version":"3.50.1"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2025,1,19]],"date-time":"2025-01-19T00:00:00Z","timestamp":1737244800000},"content-version":"vor","delay-in-days":58,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100014233","name":"Fondation Groupe EDF","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100014233","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>By identifying somatic mutations, whole-exome sequencing (WES) has become a technology of choice for the diagnosis and guiding treatment decisions in many cancers. Despite advances in the field of somatic variant detection and the emergence of sophisticated tools incorporating machine learning, accurately identifying somatic variants remains challenging.<\/jats:p>\n               <jats:p>Each new somatic variant caller is often accompanied by claims of superior performance compared to predecessors. Furthermore, most comparative studies focus on a limited set of tools and reference datasets, leading to inconsistent results and making it difficult for laboratories to select the optimal solution. Our study comprehensively evaluated 20 somatic variant callers across four reference WES datasets. We subsequently assessed the performance of ensemble approaches by exploring all possible combinations of these callers, generating 8178 and 1013 combinations for single-nucleotide variants (SNVs) and indels, respectively, with varying voting thresholds. 
Our analysis identified five high-performing individual somatic variant callers: Muse, Mutect2, Dragen, TNScope, and NeuSomatic. For somatic SNVs, an ensemble combining LoFreq, Muse, Mutect2, SomaticSniper, Strelka, and Lancet outperformed the top-performing caller (Dragen) by &gt;3.6% (mean F1 score\u2009=\u20090.927). Similarly, for somatic indels, an ensemble of Mutect2, Strelka, Varscan2, and Pindel outperformed the best individual caller (Neusomatic) by &gt;3.5% (mean F1 score\u2009=\u20090.867). By considering the computational costs of each combination, we were able to identify an optimal solution involving four somatic variant callers, Muse, Mutect2, and Strelka for the SNVs and Mutect2, Strelka, and Varscan2 for the indels, enabling accurate and cost-effective somatic variant detection in whole exome.<\/jats:p>","DOI":"10.1093\/bib\/bbae697","type":"journal-article","created":{"date-parts":[[2025,1,19]],"date-time":"2025-01-19T23:29:49Z","timestamp":1737329389000},"source":"Crossref","is-referenced-by-count":7,"title":["A benchmarking study of individual somatic variant callers and voting-based ensembles for whole-exome sequencing"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5747-562X","authenticated-orcid":false,"given":"Arnaud","family":"Guille","sequence":"first","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 13009 Marseille ,","place":["France"]}]},{"given":"Jos\u00e9","family":"Ad\u00e9la\u00efde","sequence":"additional","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 13009 Marseille 
,","place":["France"]}]},{"given":"Pascal","family":"Finetti","sequence":"additional","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 13009 Marseille ,","place":["France"]}]},{"given":"Fabrice","family":"Andre","sequence":"additional","affiliation":[{"name":"Department of Medical Oncology , Gustave Roussy, University Paris-Saclay, 94805 Villejuif ,","place":["France"]}]},{"given":"Daniel","family":"Birnbaum","sequence":"additional","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 13009 Marseille ,","place":["France"]}]},{"given":"Emilie","family":"Mamessier","sequence":"additional","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 13009 Marseille ,","place":["France"]}]},{"given":"Fran\u00e7ois","family":"Bertucci","sequence":"additional","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 13009 Marseille ,","place":["France"]},{"name":"Medical Oncology, Institut Paoli-Calmettes, 13009 , Marseille ,","place":["France"]}]},{"given":"Max","family":"Chaffanet","sequence":"additional","affiliation":[{"name":"Predictive Oncology Laboratory, Marseille Research Cancer Center, INSERM U1068, CNRS U7258, Institut Paoli-Calmettes, Aix-Marseille University , Equipe labellis\u00e9e \u00ab Ligue Nationale Contre le Cancer \u00bb, 
13009 Marseille ,","place":["France"]}]}],"member":"286","published-online":{"date-parts":[[2025,1,18]]},"reference":[{"key":"2025011923292224400_ref1","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/s0092-8674(00)81683-9","article-title":"The hallmarks of cancer","volume":"100","author":"Hanahan","year":"2000","journal-title":"Cell"},{"key":"2025011923292224400_ref2","doi-asserted-by":"publisher","first-page":"922","DOI":"10.1038\/gim.2014.58","article-title":"The usefulness of whole-exome sequencing in routine clinical practice","volume":"16","author":"Iglesias","year":"2014","journal-title":"Genet Med"},{"key":"2025011923292224400_ref3","doi-asserted-by":"publisher","first-page":"212","DOI":"10.1016\/j.canlet.2012.12.028","article-title":"Advances for studying clonal evolution in cancer","volume":"340","author":"Ding","year":"2013","journal-title":"Cancer Lett"},{"key":"2025011923292224400_ref4","doi-asserted-by":"publisher","first-page":"1377","DOI":"10.1038\/s41467-017-01470-y","article-title":"Prevalence and detection of low-allele-fraction variants in clinical cancer samples","volume":"8","author":"Shin","year":"2017","journal-title":"Nat Commun"},{"key":"2025011923292224400_ref5","doi-asserted-by":"publisher","first-page":"e0227427","DOI":"10.1371\/journal.pone.0227427","article-title":"Sequencing artifacts derived from a library preparation method using enzymatic fragmentation","volume":"15","author":"Tanaka","year":"2020","journal-title":"PloS One"},{"key":"2025011923292224400_ref6","doi-asserted-by":"publisher","first-page":"458","DOI":"10.1186\/s12864-017-3770-y","article-title":"Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls","volume":"18","author":"Buckley","year":"2017","journal-title":"BMC Genomics"},{"key":"2025011923292224400_ref7","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1016\/j.csbj.2018.01.003","article-title":"A review of somatic single nucleotide variant calling algorithms for 
next-generation sequencing data","volume":"16","author":"Xu","year":"2018","journal-title":"Comput Struct Biotechnol J"},{"key":"2025011923292224400_ref8","doi-asserted-by":"publisher","first-page":"1041","DOI":"10.1038\/s41467-019-09027-x","article-title":"Deep convolutional neural networks for accurate somatic mutation detection","volume":"10","author":"Sahraeian","year":"2019","journal-title":"Nat Commun"},{"key":"2025011923292224400_ref9","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btac828","article-title":"DeepSom: a CNN-based approach to somatic variant calling in WGS samples without a matched normal","volume-title":"Bioinformatics","author":""},{"key":"2025011923292224400_ref10","doi-asserted-by":"publisher","first-page":"12898","DOI":"10.1038\/s41598-020-69772-8","article-title":"SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach","volume":"10","author":"Wang","year":"2020","journal-title":"Sci Rep"},{"key":"2025011923292224400_ref11","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1186\/s13059-021-02592-9","article-title":"Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample","volume":"23","author":"Sahraeian","year":"2022","journal-title":"Genome Biol"},{"key":"2025011923292224400_ref12","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/j.tig.2020.03.005","article-title":"Opening the black box: interpretable machine learning for geneticists","volume":"36","author":"Azodi","year":"2020","journal-title":"Trends Genet"},{"key":"2025011923292224400_ref13","doi-asserted-by":"publisher","first-page":"623","DOI":"10.1038\/nmeth.3407","article-title":"Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection","volume":"12","author":"Ewing","year":"2015","journal-title":"Nat 
Methods"},{"key":"2025011923292224400_ref14","doi-asserted-by":"publisher","first-page":"1151","DOI":"10.1038\/s41587-021-00993-6","article-title":"Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing","volume":"39","author":"Fang","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025011923292224400_ref15","doi-asserted-by":"publisher","first-page":"1141","DOI":"10.1038\/s41587-021-00994-5","article-title":"Toward best practice in cancer mutation detection with whole-genome and whole-exome sequencing","volume":"39","author":"Xiao","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025011923292224400_ref16","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"Eberle","year":"2017","journal-title":"Genome Res"},{"key":"2025011923292224400_ref17","doi-asserted-by":"publisher","first-page":"3501","DOI":"10.1038\/s41598-020-60559-5","article-title":"Systematic comparison of somatic variant calling performance among different sequencing depth and mutation frequency","volume":"10","author":"Chen","year":"2020","journal-title":"Sci Rep"},{"key":"2025011923292224400_ref18","doi-asserted-by":"publisher","first-page":"e0151664","DOI":"10.1371\/journal.pone.0151664","article-title":"Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data","volume":"11","author":"Kr\u00f8ig\u00e5rd","year":"2016","journal-title":"PloS One"},{"key":"2025011923292224400_ref19","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/s12920-020-00746-5","article-title":"Comparative analysis of somatic variant calling on matched FF and FFPE WGS samples","volume":"13","author":"Brienen","year":"2020","journal-title":"BMC Med 
Genomics"},{"key":"2025011923292224400_ref20","doi-asserted-by":"publisher","first-page":"8463","DOI":"10.1038\/s41598-023-34925-y","article-title":"Simple combination of multiple somatic variant callers to increase accuracy","volume":"13","author":"Trevarton","year":"2023","journal-title":"Sci Rep"},{"key":"2025011923292224400_ref21","doi-asserted-by":"publisher","first-page":"35","DOI":"10.1186\/s13073-017-0425-1","article-title":"Intersect-then-combine approach: improving the performance of somatic variant calling in whole exome sequencing data using multiple aligners and callers","volume":"9","author":"Callari","year":"2017","journal-title":"Genome Med"},{"key":"2025011923292224400_ref22","doi-asserted-by":"publisher","first-page":"90","DOI":"10.1186\/gm494","article-title":"A simple consensus approach improves somatic mutation prediction accuracy","volume":"5","author":"Goode","year":"2013","journal-title":"Genome Med"},{"key":"2025011923292224400_ref23","doi-asserted-by":"publisher","first-page":"35","DOI":"10.3233\/BD-2010-0307","article-title":"Triple negative breast cancer cell lines: one tool in the search for better treatment of triple negative breast cancer","volume":"32","author":"Chavez","year":"2010","journal-title":"Breast Dis"},{"key":"2025011923292224400_ref24","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1186\/s13073-021-00897-9","article-title":"Prospective high-throughput genome profiling of advanced cancers: results of the PERMED-01 clinical trial","volume":"13","author":"Bertucci","year":"2021","journal-title":"Genome Med"},{"key":"2025011923292224400_ref25","doi-asserted-by":"publisher","first-page":"560","DOI":"10.1038\/s41586-019-1056-z","article-title":"Genomic characterization of metastatic breast cancers [published correction appears in Nature. 2019 Aug;572(7767):E7. 
Doi: 10.1038\/s41586-019-1380-3]","volume":"569","author":"Bertucci","year":"2019","journal-title":"Nature"},{"key":"2025011923292224400_ref26","article-title":"Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM","author":""},{"key":"2025011923292224400_ref27","doi-asserted-by":"publisher","first-page":"2032","DOI":"10.1093\/bioinformatics\/btv098","article-title":"Sambamba: fast processing of NGS alignment formats","volume":"31","author":"Tarasov","year":"2015","journal-title":"Bioinformatics"},{"key":"2025011923292224400_ref28","doi-asserted-by":"publisher","first-page":"11.10.1","DOI":"10.1002\/0471250953.bi1110s43","article-title":"From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline","volume":"43","author":"Van der Auwera","year":"2013","journal-title":"Curr Protoc Bioinformatics"},{"key":"2025011923292224400_ref29","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1038\/s42003-018-0023-9","article-title":"Genome-wide somatic variant calling using localized colored de Bruijn graphs","volume":"1","author":"Narzisi","year":"2018","journal-title":"Commun Biol"},{"key":"2025011923292224400_ref30","doi-asserted-by":"publisher","first-page":"11189","DOI":"10.1093\/nar\/gks918","article-title":"LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets","volume":"40","author":"Wilm","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2025011923292224400_ref31","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1186\/s13059-016-1029-6","article-title":"MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data","volume":"17","author":"Fan","year":"2016","journal-title":"Genome 
Biol"},{"key":"2025011923292224400_ref32","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1038\/nbt.2514","article-title":"Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples","volume":"31","author":"Cibulskis","year":"2013","journal-title":"Nat Biotechnol"},{"key":"2025011923292224400_ref33","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2025011923292224400_ref34","doi-asserted-by":"publisher","first-page":"2529","DOI":"10.1038\/nprot.2016.150","article-title":"Indel variant analysis of short-read sequencing data with Scalpel","volume":"11","author":"Fang","year":"2016","journal-title":"Nat Protoc"},{"key":"2025011923292224400_ref35","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1186\/1471-2164-14-302","article-title":"Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs","volume":"14","author":"Christoforides","year":"2013","journal-title":"BMC Genomics"},{"key":"2025011923292224400_ref36","doi-asserted-by":"publisher","first-page":"311","DOI":"10.1093\/bioinformatics\/btr665","article-title":"SomaticSniper: identification of somatic point mutations in whole genome sequencing data","volume":"28","author":"Larson","year":"2012","journal-title":"Bioinformatics"},{"key":"2025011923292224400_ref37","doi-asserted-by":"publisher","first-page":"591","DOI":"10.1038\/s41592-018-0051-x","article-title":"Strelka2: fast and accurate calling of germline and somatic variants","volume":"15","author":"Kim","year":"2018","journal-title":"Nat Methods"},{"key":"2025011923292224400_ref38","doi-asserted-by":"publisher","first-page":"e108","DOI":"10.1093\/nar\/gkw227","article-title":"VarDict: a novel and versatile variant caller for 
next-generation sequencing in cancer research","volume":"44","author":"Lai","year":"2016","journal-title":"Nucleic Acids Res"},{"key":"2025011923292224400_ref39","doi-asserted-by":"publisher","first-page":"568","DOI":"10.1101\/gr.129684.111","article-title":"VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing","volume":"22","author":"Koboldt","year":"2012","journal-title":"Genome Res"},{"key":"2025011923292224400_ref40","doi-asserted-by":"publisher","first-page":"1498","DOI":"10.1093\/bioinformatics\/btt183","article-title":"Shimmer: detection of genetic alterations in tumors using next-generation sequence data","volume":"29","author":"Hansen","year":"2013","journal-title":"Bioinformatics"},{"key":"2025011923292224400_ref41","doi-asserted-by":"publisher","first-page":"R90","DOI":"10.1186\/gb-2013-14-8-r90","article-title":"Virmid: accurate detection of somatic mutations with sample impurity inference","volume":"14","author":"Kim","year":"2013","journal-title":"Genome Biol"},{"key":"2025011923292224400_ref42","article-title":"Haplotype-based variant detection from short-read sequencing","author":"Garrison","year":"2012"},{"key":"2025011923292224400_ref43","doi-asserted-by":"publisher","first-page":"2865","DOI":"10.1093\/bioinformatics\/btp394","article-title":"Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads","volume":"25","author":"Ye","year":"2009","journal-title":"Bioinformatics"},{"key":"2025011923292224400_ref44"},{"key":"2025011923292224400_ref45","doi-asserted-by":"publisher","first-page":"4248","DOI":"10.1038\/s41467-022-31765-8","article-title":"Accurate somatic variant detection using weakly supervised deep learning","volume":"13","author":"Krishnamachari","year":"2022","journal-title":"Nat Commun"},{"key":"2025011923292224400_ref46","doi-asserted-by":"publisher","article-title":"Somatic small-variant calling methods in Illumina 
DRAGEN\u2122 secondary analysis","author":"Scheffler","DOI":"10.1101\/2023.03.23.534011"},{"key":"2025011923292224400_ref47","doi-asserted-by":"publisher","article-title":"TNscope: accurate detection of somatic mutations with haplotype-based variant candidate detection and machine learning filtering","author":"Freed","DOI":"10.1101\/250647"},{"key":"2025011923292224400_ref48","doi-asserted-by":"publisher","first-page":"100129","DOI":"10.1016\/j.xgen.2022.100129","article-title":"PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions","volume":"2","author":"Olson","year":"2022","journal-title":"Cell Genom"},{"key":"2025011923292224400_ref49","doi-asserted-by":"publisher","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with Snakemake","volume":"10","author":"M\u00f6lder","year":"2021","journal-title":"F1000Res"},{"key":"2025011923292224400_ref50","doi-asserted-by":"publisher","first-page":"2057","DOI":"10.1038\/s41598-020-59026-y","article-title":"Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage","volume":"10","author":"Barbitoff","year":"2020","journal-title":"Sci Rep"},{"key":"2025011923292224400_ref51","doi-asserted-by":"publisher","first-page":"1073","DOI":"10.1093\/bioinformatics\/btt771","article-title":"Bias from removing read duplication in ultra-deep sequencing experiments","volume":"30","author":"Zhou","year":"2014","journal-title":"Bioinformatics"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbae697\/61491609\/bbae697.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/1\/bbae697\/61491609\/bbae697.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,19]],"date-time":"2025-01-19T23:29:51Z","timestamp":1737329391000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbae697\/7960049"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,22]]},"references-count":51,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbae697","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,1]]},"published":{"date-parts":[[2024,11,22]]},"article-number":"bbae697"}}