{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,6]],"date-time":"2026-02-06T08:44:32Z","timestamp":1770367472875,"version":"3.49.0"},"reference-count":28,"publisher":"Public Library of Science (PLoS)","issue":"2","license":[{"start":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T00:00:00Z","timestamp":1770249600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UI\/BD\/153743\/2022"],"award-info":[{"award-number":["UI\/BD\/153743\/2022"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e a Tecnologia","doi-asserted-by":"publisher","award":["UID\/00006\/2023"],"award-info":[{"award-number":["UID\/00006\/2023"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.plosone.org"],"crossmark-restriction":false},"short-container-title":["PLoS One"],"abstract":"<jats:p>The accurate detection of genetic variants is critical for advancing genomics research and precision medicine. However, this task remains challenging due to pervasive sequencing errors and complex genomic regions. The choice of variant calling software significantly influences results, creating a need for clear, evidence-based guidance. This study aims to provide a performance evaluation and a clear, evidence-based guide for selecting variant callers by benchmarking seven widely used tools, GATK, FreeBayes, DeepVariant, Samtools, Strelka2, Octopus, and Varscan2, highlighting their algorithmic trade-offs. The well-characterized NA12878 genome from the Genome in a Bottle consortium was analyzed. 
High-coverage whole-genome sequencing data was processed with each variant caller, and the resulting variant calling files were benchmarked against a gold-standard reference. Performance was assessed using precision, recall, and F1-score on a chromosome 20 subset and on full whole-genome data. The analysis revealed that DeepVariant\u2019s deep learning approach achieved the highest precision (0.7869) and F1-score (0.8754) on chromosome 20. For whole-genome analysis, Strelka2 excelled in precision (0.8326), while Octopus demonstrated superior recall (0.9838). FreeBayes exhibited high sensitivity but lower precision, underscoring a key trade-off. There is no universally superior variant caller; the optimal choice depends on the specific research objectives, whether prioritizing precision, recall, or computational efficiency. This study serves as a crucial evidence-based resource for researchers and clinicians, enabling informed tool selection.<\/jats:p>","DOI":"10.1371\/journal.pone.0339891","type":"journal-article","created":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T18:46:32Z","timestamp":1770317192000},"page":"e0339891","update-policy":"https:\/\/doi.org\/10.1371\/journal.pone.corrections_policy","source":"Crossref","is-referenced-by-count":0,"title":["Variant calling in genomics: A comparative performance analysis and decision guide"],"prefix":"10.1371","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0724-5612","authenticated-orcid":true,"given":"Vera","family":"Pinto","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2114-720X","authenticated-orcid":true,"given":"Lisete","family":"Sousa","sequence":"additional","affiliation":[]},{"given":"Carina","family":"Silva","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2026,2,5]]},"reference":[{"key":"pone.0339891.ref001","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.csbj.2018.01.003","article-title":"A 
review of somatic single nucleotide variant calling algorithms for next-generation sequencing data","volume":"16","author":"C Xu","year":"2018","journal-title":"Comput Struct Biotechnol J."},{"issue":"1","key":"pone.0339891.ref002","doi-asserted-by":"crossref","first-page":"31","DOI":"10.1038\/nrg2626","article-title":"Sequencing technologies - the next generation","volume":"11","author":"ML Metzker","year":"2010","journal-title":"Nat Rev Genet."},{"issue":"2","key":"pone.0339891.ref003","doi-asserted-by":"crossref","first-page":"294","DOI":"10.3390\/genes1020294","article-title":"Bioinformatics for next generation sequencing data","volume":"1","author":"A Magi","year":"2010","journal-title":"Genes (Basel)."},{"key":"pone.0339891.ref004","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1186\/s12918-016-0300-5","article-title":"SNVSniffer: an integrated caller for germline and somatic single-nucleotide and indel mutations","author":"Y Liu","year":"2016","journal-title":"BMC Syst Biol."},{"issue":"1","key":"pone.0339891.ref005","doi-asserted-by":"crossref","first-page":"20444","DOI":"10.1038\/s41598-023-47135-3","article-title":"Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data","volume":"13","author":"X Xiang","year":"2023","journal-title":"Sci Rep."},{"issue":"8","key":"pone.0339891.ref006","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1038\/s41592-018-0051-x","article-title":"Strelka2: fast and accurate calling of germline and somatic variants","volume":"15","author":"S Kim","year":"2018","journal-title":"Nat Methods."},{"issue":"1","key":"pone.0339891.ref007","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1186\/s13073-020-00791-w","article-title":"Best practices for variant calling in clinical sequencing","volume":"12","author":"DC Koboldt","year":"2020","journal-title":"Genome 
Med."},{"key":"pone.0339891.ref008","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1007\/978-3-030-91836-1_3","article-title":"Somatic and germline variant calling from next-generation sequencing data","volume":"1361","author":"T-C Chang","year":"2022","journal-title":"Adv Exp Med Biol."},{"issue":"1","key":"pone.0339891.ref009","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1007\/s13353-015-0292-7","article-title":"Review of alignment and SNP calling algorithms for next-generation sequencing data","volume":"57","author":"M Mielczarek","year":"2016","journal-title":"J Appl Genet."},{"issue":"21","key":"pone.0339891.ref010","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"H Li","year":"2011","journal-title":"Bioinformatics."},{"key":"pone.0339891.ref011","unstructured":"Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing; 2012. 
http:\/\/arxiv.org\/abs\/1207.3907"},{"issue":"24","key":"pone.0339891.ref012","doi-asserted-by":"crossref","first-page":"5582","DOI":"10.1093\/bioinformatics\/btaa1081","article-title":"Accurate, scalable cohort variant calls using DeepVariant and GLnexus","volume":"36","author":"T Yun","year":"2021","journal-title":"Bioinformatics."},{"issue":"10","key":"pone.0339891.ref013","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/nbt.4235","article-title":"A universal SNP and small-indel variant caller using deep neural networks","volume":"36","author":"R Poplin","year":"2018","journal-title":"Nat Biotechnol."},{"issue":"9","key":"pone.0339891.ref014","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"A McKenna","year":"2010","journal-title":"Genome Res."},{"issue":"1","key":"pone.0339891.ref015","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-15-104","article-title":"BAYSIC: a Bayesian method for combining sets of genome variants with improved specificity and sensitivity","volume":"15","author":"BL Cantarel","year":"2014","journal-title":"BMC Bioinformatics."},{"key":"pone.0339891.ref016","doi-asserted-by":"crossref","unstructured":"Xu Y, Zheng X, Yuan Y, Estecio MR, Issa J-P, Ji Y, et al. A Bayesian model for SNP discovery based on next-generation sequencing data. IEEE Int Workshop Genomic Signal Process Stat. 2012;2012:42\u20135. 
https:\/\/doi.org\/10.1109\/GENSIPS.2012.6507722 26726304","DOI":"10.1109\/GENSIPS.2012.6507722"},{"issue":"4","key":"pone.0339891.ref017","doi-asserted-by":"crossref","DOI":"10.1093\/bib\/bbaa366","article-title":"Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches","volume":"22","author":"S Alosaimi","year":"2021","journal-title":"Brief Bioinform."},{"issue":"1","key":"pone.0339891.ref018","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1186\/s12864-022-08365-3","article-title":"Systematic benchmark of state-of-the-art variant calling pipelines identifies major factors affecting accuracy of coding sequence variant discovery","volume":"23","author":"YA Barbitoff","year":"2022","journal-title":"BMC Genomics."},{"key":"pone.0339891.ref019","doi-asserted-by":"crossref","first-page":"235","DOI":"10.3389\/fgene.2015.00235","article-title":"Best practices for evaluating single nucleotide variant calling methods for microbial genomics","volume":"6","author":"ND Olson","year":"2015","journal-title":"Front Genet."},{"issue":"1","key":"pone.0339891.ref020","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1186\/s12864-024-10239-9","article-title":"Comparison of structural variant callers for massive whole-genome sequence data","volume":"25","author":"S Joe","year":"2024","journal-title":"BMC Genomics."},{"issue":"3","key":"pone.0339891.ref021","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"JM Zook","year":"2014","journal-title":"Nat Biotechnol."},{"issue":"16","key":"pone.0339891.ref022","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"H 
Li","year":"2009","journal-title":"Bioinformatics."},{"issue":"2","key":"pone.0339891.ref023","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"P Danecek","year":"2021","journal-title":"Gigascience."},{"issue":"7","key":"pone.0339891.ref024","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1038\/s41587-021-00861-3","article-title":"A unified haplotype-based method for accurate and comprehensive variant calling","volume":"39","author":"DP Cooke","year":"2021","journal-title":"Nat Biotechnol."},{"issue":"3","key":"pone.0339891.ref025","doi-asserted-by":"crossref","first-page":"568","DOI":"10.1101\/gr.129684.111","article-title":"VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing","volume":"22","author":"DC Koboldt","year":"2012","journal-title":"Genome Res."},{"key":"pone.0339891.ref026","doi-asserted-by":"crossref","unstructured":"Cleary JG, Braithwaite R, Gaastra K, Hilbush BS, Inglis S, Irvine SA, et al. Comparing Variant Call Files for Performance Benchmarking of Next-Generation Sequencing Variant Calling Pipelines. openRxiv. 2015. https:\/\/doi.org\/10.1101\/023754","DOI":"10.1101\/023754"},{"key":"pone.0339891.ref027","unstructured":"Centers for Medicare & Medicaid Services. Clinical Laboratory Improvement Amendments (CLIA). [cited 2025 Oct]. https:\/\/www.cms.gov\/Regulations-and-Guidance\/Legislation\/CLIA"},{"key":"pone.0339891.ref028","unstructured":"International Organization for Standardization. ISO 15189:2022 \u2013 Medical laboratories \u2013 Requirements for quality and competence, Geneva, Switzerland; 2022. 
https:\/\/www.iso.org\/standard\/76677.html"}],"container-title":["PLOS One"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pone.0339891","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,5]],"date-time":"2026-02-05T18:46:44Z","timestamp":1770317204000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pone.0339891"}},"subtitle":[],"editor":[{"given":"Nejat","family":"Mahdieh","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2026,2,5]]},"references-count":28,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2026,2,5]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pone.0339891","relation":{},"ISSN":["1932-6203"],"issn-type":[{"value":"1932-6203","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,2,5]]}}}