{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T14:56:09Z","timestamp":1776092169805,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,3,6]],"date-time":"2025-03-06T00:00:00Z","timestamp":1741219200000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Lundbeck Foundation Centre for Disease Evolution","award":["R302-2018-2155"],"award-info":[{"award-number":["R302-2018-2155"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,3,29]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Accurate quantification of genotype uncertainty is pivotal in ensuring the reliability of genetic inferences drawn from NGS data. Genotype uncertainty is typically modeled using Genotype Likelihoods (GLs), which can help propagate measures of statistical uncertainty in base calls to downstream analyses. However, the effects of errors and biases in the estimation of GLs, introduced by biases in the original base call quality scores or the discretization of quality scores, as well as the choice of the GL model, remain under-explored.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present vcfgl, a versatile tool for simulating genotype likelihoods associated with simulated read data. It offers a framework for researchers to simulate and investigate the uncertainties and biases associated with the quantification of uncertainty, thereby facilitating a deeper understanding of their impacts on downstream analytical methods. 
Through simulations, we demonstrate the utility of vcfgl in benchmarking GL-based methods. The program can calculate GLs using various widely used genotype likelihood models and can simulate the errors in quality scores using a Beta distribution. It is compatible with modern simulators such as msprime and SLiM, and can output data in pileup, Variant Call Format (VCF)\/BCF, and genomic VCF file formats, supporting a wide range of applications. The vcfgl program is freely available as an efficient and user-friendly software written in C\/C++.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>vcfgl is freely available at https:\/\/github.com\/isinaltinkaya\/vcfgl.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf098","type":"journal-article","created":{"date-parts":[[2025,3,5]],"date-time":"2025-03-05T23:25:06Z","timestamp":1741217106000},"source":"Crossref","is-referenced-by-count":5,"title":["vcfgl: a flexible genotype likelihood simulator for VCF\/BCF files"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6364-3332","authenticated-orcid":false,"given":"Isin","family":"Altinkaya","sequence":"first","affiliation":[{"name":"Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen , Copenhagen K, 1350,","place":["Denmark"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0513-6591","authenticated-orcid":false,"given":"Rasmus","family":"Nielsen","sequence":"additional","affiliation":[{"name":"Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen , Copenhagen K, 1350,","place":["Denmark"]},{"name":"Departments of Integrative Biology and Statistics, University of California , Berkeley, CA, 94720,","place":["United States"]}]},{"given":"Thorfinn Sand","family":"Korneliussen","sequence":"additional","affiliation":[{"name":"Lundbeck 
Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen , Copenhagen K, 1350,","place":["Denmark"]}]}],"member":"286","published-online":{"date-parts":[[2025,3,5]]},"reference":[{"key":"2025041602165716100_btaf098-B1","doi-asserted-by":"crossref","DOI":"10.1093\/genetics\/iyab229","article-title":"Efficient ancestry and mutation simulation with msprime 1.0","volume":"220","author":"Baumdicker","year":"2022","journal-title":"Genetics"},{"key":"2025041602165716100_btaf098-B2","author":"Caetano-Anolles","year":"2023"},{"key":"2025041602165716100_btaf098-B3","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.margen.2016.04.012","article-title":"Next-generation biology: sequencing and data analysis approaches for non-model organisms","volume":"30","author":"da Fonseca","year":"2016","journal-title":"Mar Genomics"},{"key":"2025041602165716100_btaf098-B4","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"Danecek","year":"2021","journal-title":"Gigascience"},{"key":"2025041602165716100_btaf098-B5","doi-asserted-by":"crossref","first-page":"3855","DOI":"10.1093\/bioinformatics\/btz200","article-title":"ngsLD: evaluating linkage disequilibrium using genotype likelihoods","volume":"35","author":"Fox","year":"2019","journal-title":"Bioinformatics"},{"key":"2025041602165716100_btaf098-B6","doi-asserted-by":"publisher","year":"2024","journal-title":"bioRxiv","DOI":"10.1101\/2024.07.01.601500v6"},{"key":"2025041602165716100_btaf098-B7","doi-asserted-by":"crossref","first-page":"E127","DOI":"10.1086\/723601","article-title":"SLiM 4: multispecies eco-evolutionary modeling","volume":"201","author":"Haller","year":"2023","journal-title":"Am Nat"},{"key":"2025041602165716100_btaf098-B8","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1186\/s12859-014-0356-4","article-title":"ANGSD: analysis of next generation sequencing 
data","volume":"15","author":"Korneliussen","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2025041602165716100_btaf098-B9","doi-asserted-by":"crossref","first-page":"4009","DOI":"10.1093\/bioinformatics\/btv509","article-title":"NgsRelate: a software tool for estimating pairwise relatedness from next-generation sequencing data","volume":"31","author":"Korneliussen","year":"2015","journal-title":"Bioinformatics"},{"key":"2025041602165716100_btaf098-B10","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2025041602165716100_btaf098-B11","doi-asserted-by":"crossref","first-page":"1851","DOI":"10.1101\/gr.078212.108","article-title":"Mapping short DNA sequencing reads and calling variants using mapping quality scores","volume":"18","author":"Li","year":"2008","journal-title":"Genome Res"},{"key":"2025041602165716100_btaf098-B12","doi-asserted-by":"publisher","article-title":"ATLAS: analysis tools for low-depth and ancient samples","journal-title":"bioRxiv","DOI":"10.1101\/105346"},{"key":"2025041602165716100_btaf098-B13","doi-asserted-by":"crossref","first-page":"5966","DOI":"10.1111\/mec.16077","article-title":"A beginner\u2019s guide to low-coverage whole genome sequencing for population genomics","volume":"30","author":"Lou","year":"2021","journal-title":"Mol Ecol"},{"key":"2025041602165716100_btaf098-B14","doi-asserted-by":"crossref","first-page":"2719","DOI":"10.1111\/1755-0998.13415","article-title":"Identifying loci under selection via explicit demographic models","volume":"21","author":"Luqman","year":"2021","journal-title":"Mol Ecol 
Resour"},{"key":"2025041602165716100_btaf098-B15","doi-asserted-by":"crossref","first-page":"1393","DOI":"10.1534\/g3.117.039008","article-title":"Genotype calling from population genomic sequencing data","volume":"7","author":"Maruki","year":"2017","journal-title":"G3 (Bethesda)"},{"key":"2025041602165716100_btaf098-B16","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giac032","article-title":"Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data","volume":"11","author":"Mas-Sandoval","year":"2022","journal-title":"Gigascience"},{"key":"2025041602165716100_btaf098-B17","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2025041602165716100_btaf098-B18","doi-asserted-by":"crossref","first-page":"719","DOI":"10.1534\/genetics.118.301336","article-title":"Inferring population structure and admixture proportions in low-depth NGS data","volume":"210","author":"Meisner","year":"2018","journal-title":"Genetics"},{"key":"2025041602165716100_btaf098-B19","doi-asserted-by":"crossref","first-page":"1037","DOI":"10.1534\/genetics.113.152181","article-title":"SLiM: simulating evolution with selection and linkage","volume":"194","author":"Messer","year":"2013","journal-title":"Genetics"},{"key":"2025041602165716100_btaf098-B20","doi-asserted-by":"crossref","first-page":"33","DOI":"10.12688\/f1000research.29032.2","article-title":"Sustainable data analysis with Snakemake","volume":"10","author":"M\u00f6lder","year":"2021","journal-title":"F1000Res"},{"key":"2025041602165716100_btaf098-B21","doi-asserted-by":"crossref","first-page":"443","DOI":"10.1038\/nrg2986","article-title":"Genotype and SNP calling from next-generation sequencing 
data","volume":"12","author":"Nielsen","year":"2011","journal-title":"Nat Rev Genet"},{"key":"2025041602165716100_btaf098-B22","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.tig.2014.12.002","article-title":"Accounting for uncertainty in DNA sequencing data","volume":"31","author":"O\u2019Rawe","year":"2015","journal-title":"Trends Genet"},{"key":"2025041602165716100_btaf098-B23","volume-title":"Genetics","author":"Rasmussen MS, Garcia-Erill G, Korneliussen TS","year":"2022"},{"key":"2025041602165716100_btaf098-B24","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1534\/genetics.113.154138","article-title":"Estimating individual admixture proportions from next generation sequencing data","volume":"195","author":"Skotte","year":"2013","journal-title":"Genetics"},{"key":"2025041602165716100_btaf098-B25","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1534\/g3.117.300192","article-title":"Powerful inference with the D-statistic on low-coverage whole-genome data","volume":"8","author":"Soraggi","year":"2018","journal-title":"G3 (Bethesda)"},{"key":"2025041602165716100_btaf098-B26","author":"Van Der Auwera","year":"2020"},{"key":"2025041602165716100_btaf098-B27","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1111\/bij.12511","article-title":"Improving the estimation of genetic distances from Next-Generation Sequencing data","volume":"117","author":"Vieira","year":"2016","journal-title":"Biol J Linn Soc"},{"key":"2025041602165716100_btaf098-B28","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/molbev\/msw051","article-title":"Variation in linked selection and recombination drive genomic divergence during allopatric speciation of European and American aspens","volume":"33","author":"Wang","year":"2016","journal-title":"Mol Biol Evol"},{"key":"2025041602165716100_btaf098-B29","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1101\/gr.146084.112","article-title":"An integrative variant analysis pipeline 
for accurate genotype\/haplotype inference in population NGS data","volume":"23","author":"Wang","year":"2013","journal-title":"Genome Res"},{"key":"2025041602165716100_btaf098-B30","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1111\/mec.14954","article-title":"Allele frequency-free inference of close familial relationships from genotypes or low-depth sequencing data","volume":"28","author":"Waples","year":"2019","journal-title":"Mol Ecol"},{"key":"2025041602165716100_btaf098-B31","doi-asserted-by":"crossref","first-page":"821","DOI":"10.3390\/genes14040821","volume":"14","author":"Zhao","year":"2023","journal-title":"Genes (Basel)"},{"key":"2025041602165716100_btaf098-B32","doi-asserted-by":"crossref","DOI":"10.1093\/molbev\/msac119","article-title":"DistAngsd: fast and accurate inference of genetic distances for next-generation sequencing data","volume":"39","author":"Zhao","year":"2022","journal-title":"Mol Biol Evol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf098\/62297935\/btaf098.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf098\/62297935\/btaf098.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/4\/btaf098\/62297935\/btaf098.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,16]],"date-time":"2025-04-16T02:17:09Z","timestamp":1744769829000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf098\/8056036"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation
":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,5]]},"references-count":32,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,3,29]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf098","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2024.04.09.586324","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,4]]},"published":{"date-parts":[[2025,3,5]]},"article-number":"btaf098"}}