{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:15:34Z","timestamp":1773274534963,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"20","license":[{"start":{"date-parts":[[2018,5,29]],"date-time":"2018-05-29T00:00:00Z","timestamp":1527552000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000276","name":"UK Department of Health","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000276","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,10,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10\u22123 uncertainty.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The Python library geck, and usage examples are available at the following URL: https:\/\/github.com\/sbg\/geck, under the GNU General Public License v3.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty415","type":"journal-article","created":{"date-parts":[[2018,5,22]],"date-time":"2018-05-22T23:17:21Z","timestamp":1527031041000},"page":"3488-3495","source":"Crossref","is-referenced-by-count":9,"title":["<i>geck<\/i>\n                    : trio-based comparative benchmarking of variant calls"],"prefix":"10.1093","volume":"34","author":[{"given":"P\u00e9ter","family":"K\u00f3m\u00e1r","sequence":"first","affiliation":[{"name":"Seven Bridges Inc, Cambridge, MA, USA"}]},{"given":"Deniz","family":"Kural","sequence":"additional","affiliation":[{"name":"Totient Inc, Cambridge, MA, USA"}]}],"member":"286","published-online":{"date-parts":[[2018,5,29]]},"reference":[{"key":"2023012712423600500_bty415-B1","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"Auton","year":"2015","journal-title":"Nature"},{"key":"2023012712423600500_bty415-B2","doi-asserted-by":"crossref","first-page":"745","DOI":"10.1038\/nrg3031","article-title":"Exome sequencing as a tool for Mendelian disease gene discovery","volume":"12","author":"Bamshad","year":"2011","journal-title":"Nat. Rev. Genet"},{"key":"2023012712423600500_bty415-B3","doi-asserted-by":"crossref","first-page":"462.","DOI":"10.1186\/s13059-014-0462-7","article-title":"Toward better benchmarking: challenge-based methods assessment in cancer genomics","volume":"15","author":"Boutros","year":"2014","journal-title":"Genome Biol"},{"key":"2023012712423600500_bty415-B4","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1016\/j.ajhg.2013.09.014","article-title":"Detecting identity by descent and estimating genotype error rates in sequence data","volume":"93","author":"Browning","year":"2013","journal-title":"Am. J. Hum. Genet"},{"key":"2023012712423600500_bty415-B5","doi-asserted-by":"crossref","first-page":"142","DOI":"10.1101\/gr.142455.112","article-title":"Genotype calling and haplotyping in parent-offspring trios","volume":"23","author":"Chen","year":"2013","journal-title":"Genome Res"},{"key":"2023012712423600500_bty415-B6","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.1155\/2015\/456479","article-title":"A comparison of variant calling pipelines using genome in a bottle as a reference","volume":"2015","author":"Cornish","year":"2015","journal-title":"BioMed. Res. Int"},{"key":"2023012712423600500_bty415-B7","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023012712423600500_bty415-B8","doi-asserted-by":"crossref","first-page":"487","DOI":"10.1086\/338919","article-title":"Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data","volume":"70","author":"Douglas","year":"2002","journal-title":"Am. J. Hum. Genet"},{"key":"2023012712423600500_bty415-B9","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"Eberle","year":"2017","journal-title":"Genome Res"},{"key":"2023012712423600500_bty415-B10","author":"Fang","year":"2016"},{"key":"2023012712423600500_bty415-B11","author":"Fragoso","year":"2015"},{"key":"2023012712423600500_bty415-B12","doi-asserted-by":"crossref","first-page":"812","DOI":"10.1016\/j.spl.2012.11.009","article-title":"Estimating genotyping error rates from parent-offspring dyads","volume":"83","author":"Haaland","year":"2013","journal-title":"Stat. Prob. Lett"},{"key":"2023012712423600500_bty415-B13","doi-asserted-by":"crossref","first-page":"623","DOI":"10.1016\/j.ygeno.2004.05.003","article-title":"Estimation of genotype error rate using samples with pedigree information\u2013an application on the GeneChip Mapping 10K array","volume":"84","author":"Hao","year":"2004","journal-title":"Genomics"},{"key":"2023012712423600500_bty415-B14","doi-asserted-by":"crossref","first-page":"878","DOI":"10.1093\/aje\/kwn208","article-title":"Estimating the single nucleotide polymorphism genotype misclassification from routine double measurements in a large epidemiologic sample","volume":"168","author":"Heid","year":"2008","journal-title":"Am. J. Epidemiol"},{"key":"2023012712423600500_bty415-B15","author":"Human Genome Structural Variant Consortium","year":"2017"},{"key":"2023012712423600500_bty415-B16","doi-asserted-by":"crossref","first-page":"17875.","DOI":"10.1038\/srep17875","article-title":"Systematic comparison of variant calling pipelines using gold standard personal exome variants","volume":"5","author":"Hwang","year":"2015","journal-title":"Sci. Rep"},{"key":"2023012712423600500_bty415-B17","doi-asserted-by":"crossref","first-page":"827","DOI":"10.1534\/genetics.106.064618","article-title":"Maximum-likelihood estimation of allelic dropout and false allele error rates from microsatellite genotypes in the absence of reference data","volume":"175","author":"Johnson","year":"2007","journal-title":"Genetics"},{"key":"2023012712423600500_bty415-B18","author":"Jostins","year":"2011"},{"key":"2023012712423600500_bty415-B19","doi-asserted-by":"crossref","first-page":"2835","DOI":"10.1093\/bioinformatics\/btt503","article-title":"A statistical variant calling approach from pedigree information and local haplotyping with phase informative reads","volume":"29","author":"Kojima","year":"2013","journal-title":"Bioinformatics"},{"key":"2023012712423600500_bty415-B20","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1002\/gepi.20431","article-title":"Parametric model-based statistics for possible genotyping errors and sample stratification in sibling-pair SNP data","volume":"34","author":"Korostishevsky","year":"2009","journal-title":"Genet. Epidemiol"},{"key":"2023012712423600500_bty415-B21","doi-asserted-by":"crossref","first-page":"2987","DOI":"10.1093\/bioinformatics\/btr509","article-title":"A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012712423600500_bty415-B22","doi-asserted-by":"crossref","first-page":"2843","DOI":"10.1093\/bioinformatics\/btu356","article-title":"Toward better understanding of artifacts in variant calling from high-coverage samples","volume":"30","author":"Li","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712423600500_bty415-B23","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012712423600500_bty415-B24","doi-asserted-by":"crossref","first-page":"201","DOI":"10.1038\/nature18964","article-title":"The Simons genome diversity project: 300 genomes from 142 diverse populations","volume":"538","author":"Mallick","year":"2016","journal-title":"Nature"},{"key":"2023012712423600500_bty415-B25","doi-asserted-by":"crossref","first-page":"2880","DOI":"10.1093\/bioinformatics\/btr486","article-title":"Integration of SNP genotyping confidence scores in IBD inference","volume":"27","author":"Markus","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012712423600500_bty415-B26","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023012712423600500_bty415-B27","doi-asserted-by":"crossref","first-page":"e0133465.","DOI":"10.1371\/journal.pone.0133465","article-title":"Family-based benchmarking of copy number variation detection software","volume":"10","author":"Nutsua","year":"2015","journal-title":"Plos One"},{"key":"2023012712423600500_bty415-B28","doi-asserted-by":"crossref","first-page":"235.","DOI":"10.3389\/fgene.2015.00235","article-title":"Best practices for evaluating single nucleotide variant calling methods for microbial genomics","volume":"6","author":"Olson","year":"2015","journal-title":"Front. Genet"},{"key":"2023012712423600500_bty415-B29","doi-asserted-by":"crossref","first-page":"64.","DOI":"10.1186\/s12864-016-2366-2","article-title":"svclassify: a method to establish benchmark structural variant calls","volume":"17","author":"Parikh","year":"2016","journal-title":"BMC Genomics"},{"key":"2023012712423600500_bty415-B30","doi-asserted-by":"crossref","first-page":"3985","DOI":"10.1073\/pnas.1222158110","article-title":"Rare variant detection using family-based sequencing analysis","volume":"110","author":"Peng","year":"2013","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023012712423600500_bty415-B31","doi-asserted-by":"crossref","first-page":"43169.","DOI":"10.1038\/srep43169","article-title":"Evaluating variant calling tools for non-matched next-generation sequencing data","volume":"7","author":"Sandmann","year":"2017","journal-title":"Sci. Rep"},{"key":"2023012712423600500_bty415-B32","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.ygeno.2007.05.011","article-title":"Estimating genotyping error rates from Mendelian errors in SNP array genotypes and their impact on inference","volume":"90","author":"Saunders","year":"2007","journal-title":"Genomics"},{"key":"2023012712423600500_bty415-B33","doi-asserted-by":"crossref","first-page":"e0129277.","DOI":"10.1371\/journal.pone.0129277","article-title":"Inexpensive and highly reproducible cloud-based variant calling of 2, 535 human genomes","volume":"10","author":"Shringarpure","year":"2015","journal-title":"PLoS One"},{"key":"2023012712423600500_bty415-B34","doi-asserted-by":"crossref","first-page":"496","DOI":"10.1086\/338920","article-title":"Detection and integration of genotyping errors in statistical genetics","volume":"70","author":"Sobel","year":"2002","journal-title":"Am. J. Hum. Genet"},{"key":"2023012712423600500_bty415-B35","doi-asserted-by":"crossref","first-page":"2787","DOI":"10.1093\/bioinformatics\/btu345","article-title":"SMASH: a benchmarking toolkit for human genome variant calling","volume":"30","author":"Talwalkar","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012712423600500_bty415-B36","author":"Topta\u015f","year":"2018"},{"key":"2023012712423600500_bty415-B37","first-page":"11.10.1","article-title":"From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline","volume":"11","author":"Van der Auwera","year":"2013","journal-title":"Curr. Protocols Bioinform"},{"key":"2023012712423600500_bty415-B38","doi-asserted-by":"crossref","first-page":"565","DOI":"10.1038\/nrg3241","article-title":"De novo mutations in human genetic disease","volume":"13","author":"Veltman","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023012712423600500_bty415-B39","doi-asserted-by":"crossref","first-page":"1963","DOI":"10.1093\/genetics\/166.4.1963","article-title":"Sibship reconstruction from genetic data with typing errors","volume":"166","author":"Wang","year":"2004","journal-title":"Genetics"},{"key":"2023012712423600500_bty415-B40","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023012712423600500_bty415-B41","doi-asserted-by":"crossref","first-page":"160025.","DOI":"10.1038\/sdata.2016.25","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"Zook","year":"2016","journal-title":"Sci. Data"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/20\/3488\/48919407\/bioinformatics_34_20_3488.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/20\/3488\/48919407\/bioinformatics_34_20_3488.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T18:51:32Z","timestamp":1720291892000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/20\/3488\/5021678"}},"subtitle":[],"editor":[{"given":"Oliver","family":"Stegle","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2018,5,29]]},"references-count":41,"journal-issue":{"issue":"20","published-print":{"date-parts":[[2018,10,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty415","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/208116","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,10,15]]},"published":{"date-parts":[[2018,5,29]]}}}