{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,25]],"date-time":"2026-06-25T03:43:43Z","timestamp":1782359023288,"version":"3.54.5"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"24","license":[{"start":{"date-parts":[[2021,1,5]],"date-time":"2021-01-05T00:00:00Z","timestamp":1609804800000},"content-version":"vor","delay-in-days":21,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000051","name":"NHGRI","doi-asserted-by":"publisher","award":["3UM1HG008901-03S1"],"award-info":[{"award-number":["3UM1HG008901-03S1"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006785","name":"Google LLC","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006785","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,4,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. 
The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>We publicly release the 1KGP individual-level variant calls and cohort callset (https:\/\/console.cloud.google.com\/storage\/browser\/brain-genomics-public\/research\/cohort\/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https:\/\/github.com\/google\/deepvariant) and GLnexus (https:\/\/github.com\/dnanexus-rnd\/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa1081","type":"journal-article","created":{"date-parts":[[2020,12,17]],"date-time":"2020-12-17T13:53:10Z","timestamp":1608213190000},"page":"5582-5589","source":"Crossref","is-referenced-by-count":233,"title":["Accurate, scalable cohort variant calls using DeepVariant and GLnexus"],"prefix":"10.1093","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6242-5536","authenticated-orcid":false,"given":"Taedong","family":"Yun","sequence":"first","affiliation":[{"name":"Google Health , Cambridge, MA 02142, 
USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1145-6527","authenticated-orcid":false,"given":"Helen","family":"Li","sequence":"additional","affiliation":[{"name":"Google Health , Palo Alto, CA 94304, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Pi-Chuan","family":"Chang","sequence":"additional","affiliation":[{"name":"Google Health , Palo Alto, CA 94304, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Michael F","family":"Lin","sequence":"additional","affiliation":[{"name":"mlin.net LLC , Honolulu, HI 96816, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4824-6689","authenticated-orcid":false,"given":"Andrew","family":"Carroll","sequence":"additional","affiliation":[{"name":"Google Health , Palo Alto, CA 94304, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9928-8216","authenticated-orcid":false,"given":"Cory Y","family":"McLean","sequence":"additional","affiliation":[{"name":"Google Health , Cambridge, MA 02142, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2021,1,5]]},"reference":[{"key":"2023062408143558500_btaa1081-B1","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1016\/j.ajhg.2018.08.007","article-title":"The Clinical Sequencing Evidence-Generating Research Consortium: integrating genomic sequencing in diverse and medically underserved populations","volume":"103","author":"Amendola","year":"2018","journal-title":"Am. J. Hum. 
Genet"},{"key":"2023062408143558500_btaa1081-B2","doi-asserted-by":"crossref","first-page":"R68","DOI":"10.1186\/gb-2011-12-7-r68","article-title":"Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities","volume":"12","author":"Bainbridge","year":"2011","journal-title":"Genome Biol"},{"key":"2023062408143558500_btaa1081-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1175\/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2","article-title":"Verification of forecasts expressed in terms of probability","volume":"78","author":"Brier","year":"1950","journal-title":"Mon. Weather Rev"},{"key":"2023062408143558500_btaa1081-B4","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1016\/j.ajhg.2018.07.015","article-title":"A one-penny imputed genome from next-generation reference panels","volume":"103","author":"Browning","year":"2018","journal-title":"Am. J. Hum. Genet"},{"key":"2023062408143558500_btaa1081-B5","doi-asserted-by":"crossref","first-page":"203","DOI":"10.1038\/s41586-018-0579-z","article-title":"The UK Biobank resource with deep phenotyping and genomic data","volume":"562","author":"Bycroft","year":"2018","journal-title":"Nature"},{"key":"2023062408143558500_btaa1081-B6","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023062408143558500_btaa1081-B7","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1093\/bioinformatics\/bty023","article-title":"GTC: how to maintain huge genotype collections in a compressed form","volume":"34","author":"Danek","year":"2018","journal-title":"Bioinformatics"},{"key":"2023062408143558500_btaa1081-B8","doi-asserted-by":"crossref","first-page":"3934","DOI":"10.1038\/ncomms4934","article-title":"Integrating sequence and array data to create an improved 1000 Genomes Project 
haplotype reference panel","volume":"5","author":"Delaneau","year":"2014","journal-title":"Nat. Commun"},{"key":"2023062408143558500_btaa1081-B9","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023062408143558500_btaa1081-B10","doi-asserted-by":"crossref","first-page":"aaf6814","DOI":"10.1126\/science.aaf6814","article-title":"Distribution and clinical impact of functional variants in 50,726 whole-exome sequences from the DiscovEHR study","volume":"354","author":"Dewey","year":"2016","journal-title":"Science"},{"key":"2023062408143558500_btaa1081-B11","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1016\/S1672-0229(07)60009-6","article-title":"A brief review of short tandem repeat mutation","volume":"5","author":"Fan","year":"2007","journal-title":"Genomics Proteomics Bioinf"},{"key":"2023062408143558500_btaa1081-B12","first-page":"2503","volume-title":"Bioinformatics","author":"Faust","year":"2014"},{"key":"2023062408143558500_btaa1081-B13","article-title":"Haplotype-based variant detection from short-read sequencing","author":"Garrison","year":"2012","journal-title":"arXiv, arXiv: 1207.3907"},{"key":"2023062408143558500_btaa1081-B14","first-page":"1487","author":"Golovin","year":"2017"},{"key":"2023062408143558500_btaa1081-B15","doi-asserted-by":"crossref","first-page":"727","DOI":"10.1007\/s00439-017-1786-7","article-title":"A genome-wide study of Hardy\u2013Weinberg equilibrium with next generation sequence data","volume":"136","author":"Graffelman","year":"2017","journal-title":"Hum. 
Genet"},{"key":"2023062408143558500_btaa1081-B16","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1126\/science.28.706.49","article-title":"Mendelian proportions in a mixed population","volume":"28","author":"Hardy","year":"1908","journal-title":"Science"},{"key":"2023062408143558500_btaa1081-B17","doi-asserted-by":"crossref","first-page":"801","DOI":"10.1038\/ejhg.2012.3","article-title":"1000 Genomes-based imputation identifies novel and refined associations for the Wellcome Trust Case Control Consortium phase 1 Data","volume":"20","author":"Huang","year":"2012","journal-title":"Eur. J. Hum. Genet"},{"key":"2023062408143558500_btaa1081-B18","doi-asserted-by":"crossref","first-page":"434","DOI":"10.1038\/s41586-020-2308-7","article-title":"The mutational constraint spectrum quantified from variation in 141,456 humans","volume":"581","author":"Karczewski","year":"2020","journal-title":"Nature"},{"key":"2023062408143558500_btaa1081-B19","doi-asserted-by":"crossref","first-page":"1330","DOI":"10.1038\/s41588-019-0483-y","article-title":"Inferring whole-genome histories in large population datasets","volume":"51","author":"Kelleher","year":"2019","journal-title":"Nat. Genet"},{"key":"2023062408143558500_btaa1081-B20","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1038\/s41592-018-0051-x","article-title":"Strelka2: fast and accurate calling of germline and somatic variants","volume":"15","author":"Kim","year":"2018","journal-title":"Nat. Methods"},{"key":"2023062408143558500_btaa1081-B21","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1038\/s41587-019-0054-x","article-title":"Best practices for benchmarking germline small-variant calls in human genomes","volume":"37","author":"Krusche","year":"2019","journal-title":"Nat. 
Biotechnol"},{"key":"2023062408143558500_btaa1081-B22","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1038\/nmeth.3654","article-title":"Efficient genotype compression and analysis of large genetic-variation data sets","volume":"13","author":"Layer","year":"2016","journal-title":"Nat. Methods"},{"key":"2023062408143558500_btaa1081-B23","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1038\/nature19057","article-title":"Analysis of protein-coding genetic variation in 60,706 humans","volume":"536","author":"Lek","year":"2016","journal-title":"Nature"},{"key":"2023062408143558500_btaa1081-B24","article-title":"Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM","author":"Li","year":"2013","journal-title":"arXiv: 1303.3997"},{"key":"2023062408143558500_btaa1081-B25","doi-asserted-by":"crossref","first-page":"590","DOI":"10.1093\/bioinformatics\/btv613","article-title":"BGT: efficient and flexible genotype query across many samples","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023062408143558500_btaa1081-B26","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023062408143558500_btaa1081-B27","first-page":"343970. doi: 10.1101\/343970","article-title":"GLnexus: joint variant calling for large cohort sequencing","author":"Lin","year":"2018"},{"key":"2023062408143558500_btaa1081-B28","article-title":"Sparse Project VCF: efficient encoding of population genotype matrices","author":"Lin","year":"2020"},{"key":"2023062408143558500_btaa1081-B29","doi-asserted-by":"crossref","first-page":"1443","DOI":"10.1038\/ng.3679","article-title":"Reference-based phasing using the haplotype reference consortium panel","volume":"48","author":"Loh","year":"2016","journal-title":"Nat. 
Genet"},{"key":"2023062408143558500_btaa1081-B30","doi-asserted-by":"crossref","first-page":"998","DOI":"10.1038\/s41467-019-09025-z","article-title":"A multi-task convolutional deep neural network for variant calling in single molecule sequencing","volume":"10","author":"Luo","year":"2019","journal-title":"Nat. Commun"},{"key":"2023062408143558500_btaa1081-B31","doi-asserted-by":"crossref","first-page":"849","DOI":"10.1093\/aje\/kwr160","article-title":"The next PAGE in understanding complex traits: design for the analysis of population architecture using genetics and epidemiology (PAGE) study","volume":"174","author":"Matise","year":"2011","journal-title":"Am. J. Epidemiol"},{"key":"2023062408143558500_btaa1081-B32","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023062408143558500_btaa1081-B33","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/s13059-016-0974-4","article-title":"The ensembl variant effect predictor","volume":"17","author":"McLaren","year":"2016","journal-title":"Genome Biol"},{"key":"2023062408143558500_btaa1081-B34","doi-asserted-by":"crossref","first-page":"1121","DOI":"10.1038\/ng.3396","article-title":"A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease","volume":"47","author":"Nikpay","year":"2015","journal-title":"Nat. Genet"},{"key":"2023062408143558500_btaa1081-B35","doi-asserted-by":"crossref","first-page":"650","DOI":"10.1038\/ng1047","article-title":"Functional SNPs in the lymphotoxin-\u03b1 gene that are associated with susceptibility to myocardial infarction","volume":"32","author":"Ozaki","year":"2002","journal-title":"Nat. 
Genet"},{"key":"2023062408143558500_btaa1081-B36","article-title":"Scaling accurate genetic variant discovery to tens of thousands of samples","author":"Poplin","year":"2018"},{"key":"2023062408143558500_btaa1081-B37","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/nbt.4235","article-title":"A universal SNP and small-indel variant caller using deep neural networks","volume":"36","author":"Poplin","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"2023062408143558500_btaa1081-B38","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1016\/j.ajhg.2017.01.006","article-title":"The undiagnosed diseases network: accelerating discovery about health and disease","volume":"100","author":"Ramoni","year":"2017","journal-title":"Am. J. Hum. Genet"},{"key":"2023062408143558500_btaa1081-B39","first-page":"078600","article-title":"Quality control analysis of the 1000 Genomes Project Omni2.5 genotypes","author":"Roslin","year":"2016"},{"key":"2023062408143558500_btaa1081-B40","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1186\/s12864-017-4013-y","article-title":"A phased SNP-based classification of sickle cell anemia HBB haplotypes","volume":"18","author":"Shaikho","year":"2017","journal-title":"BMC Genomics"},{"key":"2023062408143558500_btaa1081-B41","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"2023062408143558500_btaa1081-B42","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1002\/sim.4780050506","article-title":"Probabilistic prediction in patient management and clinical trials","volume":"5","author":"Spiegelhalter","year":"1986","journal-title":"Stat. 
Med"},{"key":"2023062408143558500_btaa1081-B43","doi-asserted-by":"crossref","DOI":"10.1101\/563866","article-title":"Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program","author":"Taliun","year":"2019"},{"key":"2023062408143558500_btaa1081-B44","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","year":"2010","journal-title":"Nature"},{"key":"2023062408143558500_btaa1081-B45","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","year":"2015","journal-title":"Nature"},{"key":"2023062408143558500_btaa1081-B46","doi-asserted-by":"crossref","first-page":"D1001","DOI":"10.1093\/nar\/gkt1229","article-title":"The NHGRI GWAS Catalog, a curated resource of SNP-trait associations","volume":"42","author":"Welter","year":"2014","journal-title":"Nucleic Acids Res"},{"key":"2023062408143558500_btaa1081-B47","doi-asserted-by":"crossref","first-page":"1155","DOI":"10.1038\/s41587-019-0217-9","article-title":"Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome","volume":"37","author":"Wenger","year":"2019","journal-title":"Nat. Biotechnol"},{"key":"2023062408143558500_btaa1081-B48","doi-asserted-by":"crossref","first-page":"1502","DOI":"10.1056\/NEJMoa1306555","article-title":"Clinical whole-exome sequencing for the diagnosis of Mendelian disorders","volume":"369","author":"Yang","year":"2013","journal-title":"N. Engl. J. 
Med"},{"key":"2023062408143558500_btaa1081-B49","doi-asserted-by":"crossref","first-page":"2251","DOI":"10.1093\/bioinformatics\/btx145","article-title":"SeqArray-a storage-efficient high-performance data format for WGS variant calls","volume":"33","author":"Zheng","year":"2017","journal-title":"Bioinformatics"},{"key":"2023062408143558500_btaa1081-B50","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023062408143558500_btaa1081-B51","doi-asserted-by":"crossref","first-page":"561","DOI":"10.1038\/s41587-019-0074-6","article-title":"An open resource for accurately benchmarking small variant and reference calls","volume":"37","author":"Zook","year":"2019","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa1081\/35888806\/btaa1081.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/24\/5582\/50693045\/btaa1081.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/24\/5582\/50693045\/btaa1081.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T19:46:14Z","timestamp":1687635974000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/24\/5582\/6064144"}},"subtitle":[],"editor":[{"given":"Peter","family":"Robinson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","ro
le":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2020,12,15]]},"references-count":51,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2021,4,5]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa1081","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.02.10.942086","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,12,15]]},"published":{"date-parts":[[2020,12,15]]}}}