{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T21:17:14Z","timestamp":1773955034742,"version":"3.50.1"},"reference-count":30,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2019,11,8]],"date-time":"2019-11-08T00:00:00Z","timestamp":1573171200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2019,11,8]],"date-time":"2019-11-08T00:00:00Z","timestamp":1573171200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2019,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n<jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>Use of the Genome Analysis Toolkit (GATK) continues to be the standard practice in genomic variant calling in both research and the clinic. Recently the toolkit has been rapidly evolving. Significant computational performance improvements have been introduced in GATK3.8 through collaboration with Intel in 2017. The first release of GATK4 in early 2018 revealed rewrites in the code base, as the stepping stone toward a Spark implementation. As the software continues to be a moving target for optimal deployment in highly productive environments, we present a detailed analysis of these improvements, to help the community stay abreast with changes in performance.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>We re-evaluated multiple options, such as threading, parallel garbage collection, I\/O options and data-level parallelization. Additionally, we considered the trade-offs of using GATK3.8 and GATK4. We found optimized parameter values that reduce the time of executing the best practices variant calling procedure by 29.3% for GATK3.8 and 16.9% for GATK4. Further speedups can be accomplished by splitting data for parallel analysis, resulting in run time of only a few hours on whole human genome sequenced to the depth of 20X, for both versions of GATK. Nonetheless, GATK4 is already much more cost-effective than GATK3.8. Thanks to significant rewrites of the algorithms, the same analysis can be run largely in a single-threaded fashion, allowing users to process multiple samples on the same CPU.<\/jats:p>\n<\/jats:sec>\n<jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>In time-sensitive situations, when a patient has a critical or rapidly developing condition, it is useful to minimize the time to process a single sample. In such cases we recommend using GATK3.8 by splitting the sample into chunks and computing across multiple nodes. The resultant walltime will be nnn.4 hours at the cost of $41.60 on 4 c5.18xlarge instances of Amazon Cloud. For cost-effectiveness of routine analyses or for large population studies, it is useful to maximize the number of samples processed per unit time. Thus we recommend GATK4, running multiple samples on one node. The total walltime will be \u223c34.1 hours on 40 samples, with 1.18 samples processed per hour at the cost of $2.60 per sample on c5.18xlarge instance of Amazon Cloud.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-019-3169-7","type":"journal-article","created":{"date-parts":[[2019,11,8]],"date-time":"2019-11-08T15:27:31Z","timestamp":1573226851000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":51,"title":["Recommendations for performance optimizations when using GATK3.8 and GATK4"],"prefix":"10.1186","volume":"20","author":[{"given":"Jacob R","family":"Heldenbrand","sequence":"first","affiliation":[]},{"given":"Saurabh","family":"Baheti","sequence":"additional","affiliation":[]},{"given":"Matthew A","family":"Bockol","sequence":"additional","affiliation":[]},{"given":"Travis M","family":"Drucker","sequence":"additional","affiliation":[]},{"given":"Steven N","family":"Hart","sequence":"additional","affiliation":[]},{"given":"Matthew E","family":"Hudson","sequence":"additional","affiliation":[]},{"given":"Ravishankar K","family":"Iyer","sequence":"additional","affiliation":[]},{"given":"Michael T","family":"Kalmbach","sequence":"additional","affiliation":[]},{"given":"Katherine I","family":"Kendig","sequence":"additional","affiliation":[]},{"given":"Eric W","family":"Klee","sequence":"additional","affiliation":[]},{"given":"Nathan R","family":"Mattson","sequence":"additional","affiliation":[]},{"given":"Eric D","family":"Wieben","sequence":"additional","affiliation":[]},{"given":"Mathieu","family":"Wiepert","sequence":"additional","affiliation":[]},{"given":"Derek E","family":"Wildman","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7121-0214","authenticated-orcid":false,"given":"Liudmila S","family":"Mainzer","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,11,8]]},"reference":[{"issue":"1","key":"3169_CR1","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1038\/nrg2626","volume":"11","author":"ML Metzker","year":"2010","unstructured":"Metzker ML. Sequencing technologies - the next generation. Nat Rev Genet. 2010; 11(1):31\u201346. \nhttps:\/\/doi.org\/10.1038\/nrg2626\n\n. Accessed 2017-09-19.","journal-title":"Nat Rev Genet"},{"issue":"6","key":"3169_CR2","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1038\/nrg.2016.49","volume":"17","author":"S Goodwin","year":"2016","unstructured":"Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet. 2016; 17(6):333\u201351. \nhttps:\/\/doi.org\/10.1038\/nrg.2016.49\n\n.","journal-title":"Nat Rev Genet"},{"issue":"1","key":"3169_CR3","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1038\/jhg.2013.114","volume":"59","author":"B Rabbani","year":"2014","unstructured":"Rabbani B, Tekin M, Mahdieh N. The promise of whole-exome sequencing in medical genetics. J Hum Genet. 2014; 59(1):5\u201315. \nhttps:\/\/doi.org\/10.1038\/jhg.2013.114\n\n. Accessed 2017-09-19.","journal-title":"J Hum Genet"},{"issue":"8","key":"3169_CR4","doi-asserted-by":"publisher","first-page":"1946","DOI":"10.1128\/JCM.01082-16","volume":"54","author":"MW Allard","year":"2016","unstructured":"Allard MW. The future of whole-genome sequencing for public health and the clinic. J Clin Microbiol. 2016; 54(8):1946\u20138. \nhttps:\/\/doi.org\/10.1128\/JCM.01082-16\n\n. Accessed 2017-09-19.","journal-title":"J Clin Microbiol"},{"key":"3169_CR5","unstructured":"The Broad Institute. GATK |Best Practices. 2017. \nhttps:\/\/software.broadinstitute.org\/gatk\/best-practices\/\n\n. Accessed 2017-08-12."},{"issue":"9","key":"3169_CR6","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1101\/gr.107524.110","volume":"20","author":"A McKenna","year":"2010","unstructured":"McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010; 20(9):1297\u2013303. \nhttps:\/\/doi.org\/10.1101\/gr.107524.110\n\n.","journal-title":"Genome Res"},{"issue":"5","key":"3169_CR7","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1038\/ng.806","volume":"43","author":"MA DePristo","year":"2011","unstructured":"DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ. A framework for variation discovery and genotyping using next-generation dna sequencing data. Nat Genet. 2011; 43(5):491\u20138. \nhttps:\/\/doi.org\/10.1038\/ng.806\n\n. Accessed 2017-09-19.","journal-title":"Nat Genet"},{"issue":"1110","key":"3169_CR8","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1002\/0471250953.bi1110s43","volume":"11","author":"GA Van der Auwera","year":"2013","unstructured":"Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA. From fastq data to high confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinformatics. 2013; 11(1110):11\u2013101111033. \nhttps:\/\/doi.org\/10.1002\/0471250953.bi1110s43\n\n. Accessed 2017-09-19.","journal-title":"Curr Protoc Bioinformatics"},{"key":"3169_CR9","unstructured":"Illumina. Illumina sequencing platforms. 2018. \nhttps:\/\/www.illumina.com\/systems\/sequencing-platforms.html\n\n. Accessed 17 Jun 2018."},{"issue":"1","key":"3169_CR10","doi-asserted-by":"publisher","first-page":"9058","DOI":"10.1038\/s41598-017-09089-1","volume":"7","author":"N Kathiresan","year":"2017","unstructured":"Kathiresan N, Temanni R, Almabrazi H, Syed N, Jithesh PV, Al-Ali R. Accelerating next generation sequencing data analysis with system level optimizations. Sci Rep. 2017; 7(1):9058.","journal-title":"Sci Rep"},{"key":"3169_CR11","volume-title":"2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"CH Costa","year":"2018","unstructured":"Costa CH, Misale C, Liu F, Silva M, Franke H, Crumley P, D\u2019Amora B. Optimization of genomics analysis pipeline for scalable performance in a cloud environment. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway: IEEE: 2018. p. 1147\u201354."},{"key":"3169_CR12","volume-title":"2018 IEEE 23rd International Conference on Digital Signal Processing (DSP)","author":"S-M Liu","year":"2018","unstructured":"Liu S-M, Lin Z-Y, Ju J-L, Chen S-J. Acceleration of variant discovery tool in gatk. In: 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP). Piscataway: IEEE: 2018. p. 1\u20134."},{"key":"3169_CR13","doi-asserted-by":"publisher","unstructured":"Banerjee SS, Athreya AP, Mainzer LS, Jongeneel CV, Hwu W-M, Kalbarczyk ZT, Iyer RK. Efficient and scalable workflows for genomic analyses. In: Proceedings of the ACM International Workshop on Data-Intensive Distributed Computing: 2016. p. 27\u201336. \nhttps:\/\/doi.org\/10.1145\/2912152.2912156\n\n.","DOI":"10.1145\/2912152.2912156"},{"issue":"16","key":"3169_CR14","doi-asserted-by":"publisher","first-page":"2041","DOI":"10.1093\/bioinformatics\/btt314","volume":"29","author":"C Raczy","year":"2013","unstructured":"Raczy C, Petrovski R, Saunders CT, Chorny I, Kruglyak S, Margulies EH, Chuang H-Y, K\u00e4llberg M, Kumar SA, Liao A, Little KM, Str\u00f6mberg MP, Tanner SW. Isaac: ultra-fast whole-genome secondary analysis on illumina sequencing platforms. Bioinformatics. 2013; 29(16):2041\u20133. \nhttps:\/\/doi.org\/10.1093\/bioinformatics\/btt314\n\n. Accessed 2017-09-19.","journal-title":"Bioinformatics"},{"key":"3169_CR15","doi-asserted-by":"publisher","unstructured":"Freed DN, Aldana R, Weber JA, Edwards JS. The sentieon genomics tools-a fast and accurate solution to variant calling from next-generation sequence data. BioRxiv. 2017:115717. \nhttps:\/\/doi.org\/10.1101\/115717\n\n.","DOI":"10.1101\/115717"},{"key":"3169_CR16","doi-asserted-by":"publisher","unstructured":"Weber JA, Aldana R, Gallagher BD, Edwards JS. Sentieon dna pipeline for variant detection-software-only solution, over 20 \u00d7 faster than gatk 3.3 with identical results. PeerJ PrePrints 4:e1672v2: 2016. \nhttps:\/\/doi.org\/10.7287\/peerj.preprints.1672v2\n\n.","DOI":"10.7287\/peerj.preprints.1672v2"},{"issue":"40","key":"3169_CR17","doi-asserted-by":"publisher","first-page":"8320","DOI":"10.1073\/pnas.1713830114","volume":"114","author":"M Pl\u00fcss","year":"2017","unstructured":"Pl\u00fcss M, Kopps AM, Keller I, Meienberg J, Caspar SM, Dubacher N, Bruggmann R, Vogel M, Matyas G. Need for speed in accurate whole-genome data analysis: Genalice map challenges bwa\/gatk more than pemapper\/pecaller and isaac. Proc Nat Acad Sci. 2017; 114(40):8320\u20132.","journal-title":"Proc Nat Acad Sci"},{"issue":"1","key":"3169_CR18","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1186\/s13073-015-0221-8","volume":"7","author":"NA Miller","year":"2015","unstructured":"Miller NA, Farrow EG, Gibson M, Willig LK, Twist G, Yoo B, Marrs T, Corder S, Krivohlavek L, Walter A, et al. A 26-hour system of highly sensitive whole genome sequencing for emergency management of genetic diseases. Genome Med. 2015; 7(1):100.","journal-title":"Genome Med"},{"key":"3169_CR19","unstructured":"Intel, Broad Institute Announce Breakthrough Genomics Analytics Stack. \nhttps:\/\/www.hpcwire.com\/off-the-wire\/intel-broad-institute-announce-breakthrough-genomics-analytics-stack\/\n\n. Accessed 17 Jun 2018."},{"key":"3169_CR20","unstructured":"Genomic Research by Intel and Broad Institute. \nhttps:\/\/www.intel.com\/content\/www\/us\/en\/healthcare-it\/solutions\/genomics-broad-data.html\n\n. Accessed 17 Jun 2018."},{"key":"3169_CR21","unstructured":"GATK: We\u2019re Officially BFFs with Intel Now. \nhttps:\/\/gatkforums.broadinstitute.org\/gatk\/discussion\/8605\/were-officially-bffs-with-intel-now\n\n. Accessed 17 Jun 2018."},{"key":"3169_CR22","unstructured":"Version Highlights for GATK Version 3.8. \nhttps:\/\/gatkforums.broadinstitute.org\/gatk\/discussion\/10063\/version-highlights-for-gatk-version-3-8\n\n. Accessed 17 Jun 2018."},{"issue":"15","key":"3169_CR23","doi-asserted-by":"publisher","first-page":"2482","DOI":"10.1093\/bioinformatics\/btv179","volume":"31","author":"D Decap","year":"2015","unstructured":"Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade: scalable sequence analysis with mapreduce. Bioinformatics. 2015; 31(15):2482\u20138.","journal-title":"Bioinformatics"},{"key":"3169_CR24","volume-title":"2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"H Mushtaq","year":"2015","unstructured":"Mushtaq H, Al-Ars Z. Cluster-based apache spark implementation of the gatk dna analysis pipeline. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway: IEEE: 2015. p. 1471\u20137."},{"key":"3169_CR25","volume-title":"2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)","author":"L Deng","year":"2016","unstructured":"Deng L, Huang G, Zhuang Y, Wei J, Yan Y. Higene: A high-performance platform for genomic data analysis. In: 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway: IEEE: 2016. p. 576\u201383."},{"key":"3169_CR26","first-page":"2013","volume":"207","author":"M Massie","year":"2013","unstructured":"Massie M, Nothaft F, Hartl C, Kozanitis C, Schumacher A, Joseph AD, Patterson DA. Adam: Genomics formats and processing patterns for cloud scale computing. Univ Cali, Berkeley Tech Rep, No. UCB\/EECS-2013. 2013; 207:2013.","journal-title":"Univ Cali, Berkeley Tech Rep, No. UCB\/EECS-2013"},{"issue":"3","key":"3169_CR27","doi-asserted-by":"publisher","first-page":"246","DOI":"10.1038\/nbt.2835","volume":"32","author":"JM Zook","year":"2014","unstructured":"Zook JM, Chapman B, Wang J, Mittelman D, Hofmann O, Hide W, Salit M. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol. 2014; 32(3):246.","journal-title":"Nat Biotechnol"},{"key":"3169_CR28","doi-asserted-by":"publisher","unstructured":"Zook J, McDaniel J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY, De La Vega FM, Xiao C, Sherry S, Salit M. Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials. bioRxiv. 2018. \nhttps:\/\/doi.org\/10.1101\/281006\n\n.","DOI":"10.1101\/281006"},{"key":"3169_CR29","unstructured":"Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 2013. http:\/\/arxiv.org\/abs\/1303.3997v2."},{"key":"3169_CR30","unstructured":"NOVOCRAFT TECHNOLOGIES SDN BHD. Novocraft. 2014. \nhttp:\/\/www.novocraft.com\/\n\n. Accessed 2017-06-27."}],"updated-by":[{"DOI":"10.1186\/s12859-019-3277-4","type":"correction","label":"Correction","source":"publisher","updated":{"date-parts":[[2019,12,17]],"date-time":"2019-12-17T00:00:00Z","timestamp":1576540800000}}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3169-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-019-3169-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-019-3169-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,11,7]],"date-time":"2020-11-07T00:09:03Z","timestamp":1604707743000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-019-3169-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,11,8]]},"references-count":30,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2019,12]]}},"alternative-id":["3169"],"URL":"https:\/\/doi.org\/10.1186\/s12859-019-3169-7","relation":{"correction":[{"id-type":"doi","id":"10.1186\/s12859-019-3277-4","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,11,8]]},"assertion":[{"value":"25 March 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 October 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 November 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 December 2019","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Correction","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Following publication of the original article [1], the author explained that Table 2 is displayed incorrectly. The correct Table 2 is given below. The original article has been corrected.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not Applicable","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not Applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"557"}}