{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,20]],"date-time":"2025-11-20T12:51:58Z","timestamp":1763643118178,"version":"3.37.3"},"reference-count":18,"publisher":"Oxford University Press (OUP)","issue":"17","funder":[{"name":"Veterans Affairs Office of Research and Development Cooperative Studies Program"},{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"United States National Institutes of Health","doi-asserted-by":"publisher","award":["U24 HG009397","RM1-HG007735"],"award-info":[{"award-number":["U24 HG009397","RM1-HG007735"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,9,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A major drawback of executing genomic applications on cloud computing facilities is the lack of tools to predict which instance type is the most appropriate, often resulting in an over- or under- matching of resources. Determining the right configuration before actually running the applications will save money and time. 
Here, we introduce Hummingbird, a tool for predicting performance of computing instances with varying memory and CPU on multiple cloud platforms.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Our experiments on three major genomic data pipelines, including GATK HaplotypeCaller, GATK Mutect2 and ENCODE ATAC-seq, showed that Hummingbird was able to address applications in command line specified in JSON format or workflow description language (WDL) format, and accurately predicted the fastest, the cheapest and the most cost-efficient compute instances in an economic manner.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Hummingbird is available as an open source tool at: https:\/\/github.com\/StanfordBioinformatics\/Hummingbird.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab161","type":"journal-article","created":{"date-parts":[[2021,3,4]],"date-time":"2021-03-04T20:12:44Z","timestamp":1614888764000},"page":"2537-2543","source":"Crossref","is-referenced-by-count":4,"title":["Hummingbird: efficient performance prediction for executing genomic applications in the cloud"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4533-9334","authenticated-orcid":false,"given":"Amir","family":"Bahmani","sequence":"first","affiliation":[{"name":"Stanford Healthcare Innovation Lab, Stanford University , Stanford, CA 94304, USA"},{"name":"Stanford Center for Genomics and Personalized Medicine, Stanford University , Stanford, CA 94304, USA"},{"name":"Department of Genetics, Stanford 
University , Stanford, CA 94304, USA"}]},{"given":"Ziye","family":"Xing","sequence":"additional","affiliation":[{"name":"Stanford Center for Genomics and Personalized Medicine, Stanford University , Stanford, CA 94304, USA"},{"name":"Department of Genetics, Stanford University , Stanford, CA 94304, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6475-8019","authenticated-orcid":false,"given":"Vandhana","family":"Krishnan","sequence":"additional","affiliation":[{"name":"Stanford Center for Genomics and Personalized Medicine, Stanford University , Stanford, CA 94304, USA"},{"name":"Department of Genetics, Stanford University , Stanford, CA 94304, USA"}]},{"given":"Utsab","family":"Ray","sequence":"additional","affiliation":[{"name":"Department of Computer Science, North Carolina State University , Raleigh, NC 27606 USA"}]},{"given":"Frank","family":"Mueller","sequence":"additional","affiliation":[{"name":"Department of Computer Science, North Carolina State University , Raleigh, NC 27606 USA"}]},{"given":"Amir","family":"Alavi","sequence":"additional","affiliation":[{"name":"Stanford Healthcare Innovation Lab, Stanford University , Stanford, CA 94304, USA"}]},{"given":"Philip S.","family":"Tsao","sequence":"additional","affiliation":[{"name":"Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto , Palo Alto, CA 94304, USA"}]},{"given":"Michael P.","family":"Snyder","sequence":"additional","affiliation":[{"name":"Stanford Healthcare Innovation Lab, Stanford University , Stanford, CA 94304, USA"},{"name":"Stanford Center for Genomics and Personalized Medicine, Stanford University , Stanford, CA 94304, USA"},{"name":"Department of Genetics, Stanford University , Stanford, CA 94304, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8152-2489","authenticated-orcid":false,"given":"Cuiping","family":"Pan","sequence":"additional","affiliation":[{"name":"Palo Alto Epidemiology Research and Information Center for Genomics, VA Palo Alto , Palo Alto, 
CA 94304, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,3,8]]},"reference":[{"key":"2024041009305921500_btab161-B1","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1038\/s41586-020-2371-0","article-title":"Mapping and characterization of structural variation in 17,795 human genomes","volume":"583","author":"Abel","year":"2020","journal-title":"Nature"},{"year":"2017","author":"Alipourfard","key":"2024041009305921500_btab161-B2"},{"key":"2024041009305921500_btab161-B3","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/sdata.2016.10","article-title":"An open access pilot freely sharing cancer genomic data from participants in Texas","volume":"3","author":"Becnel","year":"2016","journal-title":"Sci. Data"},{"key":"2024041009305921500_btab161-B4","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1038\/nbt.2514","article-title":"Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples","volume":"31","author":"Cibulskis","year":"2013","journal-title":"Nat. Biotechnol"},{"key":"2024041009305921500_btab161-B5","doi-asserted-by":"crossref","first-page":"D794","DOI":"10.1093\/nar\/gkx1081","article-title":"The encyclopedia of DNA elements (encode): data portal update","volume":"46","author":"Davis","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2024041009305921500_btab161-B6","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"Eberle","year":"2017","journal-title":"Genome Res"},{"key":"2024041009305921500_btab161-B7","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1016\/j.jclinepi.2015.09.016","article-title":"Million veteran program: a mega-biobank to study genetic influences on health and disease","volume":"70","author":"Gaziano","year":"2016","journal-title":"J. 
Clin. Epidemiol"},{"year":"2010","author":"Gunarathne","key":"2024041009305921500_btab161-B8"},{"year":"2018","author":"Hsu","key":"2024041009305921500_btab161-B9"},{"year":"2013","author":"Li","key":"2024041009305921500_btab161-B10"},{"key":"2024041009305921500_btab161-B11","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2024041009305921500_btab161-B12","doi-asserted-by":"crossref","first-page":"774","DOI":"10.1016\/j.jbi.2013.07.001","article-title":"\u2018Big data\u2019, Hadoop and cloud computing in genomics","volume":"46","author":"O\u2019Driscoll","year":"2013","journal-title":"J. Biomed. Inf"},{"key":"2024041009305921500_btab161-B13","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1186\/gb-2010-11-5-207","article-title":"The case for cloud computing in genome informatics","volume":"11","author":"Stein","year":"2010","journal-title":"Genome Biol"},{"key":"2024041009305921500_btab161-B14","first-page":"290","article-title":"Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program","volume":"590","issue":"7845","author":"Taliun","year":"2021","journal-title":"Nature"},{"key":"2024041009305921500_btab161-B15","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1002\/0471250953.bi1110s43","article-title":"From FastQ data to high-confidence variant calls: the Genome Analysis Toolkit best practices pipeline","volume":"43","author":"Van der Auwera","year":"2013","journal-title":"Curr. Protoc. 
Bioinf"},{"key":"2024041009305921500_btab161-B16","first-page":"363","volume-title":"Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation (NSDI'16).","author":"Venkataraman","year":"2016"},{"key":"2024041009305921500_btab161-B17","first-page":"1381","article-title":"Full-stack genomics pipelining with gatk4+wdl+cromwell [version 1; not peer reviewed]","volume":"6","author":"Voss","year":"2017","journal-title":"ISCB Commun. J"},{"year":"2017","author":"Yadwadkar","key":"2024041009305921500_btab161-B18"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab161\/39355328\/btab161.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/17\/2537\/57195971\/btab161.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/17\/2537\/57195971\/btab161.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,10]],"date-time":"2024-04-10T09:39:05Z","timestamp":1712741945000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/17\/2537\/6162881"}},"subtitle":[],"editor":[{"given":"Janet","family":"Kelso","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,3,8]]},"references-count":18,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2021,9,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab161","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,9,1]]},"published":
{"date-parts":[[2021,3,8]]}}}