{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,11]],"date-time":"2025-06-11T15:26:56Z","timestamp":1749655616045,"version":"3.40.3"},"publisher-location":"Cham","reference-count":19,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783030598501"},{"type":"electronic","value":"9783030598518"}],"license":[{"start":{"date-parts":[[2020,1,1]],"date-time":"2020-01-01T00:00:00Z","timestamp":1577836800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,10,20]],"date-time":"2020-10-20T00:00:00Z","timestamp":1603152000000},"content-version":"vor","delay-in-days":293,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020]]},"abstract":"<jats:title>Abstract<\/jats:title>\n<jats:p>Bioinformatics pipelines make extensive use of HPC batch processing. The rapid growth of data volumes and computational complexity, especially for modern applications such as machine learning algorithms, imposes significant challenges to local HPC facilities. Many attempts have been made to burst HPC batch processing into clouds with virtual machines. They all suffer from some common issues, for example: very high overhead, slow to scale up and slow to scale down, and nearly impossible to be cloud-agnostic.<\/jats:p>\n<jats:p>We have successfully deployed and run several pipelines on Kubernetes in OpenStack, Google Cloud Platform and Amazon Web Services. In particular, we use Kubeflow on top of Kubernetes for more sophisticated job scheduling, workflow management, and first class support for machine learning. 
We choose Kubeflow\/Kubernetes to avoid the overhead of provisioning of virtual machines, to achieve rapid scaling with containers, and to be truly cloud-agnostic in all cloud environments.<\/jats:p>\n<jats:p>Kubeflow on Kubernetes also creates some new challenges in deployment, data access, performance monitoring, etc. We will discuss the details of these challenges and provide our solutions. We will demonstrate how our solutions work across all three very different clouds for both classical pipelines and new ones for machine learning.<\/jats:p>","DOI":"10.1007\/978-3-030-59851-8_24","type":"book-chapter","created":{"date-parts":[[2020,10,19]],"date-time":"2020-10-19T18:04:00Z","timestamp":1603130640000},"page":"355-367","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Bioinformatics Application with Kubeflow for Batch Processing in Clouds"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1075-1628","authenticated-orcid":false,"given":"David Yu","family":"Yuan","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4297-4738","authenticated-orcid":false,"given":"Tony","family":"Wildish","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,10,20]]},"reference":[{"key":"24_CR1","unstructured":"Kubeflow.org. \nhttps:\/\/www.kubeflow.org\/docs\/started\/kubeflow-overview\/"},{"key":"24_CR2","unstructured":"Yuan, D.: RSEConUK 2019, University of Birmingham, 17\u201319 September 2019, Case Study of Porting a Bioinformatics Pipeline into Clouds. \nhttps:\/\/sched.co\/QSRc"},{"key":"24_CR3","unstructured":"Kubernetes, Concepts \u2192 Workloads \u2192 Controllers \u2192 Jobs - Run to Completion. \nhttps:\/\/kubernetes.io\/docs\/concepts\/workloads\/controllers\/jobs-run-to-completion\/"},{"key":"24_CR4","unstructured":"Overview of RKE. 
\nhttps:\/\/rancher.com\/docs\/rke\/latest\/en\/"},{"key":"24_CR5","unstructured":"Installing Kubeflow. \nhttps:\/\/www.kubeflow.org\/docs\/started\/getting-started\/"},{"key":"24_CR6","unstructured":"Cloud-agnostic Kubeflow deployment. \nhttps:\/\/raw.githubusercontent.com\/kubeflow\/manifests\/v1.0-branch\/kfdef\/kfctl_istio_dex.v1.0.0.yaml"},{"key":"24_CR7","unstructured":"Authentication with Istio + Dex. \nhttps:\/\/journal.arrikto.com\/kubeflow-authentication-with-istio-dex-5eafdfac4782"},{"key":"24_CR8","unstructured":"Storage volume. \nhttps:\/\/kubernetes.io\/docs\/concepts\/storage\/persistent-volumes\/#access-modes"},{"key":"24_CR9","unstructured":"Onedata. \nhttps:\/\/onedata.org\/#\/home"},{"key":"24_CR10","unstructured":"Two-staged build. \nhttps:\/\/gitlab.ebi.ac.uk\/TSI\/kubeflow\/blob\/master\/pipelines\/1000g\/freebayes\/Dockerfile"},{"key":"24_CR11","unstructured":"Function samtools_op. \nhttps:\/\/gitlab.ebi.ac.uk\/TSI\/kubeflow\/-\/blob\/1.0.1\/pipelines\/1000g\/1000g.py"},{"key":"24_CR12","unstructured":"Elasticsearch. \nhttps:\/\/www.elastic.co\/elasticsearch"},{"key":"24_CR13","unstructured":"Elastic Cloud on Kubernetes. \nhttps:\/\/www.elastic.co\/downloads\/elastic-cloud-kubernetes"},{"key":"24_CR14","unstructured":"Data - 1000 Genomes Project. \nhttps:\/\/www.internationalgenome.org\/data\/"},{"key":"24_CR15","unstructured":"IDR: Image Data Repository. \nhttps:\/\/idr.openmicroscopy.org\/webclient\/?show=project-402"},{"key":"24_CR16","unstructured":"Human Reference Genome, v37. \nftp:\/\/ftp.1000genomes.ebi.ac.uk\/vol1\/ftp\/technical\/reference\/human_g1k_v37.fasta.gz"},{"key":"24_CR17","unstructured":"Kubeflow pipeline APIs. \nhttps:\/\/kubeflow-pipelines.readthedocs.io\/en\/stable\/index.html"},{"key":"24_CR18","unstructured":"Nirschl, J.J., et al.: A deep-learning classifier identifies patients with clinical heart failure using whole-slide images of H&E tissue. 
\nhttps:\/\/www.ncbi.nlm.nih.gov\/pmc\/articles\/PMC5882098\/"},{"key":"24_CR19","unstructured":"OMERO 5.6.0 JSON API. \nhttps:\/\/docs.openmicroscopy.org\/omero\/5.6.0\/developers\/json-api.html"}],"container-title":["Lecture Notes in Computer Science","High Performance Computing"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-030-59851-8_24","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,10,19]],"date-time":"2020-10-19T18:11:49Z","timestamp":1603131109000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/978-3-030-59851-8_24"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020]]},"ISBN":["9783030598501","9783030598518"],"references-count":19,"URL":"https:\/\/doi.org\/10.1007\/978-3-030-59851-8_24","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2020]]},"assertion":[{"value":"20 October 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"ISC High Performance","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Conference on High Performance Computing","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Frankfurt am Main","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Germany","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2020","order":5,"name":"conference_year","label":"Conference 
Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"22 June 2020","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"25 June 2020","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"35","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"supercomputing2020","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/www.isc-hpc.com\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Double-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"Linklings","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"87","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"27","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference 
organizers)"}},{"value":"31% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.73","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"4.33","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"The conference was held virtually due to the COVID-19 pandemic.","order":10,"name":"additional_info_on_review_process","label":"Additional Info on Review Process","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}