{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:25:24Z","timestamp":1772166324672,"version":"3.50.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,7,29]],"date-time":"2021-07-29T00:00:00Z","timestamp":1627516800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,7,29]],"date-time":"2021-07-29T00:00:00Z","timestamp":1627516800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001407","name":"Department of Biotechnology , Ministry of Science and Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001407","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BioData Mining"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>GenoVault is a cloud-based repository for handling Next Generation Sequencing (NGS) data. It is developed using OpenStack-based private cloud with various services like keystone for authentication, cinder for block storage, neutron for networking and nova for managing compute instances for the Cloud. GenoVault uses object-based storage, which enables data to be stored as objects instead of files or blocks for faster retrieval from different distributed object nodes. Along with a web-based interface, a JavaFX-based desktop client has also been developed to meet the requirements of large file uploads that are usually seen in NGS datasets. Users can store files in their respective object-based storage areas and the metadata provided by the user during file uploads is used for querying the database. 
GenoVault repository is designed taking into account future needs and hence can scale both vertically and horizontally using OpenStack-based cloud features. Users have an option to make the data shareable to the public or restrict the access as private. Data security is ensured as every container is a separate entity in object-based storage architecture which is also supported by Secure File Transfer Protocol (SFTP) for data upload and download. The data is uploaded by the user in individual containers that include raw read files (fastq), processed alignment files (bam, sam, bed) and the output of variation detection (vcf). GenoVault architecture allows verification of the data in terms of integrity and authentication before making it available to collaborators as per the user\u2019s permissions. GenoVault is useful for maintaining the organization-wide NGS data generated in various labs which is not yet published and submitted to public repositories like NCBI. GenoVault also provides support to share NGS data among the collaborating institutions. 
GenoVault can thus manage vast volumes of NGS data on any OpenStack-based private cloud.<\/jats:p>","DOI":"10.1186\/s13040-021-00268-5","type":"journal-article","created":{"date-parts":[[2021,7,29]],"date-time":"2021-07-29T07:02:42Z","timestamp":1627542162000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["GenoVault: a cloud based genomics repository"],"prefix":"10.1186","volume":"14","author":[{"given":"Sankalp","family":"Jain","sequence":"first","affiliation":[]},{"given":"Amit","family":"Saxena","sequence":"additional","affiliation":[]},{"given":"Suprit","family":"Hesarur","sequence":"additional","affiliation":[]},{"given":"Kirti","family":"Bhadhadhara","sequence":"additional","affiliation":[]},{"given":"Neeraj","family":"Bharti","sequence":"additional","affiliation":[]},{"given":"Sunitha Manjari","family":"Kasibhatla","sequence":"additional","affiliation":[]},{"given":"Uddhavesh","family":"Sonavane","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1299-0091","authenticated-orcid":false,"given":"Rajendra","family":"Joshi","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,7,29]]},"reference":[{"issue":"1","key":"268_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/1471-2164-13-341","volume":"13","author":"MA Quail","year":"2012","unstructured":"Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics. 2012; 13(1):1\u201313.","journal-title":"BMC Genomics"},{"key":"268_CR2","doi-asserted-by":"publisher","first-page":"40","DOI":"10.4103\/2153-3539.103013","volume":"3","author":"RR Gullapalli","year":"2012","unstructured":"Gullapalli RR, Desai KV, Santana-Santos L, Kant JA, Becich MJ. 
Next generation sequencing in clinical medicine: Challenges and lessons for pathology and biomedical informatics. J Pathol Inform. 2012; 3:40.","journal-title":"J Pathol Inform"},{"key":"268_CR3","doi-asserted-by":"publisher","first-page":"e910","DOI":"10.14806\/ej.24.0.910","volume":"24","author":"L Papageorgiou","year":"2018","unstructured":"Papageorgiou L, Eleni P, Raftopoulou S, Mantaiou M, Megalooikonomou V, Vlachakis D. Genomic big data hitting the storage bottleneck. EMBnet J. 2018; 24:e910.","journal-title":"EMBnet J"},{"key":"268_CR4","first-page":"134023","volume":"2014","author":"I Merelli","year":"2014","unstructured":"Merelli I, P\u00e9rez-S\u00e1nchez H, Gesing S, D\u2019Agostino D. Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives. BioMed Res Int. 2014; 2014:134023.","journal-title":"BioMed Res Int"},{"key":"268_CR5","unstructured":"Vieira M, Costa AC, Madeira H. Timely ACID Transactions in DBMS. In: Supplemental Volume of the 2004 International Conference on Dependable Systems and Networks. IEEE Computer Society Press: 2004. p. 102\u20133."},{"key":"268_CR6","doi-asserted-by":"crossref","unstructured":"Stonebraker M, Madden S, Abadi DJ, Harizopoulos S, Hachem N, Helland P. The end of an architectural era: It\u2019s time for a complete rewrite. In: Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker: 2018. p. 463\u201389.","DOI":"10.1145\/3226595.3226637"},{"issue":"3","key":"268_CR7","doi-asserted-by":"publisher","first-page":"161","DOI":"10.1007\/s13222-012-0098-2","volume":"12","author":"S Wandelt","year":"2012","unstructured":"Wandelt S, Rheinl\u00e4nder A, Bux M, Thalheim L, Haldemann B, Leser U. Data management challenges in next generation sequencing. Datenbank-Spektrum. 
2012; 12(3):161\u201371.","journal-title":"Datenbank-Spektrum"},{"issue":"3","key":"268_CR8","first-page":"38","volume":"55","author":"O Sefraoui","year":"2012","unstructured":"Sefraoui O, Aissaoui M, Eleuldj M. OpenStack: toward an open-source solution for cloud computing. Int J Comput Appl. 2012; 55(3):38\u201342.","journal-title":"Int J Comput Appl"},{"issue":"2","key":"268_CR9","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1080\/21553769.2016.1178180","volume":"9","author":"R Tripathi","year":"2016","unstructured":"Tripathi R, Sharma P, Chakraborty P, Varadwaj PK. Next-generation sequencing revolution through big data analytics. Front Life Sci. 2016; 9(2):119\u201349.","journal-title":"Front Life Sci"},{"issue":"9","key":"268_CR10","doi-asserted-by":"publisher","first-page":"647","DOI":"10.1038\/nrg2857","volume":"11","author":"EE Schadt","year":"2010","unstructured":"Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP. Computational solutions to large-scale data management and analysis. Nat Rev Genet. 2010; 11(9):647\u201357.","journal-title":"Nat Rev Genet"},{"issue":"10","key":"268_CR11","doi-asserted-by":"publisher","first-page":"1932","DOI":"10.1016\/j.bbadis.2014.06.015","volume":"1842","author":"H Buermans","year":"2014","unstructured":"Buermans H, Den Dunnen J. Next generation sequencing technology: advances and applications. Biochim Biophys Acta (BBA) - Mol Basis Dis. 2014; 1842(10):1932\u201341.","journal-title":"Biochim Biophys Acta (BBA) - Mol Basis Dis"},{"key":"268_CR12","unstructured":"National Center for Biotechnology Information (NCBI)[Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 1988. [cited 2021 Jul 14]. Available from: https:\/\/www.ncbi.nlm.nih.gov\/. 
Accessed 14 July 2021."},{"issue":"W1","key":"268_CR13","doi-asserted-by":"publisher","first-page":"636","DOI":"10.1093\/nar\/gkz268","volume":"47","author":"F Madeira","year":"2019","unstructured":"Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey AR, Potter SC, Finn RD, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res. 2019; 47(W1):636\u201341.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"268_CR14","doi-asserted-by":"publisher","first-page":"D71","DOI":"10.1093\/nar\/gkaa982","volume":"49","author":"A Fukuda","year":"2021","unstructured":"Fukuda A, Kodama Y, Mashima J, Fujisawa T, Ogasawara O. DDBJ update: streamlining submission and access of human data. Nucleic Acids Res. 2021; 49(D1):D71\u2013D75.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"268_CR15","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1093\/nar\/gkv1323","volume":"44","author":"G Cochrane","year":"2016","unstructured":"Cochrane G, Karsch-Mizrachi I, Takagi T, International Nucleotide Sequence Database Collaboration. The international nucleotide sequence database collaboration. Nucleic Acids Res. 2016; 44(D1):48\u201350.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"268_CR16","doi-asserted-by":"publisher","first-page":"D92","DOI":"10.1093\/nar\/gkaa1023","volume":"49","author":"EW Sayers","year":"2021","unstructured":"Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I. GenBank. Nucleic Acids Res. 2021; 49(D1):D92\u2013D96.","journal-title":"Nucleic Acids Res"},{"issue":"W1","key":"268_CR17","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1093\/nar\/gkv279","volume":"43","author":"W Li","year":"2015","unstructured":"Li W, Cowley A, Uludag M, Gur T, McWilliam H, Squizzato S, Park YM, Buso N, Lopez R. The EMBL-EBI bioinformatics web and programmatic tools framework. Nucleic Acids Res. 
2015; 43(W1):580\u20134.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"268_CR18","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1093\/nar\/30.1.27","volume":"30","author":"Y Tateno","year":"2002","unstructured":"Tateno Y, Imanishi T, Miyazaki S, Fukami-Kobayashi K, Saitou N, Sugawara H, Gojobori T. DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res. 2002; 30(1):27\u201330.","journal-title":"Nucleic Acids Res"},{"issue":"D1","key":"268_CR19","doi-asserted-by":"publisher","first-page":"D29","DOI":"10.1093\/nar\/gkaa1077","volume":"49","author":"G Cantelli","year":"2021","unstructured":"Cantelli G, Cochrane G, Brooksbank C, McDonagh E, Flicek P, McEntyre J, Birney E, Apweiler R. The European Bioinformatics Institute: empowering cooperation in response to a global health crisis. Nucleic Acids Res. 2021; 49(D1):D29\u2013D37.","journal-title":"Nucleic Acids Res"},{"key":"268_CR20","unstructured":"Smith K. A Brief History of NCBI\u2019s Formation and Growth, 2nd edition. Bethesda (MD): National Center for Biotechnology Information (US); 2013. Available from: https:\/\/www.ncbi.nlm.nih.gov\/books\/NBK148949\/. Accessed 14 July 2021."},{"key":"268_CR21","unstructured":"Quantum ActiveScale. https:\/\/cdn.allbound.com\/iq-ab\/2020\/09\/CS00497A.pdf. Accessed 14 July 2021."},{"key":"268_CR22","unstructured":"Google Genomics. https:\/\/cloud.google.com\/life-sciences. Accessed 14 July 2021."},{"key":"268_CR23","unstructured":"AWS Genomics. https:\/\/aws.amazon.com\/health\/genomics\/. Accessed 14 July 2021."},{"key":"268_CR24","unstructured":"Microsoft Genomics. https:\/\/azure.microsoft.com\/en-in\/services\/genomics\/. Accessed 14 July 2021."},{"key":"268_CR25","unstructured":"DNA Nexus. https:\/\/www.dnanexus.com. Accessed 14 July 2021."},{"key":"268_CR26","unstructured":"SevenBridges. https:\/\/www.sevenbridges.com. Accessed 14 July 2021."},{"key":"268_CR27","unstructured":"DNA Star. https:\/\/www.dnastar.com. 
Accessed 14 July 2021."},{"key":"268_CR28","unstructured":"CLC Genomics Cloud. https:\/\/digitalinsights.qiagen.com\/products-overview\/discovery-insights-portfolio\/enterprise-ngs-solutions\/qiagen-clc-genomics-cloud-engine\/. Accessed 14 July 2021."},{"key":"268_CR29","unstructured":"OpenStack. https:\/\/www.openstack.org. Accessed 14 July 2021."},{"key":"268_CR30","volume-title":"OpenStack for architects","author":"M Solberg","year":"2017","unstructured":"Solberg M, Silverman B. OpenStack for architects. Birmingham: Packt Publishing; 2017."},{"key":"268_CR31","doi-asserted-by":"publisher","first-page":"115","DOI":"10.4236\/ajmb.2013.32016","volume":"3","author":"JC Jimenez-Lopez","year":"2013","unstructured":"Jimenez-Lopez JC, Gachomo EW, Sharma S, Kotchoni SO. Genome sequencing and next-generation sequence data analysis: A comprehensive compilation of bioinformatics tools and databases. Am J Mol Biol. 2013; 3:115\u201330.","journal-title":"Am J Mol Biol"},{"key":"268_CR32","unstructured":"Fast Data Transfer. https:\/\/github.com\/fast-data-transfer\/fdt. Accessed 14 July 2021."},{"key":"268_CR33","unstructured":"Swift. https:\/\/wiki.openstack.org\/wiki\/Swift. Accessed 14 July 2021."},{"key":"268_CR34","volume-title":"OpenStack Swift: Using, Administering, and Developing for Swift Object Storage","author":"J Arnold","year":"2014","unstructured":"Arnold J. OpenStack Swift: Using, Administering, and Developing for Swift Object Storage, 1st ed. Sebastopol: O\u2019Reilly Media; 2014."},{"key":"268_CR35","volume-title":"Mastering OpenStack","author":"O Khedher","year":"2015","unstructured":"Khedher O. Mastering OpenStack. Birmingham: Packt Publishing; 2015."},{"issue":"10","key":"268_CR36","first-page":"39","volume":"102","author":"S Bonthu","year":"2014","unstructured":"Bonthu S, Srilakshmi M, et al. Building an object cloud storage service system using OpenStack Swift. Int J Comput Appl. 
2014; 102(10):39\u201342.","journal-title":"Int J Comput Appl"},{"key":"268_CR37","volume-title":"ICSOC Workshops 2014","author":"M Turowski","year":"2015","unstructured":"Turowski M, Lenk A. Vertical Scaling Capability of OpenStack - Survey of Guest Operating Systems, Hypervisors, and the Cloud Management Platform. In: ICSOC Workshops 2014. Switzerland: Springer International Publishing Springer Nature: 2015."},{"issue":"4","key":"268_CR38","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1007\/s10723-014-9314-7","volume":"12","author":"T Lorido-Botran","year":"2014","unstructured":"Lorido-Botran T, Miguel-Alonso J, Lozano JA. A review of auto-scaling techniques for elastic applications in cloud environments. J Grid Comput. 2014; 12(4):559\u201392.","journal-title":"J Grid Comput"},{"key":"268_CR39","unstructured":"Picard toolkit. Broad Institute, GitHub Repository. 2019. http:\/\/broadinstitute.github.io\/picard\/ Broad Institute Accessed 14 July 2021."},{"key":"268_CR40","unstructured":"FastQValidator toolkit. Center for Statistical Genetics. 2017. https:\/\/genome.sph.umich.edu\/wiki\/FastQValidator. Accessed 14 July 2021."},{"issue":"03","key":"268_CR41","doi-asserted-by":"publisher","first-page":"645","DOI":"10.1109\/TCBB.2013.68","volume":"10","author":"G Gremme","year":"2013","unstructured":"Gremme G, Steinbiss S, Kurtz S. GenomeTools: A comprehensive software library for efficient processing of structured genome annotations. IEEE\/ACM Trans Comput Biol Bioinform. 2013; 10(03):645\u201356. https:\/\/doi.org\/10.1109\/TCBB.2013.68.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinform"},{"key":"268_CR42","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.future.2015.10.015","volume":"58","author":"Y Jararweh","year":"2016","unstructured":"Jararweh Y, Al-Ayyoub M, Benkhelifa E, Vouk M, Rindos A, et al. Software defined cloud: Survey, system and evaluation. Futur Gener Comput Syst. 
2016; 58:56\u201374.","journal-title":"Futur Gener Comput Syst"},{"key":"268_CR43","unstructured":"Apache Software Foundation. Hadoop. https:\/\/hadoop.apache.org. Accessed 14 July 2021."},{"key":"268_CR44","doi-asserted-by":"crossref","unstructured":"Shvachko K, Kuang H, Radia S, Chansler R. The Hadoop Distributed File System. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE: 2010. p. 1\u201310.","DOI":"10.1109\/MSST.2010.5496972"},{"key":"268_CR45","unstructured":"WildFly. https:\/\/www.wildfly.org. Accessed 14 July 2021."},{"issue":"3","key":"268_CR46","doi-asserted-by":"publisher","first-page":"256","DOI":"10.1038\/nbt0308-256b","volume":"26","author":"N Siva","year":"2008","unstructured":"Siva N. 1000 Genomes project. Nat Biotechnol. 2008; 26(3):256.","journal-title":"Nat Biotechnol"},{"issue":"D1","key":"268_CR47","doi-asserted-by":"publisher","first-page":"D941","DOI":"10.1093\/nar\/gkz836","volume":"48","author":"S Fairley","year":"2020","unstructured":"Fairley S, Lowy-Gallego E, Perry E, Flicek P. The International Genome Sample Resource (IGSR) collection of open human genomic variation resources. Nucleic Acids Res. 
2020; 48(D1):D941\u20137.","journal-title":"Nucleic Acids Res"}],"container-title":["BioData Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-021-00268-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13040-021-00268-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-021-00268-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,29]],"date-time":"2021-07-29T07:03:57Z","timestamp":1627542237000},"score":1,"resource":{"primary":{"URL":"https:\/\/biodatamining.biomedcentral.com\/articles\/10.1186\/s13040-021-00268-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7,29]]},"references-count":47,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["268"],"URL":"https:\/\/doi.org\/10.1186\/s13040-021-00268-5","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-105137\/v1","asserted-by":"object"}]},"ISSN":["1756-0381"],"issn-type":[{"value":"1756-0381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,7,29]]},"assertion":[{"value":"7 November 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 July 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"The authors declare that 
they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"36"}}