{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:08:39Z","timestamp":1750306119585,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2017,5,9]],"date-time":"2017-05-09T00:00:00Z","timestamp":1494288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Universit\u00e9 Paris-Saclay","award":["IDEX chair"],"award-info":[{"award-number":["IDEX chair"]}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["DBI-1356486"],"award-info":[{"award-number":["DBI-1356486"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2017,5,9]]},"DOI":"10.1145\/3035918.3064048","type":"proceedings-article","created":{"date-parts":[[2017,5,10]],"date-time":"2017-05-10T18:09:00Z","timestamp":1494439740000},"page":"187-202","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Massively Parallel Processing of Whole Genome Sequence Data"],"prefix":"10.1145","author":[{"given":"Abhishek","family":"Roy","sequence":"first","affiliation":[{"name":"University of Massachusetts Amherst, Amherst, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanlei","family":"Diao","sequence":"additional","affiliation":[{"name":"University of Massachusetts Amherst &amp; \u00c9cole Polytechnique, Amherst, MA, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Uday","family":"Evani","sequence":"additional","affiliation":[{"name":"New York Genome Center, New York City, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Avinash","family":"Abhyankar","sequence":"additional","affiliation":[{"name":"New York Genome Center, New York City, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Clinton","family":"Howarth","sequence":"additional","affiliation":[{"name":"New York Genome Center, New York City, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"R\u00e9mi","family":"Le Priol","sequence":"additional","affiliation":[{"name":"\u00c9cole Polytechnique &amp; New York Genome Center, Palaiseau, France"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Toby","family":"Bloom","sequence":"additional","affiliation":[{"name":"New York Genome Center, New York City, NY, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2017,5,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"The galaxy platform for accessible, reproducible and collaborative biomedical analyses. Nucleic Acids Research, 44(W1):W3--W10","author":"Afgan E.","year":"2016","unstructured":"E. Afgan , D. Baker , The galaxy platform for accessible, reproducible and collaborative biomedical analyses. Nucleic Acids Research, 44(W1):W3--W10 , 2016 . E. Afgan, D. Baker, et al. The galaxy platform for accessible, reproducible and collaborative biomedical analyses. Nucleic Acids Research, 44(W1):W3--W10, 2016."},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-13-315"},{"key":"e_1_3_2_1_3_1","volume-title":"Deepblue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Research, 44(W1):W581","author":"Albrecht F.","year":"2016","unstructured":"F. Albrecht Deepblue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Research, 44(W1):W581 , 2016 . F. Albrecht et al. Deepblue epigenomic data server: programmatic data retrieval and analysis of epigenome region sets. Nucleic Acids Research, 44(W1):W581, 2016."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1038\/nmeth0710-495"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1038\/nbt.2514"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature09534"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1038\/ng.806"},{"key":"e_1_3_2_1_8_1","volume-title":"CIDR","author":"Diao Y.","year":"2015","unstructured":"Y. Diao , A. Roy , and T. Bloom . Building highly-optimized, low-latency pipelines for genomic data analysis . In CIDR , 2015 . Y. Diao, A. Roy, and T. Bloom. Building highly-optimized, low-latency pipelines for genomic data analysis. In CIDR, 2015."},{"key":"e_1_3_2_1_9_1","unstructured":"Firecloud by broad institute. https:\/\/software.broadinstitute.org\/firecloud.  Firecloud by broad institute. https:\/\/software.broadinstitute.org\/firecloud."},{"key":"e_1_3_2_1_10_1","unstructured":"Best practice variant detection with gatk. http:\/\/http:\/\/www.broadinstitute.org\/gatk\/.  Best practice variant detection with gatk. http:\/\/http:\/\/www.broadinstitute.org\/gatk\/."},{"key":"e_1_3_2_1_11_1","unstructured":"Hadoop: Open-source implementation of mapreduce. http:\/\/hadoop.apache.org.  Hadoop: Open-source implementation of mapreduce. http:\/\/hadoop.apache.org."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2603980"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2009-10-11-r134"},{"key":"e_1_3_2_1_14_1","volume-title":"Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome biology, 10(3)","author":"Langmead B.","year":"2009","unstructured":"B. Langmead , C. Trapnell , Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome biology, 10(3) , 2009 . R25. B. Langmead, C. Trapnell, et al. Ultrafast and memory-efficient alignment of short dna sequences to the human genome. Genome biology, 10(3), 2009. R25."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2389241.2389246"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp324"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp698"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989323.1989370"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1101\/gr.107524.110"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1038\/gim.2012.116"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bts054"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btt528"},{"key":"e_1_3_2_1_24_1","unstructured":"Custom designed multi-threaded sort\/merge tools for bam files. http:\/\/www.novocraft.com\/products\/novosort\/.  Custom designed multi-threaded sort\/merge tools for bam files. http:\/\/www.novocraft.com\/products\/novosort\/."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1186\/gb-2013-14-7-r80"},{"key":"e_1_3_2_1_26_1","unstructured":"A commercial dbms for scalable scientific data management. http:\/\/www.paradigm4.com\/.  A commercial dbms for scalable scientific data management. http:\/\/www.paradigm4.com\/."},{"key":"e_1_3_2_1_27_1","unstructured":"Parquet: a columnar storage format for the hadoop ecosystem. http:\/\/parquet.incubator.apache.org.  Parquet: a columnar storage format for the hadoop ecosystem. http:\/\/parquet.incubator.apache.org."},{"key":"e_1_3_2_1_28_1","unstructured":"Picard tools: Java-based command-line utilities for manipulating sam files. http:\/\/picard.sourceforge.net\/.  Picard tools: Java-based command-line utilities for manipulating sam files. http:\/\/picard.sourceforge.net\/."},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1996092.1996106"},{"key":"e_1_3_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1038\/ng0506-500"},{"key":"e_1_3_2_1_31_1","unstructured":"Sam: a generic format for storing large nucleotide sequence alignments. http:\/\/samtools.sourceforge.net\/.  Sam: a generic format for storing large nucleotide sequence alignments. http:\/\/samtools.sourceforge.net\/."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btt601"},{"key":"e_1_3_2_1_33_1","volume-title":"An integrative probabilistic model for identification of structural variation in sequence data. Genome Biology, 13(3)","author":"Sindi S. S.","year":"2012","unstructured":"S. S. Sindi , S. Onal , An integrative probabilistic model for identification of structural variation in sequence data. Genome Biology, 13(3) , 2012 . S. S. Sindi, S. Onal, et al. An integrative probabilistic model for identification of structural variation in sequence data. Genome Biology, 13(3), 2012."},{"key":"e_1_3_2_1_34_1","volume-title":"June","author":"Siretskiy A.","year":"2015","unstructured":"A. Siretskiy , T. Sundqvist , A quantitative assessment of the hadoop framework for analyzing massively parallel dna sequencing data. GigaScience, 4(26) , June 2015 . A. Siretskiy, T. Sundqvist, et al. A quantitative assessment of the hadoop framework for analyzing massively parallel dna sequencing data. GigaScience, 4(26), June 2015."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pbio.1002195"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595633"},{"key":"e_1_3_2_1_37_1","volume-title":"NSDI'12","author":"Zaharia M.","year":"2012","unstructured":"M. Zaharia , M. Chowdhury , Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing . NSDI'12 , 2012 . M. Zaharia, M. Chowdhury, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. NSDI'12, 2012."},{"key":"e_1_3_2_1_38_1","volume-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials. Scientific data, 3","author":"Zook J. M.","year":"2016","unstructured":"J. M. Zook , D. Catoe , Extensive sequencing of seven human genomes to characterize benchmark reference materials. Scientific data, 3 , 2016 . J. M. Zook, D. Catoe, et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Scientific data, 3, 2016."}],"event":{"name":"SIGMOD\/PODS'17: International Conference on Management of Data","sponsor":["SIGMOD ACM Special Interest Group on Management of Data"],"location":"Chicago Illinois USA","acronym":"SIGMOD\/PODS'17"},"container-title":["Proceedings of the 2017 ACM International Conference on Management of Data"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3035918.3064048","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3035918.3064048","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3035918.3064048","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T03:36:42Z","timestamp":1750217802000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3035918.3064048"}},"subtitle":["An In-Depth Performance Study"],"short-title":[],"issued":{"date-parts":[[2017,5,9]]},"references-count":37,"alternative-id":["10.1145\/3035918.3064048","10.1145\/3035918"],"URL":"https:\/\/doi.org\/10.1145\/3035918.3064048","relation":{},"subject":[],"published":{"date-parts":[[2017,5,9]]},"assertion":[{"value":"2017-05-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}