{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:21:09Z","timestamp":1772907669723,"version":"3.50.1"},"reference-count":10,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1704,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/3.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.<\/jats:p>\n               <jats:p>Availability: Available under the open-source MIT license at http:\/\/sourceforge.net\/projects\/hadoop-bam\/<\/jats:p>\n               <jats:p>Contact: \u00a0matti.niemenmaa@aalto.fi<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary material is available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts054","type":"journal-article","created":{"date-parts":[[2012,2,3]],"date-time":"2012-02-03T01:42:17Z","timestamp":1328233337000},"page":"876-877","source":"Crossref","is-referenced-by-count":113,"title":["Hadoop-BAM: directly manipulating next generation sequencing data in the cloud"],"prefix":"10.1093","volume":"28","author":[{"given":"Matti","family":"Niemenmaa","sequence":"first","affiliation":[{"name":"1 Aalto University, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland and 2CSC\u2014IT Center for Science Ltd., PO Box 405, FI-02101 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Aleksi","family":"Kallio","sequence":"additional","affiliation":[{"name":"1 Aalto University, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland and 2CSC\u2014IT Center for Science Ltd., PO Box 405, FI-02101 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andr\u00e9","family":"Schumacher","sequence":"additional","affiliation":[{"name":"1 Aalto University, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland and 2CSC\u2014IT Center for Science Ltd., PO Box 405, FI-02101 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Petri","family":"Klemel\u00e4","sequence":"additional","affiliation":[{"name":"1 Aalto University, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland and 2CSC\u2014IT Center for Science Ltd., PO Box 405, FI-02101 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eija","family":"Korpelainen","sequence":"additional","affiliation":[{"name":"1 Aalto University, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland and 2CSC\u2014IT Center for Science Ltd., PO Box 405, FI-02101 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Keijo","family":"Heljanko","sequence":"additional","affiliation":[{"name":"1 Aalto University, Department of Information and Computer Science, PO Box 15400, FI-00076 Aalto, Finland and 2CSC\u2014IT Center for Science Ltd., PO Box 405, FI-02101 Espoo, Finland"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2012,2,2]]},"reference":[{"key":"2023012512203014400_B1","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: simplified data processing on large clusters","volume":"51","author":"Dean","year":"2008","journal-title":"Commun. of the ACM (CACM)"},{"key":"2023012512203014400_B2","doi-asserted-by":"crossref","first-page":"507","DOI":"10.1186\/1471-2164-12-507","article-title":"Chipster: user-friendly analysis software for microarray and other high-throughput data","volume":"12","author":"Kallio","year":"2011","journal-title":"BMC Genomics"},{"key":"2023012512203014400_B3","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512203014400_B4","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res."},{"issue":"Suppl. 12","key":"2023012512203014400_B5","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1471-2105-11-S12-S2","article-title":"SeqWare Query Engine: storing and searching sequence data in the cloud","volume":"11","author":"O'Connor","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512203014400_B6","first-page":"1099","article-title":"Pig latin: a not-so-foreign language for data processing","volume-title":"SIGMOD Conference","author":"Olston","year":"2008"},{"key":"2023012512203014400_B7","doi-asserted-by":"crossref","first-page":"2159","DOI":"10.1093\/bioinformatics\/btr325","article-title":"SEAL: a distributed short read mapping and duplicate removal tool","volume":"27","author":"Pireddu","year":"2011","journal-title":"Bioinformatics"},{"issue":"Suppl. 12","key":"2023012512203014400_B8","article-title":"An overview of the Hadoop\/MapReduce\/HBase framework and its current applications in bioinformatics","volume":"11","author":"Taylor","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012512203014400_B9","first-page":"996","article-title":"Hive \u2013 a petabyte scale data warehouse using Hadoop","volume-title":"ICDE","author":"Thusoo","year":"2010"},{"key":"2023012512203014400_B10","volume-title":"Hadoop - the Definitive Guide: MapReduce for the Cloud.","author":"White","year":"2009"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/6\/876\/48880620\/bioinformatics_28_6_876.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/6\/876\/48880620\/bioinformatics_28_6_876.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T15:39:51Z","timestamp":1674661191000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/6\/876\/312774"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,2,2]]},"references-count":10,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2012,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts054","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,3,15]]},"published":{"date-parts":[[2012,2,2]]}}}