{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T21:54:57Z","timestamp":1740174897328,"version":"3.37.3"},"reference-count":14,"publisher":"Wiley","license":[{"start":{"date-parts":[[2018,12,2]],"date-time":"2018-12-02T00:00:00Z","timestamp":1543708800000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003708","name":"Korea Institute of Science and Technology Information","doi-asserted-by":"publisher","award":["K-18-L11-C03","K-18-L15-C02-S18","NRF-2018R1A6A1A03025109"],"award-info":[{"award-number":["K-18-L11-C03","K-18-L15-C02-S18","NRF-2018R1A6A1A03025109"]}],"id":[{"id":"10.13039\/501100003708","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003708","name":"Korea Institute of Science and Technology Information","doi-asserted-by":"publisher","award":["K-18-L11-C03","K-18-L15-C02-S18","NRF-2018R1A6A1A03025109"],"award-info":[{"award-number":["K-18-L11-C03","K-18-L15-C02-S18","NRF-2018R1A6A1A03025109"]}],"id":[{"id":"10.13039\/501100003708","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002701","name":"Ministry of Education","doi-asserted-by":"publisher","award":["K-18-L11-C03","K-18-L15-C02-S18","NRF-2018R1A6A1A03025109"],"award-info":[{"award-number":["K-18-L11-C03","K-18-L15-C02-S18","NRF-2018R1A6A1A03025109"]}],"id":[{"id":"10.13039\/501100002701","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Scientific Programming"],"published-print":{"date-parts":[[2018,12,2]]},"abstract":"<jats:p>Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to make them customized to Hadoop, the issue of inefficient I\/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of the I\/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with indexing capability to save a huge amount of I\/O while processing not only selection predicates but also star-join queries that are often used in many analysis tasks.<\/jats:p>","DOI":"10.1155\/2018\/2682085","type":"journal-article","created":{"date-parts":[[2018,12,2]],"date-time":"2018-12-02T18:32:57Z","timestamp":1543775577000},"page":"1-9","source":"Crossref","is-referenced-by-count":1,"title":["Improving I\/O Efficiency in Hadoop-Based Massive Data Analysis Programs"],"prefix":"10.1155","volume":"2018","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6929-0825","authenticated-orcid":true,"given":"Kyong-Ha","family":"Lee","sequence":"first","affiliation":[{"name":"Research Data Hub Center, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Woo Lam","family":"Kang","sequence":"additional","affiliation":[{"name":"School of Computing, KAIST, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3124-2566","authenticated-orcid":true,"given":"Young-Kyoon","family":"Suh","sequence":"additional","affiliation":[{"name":"School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","reference":[{"key":"2","doi-asserted-by":"publisher","DOI":"10.1145\/1327452.1327492"},{"key":"3","doi-asserted-by":"publisher","DOI":"10.1145\/2094114.2094118"},{"key":"5","doi-asserted-by":"publisher","DOI":"10.1145\/1740390.1740400"},{"key":"6","doi-asserted-by":"publisher","DOI":"10.14778\/1687553.1687609"},{"year":"2011","key":"7"},{"key":"8","doi-asserted-by":"publisher","DOI":"10.1145\/1365815.1365816"},{"key":"9","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2733002"},{"key":"14","doi-asserted-by":"publisher","DOI":"10.14778\/2002938.2002943"},{"key":"17","doi-asserted-by":"publisher","DOI":"10.1145\/1132863.1132864"},{"volume-title":"ZLIB compressed data format specification version 3.3","year":"1996","key":"18"},{"key":"20","doi-asserted-by":"publisher","DOI":"10.1145\/276305.276336"},{"key":"21","doi-asserted-by":"publisher","DOI":"10.1145\/356770.356776"},{"volume-title":"The star schema benchmark and augmented fact table indexing","year":"2009","key":"23"}],"container-title":["Scientific Programming"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/sp\/2018\/2682085.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/sp\/2018\/2682085.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/sp\/2018\/2682085.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2018,12,2]],"date-time":"2018-12-02T18:32:59Z","timestamp":1543775579000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/sp\/2018\/2682085\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,12,2]]},"references-count":14,"alternative-id":["2682085","2682085"],"URL":"https:\/\/doi.org\/10.1155\/2018\/2682085","relation":{},"ISSN":["1058-9244","1875-919X"],"issn-type":[{"type":"print","value":"1058-9244"},{"type":"electronic","value":"1875-919X"}],"subject":[],"published":{"date-parts":[[2018,12,2]]}}}