{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T20:11:28Z","timestamp":1760472688695,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":46,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,11,26]],"date-time":"2018-11-26T00:00:00Z","timestamp":1543190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"EU H2020","award":["732189"],"award-info":[{"award-number":["732189"]}]},{"name":"SSF","award":["BD15-0006 RIT15-0119,BD15-0006?"],"award-info":[{"award-number":["BD15-0006 RIT15-0119,BD15-0006?"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,11,26]]},"DOI":"10.1145\/3274808.3274811","type":"proceedings-article","created":{"date-parts":[[2019,2,13]],"date-time":"2019-02-13T18:41:21Z","timestamp":1550083281000},"page":"26-39","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Size Matters"],"prefix":"10.1145","author":[{"given":"Salman","family":"Niazi","sequence":"first","affiliation":[{"name":"KTH - Royal Institute of Technology and Logical Clocks AB"}]},{"given":"Mikael","family":"Ronstr\u00f6m","sequence":"additional","affiliation":[{"name":"Oracle AB"}]},{"given":"Seif","family":"Haridi","sequence":"additional","affiliation":[{"name":"KTH - Royal Institute of Technology and Logical Clocks AB"}]},{"given":"Jim","family":"Dowling","sequence":"additional","affiliation":[{"name":"KTH - Royal Institute of Technology and Logical Clocks AB"}]}],"member":"320","published-online":{"date-parts":[[2018,11,26]]},"reference":[{"key":"e_1_3_2_2_1_1","first-page":"7","article-title":"Apache hadoop: the scalability update,\" log","volume":"36","author":"Shvachko K. 
V.","year":"2011","journal-title":"The Magazine of USENIX"},{"volume-title":"OSDI '06","author":"Weil S. A.","key":"e_1_3_2_2_2_1"},{"key":"e_1_3_2_2_3_1","unstructured":"P. Schwan \"Lustre: Building a File System for 1000-node Clusters \" in Proc. of OLS'03 2003.  P. Schwan \"Lustre: Building a File System for 1000-node Clusters \" in Proc. of OLS'03 2003."},{"key":"e_1_3_2_2_4_1","unstructured":"\"Docs - Getting started with GlusterFS - Architecture.\" http:\/\/gluster.readthedocs.org\/en\/latest\/Quick-Start-Guide\/Architecture\/ 2011. {Online; accessed 30-June-2015}.  \"Docs - Getting started with GlusterFS - Architecture.\" http:\/\/gluster.readthedocs.org\/en\/latest\/Quick-Start-Guide\/Architecture\/ 2011. {Online; accessed 30-June-2015}."},{"key":"e_1_3_2_2_5_1","unstructured":"M. Srivas P. Ravindra U. Saradhi A. Pande C. Sanapala L. Renu S. Kavacheri A. Hadke and V. Vellanki \"MapReduce Ready Distributed File System \" 2011. US Patent App. 13\/162 439.  M. Srivas P. Ravindra U. Saradhi A. Pande C. Sanapala L. Renu S. Kavacheri A. Hadke and V. Vellanki \"MapReduce Ready Distributed File System \" 2011. US Patent App. 13\/162 439."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_3_2_2_7_1","unstructured":"Cindy Gross \"Hadoop Likes Big Files.\" https:\/\/blogs.msdn.microsoft.com\/cindygross\/2015\/05\/04\/hadoop-likes-big-files\/. {Online; accessed 30-Jan-2017}.  Cindy Gross \"Hadoop Likes Big Files.\" https:\/\/blogs.msdn.microsoft.com\/cindygross\/2015\/05\/04\/hadoop-likes-big-files\/. {Online; accessed 30-Jan-2017}."},{"volume-title":"USENIX Association","year":"2017","author":"Niazi S.","key":"e_1_3_2_2_8_1"},{"key":"e_1_3_2_2_9_1","unstructured":"Tom White \"The Small Files Problem.\" http:\/\/blog.cloudera.com\/blog\/2009\/02\/the-small-files-problem\/. {Online; accessed 1-March-2017}.  Tom White \"The Small Files Problem.\" http:\/\/blog.cloudera.com\/blog\/2009\/02\/the-small-files-problem\/. 
{Online; accessed 1-March-2017}."},{"key":"e_1_3_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2017.117"},{"key":"e_1_3_2_2_11_1","unstructured":"\"HOPS Software-As-A-Service from SICS'S new datacenter.\" https:\/\/www.swedishict.se\/hops-software-as-a-service-from-sicss-new-datacenter. {Online; accessed 23-May-2016}.  \"HOPS Software-As-A-Service from SICS'S new datacenter.\" https:\/\/www.swedishict.se\/hops-software-as-a-service-from-sicss-new-datacenter. {Online; accessed 23-May-2016}."},{"key":"e_1_3_2_2_12_1","unstructured":"\"Yahoo Research. S2 - Yahoo Statistical Information Regarding Files and Access Pattern to Files in one of Yahoo's Clusters.\" https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=s. {Online; accessed 30-Jan-2017}.  \"Yahoo Research. S2 - Yahoo Statistical Information Regarding Files and Access Pattern to Files in one of Yahoo's Clusters.\" https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=s. {Online; accessed 30-Jan-2017}."},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/1165389.945450"},{"key":"e_1_3_2_2_14_1","unstructured":"A. Foundation \"Apache Hadoop.\" https:\/\/hadoop.apache.org\/. {Online; accessed 30-Aug-2017}.  A. Foundation \"Apache Hadoop.\" https:\/\/hadoop.apache.org\/. {Online; accessed 30-Aug-2017}."},{"key":"e_1_3_2_2_15_1","unstructured":"S. Pook \"Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy.\" http:\/\/events.linuxfoundation.org\/sites\/events\/files\/slides\/Pook-Pilot%20Hadoop%20Towards%202500%20Nodes%20and%20Cluster%20Redundancy.pdf. {Apache Big Data Miami 2017. Online; accessed 28-Sep-2017}.  S. Pook \"Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy.\" http:\/\/events.linuxfoundation.org\/sites\/events\/files\/slides\/Pook-Pilot%20Hadoop%20Towards%202500%20Nodes%20and%20Cluster%20Redundancy.pdf. {Apache Big Data Miami 2017. 
Online; accessed 28-Sep-2017}."},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2972206.2972210"},{"key":"e_1_3_2_2_17_1","unstructured":"M. Asay \"http:\/\/www.techrepublic.com\/article\/why-the-worlds-largest-hadoop-installation-may-soon-become-the-norm \" Tech Republic vol. Sep 2014.  M. Asay \"http:\/\/www.techrepublic.com\/article\/why-the-worlds-largest-hadoop-installation-may-soon-become-the-norm \" Tech Republic vol. Sep 2014."},{"key":"e_1_3_2_2_18_1","unstructured":"K. V. Shvachko \"HDFS Scalability: The limits to growth \" login vol. 35 no. 2 pp. 6--16 2010.  K. V. Shvachko \"HDFS Scalability: The limits to growth \" login vol. 35 no. 2 pp. 6--16 2010."},{"key":"e_1_3_2_2_19_1","unstructured":"\"HDFS Short-Circuit Local Reads.\" https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/ShortCircuitLocalReads.html. {Online; accessed 30-March-2017}.  \"HDFS Short-Circuit Local Reads.\" https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/ShortCircuitLocalReads.html. {Online; accessed 30-March-2017}."},{"volume-title":"OSDI'04","author":"Dean J.","key":"e_1_3_2_2_20_1"},{"volume-title":"USENIX Association","year":"2010","author":"Zaharia M.","key":"e_1_3_2_2_21_1"},{"key":"e_1_3_2_2_22_1","unstructured":"A. Kagawa \"Hadoop Summit 2014 Amsterdam. Hadoop Operations Powered By ... Hadoop.\" https:\/\/www.youtube.com\/watch?v=XZWwwc-qeJo. {Online; accessed 30-Aug-2015}.  A. Kagawa \"Hadoop Summit 2014 Amsterdam. Hadoop Operations Powered By ... Hadoop.\" https:\/\/www.youtube.com\/watch?v=XZWwwc-qeJo. {Online; accessed 30-Aug-2015}."},{"key":"e_1_3_2_2_23_1","unstructured":"\"Hadoop Archives Guide.\" https:\/\/hadoop.apache.org\/docs\/r1.2.1\/hadooparchives.html. {Online; accessed 30-Jan-2017}.  \"Hadoop Archives Guide.\" https:\/\/hadoop.apache.org\/docs\/r1.2.1\/hadooparchives.html. 
{Online; accessed 30-Jan-2017}."},{"volume-title":"Incorporated","year":"2011","author":"George L.","key":"e_1_3_2_2_24_1"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1773912.1773922"},{"key":"e_1_3_2_2_26_1","unstructured":"A. Agarwal \"Heterogeneous Storages in HDFS.\" https:\/\/hortonworks.com\/blog\/heterogeneous-storages-hdfs\/ 2014. {Online; accessed 26-February-2018}.  A. Agarwal \"Heterogeneous Storages in HDFS.\" https:\/\/hortonworks.com\/blog\/heterogeneous-storages-hdfs\/ 2014. {Online; accessed 26-February-2018}."},{"key":"e_1_3_2_2_27_1","unstructured":"B. Leenders \"Heterogeneous storage in hopsfs.\" Masters thesis at KTH (TRITA-ICT-EX 2016: 123) 2016.  B. Leenders \"Heterogeneous storage in hopsfs.\" Masters thesis at KTH (TRITA-ICT-EX 2016:123) 2016."},{"key":"e_1_3_2_2_28_1","first-page":"1108","article-title":"Recovery Principles of MySQL Cluster 5.1","volume":"05","author":"Ronstr\u00f6m M.","year":"2005","journal-title":"Proc. of VLDB"},{"key":"e_1_3_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.2012.37"},{"volume-title":"MySQL Press","year":"2006","author":"Davies A.","key":"e_1_3_2_2_30_1"},{"key":"e_1_3_2_2_31_1","unstructured":"\"Flexible IO Tester.\" https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=s. {Online; accessed 30-Jan-2017}.  \"Flexible IO Tester.\" https:\/\/webscope.sandbox.yahoo.com\/catalog.php?datatype=s. {Online; accessed 30-Jan-2017}."},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2536222.2536234"},{"key":"e_1_3_2_2_33_1","unstructured":"Arpit Agarwal \"Scaling the HDFS NameNode.\" https:\/\/community.hortonworks.com\/articles\/43838\/scaling-the-hdfs-namenode-part-1.html. {Online; accessed 30-Jan-2017}.  Arpit Agarwal \"Scaling the HDFS NameNode.\" https:\/\/community.hortonworks.com\/articles\/43838\/scaling-the-hdfs-namenode-part-1.html. 
{Online; accessed 30-Jan-2017}."},{"key":"e_1_3_2_2_34_1","first-page":"6","article-title":"HDFS Scalability: The Limits to Growth,\" log","volume":"35","author":"Shvachko K. V.","year":"2010","journal-title":"The Magazine of USENIX"},{"key":"e_1_3_2_2_35_1","unstructured":"I. Krasin T. Duerig N. Alldrin V. Ferrari S. Abu-El-Haija A. Kuznetsova H. Rom J. Uijlings S. Popov A. Veit S. Belongie V. Gomes A. Gupta C. Sun G. Chechik D. Cai Z. Feng D. Narayanan and K. Murphy \"Openimages: A public dataset for large-scale multi-label and multi-class image classification. \" Dataset available from https:\/\/github.com\/openimages 2017.  I. Krasin T. Duerig N. Alldrin V. Ferrari S. Abu-El-Haija A. Kuznetsova H. Rom J. Uijlings S. Popov A. Veit S. Belongie V. Gomes A. Gupta C. Sun G. Chechik D. Cai Z. Feng D. Narayanan and K. Murphy \"Openimages: A public dataset for large-scale multi-label and multi-class image classification. \" Dataset available from https:\/\/github.com\/openimages 2017."},{"key":"e_1_3_2_2_36_1","first-page":"02677","article-title":"Accurate, large minibatch SGD: training imagenet in 1 hour","volume":"1706","author":"Goyal P.","year":"2017","journal-title":"CoRR"},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213947"},{"key":"e_1_3_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213836.2213862"},{"key":"e_1_3_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3033273"},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2014.2377720"},{"volume-title":"USENIX","year":"2013","author":"Ren K.","key":"e_1_3_2_2_41_1"},{"key":"e_1_3_2_2_42_1","unstructured":"\"LevelDB.\" http:\/\/leveldb.org\/. {Online; accessed 1-January-2016}.  \"LevelDB.\" http:\/\/leveldb.org\/. {Online; accessed 1-January-2016}."},{"key":"e_1_3_2_2_43_1","unstructured":"J. Hendricks R. R. Sambasivan S. Sinnamohideen and G. R. Ganger \"Improving small file performance in object-based storage \" 2006.  J. Hendricks R. R. 
Sambasivan S. Sinnamohideen and G. R. Ganger \"Improving small file performance in object-based storage \" 2006."},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"crossref","unstructured":"X. Liu J. Han Y. Zhong C. Han and X. He \"Implementing webgis on hadoop: A case study of improving small file i\/o performance on hdfs \" in 2009 IEEE International Conference on Cluster Computing and Workshops pp. 1--8 Aug 2009.  X. Liu J. Han Y. Zhong C. Han and X. He \"Implementing webgis on hadoop: A case study of improving small file i\/o performance on hdfs \" in 2009 IEEE International Conference on Cluster Computing and Workshops pp. 1--8 Aug 2009.","DOI":"10.1109\/CLUSTR.2009.5289196"},{"key":"e_1_3_2_2_45_1","unstructured":"L. E. Li E. Chen J. Hermann P. Zhang and L. Wang \"Scaling machine learning as a service \" in Proceedings of The 3rd International Conference on Predictive Applications and APIs (C. Hardgrove L. Dorard K. Thompson and F. Douetteau eds.) vol. 67 of Proceedings of Machine Learning Research (Microsoft NERD Boston USA) pp. 14--29 PMLR 11--12 Oct 2017.  L. E. Li E. Chen J. Hermann P. Zhang and L. Wang \"Scaling machine learning as a service \" in Proceedings of The 3rd International Conference on Predictive Applications and APIs (C. Hardgrove L. Dorard K. Thompson and F. Douetteau eds.) vol. 67 of Proceedings of Machine Learning Research (Microsoft NERD Boston USA) pp. 14--29 PMLR 11--12 Oct 2017."},{"key":"e_1_3_2_2_46_1","unstructured":"S. Dong M. Callaghan L. Galanis D. Borthakur T. Savor and M. Strum \"Optimizing space amplification in rocksdb. \" in CIDR 2017.  S. Dong M. Callaghan L. Galanis D. Borthakur T. Savor and M. Strum \"Optimizing space amplification in rocksdb. 
\" in CIDR 2017."}],"event":{"name":"Middleware '18: 19th International Middleware Conference","sponsor":["ACM Association for Computing Machinery","USENIX Assoc USENIX Assoc","IFIP International Federation for Information Processing"],"location":"Rennes France","acronym":"Middleware '18"},"container-title":["Proceedings of the 19th International Middleware Conference"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3274808.3274811","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3274808.3274811","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T00:44:03Z","timestamp":1750207443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3274808.3274811"}},"subtitle":["Improving the Performance of Small Files in Hadoop"],"short-title":[],"issued":{"date-parts":[[2018,11,26]]},"references-count":46,"alternative-id":["10.1145\/3274808.3274811","10.1145\/3274808"],"URL":"https:\/\/doi.org\/10.1145\/3274808.3274811","relation":{},"subject":[],"published":{"date-parts":[[2018,11,26]]},"assertion":[{"value":"2018-11-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}