{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,24]],"date-time":"2026-04-24T10:01:34Z","timestamp":1777024894613,"version":"3.51.4"},"reference-count":33,"publisher":"Association for Computing Machinery (ACM)","issue":"1","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["SIGOPS Oper. Syst. Rev."],"published-print":{"date-parts":[[2025,8,4]]},"abstract":"<jats:p>Erasure Coding (EC) has recently been integrated and deployed in the Hadoop Distributed File System (HDFS) to provide the same fault tolerance guarantees as replication, but with significantly less storage overhead. When EC is used, data reads typically involve only data chunks. In this paper, we study the effect of data chunk distribution on the performance of reads and data-intensive applications, and present the design and evaluation of an erasure coding aware (EC-aware) block placement that balances the distribution of data chunks across nodes. Experimental results show that EC-aware block placement can reduce the execution time of Sort and WordCount applications by up to 25%.<\/jats:p>","DOI":"10.1145\/3759441.3759451","type":"journal-article","created":{"date-parts":[[2025,8,6]],"date-time":"2025-08-06T14:43:44Z","timestamp":1754491424000},"page":"62-69","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Erasure Coding Aware Block Placement for Data-Intensive Applications"],"prefix":"10.1145","volume":"59","author":[{"given":"Shadi","family":"Ibrahim","sequence":"first","affiliation":[{"name":"Inria, Univ. Rennes, CNRS, IRISA, Rennes, France"}]},{"given":"Jad","family":"Darrous","sequence":"additional","affiliation":[{"name":"Inria, IMT Atlantique, LS2N, Nantes, France"}]}],"member":"320","published-online":{"date-parts":[[2025,8,6]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2025. Apache Flink. 
https:\/\/flink.apache.org."},{"key":"e_1_2_1_2_1","unstructured":"2025. Apache Hadoop. http:\/\/hadoop.apache.org."},{"key":"e_1_2_1_3_1","unstructured":"2025. Apache Spark. https:\/\/spark.apache.org."},{"key":"e_1_2_1_4_1","unstructured":"2025. Ceph Erasure-code. https:\/\/docs.ceph.com\/en\/reef\/rados\/operations\/erasure-code\/."},{"key":"e_1_2_1_5_1","unstructured":"2025. HDFS Erasure Coding. https:\/\/hadoop.apache.org\/docs\/stable\/hadoop-project-dist\/hadoop-hdfs\/HDFSErasureCoding.html."},{"key":"e_1_2_1_6_1","unstructured":"2025. Intel Intelligent Storage Acceleration Library Homepage. https:\/\/software.intel.com\/en-us\/storage\/ISA-L."},{"key":"e_1_2_1_7_1","unstructured":"2025. Powered by Apache Hadoop. https:\/\/cwiki.apache.org\/confluence\/display\/HADOOP2\/PoweredBy#PoweredBy-S."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-04519-1_1"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS57955.2024.00078"},{"key":"e_1_2_1_10_1","volume-title":"ICPP 2019 : 48th International Conference on Parallel Processing. 1. https:\/\/inria.hal.science\/hal-02388835\/document","author":"Darrous Jad","year":"2019","unstructured":"Jad Darrous and Shadi Ibrahim. 2019. [Poster Presentation] Enabling Data Processing under Erasure Coding in the Fog. In ICPP 2019 : 48th International Conference on Parallel Processing. 1. https:\/\/inria.hal.science\/hal-02388835\/document"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503646.3524296"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2019.00026"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2018.00082"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1713072.1713075"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1924943.1924948"},{"key":"e_1_2_1_16_1","volume-title":"Decentralized Storage Despite Massive Correlated Failures. 
In 2nd Symposium on Networked Systems Design & Implementation (NSDI 05)","author":"Haeberlen Andreas","year":"2005","unstructured":"Andreas Haeberlen and Alan Mislove. 2005. Glacier: Highly Durable, Decentralized Storage Despite Massive Correlated Failures. In 2nd Symposium on Networked Systems Design & Implementation (NSDI 05). USENIX Association, Boston, MA. https:\/\/www.usenix.org\/conference\/nsdi-05\/glacier-highly-durable-decentralized-storage-despite-massive-correlated-failures"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3655038.3665951"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 2012 USENIX Conference on Annual Technical Conference (Boston, MA) (USENIX ATC'12)","author":"Huang Cheng","unstructured":"Cheng Huang, Huseyin Simitci, Yikang Xu, Aaron Ogus, Brad Calder, Parikshit Gopalan, Jin Li, and Sergey Yekhanin. 2012. Erasure coding in Windows Azure Storage. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (Boston, MA) (USENIX ATC'12). USENIX Association, USA, 2. https:\/\/www.usenix.org\/conference\/atc12\/technical-sessions\/presentation\/huang"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3620665.3640353"},{"key":"e_1_2_1_20_1","unstructured":"NetApp Inc. 2025. Use Deduplication, Data Compression, and Data Compaction to Increase Storage Efficiency. Technical Report. NetApp Inc. 
https:\/\/docs.netapp.com\/us- 68"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/356989"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.47"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2367589.2367606"},{"key":"e_1_2_1_25_1","first-page":"383","volume-title":"Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation","author":"Muralidhar Subramanian","year":"2014","unstructured":"Subramanian Muralidhar, Wyatt Lloyd, Sabyasachi Roy, Cory Hill, Ernest Lin, Weiwen Liu, Satadru Pan, Shiva Shankar, Viswanath Sivakumar, Linpeng Tang, and Sanjeev Kumar. 2014. f4: Facebook's warm BLOB storage system. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation (Broomfield, CO) (OSDI'14). USENIX Association, USA, 383-398. https:\/\/www.usenix.org\/conference\/osdi14\/technical-sessions\/presentation\/muralidhar"},{"key":"e_1_2_1_26_1","volume-title":"EPJ Web Conf. 245 (2020","author":"Kamil Michal","year":"2020","unstructured":"Peters, Andreas-Joachim, Simon, Michal Kamil, and Sindrilaru, Elvin Alin. 2020. Erasure Coding for production in the EOS Open Storage system. EPJ Web Conf. 245 (2020), 04008. https:\/\/doi.org\/10.1051\/epjconf\/202024504008"},{"key":"e_1_2_1_27_1","first-page":"401","volume-title":"Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation","author":"Rashmi K. V.","year":"2016","unstructured":"K. V. Rashmi, Mosharaf Chowdhury, Jack Kosaian, Ion Stoica, and Kannan Ramchandran. 2016. EC-cache: load-balanced, low-latency cluster caching with online erasure coding. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (Savannah, GA, USA) (OSDI'16). USENIX Association, USA, 401-417. 
https:\/\/www.usenix.org\/conference\/osdi16\/technical-sessions\/presentation\/rashmi"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3708994"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_2_1_30_1","volume-title":"Kubiatowicz","author":"Weatherspoon Hakim","year":"2002","unstructured":"Hakim Weatherspoon and John D. Kubiatowicz. 2002. Erasure Coding Vs. Replication: A Quantitative Comparison. In Peer-to-Peer Systems, Peter Druschel, Frans Kaashoek, and Antony Rowstron (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 328-337. https:\/\/doi.org\/10.1007\/3-540-45748-8_31"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.48786\/edbt.2024.13"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS54860.2022.00065"},{"key":"e_1_2_1_33_1","unstructured":"Zhe Zhang, Amey Deshpande, Xiaosong Ma, Eno Thereska, and Dushyanth Narayanan. 2010. Does erasure coding have a role to play in my data center? Technical Report MSR-TR-2010-52. 
https:\/\/www.microsoft.com\/en-us\/research\/publication\/does-erasure-coding-have-a-role-to-play-in-my-data-center\/"}],"container-title":["ACM SIGOPS Operating Systems Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3759441.3759451","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,7]],"date-time":"2025-08-07T19:50:41Z","timestamp":1754596241000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3759441.3759451"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,4]]},"references-count":33,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,8,4]]}},"alternative-id":["10.1145\/3759441.3759451"],"URL":"https:\/\/doi.org\/10.1145\/3759441.3759451","relation":{},"ISSN":["0163-5980"],"issn-type":[{"value":"0163-5980","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8,4]]},"assertion":[{"value":"2025-08-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}