{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T02:03:03Z","timestamp":1760148183900,"version":"build-2065373602"},"reference-count":41,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T00:00:00Z","timestamp":1680739200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea (NRF)","doi-asserted-by":"publisher","award":["2022R1F1A1062953"],"award-info":[{"award-number":["2022R1F1A1062953"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>With the development of various information and communication technologies, the amount of big data has increased, and distributed file systems have emerged to store them stably. The replication technique divides the original data into blocks and writes them on multiple servers for redundancy and fault tolerance. However, there is a symmetrical space efficiency problem that arises from the need to store blocks larger than the original data. When storing data, the Erasure Coding (EC) technique generates parity blocks through encoding calculations and writes them separately on each server for fault tolerance and data recovery purposes. Even if a specific server fails, original data can still be recovered through decoding calculations using the parity blocks stored on the remaining servers. However, matrices generated during encoding and decoding are redundantly generated during data writing and recovery, which leads to unnecessary overhead in distributed file systems. 
This paper proposes a cache-based matrix technique that uploads the matrices generated during encoding and decoding to cache memory and reuses them, rather than generating new matrices each time encoding or decoding occurs. The design of the cache memory applies the Weighting Size and Cost Replacement Policy (WSCRP) algorithm to efficiently upload and reuse matrices to cache memory using parameters known as weights and costs. Furthermore, the cache memory table can be managed efficiently because the weight\u2013cost model sorts and updates matrices using specific parameters, which reduces replacement cost. The experiment utilized the Hadoop Distributed File System (HDFS) as the distributed file system, and the EC volume was composed of Reed\u2013Solomon code with parameters (6, 3). As a result of the experiment, it was possible to reduce the write, read, and recovery times associated with encoding and decoding. In particular, for up to three node failures, systems using WSCRP were able to reduce recovery time by about 30 s compared to regular HDFS systems.<\/jats:p>","DOI":"10.3390\/sym15040872","type":"journal-article","created":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T03:59:55Z","timestamp":1680753595000},"page":"872","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Cache-Based Matrix Technology for Efficient Write and Recovery in Erasure Coding Distributed File Systems"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9957-0489","authenticated-orcid":false,"given":"Dong-Jin","family":"Shin","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Anyang University, Anyang-si 14028, Republic of Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0125-1907","authenticated-orcid":false,"given":"Jeong-Joon","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Software, Anyang University, Anyang-si 
14028, Republic of Korea"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sigov, A., Ratkin, L., Ivanov, L.A., and Xu, L.D. (2022). Emerging enabling technologies for industry 4.0 and beyond. Inf. Syst. Front., 1\u201311.","DOI":"10.1007\/s10796-021-10213-w"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3465405","article-title":"Survey of Distributed File System Design Choices","volume":"18","author":"Macko","year":"2022","journal-title":"ACM Trans. Storage"},{"key":"ref_3","unstructured":"Karun, A.K., and Chitharanjan, K. (2013, January 11\u201312). A review on hadoop\u2014HDFS infrastructure extensions. Proceedings of the 2013 IEEE Conference on Information & Communication Technologies, Thuckalay, India."},{"key":"ref_4","first-page":"9664","article-title":"Research on Improving disk throughput in EC-based distributed file system","volume":"58","author":"Shin","year":"2021","journal-title":"Psychology"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"4638","DOI":"10.1007\/s11227-018-2663-4","article-title":"Cost analysis of erasure coding for exa-scale storage","volume":"75","author":"Kim","year":"2018","journal-title":"J. Supercomput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"100301","DOI":"10.1007\/s11432-018-9482-6","article-title":"Erasure coding for distributed storage: An overview","volume":"61","author":"Balaji","year":"2018","journal-title":"Sci. China Inf. Sci."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"27010","DOI":"10.1109\/ACCESS.2018.2829142","article-title":"An improved web cache replacement algorithm based on weighting and cost","volume":"6","author":"Ma","year":"2018","journal-title":"IEEE Access"},{"key":"ref_8","first-page":"93","article-title":"A replacement algorithm based on weighting and ranking cache objects","volume":"2","author":"Samiee","year":"2009","journal-title":"Int. 
J. Hybrid Inf. Technol."},{"key":"ref_9","first-page":"304","article-title":"Compare cost and performance of replication and erasure coding","volume":"63","author":"Cook","year":"2014","journal-title":"Hitachi Rev."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2259","DOI":"10.1109\/TC.2013.23","article-title":"Efficient encoding schedules for XOR-based erasure codes","volume":"63","author":"Luo","year":"2013","journal-title":"IEEE Trans. Comput."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1002\/(SICI)1097-024X(199709)27:9<995::AID-SPE111>3.0.CO;2-6","article-title":"A tutorial on Reed\u2013Solomon coding for fault-tolerance in RAID-like systems","volume":"27","author":"Plank","year":"1997","journal-title":"Softw. Pract. Exp."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1177\/1094342009106191","article-title":"The raid-6 liber8tion code","volume":"23","author":"Plank","year":"2009","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"ref_13","unstructured":"Hafner, J.L. (2005, January 13\u201316). WEAVER Codes: Highly Fault Tolerant Erasure Codes for Storage Systems. Proceedings of the FAST\u201905: 4th USENIX Conference on File and Storage Technologies, San Francisco, CA, USA."},{"key":"ref_14","unstructured":"(2023, January 15). Introduction to HDFS Erasure Coding in Apache Hadoop. Available online: https:\/\/blog.cloudera.com\/introduction-to-hdfs-erasure-coding-in-apache-hadoop\/."},{"key":"ref_15","first-page":"44","article-title":"Erasure codes for storage systems: A brief primer","volume":"38","author":"Plank","year":"2013","journal-title":"Login"},{"key":"ref_16","unstructured":"Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., and Yekhanin, S. (2012, January 13\u201315). Erasure coding in windows azure storage. 
Proceedings of the USENIX ATC\u201912: The 2012 USENIX Conference on Annual Technical Conference, Boston, MA, USA."},{"key":"ref_17","unstructured":"Rashmi, K.V., Shah, N.B., Gu, D., Kuang, H., Borthakur, D., and Ramchandran, K. (2013, January 27\u201328). A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, San Jose, CA, USA."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3021","DOI":"10.1109\/TIT.2013.2241819","article-title":"Repair optimal erasure codes through hadamard designs","volume":"59","author":"Papailiopoulos","year":"2013","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Chen, B., Ammula, A.K., and Curtmola, R. (2015, January 2\u20134). Towards server-side repair for erasure coding-based distributed storage systems. Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, New York, NY, USA.","DOI":"10.1145\/2699026.2699122"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Li, J., and Li, B. (2016, January 20\u201321). Zebra: Demand-aware erasure coding for distributed storage systems. Proceedings of the IEEE\/ACM 24th International Symposium on Quality of Service (IWQoS), Beijing, China.","DOI":"10.1109\/IWQoS.2016.7590388"},{"key":"ref_21","first-page":"1861","article-title":"Efficient techniques of parallel recovery for erasure-coding-based distributed file systems","volume":"101","author":"Kim","year":"2019","journal-title":"Comput. J."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Bashyam, K.R. (2021, January 5\u20139). Repair Pipelining for Clay-Coded Storage. 
Proceedings of the 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS), Bangalore, India.","DOI":"10.1109\/COMSNETS51098.2021.9352864"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"100662","DOI":"10.1016\/j.softx.2021.100662","article-title":"Founsure 1.0: An erasure code library with efficient repair and update features","volume":"13","author":"Arslan","year":"2021","journal-title":"SoftwareX"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Uezato, Y. (2021, January 14\u201319). Accelerating XOR-based erasure coding using program optimization techniques. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, New York, NY, USA.","DOI":"10.1145\/3458817.3476204"},{"key":"ref_25","unstructured":"Muntz, D., and Honeyman, P. (1991, January 16). Multi-level Caching in Distributed File Systems. Proceedings of the Winter USENIX Conference, San Francisco, CA, USA."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Zhang, J., Wu, G., Hu, X., and Wu, X. (2012, January 20\u201323). A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services. Proceedings of the ACM\/IEEE 13th International Conference on Grid Computing, Beijing, China.","DOI":"10.1109\/Grid.2012.17"},{"key":"ref_27","unstructured":"Rashmi, K.V., Chowdhury, M., Kosaian, J., Stoica, I., and Ramchandran, K. (2016, January 2\u20134). EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA."},{"key":"ref_28","unstructured":"Anderson, T.E., Canini, M., Kim, J., Kostic, D., Kwon, Y., Peter, S., Reda, W., Schuh, H.N., and Witchel, E. (2020, January 4\u20136). Assise: Performance and Availability via Client-local NVM in a Distributed File System. 
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, Virtual Event."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"3173","DOI":"10.1007\/s10586-021-03317-0","article-title":"Popularity-based full replica caching for erasure-coded distributed storage systems","volume":"24","author":"Ruty","year":"2021","journal-title":"Clust. Comput."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Silberstein, M., Ganesh, L., Wang, Y., Alvisi, L., and Dahlin, M. (July, January 30). Lazy means smart: Reducing repair bandwidth costs in erasure-coded distributed storage. Proceedings of the SYSTOR 2014 International Conference on Systems and Storage, New York, NY, USA.","DOI":"10.1145\/2611354.2611370"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Mitra, S., Panta, R., Ra, M.R., and Bagchi, S. (2016, January 18\u201321). Partial-parallel-repair (PPR) a distributed technique for repairing erasure coded storage. Proceedings of the Eleventh European Conference on Computer Systems, New York, NY, USA.","DOI":"10.1145\/2901318.2901328"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Pei, X., Wang, Y., Ma, X., and Xu, F. (2016, January 10\u201314). T-update: A tree-structured update scheme with top-down transmission in erasure-coded systems. Proceedings of the IEEE INFOCOM 2016\u2014The 35th Annual IEEE International Conference on Computer Communications, San Francisco, CA, USA.","DOI":"10.1109\/INFOCOM.2016.7524347"},{"key":"ref_33","unstructured":"Li, R., Li, X., Lee, P.P., and Huang, Q. (2017, January 12\u201314). Repair Pipelining for Erasure-Coded Storage. Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC\u201917), Santa Clara, CA, USA."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, F., Tang, Y., Xie, Y., and Tang, X. (2019, January 20\u201324). XORInc: Optimizing data repair and update for erasure-coded systems with XOR-based in-network computation. 
Proceedings of the 35th Symposium on Mass Storage Systems and Technologies (MSST), Santa Clara, CA, USA.","DOI":"10.1109\/MSST.2019.00005"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"33","DOI":"10.1016\/j.future.2019.10.033","article-title":"Efficient in-network aggregation mechanism for data block repairing in data centers","volume":"105","author":"Xia","year":"2020","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Qiao, Y., Kong, X., Zhang, M., Zhou, Y., Xu, M., and Bi, J. (2020, January 3). Towards in-network acceleration of erasure coding. Proceedings of the Symposium on SDN Research, San Jose, CA, USA.","DOI":"10.1145\/3373360.3380833"},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Zeng, H., Zhang, C., Wu, C., Yang, G., Li, J., Xue, G., and Guo, M. (2020, January 18\u201321). FAGR: An efficient file-aware graph recovery scheme for erasure coded cloud storage systems. Proceedings of the 2020 IEEE 38th International Conference on Computer Design (ICCD), Hartford, CT, USA.","DOI":"10.1109\/ICCD50377.2020.00033"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"21843","DOI":"10.1109\/ACCESS.2021.3054954","article-title":"An Optimal Tree-Structured Repair Scheme of Multiple Failure Nodes for Distributed Storage Systems","volume":"9","author":"Zhou","year":"2021","journal-title":"IEEE Access"},{"key":"ref_39","unstructured":"Lee, K.H. (2007). Consideration of the Permutations and Combinations Taught in Secondary Schools. [Master\u2019s Thesis, Yonsei University Graduate School of Education]."},{"key":"ref_40","unstructured":"Hafner, J.L., Deenadhayalan, V., Rao, K.K., and Tomlin, J.A. (2005, January 13\u201316). Matrix Methods for Lost Data Reconstruction in Erasure Codes. 
Proceedings of the FAST\u201905: 4th USENIX Conference on File and Storage Technologies, San Francisco, CA, USA."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Kim, J.J. (2021). Erasure-Coding-Based Storage and Recovery for Distributed Exascale Storage Systems. Appl. Sci., 11.","DOI":"10.3390\/app11083298"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/4\/872\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:10:59Z","timestamp":1760123459000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/15\/4\/872"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,6]]},"references-count":41,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2023,4]]}},"alternative-id":["sym15040872"],"URL":"https:\/\/doi.org\/10.3390\/sym15040872","relation":{},"ISSN":["2073-8994"],"issn-type":[{"type":"electronic","value":"2073-8994"}],"subject":[],"published":{"date-parts":[[2023,4,6]]}}}