{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:03:25Z","timestamp":1760058205685,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,3,17]],"date-time":"2025-03-17T00:00:00Z","timestamp":1742169600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"the National Science Foundation of China","doi-asserted-by":"publisher","award":["62401258","BK20241380"],"award-info":[{"award-number":["62401258","BK20241380"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Natural Science Foundation of Jiangsu Province","award":["62401258","BK20241380"],"award-info":[{"award-number":["62401258","BK20241380"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>A clustered distributed storage system (DSS), also called a rack-aware storage system, is a distributed storage system in which the nodes are grouped into several clusters. The communication between two clusters may be restricted by their connectivity; that is to say, the communication cost between nodes differs depending on their location. As such, when repairing a failed node, downloading data from nodes that are in the same cluster is much cheaper and more efficient than downloading data from nodes in another cluster. In this article, we consider a scenario in which the failed nodes only download data from nodes in the same cluster, which is an extreme and important case that leverages the fact that the intra-cluster bandwidth is much cheaper than the cross-cluster repair bandwidth. 
Also, we study the problem of repairing multiple failures in this article, which allows for collaboration within the same cluster, i.e., failed nodes in the same cluster can exchange data with each other. We derive the trade-off between the storage and repair bandwidth for the clustered DSSs and provide explicit code constructions achieving two extreme points in the trade-off, namely the minimum storage clustered collaborative repair (MSCCR) point and the minimum bandwidth clustered collaborative repair (MBCCR) point, respectively.<\/jats:p>","DOI":"10.3390\/e27030313","type":"journal-article","created":{"date-parts":[[2025,3,17]],"date-time":"2025-03-17T11:04:22Z","timestamp":1742209462000},"page":"313","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Clustered Distributed Data Storage Repairing Multiple Failures"],"prefix":"10.3390","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5595-3712","authenticated-orcid":false,"given":"Shiqiu","family":"Liu","sequence":"first","affiliation":[{"name":"Pengcheng Laboratory, Shenzhen 518066, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4675-2622","authenticated-orcid":false,"given":"Fangwei","family":"Ye","sequence":"additional","affiliation":[{"name":"College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9521-5084","authenticated-orcid":false,"given":"Qihui","family":"Wu","sequence":"additional","affiliation":[{"name":"Key Laboratory of Dynamic Cognitive System of Electromagnetic Spectrum Space, Ministry of Industry and Information Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,3,17]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1145\/1165389.945450","article-title":"The Google 
File System","volume":"37","author":"Ghemawat","year":"2003","journal-title":"ACM Sigops Oper. Syst. Rev."},{"key":"ref_2","unstructured":"Muralidhar, S., Lloyd, W., Roy, S., Hill, C., Lin, E., Liu, W., Pan, S., Shankar, S., Sivakumar, V., and Tang, L. (2014, January 6\u20138). f4: Facebook\u2019s Warm BLOB Storage System. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), Broomfield, CO, USA."},{"key":"ref_3","unstructured":"Bhagwan, R., Tati, K., Cheng, Y.C., Savage, S., and Voelker, G.M. (2004, January 29\u201331). Total Recall: System Support for Automated Availability Management. Proceedings of the First Symposium on Networked Systems Design and Implementation (NSDI 04), San Francisco, CA, USA."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"4539","DOI":"10.1109\/TIT.2010.2054295","article-title":"Network Coding for Distributed Storage Systems","volume":"56","author":"Dimakis","year":"2010","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kermarrec, A.M., Le Scouarnec, N., and Straub, G. (2011, January 25\u201327). Repairing Multiple Failures with Coordinated and Adaptive Regenerating Codes. Proceedings of the 2011 International Symposium on Networking Coding, Beijing, China.","DOI":"10.1109\/ISNETCOD.2011.5978920"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"268","DOI":"10.1109\/JSAC.2010.100216","article-title":"Cooperative Recovery of Distributed Storage Systems from Multiple Losses with Network Coding","volume":"28","author":"Hu","year":"2010","journal-title":"IEEE J. Sel. Areas Commun."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"7229","DOI":"10.1109\/TIT.2013.2274265","article-title":"Cooperative Regenerating Codes","volume":"59","author":"Shum","year":"2013","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_8","unstructured":"Ahmad, F., Chakradhar, S.T., Raghunathan, A., and Vijaykumar, T.N. 
(2014, January 19\u201320). ShuffleWatcher: Shuffle-aware Scheduling in Multi-tenant MapReduce Clusters. Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, PA, USA."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Benson, T., Akella, A., and Maltz, D.A. (2010, January 1\u201330). Network traffic characteristics of data centers in the wild. Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement, New York, NY, USA.","DOI":"10.1145\/1879141.1879175"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1109\/MM.2010.72","article-title":"Scale-Out Networking in the Data Center","volume":"30","author":"Vahdat","year":"2010","journal-title":"IEEE Micro"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Gast\u00f3n, B., Pujol, J., and Villanueva, M. (2013, January 20\u201322). A Realistic Distributed Storage System That Minimizes Data Storage and Repair Bandwidth. Proceedings of the 2013 Data Compression Conference, Snowbird, UT, USA.","DOI":"10.1109\/DCC.2013.72"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Pernas, J., Yuen, C., Gast\u00f3n, B., and Pujol, J. (2013, January 7\u201312). Non-homogeneous two-rack model for distributed storage systems. Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey.","DOI":"10.1109\/ISIT.2013.6620424"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1109\/TIT.2018.2837860","article-title":"Capacity of Clustered Distributed Storage","volume":"65","author":"Sohn","year":"2019","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Sohn, J., Choi, B., and Moon, J. (2018, January 17\u201322). A Class of MSR Codes for Clustered Distributed Storage. 
Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA.","DOI":"10.1109\/ISIT.2018.8437458"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Hu, Y., Lee, P.P.C., and Zhang, X. (2016, January 10\u201315). Double Regenerating Codes for hierarchical data centers. Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain.","DOI":"10.1109\/ISIT.2016.7541298"},{"key":"ref_16","first-page":"1","article-title":"Optimal repair layering for erasure-coded data centers: From theory to practice","volume":"13","author":"Hu","year":"2020","journal-title":"ACM Trans. Storage (TOS)"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Tebbi, M.A., Chan, T.H., and Sung, C.W. (2014, January 2\u20135). A code design framework for multi-rack distributed storage. Proceedings of the 2014 IEEE Information Theory Workshop (ITW 2014), Hobart, Australia.","DOI":"10.1109\/ITW.2014.6970791"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"4730","DOI":"10.1109\/TIT.2019.2902835","article-title":"Rack-Aware Regenerating Codes for Data Centers","volume":"65","author":"Hou","year":"2019","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Hou, H., and Lee, P.P.C. (2021, January 12\u201320). Generalized Rack-aware Regenerating Codes for Jointly Optimal Node and Rack Repairs. Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Victoria, Australia.","DOI":"10.1109\/ISIT45174.2021.9518219"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yu, B., Jiang, Z., Huang, Z., Song, L., and Hou, H. (2023, January 2\u20134). Product-Matrix Construction of Minimum Storage Rack-aware Regenerating Codes. 
Proceedings of the 2023 International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China.","DOI":"10.1109\/WCSP58612.2023.10404536"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"886","DOI":"10.1109\/TIT.2019.2941744","article-title":"Explicit Constructions of MSR Codes for Clustered Distributed Storage: The Rack-Aware Storage Model","volume":"66","author":"Chen","year":"2020","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Hou, H., Lee, P.P.C., and Han, Y.S. (2020, January 21\u201326). Minimum Storage Rack-Aware Regenerating Codes with Exact Repair and Small Sub-Packetization. Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA.","DOI":"10.1109\/ISIT44484.2020.9174461"},{"key":"ref_23","unstructured":"Jin, L., Luo, G., and Xing, C. (2019). Optimal Repairing Schemes for Reed Solomon Codes with Alphabet Sizes Linear in Lengths under the Rack Aware Model. arXiv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"1613","DOI":"10.1109\/JIOT.2019.2947720","article-title":"An Adaptive Erasure Code for JointCloud Storage of Internet of Things Big Data","volume":"7","author":"Bao","year":"2020","journal-title":"IEEE Internet Things J."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Shen, Z., Shu, J., and Lee, P.P.C. (July, January 28). Reconsidering Single Failure Recovery in Clustered File Systems. Proceedings of the 2016 46th Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, France.","DOI":"10.1109\/DSN.2016.37"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Abdrashitov, V., Prakash, N., and M\u00e9dard, M. (2017, January 6\u201310). The storage vs repair bandwidth trade-off for multiple failures in clustered storage networks. 
Proceedings of the 2017 IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan.","DOI":"10.1109\/ITW.2017.8277979"},{"key":"ref_27","unstructured":"Gupta, S., and Lalitha, V. (2020, January 24\u201327). Rack-Aware Cooperative Regenerating Codes. Proceedings of the 2020 International Symposium on Information Theory and Its Applications (ISITA), Virtual."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"362","DOI":"10.1109\/JSAIT.2022.3182365","article-title":"On Rack-Aware Cooperative Regenerating Codes and Epsilon-MSCR Codes","volume":"3","author":"Gupta","year":"2022","journal-title":"IEEE J. Sel. Areas Inf. Theory"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"4316","DOI":"10.1109\/TCOMM.2022.3175826","article-title":"Rack-Aware Regenerating Codes with Multiple Erasure Tolerance","volume":"70","author":"Zhou","year":"2022","journal-title":"IEEE Trans. Commun."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"6428","DOI":"10.1109\/TIT.2023.3289187","article-title":"Rack-Aware MSR Codes with Error Correction Capability for Multiple Erasure Tolerance","volume":"69","author":"Wang","year":"2023","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Wang, J., and Guan, X. (2024, January 24\u201328). Rack-Aware Minimum-Storage Regenerating Codes with Optimal Access for Consecutive Node Failures. Proceedings of the 2024 IEEE Information Theory Workshop (ITW), Shenzhen, China.","DOI":"10.1109\/ITW61385.2024.10806950"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Wang, J., and Chen, Z. (2023, January 25\u201330). Low-access repair of Reed-Solomon codes in rack-aware storage. Proceedings of the 2023 IEEE International Symposium on Information Theory (ISIT), Taipei, Taiwan.","DOI":"10.1109\/ISIT54713.2023.10206908"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Le Scouarnec, N. (2012, January 1\u20136). 
Exact scalar minimum storage coordinated regenerating codes. Proceedings of the 2012 IEEE International Symposium on Information Theory Proceedings, Cambridge, MA, USA.","DOI":"10.1109\/ISIT.2012.6283044"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Wang, A., and Zhang, Z. (2013, January 14\u201319). Exact cooperative regenerating codes with minimum-repair-bandwidth for distributed storage. Proceedings of the 2013 Proceedings IEEE INFOCOM, Turin, Italy.","DOI":"10.1109\/INFCOM.2013.6566803"},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Jiekak, S., and Scouarnec, N.L. (2012). CROSS-MBCR: Exact minimum bandwidth coordinated regenerating codes. arXiv.","DOI":"10.1109\/ISIT.2012.6283044"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Li, J., and Li, B. (May, January 27). Cooperative repair with minimum-storage regenerating codes for distributed storage. Proceedings of the IEEE INFOCOM 2014-IEEE Conference on Computer Communications, Toronto, ON, Canada.","DOI":"10.1109\/INFOCOM.2014.6847953"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1109\/TIT.2019.2934114","article-title":"Scalar MSCR Codes via the Product Matrix Construction","volume":"66","author":"Zhang","year":"2020","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1109\/TIT.2018.2856206","article-title":"Cooperative Repair: Constructions of Optimal MDS Codes for All Admissible Parameters","volume":"65","author":"Ye","year":"2019","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"7457","DOI":"10.1109\/TIT.2020.3008342","article-title":"New Constructions of Cooperative MSR Codes: Reducing Node Size to exp(O(n))","volume":"66","author":"Ye","year":"2020","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, S., and Oggier, F. (July, January 29). 
On storage codes allowing partially collaborative repairs. Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA.","DOI":"10.1109\/ISIT.2014.6875272"},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"4012","DOI":"10.1109\/TCOMM.2020.2988924","article-title":"Exact-Repair Codes with Partial Collaboration in Distributed Storage Systems","volume":"68","author":"Liu","year":"2020","journal-title":"IEEE Trans. Commun."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"113","DOI":"10.3934\/amc.2016.10.113","article-title":"On applications of orbit codes to storage","volume":"10","author":"Liu","year":"2016","journal-title":"Adv. Math. Commun."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Liu, S., and Oggier, F. (2014, January 26\u201329). Two storage code constructions allowing partially collaborative repairs. Proceedings of the 2014 International Symposium on Information Theory and its Applications, Melbourne, Australia.","DOI":"10.1109\/ISIT.2014.6875272"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"4180","DOI":"10.1109\/TIT.2019.2898660","article-title":"Centralized Multi-Node Repair Regenerating Codes","volume":"65","author":"Zorgui","year":"2019","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"7529","DOI":"10.1109\/TIT.2018.2871451","article-title":"Centralized Repair of Multiple Node Failures with Applications to Communication Efficient Secret Sharing","volume":"64","author":"Rawat","year":"2018","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_46","first-page":"1","article-title":"Theory of Codes with Maximum Rank Distance","volume":"21","author":"Gabidulin","year":"1985","journal-title":"Probl. Inform. 
Transm."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1109\/18.75248","article-title":"Maximum-rank array codes and their application to crisscross error correction","volume":"37","author":"Roth","year":"1991","journal-title":"IEEE Trans. Inf. Theory"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/313\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T16:55:06Z","timestamp":1760028906000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/27\/3\/313"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,17]]},"references-count":47,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3]]}},"alternative-id":["e27030313"],"URL":"https:\/\/doi.org\/10.3390\/e27030313","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2025,3,17]]}}}