{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,5]],"date-time":"2026-06-05T15:37:44Z","timestamp":1780673864170,"version":"3.54.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2022,11,12]],"date-time":"2022-11-12T00:00:00Z","timestamp":1668211200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["61832020, 61821003"],"award-info":[{"award-number":["61832020, 61821003"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Storage"],"published-print":{"date-parts":[[2022,11,30]]},"abstract":"<jats:p>\n            Disk failure has always been a major problem for data centers, leading to data loss. Current disk failure prediction approaches are mostly offline and assume that the disk labels required for training learning models are available and accurate. However, these offline methods are no longer suitable for disk failure prediction tasks in large-scale data centers. Behind this explosive amount of data, most methods do not consider whether it is not easy to get the label values during the training or the obtained label values are not completely accurate. These problems further restrict the development of supervised learning and offline modeling in disk failure prediction. In this article, Active Semi-supervised Learning Disk-failure Prediction (\n            <jats:italic>ASLDP<\/jats:italic>\n            ), a novel disk failure prediction method is proposed, which uses active learning and semi-supervised learning. According to the characteristics of data in the disk lifecycle,\n            <jats:italic>ASLDP<\/jats:italic>\n            carries out active learning for those clear labeled samples, which selects valuable samples with the most significant probability uncertainty and eliminates redundancy. For those samples that are unclearly labeled or unlabeled,\n            <jats:italic>ASLDP<\/jats:italic>\n            uses semi-supervised learning for pre-labeled by calculating the conditional values of the samples and enhances the generalization ability by active learning. Compared with several state-of-the-art offline and online learning approaches, the results on four realistic datasets from Backblaze and Baidu demonstrate that\n            <jats:italic>ASLDP<\/jats:italic>\n            achieves stable failure detection rates of 80\u201385% with low false alarm rates. In addition, we use a dataset from Alibaba to evaluate the generality of\n            <jats:italic>ASLDP<\/jats:italic>\n            . Furthermore,\n            <jats:italic>ASLDP<\/jats:italic>\n            can overcome the problem of missing sample labels and data redundancy in large data centers, which are not considered and implemented in all offline learning methods for disk failure prediction to the best of our knowledge. Finally,\n            <jats:italic>ASLDP<\/jats:italic>\n            can predict the disk failure 4.9 days in advance with lower overhead and latency.\n          <\/jats:p>","DOI":"10.1145\/3523699","type":"journal-article","created":{"date-parts":[[2022,9,27]],"date-time":"2022-09-27T11:25:22Z","timestamp":1664277922000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["A Disk Failure Prediction Method Based on Active Semi-supervised Learning"],"prefix":"10.1145","volume":"18","author":[{"given":"Yang","family":"Zhou","sequence":"first","affiliation":[{"name":"Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fang","family":"Wang","sequence":"additional","affiliation":[{"name":"Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dan","family":"Feng","sequence":"additional","affiliation":[{"name":"Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2022,11,12]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"9","article-title":"Monitoring hard disks with SMART","author":"Allen Bruce","year":"2004","unstructured":"Bruce Allen. 2004. Monitoring hard disks with SMART. Linux J. 2004, 117 (2004), 9.","journal-title":"Linux J."},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigDataCongress.2018.00044"},{"key":"e_1_3_1_4_2","unstructured":"Backblaze. 2014. Hard Drive SMART Stats. Retrieved from https:\/\/www.backblaze.com\/blog\/hard-drive-smart-stats\/."},{"key":"e_1_3_1_5_2","unstructured":"Backblaze. 2015. What Is the Best Hard Drive? Retrieved from https:\/\/www.backblaze.com\/blog\/best-hard-drive-q4-2014\/."},{"key":"e_1_3_1_6_2","unstructured":"Backblaze. 2016\u20132020. Raw Hard Drive Test Data. Retrieved from https:\/\/www.backblaze.com\/b2\/hard-drive-test-data.html."},{"key":"e_1_3_1_7_2","unstructured":"Baidu. 2013. Baidu Dataset. Retrieved from http:\/\/pan.baidu.com\/share\/link?shareid=189977&uk=4278294944."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/279943.279962"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939699"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF00058655"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2017.154"},{"key":"e_1_3_1_12_2","first-page":"10","article-title":"Conditional value-based co-training","author":"Cheng Sheng-Jun","year":"2013","unstructured":"Sheng-Jun Cheng, Jia-Feng Liu, Qing-Cheng Huang, and Xiang-Long Tang. 2013. Conditional value-based co-training. Acta Automat. Sin. (2013), 10.","journal-title":"Acta Automat. Sin."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2018\/558"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/BRACIS.2017.72"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.5555\/3322706.3361996"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/2523813"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-017-5642-8"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3242086"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS47774.2020.00044"},{"key":"e_1_3_1_20_2","first-page":"1919","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Hashemi Milad","year":"2018","unstructured":"Milad Hashemi, Kevin Swersky, Jamie Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, and Parthasarathy Ranganathan. 2018. Learning memory access patterns. In Proceedings of the International Conference on Machine Learning. PMLR, 1919\u20131928."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2002.802886"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3196909"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2013.01.032"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/BigDataCongress.2019.00036"},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2014.44"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ress.2017.03.004"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11396"},{"key":"e_1_3_1_28_2","article-title":"Active learning of strict partial orders: A case study on concept prerequisite relations","author":"Liang Chen","year":"2018","unstructured":"Chen Liang, Jianbo Ye, Han Zhao, Bart Pursel, and C. Lee Giles. 2018. Active learning of strict partial orders: A case study on concept prerequisite relations. arXiv:1801.06481. Retrieved from https:\/\/arxiv.org\/abs\/1801.06481.","journal-title":"arXiv:1801.06481"},{"key":"e_1_3_1_29_2","article-title":"Continuous control with deep reinforcement learning","author":"Lillicrap Timothy P.","year":"2015","unstructured":"Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2015. Continuous control with deep reinforcement learning. arXiv:1509.02971.","journal-title":"arXiv:1509.02971"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.dss.2015.03.008"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.5555\/3386691.3386706"},{"key":"e_1_3_1_32_2","first-page":"391","volume-title":"Proceedings of the USENIX Conference on Usenix Annual Technical Conference","volume":"1","author":"Mahdisoltani Farzaneh","year":"2017","unstructured":"Farzaneh Mahdisoltani, Ioan Stefanovici, and Bianca Schroeder. 2017. Improving storage system reliability with proactive error prediction. In Proceedings of the USENIX Conference on Usenix Annual Technical Conference, Vol. 1. 391\u2013402."},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4842-4106-6_7"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2858821"},{"issue":"72","key":"e_1_3_1_35_2","first-page":"1","article-title":"Scikit-multiflow: A multi-output streaming framework","volume":"19","author":"Montiel Jacob","year":"2018","unstructured":"Jacob Montiel, Jesse Read, Albert Bifet, and Talel Abdessalem. 2018. Scikit-multiflow: A multi-output streaming framework. J. Mach. Learn. Res. 19, 72 (2018), 1\u20135.","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_1_36_2","volume-title":"Proceedings of the Artificial Neural Networks and Neural Information Processing (ICANN\/ICONIP\u201903)","author":"Murray Joseph F.","year":"2003","unstructured":"Joseph F. Murray, Gordon F. Hughes, and Kenneth Kreutz-Delgado. 2003. Hard drive failure prediction using non-parametric statistical methods. In Proceedings of the Artificial Neural Networks and Neural Information Processing (ICANN\/ICONIP\u201903)."},{"issue":"5","key":"e_1_3_1_37_2","article-title":"Machine learning methods for predicting failures in hard drives: A multiple-instance application.","volume":"6","author":"Murray Joseph F.","year":"2005","unstructured":"Joseph F. Murray, Gordon F. Hughes, Kenneth Kreutz-Delgado, and Dale Schuurmans. 2005. Machine learning methods for predicting failures in hard drives: A multiple-instance application. J. Mach. Learn. Res. 6, 5 (2005).","journal-title":"J. Mach. Learn. Res."},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/50202.50214"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/BRACIS.2019.00108"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1145\/2465470.2465473"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939778"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/1670679.1670680"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/1288783.1288785"},{"key":"e_1_3_1_44_2","unstructured":"Burr Settles. 2009. Active learning literature survey."},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1145\/3316781.3317918"},{"key":"e_1_3_1_46_2","first-page":"307","volume-title":"Proceedings of the 7th Workshop on Building Educational Applications Using NLP","author":"Talukdar Partha","year":"2012","unstructured":"Partha Talukdar and William Cohen. 2012. Crowdsourced comprehension: Predicting prerequisite structure in wikipedia. In Proceedings of the 7th Workshop on Building Educational Applications Using NLP. 307\u2013315."},{"key":"e_1_3_1_47_2","unstructured":"Alibaba Cloud Computing TIANCHI. 2021. Large-scale SSD Failure Prediction Dataset. Retrieved from https:\/\/github.com\/alibaba-edu\/dcbrain\/tree\/master\/ssd_smart_logs."},{"key":"e_1_3_1_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390301"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2013.2264060"},{"key":"e_1_3_1_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/PHM.2011.5939558"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3225058.3225106"},{"key":"e_1_3_1_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2019.000-3"},{"key":"e_1_3_1_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2018.00089"},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2016.2538237"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSN48987.2021.00039"},{"key":"e_1_3_1_56_2","first-page":"481","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201918)","author":"Xu Yong","year":"2018","unstructured":"Yong Xu, Kaixin Sui, Randolph Yao, Hongyu Zhang, Qingwei Lin, Yingnong Dang, Peng Li, Keceng Jiang, Wenchi Zhang, Jian-Guang Lou, et\u00a0al. 2018. Improving service availability of cloud systems by predicting disk error. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201918). 481\u2013494."},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSN-S50200.2020.00017"},{"key":"e_1_3_1_58_2","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407803"},{"key":"e_1_3_1_59_2","first-page":"111","volume-title":"Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201920)","author":"Zhang Ji","year":"2020","unstructured":"Ji Zhang, Ping Huang, Ke Zhou, Ming Xie, and Sebastian Schelter. 2020. HDDse: Enabling high-dimensional disk state embedding for generic failure detection system of heterogeneous disks in large data centers. In Proceedings of the USENIX Annual Technical Conference (USENIX ATC\u201920). 111\u2013126."},{"key":"e_1_3_1_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3299869.3300085"},{"key":"e_1_3_1_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3337821.3337881"},{"key":"e_1_3_1_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2020.2985346"},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2013.09.035"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3472456.3472490"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2019.8852275"},{"key":"e_1_3_1_66_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSN50589.2020.00107"},{"key":"e_1_3_1_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2013.6558427"},{"key":"e_1_3_1_68_2","unstructured":"Xiaojin Jerry Zhu. 2005. Semi-supervised learning literature survey. University of Wisconsin-Madison Department of Computer Sciences."},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-43024-5_2"}],"container-title":["ACM Transactions on Storage"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3523699","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3523699","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:36Z","timestamp":1750188636000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3523699"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,12]]},"references-count":68,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,11,30]]}},"alternative-id":["10.1145\/3523699"],"URL":"https:\/\/doi.org\/10.1145\/3523699","relation":{},"ISSN":["1553-3077","1553-3093"],"issn-type":[{"value":"1553-3077","type":"print"},{"value":"1553-3093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,12]]},"assertion":[{"value":"2021-07-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-03-02","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-11-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}