{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T23:36:04Z","timestamp":1775000164932,"version":"3.50.1"},"reference-count":53,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T00:00:00Z","timestamp":1684281600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T00:00:00Z","timestamp":1684281600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Seoultech"},{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-AC02-05CH11231"],"award-info":[{"award-number":["DE-AC02-05CH11231"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["2021R1C1C1010861"],"award-info":[{"award-number":["2021R1C1C1010861"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Large-scale high performance computing (HPC) systems typically consist of many thousands of CPUs and storage units used by hundreds to thousands of users simultaneously. Applications from large numbers of users have diverse characteristics, such as varying computation, communication, memory, and I\/O intensity. A good understanding of the performance characteristics of each user application is important for job scheduling and resource provisioning. Among these performance characteristics, I\/O performance is becoming increasingly important as data sizes rapidly increase and large-scale applications, such as simulation and model training, are widely adopted. However, predicting I\/O performance is difficult because I\/O systems are shared among all users and involve many layers of software and hardware stack, including the application, network interconnect, operating system, file system, and storage devices. Furthermore, updates to these layers and changes in system management policy can significantly alter the I\/O behavior of applications and the entire system. To improve the prediction of the I\/O performance on HPC systems, we propose integrating information from several different system logs and developing a regression-based approach to predict the I\/O performance. Our proposed scheme can dynamically select the most relevant features from the log entries using various feature selection algorithms and scoring functions, and can automatically select the regression algorithm with the best accuracy for the prediction task. The evaluation results show that our proposed scheme can predict the write performance with up to 90% prediction accuracy and the read performance with up to 99% prediction accuracy using the real logs from the Cori supercomputer system at NERSC.<\/jats:p>","DOI":"10.1186\/s40537-023-00741-4","type":"journal-article","created":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T05:01:47Z","timestamp":1684299707000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Design and implementation of I\/O performance prediction scheme on HPC systems through large-scale log analysis"],"prefix":"10.1186","volume":"10","author":[{"given":"Sunggon","family":"Kim","sequence":"first","affiliation":[]},{"given":"Alex","family":"Sim","sequence":"additional","affiliation":[]},{"given":"Kesheng","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Suren","family":"Byna","sequence":"additional","affiliation":[]},{"given":"Yongseok","family":"Son","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,5,17]]},"reference":[{"key":"741_CR1","unstructured":"Abadi M et\u00a0al. Tensorflow: a system for large-scale machine learning. In: 12th $$\\{$$USENIX$$\\}$$ Symposium on Operating Systems Design and Implementation ($$\\{$$OSDI$$\\}$$ 16); 2016. p. 265\u201383."},{"key":"741_CR2","doi-asserted-by":"crossref","unstructured":"Agarwal M, Singhvi D, Malakar P, Byna S. Active learning-based automatic tuning and prediction of parallel i\/o performance. In: 2019 IEEE\/ACM Fourth International Parallel Data Systems Workshop (PDSW), IEEE; 2019. p. 20\u20139.","DOI":"10.1109\/PDSW49588.2019.00007"},{"key":"741_CR3","doi-asserted-by":"publisher","unstructured":"Behzad B et\u00a0al. Improving parallel I\/O autotuning with performance modeling. In: Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, Association for Computing Machinery, New York, NY, USA; 2014. p. 253\u201356. https:\/\/doi.org\/10.1145\/2600212.2600708.","DOI":"10.1145\/2600212.2600708"},{"key":"741_CR4","doi-asserted-by":"publisher","unstructured":"Behzad B et\u00a0al. Pattern-driven parallel I\/O tuning. In: Proceedings of the 10th Parallel Data Storage Workshop, ACM, New York, NY, USA; 2015. p. 43\u201348. https:\/\/doi.org\/10.1145\/2834976.2834977.","DOI":"10.1145\/2834976.2834977"},{"key":"741_CR5","first-page":"1","volume-title":"Noise reduction in speech processing","author":"J Benesty","year":"2009","unstructured":"Benesty J, et al. Pearson correlation coefficient. In: Davis GM, editor., et al., Noise reduction in speech processing. Heidelberg: Springer; 2009. p. 1\u20134."},{"key":"741_CR6","doi-asserted-by":"crossref","unstructured":"Carns P et\u00a0al. 24\/7 characterization of petascale I\/O workloads. In: 2009 IEEE International Conference on Cluster Computing and Workshops, IEEE; 2009. p. 1\u201310.","DOI":"10.1109\/CLUSTR.2009.5289150"},{"key":"741_CR7","doi-asserted-by":"publisher","first-page":"107760","DOI":"10.1016\/j.ast.2022.107760","volume":"128","author":"Q Chen","year":"2022","unstructured":"Chen Q, Sheng H, Zhang T. A novel direct performance adaptive control of aero-engine using subspace-based improved model predictive control. Aeros Sci Technol. 2022;128: 107760.","journal-title":"Aeros Sci Technol"},{"key":"741_CR8","unstructured":"Chollet F. et\u00a0al. Keras; 2015. https:\/\/keras.io."},{"key":"741_CR9","doi-asserted-by":"crossref","unstructured":"Dudani SA. The distance-weighted k-nearest-neighbor rule. IEEE Trans Syst Man Cybern; 1976. p. 325\u20137.","DOI":"10.1109\/TSMC.1976.5408784"},{"key":"741_CR10","doi-asserted-by":"publisher","first-page":"1189","DOI":"10.1214\/aos\/1013203451","volume":"29","author":"JH Friedman","year":"2001","unstructured":"Friedman JH. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29:1189\u2013232.","journal-title":"Ann Stat"},{"key":"741_CR11","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1016\/S0167-9473(01)00065-2","volume":"38","author":"JH Friedman","year":"2002","unstructured":"Friedman JH. Stochastic gradient boosting. Comput stat Data Anal. 2002;38:367\u201378.","journal-title":"Comput stat Data Anal"},{"key":"741_CR12","volume-title":"A guide to chi-squared testing","author":"PE Greenwood","year":"1996","unstructured":"Greenwood PE, Nikulin MS. A guide to chi-squared testing, vol. 280. Hoboken: John Wiley & Sons; 1996."},{"issue":"11844","key":"741_CR13","first-page":"42","volume":"2021","author":"M Khoshboresh-Masouleh","year":"2021","unstructured":"Khoshboresh-Masouleh M, Shah-Hosseini R. Quantum deep learning in remote sensing: achievements and challenges. Photonics Quantum. 2021;2021(11844):42\u20135.","journal-title":"Photonics Quantum"},{"key":"741_CR14","doi-asserted-by":"crossref","unstructured":"Kim S et\u00a0al. Dca-io: A dynamic i\/o control scheme for parallel and distributed file systems. In: 2019 19th IEEE\/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID); 2019. p. 351\u201360.","DOI":"10.1109\/CCGRID.2019.00049"},{"key":"741_CR15","doi-asserted-by":"crossref","unstructured":"Kim S et\u00a0al. Towards hpc i\/o performance prediction through large-scale log analysis. In: Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing; 2020. p. 77\u201388.","DOI":"10.1145\/3369583.3392678"},{"key":"741_CR16","first-page":"249","volume-title":"Machine learning proceedings","author":"K Kira","year":"1992","unstructured":"Kira K, Rendell LA. A practical approach to feature selection. In: Sleeman D, Edwards P, editors. Machine learning proceedings. Amsterdam: Elsevier; 1992. p. 249\u201356."},{"key":"741_CR17","doi-asserted-by":"publisher","first-page":"066138","DOI":"10.1103\/PhysRevE.69.066138","volume":"69","author":"A Kraskov","year":"2004","unstructured":"Kraskov A, St\u00f6gbauer H, Grassberger P. Estimating mutual information. Phys Rev E. 2004;69: 066138.","journal-title":"Phys Rev E"},{"key":"741_CR18","first-page":"1097","volume":"25","author":"A Krizhevsky","year":"2012","unstructured":"Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;60:1097\u2013105.","journal-title":"Adv Neural Inf Process Syst"},{"key":"741_CR19","doi-asserted-by":"crossref","unstructured":"Kroeger TM, Long DD. The case for efficient file access pattern modeling. In: Proceedings of the Seventh Workshop on Hot Topics in Operating Systems, IEEE; 1999. p. 14\u20139.","DOI":"10.1109\/HOTOS.1999.798371"},{"key":"741_CR20","doi-asserted-by":"crossref","unstructured":"Lang S. et\u00a0al. I\/o performance challenges at leadership scale. In: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, IEEE; 2009. p. 1\u201312.","DOI":"10.1145\/1654059.1654100"},{"key":"741_CR21","first-page":"18","volume":"2","author":"A Liaw","year":"2002","unstructured":"Liaw A, et al. Classification and regression by randomForest. R News. 2002;2:18\u201322.","journal-title":"R News"},{"key":"741_CR22","unstructured":"Lockwood GK. et\u00a0al. TOKIO on ClusterStor: connecting standard tools to enable holistic i\/o performance analysis; 2018."},{"key":"741_CR23","doi-asserted-by":"crossref","unstructured":"Lockwood GK, et al. A year in the life of a parallel file system. In: SC18: International Conference for High Performance Computing. Storage and Analysis. IEEE: Networking; 2018. p. 931\u201343.","DOI":"10.1109\/SC.2018.00077"},{"key":"741_CR24","unstructured":"Lux TC. et\u00a0al. Predictive modeling of i\/o characteristics in high performance computing systems. In: Proceedings of the High Performance Computing Symposium, Society for Computer Simulation International; 2018. p.\u00a08."},{"key":"741_CR25","doi-asserted-by":"crossref","unstructured":"Matsunaga A, et al. On the use of machine learning to predict the time and resources consumed by applications. In: 2010 10th IEEE\/ACM International Conference on Cluster. IEEE: Cloud and Grid Computing; 2010. p. 495\u2013504.","DOI":"10.1109\/CCGRID.2010.98"},{"key":"741_CR26","doi-asserted-by":"crossref","unstructured":"McKenna R et\u00a0al. Machine learning predictions of runtime and IO traffic on high-end clusters. In: 2016 IEEE International Conference on Cluster Computing (CLUSTER), IEEE; 2016. p. 255\u20138.","DOI":"10.1109\/CLUSTER.2016.58"},{"key":"741_CR27","doi-asserted-by":"crossref","unstructured":"McKinney W. Data structures for statistical computing in python. In: van\u00a0der Walt S, Millman J, editors. Proceedings of the 9th Python in Science Conference; 2010. p. 51\u20136.","DOI":"10.25080\/Majora-92bf1922-00a"},{"key":"741_CR28","doi-asserted-by":"crossref","unstructured":"Meswani MR, Laurenzano MA, Carrington L, Snavely A. Modeling and predicting disk I\/O time of HPC applications. In: 2010 DoD High Performance Computing Modernization Program Users Group Conference, IEEE; 2010. p. 478\u201386.","DOI":"10.1109\/HPCMP-UGC.2010.27"},{"key":"741_CR29","unstructured":"Min Co. SFS: random write considered harmful in solid state drives. In: FAST. 2012. p. 1\u201316."},{"key":"741_CR30","doi-asserted-by":"crossref","unstructured":"Navot A et\u00a0al. Is feature selection still necessary?. In: International Statistical and Optimization Perspectives Workshop\u201d Subspace, Latent Structure and Feature Selection\u201d. Springer; 2005. p. 127\u201338.","DOI":"10.1007\/11752790_8"},{"key":"741_CR31","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1007\/978-1-4615-0509-9_13","volume-title":"Grid resource management","author":"B Nitzberg","year":"2004","unstructured":"Nitzberg B, et al. PBS pro: Grid computing and scheduling attributes. In: Nabrzyski J, Schopf JM, W\u0119glarz J, editors., et al., Grid resource management. Boston: Springer; 2004. p. 183\u201390."},{"key":"741_CR32","doi-asserted-by":"publisher","first-page":"683","DOI":"10.1109\/72.159058","volume":"3","author":"SK Pal","year":"1992","unstructured":"Pal SK, Mitra S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans Neural Netw. 1992;3:683\u201397.","journal-title":"IEEE Trans Neural Netw"},{"key":"741_CR33","doi-asserted-by":"crossref","unstructured":"Patel T. et\u00a0al. Revisiting I\/O behavior in large-scale storage systems: the expected and the unexpected. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis; 2019. p. 1\u201313.","DOI":"10.1145\/3295500.3356183"},{"key":"741_CR34","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825\u201330.","journal-title":"J Mach Learn Res"},{"key":"741_CR35","unstructured":"Pfister GF. An introduction to the infiniband architecture. In: High Performance Mass Storage and Parallel I\/O. 2001; ch. 42, p. 617\u201332."},{"key":"741_CR36","unstructured":"Quintero D. et\u00a0al. IBM Spectrum Scale (formerly GPFS). IBM Redbooks. 2017."},{"key":"741_CR37","first-page":"19","volume":"3","author":"JF Schmidt","year":"2016","unstructured":"Schmidt JF, Kunkel JM. Predicting I\/O performance in HPC using artificial neural networks. Supercomput Front Innov. 2016;3:19\u201333.","journal-title":"Supercomput Front Innov"},{"key":"741_CR38","unstructured":"Schwan P. et\u00a0al. Lustre: Building a file system for 1000-node clusters. In: Proceedings of the 2003 Linux symposium; 2003. p. 380\u20136."},{"key":"741_CR39","doi-asserted-by":"crossref","unstructured":"Shan H. et\u00a0al. Characterizing and predicting the I\/O performance of hpc applications using a parameterized synthetic benchmark. In: Proceedings of the 2008 ACM\/IEEE conference on Supercomputing, IEEE Press; 2008. p.\u00a042.","DOI":"10.1109\/SC.2008.5222721"},{"key":"741_CR40","doi-asserted-by":"publisher","first-page":"103419","DOI":"10.1016\/j.dsp.2022.103419","volume":"123","author":"P Shang","year":"2022","unstructured":"Shang P, Liu X, Yu C, Yan G, Xiang Q, Mi X. A new ensemble deep graph reinforcement learning network for spatio-temporal traffic volume forecasting in a freeway network. Digital Signal Process. 2022;123: 103419.","journal-title":"Digital Signal Process"},{"key":"741_CR41","doi-asserted-by":"crossref","unstructured":"Snyder S, Carns P, Harms K, Latham R, Ross R. Performance evaluation of Darshan 3.0. 0 on the Cray XC30. Technical Report. Argonne National Lab.(ANL), Argonne, IL (United States); 2016.","DOI":"10.2172\/1250469"},{"key":"741_CR42","doi-asserted-by":"crossref","unstructured":"Snyder S. et\u00a0al. Modular HPC I\/O characterization with darshan. In: 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT), IEEE; 2016. p. 9\u201317.","DOI":"10.1109\/ESPT.2016.006"},{"key":"741_CR43","unstructured":"Venkataraman S. et\u00a0al. Ernest: efficient performance prediction for large-scale advanced analytics. In: 13th $$\\{$$USENIX$$\\}$$ Symposium on Networked Systems Design and Implementation ($$\\{$$NSDI$$\\}$$ 16); 2016. p. 363\u201378."},{"key":"741_CR44","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1162\/089976603762553004","volume":"15","author":"JJ Verbeek","year":"2003","unstructured":"Verbeek JJ, Vlassis N, Kr\u00f6se B. Efficient greedy learning of gaussian mixture models. Neural Comput. 2003;15:469\u201385.","journal-title":"Neural Comput"},{"key":"741_CR45","doi-asserted-by":"crossref","unstructured":"Wang T. et\u00a0al. Iominer: Large-scale analytics framework for gaining knowledge from I\/O logs. In: 2018 IEEE International Conference on Cluster Computing (CLUSTER), IEEE; 2018. p. 466\u201376.","DOI":"10.1109\/CLUSTER.2018.00062"},{"key":"741_CR46","unstructured":"Wartens CH, Garlick J. LMT-the lustre monitoring tool; 2010."},{"key":"741_CR47","doi-asserted-by":"crossref","unstructured":"Xie B, Tan Z, Carns P, Chase J, Harms K, Lofstead J, Oral S, Vazhkudai SS, Wang F. Applying machine learning to understand write performance of large-scale parallel filesystems. In: 2019 IEEE\/ACM Fourth International Parallel Data Systems Workshop (PDSW), IEEE; 2019. p. 30\u20139.","DOI":"10.1109\/PDSW49588.2019.00008"},{"key":"741_CR48","doi-asserted-by":"crossref","unstructured":"Xie B, Tan Z, Carns P, Chase J, Harms K, Lofstead J, Oral S, Vazhkudai SS, Wang F. Interpreting write performance of supercomputer I\/O systems with regression models. In: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE; 2021. p. 557\u201366.","DOI":"10.1109\/IPDPS49936.2021.00064"},{"key":"741_CR49","doi-asserted-by":"crossref","unstructured":"Xie B. et\u00a0al. Predicting output performance of a petascale supercomputer. In: Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing; 2017. p. 181\u201392.","DOI":"10.1145\/3078597.3078614"},{"key":"741_CR50","doi-asserted-by":"crossref","unstructured":"Xu G. et\u00a0al. Simulation-based performance prediction of HPC applications: a case study of HPL. In: 2020 IEEE\/ACM International Workshop on HPC User Support Tools (HUST) and Workshop on Programming and Performance Visualization Tools (ProTools), IEEE; 2020. p. 81\u201388.","DOI":"10.1109\/HUSTProtools51951.2020.00016"},{"key":"741_CR51","doi-asserted-by":"publisher","first-page":"44","DOI":"10.1007\/10968987_3","volume-title":"Workshop on job scheduling strategies for parallel processing","author":"AB Yoo","year":"2003","unstructured":"Yoo AB, Jette MA, Grondona M. SLURM: Simple linux utility for resource management. In: Feitelson D, Rudolph L, Schwiegelshohn U, editors. Workshop on job scheduling strategies for parallel processing. Berlin: Springer; 2003. p. 44\u201360."},{"key":"741_CR52","first-page":"100337","volume":"27","author":"J Yu","year":"2022","unstructured":"Yu J, Gao M, Li Y, Zhang Z, Ip WH, Yung KL. Workflow performance prediction based on graph structure aware deep attention neural network. J Ind Inf Integr. 2022;27: 100337.","journal-title":"J Ind Inf Integr"},{"key":"741_CR53","doi-asserted-by":"crossref","unstructured":"Zhu Y, Chowdhury F, Fu H, Moody A, Mohror K, Sato K, Yu W. Entropy-aware I\/O pipelining for large-scale deep learning on HPC systems. In: 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), IEEE; 2018. p. 145\u201356.","DOI":"10.1109\/MASCOTS.2018.00023"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00741-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-023-00741-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-023-00741-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,20]],"date-time":"2024-10-20T14:34:38Z","timestamp":1729434878000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-023-00741-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,17]]},"references-count":53,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["741"],"URL":"https:\/\/doi.org\/10.1186\/s40537-023-00741-4","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,17]]},"assertion":[{"value":"29 August 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 May 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"65"}}