{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:27:33Z","timestamp":1760059653934,"version":"build-2065373602"},"reference-count":30,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,6,29]],"date-time":"2025-06-29T00:00:00Z","timestamp":1751155200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Sparse matrix\u2013vector multiplication (SpMV) plays a significant role in the computational costs of many scientific applications such as 2D\/3D robotics, power network problems, and computer vision. Numerous implementations using different sparse matrix formats have been introduced to optimize this kernel on CPUs and GPUs. However, due to the sparsity patterns of matrices and the diverse configurations of hardware, accurately modeling the performance of SpMV remains a complex challenge. SpMV computation is often a time-consuming process because of its sparse matrix structure. To address this, we propose a machine learning-based tool, namely Elegante+, that predicts optimal scheduling policies by analyzing matrix structures. This approach eliminates the need for repetitive trial and error, minimizes errors, and finds the best solution of the SpMV kernel, which enables users to make informed decisions about scheduling policies that maximize computational efficiency. For this purpose, we collected 1000+ sparse matrices from the SuiteSparse matrix market collection and converted them into the compressed sparse row (CSR) format, and SpMV computation was performed by extracting 14 key sparse matrix features. After creating a comprehensive dataset, we trained various machine learning models to predict the optimal scheduling policy, significantly enhancing the computational efficiency and reducing the overhead in high-performance computing environments. Our proposed tool, Elegante+ (XGB with all SpMV features), achieved the highest cross-validation score of 79% and performed five times faster than the default scheduling policy during SpMV in a high-performance computing (HPC) environment.<\/jats:p>","DOI":"10.3390\/info16070553","type":"journal-article","created":{"date-parts":[[2025,6,30]],"date-time":"2025-06-30T13:06:17Z","timestamp":1751288777000},"page":"553","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Elegante+: A Machine Learning-Based Optimization Framework for Sparse Matrix\u2013Vector Computations on the CPU Architecture"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-8799-8212","authenticated-orcid":false,"given":"Muhammad","family":"Ahmad","sequence":"first","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07738, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4698-6461","authenticated-orcid":false,"given":"Sardar","family":"Usman","sequence":"additional","affiliation":[{"name":"School of Informatics and Robotics, Institute of Arts and Culture, Lahore 54000, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ameer","family":"Hamza","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Software Engineering, The Islamia University of Bahawapur, Bahawalpur 63100, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Muhammad","family":"Muzamil","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Software Engineering, The Islamia University of Bahawapur, Bahawalpur 63100, Pakistan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0241-7902","authenticated-orcid":false,"given":"Ildar","family":"Batyrshin","sequence":"additional","affiliation":[{"name":"Centro de Investigaci\u00f3n en Computaci\u00f3n, Instituto Polit\u00e9cnico Nacional (CIC-IPN), Mexico City 07738, Mexico"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Hu, Y., Du, Y., Ustun, E., and Zhang, Z. (2021, January 1\u20134). GraphLily: Accelerating graph linear algebra on HBM-equipped FPGAs. Proceedings of the 2021 IEEE\/ACM International Conference on Computer Aided Design (ICCAD), Munich, Germany.","DOI":"10.1109\/ICCAD51958.2021.9643582"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Yesil, S., Heidarshenas, A., Morrison, A., and Torrellas, J. (2020, January 9\u201319). Speeding up SpMV for power-law graph analytics by enhancing locality & vectorization. Proceedings of the SC20: IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA.","DOI":"10.1109\/SC41405.2020.00090"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Sun, M., Li, Z., Lu, A., Li, Y., Chang, S.E., Ma, X., Lin, X., and Fang, Z. (March, January 27). FILM-QNN: Efficient FPGA acceleration of deep neural networks with intra-layer, mixed-precision quantization. Proceedings of the 2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, New York, NY, USA.","DOI":"10.1145\/3490422.3502364"},{"key":"ref_4","unstructured":"Nathan, B., and Garland, M. (2009, January 14\u201320). Implementing sparse matrix-vector multiplication on throughput-oriented processors. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, Portland, OR, USA."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Feng, X., Jin, H., Zheng, R., Hu, K., Zeng, J., and Shao, Z. (2011, January 7\u20139). Optimization of sparse matrix-vector multiplication with variant CSR on GPUs. Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, Tainan, Taiwan.","DOI":"10.1109\/ICPADS.2011.91"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kislal, O., Ding, M., Kandemir, I., and Demirkiran, M. (2013, January 22\u201323). Optimizing Sparse Matrix-Vector Multiplication on Emerging Multicores. Proceedings of the IEEE 6th International Workshop on Multi-\/Many-Core Computing Systems (MuCoCoS), Paris, France.","DOI":"10.1109\/MuCoCoS.2013.6633600"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1145\/1365490.1365500","article-title":"Scalable parallel programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?","volume":"6","author":"Nickolls","year":"2008","journal-title":"Queue"},{"key":"ref_8","unstructured":"Baskaran, M., and Bordawekar, R. (2009). Optimizing Sparse Matrix-Vector Multiplication on GPUs, IBM Research Reports. RC24704 W0812\u2013047."},{"key":"ref_9","unstructured":"Mike, G. (2012, January 9\u201311). Efficient sparse matrix-vector multiplication on cache-based GPUs. Proceedings of the IEEE 2012 Innovative Parallel Computing (InPar), Jeju Island, Republic of Korea."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Greathouse, J.L., and Daga, M. (2014, January 16\u201321). Efficient sparse matrix-vector multiplication on GPUs using the CSR storage format. Proceedings of the SC\u201914 IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, LA, USA.","DOI":"10.1109\/SC.2014.68"},{"key":"ref_11","unstructured":"Weifeng, L., and Vinter, B. (2015, January 8\u201311). CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication. Proceedings of the 29th ACM on International Conference on Supercomputing, Newport Beach, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Daga, M., and Greathouse, J.L. (2015, January 16\u201319). Structural agnostic SpMV: Adapting CSR-adaptive for irregular matrices. Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), Bengaluru, India.","DOI":"10.1109\/HiPC.2015.55"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2623","DOI":"10.1109\/TC.2014.2366731","article-title":"Performance optimization using partitioned SpMV on GPUs and multicore CPUs","volume":"64","author":"Yang","year":"2014","journal-title":"IEEE Trans. Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1272","DOI":"10.1109\/TCAD.2019.2912923","article-title":"A streaming dataflow engine for sparse matrix-vector multiplication using high-level synthesis","volume":"39","author":"Hosseinabady","year":"2019","journal-title":"IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst."},{"key":"ref_15","unstructured":"Bowen, L., and Liu, D. (2023, January 6\u201319). Towards high-bandwidth-utilization SpMV on FPGA via partial vector duplication. Proceedings of the 28th Asia and South Pacific Design Automation Conference, Tokyo, Japan."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1145\/2038037.1941587","article-title":"CSX: An extended compression format for SpMV on shared memory systems","volume":"46","author":"Kourtis","year":"2011","journal-title":"ACM SIGPLAN Not."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Geng, T., Wang, T., Sanaullah, A., Yang, C., Patel, R., and Herbordt, M. (2018, January 3\u20136). A framework for acceleration of CNN training on deeply-pipelined FPGA clusters with work and weight load balancing. Proceedings of the IEEE 2018 28th International Conference on Field Programmable Logic and Applications (FPL), Dublin, Ireland.","DOI":"10.1109\/FPL.2018.00074"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Du, Y., Hu, Y., Zhou, Z., and Zhang, Z. (March, January 27). High-performance sparse linear algebra on HBM-equipped FPGA using HLS: A case study on SpMV. Proceedings of the 2022 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside, CA, USA.","DOI":"10.1145\/3490422.3502368"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Muhammed, T., Mehmood, R., Albeshri, A., and Katib, I. (2019). SURAA: A novel method and tool for Loadbalanced and coalesced SpMV computations on GPUs. Appl. Sci., 9.","DOI":"10.3390\/app9050947"},{"key":"ref_20","first-page":"1","article-title":"The University of Florida sparse matrix collection, ACM Trans","volume":"38","author":"Davis","year":"2011","journal-title":"Math. Softw."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Pinar, A., and Heath, M.T. (1999, January 14\u201319). Improving performance of sparse matrix-vector multiplication. Proceedings of the 1999 ACM\/IEEE Conference on Supercomputing, Portland, OR, USA.","DOI":"10.1145\/331532.331562"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"81279","DOI":"10.1109\/ACCESS.2019.2923565","article-title":"ZAKI+: A Machine Learning Based Process Mapping Tool for SpMV Computations on Distributed Memory Architectures","volume":"7","author":"Usman","year":"2019","journal-title":"IEEE Access"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Ahmed, M., Usman, S., Shah, N.A., Ashraf, M.U., Alghamdi, A.M., Bahadded, A.A., and Almarhabi, K.A. (2022). AAQAL: A machine learning-based tool for performance optimization of parallel SPMV computations using block CSR. Appl. Sci., 12.","DOI":"10.3390\/app12147073"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"744","DOI":"10.1007\/s11036-019-01318-3","article-title":"ZAKI: A smart method and tool for automatic performance optimization of parallel SpMV computations on distributed memory machines","volume":"28","author":"Usman","year":"2023","journal-title":"Mob. Netw. Appl."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3652579","article-title":"Machine Learning-Based Kernel Selector for SpMV Optimization in Graph Analysis","volume":"11","author":"Xiao","year":"2024","journal-title":"ACM Trans. Parallel Comput."},{"key":"ref_26","unstructured":"Yesil, S., Heidarshenas, A., Morrison, A., and Torrellas, J. (March, January 25). Wise: Predicting the performance of sparse matrix vector multiplication with machine learning. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Montreal, QC, Canada."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"104799","DOI":"10.1016\/j.jpdc.2023.104799","article-title":"Revisiting thread configuration of SpMV kernels on GPU: A machine learning based approach","volume":"185","author":"Gao","year":"2024","journal-title":"J. Parallel Distrib. Comput."},{"key":"ref_28","first-page":"1175","article-title":"An irregular sparse matrix SpMV method","volume":"46","author":"Shi","year":"2024","journal-title":"Comput. Eng. Sci."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1177\/1094342021990738","article-title":"Selecting optimal SpMV realizations for GPUs via machine learning","volume":"35","author":"Dufrechou","year":"2021","journal-title":"Int. J. High-Perform. Comput. Appl."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Ahmad, M., Sardar, U., Batyrshin, I., Hasnain, M., Sajid, K., and Sidorov, G. (2024). Elegante: A Machine Learning-Based Threads Configuration Tool for SpMV Computations on Shared Memory Architecture. Information, 15.","DOI":"10.3390\/info15110685"}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/7\/553\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:01:10Z","timestamp":1760032870000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/7\/553"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,29]]},"references-count":30,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["info16070553"],"URL":"https:\/\/doi.org\/10.3390\/info16070553","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2025,6,29]]}}}