{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T17:38:44Z","timestamp":1771954724169,"version":"3.50.1"},"reference-count":25,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2018,8,24]],"date-time":"2018-08-24T00:00:00Z","timestamp":1535068800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Sparse matrix-vector multiplication (SpMV) can be used to solve diverse-scaled linear systems and eigenvalue problems that exist in numerous, and varying scientific applications. One of the scientific applications that SpMV is involved in is known as Configuration Interaction (CI). CI is a linear method for solving the nonrelativistic Schr\u00f6dinger equation for quantum chemical multi-electron systems, and it can deal with the ground state as well as multiple excited states. In this paper, we have developed a hybrid approach in order to deal with CI sparse matrices. The proposed model includes a newly-developed hybrid format for storing CI sparse matrices on the Graphics Processing Unit (GPU). In addition to the new developed format, the proposed model includes the SpMV kernel for multiplying the CI matrix (proposed format) by a vector using the C language and the Compute Unified Device Architecture (CUDA) platform. The proposed SpMV kernel is a vector kernel that uses the warp approach. We have gauged the newly developed model in terms of two primary factors, memory usage and performance. Our proposed kernel was compared to the cuSPARSE library and the CSR5 (Compressed Sparse Row 5) format and already outperformed both.<\/jats:p>","DOI":"10.3390\/computation6030045","type":"journal-article","created":{"date-parts":[[2018,8,24]],"date-time":"2018-08-24T11:13:45Z","timestamp":1535109225000},"page":"45","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU"],"prefix":"10.3390","volume":"6","author":[{"given":"Mohammed","family":"Mahmoud","sequence":"first","affiliation":[{"name":"Department of Computer Science, University of North Dakota, Grand Forks, ND 58202, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mark","family":"Hoffmann","sequence":"additional","affiliation":[{"name":"Department of Chemistry, University of North Dakota, Grand Forks, ND 58202, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hassan","family":"Reza","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of North Dakota, Grand Forks, ND 58202, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2018,8,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"72","DOI":"10.1109\/TPDS.2016.2549523","article-title":"Dissecting GPU Memory Hierarchy through Microbenchmarking","volume":"28","author":"Mei","year":"2017","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008, January 11\u201315). Scalable parallel programming with CUDA. Proceedings of the ACM SIGGRAPH 2008 Classes, Los Angeles, CA, USA.","DOI":"10.1145\/1401132.1401152"},{"key":"ref_3","unstructured":"NVIDIA Corporation (2008). CUDA\u00ae: A General-Purpose Parallel Computing Platform and Programming Model. NVIDIA CUDA Programming Guide, NVIDIA. Version 2.0."},{"key":"ref_4","unstructured":"(2017, October 02). NVidia Developer\u2019s Guide, about CUDA. Available online: https:\/\/developer.nvidia.com\/about-cuda."},{"key":"ref_5","first-page":"7577","article-title":"A Survey on Performance Modelling and Optimization Techniques for SpMV on GPUs","volume":"5","author":"Kulkarni","year":"2014","journal-title":"Int. J. Comput. Sci. Inf. Technol."},{"key":"ref_6","unstructured":"Baskaran, M.M., and Bordawekar, R. (Ninth SIAM Conf. Parallel Process. Sci. Comput., 2009). Optimizing Sparse Matrix-Vector Multiplication on GPUs, Ninth SIAM Conf. Parallel Process. Sci. Comput., IBM Research Report no. RC24704."},{"key":"ref_7","unstructured":"Bell, N., and Garland, M. (2010, January 22\u201324). Efficient Sparse Matrix-Vector Multiplication on CUDA. Proceedings of the 2010 2nd International Conference on Education Technology and Computer, Shanghai, China."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1177\/1094342015593156","article-title":"A hybrid format for better performance of sparse matrix-vector multiplication on a GPU","volume":"30","author":"Guo","year":"2016","journal-title":"Int. J. High Perform. Comput. Appl."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"3057","DOI":"10.1080\/00268970701757875","article-title":"A mathematical and computational review of Hartree-Fock SCF methods in quantum chemistry","volume":"105","author":"Echenique","year":"2007","journal-title":"Mol. Phys."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Anthony, B.M., and Stevens, J. (2017). Virtually going green: The role of quantum computational chemistry in reducing pollution and toxicity in chemistry. Phys. Sci. Rev., 2.","DOI":"10.1515\/psr-2017-0005"},{"key":"ref_11","first-page":"241","article-title":"The Configuration Interaction Method","volume":"1","author":"Siegbahn","year":"2000","journal-title":"Eur. Summer Sch. Quantum Chem."},{"key":"ref_12","first-page":"1","article-title":"Introduction to Full Configuration Interaction Quantum Monte Carlo with Applications to the Hubbard model Stuttgart","volume":"6","author":"Alavi","year":"2016","journal-title":"Autumn Sch. Correl. Electrons"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Aktulga, H.M., Buluc, A., Williams, S., and Yang, C. (2014, January 19\u201323). Optimizing sparse matrix-multiple vectors multiplication for nuclear configuration interaction calculations. Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA.","DOI":"10.1109\/IPDPS.2014.125"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"3366","DOI":"10.1007\/s11227-015-1571-0","article-title":"Adaptive sparse matrix representation for efficient matrix-vector multiplication","volume":"72","author":"Zardoshti","year":"2016","journal-title":"J. Supercomput."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Ren, Z., Ye, C., and Liu, G. (2010, January 15\u201317). Application and Research of C Language Programming Examination System Based on B\/S. Proceedings of the 2010 Third International Symposium on Information Processing, Qingdao, China.","DOI":"10.1109\/ISIP.2010.53"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Bell, N., and Garland, M. (2009, January 14\u201320). Implementing sparse matrix-vector multiplication on throughput-oriented processors. Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, Portland, OR, USA.","DOI":"10.1145\/1654059.1654078"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"49","DOI":"10.2528\/PIER11031607","article-title":"A memory efficient and fast sparse matrix vector product on a GPU","volume":"116","author":"Dziekonski","year":"2011","journal-title":"Prog. Electromagn. Res."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Ye, F., Calvin, C., and Petiton, S.G. (2015). A study of SpMV implementation using MPI and OpenMP on intel many-core architecture. VECPAR 2014: High Performance Computing for Computational Science\u2014VECPAR 2014, Springer.","DOI":"10.1007\/978-3-319-17353-5_4"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Maggioni, M., and Berger-Wolf, T. (2013, January 1\u20134). AdELL: An adaptive warp-Balancing ELL format for efficient sparse matrix-Vector multiplication on GPUs. Proceedings of the 2013 42nd International Conference on Parallel Processing, Lyon, France.","DOI":"10.1109\/ICPP.2013.10"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Eguly, I.R., and Giles, M. (2012, January 13\u201314). Efficient sparse matrix-vector multiplication on cache-based GPUs. Proceedings of the 2012 Innovative Parallel Computing (InPar), San Jose, CA, USA.","DOI":"10.1109\/InPar.2012.6339602"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Bulu\u00e7, A., Fineman, J.T., Frigo, M., Gilbert, J.R., and Leiserson, C.E. (2009, January 11\u201313). Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks. Proceedings of the Twenty-First Annual Symposium on Parallelism in Algorithms and Architectures, Calgary, AB, Canada.","DOI":"10.1145\/1583991.1584053"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1550","DOI":"10.1109\/TPDS.2016.2630699","article-title":"A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations","volume":"28","author":"Aktulga","year":"2016","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_23","unstructured":"(2018, July 19). CUDA Toolkit Documentation. nvprof. Available online: https:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/index.html#nvprof-overview."},{"key":"ref_24","unstructured":"(2018, July 16). CUDA Toolkit Documentation, cuSPARSE. Available online: https:\/\/docs.nvidia.com\/cuda\/cusparse\/index.html."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Liu, W., and Vinter, B. (2015, January 8\u201311). CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication Categories and Subject Descriptors. Proceedings of the 29th ACM on International Conference on Supercomputing, Newport Beach, CA, USA.","DOI":"10.1145\/2751205.2751209"}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/6\/3\/45\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T15:20:56Z","timestamp":1760196056000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/6\/3\/45"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,24]]},"references-count":25,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2018,9]]}},"alternative-id":["computation6030045"],"URL":"https:\/\/doi.org\/10.3390\/computation6030045","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,8,24]]}}}