{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,3,28]],"date-time":"2025-03-28T08:05:05Z","timestamp":1743149105006,"version":"3.40.3"},"publisher-location":"Cham","reference-count":17,"publisher":"Springer International Publishing","isbn-type":[{"type":"print","value":"9783031104183"},{"type":"electronic","value":"9783031104190"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,1]],"date-time":"2022-07-01T00:00:00Z","timestamp":1656633600000},"content-version":"vor","delay-in-days":181,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The physical property of the Hubbard model can be understood by solving the eigenvalue problem for the Hamiltonian derived from the model. Since the Hamiltonian is a large sparse matrix, an iteration method is usually utilized for solving the problems. One of effectual solvers for this problem is the LOBPCG (Locally Optimal Block Preconditioned Conjugate Gradient) method. The tuning strategies of the method on GPU systems when all iteration vectors are stored in device memory have been proposed. In this research, we propose tuning strategies for parallel LOBPCG method on multi-GPU system when the Hamiltonian is large and some iteration vectors are stored in host memory. When the LOBPCG method is used for solving multi eigenpairs (eigenvalues and the corresponding eigenvectors), the number of iteration vectors, whose size is the same as the dimension of the Hamiltonian, is proportional to the number of the eigenpairs. 
On the other hand, the memory consumption for the non-zero elements of the Hamiltonian can be significantly reduced by considering the regular arrangement of the elements. Therefore, when we execute the LOBPCG method for a large Hamiltonian on GPUs, some of the vectors have to be stored on host memory and have to be transferred between host and device memory as needed. Since the cost of the data transfer is very large, we also propose the optimization for it. The simulation result on a multi-GPU system shows that the optimization of the data transfer is very effective for high performance computing.<\/jats:p>","DOI":"10.1007\/978-3-031-10419-0_1","type":"book-chapter","created":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T17:07:51Z","timestamp":1656608871000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["High Performance Parallel LOBPCG Method for\u00a0Large Hamiltonian Derived from\u00a0Hubbard Model on\u00a0Multi-GPU Systems"],"prefix":"10.1007","author":[{"given":"Susumu","family":"Yamada","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Toshiyuki","family":"Imamura","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Masahiko","family":"Machida","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,7,1]]},"reference":[{"key":"1_CR1","unstructured":"Anzt, H., Tomov, S., Dongarra, J.: Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product. In: Proceedings of the Symposium on High Performance Computing, pp. 
75\u201382 (2015)"},{"key":"1_CR2","doi-asserted-by":"publisher","first-page":"A206","DOI":"10.1137\/080731992","volume":"34","author":"J Demmel","year":"2012","unstructured":"Demmel, J., Grigori, L., Hoemmen, M., Langou, J.: Communication-optimal parallel and sequential QR and LU factorizations. SIAM J. Sci. Comput. 34, A206\u2013A239 (2012). https:\/\/doi.org\/10.1137\/080731992","journal-title":"SIAM J. Sci. Comput."},{"key":"1_CR3","doi-asserted-by":"publisher","first-page":"C655","DOI":"10.1137\/17M1129830","volume":"40","author":"JA Duersch","year":"2018","unstructured":"Duersch, J.A., Gu, M., Shao, M., Yang, C.: A robust and efficient implementation of LOBPCG. SIAM J. Sci. Comput. 40, C655\u2013C676 (2018). https:\/\/doi.org\/10.1137\/17M1129830","journal-title":"SIAM J. Sci. Comput."},{"key":"1_CR4","unstructured":"Furuya, T., Nakatsukasa, Y., Yanagisawa, Y., Yamamoto, Y.: CholeskyQR2: a simple and communication-avoiding algorithm for computing a Tall-Skinny QR factorization on a large-scale parallel system. In: ScalA 2014 (2014)"},{"key":"1_CR5","doi-asserted-by":"publisher","first-page":"324","DOI":"10.1016\/j.jcp.2006.02.007","volume":"218","author":"U Hetmaniuk","year":"2006","unstructured":"Hetmaniuk, U., Lehoucq, R.: Basis selection in LOBPCG. J. Comput. Phys. 218, 324\u2013332 (2006)","journal-title":"J. Comput. Phys."},{"key":"1_CR6","doi-asserted-by":"publisher","first-page":"2339","DOI":"10.1016\/j.jcp.2009.11.038","volume":"229","author":"JI Iwata","year":"2010","unstructured":"Iwata, J.I., et al.: A massively-parallel electronic-structure calculations based on real-space density functional theory. J. Comput. Phys. 229, 2339\u20132363 (2010). https:\/\/doi.org\/10.1016\/j.jcp.2009.11.038","journal-title":"J. Comput. Phys."},{"key":"1_CR7","first-page":"104","volume":"7","author":"AV Knyazev","year":"1998","unstructured":"Knyazev, A.V.: Preconditioned Eigensolvers - an oxymoron? Electron. Trans. Numer. Anal. 
7, 104\u2013123 (1998)","journal-title":"Electron. Trans. Numer. Anal."},{"key":"1_CR8","doi-asserted-by":"publisher","first-page":"517","DOI":"10.1137\/S1064827500366124","volume":"23","author":"AV Knyazev","year":"2001","unstructured":"Knyazev, A.V.: Toward the optimal Eigensolver: locally optimal block preconditioned conjugate gradient method. SIAM J. Sci. Comput. 23, 517\u2013541 (2001)","journal-title":"SIAM J. Sci. Comput."},{"key":"1_CR9","doi-asserted-by":"publisher","unstructured":"Montorsi, A. (ed.): The Hubbard Model: A Collection on Reprints. World Scientific, Singapore (1992). https:\/\/doi.org\/10.1142\/1346","DOI":"10.1142\/1346"},{"key":"1_CR10","series-title":"Lecture Notes in Computer Science","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1007\/978-3-030-49943-3_4","volume-title":"Accelerator Programming Using Directives","author":"F Rabbi","year":"2020","unstructured":"Rabbi, F., Daley, C.S., Aktulga, H.M., Wright, N.J.: Evaluation of directive-based GPU programming models on a block Eigensolver with consideration of large sparse matrices. In: Wienke, S., Bhalachandra, S. (eds.) WACCPD 2019. LNCS, vol. 12017, pp. 66\u201388. Springer, Cham (2020). https:\/\/doi.org\/10.1007\/978-3-030-49943-3_4"},{"key":"1_CR11","doi-asserted-by":"publisher","unstructured":"Rasetti, M. (ed.): The Hubbard Model: Recent Results. World Scientific, Singapore (1991). https:\/\/doi.org\/10.1142\/1377","DOI":"10.1142\/1377"},{"key":"1_CR12","doi-asserted-by":"publisher","first-page":"1884","DOI":"10.1016\/j.cpc.2012.04.006","volume":"183","author":"T Siro","year":"2012","unstructured":"Siro, T., Harju, A.: Exact diagonalization of the Hubbard model on graphics processing units. Comp. Phy. Comm. 183, 1884\u20131889 (2012)","journal-title":"Comp. Phy. 
Comm."},{"key":"1_CR13","doi-asserted-by":"publisher","first-page":"2165","DOI":"10.1137\/S1064827500370883","volume":"23","author":"A Stathopoulos","year":"2002","unstructured":"Stathopoulos, A., Wu, K.: A block orthogonalization procedure with constant synchronization requirements. SIAM J. Sci. Comput. 23, 2165\u20132182 (2002). https:\/\/doi.org\/10.1137\/S1064827500370883","journal-title":"SIAM J. Sci. Comput."},{"key":"1_CR14","unstructured":"Yamada, S., Imamura, T., Machida, M.: 16.447 TFlops and 159-billion-dimensional exact-diagonalization for trapped Fermion-Hubbard model on the earth simulator. In: Proceedings of SC05 (2005)"},{"key":"1_CR15","doi-asserted-by":"publisher","unstructured":"Yamada, S., Imamura, T., Machida, M.: High performance eigenvalue solver in exact-diagonalization method for Hubbard model on CUDA GPU. In: Joubert, G.R., Leather, H., Parsons, M., Peters, F., Sawyer, M. (eds.) Parallel Computing: On the road to Exascale. Advances in Parallel Computing, vol. 27, pp. 361\u2013369. IOS (2016). https:\/\/doi.org\/10.3233\/978-1-61499-621-7-361","DOI":"10.3233\/978-1-61499-621-7-361"},{"key":"1_CR16","doi-asserted-by":"publisher","unstructured":"Yamada, S., Imamura, T., Machida, M.: Communication avoiding Neumann expansion preconditioner for LOBPCG method: convergence property of exact diagonalization method for Hubbard model. In: Bassini, S., Danelutto, M., Dazzi, P., Joubert, G.R., Peters, F. (eds.) Parallel Computing is Everywhere. Advances in Parallel Computing, vol. 32, pp. 27\u201336. IOS (2018). https:\/\/doi.org\/10.3233\/978-1-61499-843-3-27","DOI":"10.3233\/978-1-61499-843-3-27"},{"key":"1_CR17","doi-asserted-by":"publisher","unstructured":"Yamada, S., Imamura, T., Machida, M.: High performance eigenvalue solver for Hubbard model: tuning strategies for LOBPCG method on CUDA GPU. In: Foster, I., Joubert, G.R., Ku\u010dera, L., Nagel, W.E., Peters, F. (eds.) Parallel Computing: Technology Trends. Advances in Parallel Computing, vol. 
36, pp. 105\u2013113. IOS (2020). https:\/\/doi.org\/10.3233\/APC200030","DOI":"10.3233\/APC200030"}],"container-title":["Lecture Notes in Computer Science","Supercomputing Frontiers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-10419-0_1","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,30]],"date-time":"2022-06-30T17:13:11Z","timestamp":1656609191000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-10419-0_1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783031104183","9783031104190"],"references-count":17,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-10419-0_1","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"type":"print","value":"0302-9743"},{"type":"electronic","value":"1611-3349"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"1 July 2022","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"SCFA","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Asian Conference on Supercomputing Frontiers","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Singapore","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Singapore","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2022","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"1 March 
2022","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"3 March 2022","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"7","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"scfa2022","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Single-blind","order":1,"name":"type","label":"Type","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"EasyChair","order":2,"name":"conference_management_system","label":"Conference Management System","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"21","order":3,"name":"number_of_submissions_sent_for_review","label":"Number of Submissions Sent for Review","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"8","order":4,"name":"number_of_full_papers_accepted","label":"Number of Full Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"0","order":5,"name":"number_of_short_papers_accepted","label":"Number of Short Papers Accepted","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"38% - The value is computed by the equation \"Number of Full Papers Accepted \/ Number of Submissions Sent for Review * 100\" and then rounded to a whole number.","order":6,"name":"acceptance_rate_of_full_papers","label":"Acceptance Rate of Full 
Papers","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.8","order":7,"name":"average_number_of_reviews_per_paper","label":"Average Number of Reviews per Paper","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"3.5","order":8,"name":"average_number_of_papers_per_reviewer","label":"Average Number of Papers per Reviewer","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}},{"value":"No","order":9,"name":"external_reviewers_involved","label":"External Reviewers Involved","group":{"name":"ConfEventPeerReviewInformation","label":"Peer Review Information (provided by the conference organizers)"}}]}}