{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,29]],"date-time":"2025-09-29T08:05:23Z","timestamp":1759133123526,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":25,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,8,13]],"date-time":"2018-08-13T00:00:00Z","timestamp":1534118400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research and the Exascale Computing Project"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,8,13]]},"DOI":"10.1145\/3225058.3225100","type":"proceedings-article","created":{"date-parts":[[2018,8,8]],"date-time":"2018-08-08T19:13:06Z","timestamp":1533755586000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Vectorized Parallel Sparse Matrix-Vector Multiplication in PETSc Using AVX-512"],"prefix":"10.1145","author":[{"given":"Hong","family":"Zhang","sequence":"first","affiliation":[{"name":"Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Richard T.","family":"Mills","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA"}]},{"given":"Karl","family":"Rupp","sequence":"additional","affiliation":[{"name":"Institute for Microelectronics, TU Wien, Wien, Austria"}]},{"given":"Barry F.","family":"Smith","sequence":"additional","affiliation":[{"name":"Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA"}]}],"member":"320","published-online":{"date-parts":[[2018,8,13]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Karl Rupp, 
Patrick Sanan, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang.","author":"Balay Satish","year":"2017","unstructured":"Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Victor Eijkhout, William D. Gropp, Dinesh Kaushik, Matthew G. Knepley, Dave A. May, Lois Curfman McInnes, Karl Rupp, Patrick Sanan, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. 2017. PETSc Users Manual. Technical Report ANL-95\/11 - Revision 3.8. Argonne National Laboratory. http:\/\/www.mcs.anl.gov\/petsc"},
{"key":"e_1_3_2_1_2_1","volume-title":"Henri Vincenti, Samuel Williams, Pierre Carrier, Nathan Wichmann, Marcus Wagner, Paul Kent, Christopher Kerr, and John Dennis.","author":"Barnes Taylor","year":"2017","unstructured":"Taylor Barnes, Brandon Cook, Jack Deslippe, Douglas Doerfler, Brian Friesen, Yun He, Thorsten Kurth, Tuomas Koskela, Mathieu Lobet, Tareq Malas, Leonid Oliker, Andrey Ovsyannikov, Abhinav Sarje, Jean Luc Vay, Henri Vincenti, Samuel Williams, Pierre Carrier, Nathan Wichmann, Marcus Wagner, Paul Kent, Christopher Kerr, and John Dennis. 2017. Evaluating and optimizing the NERSC workload on Knights Landing. In Proceedings of PMBS 2016: 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems - Held in conjunction with SC 2016: The International Conference for High Performance Computing, Networking, St. 43--53."},
{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654078"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCASM.2010.5623237"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1837853.1693471"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434200101500106"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/11428831_13"},
{"key":"e_1_3_2_1_8_1","volume-title":"Jean Luc Vay, and Henri Vincenti","author":"Doerfler Douglas","year":"2016","unstructured":"Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq Malas, Jean Luc Vay, and Henri Vincenti. 2016. Applying the roofline performance model to the Intel Xeon Phi Knights Landing processor. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9945 LNCS. 339--353."},
{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-008-0251-8"},
{"key":"e_1_3_2_1_10_1","series-title":"Springer series in","volume-title":"Verwer","author":"Hundsdorfer Willem","year":"2003","unstructured":"Willem Hundsdorfer and Jan G. Verwer. 2003. Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations. Number 33 in Springer series in computational mathematics. Springer."},
{"key":"e_1_3_2_1_11_1","volume-title":"Optimizing Sparse Matrix Vector Multiplication on SMPs. In Ninth SIAM Conference on Parallel Processing for Scientific Computing.","author":"Im Eun-Jin","year":"1999","unstructured":"Eun-Jin Im and Katherine Yelick. 1999. Optimizing Sparse Matrix Vector Multiplication on SMPs. In Ninth SIAM Conference on Parallel Processing for Scientific Computing."},
{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342004041296"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2012.211"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/130930352"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2464996.2465013"},
{"key":"e_1_3_2_1_16_1","volume-title":"Terry J. Ligocki, Matthew J. Cordery, Nicholas J. Wright, Mary W. Hall, and Leonid Oliker.","author":"Lo Yu Jung","year":"2015","unstructured":"Yu Jung Lo, Samuel Williams, Brian Van Straalen, Terry J. Ligocki, Matthew J. Cordery, Nicholas J. Wright, Mary W. Hall, and Leonid Oliker. 2015. Roofline model toolkit: A practical tool for architectural and program analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8966. 129--148."},
{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11515-8_10"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.5555\/3111979.3112135"},
{"key":"e_1_3_2_1_20_1","volume-title":"Complex Patterns in a Simple System. Science 261, 5118","author":"Pearson John E.","year":"1993","unstructured":"John E. Pearson. 1993. Complex Patterns in a Simple System. Science 261, 5118 (1993), 189--192."},
{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/331532.331562"},
{"key":"e_1_3_2_1_22_1","volume-title":"Catalyurek","author":"Saule Erik","year":"2013","unstructured":"Erik Saule, Kamer Kaya, and Umit V. Catalyurek. 2013. Performance evaluation of sparse matrix multiplication kernels on Intel Xeon Phi. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 8384 LNCS. 559--570. arXiv:1302.1078v1"},
{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.1658"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.12.006"},
{"key":"e_1_3_2_1_25_1","volume-title":"Efficient CUDA polynomial preconditioned conjugate gradient solver for finite element computation of elasticity problems. Mathematical Problems in Engineering 2013","author":"Zhang Jianfei","year":"2013","unstructured":"Jianfei Zhang and Lei Zhang. 2013. Efficient CUDA polynomial preconditioned conjugate gradient solver for finite element computation of elasticity problems. Mathematical Problems in Engineering 2013 (2013)."},
{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2014.03.002"}],"event":{"name":"ICPP 2018: 47th International Conference on Parallel Processing","sponsor":["University of Oregon"],"location":"Eugene OR USA","acronym":"ICPP 2018"},"container-title":["Proceedings of the 47th International Conference on Parallel 
Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3225058.3225100","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3225058.3225100","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:39:07Z","timestamp":1750210747000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3225058.3225100"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,8,13]]},"references-count":25,"alternative-id":["10.1145\/3225058.3225100","10.1145\/3225058"],"URL":"https:\/\/doi.org\/10.1145\/3225058.3225100","relation":{},"subject":[],"published":{"date-parts":[[2018,8,13]]},"assertion":[{"value":"2018-08-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}