{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T06:09:52Z","timestamp":1763705392904,"version":"3.41.0"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2017,9,30]],"date-time":"2017-09-30T00:00:00Z","timestamp":1506729600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"the Scientific and Technological Research Council of Turkey","award":["EEEAG-115E512"],"award-info":[{"award-number":["EEEAG-115E512"]}]},{"DOI":"10.13039\/501100000921","name":"European Cooperation in Science and Technology","doi-asserted-by":"publisher","award":["Action IC1406 (cHiPSet)"],"award-info":[{"award-number":["Action IC1406 (cHiPSet)"]}],"id":[{"id":"10.13039\/501100000921","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Parallel Comput."],"published-print":{"date-parts":[[2017,9,30]]},"abstract":"<jats:p>We investigate outer-product--parallel, inner-product--parallel, and row-by-row-product--parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a hypergraph model and a bipartite graph model for distributing SpGEMM computations based on one-dimensional (1D) partitioning of input matrices. We also propose a communication hypergraph model for each formulation for distributing communication operations. The computational graph and hypergraph models adopted in the first phase aim at minimizing the total message volume and balancing the computational loads of processors, whereas the communication hypergraph models adopted in the second phase aim at minimizing the total message count and balancing the message volume loads of processors. That is, the computational partitioning models reduce the bandwidth cost and the communication hypergraph models reduce the latency cost. Our extensive parallel experiments on up to 2048 processors for a wide range of realistic SpGEMM instances show that although the outer-product--parallel formulation scales better, the row-by-row-product--parallel formulation is more viable due to its significantly lower partitioning overhead and competitive scalability. For computational partitioning models, our experimental findings indicate that the proposed bipartite graph models are attractive alternatives to their hypergraph counterparts because of their lower partitioning overhead. Finally, we show that by reducing the latency cost besides the bandwidth cost through using the communication hypergraph models, the parallel SpGEMM time can be further improved up to 32%.<\/jats:p>","DOI":"10.1145\/3155292","type":"journal-article","created":{"date-parts":[[2018,1,3]],"date-time":"2018-01-03T13:19:43Z","timestamp":1514985583000},"page":"1-34","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication"],"prefix":"10.1145","volume":"4","author":[{"given":"Kadir","family":"Akbudak","sequence":"first","affiliation":[{"name":"Bilkent University, Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Oguz","family":"Selvitopi","sequence":"additional","affiliation":[{"name":"Bilkent University, Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Cevdet","family":"Aykanat","sequence":"additional","affiliation":[{"name":"Bilkent University, Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2018,1,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1137\/13092589X"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2017.2656893"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2007.09.006"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2015.75"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2486159.2486196"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2755573.2755613"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/110838844"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02024486"},{"volume-title":"Technical Report. SAND2004-6440","year":"2005","author":"Boman Erik G.","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2014.03.012"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898719505"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2008.4536313"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342011403516"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1137\/110848244"},{"key":"e_1_2_1_15_1","unstructured":"Lynn Cannon. 1969. A Cellular Computer to Implement the Kalman Filter Algorithm. Ph.D. Dissertation. Montana State University Bozeman MN.  Lynn Cannon. 1969. A Cellular Computer to Implement the Kalman Filter Algorithm. Ph.D. Dissertation. Montana State University Bozeman MN."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.780863"},{"key":"e_1_2_1_17_1","unstructured":"\u00dcmit V. \u00c7ataly\u00fcrek and Cevdet Aykanat. 1999b. PaToH: A Multilevel Hypergraph Partitioning Tool Version 3.0. Computer Engineering Department Bilkent University Ankara Turkey.  \u00dcmit V. \u00c7ataly\u00fcrek and Cevdet Aykanat. 1999b. PaToH: A Multilevel Hypergraph Partitioning Tool Version 3.0. Computer Engineering Department Bilkent University Ankara Turkey."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.477969"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0010-4655(00)00074-6"},{"key":"e_1_2_1_20_1","unstructured":"CP2K. 2016. CP2K home page. Retrieved from http:\/\/www.cp2k.org\/.  CP2K. 2016. CP2K home page. Retrieved from http:\/\/www.cp2k.org\/."},{"key":"e_1_2_1_21_1","unstructured":"Steven Dalton Nathan Bell and Luke Olson. 2013. Optimizing Sparse Matrix-Matrix Multiplication for the GPU. Technical report. NVIDIA.  Steven Dalton Nathan Bell and Luke Olson. 2013. Optimizing Sparse Matrix-Matrix Multiplication for the GPU. Technical report. NVIDIA."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.474404"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9780898718881"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2049662.2049663","article-title":"The university of florida sparse matrix collection","volume":"38","author":"Davis Timothy A.","year":"2011","journal-title":"ACM Transactions on Mathematical Software"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.80"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1007\/s101070100263"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1137\/130948811"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/355791.355796"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-36803-5_14"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/0010-4655(95)00031-A"},{"key":"e_1_2_1_32_1","doi-asserted-by":"crossref","unstructured":"George Karypis Anshul Gupta and Vipin Kumar. 1994. A parallel formulation of interior point algorithms. In Supercomputing\u201994.  George Karypis Anshul Gupta and Vipin Kumar. 1994. A parallel formulation of interior point algorithms. In Supercomputing\u201994.","DOI":"10.1145\/602770.602808"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1997.1403"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"Thomas Lengauer. 1990. Combinatorial Algorithms for Integrated Circuit Layout. Willey--Teubner Chichester UK.  Thomas Lengauer. 1990. Combinatorial Algorithms for Integrated Circuit Layout. Willey--Teubner Chichester UK.","DOI":"10.1007\/978-3-322-92106-2"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevB.47.10891"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIC.2003.1167344"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.47"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.473579"},{"key":"e_1_2_1_40_1","unstructured":"Intel MKL. 2015. Math Kernel Library (MKL). Retrieved from http:\/\/software.intel.com\/en-us\/articles\/intel-mkl\/.  Intel MKL. 2015. Math Kernel Library (MKL). Retrieved from http:\/\/software.intel.com\/en-us\/articles\/intel-mkl\/."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.2172\/1029781"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2009.83"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2016.2545664"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Md Mostofa Ali Patwary Nadathur Rajagopalan Satish Narayanan Sundaram Jongsoo Park Michael J. Anderson Satya Gautam Vadlamudi Dipankar Das Sergey G. Pudov Vadim O. Pirogov and Pradeep Dubey. 2015. Parallel efficient sparse matrix-matrix multiplication on multicore platforms. In High Performance Computing. Springer 48--57.  Md Mostofa Ali Patwary Nadathur Rajagopalan Satish Narayanan Sundaram Jongsoo Park Michael J. Anderson Satya Gautam Vadlamudi Dipankar Das Sergey G. Pudov Vadim O. Pirogov and Pradeep Dubey. 2015. Parallel efficient sparse matrix-matrix multiplication on multicore platforms. In High Performance Computing. Springer 48--57.","DOI":"10.1007\/978-3-319-20119-1_4"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1063\/1.1372182"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2016.04.004"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2063384.2063487"},{"key":"e_1_2_1_48_1","unstructured":"Total-FETI. 2016. Total-FETI Massively Parallel Implementation Research Group. Retrieved from http:\/\/spomech.vsb.cz\/feti\/.  Total-FETI. 2016. Total-FETI Massively Parallel Implementation Research Group. Retrieved from http:\/\/spomech.vsb.cz\/feti\/."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827502410463"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1096-9128(199704)9:4%3C255::AID-CPE250%3E3.0.CO;2-2"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1021\/ct200897x"}],"container-title":["ACM Transactions on Parallel Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3155292","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3155292","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:26:28Z","timestamp":1750213588000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3155292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9,30]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2017,9,30]]}},"alternative-id":["10.1145\/3155292"],"URL":"https:\/\/doi.org\/10.1145\/3155292","relation":{},"ISSN":["2329-4949","2329-4957"],"issn-type":[{"type":"print","value":"2329-4949"},{"type":"electronic","value":"2329-4957"}],"subject":[],"published":{"date-parts":[[2017,9,30]]},"assertion":[{"value":"2016-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2017-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-01-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}