{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T04:10:05Z","timestamp":1754107805950,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":37,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ERC","award":["802554"],"award-info":[{"award-number":["802554"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3473519","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:46:04Z","timestamp":1633459564000},"page":"1-12","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose"],"prefix":"10.1145","author":[{"given":"Viviana","family":"Arrigoni","sequence":"first","affiliation":[{"name":"Sapienza, University of Rome"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Filippo","family":"Maggioli","sequence":"additional","affiliation":[{"name":"Sapienza, University of Rome"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Annalisa","family":"Massini","sequence":"additional","affiliation":[{"name":"Sapienza, University of Rome"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emanuele","family":"Rodol\u00e0","sequence":"additional","affiliation":[{"name":"Sapienza, University of Rome"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"volume-title":"Proc. 24th ACM Symp. 
Parallelism in Algorithms and Architectures (SPAA). 193\u2013204","author":"Ballard G.","key":"e_1_3_2_1_1_1","unstructured":"G. Ballard, J. Demmel, O. Holtz, B. Lipshitz, and O. Schwartz. 2012. Communication-optimal parallel algorithm for strassen\u2019s matrix multiplication. In Proc. 24th ACM Symp. Parallelism in Algorithms and Architectures (SPAA). 193\u2013204."},{"volume-title":"Proc. 23rd ACM Symp. Parallelism in Algorithms and Architectures (SPAA). 1\u201312","author":"Ballard G.","key":"e_1_3_2_1_2_1","unstructured":"G. Ballard, J. Demmel, O. Holtz, and O. Schwartz. 2011. Graph expansion and communication costs of fast matrix multiplication. In Proc. 23rd ACM Symp. Parallelism in Algorithms and Architectures (SPAA). 1\u201312."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2395116.2395121"},{"volume-title":"Proc. 20th ACM SIGPLAN PPoPP. 42\u201353","author":"Benson A.R.","key":"e_1_3_2_1_4_1","unstructured":"A.R. Benson and G. Ballard. 2015. A Framework for Practical Parallel Fast Matrix Multiplication. In Proc. 20th ACM SIGPLAN PPoPP. 42\u201353."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02308867"},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267101"},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"A. Charara H. Ltaief and D. Keyes. 2016. 
Redesigning triangular dense matrix computations on GPUs. In Euro-Par. Springer 477\u2013489.","DOI":"10.1007\/978-3-319-43659-3_35"},{"volume-title":"Proc. 19th ACM Symp. Theory of Computing (STOC). 1\u20136.","author":"Coppersmith D.","key":"e_1_3_2_1_9_1","unstructured":"D. Coppersmith and S. Winograd. 1987. Matrix multiplication via arithmetic progressions. In Proc. 19th ACM Symp. Theory of Computing (STOC). 1\u20136."},{"volume-title":"Proc. 21st Int. Conf. on Supercomputing. ACM, 284\u2013292","author":"D\u2019Alberto P.","key":"e_1_3_2_1_10_1","unstructured":"P. D\u2019Alberto and A. Nicolau. 2007. Adaptive Strassen\u2019s matrix multiplication. In Proc. 21st Int. Conf. on Supercomputing. ACM, 284\u2013292."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.80"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/1064364.1064367"},{"volume-title":"Proc. 45th Int. Symp. Symbolic and Algebraic Computation","author":"Dumas J.","key":"e_1_3_2_1_13_1","unstructured":"J. Dumas, C. Pernet, and A. Sedoglavic. 2020. On Fast Multiplication of a Matrix by Its Transpose. In Proc. 45th Int. Symp. 
Symbolic and Algebraic Computation (Kalamata, Greece) (ISSAC \u201920). Association for Computing Machinery, New York, NY, USA, 162\u2013169. https:\/\/doi.org\/10.1145\/3373207.3404021"},{"key":"e_1_3_2_1_14_1","volume-title":"Frpa: A framework for recursive parallel algorithms. Technical Report UCB\/EECS-2015-28. EECS Department","author":"Eliahu D.","year":"2015","unstructured":"D. Eliahu, O. Spillinger, A. Fox, and J. Demmel. 2015. Frpa: A framework for recursive parallel algorithms. Technical Report UCB\/EECS-2015-28. EECS Department, University of California, Berkeley."},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"E. Elmroth F. Gustavson I. Jonsson and B. K\u00e5gstr\u00f6m. 2004. Recursive blocked algorithms and hybrid data structures for dense matrix library software. SIAM review 46 1 (2004) 3\u201345.","DOI":"10.1137\/S0036144503428693"},{"key":"e_1_3_2_1_16_1","volume-title":"https:\/\/software.intel.com\/en-us\/download\/developer-reference-for-intel-math-kernel-library-c","author":"Intel\u00ae Math Kernel Developer\u00a0Reference","year":"2019","unstructured":"[16] Developer\u00a0Reference for Intel\u00ae Math Kernel Library\u00a0C.2019. (2019). https:\/\/software.intel.com\/en-us\/download\/developer-reference-for-intel-math-kernel-library-c"},{"volume-title":"40th Symp. Foundations of Computer Science (FOCS). 
IEEE, 285\u2013297","author":"Frigo M.","key":"e_1_3_2_1_17_1","unstructured":"M. Frigo, C.\u00a0E. Leiserson, H. Prokop, and S. Ramachandran. 1999. Cache-oblivious algorithms. In 40th Symp. Foundations of Computer Science (FOCS). IEEE, 285\u2013297."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2608628.2608664"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129626496000029"},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/98267.98290"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2008.03.003"},{"volume-title":"Proc. ACM\/IEEE Conf. on Supercomputing.","author":"Huss-Lederman S.","key":"e_1_3_2_1_22_1","unstructured":"S. Huss-Lederman, E.M. Jacobson, A. Tsao, T. Turnbull, and J.R. Johnson. 1996. Implementation of Strassen\u2019s algorithm for matrix multiplication. In Proc. ACM\/IEEE Conf. on Supercomputing."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/800076.802486"},{"volume-title":"Computer Science On-line Conference. Springer, 186\u2013196","author":"Kadhum M.","key":"e_1_3_2_1_24_1","unstructured":"M. Kadhum, M.\u00a0H. Qasem, A. Sleit, and A. Sharieh. 2017. Efficient MapReduce matrix multiplication with optimized mapper set. In Computer Science On-line Conference. Springer, 186\u2013196."},{"key":"e_1_3_2_1_25_1","volume-title":"Int. 
Workshop on Applied Parallel Computing. Springer, 21\u201332","author":"K\u00e5gstr\u00f6m Bo","year":"2004","unstructured":"Bo K\u00e5gstr\u00f6m. 2004. Management of deep memory hierarchies\u2013recursive blocked algorithms and hybrid data structures for dense matrix computations. In Int. Workshop on Applied Parallel Computing. Springer, 21\u201332."},{"volume-title":"Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis(SC \u201919)","author":"Kwasniewski G.","key":"e_1_3_2_1_26_1","unstructured":"G. Kwasniewski, M. Kabi\u0107, M. Besta, J. VandeVondele, R. Solc\u00e0, and T. Hoefler. 2019. Red-Blue Pebbling Revisited: Near Optimal Parallel Matrix-Matrix Multiplication. In Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis(SC \u201919). Article 24, 22\u00a0pages. https:\/\/doi.org\/10.1145\/3295500.3356181"},{"volume-title":"Proc. ACM Symp. Applied Computing, SAC\u201995","author":"Luo Q.","key":"e_1_3_2_1_27_1","unstructured":"Q. Luo and J. Drake. 1995. A scalable parallel Strassen\u2019s matrix multiplication algorithm for distributed-memory computers. In Proc. ACM Symp. Applied Computing, SAC\u201995. 
221\u2013226."},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061664"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/IT-DREPS.2017.8277807"},{"volume-title":"Proc. Parallel and Distributed Computing and Systems, (PDCS).","author":"Song F.","key":"e_1_3_2_1_30_1","unstructured":"F. Song, J. Dongarra, and S. Moore. 2006. Experiments with Strassen\u2019s algorithm: From sequential to parallel. In Proc. Parallel and Distributed Computing and Systems, (PDCS)."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0885-064X(02)00007-9"},{"volume-title":"Linear Algebra and Its Applications","author":"Strang G.","key":"e_1_3_2_1_32_1","unstructured":"G. Strang. 2006. Linear Algebra and Its Applications, Fourth Ed. Thomson Brooks\/Cole."},{"key":"e_1_3_2_1_33_1","volume-title":"Gaussian elimination is not optimal. Numerische mathematik 13, 4","author":"Strassen V.","year":"1969","unstructured":"V. Strassen. 1969. Gaussian elimination is not optimal. Numerische mathematik 13, 4 (1969), 354\u2013356."},{"volume-title":"Proc. 1998 ACM\/IEEE Conf. on Supercomputing (SC\u201998)","author":"Thottethodi M.","key":"e_1_3_2_1_34_1","unstructured":"M. Thottethodi, S. Chatterjee, and A.R. Lebeck. 1998. Tuning Strassen\u2019s matrix multiplication for memory efficiency. In Proc. 1998 ACM\/IEEE Conf. on Supercomputing (SC\u201998). 
IEEE, 36\u201336."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"crossref","unstructured":"E. Wang Q. Zhang B. Shen G. Zhang X. Lu Q. Wu and Y. Wang. 2014. Intel math kernel library. In High-Performance Computing on the Intel\u00ae Xeon Phi\u2122. Springer 167\u2013188.","DOI":"10.1007\/978-3-319-06486-4_7"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2213977.2214056"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCS.1988.12538"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.gmod.2012.03.009"}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","acronym":"ICPP 2021","location":"Lemont IL USA"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473519","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3473519","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:23Z","timestamp":1750191443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473519"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":37,"alternative-id":["10.1145\/3472456.3473519","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3473519","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}