{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T09:22:59Z","timestamp":1751448179369,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":47,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Science Foundation of China","award":["11690013, 71991471"],"award-info":[{"award-number":["11690013, 71991471"]}]},{"name":"Shanghai Natural Science Funding","award":["18ZR1403100"],"award-info":[{"award-number":["18ZR1403100"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3472506","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:46:04Z","timestamp":1633459564000},"page":"1-10","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Processor-Aware Cache-Oblivious Algorithms\u2731"],"prefix":"10.1145","author":[{"given":"Yuan","family":"Tang","sequence":"first","affiliation":[{"name":"Fudan University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Weiguo","family":"Gao","sequence":"additional","affiliation":[{"name":"Fudan University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"key":"e_1_3_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/341800.341801"},{"key":"e_1_3_2_1_2_1","doi-asserted-by":"crossref","unstructured":"R.\u00a0C. Agarwal S.\u00a0M. Balle F.\u00a0G. Gustavson M. Joshi and P. Palkar. 1995. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development 39 (Sep. 1995) 575\u2013582. Issue 5.  R.\u00a0C. Agarwal S.\u00a0M. Balle F.\u00a0G. Gustavson M. Joshi and P. Palkar. 1995. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development 39 (Sep. 1995) 575\u2013582. Issue 5.","DOI":"10.1147\/rd.395.0575"},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(90)90188-N"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989531"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2312005.2312021"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2312005.2312044"},{"key":"e_1_3_2_1_7_1","unstructured":"Grey Ballard James Demmel Olga Holtz Benjamin Lipshitz and Oded Schwartz. 2012. Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds. CoRR abs\/1202.3177(2012).  Grey Ballard James Demmel Olga Holtz Benjamin Lipshitz and Oded Schwartz. 2012. Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds. CoRR abs\/1202.3177(2012)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"key":"e_1_3_2_1_9_1","first-page":"6","article-title":"Graph Expansion and Communication Costs of Fast Matrix Multiplication","volume":"59","author":"Ballard Grey","year":"2013","unstructured":"Grey Ballard , James Demmel , Olga Holtz , and Oded Schwartz . 2013 . Graph Expansion and Communication Costs of Fast Matrix Multiplication . J. ACM 59 , 6 (Jan. 2013). Grey Ballard, James Demmel, Olga Holtz, and Oded Schwartz. 2013. Graph Expansion and Communication Costs of Fast Matrix Multiplication. J. ACM 59, 6 (Jan. 2013).","journal-title":"J. ACM"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2853151"},{"key":"e_1_3_2_1_11_1","volume-title":"Grenoble","author":"Beaumont Olivier","year":"2016","unstructured":"Olivier Beaumont , Lionel Eyraud-Dubois , and Thomas Lambert . 2016 . Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms. In Euro-Par 2016: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing , Grenoble , France, August 24-26, 2016, Proceedings. ACM, 171\u2013182. Olivier Beaumont, Lionel Eyraud-Dubois, and Thomas Lambert. 2016. Cuboid Partitioning for Parallel Matrix Multiplication on Heterogeneous Platforms. In Euro-Par 2016: Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Grenoble, France, August 24-26, 2016, Proceedings. ACM, 171\u2013182."},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.52.0078"},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP","author":"R.","year":"2015","unstructured":"Austin\u00a0 R. Benson and Grey Ballard. 2015. A Framework for Practical Parallel Fast Matrix Multiplication . In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP 2015 ). ACM, New York, NY, USA, 42\u201353. Austin\u00a0R. Benson and Grey Ballard. 2015. A Framework for Practical Parallel Fast Matrix Multiplication. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming(PPoPP 2015). ACM, New York, NY, USA, 42\u201353."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2812804"},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008","author":"Blelloch E.","year":"2008","unstructured":"Guy\u00a0 E. Blelloch , Rezaul\u00a0Alam Chowdhury , Phillip\u00a0 B. Gibbons , Vijaya Ramachandran , Shimin Chen , and Michael Kozuch . 2008 . Provably good multicore cache performance for divide-and-conquer algorithms . In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008 , San Francisco, California, USA , January 20-22, 2008. 501\u2013510. Guy\u00a0E. Blelloch, Rezaul\u00a0Alam Chowdhury, Phillip\u00a0B. Gibbons, Vijaya Ramachandran, Shimin Chen, and Michael Kozuch. 2008. Provably good multicore cache performance for divide-and-conquer algorithms. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2008, San Francisco, California, USA, January 20-22, 2008. 501\u2013510."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989553"},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/1810479.1810519"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Robert\u00a0D. Blumofe Matteo Frigo Chrisopher\u00a0F. Joerg Charles\u00a0E. Leiserson and Keith\u00a0H. Randall. 1996. An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms. In SPAA \u201996. 297\u2013308.  Robert\u00a0D. Blumofe Matteo Frigo Chrisopher\u00a0F. Joerg Charles\u00a0E. Leiserson and Keith\u00a0H. Randall. 1996. An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms. In SPAA \u201996. 297\u2013308.","DOI":"10.1145\/237502.237574"},{"volume-title":"IPPS10.","author":"Blumofe D.","key":"e_1_3_2_1_19_1","unstructured":"Robert\u00a0 D. Blumofe , Matteo Frigo , Christopher\u00a0 F. Joerg , Charles\u00a0 E. Leiserson , and Keith\u00a0 H. Randall . 1996. Dag-Consistent Distributed Shared Memory . In IPPS10. Honolulu, Hawaii , 132\u2013141. Robert\u00a0D. Blumofe, Matteo Frigo, Christopher\u00a0F. Joerg, Charles\u00a0E. Leiserson, and Keith\u00a0H. Randall. 1996. Dag-Consistent Distributed Shared Memory. In IPPS10. Honolulu, Hawaii, 132\u2013141."},{"key":"e_1_3_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/324133.324234"},{"key":"e_1_3_2_1_21_1","volume-title":"Computer Systems: A Programmer\u2019s Perspective","author":"Bryant E.","year":"2015","unstructured":"Randal\u00a0 E. Bryant and David\u00a0 R. O\u2019Hallaron . 2015 . Computer Systems: A Programmer\u2019s Perspective ( 3 rd ed.). Pearson Eduction , USA. Randal\u00a0E. Bryant and David\u00a0R. O\u2019Hallaron. 2015. Computer Systems: A Programmer\u2019s Perspective (3rd ed.). Pearson Eduction, USA.","edition":"3"},{"volume-title":"Proceedings of ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 207\u2013216","author":"Chowdhury R.","key":"e_1_3_2_1_24_1","unstructured":"R. Chowdhury and V. Ramachandran . 2008. Cache-efficient Dynamic Programming Algorithms for Multicores . In Proceedings of ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 207\u2013216 . R. Chowdhury and V. Ramachandran. 2008. Cache-efficient Dynamic Programming Algorithms for Multicores. In Proceedings of ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 207\u2013216."},{"key":"e_1_3_2_1_25_1","volume-title":"Cache-Oblivious Dynamic Programming. In In Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA \u201906","author":"Chowdhury Rezaul\u00a0Alam","year":"2006","unstructured":"Rezaul\u00a0Alam Chowdhury and Vijaya Ramachandran . 2006 . Cache-Oblivious Dynamic Programming. In In Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA \u201906 . 591\u2013600. Rezaul\u00a0Alam Chowdhury and Vijaya Ramachandran. 2006. Cache-Oblivious Dynamic Programming. In In Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA \u201906. 591\u2013600."},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2013.04.008"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2010.5470354"},{"key":"e_1_3_2_1_28_1","unstructured":"Richard Cole and Vijaya Ramachandran. 2011. Efficient Resource Oblivious Algorithms for Multicores. CoRR abs\/1103.4071(2011).  Richard Cole and Vijaya Ramachandran. 2011. Efficient Resource Oblivious Algorithms for Multicores. CoRR abs\/1103.4071(2011)."},{"key":"e_1_3_2_1_29_1","volume-title":"Efficient Resource Oblivious Algorithms for Multicores with False Sharing. In 26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012","author":"Cole Richard","year":"2012","unstructured":"Richard Cole and Vijaya Ramachandran . 2012 . Efficient Resource Oblivious Algorithms for Multicores with False Sharing. In 26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012 , Shanghai, China , May 21-25, 2012. 201\u2013214. Richard Cole and Vijaya Ramachandran. 2012. Efficient Resource Oblivious Algorithms for Multicores with False Sharing. In 26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, May 21-25, 2012. 201\u2013214."},{"key":"e_1_3_2_1_30_1","volume-title":"Revisiting the Cache Miss Analysis of Multithreaded Algorithms. In LATIN 2012: Theoretical Informatics - 10th Latin American Symposium, Arequipa, Peru, April 16-20, 2012. Proceedings. 172\u2013183","author":"Cole Richard","year":"2012","unstructured":"Richard Cole and Vijaya Ramachandran . 2012 . Revisiting the Cache Miss Analysis of Multithreaded Algorithms. In LATIN 2012: Theoretical Informatics - 10th Latin American Symposium, Arequipa, Peru, April 16-20, 2012. Proceedings. 172\u2013183 . Richard Cole and Vijaya Ramachandran. 2012. Revisiting the Cache Miss Analysis of Multithreaded Algorithms. In LATIN 2012: Theoretical Informatics - 10th Latin American Symposium, Arequipa, Peru, April 16-20, 2012. Proceedings. 172\u2013183."},{"key":"e_1_3_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3040221"},{"volume-title":"Introduction to Algorithms","author":"Cormen H.","key":"e_1_3_2_1_32_1","unstructured":"Thomas\u00a0 H. Cormen , Charles\u00a0 E. Leiserson , Ronald\u00a0 L. Rivest , and Clifford Stein . 2009. Introduction to Algorithms ( third ed.). The MIT Press . Thomas\u00a0H. Cormen, Charles\u00a0E. Leiserson, Ronald\u00a0L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms(third ed.). The MIT Press."},{"key":"e_1_3_2_1_33_1","volume-title":"Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In 27th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2013","author":"Demmel James","year":"2013","unstructured":"James Demmel , David Eliahu , Armando Fox , Shoaib Kamil , Benjamin Lipshitz , Oded Schwartz , and Omer Spillinger . 2013 . Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In 27th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2013 , Cambridge, MA, USA , May 20-24, 2013. 261\u2013272. James Demmel, David Eliahu, Armando Fox, Shoaib Kamil, Benjamin Lipshitz, Oded Schwartz, and Omer Spillinger. 2013. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication. In 27th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2013, Cambridge, MA, USA, May 20-24, 2013. 261\u2013272."},{"volume-title":"SPAA\u201916.","author":"Dinh David","key":"e_1_3_2_1_34_1","unstructured":"David Dinh , Harsha\u00a0Vardhan Simhadri , and Yuan Tang . 2016. Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers . In SPAA\u201916. Pacific Grove, CA, USA . David Dinh, Harsha\u00a0Vardhan Simhadri, and Yuan Tang. 2016. Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers. In SPAA\u201916. Pacific Grove, CA, USA."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2071379.2071383"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00224-007-9098-2"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/0304-3975(89)90101-1"},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1006\/jpdc.1994.1053"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2004.03.021"},{"key":"e_1_3_2_1_40_1","unstructured":"Charles\u00a0E. Leiserson. [n.d.]. Performance Engineering of Software Systems.  Charles\u00a0E. Leiserson. [n.d.]. Performance Engineering of Software Systems."},{"key":"e_1_3_2_1_41_1","volume-title":"There\u2019s plenty of room at the top: What will drive computer performance after Moore\u2019s Law. Science 368 (June","author":"Leiserson E.","year":"2020","unstructured":"Charles\u00a0 E. Leiserson , Neil\u00a0 C. Thompson , Joel\u00a0 S. Emer , Bradley\u00a0 C. Kuszmaul , Butler\u00a0 W. Lampson , Daniel Sanchez , and Tao\u00a0 B. Schardl . 2020. There\u2019s plenty of room at the top: What will drive computer performance after Moore\u2019s Law. Science 368 (June 2020 ). Issue 6495. Charles\u00a0E. Leiserson, Neil\u00a0C. Thompson, Joel\u00a0S. Emer, Bradley\u00a0C. Kuszmaul, Butler\u00a0W. Lampson, Daniel Sanchez, and Tao\u00a0B. Schardl. 2020. There\u2019s plenty of room at the top: What will drive computer performance after Moore\u2019s Law. Science 368 (June 2020). Issue 6495."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2012.33"},{"key":"e_1_3_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/PL00008264"},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.dam.2006.08.005"},{"key":"e_1_3_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786.2793"},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-23397-5_10"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/1583991.1584019"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02165411"},{"volume-title":"PPoPP\u201915.","author":"Tang Yuan","key":"e_1_3_2_1_49_1","unstructured":"Yuan Tang , Ronghui You , Haibin Kan , Jesmin\u00a0Jahan Tithi , Pramod Ganapathi , and Rezaul\u00a0 A. Chowdhury . 2015. Cache-Oblivious Wavefront: Improving Parallelism of Recursive Dynamic Programming Algorithms without losing Cache-Efficiency . In PPoPP\u201915. San Francisco, CA, USA . Yuan Tang, Ronghui You, Haibin Kan, Jesmin\u00a0Jahan Tithi, Pramod Ganapathi, and Rezaul\u00a0A. Chowdhury. 2015. Cache-Oblivious Wavefront: Improving Parallelism of Recursive Dynamic Programming Algorithms without losing Cache-Efficiency. In PPoPP\u201915. San Francisco, CA, USA."}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","acronym":"ICPP 2021","location":"Lemont IL USA"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472506","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3472506","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:12Z","timestamp":1750193292000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3472506"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":47,"alternative-id":["10.1145\/3472456.3472506","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3472506","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}