{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T18:55:15Z","timestamp":1771959315986,"version":"3.50.1"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2016,1,9]],"date-time":"2016-01-09T00:00:00Z","timestamp":1452297600000},"content-version":"vor","delay-in-days":365,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","award":["DE-SC0012489"],"award-info":[{"award-number":["DE-SC0012489"]}],"id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"U.S. National Science Foundation","doi-asserted-by":"crossref","award":["0811457, 0926127, 0926687, 1059417"],"award-info":[{"award-number":["0811457, 0926127, 0926687, 1059417"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2015,1,9]]},"abstract":"<jats:p>The roofline model is a popular approach for \u201cbound and bottleneck\u201d performance analysis. It focuses on the limits to the performance of processors because of limited bandwidth to off-chip memory. It models upper bounds on performance as a function of operational intensity, the ratio of computational operations per byte of data moved from\/to memory. While operational intensity can be directly measured for a specific implementation of an algorithm on a particular target platform, it is of interest to obtain broader insights on bottlenecks, where various semantically equivalent implementations of an algorithm are considered, along with analysis for variations in architectural parameters. This is currently very cumbersome and requires performance modeling and analysis of many variants.<\/jats:p>\n                  <jats:p>In this article, we address this problem by using the roofline model in conjunction with upper bounds on the operational intensity of computations as a function of cache capacity, derived from lower bounds on data movement. This enables bottleneck analysis that holds across all dependence-preserving semantically equivalent implementations of an algorithm. We demonstrate the utility of the approach in assessing fundamental limits to performance and energy efficiency for several benchmark algorithms across a design space of architectural variations.<\/jats:p>","DOI":"10.1145\/2693656","type":"journal-article","created":{"date-parts":[[2015,1,12]],"date-time":"2015-01-12T15:02:10Z","timestamp":1421074930000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["On Using the Roofline Model with Lower Bounds on Data Movement"],"prefix":"10.1145","volume":"11","author":[{"given":"Venmugil","family":"Elango","sequence":"first","affiliation":[{"name":"The Ohio State University"}]},{"given":"Naser","family":"Sedaghati","sequence":"additional","affiliation":[{"name":"The Ohio State University"}]},{"given":"Fabrice","family":"Rastello","sequence":"additional","affiliation":[{"name":"Inria"}]},{"given":"Louis-No\u00ebl","family":"Pouchet","sequence":"additional","affiliation":[{"name":"The Ohio State University"}]},{"given":"J.","family":"Ramanujam","sequence":"additional","affiliation":[{"name":"Louisiana State University"}]},{"given":"Radu","family":"Teodorescu","sequence":"additional","affiliation":[{"name":"The Ohio State University"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"The Ohio State University"}]}],"member":"320","published-online":{"date-parts":[[2015,1,9]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/28395.28428"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/48529.48535"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/2395116.2395121"},{"key":"e_1_2_1_5_1","unstructured":"Keren Bergman Shekhar Borkar Dan Campbell William Carlson William Dally Monty Denneau Paul Franzon William Harrod Kerry Hill Jon Hiller et al. 2008. Exascale Computing Study: Technology Challenges in Achieving Exascale Systems. Tech. Rep. Defense Advanced Research Projects Agency Information Processing Techniques Office (DARPA IPTO)."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/646254.684258"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","unstructured":"Gianfranco Bilardi Andrea Pietracaprina and Paolo D\u2019Alberto. 2000. On the space and access complexity of computation DAGs. In Graph-Theoretic Concepts in Computer Science. 81--92.","DOI":"10.5555\/647681.732330"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/11602569_41"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002240000131"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","unstructured":"Gianfranco Bilardi Michele Scquizzato and Francesco Silvestri. 2012. A lower bound technique for communication on BSP with application to the FFT. In Euro-Par. 676--687. 10.1007\/978-3-642-32820-6_67","DOI":"10.1007\/978-3-642-32820-6_67"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.77"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2014.54"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Michael Christ James Demmel Nicholas Knight Thomas Scanlon and Katherine Yelick. 2013. Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays Part 1. EECS Technical Report EECS--2013-61 University of California Berkeley.","DOI":"10.21236\/ADA584726"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0022-0000(74)80046-2"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/2001252.2001261"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1137\/080731992"},{"key":"e_1_2_1_17_1","unstructured":"Venmugil Elango Fabrice Rastello Louis-No\u00ebl Pouchet J. Ramanujam and P. Sadayappan. 2013. Data Access Complexity: The Red\/Blue Pebble Game Revisited. Technical Report. OSU\/INRIA\/LSU\/UCLA. OSU-CISRC-7\/13-TR16."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2612669.2612694"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/2016691"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.6028\/jres.049.044"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/800076.802486"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2004.03.021"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2013.32"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669172"},{"key":"e_1_2_1_25_1","unstructured":"Naveen Muralimanohar Rajeev Balasubramonian and Norman P. Jouppi. 2009. CACTI 6.0: A Tool to Model Large Caches. Technical Report HPL-2009-85. HP Labs."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/2033094.2033106"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jda.2011.12.005"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0129054112500128"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASSCC.2009.5357230"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/646714.701412"},{"key":"e_1_2_1_31_1","volume-title":"Models of Computation","author":"Savage John E.","unstructured":"John E. Savage. 1998. Models of Computation. Addison-Wesley."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1463768.1463780"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/1644001.1644008"},{"key":"e_1_2_1_34_1","volume-title":"Communication lower bounds for distributed-memory computations. CoRR abs\/1307.1805","author":"Scquizzato Michele","year":"2013","unstructured":"Michele Scquizzato and Francesco Silvestri. 2013. Communication lower bounds for distributed-memory computations. CoRR abs\/1307.1805 (2013)."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/1964238.1964240"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","unstructured":"Edgar Solomonik Aydin Buluc and James Demmel. 2013. Minimizing communication in all-pairs shortest paths. In IPDPS. 10.1109\/IPDPS.2013.111","DOI":"10.1109\/IPDPS.2013.111"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcss.2010.06.012"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2693656","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2693656","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2693656","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,18]],"date-time":"2025-11-18T09:31:49Z","timestamp":1763458309000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2693656"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,1,9]]},"references-count":38,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2015,1,9]]}},"alternative-id":["10.1145\/2693656"],"URL":"https:\/\/doi.org\/10.1145\/2693656","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2015,1,9]]},"assertion":[{"value":"2014-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2014-11-01","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2015-01-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}