{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,25]],"date-time":"2025-07-25T10:16:00Z","timestamp":1753438560477,"version":"3.41.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2013,12,1]],"date-time":"2013-12-01T00:00:00Z","timestamp":1385856000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006751","name":"U.S. Army","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006751","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000015","name":"U.S. Department of Energy","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000015","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2013,12]]},"abstract":"<jats:p>Emerging computer architectures will feature drastically decreased flops\/byte (ratio of peak processing rate to memory bandwidth) as highlighted by recent studies on Exascale architectural trends. Further, flops are getting cheaper, while the energy cost of data movement is increasingly dominant. The understanding and characterization of data locality properties of computations is critical in order to guide efforts to enhance data locality.<\/jats:p>\n          <jats:p>Reuse distance analysis of memory address traces is a valuable tool to perform data locality characterization of programs. 
A single reuse distance analysis can be used to estimate the number of cache misses in a fully associative LRU cache of any size, thereby providing estimates on the minimum bandwidth requirements at different levels of the memory hierarchy to avoid being bandwidth bound. However, such an analysis only holds for the particular execution order that produced the trace. It cannot estimate potential improvement in data locality through dependence-preserving transformations that change the execution schedule of the operations in the computation.<\/jats:p>\n          <jats:p>In this article, we develop a novel dynamic analysis approach to characterize the inherent locality properties of a computation and thereby assess the potential for data locality enhancement via dependence-preserving transformations. The execution trace of a code is analyzed to extract a Computational-Directed Acyclic Graph (CDAG) of the data dependences. The CDAG is then partitioned into convex subsets, and the convex partitioning is used to reorder the operations in the execution trace to enhance data locality. The approach enables us to go beyond reuse distance analysis of a single specific order of execution of the operations of a computation in characterization of its data locality properties. It can serve a valuable role in identifying promising code regions for manual transformation, as well as assessing the effectiveness of compiler transformations for data locality enhancement. 
We demonstrate the effectiveness of the approach using a number of benchmarks, including case studies where the potential shown by the analysis is exploited to achieve lower data movement costs and better performance.<\/jats:p>","DOI":"10.1145\/2541228.2555309","type":"journal-article","created":{"date-parts":[[2014,1,14]],"date-time":"2014-01-14T13:39:57Z","timestamp":1389706797000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Beyond reuse distance analysis"],"prefix":"10.1145","volume":"10","author":[{"given":"Naznin","family":"Fauzia","sequence":"first","affiliation":[{"name":"The Ohio State University, Columbus OH, USA"}]},{"given":"Venmugil","family":"Elango","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus OH, USA"}]},{"given":"Mahesh","family":"Ravishankar","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus OH, USA"}]},{"given":"J.","family":"Ramanujam","sequence":"additional","affiliation":[{"name":"Louisiana State University, Baton Rouge LA, USA"}]},{"given":"Fabrice","family":"Rastello","sequence":"additional","affiliation":[{"name":"INRIA COMPSYS\/ENS Lyon, Lyon cedex, France"}]},{"given":"Atanas","family":"Rountev","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus OH, USA"}]},{"given":"Louis-No\u00ebl","family":"Pouchet","sequence":"additional","affiliation":[{"name":"University of California Los Angeles, Los Angeles CA, USA"}]},{"given":"P.","family":"Sadayappan","sequence":"additional","affiliation":[{"name":"The Ohio State University, Columbus OH, USA"}]}],"member":"320","published-online":{"date-parts":[[2013,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"crossref","unstructured":"Anderson E. Bai Z. Bischof C. Blackford S. Demmel J. Dongarra J. Du Croz J. Greenbaum A. Hammarling S. McKenney A. and Sorensen D. 1999. LAPACK Users\u2019 Guide 3rd ed. 
Society for Industrial and Applied Mathematics Philadelphia PA.","DOI":"10.1137\/1.9780898719604"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.140395"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1989493.1989495"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1137\/090769156"},{"volume-title":"Proceedings of the 28th International Colloquium on Automata, Languages and Programming (ICALP\u201901)","author":"Bilardi G.","key":"e_1_2_1_6_1","unstructured":"Bilardi, G. and Peserico, E. 2001. A characterization of temporal locality and its portability across memory hierarchies. In Proceedings of the 28th International Colloquium on Automata, Languages and Programming (ICALP\u201901). 128--139."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/1375581.1375595"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.35"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2005.32"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1137\/080731992"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781159"},{"key":"e_1_2_1_12_1","volume-title":"Technical Report OSU-CISRC-9\/13-TR19","author":"Fauzia N.","year":"2013","unstructured":"Fauzia, N., Elango, V., Ravishankar, M., Pouchet, L.-N., Ramanujam, J., Rastello, F., Rountev, A., and Sadayappan, P. 2013. 
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential. Technical Report OSU-CISRC-9\/13-TR19, Ohio State University."},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Fuller S. H. and Millett L. I. 2011. The Future of Computing Performance: Game Over or Next Level? National Academies Press.","DOI":"10.1109\/MC.2011.15"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1993498.1993553"},{"key":"e_1_2_1_15_1","volume-title":"Computer Architecture: A Quantitative Approach. Morgan Kaufmann.","author":"Hennessy J.","year":"2011","unstructured":"Hennessy, J. and Patterson, D. 2011. Computer Architecture: A Quantitative Approach. Morgan Kaufmann."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1186736.1186737"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2254064.2254108"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/73560.73588"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2005.130"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-11970-5_15"},{"key":"e_1_2_1_21_1","unstructured":"Kennedy K. and Allen J. 2002. Optimizing Compilers for Modern Architectures: A Dependence-based approach. 
Morgan Kaufmann."},{"volume-title":"Languages and Compilers for Parallel Computing","author":"Kennedy K.","key":"e_1_2_1_22_1","unstructured":"Kennedy, K. and McKinley, K. S. 1993. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Languages and Compilers for Parallel Computing. Springer-Verlag, 301--320."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/1356058.1356071"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/12.2259"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.139702"},{"key":"e_1_2_1_26_1","unstructured":"LAPACK. http:\/\/www.netlib.org\/lapack."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.238302"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2134243.2134253"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1005686.1005691"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1147\/sj.92.0078"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1984.1676371"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2012.117"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_11"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2004.44"},{"key":"e_1_2_1_35_1","volume-title":"PLUTO: A polyhedral automatic parallelizer and locality optimizer for multicores. Retrieved","author":"PLUTO.","year":"2013","unstructured":"PLUTO. 2013. 
PLUTO: A polyhedral automatic parallelizer and locality optimizer for multicores. Retrieved December 4, 2013 from http:\/\/pluto-compiler.sourceforge.net."},{"key":"e_1_2_1_36_1","volume-title":"Retrieved","author":"Pohl T.","year":"2013","unstructured":"Pohl, T. 2008. 470.lbm. Retrieved December 4, 2013 from http:\/\/www.spec.org\/cpu2006\/Docs\/470.lbm.html."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/309758.309771"},{"volume-title":"Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201993)","author":"Rauchwerger L.","key":"e_1_2_1_38_1","unstructured":"Rauchwerger, L., Dubey, P., and Nair, R. 1993. Measuring limits of parallelism and characterizing its vulnerability to resource constraints. In Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201993). 105--117."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207148"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/12276.13313"},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 9th International Conference on High Performance Computing for Computational Science--VECPAR","author":"Shalf J.","year":"2010","unstructured":"Shalf, J., Dosanjh, S., and Morrison, J. 2011. Exascale computing technology challenges. 
In Proceedings of the 9th International Conference on High Performance Computing for Computational Science--VECPAR 2010. 1--25."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/1024393.1024414"},{"volume-title":"Proceedings of the European Conference on Parallel Computing (Euro-Par\u201900)","author":"Stefanovi\u0107 D.","key":"e_1_2_1_43_1","unstructured":"Stefanovi\u0107, D. and Martonosi, M. 2000. Limits and graph structure of available instruction-level parallelism. In Proceedings of the European Conference on Parallel Computing (Euro-Par\u201900). 1018--1022."},{"volume-title":"Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201992)","author":"Theobald K.","key":"e_1_2_1_44_1","unstructured":"Theobald, K., Gao, G., and Hendren, L. 1992. On the limits of program parallelism and its smoothability. In Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201992). 
10--19."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2008.4771802"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/1542476.1542496"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/996546.996553"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/106972.106991"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/113445.113449"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89740-8_16"},{"volume-title":"Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA\u201908)","author":"Zhong H.","key":"e_1_2_1_51_1","unstructured":"Zhong, H., Mehrara, M., Lieberman, S., and Mahlke, S. 2008. Uncovering hidden loop level parallelism in sequential applications. In Proceedings of the 14th International Symposium on High-Performance Computer Architecture (HPCA\u201908). 290--301."},{"volume-title":"Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201903)","author":"Zhong Y.","key":"e_1_2_1_52_1","unstructured":"Zhong, Y., Dropsho, S. G., and Ding, C. 2003. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT\u201903). 
IEEE Computer Society."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996872"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555309","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2541228.2555309","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:35:01Z","timestamp":1750232101000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2541228.2555309"}},"subtitle":["Dynamic analysis for characterization of data locality potential"],"short-title":[],"issued":{"date-parts":[[2013,12]]},"references-count":52,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2013,12]]}},"alternative-id":["10.1145\/2541228.2555309"],"URL":"https:\/\/doi.org\/10.1145\/2541228.2555309","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2013,12]]},"assertion":[{"value":"2013-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2013-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}