{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T11:23:37Z","timestamp":1763724217521},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Program. Lang. Syst."],"published-print":{"date-parts":[[2007,4]]},"abstract":"<jats:p>With the diverging improvements in CPU speeds and memory access latencies, detecting and removing memory access bottlenecks becomes increasingly important. In this work we present METRIC, a software framework for isolating and understanding such bottlenecks using partial access traces. METRIC extracts access traces from executing programs without special compiler or linker support. We make four primary contributions. First, we present a framework for extracting partial access traces based on dynamic binary rewriting of the executing application. Second, we introduce a novel algorithm for compressing these traces. The algorithm generates constant space representations for regular accesses occurring in nested loop structures. Third, we use these traces for offline incremental memory hierarchy simulation. We extract symbolic information from the application executable and use this to generate detailed source-code correlated statistics including per-reference metrics, cache evictor information, and stream metrics. Finally, we demonstrate how this information can be used to isolate and understand memory access inefficiencies. 
This illustrates a potential advantage of METRIC over compile-time analysis for sample codes, particularly when interprocedural analysis is required.<\/jats:p>","DOI":"10.1145\/1216374.1216380","type":"journal-article","created":{"date-parts":[[2007,6,6]],"date-time":"2007-06-06T14:37:11Z","timestamp":1181140631000},"page":"12","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["METRIC"],"prefix":"10.1145","volume":"29","author":[{"given":"Jaydeep","family":"Marathe","sequence":"first","affiliation":[{"name":"North Carolina State University, Raleigh, NC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Frank","family":"Mueller","sequence":"additional","affiliation":[{"name":"North Carolina State University, Raleigh, NC"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tushar","family":"Mohan","sequence":"additional","affiliation":[{"name":"IBM India Research Lab, Hauz Khas, New Delhi"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sally A.","family":"McKee","sequence":"additional","affiliation":[{"name":"Cornell University, Ithaca, NY"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bronis R.","family":"de Supinski","sequence":"additional","affiliation":[{"name":"Lawrence Livermore National Laboratory, Livermore, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andy","family":"Yoo","sequence":"additional","affiliation":[{"name":"Lawrence Livermore National Laboratory, Livermore, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2007,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/349299.349303"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1177\/109434200001400404"},{"key":"e_1_2_1_3_1","doi-asserted-by":"crossref","unstructured":"Buck B. and Hollingsworth J. 2000b. 
Using hardware performance monitors to isolate memory bottlenecks. In Supercomput. 64--65.","DOI":"10.1109\/SC.2000.10034"},{"key":"e_1_2_1_4_1","volume-title":"Tech. Rep. 124.","author":"Burrows M.","year":"1994","unstructured":"Burrows, M. and Wheeler, D. J. 1994. A block-sorting lossless data compression algorithm. Tech. Rep. 124."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1005686.1005708"},{"key":"e_1_2_1_6_1","doi-asserted-by":"crossref","unstructured":"Burtscher M. 2004b. Vpc3 source code. http:\/\/www.csl.cornell.edu\/burtscher\/research\/tracecompression\/.","DOI":"10.1145\/1012888.1005708"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/378795.378859"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/378795.378840"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301635"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301633"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.825697"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the ACM\/IEEE SC Conference.","author":"DeRose L.","unstructured":"DeRose, L., Ekanadham, K., Hollingsworth, J. K., and Sbaraglia, S. 2002. SIGMA: A simulator infrastructure to guide memory analysis. 
In Proceedings of the ACM\/IEEE SC Conference."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/781131.781159"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/325478.325479"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301683"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/71.86110"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/232973.233000"},{"key":"e_1_2_1_18_1","volume-title":"Intel Itanium2 Processor Reference Manual for Software Development and Optimization","author":"Intel","unstructured":"Intel. 2004. Intel Itanium2 Processor Reference Manual for Software Development and Optimization Vol. 1, Intel, Santa Clara, CA."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1002\/spe.4380240204"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207163"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/2.318580"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/244804.244806"},{"key":"e_1_2_1_23_1","unstructured":"Manning N. 2005. Sequitur source code. http:\/\/sequence.rutgers.edu\/sequitur\/sequitur.cc."},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the Workshop on Binary Translation.","author":"Marathe J.","unstructured":"Marathe, J. and Mueller, F. 2002. Detecting memory performance bottlenecks via binary rewriting. 
In Proceedings of the Workshop on Binary Translation."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/1088149.1088153"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the International Symposium on Code Generation and Optimization, 289--300","author":"Marathe J.","unstructured":"Marathe, J., Mueller, F., Mohan, T., de Supinski, B. R., McKee, S. A., and Yoo, A. 2003. METRIC: Tracking down inefficiencies in the memory hierarchy via binary rewriting. In Proceedings of the International Symposium on Code Generation and Optimization, 289--300."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/1006209.1006250"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/377792.377826"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1048935.1050199"},{"key":"e_1_2_1_30_1","volume-title":"-K","author":"Mowry T.","year":"1997","unstructured":"Mowry, T. and Luk, C.-K. 1997. Predicting data cache misses in non-numeric applications through correlation profiling. In MICRO-30, 314--320."},{"key":"e_1_2_1_31_1","volume-title":"Workshop on Binary Translation. IEEE Technical Committee on Computer Architecture Newsletter.","author":"Mueller F.","unstructured":"Mueller, F., Mohan, T., de Supinski, B. R., McKee, S. A., and Yoo, A. 2001. 
Partial data traces: Efficient generation and representation. In Workshop on Binary Translation. IEEE Technical Committee on Computer Architecture Newsletter."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/40.2_and_3.103"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the Data Compression Conference, 3--11","author":"Nevill-Manning C. G.","unstructured":"Nevill-Manning, C. G. and Witten, I. H. 1997b. Linear-Time, incremental hierarchy inference for compression. In Proceedings of the Data Compression Conference, 3--11."},{"key":"e_1_2_1_34_1","unstructured":"Seward J. 2005. Libbzip2 source code. http:\/\/www.bzip.org\/index.html."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/151220.151227"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/178243.178260"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.461.0005"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of the Workshop on Binary Translation.","author":"Ung D.","unstructured":"Ung, D. and Cifuentes, C. 2000. Optimising hot paths in a dynamic binary translator. In Proceedings of the Workshop on Binary Translation."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0743-7315(03)00104-7"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the Grace Murray Hopper Conference.","author":"Weikle D.","unstructured":"
Weikle, D., McKee, S. A., Skadron, K., and Wulf, W. 2000. Caches as filters: A framework for the analysis of caching systems. In Proceedings of the Grace Murray Hopper Conference."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/139669.140402"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/996841.996872"}],"container-title":["ACM Transactions on Programming Languages and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1216374.1216380","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T19:46:14Z","timestamp":1672256774000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1216374.1216380"}},"subtitle":["Memory tracing via dynamic binary rewriting to identify cache inefficiencies"],"short-title":[],"issued":{"date-parts":[[2007,4]]},"references-count":42,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2007,4]]}},"alternative-id":["10.1145\/1216374.1216380"],"URL":"https:\/\/doi.org\/10.1145\/1216374.1216380","relation":{},"ISSN":["0164-0925","1558-4593"],"issn-type":[{"value":"0164-0925","type":"print"},{"value":"1558-4593","type":"electronic"}],"subject":[],"published":{"date-parts":[[2007,4]]},"assertion":[{"value":"2007-04-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}