{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,21]],"date-time":"2025-12-21T06:24:26Z","timestamp":1766298266293,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":60,"publisher":"ACM","license":[{"start":{"date-parts":[[2019,6,8]],"date-time":"2019-06-08T00:00:00Z","timestamp":1559952000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1526750, 1763681, 1439057, 1439021, 1629129, 1409095, 1626251, 1629915"],"award-info":[{"award-number":["1526750, 1763681, 1439057, 1439021, 1629129, 1409095, 1626251, 1629915"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2019,6,8]]},"DOI":"10.1145\/3314221.3314599","type":"proceedings-article","created":{"date-parts":[[2019,6,7]],"date-time":"2019-06-07T21:02:18Z","timestamp":1559941338000},"page":"935-949","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Co-optimizing memory-level parallelism and cache-level parallelism"],"prefix":"10.1145","author":[{"given":"Xulong","family":"Tang","sequence":"first","affiliation":[{"name":"Pennsylvania State University, USA"}]},{"given":"Mahmut Taylan","family":"Kandemir","sequence":"additional","affiliation":[{"name":"Pennsylvania State University, USA"}]},{"given":"Mustafa","family":"Karakoy","sequence":"additional","affiliation":[{"name":"TOBB University of Economics and Technology, Turkey"}]},{"given":"Meenakshi","family":"Arunachalam","sequence":"additional","affiliation":[{"name":"Intel, 
USA"}]}],"member":"320","published-online":{"date-parts":[[2019,6,8]]},"reference":[{"key":"e_1_3_2_2_1_1","volume-title":"Lam","author":"Anderson Jennifer M.","year":"1993","unstructured":"Jennifer M. Anderson and Monica S. Lam. 1993. Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In PLDI."},{"key":"e_1_3_2_2_2_1","volume-title":"Wood","author":"Binkert Nathan","year":"2011","unstructured":"Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. 2011. The Gem5 Simulator. SIGARCH Computer. Architecture. News (2011)."},{"volume-title":"Proceedings of Programming Language Design And Implementation (PLDI).","author":"Bondhugula Uday","key":"e_1_3_2_2_3_1","unstructured":"Uday Bondhugula, J. Ramanujam, and et al. 2008. PLuTo: A practical and fully automatic polyhedral program optimization system. 
In Proceedings of Programming Language Design And Implementation (PLDI)."},{"key":"e_1_3_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/195473.195557"},{"volume-title":"Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA).","author":"Chou Yuan","key":"e_1_3_2_2_5_1","unstructured":"Yuan Chou, B. Fahs, and S. Abraham. 2004. Microarchitecture optimizations for exploiting memory-level parallelism. In Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/207110.207145"},{"key":"e_1_3_2_2_7_1","volume-title":"Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems. In ASPLOS.","author":"Dashti Mohammad","year":"2013","unstructured":"Mohammad Dashti, Alexandra Fedorova, Justin Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quema, and Mark Roth. 2013. Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems. In ASPLOS."},{"key":"e_1_3_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2014.34"},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737989"},{"key":"e_1_3_2_2_10_1","volume-title":"Proceedings of the 13th International Symposium on High Performance Computer Architecture (HPCA).","author":"Eyerman Stijn","year":"2007","unstructured":"Stijn Eyerman and Lieven Eeckhout. 2007. A Memory-Level Parallelism Aware Fetch Policy for SMT Processors. 
In Proceedings of the 13th International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_2_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF01407835"},{"key":"e_1_3_2_2_12_1","volume-title":"Lam","author":"Hall Mary H.","year":"1995","unstructured":"Mary H. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, and Monica S. Lam. 1995. Detecting Coarse-grain Parallelism Using an Interprocedural Parallelizing Compiler. In Supercomputing."},{"volume-title":"Proceedings of the 43rd International Symposium on Computer Architecture.","author":"Hashemi Milad","key":"e_1_3_2_2_13_1","unstructured":"Milad Hashemi, Khubaib, Eiman Ebrahimi, Onur Mutlu, and Yale N. Patt. 2016. Accelerating Dependent Cache Misses with an Enhanced Memory Controller. In Proceedings of the 43rd International Symposium on Computer Architecture."},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2004.4"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2540708.2540730"},{"key":"e_1_3_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2012.6168944"},{"key":"e_1_3_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263595"},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Mahmut Kandemir Alok Choudhary J Ramanujam and Prith Banerjee. 1999. 
A matrix-based approach to global locality optimization. In Journal of Parallel and Distributed Computing.","DOI":"10.1006\/jpdc.1999.1552"},{"volume-title":"A layout-conscious iteration space transformation technique","author":"Kandemir Mahmut","key":"e_1_3_2_2_19_1","unstructured":"Mahmut Kandemir, J. Ramanujam, Alok Choudhary, and Prithviraj Banerjee. 2001. A layout-conscious iteration space transformation technique. In IEEE Transactions on Computers."},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2745844.2745867"},{"volume-title":"Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Kim Changkyu","key":"e_1_3_2_2_21_1","unstructured":"Changkyu Kim, Doug Burger, and Stephen O. Keckler. 2002. An Adaptive, Non-uniform Cache Structure for Wire-delay Dominated On-chip Caches. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_2_2_22_1","volume-title":"The Sixteenth International Symposium on High-Performance Computer Architecture.","author":"Kim Yoongu","year":"2010","unstructured":"Yoongu Kim, Dongsu Han, Onur Mutlu, and Mor Harchol-Balter. 2010. 
ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers. In The Sixteenth International Symposium on High-Performance Computer Architecture."},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2010.51"},{"key":"e_1_3_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3192366.3192386"},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.20"},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/258915.258946"},{"volume-title":"Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Lee Chang Joo","key":"e_1_3_2_2_27_1","unstructured":"Chang Joo Lee, Veynu Narasiman, Onur Mutlu, and Yale N. Patt. 2009. Improving Memory Bank-level Parallelism in the Presence of Prefetching. In Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"volume-title":"Optimizing data locality by array restructuring. Department of Computer Science and Engineering","author":"Leung Shun-Tak","key":"e_1_3_2_2_28_1","unstructured":"Shun-Tak Leung and John Zahorjan. 1995. Optimizing data locality by array restructuring. 
Department of Computer Science and Engineering, University of Washington, Seattle, WA."},{"key":"e_1_3_2_2_30_1","volume-title":"Lam","author":"Lim Amy W.","year":"1999","unstructured":"Amy W. Lim, Gerald I. Cheong, and Monica S. Lam. 1999. An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communication. In ICS."},{"volume-title":"Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL).","author":"Maydan Dror E.","key":"e_1_3_2_2_31_1","unstructured":"Dror E. Maydan, Saman P. Amarasinghe, and Monica S. Lam. 1993. Array-data Flow Analysis and Its Use in Array Privatization. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL)."},{"volume-title":"Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Kathryn","key":"e_1_3_2_2_32_1","unstructured":"Kathryn S. McKinley and Olivier Temam. 1996. A Quantitative Analysis of Loop Nest Locality. 
In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.40"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2008.7"},{"key":"e_1_3_2_2_35_1","volume-title":"Integrating Loop and Data Transformations for Global Optimization. J. Parallel Distribute Computer","author":"O'Boyle M.F.P.","year":"2002","unstructured":"M.F.P. O'Boyle and P.M.O. Knijnenburg. 2002. Integrating Loop and Data Transformations for Global Optimization. J. Parallel Distribute Computer (2002)."},{"volume-title":"Proceedings of the 32nd Annual ACM\/IEEE International Symposium on Microarchitecture.","author":"Vijay","key":"e_1_3_2_2_36_1","unstructured":"Vijay S. Pai and Sarita Adve. 1999. Code Transformations to Improve Memory Parallelism. In Proceedings of the 32nd Annual ACM\/IEEE International Symposium on Microarchitecture."},{"volume-title":"Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT).","author":"Pattnaik Ashutosh","key":"e_1_3_2_2_37_1","unstructured":"Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, and Chita R. Das. 2016. 
Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT)."},{"volume-title":"Proceedings of the 46th International Symposium on Computer Architecture.","author":"Pattnaik Ashutosh","key":"e_1_3_2_2_38_1","unstructured":"Ashutosh Pattnaik, Xulong Tang, Onur Kayiran, Adwait Jog, Asit Mishra, Mahmut T. Kandemir, Anand Sivasubramaniam, and Chita R. Das. 2019. Opportunistic Computing in GPU Architectures. In Proceedings of the 46th International Symposium on Computer Architecture."},{"volume-title":"Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA).","author":"Qureshi Moinuddin K.","key":"e_1_3_2_2_39_1","unstructured":"Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. 2006. A Case for MLP-Aware Cache Replacement. In Proceedings of the 33rd Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2018.00022"},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.5555\/522659.825636"},{"volume-title":"Proceedings of the 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture.","author":"Sharifi Akbar","key":"e_1_3_2_2_42_1","unstructured":"Akbar Sharifi, Emre Kultursay, Mahmut Kandemir, and Chita R. Das. 2012. Addressing End-to-End Memory Access Latency in NoC-Based Multicores. 
In Proceedings of the 2012 45th Annual IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2017.16"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124540"},{"key":"e_1_3_2_2_45_1","volume-title":"Knights Landing: Second-Generation Intel Xeon Phi Product","author":"Sodani Avinash","year":"2016","unstructured":"Avinash Sodani, Roger Gramunt, Jesus Corbal, Ho-Seop Kim, Krishna Vinod, Sundaram Chinthamani, Steven Hutsell, Rajat Agarwal, and Yen-Chen Liu. 2016. Knights Landing: Second-Generation Intel Xeon Phi Product. IEEE Micro (2016)."},{"key":"e_1_3_2_2_46_1","first-page":"53","volume-title":"Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). ACM","author":"Gurindar","unstructured":"Gurindar S. Sohi and Manoj Franklin. 1991. High-bandwidth Data Memory Systems for Superscalar Processors. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IV). 
ACM, New York, NY, USA, 53-62."},{"key":"e_1_3_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301668"},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2017.2701370"},{"key":"e_1_3_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195708"},{"key":"e_1_3_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3309697.3331487"},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123954"},{"volume-title":"Proceedings of the 2019 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS).","author":"Tang Xulong","key":"e_1_3_2_2_52_1","unstructured":"Xulong Tang, Ashutosh Pattnaik, Onur Kayiran, Adwait Jog, Mahmut Taylan Kandemir, and Chita R. Das. 2019. Quantifying Data Locality in Dynamic Parallelism in GPUs. In Proceedings of the 2019 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS)."},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASAP.2003.1212826"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"crossref","unstructured":"Ben Verghese Scott Devine Anoop Gupta and Mendel Rosenblum. 1996. Operating System Support for Improving Data Locality on CCNUMA Compute Servers. In ASPLOS.","DOI":"10.1145\/237090.237205"},{"volume-title":"Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI).","author":"Michael","key":"e_1_3_2_2_55_1","unstructured":"Michael E. Wolf and Monica S. Lam. 
1991. A Data Locality Optimizing Algorithm. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation (PLDI)."},{"key":"e_1_3_2_2_56_1","volume-title":"Lam","author":"Wolf Michael E.","year":"1991","unstructured":"Michael E. Wolf and Monica S. Lam. 1991. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems (1991)."},{"key":"e_1_3_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.223990"},{"key":"e_1_3_2_2_58_1","volume-title":"McKee","author":"Wulf Wm. A.","year":"1995","unstructured":"Wm. A. Wulf and Sally A. McKee. 1995. Hitting the Memory Wall: Implications of the Obvious. SIGARCH Computer Architecture News (1995)."},{"key":"e_1_3_2_2_59_1","volume-title":"Meeting midway: Improving CMP performance with memory-side prefetching. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Yedlapalli Praveen","year":"2013","unstructured":"Praveen Yedlapalli, Jagadish Kotra, Emre Kultursay, Mahmut Kandemir, Chita R. Das, and Anand Sivasubramaniam. 2013. 
Meeting midway: Improving CMP performance with memory-side prefetching. In Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"volume-title":"Proceedings of the 55th Annual Design Automation Conference (DAC '18)","author":"Zhang Haibo","key":"e_1_3_2_2_60_1","unstructured":"Haibo Zhang, Prasanna Venkatesh Rengasamy, Nachiappan Chidambaram Nachiappan, Shulin Zhao, Anand Sivasubramaniam, Mahmut T. Kandemir, and Chita R. Das. 2018. FLOSS: FLOw Sensitive Scheduling on Mobile Platforms. In Proceedings of the 55th Annual Design Automation Conference (DAC '18). ACM, New York, NY, USA, Article 173, 6 pages."},{"key":"e_1_3_2_2_61_1","first-page":"517","volume-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50 '17)","author":"Zhang Haibo","unstructured":"Haibo Zhang, Prasanna Venkatesh Rengasamy, Shulin Zhao, Nachiappan Chidambaram Nachiappan, Anand Sivasubramaniam, Mahmut T. Kandemir, Ravi Iyer, and Chita R. Das. 2017. Race-to-sleep + Content Caching + Display Caching: A Recipe for Energy-efficient Video Streaming on Handhelds. 
In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO-50 '17). ACM, NewYork, NY, USA, 517-531."}],"event":{"name":"PLDI '19: 40th ACM SIGPLAN Conference on Programming Language Design and Implementation","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"],"location":"Phoenix AZ USA","acronym":"PLDI '19"},"container-title":["Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3314221.3314599","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3314221.3314599","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3314221.3314599","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:22Z","timestamp":1750204402000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3314221.3314599"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,8]]},"references-count":60,"alternative-id":["10.1145\/3314221.3314599","10.1145\/3314221"],"URL":"https:\/\/doi.org\/10.1145\/3314221.3314599","relation":{},"subject":[],"published":{"date-parts":[[2019,6,8]]},"assertion":[{"value":"2019-06-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}