{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T03:47:18Z","timestamp":1772164038151,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":67,"publisher":"ACM","license":[{"start":{"date-parts":[[2018,6,11]],"date-time":"2018-06-11T00:00:00Z","timestamp":1528675200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2018,6,11]]},"DOI":"10.1145\/3192366.3192386","type":"proceedings-article","created":{"date-parts":[[2018,6,12]],"date-time":"2018-06-12T08:16:01Z","timestamp":1528791361000},"page":"312-327","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["Enhancing computation-to-core assignment with physical location information"],"prefix":"10.1145","author":[{"given":"Orhan","family":"Kislal","sequence":"first","affiliation":[{"name":"Pennsylvania State University, USA"}]},{"given":"Jagadish","family":"Kotra","sequence":"additional","affiliation":[{"name":"Pennsylvania State University, USA"}]},{"given":"Xulong","family":"Tang","sequence":"additional","affiliation":[{"name":"Pennsylvania State University, USA"}]},{"given":"Mahmut Taylan","family":"Kandemir","sequence":"additional","affiliation":[{"name":"Pennsylvania State University, USA"}]},{"given":"Myoungsoo","family":"Jung","sequence":"additional","affiliation":[{"name":"Yonsei University, South Korea"}]}],"member":"320","published-online":{"date-parts":[[2018,6,11]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2007. Intel teralops research chip. goo.gl\/lewCk7.  2007. Intel teralops research chip. goo.gl\/lewCk7."},{"key":"e_1_3_2_2_2_1","unstructured":"2009. Intel Single-cloud chip. goo.gl\/RSJjfg.  2009. 
Intel Single-cloud chip. goo.gl\/RSJjfg."},{"key":"e_1_3_2_2_3_1","unstructured":"2012. minighost. https:\/\/mantevo.org\/default.php.  2012. minighost. https:\/\/mantevo.org\/default.php."},{"key":"e_1_3_2_2_4_1","unstructured":"2012. The Architecture and Performance of the TILE-Gx Processor Family. http:\/\/www.tilera.com\/products\/processors\/TILE-Gx_Family.  2012. The Architecture and Performance of the TILE-Gx Processor Family. http:\/\/www.tilera.com\/products\/processors\/TILE-Gx_Family."},{"key":"e_1_3_2_2_5_1","unstructured":"2013. CORAL Benchmarks. https:\/\/asc.llnl.gov\/CORAL-benchmarks\/  2013. CORAL Benchmarks. https:\/\/asc.llnl.gov\/CORAL-benchmarks\/"},{"key":"e_1_3_2_2_6_1","volume-title":"SPEC OMP 2001","year":"2001","unstructured":"2013. SPEC OMP 2001. https:\/\/www.spec.org\/omp2001\/ 2013. SPEC OMP 2001. https:\/\/www.spec.org\/omp2001\/"},{"key":"e_1_3_2_2_7_1","unstructured":"2014. Intel Xeon Phi processor. goo.gl\/3DVc9T.  2014. Intel Xeon Phi processor. goo.gl\/3DVc9T."},{"key":"e_1_3_2_2_8_1","volume-title":"Lam","author":"Anderson Jennifer M.","year":"1993","unstructured":"Jennifer M. Anderson and Monica S . Lam . 1993 . Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In PLDI. Jennifer M. Anderson and Monica S. Lam. 1993. Global Optimizations for Parallelism and Locality on Scalable Parallel Machines. In PLDI."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2015.7056061"},{"key":"e_1_3_2_2_10_1","volume-title":"Wood","author":"Binkert Nathan","year":"2011","unstructured":"Nathan Binkert , Bradford Beckmann , Gabriel Black , Steven K. Reinhardt , Ali Saidi , Arkaprava Basu , Joel Hestness , Derek R. Hower , Tushar Krishna , Somayeh Sardashti , Rathijit Sen , Korey Sewell , Muhammad Shoaib , Nilay Vaish , Mark D. Hill , and David A . Wood . 2011 . The Gem5 Simulator. SIGARCH Comput. Archit. News ( 2011). Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. 
Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. 2011. The Gem5 Simulator. SIGARCH Comput. Archit. News (2011)."},{"key":"e_1_3_2_2_11_1","volume-title":"Proceedings of Programming Language Design And Implementation (PLDI).","author":"Bondhugula Uday","unstructured":"Uday Bondhugula , J. Ramanujam, and et al. 2008. PLuTo: A practical and fully automatic polyhedral program optimization system . In Proceedings of Programming Language Design And Implementation (PLDI). Uday Bondhugula, J. Ramanujam, and et al. 2008. PLuTo: A practical and fully automatic polyhedral program optimization system. In Proceedings of Programming Language Design And Implementation (PLDI)."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Steve Carr Kathryn S. McKinley and Chau-Wen Tseng. 1994. Compiler Optimizations for Improving Data Locality. In ASPLOS.   Steve Carr Kathryn S. McKinley and Chau-Wen Tseng. 1994. Compiler Optimizations for Improving Data Locality. In ASPLOS .","DOI":"10.1145\/195473.195557"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798258"},{"key":"e_1_3_2_2_14_1","volume-title":"Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture, (MICRO).","author":"Chishti Zeshan","unstructured":"Zeshan Chishti , Michael D. Powell , and T. N. Vijaykumar . 2003. Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures . In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture, (MICRO). Zeshan Chishti, Michael D. Powell, and T. N. Vijaykumar. 2003. Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures. 
In Proceedings of the 36th Annual IEEE\/ACM International Symposium on Microarchitecture, (MICRO)."},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"crossref","unstructured":"Micha\u0142 Cierniak and Wei Li. 1995. Unifying Data and Control Transformations for Distributed Shared-memory Machines. In PLDI.   Micha\u0142 Cierniak and Wei Li. 1995. Unifying Data and Control Transformations for Distributed Shared-memory Machines. In PLDI .","DOI":"10.1145\/207110.207145"},{"key":"e_1_3_2_2_16_1","volume-title":"Proceedings of International Symposium on High Performance Computer Architecture (HPCA).","author":"Das R.","unstructured":"R. Das , R. Ausavarungnirun , O. Mutlu , A. Kumar , and M. Azimi . 2013. Application-to-core mapping policies to reduce memory system interference in multi-core systems . In Proceedings of International Symposium on High Performance Computer Architecture (HPCA). R. Das, R. Ausavarungnirun, O. Mutlu, A. Kumar, and M. Azimi. 2013. Application-to-core mapping policies to reduce memory system interference in multi-core systems. In Proceedings of International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_2_17_1","volume-title":"Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA).","author":"Das Reetuparna","unstructured":"Reetuparna Das , Onur Mutlu , Thomas Moscibroda , and Chita R. Das . 2010. Aergia: Exploiting Packet Latency Slack in On-chip Networks . In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA). Reetuparna Das, Onur Mutlu, Thomas Moscibroda, and Chita R. Das. 2010. Aergia: Exploiting Packet Latency Slack in On-chip Networks. In Proceedings of the 37th Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Raja Das Mustafa Uysal Joel Saltz and Yuan-Shin Hwang. 1994. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. J. 
Parallel Distrib. Comput. (1994).   Raja Das Mustafa Uysal Joel Saltz and Yuan-Shin Hwang. 1994. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. J. Parallel Distrib. Comput. (1994).","DOI":"10.1006\/jpdc.1994.1104"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"crossref","unstructured":"Raja Das Mustafa Uysal Joel Saltz and Yuan-Shin Hwang. 1994. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. J. Parallel Distrib. Comput. (1994).   Raja Das Mustafa Uysal Joel Saltz and Yuan-Shin Hwang. 1994. Communication Optimizations for Irregular Scientific Computations on Distributed Memory Architectures. J. Parallel Distrib. Comput. (1994).","DOI":"10.1006\/jpdc.1994.1104"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/2451116.2451157"},{"key":"e_1_3_2_2_21_1","volume-title":"Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems. In ASPLOS.","author":"Dashti Mohammad","year":"2013","unstructured":"Mohammad Dashti , Alexandra Fedorova , Justin Funston , Fabien Gaud , Renaud Lachaize , Baptiste Lepers , Vivien Quema , and Mark Roth . 2013 . Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems. In ASPLOS. Mohammad Dashti, Alexandra Fedorova, Justin Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quema, and Mark Roth. 2013. Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems. In ASPLOS."},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2737924.2737989"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/325478.325479"},{"key":"e_1_3_2_2_24_1","volume-title":"Lam","author":"Hall Mary H.","year":"1995","unstructured":"Mary H. Hall , Saman P. Amarasinghe , Brian R. Murphy , Shih-Wei Liao , and Monica S . Lam . 1995 . Detecting Coarse-grain Parallelism Using an Interprocedural Parallelizing Compiler. In Supercomputing . Mary H. 
Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, and Monica S. Lam. 1995. Detecting Coarse-grain Parallelism Using an Interprocedural Parallelizing Compiler. In Supercomputing."},{"key":"e_1_3_2_2_25_1","volume-title":"Exploiting locality for irregular scientific codes. Parallel and Distributed Systems","author":"Han Hwansoo","year":"2006","unstructured":"Hwansoo Han and C.-W. Tseng . 2006. Exploiting locality for irregular scientific codes. Parallel and Distributed Systems , IEEE Transactions on ( 2006 ). Hwansoo Han and C.-W. Tseng. 2006. Exploiting locality for irregular scientific codes. Parallel and Distributed Systems, IEEE Transactions on (2006)."},{"key":"e_1_3_2_2_26_1","doi-asserted-by":"crossref","unstructured":"Mahmut Kandemir Alok Choudhary J Ramanujam and Prith Banerjee. 1999. A matrix-based approach to global locality optimization. J. Parallel and Distrib. Comput. (1999).   Mahmut Kandemir Alok Choudhary J Ramanujam and Prith Banerjee. 1999. A matrix-based approach to global locality optimization. J. Parallel and Distrib. Comput. (1999).","DOI":"10.1006\/jpdc.1999.1552"},{"key":"e_1_3_2_2_27_1","doi-asserted-by":"crossref","unstructured":"M. Kandemir J. Ramanujam A. Choudhary and P. Banerjee. 2001. A layout-conscious iteration space transformation technique. IEEE Trans. Comput. (2001).   M. Kandemir J. Ramanujam A. Choudhary and P. Banerjee. 2001. A layout-conscious iteration space transformation technique. IEEE Trans. Comput. (2001).","DOI":"10.1109\/TC.2001.970571"},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2745844.2745867"},{"key":"e_1_3_2_2_29_1","volume-title":"Das","author":"Kayiran Onur","year":"2016","unstructured":"Onur Kayiran , Adwait Jog , Ashutosh Pattnaik , Rachata Ausavarungnirun , Xulong Tang , Mahmut T. Kandemir , Gabriel H. Loh , Onur Mutlu , and Chita R . Das . 2016 . uC-States: Fine-grained GPU Datapath Power Management. In PACT. 
Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, and Chita R. Das. 2016. uC-States: Fine-grained GPU Datapath Power Management. In PACT."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"crossref","unstructured":"Changkyu Kim Doug Burger and Stephen O Keckler. 2002. An adaptive non-uniform cache structure for wire-delay dominated onchip caches. In ACM SIGPLAN Notices.   Changkyu Kim Doug Burger and Stephen O Keckler. 2002. An adaptive non-uniform cache structure for wire-delay dominated onchip caches. In ACM SIGPLAN Notices .","DOI":"10.1109\/MM.2003.1261393"},{"key":"e_1_3_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2007.15"},{"key":"e_1_3_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.20"},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Induprakas Kodukula Nawaaz Ahmed and Keshav Pingali. 1997. Data-centric Multi-level Blocking. In PLDI.   Induprakas Kodukula Nawaaz Ahmed and Keshav Pingali. 1997. Data-centric Multi-level Blocking. In PLDI .","DOI":"10.1145\/258915.258946"},{"key":"e_1_3_2_2_34_1","volume-title":"Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).","author":"Kotra J. B.","unstructured":"J. B. Kotra , M. Arjomand , D. Guttman , M. T. Kandemir , and C. R. Das . 2016 . Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). J. B. Kotra, M. Arjomand, D. Guttman, M. T. Kandemir, and C. R. Das. 2016. Re-NUCA: A Practical NUCA Architecture for ReRAM Based Last-Level Caches. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)."},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"crossref","unstructured":"J. B. Kotra D. Guttman N. C. N. M. T. Kandemir and C. R. Das. 2017. 
Quantifying the Potential Benefits of On-chip Near-Data Computing in Manycore Processors. In 2017 IEEE 25th International Symposium on Modeling Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS).  J. B. Kotra D. Guttman N. C. N. M. T. Kandemir and C. R. Das. 2017. Quantifying the Potential Benefits of On-chip Near-Data Computing in Manycore Processors. In 2017 IEEE 25th International Symposium on Modeling Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) .","DOI":"10.1109\/MASCOTS.2017.26"},{"key":"e_1_3_2_2_36_1","volume-title":"2017 IEEE International Symposium on Workload Characterization (IISWC).","author":"Kotra J. B.","unstructured":"J. B. Kotra , S. Kim , K. Madduri , and M. T. Kandemir . 2017. Congestion-aware memory management on NUMA platforms: A VMware ESXi case study . In 2017 IEEE International Symposium on Workload Characterization (IISWC). J. B. Kotra, S. Kim, K. Madduri, and M. T. Kandemir. 2017. Congestion-aware memory management on NUMA platforms: A VMware ESXi case study. In 2017 IEEE International Symposium on Workload Characterization (IISWC)."},{"key":"e_1_3_2_2_37_1","volume-title":"Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems.","author":"Kotra Jagadish B.","unstructured":"Jagadish B. Kotra , Narges Shahidi , Zeshan A. Chishti , and Mahmut T. Kandemir . 2017. Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling . In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. Jagadish B. Kotra, Narges Shahidi, Zeshan A. Chishti, and Mahmut T. Kandemir. 2017. Hardware-Software Co-design to Mitigate DRAM Refresh Overheads: A Case for Refresh-Aware Process Scheduling. 
In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems."},{"key":"e_1_3_2_2_38_1","volume-title":"Wolf","author":"Lam Monica D.","year":"1991","unstructured":"Monica D. Lam , Edward E. Rothberg , and Michael E . Wolf . 1991 . The Cache Performance and Optimizations of Blocked Algorithms. In ASPLOS. Monica D. Lam, Edward E. Rothberg, and Michael E. Wolf. 1991. The Cache Performance and Optimizations of Blocked Algorithms. In ASPLOS."},{"key":"e_1_3_2_2_39_1","volume-title":"Optimizing data locality by array restructuring. Department of Computer Science and Engineering","author":"Leung Shun-Tak","unstructured":"Shun-Tak Leung and John Zahorjan . 1995. Optimizing data locality by array restructuring. Department of Computer Science and Engineering , University of Washington. Shun-Tak Leung and John Zahorjan. 1995. Optimizing data locality by array restructuring. Department of Computer Science and Engineering, University of Washington."},{"key":"e_1_3_2_2_41_1","doi-asserted-by":"crossref","unstructured":"Yong Li A. Abousamra R. Melhem and A. K. Jones. 2012. Compiler-Assisted Data Distribution and Network Configuration for Chip Multiprocessors. IEEE Transactions on Parallel and Distributed Systems (2012).   Yong Li A. Abousamra R. Melhem and A. K. Jones. 2012. Compiler-Assisted Data Distribution and Network Configuration for Chip Multiprocessors. IEEE Transactions on Parallel and Distributed Systems (2012).","DOI":"10.1109\/TPDS.2011.279"},{"key":"e_1_3_2_2_42_1","volume-title":"Lam","author":"Lim Amy W.","year":"1999","unstructured":"Amy W. Lim , Gerald I. Cheong , and Monica S . Lam . 1999 . An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communication. In ICS. Amy W. Lim, Gerald I. Cheong, and Monica S. Lam. 1999. An Affine Partitioning Algorithm to Maximize Parallelism and Minimize Communication. 
In ICS."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/2744769.2744876"},{"key":"e_1_3_2_2_44_1","volume-title":"Affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System. In ICS.","author":"L\u00f6f Henrik","year":"2005","unstructured":"Henrik L\u00f6f and Sverker Holmgren . 2005 . Affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System. In ICS. Henrik L\u00f6f and Sverker Holmgren. 2005. Affinity-on-next-touch: Increasing the Performance of an Industrial PDE Solver on a cc-NUMA System. In ICS."},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2009.36"},{"key":"e_1_3_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/2755753.2757164"},{"key":"e_1_3_2_2_47_1","volume-title":"Proceedings of the 50th Annual Design Automation Conference (DAC).","author":"Mishra Asit K.","unstructured":"Asit K. Mishra , Onur Mutlu , and Chita R. Das . 2013. A Heterogeneous Multiple Network-on-chip Design: An Application-aware Approach . In Proceedings of the 50th Annual Design Automation Conference (DAC). Asit K. Mishra, Onur Mutlu, and Chita R. Das. 2013. A Heterogeneous Multiple Network-on-chip Design: An Application-aware Approach. In Proceedings of the 50th Annual Design Automation Conference (DAC)."},{"key":"e_1_3_2_2_48_1","volume-title":"Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA).","author":"Mishra Asit K.","unstructured":"Asit K. Mishra , N. Vijaykrishnan , and Chita R. Das . 2011. A Case for Heterogeneous On-chip Interconnects for CMPs . In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA). Asit K. Mishra, N. Vijaykrishnan, and Chita R. Das. 2011. A Case for Heterogeneous On-chip Interconnects for CMPs. 
In Proceedings of the 38th Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_2_49_1","volume-title":"Integrating Loop and Data Transformations for Global Optimization. J. Parallel Distribute Computer","author":"O'Boyle M.F.P.","year":"2002","unstructured":"M.F.P. O'Boyle and P.M.O. Knijnenburg . 2002. Integrating Loop and Data Transformations for Global Optimization. J. Parallel Distribute Computer ( 2002 ). M.F.P. O'Boyle and P.M.O. Knijnenburg. 2002. Integrating Loop and Data Transformations for Global Optimization. J. Parallel Distribute Computer (2002)."},{"key":"e_1_3_2_2_50_1","volume-title":"Das","author":"Pattnaik Ashutosh","year":"2016","unstructured":"Ashutosh Pattnaik , Xulong Tang , Adwait Jog , Onur Kayiran , Asit K. Mishra , Mahmut T. Kandemir , Onur Mutlu , and Chita R . Das . 2016 . Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities. In PACT. Ashutosh Pattnaik, Xulong Tang, Adwait Jog, Onur Kayiran, Asit K. Mishra, Mahmut T. Kandemir, Onur Mutlu, and Chita R. Das. 2016. Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities. In PACT."},{"key":"e_1_3_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/781027.781076"},{"key":"e_1_3_2_2_52_1","volume-title":"Proceedings of International Symposium on Microarchitecture (MICRO).","author":"Sharifi A.","unstructured":"A. Sharifi , E. Kultursay , M. Kandemir , and C.R. Das . 2012. Addressing End-to-End Memory Access Latency in NoC-Based Multicores . In Proceedings of International Symposium on Microarchitecture (MICRO). A. Sharifi, E. Kultursay, M. Kandemir, and C.R. Das. 2012. Addressing End-to-End Memory Access Latency in NoC-Based Multicores. 
In Proceedings of International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_2_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/1654059.1654099"},{"key":"e_1_3_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/MASCOTS.2017.16"},{"key":"e_1_3_2_2_55_1","volume-title":"Knights Landing: Second-Generation Intel Xeon Phi Product","author":"Sodani A.","year":"2016","unstructured":"A. Sodani , R. Gramunt , J. Corbal , H. S. Kim , K. Vinod , S. Chinthamani , S. Hutsell , R. Agarwal , and Y. C. Liu . 2016 . Knights Landing: Second-Generation Intel Xeon Phi Product . IEEE Micro ( 2016). A. Sodani, R. Gramunt, J. Corbal, H. S. Kim, K. Vinod, S. Chinthamani, S. Hutsell, R. Agarwal, and Y. C. Liu. 2016. Knights Landing: Second-Generation Intel Xeon Phi Product. IEEE Micro (2016)."},{"key":"e_1_3_2_2_56_1","doi-asserted-by":"crossref","unstructured":"Yonghong Song and Zhiyuan Li. 1999. New Tiling Techniques to Improve Cache Temporal Locality. In PLDI.   Yonghong Song and Zhiyuan Li. 1999. New Tiling Techniques to Improve Cache Temporal Locality. In PLDI .","DOI":"10.1145\/301618.301668"},{"key":"e_1_3_2_2_57_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195708"},{"key":"e_1_3_2_2_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123954"},{"key":"e_1_3_2_2_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.14"},{"key":"e_1_3_2_2_60_1","unstructured":"S. Verdoolaege M. Bruynooghe G. Janssens and P. Catthoor. 2003. Multi-dimensional incremental loop fusion for data locality. In ASAP.  S. Verdoolaege M. Bruynooghe G. Janssens and P. Catthoor. 2003. Multi-dimensional incremental loop fusion for data locality. In ASAP ."},{"key":"e_1_3_2_2_61_1","doi-asserted-by":"crossref","unstructured":"Ben Verghese Scott Devine Anoop Gupta and Mendel Rosenblum. 1996. Operating System Support for Improving Data Locality on CCNUMA Compute Servers. In ASPLOS.   Ben Verghese Scott Devine Anoop Gupta and Mendel Rosenblum. 1996. 
Operating System Support for Improving Data Locality on CCNUMA Compute Servers. In ASPLOS .","DOI":"10.1145\/237090.237205"},{"key":"e_1_3_2_2_62_1","volume-title":"Lam","author":"Wolf Michael E.","year":"1991","unstructured":"Michael E. Wolf and Monica S . Lam . 1991 . A Data Locality Optimizing Algorithm. In PLDI. Michael E. Wolf and Monica S. Lam. 1991. A Data Locality Optimizing Algorithm. In PLDI."},{"key":"e_1_3_2_2_63_1","doi-asserted-by":"crossref","unstructured":"M. E. Wolf and M. S. Lam. 1991. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems (1991).   M. E. Wolf and M. S. Lam. 1991. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems (1991).","DOI":"10.1109\/71.97902"},{"key":"e_1_3_2_2_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/223982.223990"},{"key":"e_1_3_2_2_65_1","volume-title":"FLOSS: FLOw Sensitive Scheduling on Mobile Platforms. In In Proceedings of The Design Automation Conference (DAC).","author":"Zhang Haibo","unstructured":"Haibo Zhang , Prasanna Venkatesh Rengasamy , Nachiappan Chidambaram Nachiappan , Shulin Zhao , Anand Sivasubramaniam , Mahmut Kandemir , and Chita R. Das . 2018 . FLOSS: FLOw Sensitive Scheduling on Mobile Platforms. In In Proceedings of The Design Automation Conference (DAC). Haibo Zhang, Prasanna Venkatesh Rengasamy, Nachiappan Chidambaram Nachiappan, Shulin Zhao, Anand Sivasubramaniam, Mahmut Kandemir, and Chita R. Das. 2018. FLOSS: FLOw Sensitive Scheduling on Mobile Platforms. In In Proceedings of The Design Automation Conference (DAC)."},{"key":"e_1_3_2_2_66_1","volume-title":"Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture.","author":"Zhang Haibo","unstructured":"Haibo Zhang , Prasanna Venkatesh Rengasamy , Shulin Zhao , Nachiappan Chidambaram Nachiappan , Anand Sivasubramaniam , Mahmut T. Kandemir , Ravi Iyer , and Chita R. 
Das . 2017. Race-to-sleep + Content Caching + Display Caching: A Recipe for Energy-efficient Video Streaming on Handhelds . In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture. Haibo Zhang, Prasanna Venkatesh Rengasamy, Shulin Zhao, Nachiappan Chidambaram Nachiappan, Anand Sivasubramaniam, Mahmut T. Kandemir, Ravi Iyer, and Chita R. Das. 2017. Race-to-sleep + Content Caching + Display Caching: A Recipe for Energy-efficient Video Streaming on Handhelds. In Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_3_2_2_67_1","unstructured":"Zhao Zhang Zhichun Zhu and Xiaodong Zhang. 2002. Breaking Address Mapping Symmetry at Multi-levels of Memory Hierarchy to Reduce DRAM Row-buffer Conflicts. In The Journal of Instruction-Level Parallelism (JILP).  Zhao Zhang Zhichun Zhu and Xiaodong Zhang. 2002. Breaking Address Mapping Symmetry at Multi-levels of Memory Hierarchy to Reduce DRAM Row-buffer Conflicts. In The Journal of Instruction-Level Parallelism (JILP) ."},{"key":"e_1_3_2_2_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/1736020.1736036"}],"event":{"name":"PLDI '18: ACM SIGPLAN Conference on Programming Language Design and Implementation","location":"Philadelphia PA USA","acronym":"PLDI '18","sponsor":["SIGPLAN ACM Special Interest Group on Programming Languages"]},"container-title":["Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and 
Implementation"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3192366.3192386","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3192366.3192386","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:07:53Z","timestamp":1750198073000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3192366.3192386"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,11]]},"references-count":67,"alternative-id":["10.1145\/3192366.3192386","10.1145\/3192366"],"URL":"https:\/\/doi.org\/10.1145\/3192366.3192386","relation":{"is-identical-to":[{"id-type":"doi","id":"10.1145\/3296979.3192386","asserted-by":"object"}]},"subject":[],"published":{"date-parts":[[2018,6,11]]},"assertion":[{"value":"2018-06-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}