{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,9]],"date-time":"2026-07-09T06:03:58Z","timestamp":1783577038051,"version":"3.55.0"},"publisher-location":"New York, NY, USA","reference-count":97,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,3,25]],"date-time":"2023-03-25T00:00:00Z","timestamp":1679702400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,3,25]]},"DOI":"10.1145\/3582016.3582026","type":"proceedings-article","created":{"date-parts":[[2023,3,20]],"date-time":"2023-03-20T16:59:03Z","timestamp":1679331543000},"page":"3-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":27,"title":["ABNDP: Co-optimizing Data Access and Load Balance in Near-Data Processing"],"prefix":"10.1145","author":[{"given":"Boyu","family":"Tian","sequence":"first","affiliation":[{"name":"Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Qihang","family":"Chen","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mingyu","family":"Gao","sequence":"additional","affiliation":[{"name":"Tsinghua University, China \/ Shanghai Qi Zhi Institute, Shanghai, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2023,3,25]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Co-ML: A Case for Collaborative ML Acceleration Using Near-Data Processing. In International Symposium on Memory Systems (MEMSYS).","author":"Aga Shaizeen","year":"2019","unstructured":"Shaizeen Aga , Nuwan Jayasena , and Mike Ignatowski . 2019 . 
Co-ML: A Case for Collaborative ML Acceleration Using Near-Data Processing. In International Symposium on Memory Systems (MEMSYS). Shaizeen Aga, Nuwan Jayasena, and Mike Ignatowski. 2019. Co-ML: A Case for Collaborative ML Acceleration Using Near-Data Processing. In International Symposium on Memory Systems (MEMSYS)."},{"key":"e_1_3_2_1_2_1","volume-title":"42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Ahn Junwhan","year":"2015","unstructured":"Junwhan Ahn , Sungpack Hong , Sungjoo Yoo , Onur Mutlu , and Kiyoung Choi . 2015 . A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing . In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_3_1","volume-title":"Locality-Aware Processing-in-Memory Architecture. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Ahn Junwhan","year":"2015","unstructured":"Junwhan Ahn , Sungjoo Yoo , Onur Mutlu , and Kiyoung Choi . 2015 . PIM-Enabled Instructions: A Low-Overhead , Locality-Aware Processing-in-Memory Architecture. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_4_1","volume-title":"27th IEEE International Symposium on High-Performance Computer Architecture (HPCA).","author":"Asgari Bahar","year":"2021","unstructured":"Bahar Asgari , Ramyad Hadidi , Jiashen Cao , Da Eun Shim , Sung Kyu Lim , and Hyesoon Kim . 2021 . 
FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction . In 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA). Bahar Asgari, Ramyad Hadidi, Jiashen Cao, Da Eun Shim, Sung Kyu Lim, and Hyesoon Kim. 2021. FAFNIR: Accelerating Sparse Gathering by Using Efficient Near-Memory Intelligent Reduction. In 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_5_1","volume-title":"Chameleon: Versatile and Practical Near-DRAM Acceleration Architecture for Large Memory Systems. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Asghari-Moghaddam Hadi","year":"2016","unstructured":"Hadi Asghari-Moghaddam , Young Hoon Son , Jung Ho Ahn , and Nam Sung Kim . 2016 . Chameleon: Versatile and Practical Near-DRAM Acceleration Architecture for Large Memory Systems. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Hadi Asghari-Moghaddam, Young Hoon Son, Jung Ho Ahn, and Nam Sung Kim. 2016. Chameleon: Versatile and Practical Near-DRAM Acceleration Architecture for Large Memory Systems. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_6_1","volume-title":"Near-Data Processing: Insights From a MICRO-46 Workshop","author":"Balasubramonian Rajeev","year":"2014","unstructured":"Rajeev Balasubramonian , Jichuan Chang , Troy Manning , Jaime H Moreno , Richard Murphy , Ravi Nair , and Steven Swanson . 2014. Near-Data Processing: Insights From a MICRO-46 Workshop . IEEE Micro , 34, 4 ( 2014 ). Rajeev Balasubramonian, Jichuan Chang, Troy Manning, Jaime H Moreno, Richard Murphy, Ravi Nair, and Steven Swanson. 2014. Near-Data Processing: Insights From a MICRO-46 Workshop. 
IEEE Micro, 34, 4 (2014)."},{"key":"e_1_3_2_1_7_1","article-title":"CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories","volume":"14","author":"Balasubramonian Rajeev","year":"2017","unstructured":"Rajeev Balasubramonian , Andrew B. Kahng , Naveen Muralimanohar , Ali Shafiee , and Vaishnav Srinivas . 2017 . CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories . ACM Transactions on Architecture and Code Optimization (TACO) , 14 , 2 (2017). Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories. ACM Transactions on Architecture and Code Optimization (TACO), 14, 2 (2017).","journal-title":"ACM Transactions on Architecture and Code Optimization (TACO)"},{"key":"e_1_3_2_1_8_1","volume-title":"22nd International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Beckmann Nathan","year":"2013","unstructured":"Nathan Beckmann and Daniel Sanchez . 2013 . Jigsaw : Scalable Software-Defined Caches . In 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT). Nathan Beckmann and Daniel Sanchez. 2013. Jigsaw : Scalable Software-Defined Caches. In 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_2_1_9_1","volume-title":"21st IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Beckmann Nathan","year":"2015","unstructured":"Nathan Beckmann , Po An Tsai , and Daniel Sanchez . 2015 . Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling . In 21st IEEE International Symposium on High Performance Computer Architecture (HPCA). Nathan Beckmann, Po An Tsai, and Daniel Sanchez. 2015. Scaling Distributed Cache Hierarchies through Computation and Data Co-Scheduling. 
In 21st IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_10_1","volume-title":"RedCache: Reduced DRAM Caching. In 57th ACM\/IEEE Design Automation Conference (DAC).","author":"Behnam Payman","year":"2020","unstructured":"Payman Behnam and Mahdi Nazm Bojnordi . 2020 . RedCache: Reduced DRAM Caching. In 57th ACM\/IEEE Design Automation Conference (DAC). Payman Behnam and Mahdi Nazm Bojnordi. 2020. RedCache: Reduced DRAM Caching. In 57th ACM\/IEEE Design Automation Conference (DAC)."},{"key":"e_1_3_2_1_11_1","volume-title":"Unfair Scheduling Patterns in NUMA Architectures. In 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Ben-David Naama","unstructured":"Naama Ben-David , Ziv Scully , and Guy E. Blelloch . 2019 . Unfair Scheduling Patterns in NUMA Architectures. In 28th International Conference on Parallel Architectures and Compilation Techniques (PACT). Naama Ben-David, Ziv Scully, and Guy E. Blelloch. 2019. Unfair Scheduling Patterns in NUMA Architectures. In 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_2_1_12_1","volume-title":"SISA: Set-Centric Instruction Set Architecture For Graph Mining on Processing-in-Memory Systems. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Besta Maciej","year":"2021","unstructured":"Maciej Besta , Raghavendra Kanakagiri , Grzegorz Kwasniewski , Rachata Ausavarungnirun , Jakub Ber\u00e1nek , Konstantinos Kanellopoulos , Kacper Janda , Zur Vonarburg-Shmaria , Lukas Gianinazzi , and Ioana Stefan . 2021 . SISA: Set-Centric Instruction Set Architecture For Graph Mining on Processing-in-Memory Systems. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). 
Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Ber\u00e1nek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, and Ioana Stefan. 2021. SISA: Set-Centric Instruction Set Architecture For Graph Mining on Processing-in-Memory Systems. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_13_1","article-title":"Scheduling Multithreaded Computations by Work Stealing","volume":"46","author":"Blumofe Robert D","year":"1999","unstructured":"Robert D Blumofe and Charles E Leiserson . 1999 . Scheduling Multithreaded Computations by Work Stealing . Journal of the ACM (JACM) , 46 , 5 (1999). Robert D Blumofe and Charles E Leiserson. 1999. Scheduling Multithreaded Computations by Work Stealing. Journal of the ACM (JACM), 46, 5 (1999).","journal-title":"Journal of the ACM (JACM)"},{"key":"e_1_3_2_1_14_1","volume-title":"Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks. In 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Boroumand Amirali","year":"2018","unstructured":"Amirali Boroumand , Saugata Ghose , Youngsok Kim , Rachata Ausavarungnirun , Eric Shiu , Rahul Thakur , Daehyun Kim , Aki Kuusela , Allan Knies , Parthasarathy Ranganathan , and Onur Mutlu . 2018 . Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks. In 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu. 2018. Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks. 
In 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_2_1_15_1","volume-title":"CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators. In 46th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Boroumand Amirali","year":"2019","unstructured":"Amirali Boroumand , Saugata Ghose , Minesh Patel , Hasan Hassan , Brandon Lucia , Rachata Ausavarungnirun , Kevin Hsieh , Nastaran Hajinazar , Krishna T. Malladi , Hongzhong Zheng , and Onur Mutlu . 2019 . CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators. In 46th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Amirali Boroumand, Saugata Ghose, Minesh Patel, Hasan Hassan, Brandon Lucia, Rachata Ausavarungnirun, Kevin Hsieh, Nastaran Hajinazar, Krishna T. Malladi, Hongzhong Zheng, and Onur Mutlu. 2019. CoNDA: Efficient Cache Coherence Support for Near-Data Accelerators. In 46th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"crossref","unstructured":"Amirali Boroumand Saugata Ghose Minesh Patel Hasan Hassan Brandon Lucia Nastaran Hajinazar Kevin Hsieh Krishna T Malladi Hongzhong Zheng and Onur Mutlu. 2017. LazyPIM: Efficient support for cache coherence in processing-in-memory architectures. In arXiv preprint arXiv:1706.03162. \t\t\t\t  Amirali Boroumand Saugata Ghose Minesh Patel Hasan Hassan Brandon Lucia Nastaran Hajinazar Kevin Hsieh Krishna T Malladi Hongzhong Zheng and Onur Mutlu. 2017. LazyPIM: Efficient support for cache coherence in processing-in-memory architectures. In arXiv preprint arXiv:1706.03162.","DOI":"10.1109\/LCA.2016.2577557"},{"key":"e_1_3_2_1_17_1","volume-title":"Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. 
In 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Cali Damla Senol","year":"2020","unstructured":"Damla Senol Cali , Gurpreet S. Kalsi , Z\u00fclal Bing\u00f6l , Can Firtina , Lavanya Subramanian , Jeremie S. Kim , Rachata Ausavarungnirun , Mohammed Alser , Juan Gomez-Luna , Amirali Boroumand , Anant Norion , Allison Scibisz , Sreenivas Subramoneyon , Can Alkan , Saugata Ghose , and Onur Mutlu . 2020 . GenASM: A High-Performance , Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. In 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Damla Senol Cali, Gurpreet S. Kalsi, Z\u00fclal Bing\u00f6l, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Norion, Allison Scibisz, Sreenivas Subramoneyon, Can Alkan, Saugata Ghose, and Onur Mutlu. 2020. GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis. In 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_18_1","volume-title":"23rd IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Chatterjee Niladrish","year":"2017","unstructured":"Niladrish Chatterjee , Mike O\u2019Connor , Donghyuk Lee , Daniel R Johnson , Stephen W Keckler , Minsoo Rhu , and William J Dally . 2017 . Architecting an Energy-Efficient DRAM System for GPUs . In 23rd IEEE International Symposium on High Performance Computer Architecture (HPCA). Niladrish Chatterjee, Mike O\u2019Connor, Donghyuk Lee, Daniel R Johnson, Stephen W Keckler, Minsoo Rhu, and William J Dally. 2017. Architecting an Energy-Efficient DRAM System for GPUs. In 23rd IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_19_1","volume-title":"WATS: Workload-Aware Task Scheduling in Asymmetric Multi-Core Architectures. 
In IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS).","author":"Chen Quan","year":"2012","unstructured":"Quan Chen , Yawen Chen , Zhiyi Huang , and Minyi Guo . 2012 . WATS: Workload-Aware Task Scheduling in Asymmetric Multi-Core Architectures. In IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS). Quan Chen, Yawen Chen, Zhiyi Huang, and Minyi Guo. 2012. WATS: Workload-Aware Task Scheduling in Asymmetric Multi-Core Architectures. In IEEE 26th International Parallel and Distributed Processing Symposium (IPDPS)."},{"key":"e_1_3_2_1_20_1","volume-title":"PIMCloud: QoS-Aware Resource Management of Latency-Critical Applications in Clouds with Processing-in-Memory. In 28th IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Chen Shuang","unstructured":"Shuang Chen , Yi Jiang , Christina Delimitrou , and Jose F. Martinez . 2022 . PIMCloud: QoS-Aware Resource Management of Latency-Critical Applications in Clouds with Processing-in-Memory. In 28th IEEE International Symposium on High Performance Computer Architecture (HPCA). Shuang Chen, Yi Jiang, Christina Delimitrou, and Jose F. Martinez. 2022. PIMCloud: QoS-Aware Resource Management of Latency-Critical Applications in Clouds with Processing-in-Memory. In 28th IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_21_1","volume-title":"CANDY: Enabling Coherent DRAM Caches for Multi-Node Systems. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Chou Chiachen","unstructured":"Chiachen Chou , Aamer Jaleel , and Moinuddin K. Qureshi . 2016 . CANDY: Enabling Coherent DRAM Caches for Multi-Node Systems. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Chiachen Chou, Aamer Jaleel, and Moinuddin K. Qureshi. 2016. CANDY: Enabling Coherent DRAM Caches for Multi-Node Systems. 
In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_22_1","unstructured":"OpenMP Committee. 2013. OpenMP 4.0 Complete Specifications.  https:\/\/openmp.org\/wp-content\/uploads\/OpenMP4.0.0.pdf \t\t\t\t  OpenMP Committee. 2013. OpenMP 4.0 Complete Specifications.  https:\/\/openmp.org\/wp-content\/uploads\/OpenMP4.0.0.pdf"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507706"},{"key":"e_1_3_2_1_24_1","article-title":"GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing","volume":"38","author":"Dai Guohao","year":"2019","unstructured":"Guohao Dai , Tianhao Huang , Yuze Chi , Jishen Zhao , Guangyu Sun , Yongpan Liu , Yu Wang , Yuan Xie , and Huazhong Yang . 2019 . GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing . IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) , 38 , 4 (2019). Guohao Dai, Tianhao Huang, Yuze Chi, Jishen Zhao, Guangyu Sun, Yongpan Liu, Yu Wang, Yuan Xie, and Huazhong Yang. 2019. GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 38, 4 (2019).","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)"},{"key":"e_1_3_2_1_25_1","volume-title":"DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Dai Guohao","year":"2022","unstructured":"Guohao Dai , Zhenhua Zhu , Tianyu Fu , Chiyue Wei , Bangyan Wang , Xiangyu Li , Yuan Xie , Huazhong Yang , and Yu Wang . 2022 . DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). 
Guohao Dai, Zhenhua Zhu, Tianyu Fu, Chiyue Wei, Bangyan Wang, Xiangyu Li, Yuan Xie, Huazhong Yang, and Yu Wang. 2022. DIMMining: Pruning-Efficient and Parallel Graph Mining on Near-Memory-Computing. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_26_1","volume-title":"19th IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Das Reetuparna","year":"2013","unstructured":"Reetuparna Das , Rachata Ausavarungnirun , Onur Mutlu , Akhilesh Kumar , and Mani Azimi . 2013 . Application-to-Core Mapping Policies to Reduce Memory System Interference in Multi-Core Systems . In 19th IEEE International Symposium on High Performance Computer Architecture (HPCA). Reetuparna Das, Rachata Ausavarungnirun, Onur Mutlu, Akhilesh Kumar, and Mani Azimi. 2013. Application-to-Core Mapping Policies to Reduce Memory System Interference in Multi-Core Systems. In 19th IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049663"},{"key":"e_1_3_2_1_28_1","volume-title":"25th International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Drebes Andi","unstructured":"Andi Drebes , Antoniu Pop , Karine Heydemann , Albert Cohen , and Nathalie Drach . 2016. Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management . In 25th International Conference on Parallel Architectures and Compilation Techniques (PACT). Andi Drebes, Antoniu Pop, Karine Heydemann, Albert Cohen, and Nathalie Drach. 2016. Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management. In 25th International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_2_1_29_1","volume-title":"NUMA-Aware Scheduling and Memory Allocation for Data-Flow Task-Parallel Applications. 
In 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).","author":"Drebes Andi","year":"2016","unstructured":"Andi Drebes , Antoniu Pop , Karine Heydemann , Nathalie Drach , and Albert Cohen . 2016 . NUMA-Aware Scheduling and Memory Allocation for Data-Flow Task-Parallel Applications. In 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Andi Drebes, Antoniu Pop, Karine Heydemann, Nathalie Drach, and Albert Cohen. 2016. NUMA-Aware Scheduling and Memory Allocation for Data-Flow Task-Parallel Applications. In 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)."},{"key":"e_1_3_2_1_30_1","volume-title":"The Mondrian Data Engine. In 44th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Drumond Mario","year":"2017","unstructured":"Mario Drumond , Alexandros Daglis , Nooshin Mirzadeh , Dmitrii Ustiugov , Javier Picorel , Babak Falsafi , Boris Grot , and Dionisios Pnevmatikatos . 2017 . The Mondrian Data Engine. In 44th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, and Dionisios Pnevmatikatos. 2017. The Mondrian Data Engine. In 44th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_31_1","volume-title":"NDA: Near-DRAM Acceleration Architecture Leveraging Commodity DRAM Devices and Standard Memory Modules. In 21st IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Farmahini-Farahani Amin","year":"2015","unstructured":"Amin Farmahini-Farahani , Jung Ho Ahn , Katherine Morrow , and Nam Sung Kim . 2015 . NDA: Near-DRAM Acceleration Architecture Leveraging Commodity DRAM Devices and Standard Memory Modules. In 21st IEEE International Symposium on High Performance Computer Architecture (HPCA). 
Amin Farmahini-Farahani, Jung Ho Ahn, Katherine Morrow, and Nam Sung Kim. 2015. NDA: Near-DRAM Acceleration Architecture Leveraging Commodity DRAM Devices and Standard Memory Modules. In 21st IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_32_1","volume-title":"MeNDA: A Near-Memory Multi-Way Merge Solution for Sparse Transposition and Dataflows. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Feng Siying","year":"2022","unstructured":"Siying Feng , Xin He , Kuan-Yu Chen , Liu Ke , Xuan Zhang , David Blaauw , Trevor Mudge , and Ronald Dreslinski . 2022 . MeNDA: A Near-Memory Multi-Way Merge Solution for Sparse Transposition and Dataflows. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Siying Feng, Xin He, Kuan-Yu Chen, Liu Ke, Xuan Zhang, David Blaauw, Trevor Mudge, and Ronald Dreslinski. 2022. MeNDA: A Near-Memory Multi-Way Merge Solution for Sparse Transposition and Dataflows. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_33_1","volume-title":"Practical Near-Data Processing for In-Memory Analytics Frameworks. In 24th International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Gao Mingyu","year":"2015","unstructured":"Mingyu Gao , Grant Ayers , and Christos Kozyrakis . 2015 . Practical Near-Data Processing for In-Memory Analytics Frameworks. In 24th International Conference on Parallel Architectures and Compilation Techniques (PACT). Mingyu Gao, Grant Ayers, and Christos Kozyrakis. 2015. Practical Near-Data Processing for In-Memory Analytics Frameworks. In 24th International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_2_1_34_1","volume-title":"HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing. 
In 22nd IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Gao Mingyu","year":"2016","unstructured":"Mingyu Gao and Christos Kozyrakis . 2016 . HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing. In 22nd IEEE International Symposium on High Performance Computer Architecture (HPCA). Mingyu Gao and Christos Kozyrakis. 2016. HRL: Efficient and Flexible Reconfigurable Logic for Near-Data Processing. In 22nd IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_35_1","volume-title":"22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Gao Mingyu","year":"2017","unstructured":"Mingyu Gao , Jing Pu , Xuan Yang , Mark Horowitz , and Christos Kozyrakis . 2017 . Tetris: Scalable and Efficient Neural Network Acceleration With 3D Memory . In 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Mingyu Gao, Jing Pu, Xuan Yang, Mark Horowitz, and Christos Kozyrakis. 2017. Tetris: Scalable and Efficient Neural Network Acceleration With 3D Memory. In 22nd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_2_1_36_1","volume-title":"SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. In 27th IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Giannoula Christina","year":"2021","unstructured":"Christina Giannoula , Nandita Vijaykumar , Nikela Papadopoulou , Vasileios Karakostas , Ivan Fernandez , Juan G\u00f3mez-Luna , Lois Orosa , Nectarios Koziris , Georgios Goumas , and Onur Mutlu . 2021 . SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. In 27th IEEE International Symposium on High Performance Computer Architecture (HPCA). 
Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan G\u00f3mez-Luna, Lois Orosa, Nectarios Koziris, Georgios Goumas, and Onur Mutlu. 2021. SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures. In 27th IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_37_1","volume-title":"PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX Conference on Operating Systems Design and Implementation (OSDI).","author":"Gonzalez Joseph E.","year":"2012","unstructured":"Joseph E. Gonzalez , Yucheng Low , Haijie Gu , Danny Bickson , and Carlos Guestrin . 2012 . PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX Conference on Operating Systems Design and Implementation (OSDI). Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In 10th USENIX Conference on Operating Systems Design and Implementation (OSDI)."},{"key":"e_1_3_2_1_38_1","volume-title":"47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Gu Peng","year":"2020","unstructured":"Peng Gu , Xinfeng Xie , Yufei Ding , Guoyang Chen , Weifeng Zhang , Dimin Niu , and Yuan Xie . 2020 . iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture . In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Peng Gu, Xinfeng Xie, Yufei Ding, Guoyang Chen, Weifeng Zhang, Dimin Niu, and Yuan Xie. 2020. iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_39_1","volume-title":"Hit Latency and Bandwidth. 
In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Gulur Nagendra","unstructured":"Nagendra Gulur , Mahesh Mehendale , R. Manikantan , and R. Govindarajan . 2015. Bi-Modal DRAM Cache: Improving Hit Rate , Hit Latency and Bandwidth. In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Nagendra Gulur, Mahesh Mehendale, R. Manikantan, and R. Govindarajan. 2015. Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth. In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_40_1","volume-title":"Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches. In 36th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Hardavellas Nikos","year":"2009","unstructured":"Nikos Hardavellas , Michael Ferdman , Babak Falsafi , and Anastasia Ailamaki . 2009 . Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches. In 36th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. 2009. Reactive NUCA: Near-Optimal Block Placement and Replication in Distributed Caches. In 36th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_41_1","volume-title":"Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems. In 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Hsieh Kevin","year":"2016","unstructured":"Kevin Hsieh , Eiman Ebrahimi , Gwangsun Kim , Niladrish Chatterjee , Mike O\u2019Connor , Nandita Vijaykumar , Onur Mutlu , and Stephen W Keckler . 2016 . Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems. In 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). 
Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O\u2019Connor, Nandita Vijaykumar, Onur Mutlu, and Stephen W Keckler. 2016. Transparent Offloading and Mapping (TOM) Enabling Programmer-Transparent Near-Data Processing in GPU Systems. In 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/2628071.2628089"},{"key":"e_1_3_2_1_43_1","volume-title":"CRUISE: Cache Replacement and Utility-Aware Scheduling. In 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS).","author":"Jaleel Aamer","year":"2012","unstructured":"Aamer Jaleel , Hashem H. Najaf-abadi, Samantika Subramaniam , Simon C. Steely , and Joel Emer . 2012 . CRUISE: Cache Replacement and Utility-Aware Scheduling. In 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Aamer Jaleel, Hashem H. Najaf-abadi, Samantika Subramaniam, Simon C. Steely, and Joel Emer. 2012. CRUISE: Cache Replacement and Utility-Aware Scheduling. In 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)."},{"key":"e_1_3_2_1_44_1","unstructured":"JEDEC. 2021. High Bandwidth Memory (HBM) DRAM.  https:\/\/www.jedec.org\/standards-documents\/docs\/jesd235a \t\t\t\t  JEDEC. 2021. High Bandwidth Memory (HBM) DRAM.  https:\/\/www.jedec.org\/standards-documents\/docs\/jesd235a"},{"key":"e_1_3_2_1_45_1","volume-title":"Data-Centric Execution of Speculative Parallel Programs. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Jeffrey Mark C.","year":"2016","unstructured":"Mark C. Jeffrey , Suvinay Subramanian , Maleen Abeydeera , Joel Emer , and Daniel Sanchez . 2016 . Data-Centric Execution of Speculative Parallel Programs. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Mark C. 
Jeffrey, Suvinay Subramanian, Maleen Abeydeera, Joel Emer, and Daniel Sanchez. 2016. Data-Centric Execution of Speculative Parallel Programs. In 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_46_1","volume-title":"48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Jeffrey Mark C.","year":"2015","unstructured":"Mark C. Jeffrey , Suvinay Subramanian , Cong Yan , Joel Emer , and Daniel Sanchez . 2015 . A Scalable Architecture for Ordered Parallelism . In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Mark C. Jeffrey, Suvinay Subramanian, Cong Yan, Joel Emer, and Daniel Sanchez. 2015. A Scalable Architecture for Ordered Parallelism. In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_47_1","volume-title":"Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache. In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Jevdjic Djordje","year":"2015","unstructured":"Djordje Jevdjic , Gabriel H. Loh , Cansu Kaynak , and Babak Falsafi . 2015 . Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache. In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Djordje Jevdjic, Gabriel H. Loh, Cansu Kaynak, and Babak Falsafi. 2015. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache. In 48th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_48_1","volume-title":"Have It All with Footprint Cache. In 40th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Jevdjic Djordje","year":"2013","unstructured":"Djordje Jevdjic , Stavros Volos , and Babak Falsafi . 2013 . Die-Stacked DRAM Caches for Servers Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache. In 40th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). 
Djordje Jevdjic, Stavros Volos, and Babak Falsafi. 2013. Die-Stacked DRAM Caches for Servers Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache. In 40th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_49_1","volume-title":"HBM (High Bandwidth Memory) DRAM Technology and Architecture. In 2017 IEEE International Memory Workshop (IMW).","author":"Jun Hongshin","year":"2017","unstructured":"Hongshin Jun , Jinhee Cho , Kangseol Lee , Ho-Young Son , Kwiwook Kim , Hanho Jin , and Keith Kim . 2017 . HBM (High Bandwidth Memory) DRAM Technology and Architecture. In 2017 IEEE International Memory Workshop (IMW). Hongshin Jun, Jinhee Cho, Kangseol Lee, Ho-Young Son, Kwiwook Kim, Hanho Jin, and Keith Kim. 2017. HBM (High Bandwidth Memory) DRAM Technology and Architecture. In 2017 IEEE International Memory Workshop (IMW)."},{"key":"e_1_3_2_1_50_1","volume-title":"48th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Kal Hongju","year":"2021","unstructured":"Hongju Kal , Seokmin Lee , Gun Ko , and Won Woo Ro . 2021 . SPACE : Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations . In 48th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Hongju Kal, Seokmin Lee, Gun Ko, and Won Woo Ro. 2021. SPACE : Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations. In 48th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_51_1","volume-title":"RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Ke Liu","year":"2020","unstructured":"Liu Ke , Udit Gupta , Benjamin Youngjae Cho , David Brooks , Vikas Chandra , Utku Diril , Amin Firoozshahian , Kim Hazelwood , Bill Jia , Hsien-Hsin S. 
Lee , Meng Li , Bert Maher , Dheevatsa Mudigere , Maxim Naumov , Martin Schatz , Mikhail Smelyanskiy , Xiaodong Wang , Brandon Reagen , Carole-Jean Wu , Mark Hempstead , and Xuan Zhang . 2020 . RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Liu Ke, Udit Gupta, Benjamin Youngjae Cho, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang, Brandon Reagen, Carole-Jean Wu, Mark Hempstead, and Xuan Zhang. 2020. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_52_1","volume-title":"43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Kim Duckhwan","year":"2016","unstructured":"Duckhwan Kim , Jaeha Kung , Sek Chai , Sudhakar Yalamanchili , and Saibal Mukhopadhyay . 2016 . NeuroCube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory . In 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Duckhwan Kim, Jaeha Kung, Sek Chai, Sudhakar Yalamanchili, and Saibal Mukhopadhyay. 2016. NeuroCube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory. In 43rd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_53_1","volume-title":"Memory-Centric System Interconnect Design with Hybrid Memory Cubes. In 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Kim Gwangsun","year":"2013","unstructured":"Gwangsun Kim , John Kim , Jung Ho Ahn , and Jaeha Kim . 2013 . Memory-Centric System Interconnect Design with Hybrid Memory Cubes. 
In 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT). Gwangsun Kim, John Kim, Jung Ho Ahn, and Jaeha Kim. 2013. Memory-Centric System Interconnect Design with Hybrid Memory Cubes. In 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_2_1_54_1","volume-title":"Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu.","author":"Kim Jeremie S.","year":"2018","unstructured":"Jeremie S. Kim , Damla Senol Cali , Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu. 2018 . GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies. BMC Genomics , 19, 2 (2018). Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu. 2018. GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies. BMC Genomics, 19, 2 (2018)."},{"key":"e_1_3_2_1_55_1","volume-title":"Enhancing Computation-to-Core Assignment with Physical Location Information. In 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI).","author":"Kislal Orhan","year":"2018","unstructured":"Orhan Kislal , Jagadish Kotra , Xulong Tang , Mahmut Taylan Kandemir , and Myoungsoo Jung . 2018 . Enhancing Computation-to-Core Assignment with Physical Location Information. In 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Orhan Kislal, Jagadish Kotra, Xulong Tang, Mahmut Taylan Kandemir, and Myoungsoo Jung. 2018. Enhancing Computation-to-Core Assignment with Physical Location Information. 
In 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)."},{"key":"e_1_3_2_1_56_1","volume-title":"TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Kwon Youngeun","year":"2019","unstructured":"Youngeun Kwon , Yunjae Lee , and Minsoo Rhu . 2019 . TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Youngeun Kwon, Yunjae Lee, and Minsoo Rhu. 2019. TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_57_1","volume-title":"2021 IEEE International Solid- State Circuits Conference (ISSCC).","author":"Kwon Young-Cheon","year":"2021","unstructured":"Young-Cheon Kwon , Suk Han Lee , Jaehoon Lee , Sang-Hyuk Kwon , Je Min Ryu , Jong-Pil Son , O Seongil , Hak-Soo Yu , Haesuk Lee , Soo Young Kim , Youngmin Cho , Jin Guk Kim , Jongyoon Choi , Hyun-Sung Shin , Jin Kim , BengSeng Phuah , HyoungMin Kim , Myeong Jun Song , Ahn Choi , Daeho Kim , SooYoung Kim , Eun-Bong Kim , David Wang , Shinhaeng Kang , Yuhwan Ro , Seungwoo Seo , JoonHo Song , Jaeyoun Youn , Kyomin Sohn , and Nam Sung Kim . 2021 . A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications . In 2021 IEEE International Solid- State Circuits Conference (ISSCC). 
Young-Cheon Kwon, Suk Han Lee, Jaehoon Lee, Sang-Hyuk Kwon, Je Min Ryu, Jong-Pil Son, O Seongil, Hak-Soo Yu, Haesuk Lee, Soo Young Kim, Youngmin Cho, Jin Guk Kim, Jongyoon Choi, Hyun-Sung Shin, Jin Kim, BengSeng Phuah, HyoungMin Kim, Myeong Jun Song, Ahn Choi, Daeho Kim, SooYoung Kim, Eun-Bong Kim, David Wang, Shinhaeng Kang, Yuhwan Ro, Seungwoo Seo, JoonHo Song, Jaeyoun Youn, Kyomin Sohn, and Nam Sung Kim. 2021. A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications. In 2021 IEEE International Solid- State Circuits Conference (ISSCC)."},{"key":"e_1_3_2_1_58_1","volume-title":"Tagless DRAM Cache. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Lee Yongjun","unstructured":"Yongjun Lee , Jongwon Kim , Hakbeom Jang , Hyunggyun Yang , Jangwoo Kim , Jinkyu Jeong , and Jae W. Lee . 2015. A Fully Associative , Tagless DRAM Cache. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Yongjun Lee, Jongwon Kim, Hakbeom Jang, Hyunggyun Yang, Jangwoo Kim, Jinkyu Jeong, and Jae W. Lee. 2015. A Fully Associative, Tagless DRAM Cache. In 42nd ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_59_1","volume-title":"Thread and Memory Placement on NUMA Systems: Asymmetry Matters. In 2015 USENIX Annual Technical Conference (USENIX ATC).","author":"Lepers Baptiste","year":"2015","unstructured":"Baptiste Lepers , Vivien Qu\u00e9ma , and Alexandra Fedorova . 2015 . Thread and Memory Placement on NUMA Systems: Asymmetry Matters. In 2015 USENIX Annual Technical Conference (USENIX ATC). Baptiste Lepers, Vivien Qu\u00e9ma, and Alexandra Fedorova. 2015. Thread and Memory Placement on NUMA Systems: Asymmetry Matters. 
In 2015 USENIX Annual Technical Conference (USENIX ATC)."},{"key":"e_1_3_2_1_60_1","article-title":"SNAP: A General-Purpose Network Analysis and Graph-Mining Library","volume":"8","author":"Leskovec Jure","year":"2016","unstructured":"Jure Leskovec and Rok Sosi\u010d . 2016 . SNAP: A General-Purpose Network Analysis and Graph-Mining Library . ACM Transactions on Intelligent Systems and Technology (TIST) , 8 , 1 (2016). Jure Leskovec and Rok Sosi\u010d. 2016. SNAP: A General-Purpose Network Analysis and Graph-Mining Library. ACM Transactions on Intelligent Systems and Technology (TIST), 8, 1 (2016).","journal-title":"ACM Transactions on Intelligent Systems and Technology (TIST)"},{"key":"e_1_3_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/1669112.1669139"},{"key":"e_1_3_2_1_62_1","volume-title":"Efficiently Enabling Conventional Block Sizes for Very Large Die-Stacked DRAM Caches. In 44th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Loh Gabriel H.","unstructured":"Gabriel H. Loh and Mark D. Hill. 2011 . Efficiently Enabling Conventional Block Sizes for Very Large Die-Stacked DRAM Caches. In 44th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Gabriel H. Loh and Mark D. Hill. 2011. Efficiently Enabling Conventional Block Sizes for Very Large Die-Stacked DRAM Caches. In 44th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_63_1","volume-title":"Pregel: A System for Large-Scale Graph Processing. In 2010 ACM SIGMOD International Conference on Management of Data.","author":"Malewicz Grzegorz","year":"2010","unstructured":"Grzegorz Malewicz , Matthew H. Austern , Aart J.C Bik , James C. Dehnert , Ilan Horn , Naty Leiser , and Grzegorz Czajkowski . 2010 . Pregel: A System for Large-Scale Graph Processing. In 2010 ACM SIGMOD International Conference on Management of Data. Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. 
Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: A System for Large-Scale Graph Processing. In 2010 ACM SIGMOD International Conference on Management of Data."},{"key":"e_1_3_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/977091.977115"},{"key":"e_1_3_2_1_65_1","unstructured":"Micron. 2018. Hybrid Memory Cube \u2013 HMC Gen2.  https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/hmc\/gen2\/hmc_gen2.pdf \t\t\t\t  Micron. 2018. Hybrid Memory Cube \u2013 HMC Gen2.  https:\/\/www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/hmc\/gen2\/hmc_gen2.pdf"},{"key":"e_1_3_2_1_66_1","volume-title":"50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Mutlu Onur","year":"2017","unstructured":"Onur Mutlu and Srinivas Devadas . 2017 . Banshee: Bandwidth-Efficient DRAM Caching via Software\/Hardware Cooperation High-Bandwidth In-Package DRAM . In 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Onur Mutlu and Srinivas Devadas. 2017. Banshee: Bandwidth-Efficient DRAM Caching via Software\/Hardware Cooperation High-Bandwidth In-Package DRAM. In 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_67_1","volume-title":"GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks. In 23rd IEEE Symposium on High Performance Computer Architecture (HPCA).","author":"Nai Lifeng","year":"2017","unstructured":"Lifeng Nai , Ramyad Hadidi , Jaewoong Sim , Hyojong Kim , Pranith Kumar , and Hyesoon Kim . 2017 . GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks. In 23rd IEEE Symposium on High Performance Computer Architecture (HPCA). Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim. 2017. GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks. 
In 23rd IEEE Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_68_1","article-title":"Active Memory Cube: A Processing-In-Memory Architecture for Exascale Systems","volume":"59","author":"Nair Ravi","year":"2015","unstructured":"Ravi Nair , Samuel F Antao , Carlo Bertolli , Pradip Bose , Jose R Brunheroto , Tong Chen , C-Y Cher , Carlos HA Costa , Jun Doi , and Constantinos Evangelinos . 2015 . Active Memory Cube: A Processing-In-Memory Architecture for Exascale Systems . IBM Journal of Research and Development , 59 , 2\/3 (2015). Ravi Nair, Samuel F Antao, Carlo Bertolli, Pradip Bose, Jose R Brunheroto, Tong Chen, C-Y Cher, Carlos HA Costa, Jun Doi, and Constantinos Evangelinos. 2015. Active Memory Cube: A Processing-In-Memory Architecture for Exascale Systems. IBM Journal of Research and Development, 59, 2\/3 (2015).","journal-title":"IBM Journal of Research and Development"},{"key":"e_1_3_2_1_69_1","volume-title":"TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Park Jaehyun","year":"2021","unstructured":"Jaehyun Park , Byeongho Kim , Sungmin Yun , Eojin Lee , Minsoo Rhu , and Jung Ho Ahn . 2021 . TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Jaehyun Park, Byeongho Kim, Sungmin Yun, Eojin Lee, Minsoo Rhu, and Jung Ho Ahn. 2021. TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory. In 54th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_70_1","volume-title":"Automation & Test in Europe Conference & Exhibition (DATE).","author":"Pathania Anuj","year":"2018","unstructured":"Anuj Pathania . 2018 . Task Scheduling for Many-Cores with S-NUCA Caches. In Design , Automation & Test in Europe Conference & Exhibition (DATE). 
Anuj Pathania. 2018. Task Scheduling for Many-Cores with S-NUCA Caches. In Design, Automation & Test in Europe Conference & Exhibition (DATE)."},{"key":"e_1_3_2_1_71_1","volume-title":"There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes. In 44th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Poremba Matthew","unstructured":"Matthew Poremba , Itir Akgun , Jieming Yin , Onur Kayiran , Yuan Xie , and Gabriel H. Loh . 2017 . There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes. In 44th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Matthew Poremba, Itir Akgun, Jieming Yin, Onur Kayiran, Yuan Xie, and Gabriel H. Loh. 2017. There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes. In 44th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_72_1","volume-title":"Adaptive NUMA-Aware Data Placement and Task Scheduling for Analytical Workloads in Main-Memory Column-Stores. In 46th International Conference on Very Large Data Bases (VLDB).","author":"Psaroudakis Iraklis","year":"2016","unstructured":"Iraklis Psaroudakis , Tobias Scheuer , Norman May , Abdelkader Sellami , and Anastasia Ailamaki . 2016 . Adaptive NUMA-Aware Data Placement and Task Scheduling for Analytical Workloads in Main-Memory Column-Stores. In 46th International Conference on Very Large Data Bases (VLDB). Iraklis Psaroudakis, Tobias Scheuer, Norman May, Abdelkader Sellami, and Anastasia Ailamaki. 2016. Adaptive NUMA-Aware Data Placement and Task Scheduling for Analytical Workloads in Main-Memory Column-Stores. 
In 46th International Conference on Very Large Data Bases (VLDB)."},{"key":"e_1_3_2_1_73_1","volume-title":"2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).","author":"Pugsley Seth H","year":"2014","unstructured":"Seth H Pugsley , Jeffrey Jestes , Huihui Zhang , Rajeev Balasubramonian , Vijayalakshmi Srinivasan , Alper Buyuktosunoglu , Al Davis , and Feifei Li . 2014 . NDC: Analyzing the Impact of 3D-Stacked Memory+Logic Devices on MapReduce Workloads . In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Seth H Pugsley, Jeffrey Jestes, Huihui Zhang, Rajeev Balasubramonian, Vijayalakshmi Srinivasan, Alper Buyuktosunoglu, Al Davis, and Feifei Li. 2014. NDC: Analyzing the Impact of 3D-Stacked Memory+Logic Devices on MapReduce Workloads. In 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)."},{"key":"e_1_3_2_1_74_1","volume-title":"45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Qureshi Moinuddin K","year":"2012","unstructured":"Moinuddin K Qureshi and Gabe H Loh . 2012 . Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-tags with a Simple and Practical Design . In 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Moinuddin K Qureshi and Gabe H Loh. 2012. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-tags with a Simple and Practical Design. In 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_75_1","volume-title":"Automation & Test in Europe Conference & Exhibition (DATE).","author":"Rapp Martin","year":"2019","unstructured":"Martin Rapp , Anuj Pathania , Tulika Mitra , and J\u00f6rg Henkel . 2019 . Prediction-Based Task Migration on S-NUCA Many-Cores. In Design , Automation & Test in Europe Conference & Exhibition (DATE). 
Martin Rapp, Anuj Pathania, Tulika Mitra, and J\u00f6rg Henkel. 2019. Prediction-Based Task Migration on S-NUCA Many-Cores. In Design, Automation & Test in Europe Conference & Exhibition (DATE)."},{"key":"e_1_3_2_1_76_1","volume-title":"X-Stream: Edge-Centric Graph Processing Using Streaming Partitions. In 24th ACM Symposium on Operating Systems Principles (SOSP).","author":"Roy Amitabha","year":"2013","unstructured":"Amitabha Roy , Ivo Mihailovic , and Willy Zwaenepoel . 2013 . X-Stream: Edge-Centric Graph Processing Using Streaming Partitions. In 24th ACM Symposium on Operating Systems Principles (SOSP). Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-Centric Graph Processing Using Streaming Partitions. In 24th ACM Symposium on Operating Systems Principles (SOSP)."},{"key":"e_1_3_2_1_77_1","volume-title":"ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems. In 40th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Sanchez Daniel","year":"2013","unstructured":"Daniel Sanchez and Christos Kozyrakis . 2013 . ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems. In 40th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-Core Systems. In 40th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_78_1","volume-title":"Dynamic Fine-Grain Scheduling of Pipeline Parallelism. In 20th International Conference on Parallel Architectures and Compilation Techniques (PACT).","author":"Sanchez Daniel","year":"2011","unstructured":"Daniel Sanchez , David Lo , Richard M. Yoo , Jeremy Sugerman , and Christos Kozyrakis . 2011 . Dynamic Fine-Grain Scheduling of Pipeline Parallelism. In 20th International Conference on Parallel Architectures and Compilation Techniques (PACT). 
Daniel Sanchez, David Lo, Richard M. Yoo, Jeremy Sugerman, and Christos Kozyrakis. 2011. Dynamic Fine-Grain Scheduling of Pipeline Parallelism. In 20th International Conference on Parallel Architectures and Compilation Techniques (PACT)."},{"key":"e_1_3_2_1_79_1","volume-title":"Skewed-Associative Caches. In International Conference on Parallel Architectures and Languages Europe (PARLE).","author":"Seznec Andr\u00e9","year":"1993","unstructured":"Andr\u00e9 Seznec and Francois Bodin . 1993 . Skewed-Associative Caches. In International Conference on Parallel Architectures and Languages Europe (PARLE). Andr\u00e9 Seznec and Francois Bodin. 1993. Skewed-Associative Caches. In International Conference on Parallel Architectures and Languages Europe (PARLE)."},{"key":"e_1_3_2_1_80_1","volume-title":"Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Shao Yakun Sophia","year":"2019","unstructured":"Yakun Sophia Shao , Jason Clemons , Rangharajan Venkatesan , Brian Zimmer , Matthew Fojtik , Nan Jiang , Ben Keller , Alicia Klinefelter , Nathaniel Pinckney , and Priyanka Raina . 2019 . Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Pinckney, and Priyanka Raina. 2019. Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_81_1","volume-title":"Almost Deterministic Work Stealing. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC).","author":"Shiina Shumpei","year":"2019","unstructured":"Shumpei Shiina and Kenjiro Taura . 2019 . 
Almost Deterministic Work Stealing. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Shumpei Shiina and Kenjiro Taura. 2019. Almost Deterministic Work Stealing. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC)."},{"key":"e_1_3_2_1_82_1","volume-title":"Ligra: A Lightweight Graph Processing Framework for Shared Memory. In 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).","author":"Shun Julian","unstructured":"Julian Shun and Guy E. Blelloch . 2013 . Ligra: A Lightweight Graph Processing Framework for Shared Memory. In 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)."},{"key":"e_1_3_2_1_83_1","volume-title":"45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Sim Jaewoong","year":"2012","unstructured":"Jaewoong Sim , Gabriel H. Loh , Hyesoon Kim , Mike Oconnor , and Mithuna Thottethodi . 2012 . A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch . In 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Jaewoong Sim, Gabriel H. Loh, Hyesoon Kim, Mike Oconnor, and Mithuna Thottethodi. 2012. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch. In 45th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_84_1","volume-title":"Napel: Near-Memory Computing Application Performance Prediction Via Ensemble Learning. 
In 56th ACM\/IEEE Design Automation Conference (DAC).","author":"Singh Gagandeep","year":"2019","unstructured":"Gagandeep Singh , Juan G\u00f3mez-Luna , Giovanni Mariani , Geraldo F Oliveira , Stefano Corda , Sander Stuijk , Onur Mutlu , and Henk Corporaal. 2019 . Napel: Near-Memory Computing Application Performance Prediction Via Ensemble Learning. In 56th ACM\/IEEE Design Automation Conference (DAC). Gagandeep Singh, Juan G\u00f3mez-Luna, Giovanni Mariani, Geraldo F Oliveira, Stefano Corda, Sander Stuijk, Onur Mutlu, and Henk Corporaal. 2019. Napel: Near-Memory Computing Application Performance Prediction Via Ensemble Learning. In 56th ACM\/IEEE Design Automation Conference (DAC)."},{"key":"e_1_3_2_1_85_1","volume-title":"ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. In 48th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Sun Weiyi","year":"2021","unstructured":"Weiyi Sun , Zhaoshi Li , Shouyi Yin , Shaojun Wei , and Leibo Liu . 2021 . ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. In 48th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Weiyi Sun, Zhaoshi Li, Shouyi Yin, Shaojun Wei, and Leibo Liu. 2021. ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. In 48th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_86_1","volume-title":"Proceedings of the International Conference on Supercomputing (ICS).","author":"Barrera Isaac S\u00e1nchez","year":"2018","unstructured":"Isaac S\u00e1nchez Barrera , Miquel Moret\u00f3 , Eduard Ayguad\u00e9 , Jes\u00fas Labarta , Mateo Valero , and Marc Casas . 2018 . Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies . 
In Proceedings of the International Conference on Supercomputing (ICS). Isaac S\u00e1nchez Barrera, Miquel Moret\u00f3, Eduard Ayguad\u00e9, Jes\u00fas Labarta, Mateo Valero, and Marc Casas. 2018. Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies. In Proceedings of the International Conference on Supercomputing (ICS)."},{"key":"e_1_3_2_1_87_1","volume-title":"NDMiner: Accelerating Graph Pattern Mining Using Near Data Processing. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Talati Nishil","year":"2022","unstructured":"Nishil Talati , Haojie Ye , Yichen Yang , Leul Belayneh , Kuan-Yu Chen , David Blaauw , Trevor Mudge , and Ronald Dreslinski . 2022 . NDMiner: Accelerating Graph Pattern Mining Using Near Data Processing. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Nishil Talati, Haojie Ye, Yichen Yang, Leul Belayneh, Kuan-Yu Chen, David Blaauw, Trevor Mudge, and Ronald Dreslinski. 2022. NDMiner: Accelerating Graph Pattern Mining Using Near Data Processing. In 49th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_88_1","volume-title":"Data Movement Aware Computation Partitioning. In 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Tang Xulong","year":"2017","unstructured":"Xulong Tang , Orhan Kislal , Mahmut Kandemir , and Mustafa Karakoy . 2017 . Data Movement Aware Computation Partitioning. In 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Xulong Tang, Orhan Kislal, Mahmut Kandemir, and Mustafa Karakoy. 2017. Data Movement Aware Computation Partitioning. In 50th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_89_1","volume-title":"Jenga: Software-Defined Cache Hierarchies. 
In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Tsai Po-An","year":"2017","unstructured":"Po-An Tsai , Nathan Beckmann , and Daniel Sanchez . 2017 . Jenga: Software-Defined Cache Hierarchies. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA). Po-An Tsai, Nathan Beckmann, and Daniel Sanchez. 2017. Jenga: Software-Defined Cache Hierarchies. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_90_1","volume-title":"Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies. In 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Tsai Po An","year":"2018","unstructured":"Po An Tsai , Changping Chen , and Daniel Sanchez . 2018 . Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies. In 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO). Po An Tsai, Changping Chen, and Daniel Sanchez. 2018. Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies. In 51st Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_91_1","volume-title":"SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. In 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA).","author":"Xie Xinfeng","year":"2021","unstructured":"Xinfeng Xie , Zheng Liang , Peng Gu , Abanti Basak , Lei Deng , Ling Liang , Xing Hu , and Yuan Xie . 2021 . SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. In 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA). Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, and Yuan Xie. 2021. SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. 
In 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_92_1","volume-title":"47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Ying Victor A.","year":"2020","unstructured":"Victor A. Ying, Mark C. Jeffrey, and Daniel Sanchez. 2020. T4: Compiling Sequential Code for Effective Speculative Parallelization in Hardware. In 47th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_93_1","volume-title":"45th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA).","author":"Young Vinson","year":"2018","unstructured":"Vinson Young, Chiachen Chou, Aamer Jaleel, and Moinuddin Qureshi. 2018. ACCORD: Enabling Associativity for Gigascale DRAM Caches by Coordinating Way-Install and Way-Prediction. In 45th ACM\/IEEE Annual International Symposium on Computer Architecture (ISCA)."},{"key":"e_1_3_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195673"},{"key":"e_1_3_2_1_95_1","volume-title":"TOP-PIM: Throughput-Oriented Programmable Processing in Memory. In 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC).","author":"Zhang Dong Ping","year":"2014","unstructured":"Dong Ping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L. Greathouse, Lifan Xu, and Michael Ignatowski. 2014. TOP-PIM: Throughput-Oriented Programmable Processing in Memory. 
In 23rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC)."},{"key":"e_1_3_2_1_96_1","volume-title":"GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In 24th IEEE International Symposium on High Performance Computer Architecture (HPCA).","author":"Zhang Mingxing","year":"2018","unstructured":"Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing Communication for PIM-Based Graph Processing with Efficient Data Partition. In 24th IEEE International Symposium on High Performance Computer Architecture (HPCA)."},{"key":"e_1_3_2_1_97_1","volume-title":"GraphQ: Scalable PIM-Based Graph Processing. In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Zhuo Youwei","year":"2019","unstructured":"Youwei Zhuo, Chao Wang, Mingxing Zhang, Rui Wang, Dimin Niu, Yanzhi Wang, and Xuehai Qian. 2019. GraphQ: Scalable PIM-Based Graph Processing. 
In 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO)."}],"event":{"name":"ASPLOS '23: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3","location":"Vancouver BC Canada","acronym":"ASPLOS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","SIGOPS ACM Special Interest Group on Operating Systems","SIGPLAN ACM Special Interest Group on Programming Languages","SIGBED ACM Special Interest Group on Embedded Systems"]},"container-title":["Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582016.3582026","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:46:45Z","timestamp":1750178805000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3582016.3582026"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,25]]},"references-count":97,"alternative-id":["10.1145\/3582016.3582026","10.1145\/3582016"],"URL":"https:\/\/doi.org\/10.1145\/3582016.3582026","relation":{},"subject":[],"published":{"date-parts":[[2023,3,25]]},"assertion":[{"value":"2023-03-25","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}