{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:45:09Z","timestamp":1772725509961,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":60,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,6,11]],"date-time":"2022-06-11T00:00:00Z","timestamp":1654905600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"DARPA"},{"name":"MARCO"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,6,18]]},"DOI":"10.1145\/3470496.3527402","type":"proceedings-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T19:06:01Z","timestamp":1654023961000},"page":"218-230","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":19,"title":["Gearbox"],"prefix":"10.1145","author":[{"given":"Marzieh","family":"Lenjani","sequence":"first","affiliation":[{"name":"University of Virginia"}]},{"given":"Alif","family":"Ahmed","sequence":"additional","affiliation":[{"name":"University of Virginia"}]},{"given":"Mircea","family":"Stan","sequence":"additional","affiliation":[{"name":"University of Virginia"}]},{"given":"Kevin","family":"Skadron","sequence":"additional","affiliation":[{"name":"University of Virginia"}]}],"member":"320","published-online":{"date-parts":[[2022,6,11]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Data Sheet: Tesla P100.","year":"2016","unstructured":"2016. Data Sheet: Tesla P100. Retrieved April 22, 2022 from https:\/\/images.nvidia.com\/content\/tesla\/pdf\/nvidia-tesla-p100-PCIe-datasheet.pdf"},{"key":"e_1_3_2_1_2_1","volume-title":"A benchmark suite for In-situ computing.","year":"2020","unstructured":"2020. A benchmark suite for In-situ computing. Retrieved April 22, 2022 from https:\/\/github.com\/MarziehLenjani\/InSituBench"},{"key":"e_1_3_2_1_3_1","unstructured":"2020. Collective Operations. Retrieved April 22, 2022 from https:\/\/docs.nvidia.com\/deeplearning\/nccl\/user-guide\/docs\/usage\/collectives.html"},{"key":"e_1_3_2_1_4_1","volume-title":"MoveProf: Integrating NVProf and GPUWattch for Extracting the Energy Cost of Data Movement.","year":"2020","unstructured":"2020. MoveProf: Integrating NVProf and GPUWattch for Extracting the Energy Cost of Data Movement. Retrieved April 22, 2022 from https:\/\/github.com\/MarziehLenjani\/MoveProf"},{"key":"e_1_3_2_1_5_1","volume-title":"NVLink and NVSwitch, The Building Blocks of Advanced Multi-GPU Communication.","year":"2022","unstructured":"2022. NVLink and NVSwitch, The Building Blocks of Advanced Multi-GPU Communication. Retrieved April 22, 2022 from https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink\/"},{"key":"e_1_3_2_1_6_1","volume-title":"Profiler User's Guide.","year":"2022","unstructured":"2022. Profiler User's Guide. Retrieved April 22, 2022 from https:\/\/docs.nvidia.com\/cuda\/profiler-users-guide\/index.html"},{"key":"e_1_3_2_1_7_1","unstructured":"Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi. 2015. A Scalable Processing-in-memory Accelerator for Parallel Graph Processing. In ISCA."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"crossref","unstructured":"Ariful Azad and Aydin Bulu\u00e7. 2017. A Work-efficient Parallel Sparse Matrix-sparse Vector Multiplication Algorithm. In IPDPS.","DOI":"10.1109\/IPDPS.2017.76"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Nagadastagiri Challapalle, Sahithi Rampalli, Linghao Song, Nandhini Chandramoorthy, Karthik Swaminathan, John Sampson, Yiran Chen, and Vijaykrishnan Narayanan. 2020. GaaS-X: Graph Analytics Accelerator Supporting Sparse Data Representation using Crossbar Architectures. In ISCA.","DOI":"10.1109\/ISCA45697.2020.00044"},{"key":"e_1_3_2_1_10_1","volume-title":"CACTI-3DD: Architecture-level Modeling for 3D Die-stacked DRAM Main Memory. In DATE.","author":"Chen Ke","year":"2012","unstructured":"Ke Chen, Sheng Li, Naveen Muralimanohar, Jung Ho Ahn, Jay B Brockman, and Norman P Jouppi. 2012. CACTI-3DD: Architecture-level Modeling for 3D Die-stacked DRAM Main Memory. In DATE."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Xinyu Chen, Hongshi Tan, Yao Chen, Bingsheng He, Weng-Fai Wong, and Deming Chen. 2021. ThunderGP: HLS-based Graph Processing Framework on FPGAs. In FPGA.","DOI":"10.1145\/3431920.3439290"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021739"},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2018.2821565"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2049662.2049670"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Mario Drumond, Alexandros Daglis, Nooshin Mirzadeh, Dmitrii Ustiugov, Javier Picorel, Babak Falsafi, Boris Grot, and Dionisios Pnevmatikatos. 2017. The Mondrian Data Engine. In ISCA.","DOI":"10.1145\/3079856.3080233"},{"key":"e_1_3_2_1_16_1","unstructured":"Yasuko Eckert, Nuwan Jayasena, and Gabriel H Loh. 2014. Thermal Feasibility of Die-stacked Processing in Memory. In WoNDP."},{"key":"e_1_3_2_1_17_1","unstructured":"Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-parallel Computation on Natural Graphs. In OSDI."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3404975"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-019-1914-z"},{"key":"e_1_3_2_1_20_1","volume-title":"Demystifying the characteristics of 3D-stacked memories: A Case Study for Hybrid Memory Cube. In IISWC.","author":"Hadidi Ramyad","year":"2017","unstructured":"Ramyad Hadidi, Bahar Asgari, Burhan Ahmad Mudassar, Saibal Mukhopadhyay, Sudhakar Yalamanchili, and Hyesoon Kim. 2017. Demystifying the characteristics of 3D-stacked memories: A Case Study for Hybrid Memory Cube. In IISWC."},{"key":"e_1_3_2_1_21_1","volume-title":"SIMDRAM: a Framework for Bit-serial SIMD Processing using DRAM. In ASPLOS.","author":"Hajinazar Nastaran","year":"2021","unstructured":"Nastaran Hajinazar, Geraldo F Oliveira, Sven Gregorio, Jo\u00e3o Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan G\u00f3mez-Luna, and Onur Mutlu. 2021. SIMDRAM: a Framework for Bit-serial SIMD Processing using DRAM. In ASPLOS."},{"key":"e_1_3_2_1_22_1","volume-title":"Graphicionado: A High-performance and Energy-efficient Accelerator for Graph Analytics. In MICRO.","author":"Ham Tae Jun","year":"2016","unstructured":"Tae Jun Ham, Lisa Wu, Narayanan Sundaram, Nadathur Satish, and Margaret Martonosi. 2016. Graphicionado: A High-performance and Energy-efficient Accelerator for Graph Analytics. In MICRO."},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_2_1_24_1","volume-title":"Newton: A DRAM-maker's Accelerator-in-memory (AIM) Architecture for Machine Learning. In MICRO.","author":"He Mingxuan","year":"2020","unstructured":"Mingxuan He, Choungki Song, Ilkon Kim, Chunseok Jeong, Seho Kim, Il Park, Mithuna Thottethodi, and TN Vijaykumar. 2020. Newton: A DRAM-maker's Accelerator-in-memory (AIM) Architecture for Machine Learning. In MICRO."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"crossref","unstructured":"Kevin Hsieh, Samira Khan, Nandita Vijaykumar, Kevin K Chang, Amirali Boroumand, Saugata Ghose, and Onur Mutlu. 2016. Accelerating Pointer Chasing in 3D-stacked Memory: Challenges, Mechanisms, Evaluation. In ICCD.","DOI":"10.1109\/ICCD.2016.7753257"},{"key":"e_1_3_2_1_26_1","unstructured":"Yuwei Hu, Yixiao Du, Ecenur Ustun, and Zhiru Zhang. 2021. GraphLily: Accelerating Graph Linear Algebra on HBM-Equipped FPGAs. In ICCAD."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Jeremy Kepner, Peter Aaltonen, David Bader, Aydin Bulu\u00e7, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, et al. 2016. Mathematical Foundations of the GraphBLAS. In HPEC.","DOI":"10.1109\/HPEC.2016.7761646"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-9260(99)00006-1"},{"key":"e_1_3_2_1_29_1","volume-title":"25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2 TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications. In ISSCC.","author":"Kwon Young-Cheon","year":"2021","unstructured":"Young-Cheon Kwon, Suk Han Lee, Jaehoon Lee, Sang-Hyuk Kwon, Je Min Ryu, Jong-Pil Son, O Seongil, Hak-Soo Yu, Haesuk Lee, Soo Young Kim, et al. 2021. 25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2 TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications. In ISSCC."},{"key":"e_1_3_2_1_30_1","unstructured":"Sukhan Lee, Shin-haeng Kang, Jaehoon Lee, Hyeonsu Kim, Eojin Lee, Seungwoo Seo, Hosang Yoon, Seungwon Lee, Kyounghwan Lim, Hyunsung Shin, et al. 2021. Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology. In ISCA."},{"key":"e_1_3_2_1_31_1","volume-title":"GPUWattch: Enabling energy optimizations in GPGPUs. In ISCA.","author":"Leng Jingwen","year":"2013","unstructured":"Jingwen Leng, Tayler Hetherington, Ahmed ElTantawy, Syed Gilani, Nam Sung Kim, Tor M. Aamodt, and Vijay Janapa Reddi. 2013. GPUWattch: Enabling energy optimizations in GPGPUs. In ISCA."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Marzieh Lenjani, Patricia Gonzalez, Elaheh Sadredini, Shuangchen Li, Yuan Xie, Ameen Akel, Sean Eilert, Mircea R. Stan, and Kevin Skadron. 2020. Fulcrum: a Simplified Control and Access Mechanism toward Flexible and Practical in-situ Accelerators. In HPCA.","DOI":"10.1109\/HPCA47549.2020.00052"},{"key":"e_1_3_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet-cdt.2011.0066"},{"key":"e_1_3_2_1_34_1","volume-title":"Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-based Accelerators","author":"Lenjani Marzieh","year":"2021","unstructured":"Marzieh Lenjani and Kevin Skadron. 2021. Supporting Moderate Data Dependency, Position Dependency, and Divergence in PIM-based Accelerators. IEEE Micro (2021)."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3123977"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Justin Meza, Qiang Wu, Sanjeev Kumar, and Onur Mutlu. 2015. Revisiting Memory Errors in Large-scale Production Data Centers: Analysis and Modeling of New Trends from the Field. In DSN.","DOI":"10.1109\/DSN.2015.57"},{"key":"e_1_3_2_1_37_1","unstructured":"Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, and Hyesoon Kim. 2017. GraphPIM: Enabling instruction-level PIM offloading in graph computing frameworks. In HPCA."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Eriko Nurvitadhi, Asit Mishra, Yu Wang, Ganesh Venkatesh, and Debbie Marr. 2016. Hardware Accelerator for Analytics of Sparse Data. In DATE.","DOI":"10.3850\/9783981537079_0766"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001155"},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"crossref","unstructured":"Mike O'Connor, Niladrish Chatterjee, Donghyuk Lee, John Wilson, Aditya Agrawal, Stephen W Keckler, and William J Dally. 2017. Fine-grained DRAM: Energy-efficient DRAM for Extreme Bandwidth Systems. In MICRO.","DOI":"10.1145\/3123939.3124545"},{"key":"e_1_3_2_1_41_1","volume-title":"Outerspace: An Outer Product based Sparse Matrix Multiplication Accelerator. In HPCA.","author":"Pal Subhankar","year":"2018","unstructured":"Subhankar Pal, Jonathan Beaumont, Dong-Hyeon Park, Aporva Amarnath, Siying Feng, Chaitali Chakrabarti, Hun-Seok Kim, David Blaauw, Trevor Mudge, and Ronald Dreslinski. 2018. Outerspace: An Outer Product based Sparse Matrix Multiplication Accelerator. In HPCA."},{"key":"e_1_3_2_1_42_1","volume-title":"Efficient SPMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization. In MICRO.","author":"Sadi Fazle","year":"2019","unstructured":"Fazle Sadi, Joe Sweeney, Tze Meng Low, James C Hoe, Larry Pileggi, and Franz Franchetti. 2019. Efficient SPMV Operation for Large and Highly Sparse Matrices using Scalable Multi-way Merge Parallelization. In MICRO."},{"key":"e_1_3_2_1_43_1","volume-title":"Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration. In MICRO.","author":"Sadredini Elaheh","year":"2021","unstructured":"Elaheh Sadredini, Reza Rahimi, Mohsen Imani, and Kevin Skadron. 2021. Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration. In MICRO."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"crossref","unstructured":"Elaheh Sadredini, Reza Rahimi, Marzieh Lenjani, Mircea Stan, and Kevin Skadron. 2020. FlexAmata: A Universal and Efficient Adaption of Applications to Spatial Automata Processing Accelerators. In ASPLOS.","DOI":"10.1145\/3373376.3378459"},{"key":"e_1_3_2_1_45_1","volume-title":"Impala: Algorithm\/architecture co-design for in-memory multi-stride pattern matching. In HPCA.","author":"Sadredini Elaheh","year":"2020","unstructured":"Elaheh Sadredini, Reza Rahimi, Marzieh Lenjani, Mircea Stan, and Kevin Skadron. 2020. Impala: Algorithm\/architecture co-design for in-memory multi-stride pattern matching. In HPCA."},{"key":"e_1_3_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2020.3042194"},{"key":"e_1_3_2_1_47_1","doi-asserted-by":"crossref","unstructured":"Elaheh Sadredini, Reza Rahimi, Vaibhav Verma, Mircea Stan, and Kevin Skadron. 2019. eAP: A Scalable and Efficient In-Memory Accelerator for Automata Processing. In MICRO.","DOI":"10.1145\/3352460.3358324"},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124544"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_3_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3108140"},{"key":"e_1_3_2_1_51_1","volume-title":"Sieve: Scalable in-situ DRAM-based accelerator designs for massively parallel k-mer matching. In ISCA.","author":"Wu Lingxi","year":"2021","unstructured":"Lingxi Wu, Rasool Sharifi, Marzieh Lenjani, Kevin Skadron, and Ashish Venkat. 2021. Sieve: Scalable in-situ DRAM-based accelerator designs for massively parallel k-mer matching. In ISCA."},{"key":"e_1_3_2_1_52_1","unstructured":"Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, and Yuan Xie. 2021. SpaceA: Sparse Matrix Vector Multiplication on Processing-in-Memory Accelerator. In HPCA."},{"key":"e_1_3_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466795"},{"key":"e_1_3_2_1_54_1","doi-asserted-by":"crossref","unstructured":"Carl Yang, Yangzihao Wang, and John D Owens. 2015. Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU. In IPDPSW.","DOI":"10.1109\/IPDPSW.2015.77"},{"key":"e_1_3_2_1_55_1","doi-asserted-by":"crossref","unstructured":"Amir Yazdanbakhsh, Choungki Song, Jacob Sacks, Pejman Lotfi-Kamran, Hadi Esmaeilzadeh, and Nam Sung Kim. 2018. In-DRAM Near-data Approximate Acceleration for GPUs. In PACT.","DOI":"10.1145\/3243176.3243188"},{"key":"e_1_3_2_1_56_1","doi-asserted-by":"crossref","unstructured":"Dongping Zhang, Nuwan Jayasena, Alexander Lyashevsky, Joseph L Greathouse, Lifan Xu, and Michael Ignatowski. 2014. TOP-PIM: Throughput-oriented programmable processing in memory. In HPDC.","DOI":"10.1145\/2600212.2600213"},{"key":"e_1_3_2_1_57_1","volume-title":"GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition","author":"Zhang Mingxing","unstructured":"Mingxing Zhang, Youwei Zhuo, Chao Wang, Mingyu Gao, Yongwei Wu, Kang Chen, Christos Kozyrakis, and Xuehai Qian. 2018. GraphP: Reducing Communication for PIM-based Graph Processing with Efficient Data Partition. In HPCA. IEEE, 544--557."},{"key":"e_1_3_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3287624.3287711"},{"key":"e_1_3_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2019.2910068"},{"key":"e_1_3_2_1_60_1","volume-title":"GridGraph: Large-scale Graph Processing on a Single Machine using 2-level Hierarchical Partitioning. In ATC.","author":"Zhu Xiaowei","year":"2015","unstructured":"Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale Graph Processing on a Single Machine using 2-level Hierarchical Partitioning. In ATC."}],"event":{"name":"ISCA '22: The 49th Annual International Symposium on Computer Architecture","location":"New York, New York","acronym":"ISCA '22","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture","IEEE CS TCCA IEEE CS Technical Committee on Computer Architecture"]},"container-title":["Proceedings of the 49th Annual International Symposium on Computer Architecture"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527402","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470496.3527402","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3470496.3527402","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:28Z","timestamp":1750188628000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3470496.3527402"}},"subtitle":["a case for supporting accumulation dispatching and hybrid partitioning in PIM-based accelerators"],"short-title":[],"issued":{"date-parts":[[2022,6,11]]},"references-count":60,"alternative-id":["10.1145\/3470496.3527402","10.1145\/3470496"],"URL":"https:\/\/doi.org\/10.1145\/3470496.3527402","relation":{},"subject":[],"published":{"date-parts":[[2022,6,11]]},"assertion":[{"value":"2022-06-11","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}