{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,5]],"date-time":"2026-03-05T15:44:46Z","timestamp":1772725486984,"version":"3.50.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,3,19]],"date-time":"2023-03-19T00:00:00Z","timestamp":1679184000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"US National Science Foundation","doi-asserted-by":"crossref","award":["CCF-1815467"],"award-info":[{"award-number":["CCF-1815467"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2023,5,31]]},"abstract":"<jats:p>Graph application workloads are dominated by random memory accesses with the poor locality. To tackle the irregular and sparse nature of computation, ReRAM-based Processing-in-Memory (PIM) architectures have been proposed recently. Most of these ReRAM architecture designs have focused on mapping graph computations into a set of multiply-and-accumulate (MAC) operations. ReRAMs also offer a key advantage in reducing memory latency between cores and memory by allowing for PIM. However, when implemented on a ReRAM-based manycore architecture, graph applications still pose two key challenges\u2014significant storage requirements (particularly due to wasted zero cell storage), and significant amount of on-chip traffic. To tackle these two challenges, in this article, we propose the design of a 3D NoC-enabled ReRAM-based manycore architecture. Our proposed architecture incorporates a novel crossbar-aware node reordering to reduce ReRAM storage requirements. 
Secondly, its 3D NoC-enabled design reduces on-chip communication latency. Our architecture outperforms the state-of-the-art in ReRAM-based graph acceleration by up to 5\u00d7 in performance while consuming up to 10.3\u00d7 less energy for a range of graph inputs and workloads.<\/jats:p>","DOI":"10.1145\/3564290","type":"journal-article","created":{"date-parts":[[2022,10,7]],"date-time":"2022-10-07T13:16:18Z","timestamp":1665148578000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":5,"title":["Accelerating Graph Computations on 3D NoC-Enabled PIM Architectures"],"prefix":"10.1145","volume":"28","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6852-6074","authenticated-orcid":false,"given":"Dwaipayan","family":"Choudhury","sequence":"first","affiliation":[{"name":"Washington State University, Pullman, WA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5685-6826","authenticated-orcid":false,"given":"Lizhi","family":"Xiang","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4062-0293","authenticated-orcid":false,"given":"Aravind","family":"Rajam","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6721-233X","authenticated-orcid":false,"given":"Anantharaman","family":"Kalyanaraman","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5930-8531","authenticated-orcid":false,"given":"Partha Pratim","family":"Pande","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}]}],"member":"320","published-online":{"date-parts":[[2023,3,19]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"1307","volume-title":"Proceedings of the Design, Automation & Test in Europe Conference & Exhibition","author":"Kalyanaraman K. 
A.","year":"2019","unstructured":"K. A. Kalyanaraman and P. Pande. 2019. A brief survey of algorithms, architectures, and challenges toward extreme-scale graph analytics. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition. 1307\u20131312."},{"key":"e_1_3_1_3_2","first-page":"696","volume-title":"Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS)","author":"Zheng L.","year":"2020","unstructured":"L. Zheng, J. Zhao, Y. Huang, Q. Wang, Z. Zeng, J. Xue, X. Liao, and H. Jin. 2020. Spara: An energy-efficient ReRAM-Based accelerator for sparse graph analytics applications. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS). 696\u2013707."},{"key":"e_1_3_1_4_2","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1145\/3287624.3287637","volume-title":"Proceedings of the Asia and South Pacific Design Automation Conference","author":"Dai G.","year":"2019","unstructured":"G. Dai, T. Huang, Y. Wang, H. Yang, and J. Wawrzynek. 2019. GraphSAR: A sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs. In Proceedings of the Asia and South Pacific Design Automation Conference. 120\u2013126."},{"key":"e_1_3_1_5_2","first-page":"254","volume-title":"Proceedings of the IEEE International Conference on Computer Design","author":"Maashri A. A.","year":"2009","unstructured":"A. A. Maashri, G. Sun, X. Dong, V. Narayanan, and Y. Xie. 2009. 3D GPU architecture using cache stacking: Performance, cost, power and thermal analysis. In Proceedings of the IEEE International Conference on Computer Design. 254\u2013259."},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_3_1_7_2","first-page":"531","volume-title":"Proceedings of the IEEE International Symposium on High Performance Computer Architecture","author":"Song L.","year":"2018","unstructured":"L. Song, Y. Zhuo, X. Qian, H. Li, and Y. Chen. 
2018. GraphR: Accelerating graph processing using ReRAM. In Proceedings of the IEEE International Symposium on High Performance Computer Architecture. 531\u2013543."},{"key":"e_1_3_1_8_2","first-page":"973","volume-title":"Proceedings of the Design, Automation & Test in Europe Conference & Exhibition","author":"Huang T.","year":"2018","unstructured":"T. Huang, G. Dai, Y. Wang, and H. Yang. 2018. HyVE: Hybrid vertex-edge memory hierarchy for energy-efficient graph processing. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition. 973\u2013978."},{"key":"e_1_3_1_9_2","first-page":"1","volume-title":"Proceedings of the 54th ACM\/IEEE Design Automation Conference (DAC)","author":"Duraisamy K.","year":"2017","unstructured":"K. Duraisamy, H. Lu, P. P. Pande, and A. Kalyanaraman. 2017. Accelerating graph community detection with approximate updates via an energy-efficient NoC. In Proceedings of the 54th ACM\/IEEE Design Automation Conference (DAC). 1\u20136."},{"key":"e_1_3_1_10_2","first-page":"240","volume-title":"Proceedings of the IEEE International Symposium on Workload Characterization (IISWC)","author":"Barik R.","year":"2020","unstructured":"R. Barik, M. Minutoli, M. Halappanavar, N. R. Tallent, and A. Kalyanaraman. 2020. Vertex reordering for real-world graphs and applications: An empirical evaluation. In Proceedings of the IEEE International Symposium on Workload Characterization (IISWC). 240\u2013251."},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1145\/1412228.141223"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2016.2604288"},{"issue":"6","key":"e_1_3_1_13_2","doi-asserted-by":"crossref","first-page":"852","DOI":"10.1109\/TC.2018.2889053","article-title":"Learning-Based Application-Agnostic 3D NoC design for heterogeneous manycore systems","volume":"68","author":"Marculescu D.","year":"2019","unstructured":"B. K. Joardar, R. G. Kim, J. R. Doppa, P. P. Pande, D. Marculescu, and R. 
Marculescu. 2019. Learning-Based Application-Agnostic 3D NoC design for heterogeneous manycore systems. IEEE Transactions on Computers 68, 6 (2019), 852\u2013866.","journal-title":"IEEE Transactions on Computers"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2007.900837"},{"key":"e_1_3_1_15_2","first-page":"14","volume-title":"Proceedings of the ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)","author":"Shafiee A.","year":"2016","unstructured":"A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar. 2016. ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. In Proceedings of the ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 14-2."},{"key":"e_1_3_1_16_2","first-page":"1667","volume-title":"Design, Automation & Test in Europe Conference & Exhibition (DATE)","author":"Pande P. P","year":"2021","unstructured":"A. I. Arka, B. K. Joardar, J. R. Doppa, and P. P. Pande. 2021. ReGraphX: NoC-enabled 3D Heterogeneous ReRAM architecture for training graph neural networks. Design, Automation & Test in Europe Conference & Exhibition (DATE). 1667\u20131672."},{"key":"e_1_3_1_17_2","first-page":"86","volume-title":"Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software","author":"Jiang N.","year":"2013","unstructured":"N. Jiang, D. U. Becker, G. Michelogiannakis, J. Balfour, B. Towles, D. E. Shaw, J. Kim, and W. Dally. 2013. A detailed and flexible cycle-accurate network-on-chip simulator. In Proceedings of IEEE International Symposium on Performance Analysis of Systems and Software. 86\u201396."},{"key":"e_1_3_1_18_2","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1007\/978-1-4419-9551-3_2","volume-title":"Proceedings of the Emerging Memory Technologies","author":"Dong X.","year":"2014","unstructured":"X. Dong, C. Xu, Y. Xie, and N. P. Jouppi. 
2014. Nvsim: A circuit-level performance, energy, and area model for emerging non-volatile memory. In Proceedings of the Emerging Memory Technologies. Springer, 15\u201350."},{"key":"e_1_3_1_19_2","unstructured":"http:\/\/snap.stanford.edu\/. Data accessed September 2021."},{"key":"e_1_3_1_20_2","unstructured":"http:\/\/networkrepository.com\/. Data accessed September 2021."},{"key":"e_1_3_1_21_2","first-page":"463","volume-title":"Proceedings of the IEEE\/ACM International Conference on Computer-Aided Design","author":"Vincenzi A.","year":"2010","unstructured":"A. Sridhar, A. Vincenzi, M. Ruggiero, T. Brunschwiler, and D. Atienza. 2010. 3D-ICE: Fast compact transient thermal modeling for 3D ICs with inter-tier liquid cooling. In Proceedings of the IEEE\/ACM International Conference on Computer-Aided Design (2010), 463\u2013470."},{"key":"e_1_3_1_22_2","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1109\/VLSIT.2012.6242449","volume-title":"Proceedings of the 2012 Symposium on VLSI Technology","author":"Lee W.","year":"2012","unstructured":"W. Lee, J. Park, J. Shin, and J. Woo. 2012. Varistor-type bidirectional switch (JMAX >10^7 A\/cm^2, selectivity\u223c10^4) for 3D bipolar resistive memory arrays. In Proceedings of the 2012 Symposium on VLSI Technology (Jun. 2012), 37\u201338."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/LED.2015.2427313"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2019.2910793"},{"issue":"1","key":"e_1_3_1_25_2","doi-asserted-by":"crossref","first-page":"Article No: 18","DOI":"10.1145\/3482880","article-title":"High-performance and energy-efficient 3D manycore GPU architecture for accelerating graph analytics","volume":"18","author":"Choudhury D.","year":"2022","unstructured":"D. Choudhury, A. S. Rajam, A. Kalyanaraman, and P. Pande. 2022. High-performance and energy-efficient 3D manycore GPU architecture for accelerating graph analytics. 
ACM Journal on Emerging Technologies in Computing Systems 18, 1 (2022), Article No: 18, 1\u201319.","journal-title":"ACM Journal on Emerging Technologies in Computing Systems"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1145\/3514354"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3564290","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3564290","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T18:09:10Z","timestamp":1750183750000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3564290"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,19]]},"references-count":25,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,5,31]]}},"alternative-id":["10.1145\/3564290"],"URL":"https:\/\/doi.org\/10.1145\/3564290","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"value":"1084-4309","type":"print"},{"value":"1557-7309","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,19]]},"assertion":[{"value":"2022-03-17","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-09-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-03-19","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}