{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,19]],"date-time":"2025-09-19T10:56:47Z","timestamp":1758279407685,"version":"3.41.0"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"6","license":[{"start":{"date-parts":[[2022,6,27]],"date-time":"2022-06-27T00:00:00Z","timestamp":1656288000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"US National Science Foundation","doi-asserted-by":"crossref","award":["CCF-1815467"],"award-info":[{"award-number":["CCF-1815467"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2022,11,30]]},"abstract":"<jats:p>Manycore GPU architectures have become the mainstay for accelerating graph computations. One of the primary bottlenecks to performance of graph computations on manycore architectures is the data movement. Since most of the accesses in graph processing are due to vertex neighborhood lookups, locality in graph data structures plays a key role in dictating the degree of data movement. Vertex reordering is a widely used technique to improve data locality within graph data structures. However, these reordering schemes alone are not sufficient as they need to be complemented with efficient task allocation on manycore GPU architectures to reduce latency due to local cache misses. Consequently, in this article, we introduce a software\/hardware co-design framework for accelerating graph computations. Our approach couples an architecture-aware vertex reordering with a priority-based task allocation technique. As the task allocation aims to reduce on-chip latency and associated energy, the choice of Network-on-Chip (NoC) as the communication backbone in the manycore platform is an important parameter. By leveraging emerging three-dimensional (3D) integration technology, we propose design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follow a power-law distribution. The proposed 3D SWNoC-enabled software\/hardware co-design framework achieves 11.1% to 22.9% performance improvement and 16.4% to 32.6% less energy consumption depending on the dataset and the graph application, when compared to the default order of dataset running on a conventional planar mesh architecture.<\/jats:p>","DOI":"10.1145\/3514354","type":"journal-article","created":{"date-parts":[[2022,4,4]],"date-time":"2022-04-04T12:07:24Z","timestamp":1649074044000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Software\/Hardware Co-design of 3D NoC-based GPU Architectures for Accelerated Graph Computations"],"prefix":"10.1145","volume":"27","author":[{"given":"Dwaipayan","family":"Choudhury","sequence":"first","affiliation":[{"name":"Washington State University, Pullman, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1504-6768","authenticated-orcid":false,"given":"Reet","family":"Barik","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4062-0293","authenticated-orcid":false,"given":"Aravind Sukumaran","family":"Rajam","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6721-233X","authenticated-orcid":false,"given":"Ananth","family":"Kalyanaraman","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5930-8531","authenticated-orcid":false,"given":"Partha Pratim","family":"Pande","sequence":"additional","affiliation":[{"name":"Washington State University, Pullman, WA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,6,27]]},"reference":[{"key":"e_1_3_1_2_2","first-page":"1--26","article-title":"High-performance and energy-efficient network-on-chip architectures for graph analytics","author":"Duraisamy K.","year":"2016","unstructured":"K. Duraisamy et al. 2016. High-performance and energy-efficient network-on-chip architectures for graph analytics. ACM Transactions on Embedded Computing Systems 15, 4 (2016), 1--26.","journal-title":"ACM Transactions on Embedded Computing Systems"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2508148.2485964"},{"issue":"5","key":"e_1_3_1_5_2","first-page":"78","article-title":"Parallel graph analytics","volume":"59","author":"Lenharth A.","year":"2016","unstructured":"A. Lenharth et al. 2016. Parallel graph analytics. Communications of the ACM 59, 5 (2016), 78\u201387.","journal-title":"Communications of the"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1109\/DAC18072.2020.9218624"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC50251.2020.00031"},{"key":"e_1_3_1_8_2","article-title":"The Pagerank citation ranking: Bringing order to the web","author":"Page L.","year":"1999","unstructured":"L. Page et al. 1999. The Pagerank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999- 01204, Stanford University.","journal-title":"Technical Report SIDL-WP-1999- 01204"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.parco.2015.03.003"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783759"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD.2009.5413147"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.54"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/2967938.2967940"},{"key":"e_1_3_1_15_2","article-title":"Design-space exploration and optimization of an energy-efficient and reliable 3D small-world network-on-chip","author":"Das S.","year":"2016","unstructured":"S. Das, J. R. Doppa, P. P. Pande, and K. Chakrabarty. 2016. Design-space exploration and optimization of an energy-efficient and reliable 3D small-world network-on-chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) 36, 5 (2016), 719--732.","journal-title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/CCGRID.2017.114"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2009.4919648"},{"key":"e_1_3_1_18_2","unstructured":"T. Petermann and P. De Los Rios. 2005. Spatial small-world networks: A wiring cost perspective. arXiv: Condmat\/0501420v2."},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/hpca.2018.00052"},{"key":"e_1_3_1_20_2","volume-title":"Proceedings of the 24th Asia and South Pacific Design Automation Conference","author":"Dai G.","year":"2009","unstructured":"G. Dai et al. 2009. GraphSAR: GraphSAR: A sparsity-aware processing-in-memory architecture for large-scale graph processing on ReRAMs. In Proceedings of the 24th Asia and South Pacific Design Automation Conference."},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS47924.2020.00077"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/996546.996554"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/800119.803884"},{"key":"e_1_3_1_24_2","unstructured":"Y. Zhang et al. 2016. Optimizing cache performance for graph analytics. arXiv preprintarXiv:1608.01362."},{"key":"e_1_3_1_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2018.8573478"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2011.26"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1017\/S0962492916000076"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/800195.805928"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1137\/0710032"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1137\/1031001"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/2882903.2915220"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"issue":"4","key":"e_1_3_1_33_2","first-page":"20","article-title":"Multilevel algorithms for linear ordering problems","volume":"13","author":"Safro I.","year":"2009","unstructured":"I. Safro, D. Ron, and A. Brandt. 2009. Multilevel algorithms for linear ordering problems. ACM Journal of Experimental Algorithmics 13, 4 (2009), 20.","journal-title":"ACM Journal of Experimental Algorithmics"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1088\/1742-5468\/2008\/10\/P10008"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2013.6704684"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2014.2351577"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2010.176"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2013.6557149"},{"key":"e_1_3_1_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2007.900837"},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2018.2889053"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.vlsi.2017.12.002"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3444844"},{"key":"e_1_3_1_43_2","doi-asserted-by":"publisher","DOI":"10.1145\/3450440"},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358318"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514354","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3514354","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:20Z","timestamp":1750188620000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3514354"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,27]]},"references-count":43,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2022,11,30]]}},"alternative-id":["10.1145\/3514354"],"URL":"https:\/\/doi.org\/10.1145\/3514354","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2022,6,27]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}