{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:17:11Z","timestamp":1750220231077,"version":"3.41.0"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T00:00:00Z","timestamp":1630627200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000185","name":"DARPA","doi-asserted-by":"crossref","award":["FA8650-18-2-7862"],"award-info":[{"award-number":["FA8650-18-2-7862"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100011033","name":"Spanish State Research Agency","doi-asserted-by":"crossref","award":["TIN2016-75344-R"],"award-info":[{"award-number":["TIN2016-75344-R"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Fundacion Seneca, Region de Murcia, Programa Jimenez de la Espada","award":["20580\/EE\/18"],"award-info":[{"award-number":["20580\/EE\/18"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2021,12,31]]},"abstract":"<jats:p>Graph structures are a natural representation of important and pervasive data. While graph applications have significant parallelism, their characteristic pointer indirect loads to neighbor data hinder scalability to large datasets on multicore systems. A scalable and efficient system must tolerate latency while leveraging data parallelism across millions of vertices. Modern Out-of-Order (OoO) cores inherently tolerate a fraction of long latencies, but become clogged when running severely memory-bound applications. 
Combined with large power\/area footprints, this limits their parallel scaling potential and, consequently, the gains that existing software frameworks can achieve. Conversely, accelerator and memory hierarchy designs provide performant hardware specializations, but cannot support diverse application demands.<\/jats:p>\n          <jats:p>To address these shortcomings, we present GraphAttack, a hardware-software data supply approach that accelerates graph applications on in-order multicore architectures. GraphAttack proposes compiler passes to (1)\u00a0identify idiomatic long-latency loads and (2)\u00a0slice programs along these loads into data Producer\/Consumer threads to map onto pairs of parallel cores. Each pair shares a communication queue; the Producer asynchronously issues long-latency loads, whose results are buffered in the queue and used by the Consumer. This scheme drastically increases memory-level parallelism (MLP) to mitigate latency bottlenecks. In equal-area comparisons, GraphAttack outperforms OoO cores, do-all parallelism, prefetching, and prior decoupling approaches, achieving a 2.87\u00d7 speedup and 8.61\u00d7 gain in energy efficiency across a range of graph applications. These improvements scale; GraphAttack achieves a 3\u00d7 speedup over 64 parallel cores. 
Lastly, it has pragmatic design principles; it enhances in-order architectures that are gaining increasing open-source support.<\/jats:p>","DOI":"10.1145\/3469846","type":"journal-article","created":{"date-parts":[[2021,9,3]],"date-time":"2021-09-03T16:12:01Z","timestamp":1630685521000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["GraphAttack"],"prefix":"10.1145","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0764-0778","authenticated-orcid":false,"given":"Aninda","family":"Manocha","sequence":"first","affiliation":[{"name":"Princeton University, Princeton, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tyler","family":"Sorensen","sequence":"additional","affiliation":[{"name":"UC Santa Cruz, Santa Cruz, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Esin","family":"Tureci","sequence":"additional","affiliation":[{"name":"Princeton University, Princeton, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Opeoluwa","family":"Matthews","sequence":"additional","affiliation":[{"name":"Princeton University, Princeton, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4955-7235","authenticated-orcid":false,"given":"Juan L.","family":"Arag\u00f3n","sequence":"additional","affiliation":[{"name":"Universidad de Murcia, Murcia, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Margaret","family":"Martonosi","sequence":"additional","affiliation":[{"name":"Princeton University, Princeton, NJ"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,3]]},"reference":[{"volume-title":"RISC-V Ariane CVA6. Retrieved from on","year":"2020","key":"e_1_2_1_1_1","unstructured":"2019. RISC-V Ariane CVA6. Retrieved from on Aug. 2020 https:\/\/github.com\/openhwgroup\/cva6. 2019. RISC-V Ariane CVA6. 
Retrieved from on Aug. 2020 https:\/\/github.com\/openhwgroup\/cva6."},{"volume-title":"Retrieved from on","year":"2020","key":"e_1_2_1_2_1","unstructured":"2020. HammerBlade. Retrieved from on Aug. 2020 https:\/\/github.com\/bespoke-silicon-group\/bsg_bladerunner. 2020. HammerBlade. Retrieved from on Aug. 2020 https:\/\/github.com\/bespoke-silicon-group\/bsg_bladerunner."},{"key":"e_1_2_1_3_1","volume-title":"John Hennessy and David Patterson. Turing Award lecture. Retrieved on","author":"ACM.","year":"2019","unstructured":"ACM. 2018. John Hennessy and David Patterson. Turing Award lecture. Retrieved on Aug 2019 from https:\/\/www.acm.org\/hennessy-patterson-turing-lecture ACM. 2018. John Hennessy and David Patterson. Turing Award lecture. Retrieved on Aug 2019 from https:\/\/www.acm.org\/hennessy-patterson-turing-lecture"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/IISWC.2018.8573480"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2749469.2750386"},{"volume-title":"Proceedings of the International Conference on Supercomputing. ACM.","author":"Ainsworth Sam","key":"e_1_2_1_6_1","unstructured":"Sam Ainsworth and Timothy M. Jones . 2016. Graph prefetching using data structure knowledge . In Proceedings of the International Conference on Supercomputing. ACM. Sam Ainsworth and Timothy M. Jones. 2016. Graph prefetching using data structure knowledge. In Proceedings of the International Conference on Supercomputing. ACM."},{"volume-title":"Proceedings of the International Symposium on Code Generation and Optimization. 305\u2013317","author":"Ainsworth S.","key":"e_1_2_1_7_1","unstructured":"S. Ainsworth and T. M. Jones . 2017. Software prefetching for indirect memory accesses . In Proceedings of the International Symposium on Code Generation and Optimization. 305\u2013317 . S. Ainsworth and T. M. Jones. 2017. Software prefetching for indirect memory accesses. 
In Proceedings of the International Symposium on Code Generation and Optimization. 305\u2013317."},{"volume-title":"Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. ACM.","author":"Ainsworth Sam","key":"e_1_2_1_8_1","unstructured":"Sam Ainsworth and Timothy M. Jones . 2018. An event-triggered programmable prefetcher for irregular workloads . In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. ACM. Sam Ainsworth and Timothy M. Jones. 2018. An event-triggered programmable prefetcher for irregular workloads. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems. ACM."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/2872362.2872414"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00051"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/2388996.2389013"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078597.3078616"},{"volume-title":"Proceedings of the International Conference on Supercomputing.","author":"Bird Peter L.","key":"e_1_2_1_14_1","unstructured":"Peter L. Bird , Alasdair Rawsthorne , and Nigel P. Topham . 1993. The effectiveness of decoupling . In Proceedings of the International Conference on Supercomputing. Peter L. Bird, Alasdair Rawsthorne, and Nigel P. Topham. 1993. The effectiveness of decoupling. In Proceedings of the International Conference on Supercomputing."},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the 3rd MLSys Conference. ACM.","author":"Chen Beidi","year":"2019","unstructured":"Beidi Chen , Tharun Medini , and Anshumali Shrivastava . 2019 . SLIDE : In defense of smart algorithms over hardware acceleration for large-scale deep learning systems . In Proceedings of the 3rd MLSys Conference. ACM. 
Beidi Chen, Tharun Medini, and Anshumali Shrivastava. 2019. SLIDE : In defense of smart algorithms over hardware acceleration for large-scale deep learning systems. In Proceedings of the 3rd MLSys Conference. ACM."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/263580.263597"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00023"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3224419"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830800"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3075620"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2016.7783759"},{"volume-title":"Proceedings of the International Symposium on Microarchitecture. 1\u201312","author":"Hashemi Milad","key":"e_1_2_1_22_1","unstructured":"Milad Hashemi , Onur Mutlu , and Yale N. Patt . 2016. Continuous runahead: Transparent hardware acceleration for memory intensive workloads . In Proceedings of the International Symposium on Microarchitecture. 1\u201312 . Milad Hashemi, Onur Mutlu, and Yale N. Patt. 2016. Continuous runahead: Transparent hardware acceleration for memory intensive workloads. In Proceedings of the International Symposium on Microarchitecture. 1\u201312."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1098\/rsif.2005.0051"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2019.00064"},{"volume-title":"Proceedings of the International Conference on Research and Development in Information Retrieval. ACM, 195\u2013202","author":"Konstas Ioannis","key":"e_1_2_1_25_1","unstructured":"Ioannis Konstas , Vassilios Stathopoulos , and Joemon M. Jose . 2009. On social networks and collaborative recommendation . In Proceedings of the International Conference on Research and Development in Information Retrieval. ACM, 195\u2013202 . Ioannis Konstas, Vassilios Stathopoulos, and Joemon M. Jose. 2009. 
On social networks and collaborative recommendation. In Proceedings of the International Conference on Research and Development in Information Retrieval. ACM, 195\u2013202."},{"key":"e_1_2_1_26_1","article-title":"Kronecker graphs: An approach to modeling networks","author":"Leskovec Jure","year":"2010","unstructured":"Jure Leskovec , Deepayan Chakrabarti , Jon Kleinberg , Christos Faloutsos , and Zoubin Ghahramani . 2010 . Kronecker graphs: An approach to modeling networks . Journal of Machine Learning Research 11 ( March 2010), 985\u20131042. Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. 2010. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research 11 (March 2010), 985\u20131042.","journal-title":"Journal of Machine Learning Research 11"},{"volume-title":"Proceedings of the International Symposium on Microarchitecture. ACM, 469\u2013480","author":"Li Sheng","key":"e_1_2_1_27_1","unstructured":"Sheng Li , Jung Ho Ahn , Richard D. Strong , Jay B. Brockman , Dean M. Tullsen , and Norman P. Jouppi . 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures . In Proceedings of the International Symposium on Microarchitecture. ACM, 469\u2013480 . Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2009. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proceedings of the International Symposium on Microarchitecture. 
ACM, 469\u2013480."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/248209.237190"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124536"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS48437.2020.00029"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3151034"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO.2018.00010"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358254"},{"key":"e_1_2_1_34_1","first-page":"28","article-title":"CACTI 6.0: A tool to model large caches","volume":"27","author":"Muralimanohar Naveen","year":"2009","unstructured":"Naveen Muralimanohar , Rajeev Balasubramonian , and Norman P. Jouppi . 2009 . CACTI 6.0: A tool to model large caches . HP Labs 27 (2009), 28 . Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi. 2009. CACTI 6.0: A tool to model large caches. HP Labs 27 (2009), 28.","journal-title":"HP Labs"},{"volume-title":"Proceedings of the International Symposium on High-Performance Computer Architecture. IEEE, 129","author":"Mutlu Onur","key":"e_1_2_1_35_1","unstructured":"Onur Mutlu , Jared Stark , Chris Wilkerson , and Yale N. Patt . 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors . In Proceedings of the International Symposium on High-Performance Computer Architecture. IEEE, 129 . Onur Mutlu, Jared Stark, Chris Wilkerson, and Yale N. Patt. 2003. Runahead execution: An alternative to very large instruction windows for out-of-order processors. In Proceedings of the International Symposium on High-Performance Computer Architecture. IEEE, 129."},{"volume-title":"Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture.","author":"Quan","key":"e_1_2_1_36_1","unstructured":"Quan M. Nguyen and Ddaniel Sanchez. 2020. 
Pipette: Improving core utilization on irregular applications through intra-core pipeline parallelism . In Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture. Quan M. Nguyen and Ddaniel Sanchez. 2020. Pipette: Improving core utilization on irregular applications through intra-core pipeline parallelism. In Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2016.7446051"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2014.15"},{"volume-title":"Proceedings of the International Symposium on Workload Characterization. IEEE, 130\u2013139","author":"Molly","key":"e_1_2_1_39_1","unstructured":"Molly A. O\u2019Neil and Martin Burtscher. 2014. Microarchitectural performance characterization of irregular GPU kernels . In Proceedings of the International Symposium on Workload Characterization. IEEE, 130\u2013139 . Molly A. O\u2019Neil and Martin Burtscher. 2014. Microarchitectural performance characterization of irregular GPU kernels. In Proceedings of the International Symposium on Workload Characterization. IEEE, 130\u2013139."},{"volume-title":"Proceedings of the International Symposium on Computer Architecture. 370\u2013381","author":"Mutlu Onur","key":"e_1_2_1_40_1","unstructured":"Onur Mutlu , Hyesoon Kim , and Y. N. Patt . 2005. Techniques for efficient processing in runahead execution engines . In Proceedings of the International Symposium on Computer Architecture. 370\u2013381 . Onur Mutlu, Hyesoon Kim, and Y. N. Patt. 2005. Techniques for efficient processing in runahead execution engines. In Proceedings of the International Symposium on Computer Architecture. 370\u2013381."},{"key":"e_1_2_1_41_1","volume-title":"OpenMP application program interface Version 5.0. Retrieved on","author":"Architecture Review Board MP","year":"2019","unstructured":"Open MP Architecture Review Board . 2018. 
OpenMP application program interface Version 5.0. Retrieved on Aug. 2019 from https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5.0.pdf. OpenMP Architecture Review Board. 2018. OpenMP application program interface Version 5.0. Retrieved on Aug. 2019 from https:\/\/www.openmp.org\/wp-content\/uploads\/OpenMP-API-Specification-5.0.pdf."},{"volume-title":"Proceedings of the 38th International Symposium on Microarchitecture. IEEE, 107\u2013118","author":"Ottoni Guilherme","key":"e_1_2_1_42_1","unstructured":"Guilherme Ottoni , Ram Rangan , Adam Stoler , and David I. August . 2005. Automatic thread extraction with decoupled software pipelining . In Proceedings of the 38th International Symposium on Microarchitecture. IEEE, 107\u2013118 . Guilherme Ottoni, Ram Rangan, Adam Stoler, and David I. August. 2005. Automatic thread extraction with decoupled software pipelining. In Proceedings of the 38th International Symposium on Microarchitecture. IEEE, 107\u2013118."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2983990.2984015"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304025"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-87700-4_107"},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the International Symposium on Microarchitecture. 269\u2013280","author":"Purser Zach","year":"2000","unstructured":"Zach Purser , Karthik Sundaramoorthy , and Eric Rotenberg . 2000 . A study of slipstream processors . In Proceedings of the International Symposium on Microarchitecture. 269\u2013280 . Zach Purser, Karthik Sundaramoorthy, and Eric Rotenberg. 2000. A study of slipstream processors. In Proceedings of the International Symposium on Microarchitecture. 269\u2013280."},{"volume-title":"Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. 
IEEE, 177\u2013188","author":"Rangan Ram","key":"e_1_2_1_48_1","unstructured":"Ram Rangan , Neil Vachharajani , Manish Vachharajani , and David I. August . 2004. Decoupled software pipelining with the synchronization array . In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. IEEE, 177\u2013188 . Ram Rangan, Neil Vachharajani, Manish Vachharajani, and David I. August. 2004. Decoupled software pipelining with the synchronization array. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. IEEE, 177\u2013188."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1067649.801719"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415751"},{"volume-title":"GPUs. In Proceedings of the International Symposium on Workload Characterization. IEEE.","author":"Sorensen Tyler","key":"e_1_2_1_51_1","unstructured":"Tyler Sorensen , Sreepathi Pai , and Alastair F. Donaldson . 2019. One size doesn\u2019t fit all: Quantifying performance portability of graph applications on GPUs. In Proceedings of the International Symposium on Workload Characterization. IEEE. Tyler Sorensen, Sreepathi Pai, and Alastair F. Donaldson. 2019. One size doesn\u2019t fit all: Quantifying performance portability of graph applications on GPUs. In Proceedings of the International Symposium on Workload Characterization. IEEE."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809983"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/384264.379247"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00061"},{"key":"e_1_2_1_55_1","volume-title":"Theano: A python framework for fast computation of mathematical expressions. arXiv:1605.02688.","author":"Team Theano Development","year":"2016","unstructured":"Theano Development Team . 2016 . 
Theano: A python framework for fast computation of mathematical expressions. arXiv:1605.02688. Retrieved from https:\/\/arxiv.org\/abs\/1605.02688. Theano Development Team. 2016. Theano: A python framework for fast computation of mathematical expressions. arXiv:1605.02688. Retrieved from https:\/\/arxiv.org\/abs\/1605.02688."},{"volume-title":"Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. IEEE.","author":"Vachharajani Neil","key":"e_1_2_1_56_1","unstructured":"Neil Vachharajani , Ram Rangan , Easwaran Raman , Matthew J. Bridges , Guilherme Ottoni , and David I. August . 2007. Speculative decoupled software pipelining . In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. IEEE. Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni, and David I. August. 2007. Speculative decoupled software pipelining. In Proceedings of the International Conference on Parallel Architecture and Compilation Techniques. IEEE."},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/1736020.1736044"},{"volume-title":"Proceedings of the Symposium on Principles and Practice of Parallel Programming. ACM, 12 pages.","author":"Wang Yangzihao","key":"e_1_2_1_58_1","unstructured":"Yangzihao Wang , Andrew Davidson , Yuechao Pan , Yuduo Wu , Andy Riffel , and John D. Owens . 2016. Gunrock: A high-performance graph processing library on the GPU . In Proceedings of the Symposium on Principles and Practice of Parallel Programming. ACM, 12 pages. Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A high-performance graph processing library on the GPU. In Proceedings of the Symposium on Principles and Practice of Parallel Programming. 
ACM, 12 pages."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2009.4798239"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-48096-0_34"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307650.3322225"},{"key":"e_1_2_1_62_1","volume-title":"International Conference on Learning Representations (ICLR). OpenReview.net.","author":"Xu Keyulu","year":"2018","unstructured":"Keyulu Xu , Weihua Hu , Jure Leskovec , and Stefanie Jegelka . 2018 . How powerful are graph neural networks? In International Conference on Learning Representations (ICLR). OpenReview.net. Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? In International Conference on Learning Representations (ICLR). OpenReview.net."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/2830772.2830807"},{"key":"e_1_2_1_64_1","volume-title":"Ariane: An open-source 64-bit RISC-V application class processor and latest improvements. Technical talk at the RISC-V Workshop, Retrieved on from https:\/\/www.youtube.com\/watch?v=8HpvRNh0ux4.","author":"Zaruba Florian","year":"2018","unstructured":"Florian Zaruba and Luca Benini . 2018 . Ariane: An open-source 64-bit RISC-V application class processor and latest improvements. Technical talk at the RISC-V Workshop, Retrieved on from https:\/\/www.youtube.com\/watch?v=8HpvRNh0ux4. Florian Zaruba and Luca Benini. 2018. Ariane: An open-source 64-bit RISC-V application class processor and latest improvements. Technical talk at the RISC-V Workshop, Retrieved on from https:\/\/www.youtube.com\/watch?v=8HpvRNh0ux4."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2018.00053"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3276491"},{"key":"e_1_2_1_67_1","unstructured":"Jie Zhou Ganqu Cui Zhengyan Zhang Cheng Yang Zhiyuan Liu and Maosong Sun. 2018. 
Graph neural networks: A review of methods and applications. arXiv:1812.08434. Retrieved from https:\/\/arxiv.org\/abs\/1812.08434.  Jie Zhou Ganqu Cui Zhengyan Zhang Cheng Yang Zhiyuan Liu and Maosong Sun. 2018. Graph neural networks: A review of methods and applications. arXiv:1812.08434. Retrieved from https:\/\/arxiv.org\/abs\/1812.08434."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevE.76.046115"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3352460.3358256"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469846","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3469846","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3469846","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:30:16Z","timestamp":1750188616000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3469846"}},"subtitle":["Optimizing Data Supply for Graph Applications on In-Order Multicore Architectures"],"short-title":[],"issued":{"date-parts":[[2021,9,3]]},"references-count":68,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,12,31]]}},"alternative-id":["10.1145\/3469846"],"URL":"https:\/\/doi.org\/10.1145\/3469846","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2021,9,3]]},"assertion":[{"value":"2021-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}