{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,16]],"date-time":"2026-02-16T15:53:06Z","timestamp":1771257186391,"version":"3.50.1"},"reference-count":51,"publisher":"Association for Computing Machinery (ACM)","issue":"2","funder":[{"name":"National Key Research and Development Program of China","award":["2023YFB3001503"],"award-info":[{"award-number":["2023YFB3001503"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>Graph attention networks (GATs) have advanced performance in various application domains by introducing the attention mechanism into the graph neural networks (GNNs). The inefficiency of running GATs on CPUs or GPUs necessitates specialized hardware designs. Unfortunately, previous specialized architecture designs have focused on either the GNN architecture or the attention mechanism, resulting in limited performance and leaving ample room for improvement.<\/jats:p>\n          <jats:p>\n            This article presents\n            <jats:sc>Gator<\/jats:sc>\n            , a joint optimization approach with software\u2013hardware co-designs for GAT inference. On the software level,\n            <jats:sc>Gator<\/jats:sc>\n            leverages degree-weighted graph partitioning and parameter-adaptive feature selection techniques to preprocess the input graph data, mining subgraph-level parallelism and mitigating the computation bottleneck of the dedicated dataflow. On the hardware level,\n            <jats:sc>Gator<\/jats:sc>\n            designs a unified processing engine to support various kernels by extracting a common computation pattern and a dimension-aware microarchitecture for efficient partial sum reduction. Extensive experiments show that our approach can achieve 11.5\u00d7 more efficiency compared to NVIDIA RTX 4090 and provide a speedup of 3\u00d7 to 9.4\u00d7, along with a 2.6\u00d7 to 4.7\u00d7 reduction in memory traffic, when compared to six state-of-the-art methods, with minimal accuracy loss.\n          <\/jats:p>","DOI":"10.1145\/3722219","type":"journal-article","created":{"date-parts":[[2025,3,8]],"date-time":"2025-03-08T11:41:56Z","timestamp":1741434116000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Gator: Accelerating Graph Attention Networks by Jointly Optimizing Attention and Graph Processing"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-2673-8673","authenticated-orcid":false,"given":"Xiaobo","family":"Lu","sequence":"first","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3542-4869","authenticated-orcid":false,"given":"Jianbin","family":"Fang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6828-3364","authenticated-orcid":false,"given":"Lin","family":"Peng","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0317-8192","authenticated-orcid":false,"given":"Chun","family":"Huang","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-8190-3024","authenticated-orcid":false,"given":"Zixiao","family":"Yu","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3381-3027","authenticated-orcid":false,"given":"Tiejun","family":"Li","sequence":"additional","affiliation":[{"name":"National University of Defense Technology","place":["Changsha, China"]}]}],"member":"320","published-online":{"date-parts":[[2025,7,2]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228584"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3085572"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577528"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW59228.2023.00490"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3400302.3415610"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330925"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/1840845.1840883"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-16-1089-9_76"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3568022"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00079"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2020.2986316"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00035"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00060"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3566097.3567869"},{"key":"e_1_3_2_16_2","first-page":"2233","article-title":"H-GAT: A hardware-efficient accelerator for graph attention networks","volume":"27","author":"Huang Shizhen","year":"2023","unstructured":"Shizhen Huang, Enhao Tang, and Shun Li. 2023. H-GAT: A hardware-efficient accelerator for graph attention networks. Journal of Applied Science and Engineering 27 (2023), 2233\u20132240.","journal-title":"Journal of Applied Science and Engineering"},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10070983"},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbab513"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3575747"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1137\/S1064827595287997"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2022.3197083"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00070"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527423"},{"key":"e_1_3_2_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3490422.3502359"},{"key":"e_1_3_2_25_2","article-title":"Transformer acceleration with dynamic sparse attention","author":"Liu Liu","year":"2021","unstructured":"Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, and Yuan Xie. 2021. Transformer acceleration with dynamic sparse attention. arXiv preprint arXiv:2110.11299 (2021).","journal-title":"arXiv preprint arXiv:2110.11299"},{"issue":"2","key":"e_1_3_2_26_2","first-page":"i779\u2013i786","article-title":"Ensembling graph attention networks for human microbe\u2013drug association prediction","volume":"36","author":"Long Yahui","year":"2020","unstructured":"Yahui Long, Min Wu, Yong Liu, Chee Keong Kwoh, Jiawei Luo, and Xiaoli Li. 2020. Ensembling graph attention networks for human microbe\u2013drug association prediction. Bioinformatics 36, Supplement 2 (2020), i779\u2013i786.","journal-title":"Bioinformatics"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480125"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124545"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1145\/3508352.3549343"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589057"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507738"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614305"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071015"},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3489517.3530504"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbab051"},{"key":"e_1_3_2_36_2","article-title":"VersaGNN: A versatile accelerator for graph neural networks","author":"Shi Feng","year":"2021","unstructured":"Feng Shi, Ahren Yiqiao Jin, and Song-Chun Zhu. 2021. VersaGNN: A versatile accelerator for graph neural networks. arXiv preprint arXiv:2105.01280 (2021).","journal-title":"arXiv preprint arXiv:2105.01280"},{"key":"e_1_3_2_37_2","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3290989"},{"key":"e_1_3_2_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00068"},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/MSE.2007.44"},{"key":"e_1_3_2_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2020.102277"},{"key":"e_1_3_2_41_2","article-title":"Graph attention networks","author":"Veli\u010dkovi\u0107 Petar","year":"2017","unstructured":"Petar Veli\u010dkovi\u0107, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).","journal-title":"arXiv preprint arXiv:1710.10903"},{"key":"e_1_3_2_42_2","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577487"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00018"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330989"},{"key":"e_1_3_2_45_2","first-page":"515","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u201921)","author":"Wang Yuke","year":"2021","unstructured":"Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, and Yufei Ding. 2021. {GNNAdvisor}: An adaptive and efficient runtime system for {GNN} acceleration on {GPUs}. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI \u201921). 515\u2013531."},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411996"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00012"},{"key":"e_1_3_2_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3023946"},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS51040.2020.00093"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS57955.2024.00084"},{"key":"e_1_3_2_51_2","doi-asserted-by":"publisher","DOI":"10.1145\/3404835.3463028"},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00041"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3722219","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T12:20:32Z","timestamp":1751458832000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722219"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,30]]},"references-count":51,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3722219"],"URL":"https:\/\/doi.org\/10.1145\/3722219","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,30]]},"assertion":[{"value":"2024-07-07","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-24","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}