{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:16Z","timestamp":1750309516090,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,20]],"date-time":"2025-03-20T00:00:00Z","timestamp":1742428800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62202288 and 92264108"],"award-info":[{"award-number":["62202288 and 92264108"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>\n            Recently, graph convolutional networks (GCNs) have gained wide attention due to their ability to capture node relationships in graphs. One problem appears when full-batch GCN is trained on large graph datasets, where the computational and memory requirements are unacceptable. To address this issue, mini-batch GCN training is introduced to improve the scalability of GCN training for large datasets by sampling and training only a subset of the graph in each batch. Although several acceleration techniques have been designed for boosting the efficiency of full-batch GCN, they lack attention to mini-batch GCN, which differs from full-batch GCN in terms of the sampled dynamic graph structures. 
Based on our previous work, GCNTrain\u00a0[\n            <jats:xref ref-type=\"bibr\">28<\/jats:xref>\n            ], which was originally designed to accelerate full-batch GCN training, we devise GCNTrain+, a universal accelerator that tackles the performance bottlenecks of both full-batch and mini-batch GCN training.\n          <\/jats:p>\n          <jats:p>GCNTrain+ is equipped with two engines that optimize computation and memory access in GCN training, respectively. To reduce the computation overhead, we dynamically reconfigure the computation order based on the varying data dimensions involved in each training batch. Moreover, we build a unified computation engine that performs both the sparse-dense and the sparse-sparse matrix multiplications that arise in GCN training. To alleviate the memory burden, we devise a two-phased dynamic clustering mechanism to capture data locality, as well as customized hardware to reduce the clustering overhead. We evaluate GCNTrain+ on seven datasets, and the results show that GCNTrain+ achieves speedups of 136.0\u00d7, 52.6\u00d7, 2.2\u00d7, and 1.5\u00d7 over CPU, GPU, GCNAX, and GCNTrain, respectively, in full-batch GCN training. 
Additionally, GCNTrain+ outperforms them with speedups of 131.6\u00d7, 67.1\u00d7, 4.4\u00d7, and 1.5\u00d7 in mini-batch GCN training.<\/jats:p>","DOI":"10.1145\/3705317","type":"journal-article","created":{"date-parts":[[2024,11,23]],"date-time":"2024-11-23T10:13:14Z","timestamp":1732356794000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["GCNTrain+: A Versatile and Efficient Accelerator for Graph Convolutional Neural Network Training"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6494-4786","authenticated-orcid":false,"given":"Zhuoran","family":"Song","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4719-7303","authenticated-orcid":false,"given":"Jiabei","family":"Long","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7353-8798","authenticated-orcid":false,"given":"Li","family":"Jiang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8417-5796","authenticated-orcid":false,"given":"Naifeng","family":"Jing","sequence":"additional","affiliation":[{"name":"Department of Micro\/Nano Electronics, Shanghai Jiao Tong University, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2790-5884","authenticated-orcid":false,"given":"Xiaoyao","family":"Liang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2025,3,20]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"Sami Abu-El-Haija Amol 
Kapoor Bryan Perozzi and Joonseok Lee. 2020. N-GCN: Multi-scale graph convolution for semi-supervised node classification. Proceedings of Machine Learning Research 115 (2020) 841\u2013851. https:\/\/proceedings.mlr.press\/v115\/abu-el-haija20a.html"},{"issue":"2","key":"e_1_3_1_3_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3085572","article-title":"CACTI 7: New tools for interconnect exploration in innovative off-chip memories","volume":"14","author":"Balasubramonian Rajeev","year":"2017","unstructured":"Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, and Vaishnav Srinivas. 2017. CACTI 7: New tools for interconnect exploration in innovative off-chip memories. ACM Transactions on Architecture and Code Optimization 14, 2 (2017), 1\u201325.","journal-title":"ACM Transactions on Architecture and Code Optimization"},{"key":"e_1_3_1_4_2","volume-title":"Proceedings of the 2nd International Conference on Learning Representations (ICLR\u201914): Conference Track","author":"Bruna Joan","year":"2014","unstructured":"Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. 2014. Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations (ICLR\u201914): Conference Track. http:\/\/arxiv.org\/abs\/1312.6203"},{"key":"e_1_3_1_5_2","article-title":"MolGAN: An implicit generative model for small molecular graphs","volume":"1805","author":"Cao Nicola De","year":"2018","unstructured":"Nicola De Cao and Thomas Kipf. 2018. MolGAN: An implicit generative model for small molecular graphs. arXiv abs\/1805.11973 (2018). https:\/\/api.semanticscholar.org\/CorpusID:44100802","journal-title":"arXiv"},{"key":"e_1_3_1_6_2","series-title":"AAAI\u201910","volume-title":"Proceedings of the 24th AAAI Conference on Artificial Intelligence","author":"Carlson Andrew","year":"2010","unstructured":"Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. 
Hruschka, and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence(AAAI\u201910). 1306\u20131313."},{"key":"e_1_3_1_7_2","unstructured":"Jie Chen Tengfei Ma and Cao Xiao. 2018. FastGCN: Fast learning with graph convolutional networks via importance sampling. arXiv:cs.LG\/1801.10247 (2018)."},{"key":"e_1_3_1_8_2","unstructured":"Jianfei Chen Jun Zhu and Le Song. 2018. Stochastic training of graph convolutional networks with variance reduction. Proceedings of Machine Learning Research 80 (2018) 941\u2013949. http:\/\/proceedings.mlr.press\/v80\/chen18p.html"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2018.2871189"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"key":"e_1_3_1_11_2","series-title":"KDD\u201919","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1145\/3292500.3330925","volume-title":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Chiang Wei-Lin","year":"2019","unstructured":"Wei-Lin Chiang, Xuanqing Liu, Si Si, Yang Li, Samy Bengio, and Cho-Jui Hsieh. 2019. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD\u201919). ACM, New York, NY, USA, 257\u2013266."},{"key":"e_1_3_1_12_2","unstructured":"Micha\u00ebl Defferrard Xavier Bresson and Pierre Vandergheynst. 2017. Convolutional neural networks on graphs with fast localized spectral filtering. 
arXiv:cs.LG\/1606.09375 (2017)."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPSW.2017.8"},{"key":"e_1_3_1_14_2","first-page":"1","article-title":"A systematic survey of general sparse matrix-matrix multiplication","volume":"55","author":"Gao Jianhua","year":"2020","unstructured":"Jianhua Gao, Weixing Ji, Zhaonian Tan, and Yueyan Zhao. 2020. A systematic survey of general sparse matrix-matrix multiplication. ACM Computing Surveys 55 (2020), 1\u201336. https:\/\/api.semanticscholar.org\/CorpusID:211505964","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_1_15_2","doi-asserted-by":"crossref","first-page":"922","DOI":"10.1109\/MICRO50266.2020.00079","volume-title":"Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920)","author":"Geng Tong","year":"2020","unstructured":"Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, et\u00a0al. 2020. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In Proceedings of the 2020 53rd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201920). IEEE, Los Alamitos, CA, USA, 922\u2013936."},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00079"},{"key":"e_1_3_1_17_2","article-title":"Inductive representation learning on large graphs","volume":"30","author":"Hamilton Will","year":"2017","unstructured":"Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems 30 (2017). 
1\u201311.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.5555\/3294771.3294869"},{"key":"e_1_3_1_19_2","series-title":"KDD\u201919","first-page":"705","volume-title":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Han Peng","year":"2019","unstructured":"Peng Han, Peng Yang, Peilin Zhao, Shuo Shang, Yong Liu, Jiayu Zhou, Xin Gao, and Panos Kalnis. 2019. GCN-MF: Disease-gene association identification by graph convolutional networks and matrix factorization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD\u201919). ACM, New York, NY, USA, 705\u2013713."},{"key":"e_1_3_1_20_2","series-title":"SIGIR\u201920","first-page":"639","volume-title":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"He Xiangnan","year":"2020","unstructured":"Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, YongDong Zhang, and Meng Wang. 2020. LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval(SIGIR\u201920). ACM, New York, NY, USA, 639\u2013648."},{"key":"e_1_3_1_21_2","article-title":"Deep convolutional networks on graph-structured data","volume":"1506","author":"Henaff Mikael","year":"2015","unstructured":"Mikael Henaff, Joan Bruna, and Yann LeCun. 2015. Deep convolutional networks on graph-structured data. CoRR abs\/1506.05163 (2015). http:\/\/arxiv.org\/abs\/1506.05163","journal-title":"CoRR"},{"key":"e_1_3_1_22_2","first-page":"22118","volume-title":"Advances in Neural Information Processing Systems","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. 
Open graph benchmark: Datasets for machine learning on graphs. In Advances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33. Curran Associates, 22118\u201322133."},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10070983"},{"key":"e_1_3_1_24_2","unstructured":"George Karypis and Vipin Kumar. 1997. METIS\u2014A Software Package for Partitioning Unstructured Graphs Partitioning Meshes and Computing Fill-reducing Ordering of Sparse Matrices. Technical Report 97-061. University of Minnesota."},{"key":"e_1_3_1_25_2","volume-title":"Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917): Conference Track","author":"Kipf Thomas N.","year":"2017","unstructured":"Thomas N. Kipf and Max Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR\u201917): Conference Track."},{"key":"e_1_3_1_26_2","unstructured":"Adam Lerer Ledell Wu Jiajun Shen Timothee Lacroix Luca Wehrstedt Abhijit Bose and Alex Peysakhovich. 2019. PyTorch-BigGraph: A large-scale graph embedding system. arXiv:cs.LG\/1903.12287 (2019)."},{"key":"e_1_3_1_27_2","first-page":"775","volume-title":"Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921)","author":"Li Jiajun","year":"2021","unstructured":"Jiajun Li, Ahmed Louri, Avinash Karanth, and Razvan Bunescu. 2021. GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks. In Proceedings of the 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201921). 
IEEE, Los Alamitos, CA, USA, 775\u2013788."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449986"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCD56317.2022.00112"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1009953814988"},{"key":"e_1_3_1_31_2","article-title":"TUDataset: A collection of benchmark datasets for learning with graphs","volume":"2007","author":"Morris Christopher","year":"2020","unstructured":"Christopher Morris, Nils M. Kriege, Franka Bause, Kristian Kersting, Petra Mutzel, and Marion Neumann. 2020. TUDataset: A collection of benchmark datasets for learning with graphs. CoRR abs\/2007.08663 (2020). https:\/\/arxiv.org\/abs\/2007.08663","journal-title":"CoRR"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.24"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","unstructured":"Prithviraj Sen Galileo Namata Mustafa Bilgic Lise Getoor Brian Galligher and Tina Eliassi-Rad. 2008. Collective classification in network data. AI Magazine 29 (Sept.2008) 93. 10.1609\/aimag.v29i3.2157","DOI":"10.1609\/aimag.v29i3.2157"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA51647.2021.00018"},{"issue":"6","key":"e_1_3_1_35_2","doi-asserted-by":"crossref","first-page":"bbac463","DOI":"10.1093\/bib\/bbac463","article-title":"Predicting the potential human lncRNA\u2013miRNA interactions based on graph convolution network with conditional random field","volume":"23","author":"Wang Wenya","year":"2022","unstructured":"Wenya Wang, Li Zhang, Jianqiang Sun, Qi Zhao, and Jianwei Shuai. 2022. Predicting the potential human lncRNA\u2013miRNA interactions based on graph convolution network with conditional random field. Briefings in Bioinformatics 23, 6 (102022), bbac463.","journal-title":"Briefings in Bioinformatics"},{"key":"e_1_3_1_36_2","unstructured":"Le Wu Peijie Sun Richang Hong Yanjie Fu Xiting Wang and Meng Wang. 2019. 
SocialGCN: An efficient graph convolutional network based model for social recommendation. arXiv:cs.IR\/1811.02815 (2019)."},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i01.5455"},{"key":"e_1_3_1_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2978386"},{"key":"e_1_3_1_39_2","first-page":"15","volume-title":"Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Yan Mingyu","year":"2020","unstructured":"Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. 2020. HyGCN: A GCN accelerator with hybrid architecture. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920). IEEE, Los Alamitos, CA, 15\u201329."},{"key":"e_1_3_1_40_2","series-title":"MICRO\u201952","first-page":"615","volume-title":"Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture","author":"Yan Mingyu","year":"2019","unstructured":"Mingyu Yan, Xing Hu, Shuangchen Li, Abanti Basak, Han Li, Xin Ma, Itir Akgun, Yujing Feng, Peng Gu, Lei Deng, et\u00a0al. 2019. Alleviating irregularity in graph analytics acceleration: A hardware\/software co-design approach. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture(MICRO\u201952). ACM, New York, NY, USA, 615\u2013628."},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.3033673"},{"key":"e_1_3_1_43_2","article-title":"DSTC: Dual-side sparsity tensor core for DNNs acceleration on modern GPU architectures","author":"Zhang Chen","year":"2024","unstructured":"Chen Zhang, Yang Wang, Zhiqiang Xie, Cong Guo, Yunxin Liu, Jingwen Leng, Guangyu Sun, Zhigang Ji, Runsheng Wang, Yuan Xie, et\u00a0al. 2024. 
DSTC: Dual-side sparsity tensor core for DNNs acceleration on modern GPU architectures. IEEE Transactions on Computers. Early Access, October 8, 2024.","journal-title":"IEEE Transactions on Computers."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.3389\/fgene.2021.690049"},{"key":"e_1_3_1_45_2","first-page":"261","volume-title":"Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920)","author":"Zhang Zhekai","year":"2020","unstructured":"Zhekai Zhang, Hanrui Wang, Song Han, and William J. Dally. 2020. Sparch: Efficient architecture for sparse matrix multiplication. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA\u201920). IEEE, Los Alamitos, CA, USA, 261\u2013274."},{"key":"e_1_3_1_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2019.2935152"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3705317","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3705317","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:02Z","timestamp":1750295882000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3705317"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,20]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3705317"],"URL":"https:\/\/doi.org\/10.1145\/3705317","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2025,3,20]]},"assertion":[{"value":"2024-02-25","order":0,"name":"
received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-10-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-20","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}