{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T10:01:29Z","timestamp":1758708089766,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":38,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,8,9]],"date-time":"2021-08-09T00:00:00Z","timestamp":1628467200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["1618706,1717774"],"award-info":[{"award-number":["1618706,1717774"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,8,9]]},"DOI":"10.1145\/3472456.3473511","type":"proceedings-article","created":{"date-parts":[[2021,10,5]],"date-time":"2021-10-05T18:46:04Z","timestamp":1633459564000},"page":"1-11","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems"],"prefix":"10.1145","author":[{"given":"Qiang","family":"Fu","sequence":"first","affiliation":[{"name":"George Washington University"}]},{"given":"H. Howie","family":"Huang","sequence":"additional","affiliation":[{"name":"George Washington University"}]}],"member":"320","published-online":{"date-parts":[[2021,10,5]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 265\u2013283.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Chen , 2016 . Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 265\u2013283. Mart\u00edn Abadi, Paul Barham, Chen, 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 16). 265\u2013283."},{"key":"e_1_3_2_1_2_1","unstructured":"Lucas Bernardi 2019. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking. com. In KDD. ACM 1743\u20131751.  Lucas Bernardi 2019. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking. com. In KDD. ACM 1743\u20131751."},{"key":"e_1_3_2_1_3_1","unstructured":"Xavier Bresson and Thomas Laurent. 2017. Residual gated graph convnets. arXiv preprint arXiv:1711.07553(2017).  Xavier Bresson and Thomas Laurent. 2017. Residual gated graph convnets. arXiv preprint arXiv:1711.07553(2017)."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3159652.3159731"},{"key":"e_1_3_2_1_5_1","unstructured":"Tianqi Chen 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274(2015).  Tianqi Chen 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274(2015)."},{"key":"e_1_3_2_1_6_1","unstructured":"Tianqi Chen 2018. {TVM}: An automated end-to-end optimizing compiler for deep learning. In ({OSDI} 18). 578\u2013594.  Tianqi Chen 2018. {TVM}: An automated end-to-end optimizing compiler for deep learning. In ({OSDI} 18). 578\u2013594."},{"key":"e_1_3_2_1_7_1","unstructured":"Micha\u00ebl Defferrard Xavier Bresson and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems. 3844\u20133852.  Micha\u00ebl Defferrard Xavier Bresson and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems. 3844\u20133852."},{"key":"e_1_3_2_1_8_1","unstructured":"David\u00a0K Duvenaud Dougal Maclaurin Jorge Iparraguirre Rafael Bombarell Timothy Hirzel Al\u00e1n Aspuru-Guzik and Ryan\u00a0P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS. 2224\u20132232.  David\u00a0K Duvenaud Dougal Maclaurin Jorge Iparraguirre Rafael Bombarell Timothy Hirzel Al\u00e1n Aspuru-Guzik and Ryan\u00a0P Adams. 2015. Convolutional networks on graphs for learning molecular fingerprints. In NIPS. 2224\u20132232."},{"key":"e_1_3_2_1_9_1","unstructured":"Matthias Fey and Jan\u00a0Eric Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428(2019).  Matthias Fey and Jan\u00a0Eric Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428(2019)."},{"key":"e_1_3_2_1_10_1","unstructured":"Will Hamilton Zhitao Ying and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024\u20131034.  Will Hamilton Zhitao Ying and Jure Leskovec. 2017. Inductive representation learning on large graphs. In NIPS. 1024\u20131034."},{"key":"e_1_3_2_1_11_1","unstructured":"William\u00a0L Hamilton Rex Ying and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584(2017).  William\u00a0L Hamilton Rex Ying and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. arXiv preprint arXiv:1709.05584(2017)."},{"key":"e_1_3_2_1_12_1","unstructured":"Mikael Henaff Joan Bruna and Yann LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163(2015).  Mikael Henaff Joan Bruna and Yann LeCun. 2015. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163(2015)."},{"key":"e_1_3_2_1_13_1","volume-title":"Proceedings of SAI Intelligent Systems Conference. Springer, 432\u2013448","author":"Hu Dichao","year":"2019","unstructured":"Dichao Hu . 2019 . An introductory survey on attention mechanisms in NLP problems . In Proceedings of SAI Intelligent Systems Conference. Springer, 432\u2013448 . Dichao Hu. 2019. An introductory survey on attention mechanisms in NLP problems. In Proceedings of SAI Intelligent Systems Conference. Springer, 432\u2013448."},{"key":"e_1_3_2_1_14_1","unstructured":"Yuwei Hu 2020. Featgraph: A flexible and efficient backend for graph neural network systems. arXiv preprint arXiv:2008.11359(2020).  Yuwei Hu 2020. Featgraph: A flexible and efficient backend for graph neural network systems. arXiv preprint arXiv:2008.11359(2020)."},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings of the 20th international conference on machine learning (ICML-03)","author":"Kashima Hisashi","year":"2003","unstructured":"Hisashi Kashima , Koji Tsuda , and Akihiro Inokuchi . 2003 . Marginalized kernels between labeled graphs . In Proceedings of the 20th international conference on machine learning (ICML-03) . 321\u2013328. Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. 2003. Marginalized kernels between labeled graphs. In Proceedings of the 20th international conference on machine learning (ICML-03). 321\u2013328."},{"key":"e_1_3_2_1_16_1","unstructured":"Thomas\u00a0N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907(2016).  Thomas\u00a0N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907(2016)."},{"key":"e_1_3_2_1_17_1","unstructured":"Aapo Kyrola Guy Blelloch and Carlos Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a {PC}. In ({OSDI} 12). 31\u201346.  Aapo Kyrola Guy Blelloch and Carlos Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a {PC}. In ({OSDI} 12). 31\u201346."},{"key":"e_1_3_2_1_18_1","unstructured":"Mingzhen Li 2020. The Deep Learning Compiler: A Comprehensive Survey. arXiv preprint arXiv:2002.03794(2020).  Mingzhen Li 2020. The Deep Learning Compiler: A Comprehensive Survey. arXiv preprint arXiv:2002.03794(2020)."},{"key":"e_1_3_2_1_19_1","unstructured":"Yujia Li Daniel Tarlow Marc Brockschmidt and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493(2015).  Yujia Li Daniel Tarlow Marc Brockschmidt and Richard Zemel. 2015. Gated graph sequence neural networks. arXiv preprint arXiv:1511.05493(2015)."},{"key":"e_1_3_2_1_20_1","volume-title":"Simd-x: Programming and processing of graph algorithms on gpus. In ({USENIX}{ATC} 19). 411\u2013428.","author":"Liu Hang","year":"2019","unstructured":"Hang Liu and H\u00a0Howie Huang . 2019 . Simd-x: Programming and processing of graph algorithms on gpus. In ({USENIX}{ATC} 19). 411\u2013428. Hang Liu and H\u00a0Howie Huang. 2019. Simd-x: Programming and processing of graph algorithms on gpus. In ({USENIX}{ATC} 19). 411\u2013428."},{"key":"e_1_3_2_1_21_1","unstructured":"Lingxiao Ma and Others. 2019. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs. In 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19). 443\u2013458.  Lingxiao Ma and Others. 2019. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs. In 2019 {USENIX} Annual Technical Conference ({USENIX}{ATC} 19). 443\u2013458."},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807184"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. arXiv preprint arXiv:1703.04826(2017).  Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for semantic role labeling. arXiv preprint arXiv:1703.04826(2017).","DOI":"10.18653\/v1\/D17-1159"},{"key":"e_1_3_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522739"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-69311-6_21"},{"key":"e_1_3_2_1_26_1","unstructured":"Adam Paszke 2019. PyTorch: An imperative style high-performance deep learning library. In Advances in Neural Information Processing Systems. 8024\u20138035.  Adam Paszke 2019. PyTorch: An imperative style high-performance deep learning library. In Advances in Neural Information Processing Systems. 8024\u20138035."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/2517349.2522740"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Julian Shun and Guy\u00a0E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In PPoPP. 135\u2013146.  Julian Shun and Guy\u00a0E Blelloch. 2013. Ligra: a lightweight graph processing framework for shared memory. In PPoPP. 135\u2013146.","DOI":"10.1145\/2517327.2442530"},{"key":"e_1_3_2_1_29_1","unstructured":"Sainbayar Sukhbaatar Rob Fergus 2016. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems. 2244\u20132252.  Sainbayar Sukhbaatar Rob Fergus 2016. Learning multiagent communication with backpropagation. In Advances in Neural Information Processing Systems. 2244\u20132252."},{"key":"e_1_3_2_1_30_1","unstructured":"Nicolas Vasilache 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv:1802.04730(2018).  Nicolas Vasilache 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv:1802.04730(2018)."},{"key":"e_1_3_2_1_31_1","unstructured":"Petar Veli\u010dkovi\u0107 Guillem Cucurull Arantxa Casanova Adriana Romero Pietro Lio and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903(2017).  Petar Veli\u010dkovi\u0107 Guillem Cucurull Arantxa Casanova Adriana Romero Pietro Lio and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903(2017)."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"crossref","unstructured":"Hongwei Wang 2018. Graphgan: Graph representation learning with generative adversarial nets. In AAAI\u201918.  Hongwei Wang 2018. Graphgan: Graph representation learning with generative adversarial nets. In AAAI\u201918.","DOI":"10.1609\/aaai.v32i1.11872"},{"key":"e_1_3_2_1_33_1","volume-title":"Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. ICLR Workshop on Representation Learning on Graphs and Manifolds","author":"Wang Minjie","year":"2019","unstructured":"Minjie Wang 2019 . Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. ICLR Workshop on Representation Learning on Graphs and Manifolds (2019). Minjie Wang 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. ICLR Workshop on Representation Learning on Graphs and Manifolds (2019)."},{"key":"e_1_3_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2851141.2851145"},{"volume-title":"Machine learning at facebook: Understanding inference at the edge. In 2019 HPCA","author":"Carole-Jean","key":"e_1_3_2_1_35_1","unstructured":"Carole-Jean Wu 2019. Machine learning at facebook: Understanding inference at the edge. In 2019 HPCA . IEEE , 331\u2013344. Carole-Jean Wu 2019. Machine learning at facebook: Understanding inference at the edge. In 2019 HPCA. IEEE, 331\u2013344."},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"crossref","unstructured":"Sijie Yan Yuanjun Xiong and Dahua Lin. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In AAAI\u201918.  Sijie Yan Yuanjun Xiong and Dahua Lin. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In AAAI\u201918.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219890"},{"key":"e_1_3_2_1_38_1","unstructured":"Zhitao Ying Jiaxuan You Christopher Morris Xiang Ren Will Hamilton and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. In Advances in neural information processing systems. 4800\u20134810.  Zhitao Ying Jiaxuan You Christopher Morris Xiang Ren Will Hamilton and Jure Leskovec. 2018. Hierarchical graph representation learning with differentiable pooling. In Advances in neural information processing systems. 4800\u20134810."}],"event":{"name":"ICPP 2021: 50th International Conference on Parallel Processing","acronym":"ICPP 2021","location":"Lemont IL USA"},"container-title":["50th International Conference on Parallel Processing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473511","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3473511","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3472456.3473511","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:17:23Z","timestamp":1750191443000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3472456.3473511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,8,9]]},"references-count":38,"alternative-id":["10.1145\/3472456.3473511","10.1145\/3472456"],"URL":"https:\/\/doi.org\/10.1145\/3472456.3473511","relation":{},"subject":[],"published":{"date-parts":[[2021,8,9]]},"assertion":[{"value":"2021-10-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}