{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T07:42:59Z","timestamp":1774942979514,"version":"3.50.1"},"publisher-location":"New York, NY, USA","reference-count":56,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,6,21]],"date-time":"2023-06-21T00:00:00Z","timestamp":1687305600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,6,21]]},"DOI":"10.1145\/3577193.3593739","type":"proceedings-article","created":{"date-parts":[[2023,6,20]],"date-time":"2023-06-20T18:47:05Z","timestamp":1687286825000},"page":"450-462","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["FLASH: FPGA-Accelerated Smart Switches with GCN Case Study"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2893-9194","authenticated-orcid":false,"given":"Pouya","family":"Haghi","sequence":"first","affiliation":[{"name":"Boston University, Boston, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-7259-5961","authenticated-orcid":false,"given":"William","family":"Krska","sequence":"additional","affiliation":[{"name":"Boston University, Boston, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3727-2889","authenticated-orcid":false,"given":"Cheng","family":"Tan","sequence":"additional","affiliation":[{"name":"Microsoft, Bellevue, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3644-2922","authenticated-orcid":false,"given":"Tong","family":"Geng","sequence":"additional","affiliation":[{"name":"University of Rochester, Rochester, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-8164-7061","authenticated-orcid":false,"given":"Po Hao","family":"Chen","sequence":"additional","affiliation":[{"name":"Boston University, Boston, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-4862-1169","authenticated-orcid":false,"given":"Connor","family":"Greenwood","sequence":"additional","affiliation":[{"name":"Boston University, Boston, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5872-4464","authenticated-orcid":false,"given":"Anqi","family":"Guo","sequence":"additional","affiliation":[{"name":"Boston University, Boston, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9675-0399","authenticated-orcid":false,"given":"Thomas","family":"Hines","sequence":"additional","affiliation":[{"name":"University of Tennessee at Chattanooga, Chattanooga, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-2039-0853","authenticated-orcid":false,"given":"Chunshu","family":"Wu","sequence":"additional","affiliation":[{"name":"Boston University, Boston, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3734-9137","authenticated-orcid":false,"given":"Ang","family":"Li","sequence":"additional","affiliation":[{"name":"Pacific Northwest National Laboratory, Richland, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5252-6600","authenticated-orcid":false,"given":"Anthony","family":"Skjellum","sequence":"additional","affiliation":[{"name":"University of Tennessee at Chattanooga, Chattanooga, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3443-9113","authenticated-orcid":false,"given":"Martin","family":"Herbordt","sequence":"additional","affiliation":[{"name":"Boston University, Boston, United States of America"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,6,21]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI). 76--83","author":"Arap O.","unstructured":"O. Arap and M. Swany . 2016. Offloading Collective Operations to Programmable Logic on a Zynq Cluster . In 2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI). 76--83 . O. Arap and M. Swany. 2016. Offloading Collective Operations to Programmable Logic on a Zynq Cluster. In 2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI). 76--83."},{"key":"e_1_3_2_1_2_1","unstructured":"Arista. 2023. 7130 FPGA-enabled Network Switches - Quick Look. www.arista.com\/en\/products\/7130-fpga-enabled-network-switches-quick-look.  Arista. 2023. 7130 FPGA-enabled Network Switches - Quick Look. www.arista.com\/en\/products\/7130-fpga-enabled-network-switches-quick-look."},{"key":"e_1_3_2_1_3_1","unstructured":"AWS. 2019. Deliver high performance ML inference with AWS Inferentia. https:\/\/d1.awsstatic.com\/events\/reinvent\/2019\/REPEAT_1_Deliver_high_performance_ML_inference_with_AWS_Inferentia_CMP324-R1.pdf.  AWS. 2019. Deliver high performance ML inference with AWS Inferentia. https:\/\/d1.awsstatic.com\/events\/reinvent\/2019\/REPEAT_1_Deliver_high_performance_ML_inference_with_AWS_Inferentia_CMP324-R1.pdf."},{"key":"e_1_3_2_1_4_1","volume-title":"BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. In High Performance Computing: 36th International Conference, ISC High Performance","author":"Bayatpour M.","year":"2021","unstructured":"M. Bayatpour , N. Sarkauskas , H. Subramoni , J. Maqbool Hashmi , and D. K. Panda . 2021 . BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. In High Performance Computing: 36th International Conference, ISC High Performance 2021 . Springer, 18--37. M. Bayatpour, N. Sarkauskas, H. Subramoni, J. Maqbool Hashmi, and D. K. Panda. 2021. BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. In High Performance Computing: 36th International Conference, ISC High Performance 2021. Springer, 18--37."},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2656877.2656890"},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.54"},{"key":"e_1_3_2_1_7_1","volume-title":"Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--16","author":"Sensi D. De","unstructured":"D. De Sensi , S. Di Girolamo , S. Ashkboos , S. Li , and T. Hoefler . 2021. Flare: Flexible In-Network Allreduce . In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--16 . D. De Sensi, S. Di Girolamo, S. Ashkboos, S. Li, and T. Hoefler. 2021. Flare: Flexible In-Network Allreduce. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--16."},{"key":"e_1_3_2_1_8_1","volume-title":"2009 17th IEEE Symposium on High Performance Interconnects","author":"Faraj A.","year":"2009","unstructured":"A. Faraj , S. Kumar , B. Smith , A. Mamidala , and J. Gunnels . 2009. MPI Collective Communications on the Blue Gene\/P Supercomputer: Algorithms and Optimizations . 2009 17th IEEE Symposium on High Performance Interconnects ( 2009 ), 63--72. A. Faraj, S. Kumar, B. Smith, A. Mamidala, and J. Gunnels. 2009. MPI Collective Communications on the Blue Gene\/P Supercomputer: Algorithms and Optimizations. 2009 17th IEEE Symposium on High Performance Interconnects (2009), 63--72."},{"key":"e_1_3_2_1_9_1","unstructured":"J. Gasteiger C. Qian and S. G\u00fcnnemann. 2022. Influence-Based Mini-Batching for Graph Neural Networks. arXiv preprint arXiv:2212.09083 (2022).  J. Gasteiger C. Qian and S. G\u00fcnnemann. 2022. Influence-Based Mini-Batching for Graph Neural Networks. arXiv preprint arXiv:2212.09083 (2022)."},{"key":"e_1_3_2_1_10_1","volume-title":"AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing. In 53rd IEEE\/ACM International Symposium on Microarchitecture (MICRO).","author":"Geng T.","unstructured":"T. Geng , A. Li , R. Shi , C. Wu , T. Wang , Y. Li , P. Haghi , A. Tumeo , S. Che , S. Reinhardt , and M.C. Herbordt . 2020 . AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing. In 53rd IEEE\/ACM International Symposium on Microarchitecture (MICRO). T. Geng, A. Li, R. Shi, C. Wu, T. Wang, Y. Li, P. Haghi, A. Tumeo, S. Che, S. Reinhardt, and M.C. Herbordt. 2020. AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing. In 53rd IEEE\/ACM International Symposium on Microarchitecture (MICRO)."},{"key":"e_1_3_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3466752.3480113"},{"key":"e_1_3_2_1_12_1","volume-title":"2010 IEEE International Symposium on Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW). 1--8.","author":"Graham R. L.","year":"2010","unstructured":"R. L. Graham 2010 . Overlapping Computation and Communication: Barrier Algorithms and ConnectX-2 CORE-Direct Capabilities . In 2010 IEEE International Symposium on Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW). 1--8. R. L. Graham et al. 2010. Overlapping Computation and Communication: Barrier Algorithms and ConnectX-2 CORE-Direct Capabilities. In 2010 IEEE International Symposium on Parallel Distributed Processing, Workshops and Phd Forum (IPDPSW). 1--8."},{"key":"e_1_3_2_1_13_1","volume-title":"2016 First International Workshop on Communication Optimizations in HPC (COMHPC). 1--10","author":"Graham R. L.","year":"2016","unstructured":"R. L. Graham 2016 . Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction . In 2016 First International Workshop on Communication Optimizations in HPC (COMHPC). 1--10 . R. L. Graham et al. 2016. Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction. In 2016 First International Workshop on Communication Optimizations in HPC (COMHPC). 1--10."},{"key":"e_1_3_2_1_14_1","volume-title":"Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)TM Streaming-Aggregation Hardware Design and Evaluation","author":"Graham Richard L.","unstructured":"Richard L. Graham , Lion Levi , Devendar Burredy , Gil Bloch , Gilad Shainer , David Cho , George Elias , Daniel Klein , Joshua Ladd , Ophir Maor , Ami Marelli , Valentin Petrov , Evyatar Romlet , Yong Qin , and Ido Zemah . 2020. Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)TM Streaming-Aggregation Hardware Design and Evaluation . In High Performance Computing, Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, and Hatem Ltaief (Eds.). Springer International Publishing , Cham , 41--59. Richard L. Graham, Lion Levi, Devendar Burredy, Gil Bloch, Gilad Shainer, David Cho, George Elias, Daniel Klein, Joshua Ladd, Ophir Maor, Ami Marelli, Valentin Petrov, Evyatar Romlet, Yong Qin, and Ido Zemah. 2020. Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)TM Streaming-Aggregation Hardware Design and Evaluation. In High Performance Computing, Ponnuswamy Sadayappan, Bradford L. Chamberlain, Guido Juckeland, and Hatem Ltaief (Eds.). Springer International Publishing, Cham, 41--59."},{"key":"e_1_3_2_1_15_1","volume-title":"International Conference on Field-Programmable Logic and Applications (FPL).","author":"Guo A.","unstructured":"A. Guo , T. Geng , Y. Zhang , P. Haghi , C. Wu , C. Tan , Y. Lin , A. Li , and M.C. Herbordt . 2022. A Framework for Neural Network Inference on FPGA-Centric SmartNICs . In International Conference on Field-Programmable Logic and Applications (FPL). A. Guo, T. Geng, Y. Zhang, P. Haghi, C. Wu, C. Tan, Y. Lin, A. Li, and M.C. Herbordt. 2022. A Framework for Neural Network Inference on FPGA-Centric SmartNICs. In International Conference on Field-Programmable Logic and Applications (FPL)."},{"key":"e_1_3_2_1_16_1","volume-title":"Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. In ICS 2023: International Conference on Supercomputing.","author":"Guo A.","unstructured":"A. Guo , Y. Hao , C. Wu , P. Haghi , Z. Pan , M. Si , D. Tao , A. Li , M.C. Herbordt , and T. Geng . 2023 . Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. In ICS 2023: International Conference on Supercomputing. A. Guo, Y. Hao, C. Wu, P. Haghi, Z. Pan, M. Si, D. Tao, A. Li, M.C. Herbordt, and T. Geng. 2023. Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training. In ICS 2023: International Conference on Supercomputing."},{"key":"e_1_3_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPEC49654.2021.9622847"},{"key":"e_1_3_2_1_18_1","volume-title":"FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives. In IEEE High Performance Extreme Computing Conference.","author":"Haghi P.","unstructured":"P. Haghi , A. Guo , Q. Xiong , R. Patel , C. Yang , T. Geng , J.T. Broaddus , R. Marshall , A. Skjellum , and M.C. Herbordt . 2020 . FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives. In IEEE High Performance Extreme Computing Conference. P. Haghi, A. Guo, Q. Xiong, R. Patel, C. Yang, T. Geng, J.T. Broaddus, R. Marshall, A. Skjellum, and M.C. Herbordt. 2020. FPGAs in the Network and Novel Communicator Support Accelerate MPI Collectives. In IEEE High Performance Extreme Computing Conference."},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.6769"},{"key":"e_1_3_2_1_20_1","volume-title":"OCT: The Open Cloud FPGA Testbed. In 31st International Conference on Field Programmable Logic and Applications (FPL).","author":"Handagala S.","unstructured":"S. Handagala , M.C. Herbordt , and M. Leeser . 2021 . OCT: The Open Cloud FPGA Testbed. In 31st International Conference on Field Programmable Logic and Applications (FPL). S. Handagala, M.C. Herbordt, and M. Leeser. 2021. OCT: The Open Cloud FPGA Testbed. In 31st International Conference on Field Programmable Logic and Applications (FPL)."},{"key":"e_1_3_2_1_21_1","volume-title":"IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 1--6.","author":"Handagala S.","unstructured":"S. Handagala , M. Leeser , K. Patle , and M. Zink . 2022. Network Attached FPGAs in the Open Cloud Testbed (OCT) . In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 1--6. S. Handagala, M. Leeser, K. Patle, and M. Zink. 2022. Network Attached FPGAs in the Open Cloud Testbed (OCT). In IEEE INFOCOM 2022 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). 1--6."},{"key":"e_1_3_2_1_22_1","unstructured":"F. Hauser et al. 2021. A Survey on Data Plane Programming with P4: Fundamentals Advances and Applied Research. arXiv preprint arXiv:2101.10632 (2021).  F. Hauser et al. 2021. A Survey on Data Plane Programming with P4: Fundamentals Advances and Applied Research. arXiv preprint arXiv:2101.10632 (2021)."},{"key":"e_1_3_2_1_23_1","volume-title":"Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems 33","author":"Hu Weihua","year":"2020","unstructured":"Weihua Hu , Matthias Fey , Marinka Zitnik , Yuxiao Dong , Hongyu Ren , Bowen Liu , Michele Catasta , and Jure Leskovec . 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems 33 ( 2020 ), 22118--22133. Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems 33 (2020), 22118--22133."},{"key":"e_1_3_2_1_24_1","volume-title":"Proceedings of Machine Learning and Systems 2020","author":"Jia Z.","year":"2020","unstructured":"Z. Jia , S. Lin , M. Gao , M. Zaharia , and A. Aiken . 2020. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc . In Proceedings of Machine Learning and Systems 2020 , MLSys 2020 , Austin, TX, USA, March 2--4 , 2020, I.S. Dhillon, D.S. Papailiopoulos, and V. Sze (Eds.). mlsys.org. https:\/\/proceedings.mlsys.org\/book\/300.pdf Z. Jia, S. Lin, M. Gao, M. Zaharia, and A. Aiken. 2020. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc. In Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2--4, 2020, I.S. Dhillon, D.S. Papailiopoulos, and V. Sze (Eds.). mlsys.org. https:\/\/proceedings.mlsys.org\/book\/300.pdf"},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3061639.3062262"},{"key":"e_1_3_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3086704"},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/HOTI51249.2020.00018"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CGO.2004.1281665"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295500.3356169"},{"key":"e_1_3_2_1_30_1","volume-title":"2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA). 279--291","author":"Youjie","unstructured":"Youjie Li and et al. 2019. Accelerating Distributed Reinforcement learning with In-Switch Computing . In 2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA). 279--291 . Youjie Li and et al. 2019. Accelerating Distributed Reinforcement learning with In-Switch Computing. In 2019 ACM\/IEEE 46th Annual International Symposium on Computer Architecture (ISCA). 279--291."},{"key":"e_1_3_2_1_31_1","volume-title":"ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix. In Field Programmable Logic and Application (FPL). 61--70.","author":"Mei B.","year":"2003","unstructured":"B. Mei , S. Vernalde , D. Verkest , H. De Man , and R. Lauwereins . 2003 . ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix. In Field Programmable Logic and Application (FPL). 61--70. B. Mei, S. Vernalde, D. Verkest, H. De Man, and R. Lauwereins. 2003. ADRES: An Architecture with Tightly Coupled VLIW Processor and Coarse-Grained Reconfigurable Matrix. In Field Programmable Logic and Application (FPL). 61--70."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/1397718.1397720"},{"key":"e_1_3_2_1_33_1","unstructured":"New Wave DV. 2023. 32-Port Programmable Switch. https:\/\/newwavedv.com\/products\/appliances\/32-port-programmable-switch\/.  New Wave DV. 2023. 32-Port Programmable Switch. https:\/\/newwavedv.com\/products\/appliances\/32-port-programmable-switch\/."},{"key":"e_1_3_2_1_34_1","volume-title":"SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--12","author":"Park J.","unstructured":"J. Park , M. Smelyanskiy , U. M. Yang , D. Mudigere , and P. Dubey . 2015. High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems . In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--12 . J. Park, M. Smelyanskiy, U. M. Yang, D. Mudigere, and P. Dubey. 2015. High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems. In SC '15: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. 1--12."},{"key":"e_1_3_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080256"},{"key":"e_1_3_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MCOM.001.2000399"},{"key":"e_1_3_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS51556.2021.9401784"},{"key":"e_1_3_2_1_38_1","unstructured":"RISC-V. 2023. RISC-V Specifications. https:\/\/riscv.org\/technical\/specifications\/.  RISC-V. 2023. RISC-V Specifications. https:\/\/riscv.org\/technical\/specifications\/."},{"key":"e_1_3_2_1_39_1","unstructured":"RISC-V. 2023. RISC-V 'V' Vector Specifications. https:\/\/github.com\/riscv\/riscv-v-spec\/blob\/master\/v-spec.adoc.  RISC-V. 2023. RISC-V 'V' Vector Specifications. https:\/\/github.com\/riscv\/riscv-v-spec\/blob\/master\/v-spec.adoc."},{"key":"e_1_3_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/NetSoft51509.2021.9492726"},{"key":"e_1_3_2_1_41_1","volume-title":"Scaling Distributed Machine Learning with In-Network Aggregation. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)","author":"A. Sapio","year":"2021","unstructured":"A. Sapio et al. 2021 . Scaling Distributed Machine Learning with In-Network Aggregation. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21) . 785--808. https:\/\/www.usenix.org\/conference\/nsdi21\/presentation\/sapio A. Sapio et al. 2021. Scaling Distributed Machine Learning with In-Network Aggregation. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 785--808. https:\/\/www.usenix.org\/conference\/nsdi21\/presentation\/sapio"},{"key":"e_1_3_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3039902.3039904"},{"key":"e_1_3_2_1_43_1","unstructured":"G. Siracusano and R. Bifulco. 2018. In-Network Neural Networks. arXiv preprint arXiv:1801.05731 (2018).  G. Siracusano and R. Bifulco. 2018. In-Network Neural Networks. arXiv preprint arXiv:1801.05731 (2018)."},{"key":"e_1_3_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3093338.3093385"},{"key":"e_1_3_2_1_45_1","volume-title":"Workshop on Exascale MPI.","author":"Stern J.","unstructured":"J. Stern , Q. Xiong , J. Sheng , A. Skjellum , and M.C. Herbordt . 2017. Accelerating MPI_Reduce with FPGAs in the Network . In Workshop on Exascale MPI. J. Stern, Q. Xiong, J. Sheng, A. Skjellum, and M.C. Herbordt. 2017. Accelerating MPI_Reduce with FPGAs in the Network. In Workshop on Exascale MPI."},{"key":"e_1_3_2_1_46_1","volume-title":"Workshop on Exascale MPI.","author":"Stern J.","unstructured":"J. Stern , Q. Xiong , A. Skjellum , and M.C. Herbordt . 2018. A Novel Approach to Supporting Communicators for In-Switch Processing of MPI Collectives . In Workshop on Exascale MPI. J. Stern, Q. Xiong, A. Skjellum, and M.C. Herbordt. 2018. A Novel Approach to Supporting Communicators for In-Switch Processing of MPI Collectives. In Workshop on Exascale MPI."},{"key":"e_1_3_2_1_47_1","volume-title":"Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22)","author":"Swamy T.","unstructured":"T. Swamy , A. Rucker , M. Shahbaz , I. Gaur , and K. Olukotun . 2022. Taurus: a Data Plane Architecture for Per-Packet ML . In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22) . 1099--1114. T. Swamy, A. Rucker, M. Shahbaz, I. Gaur, and K. Olukotun. 2022. Taurus: a Data Plane Architecture for Per-Packet ML. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22). 1099--1114."},{"key":"e_1_3_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2019.00022"},{"key":"e_1_3_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00074"},{"key":"e_1_3_2_1_50_1","volume-title":"Proceedings of the Symposium on SDN Research (SOSR '17)","author":"H. Wang","year":"2017","unstructured":"H. Wang et al. 2017 . P4FPGA: A Rapid Prototyping Framework for P4 . In Proceedings of the Symposium on SDN Research (SOSR '17) . 122--135. H. Wang et al. 2017. P4FPGA: A Rapid Prototyping Framework for P4. In Proceedings of the Symposium on SDN Research (SOSR '17). 122--135."},{"key":"e_1_3_2_1_51_1","unstructured":"Andrew Waterman and Krste Asanovic. 2017. The RISC-V Instruction Set Manual Volume I: User-Level ISA Document Version 2.2. https:\/\/riscv.org\/wp-content\/uploads\/2017\/05\/riscv-spec-v2.2.pdf.  Andrew Waterman and Krste Asanovic. 2017. The RISC-V Instruction Set Manual Volume I: User-Level ISA Document Version 2.2. https:\/\/riscv.org\/wp-content\/uploads\/2017\/05\/riscv-spec-v2.2.pdf."},{"key":"e_1_3_2_1_52_1","unstructured":"Xilinx. 2023. AXI Reference Guide Vivado Design Suite. https:\/\/docs.xilinx.com\/v\/u\/en-US\/ug1037-vivado-axi-reference-guide.  Xilinx. 2023. AXI Reference Guide Vivado Design Suite. https:\/\/docs.xilinx.com\/v\/u\/en-US\/ug1037-vivado-axi-reference-guide."},{"key":"e_1_3_2_1_53_1","unstructured":"Xilinx. 2023. Xilinx Runtime Library (XRT). https:\/\/www.xilinx.com\/products\/design-tools\/vitis\/xrt.html.  Xilinx. 2023. Xilinx Runtime Library (XRT). https:\/\/www.xilinx.com\/products\/design-tools\/vitis\/xrt.html."},{"key":"e_1_3_2_1_54_1","unstructured":"Xilinx. 2023. XUP Vitis Network Example (VNx). https:\/\/github.com\/Xilinx\/xup_vitis_network_example.  Xilinx. 2023. XUP Vitis Network Example (VNx). https:\/\/github.com\/Xilinx\/xup_vitis_network_example."},{"key":"e_1_3_2_1_55_1","volume-title":"Proceedings of the 18th ACM Workshop on Hot Topics in Networks. 25--33","author":"Xiong Z.","unstructured":"Z. Xiong and N. Zilberman . 2019. Do Switches Dream of Machine Learning? Toward In-Network Classification . In Proceedings of the 18th ACM Workshop on Hot Topics in Networks. 25--33 . Z. Xiong and N. Zilberman. 2019. Do Switches Dream of Machine Learning? Toward In-Network Classification. In Proceedings of the 18th ACM Workshop on Hot Topics in Networks. 25--33."},{"key":"e_1_3_2_1_56_1","volume-title":"BoostGCN: A Framework for Optimizing GCN Inference on FPGA. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 29--39","author":"Zhang B.","unstructured":"B. Zhang , R. Kannan , and V. Prasanna . 2021 . BoostGCN: A Framework for Optimizing GCN Inference on FPGA. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 29--39 . B. Zhang, R. Kannan, and V. Prasanna. 2021. BoostGCN: A Framework for Optimizing GCN Inference on FPGA. In 2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 29--39."}],"event":{"name":"ICS '23: 37th International Conference on Supercomputing","location":"Orlando FL USA","acronym":"ICS '23","sponsor":["SIGARCH ACM Special Interest Group on Computer Architecture"]},"container-title":["Proceedings of the 37th International Conference on Supercomputing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593739","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:47:32Z","timestamp":1750178852000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3577193.3593739"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,21]]},"references-count":56,"alternative-id":["10.1145\/3577193.3593739","10.1145\/3577193"],"URL":"https:\/\/doi.org\/10.1145\/3577193.3593739","relation":{},"subject":[],"published":{"date-parts":[[2023,6,21]]},"assertion":[{"value":"2023-06-21","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}