{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T19:37:02Z","timestamp":1764704222087,"version":"3.46.0"},"reference-count":78,"publisher":"Association for Computing Machinery (ACM)","issue":"CoNEXT4","funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["2472404"],"award-info":[{"award-number":["2472404"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Netw."],"published-print":{"date-parts":[[2025,11,24]]},"abstract":"<jats:p>Congestion control (CC) is crucial for datacenter networks (DCNs), and CC frameworks are proposed to enable users to easily deploy new algorithms tailored to diverse scenarios. The framework is desired to be high-performance and generic: (i) allows CC to achieve high throughput and low latency. (ii) supports various algorithms and congestion scenarios. However, prior works either suffer from performance limitations or lack sufficient generality. CCP experiences throughput degradation under heavy traffic, while DOCA-PCC improves performance using hardware but lacks support for detecting and mitigating host congestion. In this paper, we present Taurus, a high-performance and generic CC framework through hardware-software co-design. To this end, Taurus partitions CC functions into distinct tasks and maps them onto suitable hardware\/software components while mitigating excessive interaction overhead. Specifically, Taurus designs a collaborative signal collection mechanism to support diverse congestion feedback, a type-aware message report engine to reduce communication overhead, and software built-in handlers to facilitate deployments. We have implemented a fully functional Taurus on commodity servers with FPGA-based NICs. 
Experimental results show that Taurus supports various CC algorithms in achieving their near-native performance. Compared to CCP, Taurus improves throughput by 32.3%, reduces latency by 96.4%, and lowers CPU overhead by 158.7%. Compared to DOCA-PCC, Taurus improves throughput by 9.3% and reduces latency by 28.8%.<\/jats:p>","DOI":"10.1145\/3768973","type":"journal-article","created":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T17:09:56Z","timestamp":1764090596000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Taurus: Towards A High-Performance and Generic Congestion Control Framework for Datacenter Networks"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-3814-3776","authenticated-orcid":false,"given":"Luyang","family":"Li","sequence":"first","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5506-5958","authenticated-orcid":false,"given":"Heng","family":"Pan","sequence":"additional","affiliation":[{"name":"Computer Network Information Center, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-1489-1288","authenticated-orcid":false,"given":"Pengyi","family":"Zhang","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7675-5711","authenticated-orcid":false,"given":"Kai","family":"Lv","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3184-4081","authenticated-orcid":false,"given":"Zilong","family":"Wang","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hongkong SAR, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6503-5309","authenticated-orcid":false,"given":"Xinchen","family":"Wan","sequence":"additional","affiliation":[{"name":"Hong Kong University of Science and Technology, Hongkong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-3297-5336","authenticated-orcid":false,"given":"Dai","family":"Zhang","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4985-1645","authenticated-orcid":false,"given":"Xiaolong","family":"Zhong","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-4093-3187","authenticated-orcid":false,"given":"Haoran","family":"Wei","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-1556-7919","authenticated-orcid":false,"given":"Lichao","family":"Liu","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9171-9990","authenticated-orcid":false,"given":"Huichen","family":"Dai","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-4951-3099","authenticated-orcid":false,"given":"Qingsong","family":"Ning","sequence":"additional","affiliation":[{"name":"Researcher, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-7102-1252","authenticated-orcid":false,"given":"Xin","family":"Wei","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7167-5768","authenticated-orcid":false,"given":"Shideng","family":"Zhang","sequence":"additional","affiliation":[{"name":"Douyin Vision Co., Ltd., Shanghai, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3290-4767","authenticated-orcid":false,"given":"Hongtao","family":"Guan","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9959-1124","authenticated-orcid":false,"given":"Zhenyu","family":"Li","sequence":"additional","affiliation":[{"name":"Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4964-1135","authenticated-orcid":false,"given":"Gaogang","family":"Xie","sequence":"additional","affiliation":[{"name":"Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,11,25]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3527405"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3563766.3564110"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3603269.3604878"},{"key":"e_1_2_1_4_1","volume-title":"A scalable, commodity data center network architecture. ACM SIGCOMM computer communication review, 38(4):63-74","author":"Al-Fares Mohammad","year":"2008","unstructured":"Mohammad Al-Fares, Alexander Loukissas, and Amin Vahdat. A scalable, commodity data center network architecture. ACM SIGCOMM computer communication review, 38(4):63-74, 2008."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1851182.1851192"},{"key":"e_1_2_1_6_1","volume-title":"https:\/\/developer.amd.com\/wp-content\/resources\/56375.pdf","author":"Service Extensions Technology Platform","year":"2020","unstructured":"AMD64 Technology Platform Quality of Service Extensions. 
https:\/\/developer.amd.com\/wp-content\/resources\/56375.pdf, 2020."},{"key":"e_1_2_1_7_1","first-page":"97","volume-title":"2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Amit Nadav","year":"2018","unstructured":"Nadav Amit and Michael Wei. The design and implementation of hyperupcalls. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 97-112, 2018."},{"key":"e_1_2_1_8_1","volume-title":"Proc. NSDI","author":"Arashloo Mina Tahmasbi","year":"2020","unstructured":"Mina Tahmasbi Arashloo, Alexey Lavrov, Manya Ghobadi, Jennifer Rexford, David Walker, and David Wentzlaff. Enabling programmable transport protocols in high-speed nics. In Proc. NSDI, 2020."},{"key":"e_1_2_1_9_1","first-page":"219","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)","author":"Arslan Serhat","year":"2023","unstructured":"Serhat Arslan, Yuliang Li, Gautam Kumar, and Nandita Dukkipati. Bolt: sub-rtt congestion control for ultra-low latency. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 219-236, 2023."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/90.649568"},{"key":"e_1_2_1_11_1","volume-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/content-details\/758440\/agilex-fpga-portfolio-product-brief.html","author":"Portfolio Product Brief Agilex\u2122 FPGA","year":"2023","unstructured":"Agilex\u2122 FPGA Portfolio Product Brief. 
https:\/\/www.intel.com\/content\/www\/us\/en\/content-details\/758440\/agilex-fpga-portfolio-product-brief.html, 2023."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCOM.1974.1092259"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3576173"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3098822.3098840"},{"key":"e_1_2_1_15_1","volume-title":"https:\/\/docs.nvidia.com\/doca\/sdk\/nvidiadocapccapplicationguide\/index.html","author":"Implementation On DOCA PCC","year":"2023","unstructured":"DOCA PCC Implementation On Top of NVIDIA\u00ae BlueField\u00ae networking platform. https:\/\/docs.nvidia.com\/doca\/sdk\/nvidiadocapccapplicationguide\/index.html, 2023."},{"key":"e_1_2_1_16_1","first-page":"343","volume-title":"15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18)","author":"Dong Mo","year":"2018","unstructured":"Mo Dong, Tong Meng, Doron Zarchy, Engin Arslan, Yossi Gilad, Brighten Godfrey, and Michael Schapira. Pcc vivace: online-learning congestion control. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), pages 343-356, 2018."},{"key":"e_1_2_1_17_1","volume-title":"DOCA Documentation v2.8.0. https:\/\/docs.nvidia.com\/doca\/sdk\/dpasubsystem\/index.html","author":"Subsystem DPA","year":"2024","unstructured":"DPA Subsystem, DOCA Documentation v2.8.0. https:\/\/docs.nvidia.com\/doca\/sdk\/dpasubsystem\/index.html, 2024."},{"key":"e_1_2_1_18_1","volume-title":"https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/documents\/datasheet-nvidia-bluefield-3-dpu.pdf","author":"NVIDIA","year":"2021","unstructured":"NVIDIA BlueField-3 DPU. https:\/\/www.nvidia.com\/content\/dam\/en-zz\/Solutions\/Data-Center\/documents\/datasheet-nvidia-bluefield-3-dpu.pdf, 2021."},{"volume-title":"Open Source High Performant Network Framework based on DPDK. 
https:\/\/github.com\/F-Stack\/f-stack","year":"2017","key":"e_1_2_1_19_1","unstructured":"F-Stack: Open Source High Performant Network Framework based on DPDK. https:\/\/github.com\/F-Stack\/f-stack, 2017."},{"key":"e_1_2_1_20_1","first-page":"381","volume-title":"2025 USENIX Annual Technical Conference (USENIX ATC 25)","author":"Gan Qiaoyin","year":"2025","unstructured":"Qiaoyin Gan, Heng Pan, Luyang Li, Kai Lv, Hongtao Guan, Zhaohua Wang, Zhenyu Li, and Gaogang Xie. Snary: A high-performance and generic smartnic-accelerated retrieval system. In 2025 USENIX Annual Technical Conference (USENIX ATC 25), pages 381-398, 2025."},{"key":"e_1_2_1_21_1","volume-title":"Rdma in data centers: Looking back and looking forward. Keynote at APNet","author":"Guo Chuanxiong","year":"2017","unstructured":"Chuanxiong Guo. Rdma in data centers: Looking back and looking forward. Keynote at APNet, 2017."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934872.2934908"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3098822.3098825"},{"volume-title":"https:\/\/docs.broadcom.com\/doc\/IBT-PB100","year":"2017","key":"e_1_2_1_24_1","unstructured":"In-band Telemetry. https:\/\/docs.broadcom.com\/doc\/IBT-PB100, 2017."},{"key":"e_1_2_1_25_1","volume-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/articles\/technical\/intel-sdm.html","author":"Architectures Software Developer Manuals Intel\u00ae","year":"2023","unstructured":"Intel\u00ae 64 and IA-32 Architectures Software Developer Manuals. https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/articles\/technical\/intel-sdm.html, 2023."},{"key":"e_1_2_1_26_1","volume-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/articles\/tool\/intelr-memory-latency-checker.html","author":"Latency Checker Intel\u00ae Memory","year":"2023","unstructured":"Intel\u00ae Memory Latency Checker. (2023). 
https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/articles\/tool\/intelr-memory-latency-checker.html, 2023."},{"key":"e_1_2_1_27_1","volume-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/oneapi\/vtune-profiler.html#gs.gkpp99","author":"Tune\u2122 Profiler Intel\u00ae","year":"2024","unstructured":"Intel\u00ae VTune\u2122 Profiler. https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/oneapi\/vtune-profiler.html#gs.gkpp99, 2024."},{"key":"e_1_2_1_28_1","volume-title":"https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/processors\/xeon.html","author":"Processors Intel\u00ae Xeon\u00ae","year":"2024","unstructured":"Intel\u00ae Xeon\u00ae Processors. https:\/\/www.intel.com\/content\/www\/us\/en\/products\/details\/processors\/xeon.html, 2024."},{"key":"e_1_2_1_29_1","first-page":"489","volume-title":"11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14)","author":"Jeong EunYoung","year":"2014","unstructured":"EunYoung Jeong, Shinae Wood, Muhammad Jamshed, Haewon Jeong, Sunghwan Ihm, Dongsu Han, and KyoungSoo Park. mtcp: a highly scalable user-level tcp stack for multicore systems. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14), pages 489-502, 2014."},{"key":"e_1_2_1_30_1","volume-title":"et al. Megascale: Scaling large language model training to more than 10,000 gpus. arXiv preprint arXiv:2402.15627","author":"Jiang Ziheng","year":"2024","unstructured":"Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, et al. Megascale: Scaling large language model training to more than 10,000 gpus. arXiv preprint arXiv:2402.15627, 2024."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFCOM.1998.665104"},{"key":"e_1_2_1_32_1","volume-title":"Proc. NSDI","author":"Kalia Anuj","year":"2019","unstructured":"Anuj Kalia, Michael Kaminsky, and David Andersen. 
Datacenter rpcs can be general and fast. In Proc. NSDI, 2019."},{"key":"e_1_2_1_33_1","volume-title":"Proc. SIGCOMM","author":"Kumar Gautam","year":"2020","unstructured":"Gautam Kumar, Nandita Dukkipati, Keon Jang, Hassan MG Wassel, Xian Wu, Behnam Montazeri, Yaogong Wang, Kevin Springborn, Christopher Alfeld, Michael Ryan, et al. Swift: Delay is simple and effective for congestion control in the datacenter. In Proc. SIGCOMM, 2020."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3676641.3716259"},{"key":"e_1_2_1_35_1","volume-title":"From rdma to rdca: Toward high-speed last mile of data center networks using remote direct cache access. arXiv preprint arXiv:2211.05975","author":"Li Qiang","year":"2022","unstructured":"Qiang Li, Qiao Xiang, Derui Liu, Yuxin Wang, Haonan Qiu, Xiaoliang Wang, Jie Zhang, Ridi Wen, Haohao Song, Gexiao Tian, et al. From rdma to rdca: Toward high-speed last mile of data center networks using remote direct cache access. arXiv preprint arXiv:2211.05975, 2022."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341302.3342085"},{"key":"e_1_2_1_37_1","volume-title":"Ub-mesh: a hierarchically localized nd-fullmesh datacenter network architecture. arXiv preprint arXiv:2503.20377","author":"Liao Heng","year":"2025","unstructured":"Heng Liao, Bingyang Liu, Xianping Chen, Zhigang Guo, Chuanning Cheng, Jianbing Wang, Xiangyu Chen, Peng Dong, Rui Meng, Wenjie Liu, et al. Ub-mesh: a hierarchically localized nd-fullmesh datacenter network architecture. arXiv preprint arXiv:2503.20377, 2025."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3650069"},{"key":"e_1_2_1_39_1","first-page":"103","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)","author":"Liu Tianfeng","year":"2023","unstructured":"Tianfeng Liu, Yangrui Chen, Dan Li, Chuan Wu, Yibo Zhu, Jun He, Yanghua Peng, Hongzheng Chen, Hongzhi Chen, and Chuanxiong Guo. 
Bgl: gpu-efficient gnn training by optimizing graph data i\/o and preprocessing. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 103-118, 2023."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3689031.3696101"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3718958.3750539"},{"key":"e_1_2_1_42_1","first-page":"357","volume-title":"15th USENIX symposium on networked systems design and implementation (NSDI 18)","author":"Lu Yuanwei","year":"2018","unstructured":"Yuanwei Lu, Guo Chen, Bojie Li, Kun Tan, Yongqiang Xiong, Peng Cheng, Jiansong Zhang, Enhong Chen, and Thomas Moscibroda. Multi-path transport for rdma in datacenters. In 15th USENIX symposium on networked systems design and implementation (NSDI 18), pages 357-371, 2018."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/VTC2021-Spring51267.2021.9448640"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3735358.3735373"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICNP61940.2024.10858514"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519593"},{"key":"e_1_2_1_47_1","volume-title":"https:\/\/network.nvidia.com\/sites\/default\/files\/doc-2020\/pb-connectx-6-en-card.pdf","author":"Product Brief Mellanox","year":"2020","unstructured":"Mellanox ConnectX-6 Product Brief. https:\/\/network.nvidia.com\/sites\/default\/files\/doc-2020\/pb-connectx-6-en-card.pdf, 2020."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544216.3544238"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544216.3544238"},{"key":"e_1_2_1_50_1","volume-title":"Proc. SIGCOMM","author":"Mittal Radhika","year":"2015","unstructured":"Radhika Mittal, Vinh The Lam, Nandita Dukkipati, Emily Blem, Hassan Wassel, Monia Ghobadi, Amin Vahdat, Yaogong Wang, David Wetherall, and David Zats. 
Timely: Rtt-based congestion control for the datacenter. In Proc. SIGCOMM, 2015."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230564"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230553"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230560"},{"key":"e_1_2_1_55_1","volume-title":"https:\/\/www.nsnam.org\/","author":"Simulator Network","year":"2011","unstructured":"Ns-3 Network Simulator. https:\/\/www.nsnam.org\/, 2011."},{"key":"e_1_2_1_56_1","first-page":"761","volume-title":"19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Olteanu Vladimir","year":"2022","unstructured":"Vladimir Olteanu, Haggai Eran, Dragos Dumitrescu, Adrian Popa, Cristi Baciu, Mark Silberstein, Georgios Nikolaidis, Mark Handley, and Costin Raiciu. An edge-queued datagram service for all datacenter traffic. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 761-777, 2022."},{"key":"e_1_2_1_57_1","first-page":"451","volume-title":"22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25)","author":"Prasopoulos Konstantinos","year":"2025","unstructured":"Konstantinos Prasopoulos, Ryan Kosta, Edouard Bugnion, and Marios Kogias. Sird: A sender-informed, receiver-driven datacenter transport protocol. In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI 25), pages 451-471, 2025."},{"key":"e_1_2_1_58_1","volume-title":"https:\/\/docs.majerle.eu\/projects\/lwrb\/en\/v1.2.0\/index.html","author":"Ringbuffer","year":"2019","unstructured":"Ringbuffer documentation. 
https:\/\/docs.majerle.eu\/projects\/lwrb\/en\/v1.2.0\/index.html, 2019."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787472"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3718958.3754353"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534879.3534882"},{"key":"e_1_2_1_62_1","volume-title":"RoCEv2 (IP routable RoCE). https:\/\/www.infinibandta.org\/specs","author":"Supplement","year":"2014","unstructured":"Supplement to InfiniBand architecture specification volume 1 release 1.2.2 annex A17: RoCEv2 (IP routable RoCE). https:\/\/www.infinibandta.org\/specs, 2014."},{"key":"e_1_2_1_63_1","volume-title":"https:\/\/docs.amd.com\/v\/u\/en-US\/wp350","author":"Express Systems Understanding Performance","year":"2014","unstructured":"Understanding Performance of PCI Express Systems. https:\/\/docs.amd.com\/v\/u\/en-US\/wp350, 2014."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/2377677.2377709"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3651890.3672271"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM55648.2025.11044670"},{"key":"e_1_2_1_67_1","first-page":"1289","volume-title":"19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Wang Tao","year":"2022","unstructured":"Tao Wang, Xiangrui Yang, Gianni Antichi, Anirudh Sivaraman, and Aurojit Panda. Isolation mechanisms for high-speed packet-processing pipelines. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 1289-1305, 2022."},{"key":"e_1_2_1_68_1","first-page":"255","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)","author":"Wang Weitao","year":"2023","unstructured":"Weitao Wang, Masoud Moshref, Yuliang Li, Gautam Kumar, TS Eugene Ng, Neal Cardwell, and Nandita Dukkipati. Poseidon: Efficient, robust, and practical datacenter cc via deployable int. 
In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 255-274, 2023."},{"key":"e_1_2_1_69_1","first-page":"1","volume-title":"20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)","author":"Wang Zilong","year":"2023","unstructured":"Zilong Wang, Layong Luo, Qingsong Ning, Chaoliang Zeng, Wenxue Li, Xinchen Wan, Peng Xie, Tao Feng, Ke Cheng, Xiongfei Geng, et al. Srnic: A scalable architecture for rdma nics. In 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), pages 1-14, 2023."},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3651890.3672215"},{"key":"e_1_2_1_71_1","volume-title":"Opportunities and limitations of quality-of-service (qos) in message passing (mpi) applications on adaptively routed dragonfly and fat tree networks. Technical report","author":"Wilke Jeremiah J","year":"2020","unstructured":"Jeremiah J Wilke and Joseph P Kenny. Opportunities and limitations of quality-of-service (qos) in message passing (mpi) applications on adaptively routed dragonfly and fat tree networks. Technical report, Sandia National Lab.(SNL-CA), Livermore, CA (United States), 2020."},{"key":"e_1_2_1_72_1","first-page":"459","volume-title":"10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13)","author":"Winstein Keith","year":"2013","unstructured":"Keith Winstein, Anirudh Sivaraman, and Hari Balakrishnan. Stochastic forecasts achieve high throughput and low delay over cellular networks. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 459-471, 2013."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.5555\/1227865.1228021"},{"key":"e_1_2_1_74_1","first-page":"731","volume-title":"2018 USENIX Annual Technical Conference (USENIX ATC 18)","author":"Yan Francis Y","year":"2018","unstructured":"Francis Y Yan, Jestin Ma, Greg D Hill, Deepti Raghavan, Riad S Wahby, Philip Levis, and Keith Winstein. 
Pantheon: the training ground for internet congestion-control research. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 731-743, 2018."},{"key":"e_1_2_1_75_1","first-page":"1345","volume-title":"19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Zeng Chaoliang","year":"2022","unstructured":"Chaoliang Zeng, Layong Luo, Teng Zhang, Zilong Wang, Luyang Li, Wenchen Han, Nan Chen, Lebing Wan, Lichao Liu, Zhipeng Ding, et al. Tiara: A scalable and efficient hardware acceleration architecture for stateful layer-4 load balancing. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), pages 1345-1358, 2022."},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPADS60453.2023.00373"},{"key":"e_1_2_1_77_1","first-page":"673","volume-title":"2023 USENIX Annual Technical Conference (USENIX ATC 23)","author":"Zhu Lingjun","year":"2023","unstructured":"Lingjun Zhu, Yifan Shen, Erci Xu, Bo Shi, Ting Fu, Shu Ma, Shuguang Chen, Zhongyu Wang, Haonan Wu, Xingyu Liao, et al. Deploying user-space tcp at cloud scale with luna. 
In 2023 USENIX Annual Technical Conference (USENIX ATC 23), pages 673-687, 2023."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/2785956.2787484"}],"container-title":["Proceedings of the ACM on Networking"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3768973","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T19:33:23Z","timestamp":1764704003000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3768973"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,24]]},"references-count":78,"journal-issue":{"issue":"CoNEXT4","published-print":{"date-parts":[[2025,11,24]]}},"alternative-id":["10.1145\/3768973"],"URL":"https:\/\/doi.org\/10.1145\/3768973","relation":{},"ISSN":["2834-5509"],"issn-type":[{"type":"electronic","value":"2834-5509"}],"subject":[],"published":{"date-parts":[[2025,11,24]]},"assertion":[{"value":"2025-11-25","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}