{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,16]],"date-time":"2026-07-16T14:10:59Z","timestamp":1784211059834,"version":"3.55.0"},"reference-count":106,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T00:00:00Z","timestamp":1763942400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100009592","name":"Beijing Municipal Science & Technology Commission","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100009592","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005090","name":"project of the Research and Development of the Intra-node Scale-up Interconnect Protocol in SuperPod","doi-asserted-by":"publisher","award":["RZ241100004224022"],"award-info":[{"award-number":["RZ241100004224022"]}],"id":[{"id":"10.13039\/501100005090","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>Nowadays, driven by the exponential growth of parameters and training data of AI applications and Large Language Models, a single GPU is no longer sufficient in terms of computing power and storage capacity. Building high-performance multi-GPU systems or a GPU cluster via vertical scaling (scale-up) has thus become an effective approach to break the bottleneck and has further emerged as a key research focus. Given that traditional inter-GPU communication technologies fail to meet the requirement of GPU interconnection in vertical scaling, a variety of high-performance inter-GPU communication protocols tailored for the scale-up domain have been proposed recently. Notably, due to the emerging nature of these demands and technologies, academic research in this field remains scarce, with limited deep participation from the academic community. 
Inspired by this trend, this article identifies the challenges and requirements of a scale-up network, analyzes the bottlenecks of traditional technologies like PCIe in a scale-up network, and surveys the emerging scale-up targeted technologies, including NVLink, OISA, UALink, SUE, and other X-Links. Then, an in-depth comparison and discussion is conducted, and we express our insights in protocol design and related technologies. We also highlight that existing emerging protocols and technologies still face limitations, with certain technical mechanisms requiring further exploration. Finally, this article presents future research directions and opportunities. As the first review article fully focusing on intra-node GPU interconnection in a scale-up network, this article aims to provide valuable insights and guidance for future research in this emerging field, and we hope to establish a foundation that will inspire and direct subsequent studies.<\/jats:p>","DOI":"10.3390\/fi17120537","type":"journal-article","created":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T13:09:25Z","timestamp":1763989765000},"page":"537","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Survey of Intra-Node GPU Interconnection in Scale-Up Network: Challenges, Status, Insights, and Future Directions"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8621-2315","authenticated-orcid":false,"given":"Xiaoyong","family":"Song","sequence":"first","affiliation":[{"name":"China Mobile Research Institute, No.32 Xuanwumen West Street, Xicheng District, Beijing 100053, China"},{"name":"China Mobile, No. 29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Danyuan","family":"Zhou","sequence":"additional","affiliation":[{"name":"China Mobile, No. 
29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Kai","family":"Li","sequence":"additional","affiliation":[{"name":"China Mobile Research Institute, No.32 Xuanwumen West Street, Xicheng District, Beijing 100053, China"},{"name":"China Mobile, No. 29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiayuan","family":"Chen","sequence":"additional","affiliation":[{"name":"China Mobile Research Institute, No.32 Xuanwumen West Street, Xicheng District, Beijing 100053, China"},{"name":"China Mobile, No. 29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hao","family":"Zhang","sequence":"additional","affiliation":[{"name":"China Mobile Research Institute, No.32 Xuanwumen West Street, Xicheng District, Beijing 100053, China"},{"name":"China Mobile, No. 29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaoguang","family":"Zhang","sequence":"additional","affiliation":[{"name":"China Mobile Research Institute, No.32 Xuanwumen West Street, Xicheng District, Beijing 100053, China"},{"name":"China Mobile, No. 29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xuxia","family":"Zhong","sequence":"additional","affiliation":[{"name":"China Mobile Research Institute, No.32 Xuanwumen West Street, Xicheng District, Beijing 100053, China"},{"name":"China Mobile, No. 29, Finance Street, Xicheng District, Beijing 100033, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,24]]},"reference":[{"key":"ref_1","unstructured":"Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., and Bhosale, S. 
(2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv."},{"key":"ref_2","unstructured":"Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., and Vaughan, A. (2024). The Llama 3 Herd of Models. arXiv."},{"key":"ref_3","unstructured":"Duan, J., Zhang, S., Wang, Z., Jiang, L., Qu, W., Hu, Q., Wang, G., Weng, Q., Yan, H., and Zhang, X. (2024). Efficient Training of Large Language Models on Distributed Infrastructures: A Survey. arXiv."},{"key":"ref_4","unstructured":"Liu, A., Feng, B., Xue, B., Wang, B., Wu, B., Lu, C., Zhao, C., Deng, C., Zhang, C., and Ruan, C. (2024). Deepseek-v3 technical report. arXiv."},{"key":"ref_5","unstructured":"Team, K., Bai, Y., Bao, Y., Chen, G., Chen, J., Chen, N., Chen, R., Chen, Y., Chen, Y., and Chen, Y. (2025). Kimi K2: Open Agentic Intelligence. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Li, A., Song, S.L., Chen, J., Liu, X., Tallent, N., and Barker, K. (October, January 30). Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite. Proceedings of the 2018 IEEE International Symposium on Workload Characterization (IISWC), Raleigh, NC, USA.","DOI":"10.1109\/IISWC.2018.8573483"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"94","DOI":"10.1109\/TPDS.2019.2928289","article-title":"Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect","volume":"31","author":"Li","year":"2020","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Huang, X., and Wang, J. (2025). Inter-Data Center RDMA: Challenges, Status, and Future Directions. Future Internet, 17.","DOI":"10.3390\/fi17060242"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Xu, L., Anthony, Q., Zhou, Q., Alnaasan, N., Gulhane, R., Shafi, A., Subramoni, H., and Panda, D.K.D. (2024, January 6\u20139). 
Accelerating Large Language Model Training with Hybrid GPU-based Compression. Proceedings of the 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Philadelphia, PA, USA.","DOI":"10.1109\/CCGrid59990.2024.00031"},{"key":"ref_10","unstructured":"Liao, X., Sun, Y., Tian, H., Wan, X., Jin, Y., Wang, Z., Ren, Z., Huang, X., Li, W., and Tse, K.F. (2025). mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training. arXiv."},{"key":"ref_11","unstructured":"Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., and Chen, Z. (2020). GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. arXiv."},{"key":"ref_12","unstructured":"Zhang, S., Zheng, N., Lin, H., Jiang, Z., Bao, W., Jiang, C., Hou, Q., Cui, W., Zheng, S., and Chang, L.W. (2025). Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts. arXiv."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1109\/MCAS.2024.3373556","article-title":"PCI-Express: Evolution of a Ubiquitous Load-Store Interconnect Over Two Decades and the Path Forward for the Next Two Decades","volume":"24","author":"Sharma","year":"2024","journal-title":"IEEE Circuits Syst. Mag."},{"key":"ref_14","first-page":"23","article-title":"Research on the Development Status of High Speed Interconnection Technologies and Topologies of Multi-GPU Systems","volume":"31","author":"Chen","year":"2024","journal-title":"Aero Weapon."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Gouk, D., Kang, S., Lee, S., Kim, J., Nam, K., Ryu, E., Lee, S., Kim, D., Jang, J., and Bae, H. (2025). CXL-GPU: Pushing GPU Memory Boundaries with the Integration of CXL Technologies. IEEE Micro, 1\u20138.","DOI":"10.1109\/MM.2025.3582433"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Sharma, D.D. (2022, January 17\u201319). 
Compute Express Link\u00ae: An open industry-standard interconnect enabling heterogeneous data-centric computing. Proceedings of the 2022 IEEE Symposium on High-Performance Interconnects (HOTI), Virtual.","DOI":"10.1109\/HOTI55740.2022.00017"},{"key":"ref_17","unstructured":"NVIDIA (2025, October 06). NVIDIA NVLink. Available online: https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink\/."},{"key":"ref_18","unstructured":"AMD (2025, October 06). Infinity Fabric (IF)\u2014AMD. Available online: https:\/\/en.wikichip.org\/wiki\/amd\/infinity_fabric."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Schieffer, G., Shi, R., Markidis, S., Herten, A., Faj, J., and Peng, I. (2024, January 17\u201322). Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric. Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA.","DOI":"10.1109\/SCW63240.2024.00079"},{"key":"ref_20","unstructured":"ChinaMobile (2025, October 06). OISA. Available online: https:\/\/www.oisa.org.cn\/."},{"key":"ref_21","unstructured":"(2025, August 15). UALink Consortium\u2122. Available online: https:\/\/ualinkconsortium.org\/."},{"key":"ref_22","unstructured":"Broadcom (2025, August 15). Scale-Up Ethernet Framework Specification. Available online: https:\/\/docs.broadcom.com\/doc\/scale-up-ethernet-framework."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"4395","DOI":"10.1109\/TPDS.2022.3188656","article-title":"A Survey of Storage Systems in the RDMA Era","volume":"33","author":"Ma","year":"2022","journal-title":"IEEE Trans. Parallel Distrib. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1109\/JSYST.2019.2936519","article-title":"Traffic Control for RDMA-Enabled Data Center Networks: A Survey","volume":"14","author":"Guo","year":"2020","journal-title":"IEEE Syst. 
J."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"An, W., Bi, X., Chen, G., Chen, S., Deng, C., Ding, H., Dong, K., Du, Q., Gao, W., and Guan, K. (2024). Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning. arXiv.","DOI":"10.1109\/SC41406.2024.00089"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Wang, W., Ghobadi, M., Shakeri, K., Zhang, Y., and Hasani, N. (2024, January 21\u201323). Rail-only: A Low-Cost High-Performance Network for Training LLMs with Trillion Parameters. Proceedings of the 2024 IEEE Symposium on High-Performance Interconnects (HOTI), Albuquerque, NM, USA.","DOI":"10.1109\/HOTI63208.2024.00013"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Qian, K., Xi, Y., Cao, J., Gao, J., Xu, Y., Guan, Y., Fu, B., Shi, X., Zhu, F., and Miao, R. (2024, January 4\u20138). Alibaba HPN: A Data Center Network for Large Language Model Training. Proceedings of the ACM SIGCOMM 2024 Conference, Sydney, Australia.","DOI":"10.1145\/3651890.3672265"},{"key":"ref_28","first-page":"1","article-title":"An Introduction to the Compute Express Link (CXL) Interconnect","volume":"56","author":"Blankenship","year":"2024","journal-title":"ACM Comput. Surv."},{"key":"ref_29","unstructured":"Chen, C., Zhao, X., Cheng, G., Xu, Y., Deng, S., and Yin, J. (2025). Next-Gen Computing Systems with Compute Express Link: A Comprehensive Survey. arXiv."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"380","DOI":"10.1109\/MNET.2024.3397781","article-title":"RDMA Transports in Datacenter Networks: Survey","volume":"38","author":"Hu","year":"2024","journal-title":"IEEE Netw."},{"key":"ref_31","first-page":"IJSAT25023103","article-title":"Ultra Ethernet and UALink: Next-Generation Interconnects for AI Infrastructure","volume":"16","author":"Arsid","year":"2025","journal-title":"IJSAT Int. J. Sci. 
Technol."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/MM.2025.3592688","article-title":"UB-Mesh: A Hierarchically Localized nD-FullMesh Datacenter Network Architecture","volume":"45","author":"Liao","year":"2025","journal-title":"IEEE Micro"},{"key":"ref_33","unstructured":"NVIDIA (2025, November 12). NVIDIA DGX SuperPOD. Available online: https:\/\/www.nvidia.com\/en-us\/data-center\/dgx-superpod\/."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Ahn, J., Choi, S., Shin, T., Lee, J., Yoon, J., Kim, K., Son, K., Suh, H., Kim, T., and Park, H. (2024, January 6\u20139). Design and Analysis of Ultra High Bandwidth (UHB) Interconnection-based GPU-Ring for the AI Superchip Module. Proceedings of the 2024 IEEE 33rd Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), Toronto, ON, Canada.","DOI":"10.1109\/EPEPS61853.2024.10754190"},{"key":"ref_35","unstructured":"Arfeen, D., Mudigere, D., More, A., Gopireddy, B., Inci, A., and Ganger, G.R. (2025). Nonuniform-Tensor-Parallelism: Mitigating GPU failure impact for Scaled-up LLM Training. arXiv."},{"key":"ref_36","unstructured":"AFL (2025, November 12). AI Data Centers: Scaling UP and Scaling Out. Available online: https:\/\/www.aflhyperscale.com\/wp-content\/uploads\/2024\/12\/AI-Data-Centers-Scaling-Up-and-Scaling-Out-White-Paper.pdf."},{"key":"ref_37","unstructured":"Kalyanasundharam, N. (2025, August 15). Introducing UALink 200G 1.0 Specification. Available online: https:\/\/ualinkconsortium.org\/wp-content\/uploads\/2025\/04\/UALink-1.0-White_Paper_FINAL.pdf."},{"key":"ref_38","unstructured":"Tarraga-Moreno, J., Escudero-Sahuquillo, J., Garcia, P.J., and Quiles, F.J. (2025). Understanding intra-node communication in HPC systems and Datacenters. arXiv."},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Qi, H., Dai, L., Chen, W., Jia, Z., and Lu, X. (2023, January 23\u201325). 
Performance Characterization of Large Language Models on High-Speed Interconnects. Proceedings of the 2023 IEEE Symposium on High-Performance Interconnects (HOTI), Virtual.","DOI":"10.1109\/HOTI59126.2023.00022"},{"key":"ref_40","unstructured":"Si, M., Balaji, P., Chen, Y., Chu, C.H., Gangidi, A., Hasan, S., Iyengar, S., Johnson, D., Liu, B., and Ren, R. (2025). Collective Communication for 100k+ GPUs. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhao, C., Deng, C., Ruan, C., Dai, D., Gao, H., Li, J., Zhang, L., Huang, P., Zhou, S., and Ma, S. (2025, January 21\u201325). Insights into deepseek-v3: Scaling challenges and reflections on hardware for ai architectures. Proceedings of the 52nd Annual International Symposium on Computer Architecture (ISCA), Tokyo, Japan.","DOI":"10.1145\/3695053.3731412"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Bittner, R., and Ruf, E. (2012, January 10\u201313). Direct GPU\/FPGA Communication via PCI Express. Proceedings of the 2012 41st International Conference on Parallel Processing Workshops (ICPPW), Pittsburgh, PA, USA.","DOI":"10.1109\/ICPPW.2012.20"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1109\/MM.2017.37","article-title":"Ultra-Performance Pascal GPU and NVLink Interconnect","volume":"37","author":"Foley","year":"2017","journal-title":"IEEE Micro"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Muthukrishnan, H., Lustig, D., Villa, O., Wenisch, T., and Nellans, D. (March, January 25). Finepack: Transparently improving the efficiency of fine-grained transfers in multi-gpu systems. Proceedings of the 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Montreal, QC, Canada.","DOI":"10.1109\/HPCA56546.2023.10070949"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Chu, C.H., Hashmi, J.M., Khorassani, K.S., Subramoni, H., and Panda, D.K. (2019, January 17\u201320). 
High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems. Proceedings of the 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), Hyderabad, India.","DOI":"10.1109\/HiPC.2019.00041"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"De Sensi, D., Pichetti, L., Vella, F., De Matteis, T., Ren, Z., Fusco, L., Turisini, M., Cesarini, D., Lust, K., and Trivedi, A. (2024, January 17\u201322). Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Atlanta, GA, USA.","DOI":"10.1109\/SC41406.2024.00039"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Shou, C., Liu, G., Nie, H., Meng, H., Zhou, Y., Jiang, Y., Lv, W., Xu, Y., Lu, Y., and Chen, Z. (2025, January 8\u201311). InfiniteHBD: Building Datacenter-Scale High-Bandwidth Domain for LLM with Optical Circuit Switching Transceivers. Proceedings of the ACM SIGCOMM 2025 Conference, Coimbra, Portugal.","DOI":"10.1145\/3718958.3750468"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Saber, M.G., and Jiang, Z. (2025). Physical Layer Standardization for AI Data Centers: Challenges, Progress and Perspectives. IEEE Netw.","DOI":"10.1109\/MNET.2025.3557812"},{"key":"ref_49","unstructured":"Zuo, P., Lin, H., Deng, J., Zou, N., Yang, X., Diao, Y., Gao, W., Xu, K., Chen, Z., and Lu, S. (2025). Serving Large Language Models on Huawei CloudMatrix384. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1109\/MM.2022.3228561","article-title":"Compute Express Link (CXL): Enabling Heterogeneous Data-Centric Computing With Heterogeneous Memory Hierarchy","volume":"43","author":"Sharma","year":"2023","journal-title":"IEEE Micro"},{"key":"ref_51","doi-asserted-by":"crossref","unstructured":"Hong, M., and Xu, L. (2022, January 21\u201323). 
BR100 GPGPU: Accelerating Datacenter Scale AI Computing. Proceedings of the 2022 IEEE Hot Chips 34 Symposium (HCS), Cupertino, CA, USA.","DOI":"10.1109\/HCS55958.2022.9895604"},{"key":"ref_52","unstructured":"(2025, October 06). PCI-SIG. Available online: https:\/\/pcisig.com\/."},{"key":"ref_53","unstructured":"NVIDIA (2025, October 06). NVIDIA H100 Tensor Core GPU Architecture Whitepaper. Available online: https:\/\/www.nvidia.cn\/lp\/data-center\/resources\/download-hopper-arch-whitepaper\/."},{"key":"ref_54","unstructured":"AMD (2025, October 06). Data Sheet\u2014AMD Instinct\u2122 MI300X Accelerator. Available online: https:\/\/www.amd.com\/content\/dam\/amd\/en\/documents\/instinct-tech-docs\/data-sheets\/amd-instinct-mi300x-data-sheet.pdf."},{"key":"ref_55","unstructured":"CXL (2025, August 15). CXL-3.1-Specification. Available online: https:\/\/computeexpresslink.org\/wp-content\/uploads\/2024\/02\/CXL-3.1-Specification.pdf."},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Wang, X., Liu, J., Wu, J., Yang, S., Ren, J., Shankar, B., and Li, D. (2025, January 3\u20137). Performance Characterization of CXL Memory and Its Use Cases. Proceedings of the 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Milano, Italy.","DOI":"10.1109\/IPDPS64566.2025.00097"},{"key":"ref_57","unstructured":"Wang, Z., Luo, L., Ning, Q., Zeng, C., Li, W., Wan, X., Xie, P., Feng, T., Cheng, K., and Geng, X. (2023, January 17\u201329). SRNIC: A Scalable Architecture for RDMA NICs. Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), Boston, MA, USA."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Sun, Z., Guo, Z., Ma, J., and Pan, Y. (2024). A High-Performance FPGA-Based RoCE v2 RDMA Packet Parser and Generator. Electronics, 13.","DOI":"10.3390\/electronics13204107"},{"key":"ref_59","unstructured":"Star Oceans Wiki (2025, August 07). NVLink. 
Available online: http:\/\/www.staroceans.org\/wiki\/A\/NVLink."},{"key":"ref_60","unstructured":"NVIDIA (2025, October 06). NVIDIA GB200 NVL72. Available online: https:\/\/www.nvidia.com\/en-us\/data-center\/gb200-nvl72\/."},{"key":"ref_61","unstructured":"NVIDIA (2025, October 06). NVIDIA GB300 NVL72. Available online: https:\/\/www.nvidia.com\/en-us\/data-center\/gb300-nvl72\/."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Danskin, J., and Foley, D. (2016, January 21\u201323). Pascal GPU with NVLink. Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), Cupertino, CA, USA.","DOI":"10.1109\/HOTCHIPS.2016.7936202"},{"key":"ref_63","unstructured":"NVIDIA (2025, October 06). NVIDIA Tesla P100. Available online: https:\/\/images.nvidia.com\/content\/pdf\/tesla\/whitepaper\/pascal-architecture-whitepaper.pdf."},{"key":"ref_64","unstructured":"NVIDIA (2025, October 06). NVIDIA DGX-1 with Tesla V100 System Architecture. Available online: https:\/\/images.nvidia.com\/content\/pdf\/dgx1-v100-system-architecture-whitepaper.pdf."},{"key":"ref_65","unstructured":"Ishii, A., and Foley, D. (2018, January 19\u201321). NVSwitch and DGX-2\u2014NVIDIA\u2019s NVLink-Switching Chip and Scale-Up GPU-Compute Server. Proceedings of the 2018 IEEE Hot Chips 30 Symposium (HCS), Cupertino, CA, USA."},{"key":"ref_66","unstructured":"NVIDIA (2025, October 06). Unified Memory for CUDA Beginners. Available online: https:\/\/developer.nvidia.com\/blog\/unified-memory-cuda-beginners\/."},{"key":"ref_67","doi-asserted-by":"crossref","unstructured":"Graham, R.L., Bureddy, D., Lui, P., Rosenstock, H., Shainer, G., Bloch, G., Goldenerg, D., Dubman, M., Kotchubievsky, S., and Koushnir, V. (2016, January 13\u201318). Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction. 
Proceedings of the 2016 First International Workshop on Communication Optimizations in HPC (COMHPC), Salt Lake City, UT, USA.","DOI":"10.1109\/COMHPC.2016.006"},{"key":"ref_68","doi-asserted-by":"crossref","unstructured":"Ramesh, B., Kuncham, G.K.R., Suresh, K.K., Vaidya, R., Alnaasan, N., Abduljabbar, M., Shafi, A., Subramoni, H., and Panda, D.K.D. (2023, January 23\u201325). Designing In-network Computing Aware Reduction Collectives in MPI. Proceedings of the 2023 IEEE Symposium on High-Performance Interconnects (HOTI), Virtual.","DOI":"10.1109\/HOTI59126.2023.00018"},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Graham, R.L., Levi, L., Burredy, D., Bloch, G., Shainer, G., Cho, D., Elias, G., Klein, D., Ladd, J., and Maor, O. (2020, January 22\u201325). Scalable hierarchical aggregation and reduction protocol (sharp) tm streaming-aggregation hardware design and evaluation. Proceedings of the International Conference on High Performance Computing, Frankfurt am Main, Germany.","DOI":"10.1007\/978-3-030-50743-5_3"},{"key":"ref_70","unstructured":"NVIDIA Corporation (2025, August 06). NVIDIA NVLink Fusion. Available online: https:\/\/www.nvidia.cn\/data-center\/nvlink-fusion\/."},{"key":"ref_71","unstructured":"Onufryk, P. (2025, August 15). UALink 200G 1.0 Specification Overview. Available online: https:\/\/staging.ualinkconsortium.org\/wp-content\/uploads\/2025\/04\/UALink-1.0-Specification-Webinar_FINAL.pdf."},{"key":"ref_72","unstructured":"Brown, D., and Lusted, K. (2025, August 15). UALink 200G 1.0 Specification Overview: Data Link Layer (DL) and Physical Layer (PL). Available online: https:\/\/www.ieee802.org\/3\/ad_hoc\/E4AI\/public\/25_0624\/lusted_e4ai_01_250624.pdf."},{"key":"ref_73","doi-asserted-by":"crossref","unstructured":"Norrie, T., Patil, N., Yoon, D.H., Kurian, G., Li, S., Laudon, J., Young, C., Jouppi, N.P., and Patterson, D. (2020, January 16\u201318). Google\u2019s Training Chips Revealed: TPUv2 and TPUv3. 
Proceedings of the 2020 IEEE Hot Chips 32 Symposium (HCS), Palo Alto, CA, USA.","DOI":"10.1109\/HCS49909.2020.9220735"},{"key":"ref_74","unstructured":"NVIDIA (2025, November 10). NVIDIA DGX A100 User Guide. Available online: https:\/\/docs.nvidia.com\/dgx\/dgxa100-user-guide\/introduction-to-dgxa100.html."},{"key":"ref_75","unstructured":"NVIDIA (2025, November 10). NVIDIA DGX-2. Available online: https:\/\/www.nvidia.com\/en-in\/data-center\/dgx-2\/."},{"key":"ref_76","unstructured":"Intel (2025, November 10). Intel Gaudi 3 AI Accelerator. Available online: https:\/\/www.intel.com\/content\/www\/us\/en\/content-details\/817486\/intel-gaudi-3-ai-accelerator-white-paper.html."},{"key":"ref_77","unstructured":"Ultra Ethernet Consortium \u2122 (2025, August 15). Ultra Ethernet \u2122 Specification v1.0. Available online: https:\/\/ultraethernet.org\/wp-content\/uploads\/sites\/20\/2025\/06\/UE-Specification-6.11.25.pdf."},{"key":"ref_78","unstructured":"BROADCOM (2025, October 06). Tomahawk Ultra\/BCM78920 Series 51.2Tb\/s StrataXGS Tomahawk Ultra Low-Latency Ethernet Switch Series. Available online: https:\/\/www.broadcom.com\/products\/ethernet-connectivity\/switching\/strataxgs\/bcm78920-series."},{"key":"ref_79","unstructured":"MOORE THREADS (2025, August 15). MTLink-S4000. Available online: https:\/\/www.mthreads.com\/product\/S4000."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"156013","DOI":"10.1109\/ACCESS.2021.3129595","article-title":"A Flow Control Scheme Based on Per Hop and Per Flow in Commodity Switches for Lossless Networks","volume":"9","author":"Wang","year":"2021","journal-title":"IEEE Access"},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"4134","DOI":"10.1109\/JSYST.2019.2903819","article-title":"Fair Congestion Control Protocol for Data Center Bridging","volume":"13","author":"Bahnasy","year":"2019","journal-title":"IEEE Syst. 
J."},{"key":"ref_82","doi-asserted-by":"crossref","unstructured":"Wu, X.C., and Eugene Ng, T.S. (2022, January 2\u20135). Detecting and Resolving PFC Deadlocks with ITSY Entirely in the Data Plane. Proceedings of the IEEE INFOCOM 2022\u2014IEEE Conference on Computer Communications, London, UK.","DOI":"10.1109\/INFOCOM48880.2022.9796798"},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"40","DOI":"10.1109\/65.372658","article-title":"Credit-based flow control for ATM networks","volume":"9","author":"Kung","year":"1995","journal-title":"IEEE Netw."},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"319","DOI":"10.1145\/161541.161736","article-title":"High-speed switch scheduling for local-area networks","volume":"11","author":"Anderson","year":"1993","journal-title":"ACM Trans. Comput. Syst."},{"key":"ref_85","doi-asserted-by":"crossref","unstructured":"Li, L., Chen, Y., Lu, H., He, L., Gao, L., and Wang, N. (2024, January 13\u201316). Credit-R: Enhancing Credit-Based Congestion Control in Cross-Data Center Networks. Proceedings of the 2024 10th International Conference on Computer and Communications (ICCC), Chengdu, China.","DOI":"10.1109\/ICCC62609.2024.10941866"},{"key":"ref_86","unstructured":"Malhotra, A., and Chitre, K. (2016, January 16\u201318). Performance analysis of data link layer protocols with a special emphasis on improving the performance of Stop-and-Wait-ARQ. Proceedings of the 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India."},{"key":"ref_87","unstructured":"Hayashida, Y., Sugimachi, N., Komatsu, M., and Yoshida, Y. (1989, January 22\u201324). Go-back-N system with limited retransmissions. Proceedings of the Eighth Annual International Phoenix Conference on Computers and Communications, Scottsdale, AZ, USA."},{"key":"ref_88","unstructured":"Lee, T.H. (1991, January 7\u201311). The throughput efficiency of go-back-N ARQ scheme for burst-error channels. 
Proceedings of the IEEE INFOCOM \u201991. The Conference on Computer Communications. Tenth Annual Joint Conference of the IEEE Computer and Communications Societies Proceedings, Bal Harbour, FL, USA."},{"key":"ref_89","unstructured":"Rati Preethi, S., Kumar, P., Anil, S., and Chandavarkar, B.R. (2023, January 6\u20138). Predictive Selective Repeat\u2014An Optimized Selective Repeat for Noisy Channels. Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India."},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1109\/LCA.2025.3594110","article-title":"RoSR: A Novel Selective Retransmission FPGA Architecture for RDMA NICs","volume":"24","author":"Zhang","year":"2025","journal-title":"IEEE Comput. Archit. Lett."},{"key":"ref_91","unstructured":"Muthukrishnan, H. (2022). Improving Multi-GPU Strong Scaling Through Optimization of Fine-Grained Transfers. [Ph.D. Thesis, University of Michigan]."},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1109\/LNET.2021.3067343","article-title":"FindINT: Detect and Locate the Lost in-Band Network Telemetry Packet","volume":"4","author":"Tan","year":"2022","journal-title":"IEEE Netw. Lett."},{"key":"ref_93","doi-asserted-by":"crossref","unstructured":"Wang, H., Liu, Y., Li, W., and Yang, Z. (2024). Multi-Agent Deep Reinforcement Learning-Based Fine-Grained Traffic Scheduling in Data Center Networks. Future Internet, 16.","DOI":"10.3390\/fi16040119"},{"key":"ref_94","doi-asserted-by":"crossref","unstructured":"Li, Y., Miao, R., Liu, H.H., Zhuang, Y., Feng, F., Tang, L., Cao, Z., Zhang, M., Kelly, F., and Alizadeh, M. (2019, January 19\u201324). HPCC: High precision congestion control. Proceedings of the ACM Special Interest Group on Data Communication, Beijing, China.","DOI":"10.1145\/3341302.3342085"},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Huang, H., Xue, G., Wang, Y., and Zhang, H. 
(2013, January 20\u201322). An adaptive active queue management algorithm. Proceedings of the 2013 3rd International Conference on Consumer Electronics, Communications and Networks, Xianning, China.","DOI":"10.1109\/CECNet.2013.6703275"},{"key":"ref_96","doi-asserted-by":"crossref","unstructured":"Guo, Y., Meng, Z., Wang, B., and Xu, M. (2024, January 9\u201313). Inferring in-Network Queue Management from End Hosts in Real-Time Communications. Proceedings of the ICC 2024\u2014IEEE International Conference on Communications, Denver, CO, USA.","DOI":"10.1109\/ICC51166.2024.10622436"},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1145\/2829988.2787510","article-title":"TIMELY: RTT-based Congestion Control for the Datacenter","volume":"45","author":"Mittal","year":"2015","journal-title":"SIGCOMM Comput. Commun. Rev."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Kumar, G., Dukkipati, N., Jang, K., Wassel, H.M.G., Wu, X., Montazeri, B., Wang, Y., Springborn, K., Alfeld, C., and Ryan, M. (2020, January 22\u201326). Swift: Delay is Simple and Effective for Congestion Control in the Datacenter. Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, Amsterdam, The Netherlands.","DOI":"10.1145\/3387514.3406591"},{"key":"ref_99","unstructured":"Tithi, J.J., Wu, H., Abuhatzera, A., and Petrini, F. (2025). Scaling Intelligence: Designing Data Centers for Next-Gen Language Models. arXiv."},{"key":"ref_100","unstructured":"Cui, S., Patke, A., Nguyen, H., Ranjan, A., Chen, Z., Cao, P., Bode, B., Bauer, G., Martino, C.D., and Jha, S. (2025). Characterizing GPU Resilience and Impact on AI\/HPC Systems. 
arXiv."},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"69","DOI":"10.36348\/sjet.2024.v09i02.006","article-title":"Chiplet Technology: Revolutionizing Semiconductor Design-A Review","volume":"9","author":"Gujar","year":"2024","journal-title":"Saudi J. Eng. Technol."},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Li, C., Jiang, F., Chen, S., Li, X., Liu, J., Zhang, W., and Xu, J. (2024, January 25\u201327). Towards Scalable GPU System with Silicon Photonic Chiplet. Proceedings of the 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), Valencia, Spain.","DOI":"10.23919\/DATE58400.2024.10546733"},{"key":"ref_103","doi-asserted-by":"crossref","unstructured":"Li, C., Jiang, F., Chen, S., Li, X., Liu, Y., Chen, L., Li, X., and Xu, J. (November, January 28). RONet: Scaling GPU System with Silicon Photonic Chiplet. Proceedings of the 2023 IEEE\/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA.","DOI":"10.1109\/ICCAD57390.2023.10323762"},{"key":"ref_104","doi-asserted-by":"crossref","unstructured":"Willner, A.E. (2020). Chapter 18\u2014Optical interconnection networks for high-performance systems. Optical Fiber Telecommunications VII, Academic Press.","DOI":"10.1016\/B978-0-12-816502-7.00032-4"},{"key":"ref_105","doi-asserted-by":"crossref","unstructured":"Yang, C., Hu, B., Chen, P., Liu, Y., Zhang, W., and Xu, J. (April, January 31). BEAM: A Multi-Channel Optical Interconnect for Multi-GPU Systems. Proceedings of the 2025 Design, Automation & Test in Europe Conference (DATE), Lyon, France.","DOI":"10.23919\/DATE64628.2025.10993197"},{"key":"ref_106","unstructured":"NS-3 (2025, August 15). NS-3 Network Simulator. 
Available online: https:\/\/www.nsnam.org\/."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/12\/537\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,26]],"date-time":"2025-11-26T05:31:07Z","timestamp":1764135067000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/17\/12\/537"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,24]]},"references-count":106,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["fi17120537"],"URL":"https:\/\/doi.org\/10.3390\/fi17120537","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,24]]}}}