{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T17:23:15Z","timestamp":1763054595592,"version":"3.45.0"},"publisher-location":"New York, NY, USA","reference-count":43,"publisher":"ACM","funder":[{"name":"Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)","award":["RS-2025-02214652, RS-2025-02214654, RS-2023-00221040"],"award-info":[{"award-number":["RS-2025-02214652, RS-2025-02214654, RS-2023-00221040"]}]},{"name":"Ministry of Trade, Industry and Energy (MOTIE) and Korea Institute for Advancement of Technology (KIAT)","award":["P0028225"],"award-info":[{"award-number":["P0028225"]}]},{"name":"Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea Government (MOTIE)","award":["P0027923"],"award-info":[{"award-number":["P0027923"]}]},{"name":"Technology development Program of MSS","award":["RS-2023-00303967"],"award-info":[{"award-number":["RS-2023-00303967"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2025,10,13]]},"DOI":"10.1145\/3764862.3768173","type":"proceedings-article","created":{"date-parts":[[2025,10,1]],"date-time":"2025-10-01T13:54:03Z","timestamp":1759326843000},"page":"10-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0006-1117-7639","authenticated-orcid":false,"given":"Hyein","family":"Woo","sequence":"first","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0313-1319","authenticated-orcid":false,"given":"Miryeong","family":"Kwon","sequence":"additional","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6637-2411","authenticated-orcid":false,"given":"Jiseon","family":"Kim","sequence":"additional","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-8264-280X","authenticated-orcid":false,"given":"Eunjee","family":"Na","sequence":"additional","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3562-9410","authenticated-orcid":false,"given":"Hanjin","family":"Choi","sequence":"additional","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-0410-4159","authenticated-orcid":false,"given":"Seonghyeon","family":"Jang","sequence":"additional","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9832-5801","authenticated-orcid":false,"given":"Myoungsoo","family":"Jung","sequence":"additional","affiliation":[{"name":"Panmnesia, Inc."}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,13]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Attention is all you need. Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems (2017)."},{"key":"e_1_3_2_1_2_1","unstructured":"Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems (2020)."},{"key":"e_1_3_2_1_3_1","unstructured":"Jack W Rae Sebastian Borgeaud Trevor Cai Katie Millican Jordan Hoffmann Francis Song John Aslanides Sarah Henderson Roman Ring Susannah Young et al. 2021. Scaling language models: Methods analysis & insights from training gopher. arXiv preprint arXiv:2112.11446 (2021)."},{"key":"e_1_3_2_1_4_1","unstructured":"Aaron Grattafiori Abhimanyu Dubey Abhinav Jauhri Abhinav Pandey Abhishek Kadian Ahmad Al-Dahle Aiesha Letman Akhil Mathur Alan Schelten Alex Vaughan et al. 2024. The Llama 3 Herd of Models. arXiv preprint arXiv:2407.21783 (2024)."},{"key":"e_1_3_2_1_5_1","volume-title":"Charles Sutton, Sebastian Gehrmann, et al.","author":"Chowdhery Aakanksha","year":"2023","unstructured":"Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2023. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research (2023)."},{"key":"e_1_3_2_1_6_1","volume-title":"Megatron-lm: Training multibillion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053","author":"Shoeybi Mohammad","year":"2019","unstructured":"Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. 2019. Megatron-lm: Training multibillion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053 (2019)."},{"key":"e_1_3_2_1_7_1","volume-title":"Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, et al.","author":"Naumov Maxim","year":"2019","unstructured":"Maxim Naumov, Dheevatsa Mudigere, Hao-Jun Michael Shi, Jianyu Huang, Narayanan Sundaraman, Jongsoo Park, Xiaodong Wang, Udit Gupta, Carole-Jean Wu, Alisson G Azzolini, et al. 2019. Deep learning recommendation model for personalization and recommendation systems. arXiv preprint arXiv:1906.00091 (2019)."},{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS54860.2022.00037"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3470496.3533727"},{"key":"e_1_3_2_1_10_1","unstructured":"CXL Consortium. 2024. Compute Express Link Specification 3.2. https:\/\/computeexpresslink.org\/cxl-specification\/."},{"key":"e_1_3_2_1_11_1","unstructured":"UALink Consortium. 2025. UALink 200 Rev 1.0 Specification. https:\/\/ualinkconsortium.org\/specification\/."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of Machine Learning and Systems","author":"Jia Zhihao","year":"2019","unstructured":"Zhihao Jia, Matei Zaharia, and Alex Aiken. 2019. Beyond data and model parallelism for deep neural networks. Proceedings of Machine Learning and Systems (2019)."},{"key":"e_1_3_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458817.3476209"},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3452296.3472904"},{"key":"e_1_3_2_1_15_1","volume-title":"2017 USENIX Annual Technical Conference (USENIX ATC 17)","author":"Zhang Hao","year":"2017","unstructured":"Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P Xing. 2017. Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. In 2017 USENIX Annual Technical Conference (USENIX ATC 17)."},{"key":"e_1_3_2_1_16_1","volume-title":"Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in Neural Information Processing Systems","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, et al. 2019. Gpipe: Efficient training of giant neural networks using pipeline parallelism. Advances in Neural Information Processing Systems (2019)."},{"key":"e_1_3_2_1_17_1","volume-title":"Dreamshard: Generalizable embedding table placement for recommender systems. Advances in Neural Information Processing Systems","author":"Zha Daochen","year":"2022","unstructured":"Daochen Zha, Louis Feng, Qiaoyu Tan, Zirui Liu, Kwei-Herng Lai, Bhargav Bhushanam, Yuandong Tian, Arun Kejariwal, and Xia Hu. 2022. Dreamshard: Generalizable embedding table placement for recommender systems. Advances in Neural Information Processing Systems (2022)."},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3651890.3672274"},{"key":"e_1_3_2_1_20_1","volume-title":"18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24)","author":"Lee Wonbeom","year":"2024","unstructured":"Wonbeom Lee, Jungi Lee, Junghwan Seo, and Jaewoong Sim. 2024. InfiniGen: Efficient generative inference of large language models with dynamic KV cache management. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24)."},{"key":"e_1_3_2_1_21_1","volume-title":"Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. Advances in Neural Information Processing Systems","author":"Liu Zichang","year":"2023","unstructured":"Zichang Liu, Aditya Desai, Fangshuo Liao, Weitao Wang, Victor Xie, Zhaozhuo Xu, Anastasios Kyrillidis, and Anshumali Shrivastava. 2023. Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. Advances in Neural Information Processing Systems (2023)."},{"key":"e_1_3_2_1_22_1","unstructured":"Patrick Lewis Ethan Perez Aleksandra Piktus Fabio Petroni Vladimir Karpukhin Naman Goyal Heinrich K\u00fcttler Mike Lewis Wen-tau Yih Tim Rockt\u00e4schel et al. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems (2020)."},{"key":"e_1_3_2_1_23_1","volume-title":"International Conference on Machine Learning (ICML'20)","author":"Guu Kelvin","year":"2020","unstructured":"Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. 2020. Retrieval augmented language model pre-training. In International Conference on Machine Learning (ICML'20)."},{"key":"e_1_3_2_1_24_1","volume-title":"International Conference on Machine Learning (ICML'22)","author":"Borgeaud Sebastian","year":"2022","unstructured":"Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George Bm Van Den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving language models by retrieving from trillions of tokens. In International Conference on Machine Learning (ICML'22)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613165"},{"key":"e_1_3_2_1_26_1","volume-title":"Proceedings of Machine Learning and Systems","author":"Adnan Muhammad","year":"2024","unstructured":"Muhammad Adnan, Akhil Arunkumar, Gaurav Jain, Prashant J Nair, Ilya Soloveychik, and Purushotham Kamath. 2024. Keyformer: Kv cache reduction through key tokens selection for efficient generative inference. Proceedings of Machine Learning and Systems (2024)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3661821"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614256"},{"key":"e_1_3_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533737.3535090"},{"key":"e_1_3_2_1_30_1","unstructured":"NVIDIA. 2025. NVIDIA NVLink Fusion. https:\/\/www.nvidia.com\/en-us\/data-center\/nvlink-fusion\/."},{"key":"e_1_3_2_1_31_1","unstructured":"UCIe Consortium. 2024. UCIe 2.0 Specification. https:\/\/www.uciexpress.org\/specifications\/."},{"key":"e_1_3_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3695053.3731409"},{"key":"e_1_3_2_1_33_1","unstructured":"AWS. 2025. AWS Trainium. https:\/\/aws.amazon.com\/ai\/machine-learning\/trainium\/."},{"key":"e_1_3_2_1_34_1","unstructured":"AWS. 2025. AWS Inferentia. https:\/\/aws.amazon.com\/ai\/machine-learning\/inferentia\/."},{"key":"e_1_3_2_1_35_1","unstructured":"Rani Borkar Andrew Wall Prasanth Pulavarthi and Yuan Yu. 2024. Azure Maia for the era of AI: From silicon to software to systems. https:\/\/azure.microsoft.com\/en-us\/blog\/azure-maia-for-the-era-of-ai-from-silicon-to-software-to-systems\/."},{"key":"e_1_3_2_1_36_1","unstructured":"Intel Corporation. 2025. Intel Gaudi 3 AI Accelerator White Paper. https:\/\/www.intel.com\/content\/www\/us\/en\/content-details\/817486\/intel-gaudi-3-ai-accelerator-white-paper.html."},{"key":"e_1_3_2_1_37_1","volume-title":"Cxl and the return of scale-up database engines. arXiv preprint arXiv:2401.01150","author":"Lerner Alberto","year":"2024","unstructured":"Alberto Lerner and Gustavo Alonso. 2024. Cxl and the return of scale-up database engines. arXiv preprint arXiv:2401.01150 (2024)."},{"key":"e_1_3_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685809"},{"key":"e_1_3_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2022.3228561"},{"key":"e_1_3_2_1_40_1","volume-title":"Next-gen interconnection systems with compute express link: A comprehensive survey. arXiv e-prints","author":"Chen Chen","year":"2024","unstructured":"Chen Chen, Xinkui Zhao, Guanjie Cheng, Yuesheng Xu, Shuiguang Deng, and Jianwei Yin. 2024. Next-gen interconnection systems with compute express link: A comprehensive survey. arXiv e-prints (2024), arXiv-2412."},{"key":"e_1_3_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581784.3607102"},{"key":"e_1_3_2_1_42_1","unstructured":"NVIDIA. [n.d.]. NVIDIA GB200 NVL72. https:\/\/www.nvidia.com\/en-us\/data-center\/gb200-nvl72\/."},{"key":"e_1_3_2_1_43_1","volume-title":"2021 USENIX Annual Technical Conference (USENIX ATC 21)","author":"Ren Jie","year":"2021","unstructured":"Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. 2021. Zero-offload: Democratizing billion-scale model training. In 2021 USENIX Annual Technical Conference (USENIX ATC 21)."}],"event":{"name":"SOSP '25: ACM SIGOPS 31st Symposium on Operating Systems Principles","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"],"location":"Seoul Republic of Korea","acronym":"SOSP '25"},"container-title":["Proceedings of the 3rd Workshop on Disruptive Memory Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3764862.3768173","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T17:19:19Z","timestamp":1763054359000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3764862.3768173"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,13]]},"references-count":43,"alternative-id":["10.1145\/3764862.3768173","10.1145\/3764862"],"URL":"https:\/\/doi.org\/10.1145\/3764862.3768173","relation":{},"subject":[],"published":{"date-parts":[[2025,10,13]]},"assertion":[{"value":"2025-10-13","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}