{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T19:49:07Z","timestamp":1774986547287,"version":"3.50.1"},"reference-count":93,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"name":"UGC of Hong Kong","award":["8601116, 8601594, and 8601625"],"award-info":[{"award-number":["8601116, 8601594, and 8601625"]}]},{"name":"Hong Kong General Research Fund","award":["14208023"],"award-info":[{"award-number":["14208023"]}]},{"name":"Alibaba Innovative Research Program"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,6,17]]},"abstract":"<jats:p>Embedding-based recommendation models (ERMs) require large memory to host huge embedding tables and involve massive data traffic to read the embeddings. As a new interconnect, CXL suits ERMs since it can scale up single-machine memory with performant remote memory devices. However, directly running DRAM-based ERM serving systems on CXL yields poor performance because the bandwidth of CXL is notably lower than DRAM and can be easily saturated, making CXL memory the bottleneck. The non-uniform memory access (NUMA) architecture in modern CXL servers further decreased the system performance. In this paper, we design Carina for ERM serving on heterogeneous memory with CXL by considering such bandwidth asymmetry. In particular, Carina balances the memory access from different memory devices by storing hot embeddings with high access frequencies on DRAM and specifying the placement of embedding tables on the NUMA nodes. Moreover, Carina adopts bandwidth-aware task execution, which decomposes each batch of ERM requests into fine-grained tasks and schedules the tasks to control the real-time utilization of CXL bandwidth to avoid instantaneous saturation. We evaluate Carina under real CXL devices and find that it outperforms a CXL-oblivious baseline by an average of 5.38x and 4.04x in system throughput and request latency, respectively.<\/jats:p>","DOI":"10.1145\/3725274","type":"journal-article","created":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T21:23:29Z","timestamp":1750281809000},"page":"1-29","source":"Crossref","is-referenced-by-count":0,"title":["CARINA: An Efficient CXL-Oriented Embedding Serving System for Recommendation Models"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-4081-913X","authenticated-orcid":false,"given":"Peiqi","family":"Yin","sequence":"first","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8927-3325","authenticated-orcid":false,"given":"Qihui","family":"Zhou","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2122-915X","authenticated-orcid":false,"given":"Xiao","family":"Yan","sequence":"additional","affiliation":[{"name":"Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-2162-5675","authenticated-orcid":false,"given":"Chao","family":"Wang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2679-3945","authenticated-orcid":false,"given":"Eric","family":"Lo","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2768-1224","authenticated-orcid":false,"given":"Changji","family":"Li","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-6492-5221","authenticated-orcid":false,"given":"Lan","family":"Lu","sequence":"additional","affiliation":[{"name":"University of Pennsylvania, Philadelphia, PA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6230-0445","authenticated-orcid":false,"given":"Hua","family":"Fan","sequence":"additional","affiliation":[{"name":"Alibaba Cloud, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0002-2689-6020","authenticated-orcid":false,"given":"Wenchao","family":"Zhou","sequence":"additional","affiliation":[{"name":"Alibaba Cloud, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4029-757X","authenticated-orcid":false,"given":"Ming-Chang","family":"Yang","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6313-6288","authenticated-orcid":false,"given":"James","family":"Cheng","sequence":"additional","affiliation":[{"name":"The Chinese University of Hong Kong, Hong Kong SAR, China"}]}],"member":"320","published-online":{"date-parts":[[2025,6,18]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2003. Linux numactl command. https:\/\/linux.die.net\/man\/8\/numactl."},{"key":"e_1_2_1_2_1","unstructured":"2012. NUMA Balancing (AutoNUMA). https:\/\/mirrors.edge.kernel.org\/pub\/linux\/kernel\/people\/andrea\/autonuma\/ autonuma_bench-20120530.pdf."},{"key":"e_1_2_1_3_1","unstructured":"2020. Compute Express Link\u00ae 2.0 White Paper. https:\/\/computeexpresslink.org\/wp-content\/uploads\/2023\/12\/CXL2.0_ White_Paper_November-2020_FINAL.pdf."},{"key":"e_1_2_1_4_1","unstructured":"2022. CXL 3.0 Specification. https:\/\/www.computeexpresslink.org\/download-the-specification\/."},{"key":"e_1_2_1_5_1","unstructured":"2022. Intel Optane DC PMM. https:\/\/www.intel.com\/content\/www\/us\/en\/products\/docs\/memory-storage\/optanepersistent- memory\/overview.html."},{"key":"e_1_2_1_6_1","unstructured":"2023. Compute Express Link\u2122(CXL\u2122): Supporting Persistent Memory. https:\/\/computeexpresslink.org\/wp-content\/ uploads\/2023\/12\/CXL-2.0-Presentation-Persistent-Memory-20210615_FINAL.pdf."},{"key":"e_1_2_1_7_1","unstructured":"2023. Micron Launches Memory Expansion Module Portfolio to Accelerate CXL 2.0 Adoption. https:\/\/investors.micron. com\/news-releases\/news-release-details\/micron-launches-memory-expansion-module-portfolio-accelerate-cxl."},{"key":"e_1_2_1_8_1","unstructured":"2023. Xconn Technologies: CXL 2.0 Memory Pooling (Sharing) Using Xconn Switch. https:\/\/www.xconn-tech.com\/ product."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3485450.3485462"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3533737.3535090"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3077136"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626111.3628201"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3598581.3598585"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3241586"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3572848.3577528"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE55515.2023.00228"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/2988450.2988454"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2406.06955"},{"key":"e_1_2_1_19_1","unstructured":"CriteoLabs. 2014. Criteo display ad challenge. https:\/\/www.kaggle.com\/c\/criteo-display-ad-challenge."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of Machine Learning and Systems 2019","author":"Eisenman Assaf","year":"2019","unstructured":"Assaf Eisenman, Maxim Naumov, Darryl Gardner, Misha Smelyanskiy, Sergey Pupyrev, Kim M. Hazelwood, Asaf Cidon, and Sachin Katti. 2019. Bandana: Using Non-Volatile Memory for Storing Deep Learning Models. In Proceedings of Machine Learning and Systems 2019, MLSys 2019, Stanford, CA, USA, March 31 - April 2, 2019. mlsys.org. https: \/\/proceedings.mlsys.org\/book\/277.pdf"},{"key":"e_1_2_1_21_1","volume-title":"Lenssen","author":"Fey Matthias","year":"2019","unstructured":"Matthias Fey and Jan E. Lenssen. 2019. Fast Graph Representation Learning with PyTorch Geometric. In ICLRWorkshop on Representation Learning on Graphs and Manifolds."},{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"John Forrest and Robin Lougee-Heimer. 2005. CBC user guide. In Emerging theory methods and applications. INFORMS 257--277.","DOI":"10.1287\/educ.1053.0020"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3446662"},{"key":"e_1_2_1_24_1","unstructured":"Google. 2021. TensorFlow Recommenders. https:\/\/github.com\/tensorflow\/recommenders."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3577193.3593724"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.24963\/IJCAI.2017\/239"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00084"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA47549.2020.00047"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/J.ELERAP.2018.01.012"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3523227.3547387"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3579371.3589112"},{"key":"e_1_2_1_32_1","volume-title":"CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search. In 2023 USENIX Annual Technical Conference, USENIX ATC 2023","author":"Jang Junhyeok","year":"2023","unstructured":"Junhyeok Jang, Hanjin Choi, Hanyeoreum Bae, Seungjun Lee, Miryeong Kwon, and Myoungsoo Jung. 2023. CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search. In 2023 USENIX Annual Technical Conference, USENIX ATC 2023, Boston, MA, USA, July 10--12, 2023. USENIX Association, 585--600. https:\/\/www.usenix.org\/conference\/atc23\/presentation\/jang"},{"key":"e_1_2_1_33_1","volume-title":"Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search. ACM Transactions on Storage","author":"Jang Junhyeok","year":"2024","unstructured":"Junhyeok Jang, Hanjin Choi, Hanyeoreum Bae, Seungjun Lee, Miryeong Kwon, and Myoungsoo Jung. 2024. Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search. ACM Transactions on Storage (2024). https:\/\/dl.acm.org\/doi\/10.1145\/3639471"},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of Machine Learning and Systems 2021","author":"Jiang Wenqi","year":"2021","unstructured":"Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preu\u00dfer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, and Gustavo Alonso. 2021. MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions. In Proceedings of Machine Learning and Systems 2021, MLSys 2021, virtual, April 5--9, 2021. mlsys.org. https:\/\/proceedings.mlsys.org\/paper_files\/paper\/2021\/hash\/9e9a5486cb2f8e44d5b5fedd2a9e5fcd- Abstract.html"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467139"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2404.12457"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00059"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA45697.2020.00070"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00019"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2212.00939"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/146628.139676"},{"key":"e_1_2_1_42_1","volume-title":"FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference. CoRR abs\/2101.05615","author":"Khudia Daya Shanker","year":"2021","unstructured":"Daya Shanker Khudia, Jianyu Huang, Protonu Basu, Summer Deng, Haixin Liu, Jongsoo Park, and Mikhail Smelyanskiy. 2021. FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference. CoRR abs\/2101.05615 (2021). arXiv:2101.05615 https:\/\/arxiv.org\/abs\/2101.05615"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2024.3375352"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2023.3240774"},{"key":"e_1_2_1_45_1","unstructured":"Petr Kobalicek. 2010. AsmJit Library. asmjit.com."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/319566.319567"},{"key":"e_1_2_1_47_1","volume-title":"Gunawi","author":"Kurniawan Daniar Heri","year":"2023","unstructured":"Daniar Heri Kurniawan, Ruipu Wang, Kahfi S. Zulkifli, Fandi A. Wiranata, John Bent, Ymir Vigfusson, and Haryadi S. Gunawi. 2023. EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation Systems. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ASPLOS 2023, Vancouver, BC, Canada, March 25--29, 2023. ACM, 281--294. doi:10.1145\/ 3575693.3575718"},{"key":"e_1_2_1_48_1","volume-title":"Improving key-value cache performance with heterogeneous memory tiering: A case study of CXL-based memory expansion","author":"Lee KyungSoo","year":"2024","unstructured":"KyungSoo Lee, Sohyun Kim, Joohee Lee, Donguk Moon, Rakie Kim, Honggyu Kim, Hyeongtak Ji, Yunjeong Mun, and Youngpyo Joo. 2024. Improving key-value cache performance with heterogeneous memory tiering: A case study of CXL-based memory expansion. IEEE Micro (2024)."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446717"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019","author":"Lerer Adam","year":"2019","unstructured":"Adam Lerer, Ledell Wu, Jiajun Shen, Timoth\u00e9e Lacroix, Luca Wehrstedt, Abhijit Bose, and Alex Peysakhovich. 2019. Pytorch-BigGraph: A Large Scale Graph Embedding System. In Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, Stanford, CA, USA, March 31 - April 2, 2019. mlsys.org. https:\/\/proceedings.mlsys. org\/paper_files\/paper\/2019\/hash\/1eb34d662b67a14e3511d0dfd78669be-Abstract.html"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3575693.3578835"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447548.3467101"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3220023"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539070"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA59077.2024.00036"},{"key":"e_1_2_1_56_1","unstructured":"Zhuoran Liu Leqi Zou Xuan Zou Caihua Wang Biao Zhang Da Tang Bolin Zhu Yijie Zhu Peng Wu Ke Wang and Youlong Cheng. 2022. Monolith: Real Time Recommendation System with Collisionless Embedding Table. In Proceedings of the 5th Workshop on Online Recommender Systems and User Modeling co-located with the 16th ACM Conference on Recommender Systems ORSUM@RecSys 2022 Seattle WA USA September 23rd 2022 (CEUR Workshop Proceedings Vol. 3303). CEUR-WS.org. https:\/\/ceur-ws.org\/Vol-3303\/paper8.pdf"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS51385.2021.00033"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589310"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2018.2889473"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582063"},{"key":"e_1_2_1_61_1","unstructured":"Meta. 2022. Facebook DLRM datasets. https:\/\/github.com\/facebookresearch\/dlrm_datasets."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589777"},{"key":"e_1_2_1_63_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2021","author":"Mohoney Jason","year":"2021","unstructured":"Jason Mohoney, RogerWaleffe, Henry Xu, Theodoros Rekatsinas, and Shivaram Venkataraman. 2021. Marius: Learning Massive Graph Embeddings on a Single Machine. In 15th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2021, July 14--16, 2021. USENIX Association, 533--549. https:\/\/www.usenix.org\/conference\/ osdi21\/presentation\/mohoney"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA57654.2024"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.5555\/3495724.3497378"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3624062.3624173"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507777"},{"key":"e_1_2_1_68_1","volume-title":"Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce. CoRR abs\/1703.02344","author":"Shankar Devashish","year":"2017","unstructured":"Devashish Shankar, Sujay Narumanchi, H. A. Ananya, Pramod Kompalli, and Krishnendu Chaudhury. 2017. Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce. CoRR abs\/1703.02344 (2017). arXiv:1703.02344 http:\/\/arxiv.org\/abs\/1703.02344"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3489525.3511672"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2312.04789"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613169"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA53966.2022.00081"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613424.3614256"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2305.10863"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3627703.3650061"},{"key":"e_1_2_1_76_1","volume-title":"Highly-Performant Package for Graph Neural Networks. arXiv preprint arXiv:1909.01315","author":"Wang Minjie","year":"2019","unstructured":"Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. 2019. Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks. arXiv preprint arXiv:1909.01315 (2019)."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.23919\/DATE54114.2022.9774762"},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3609384"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3523227.3546765"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446763"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3492321.3519554"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.14778\/3579075.3579077"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC41405.2020.00025"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1145\/3582016.3582029"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599805"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/3650200.3656595"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539034"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3600006.3613135"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2311.09544"},{"key":"e_1_2_1_90_1","volume-title":"Proceedings of Machine Learning and Systems 2020","author":"Zhao Weijie","year":"2020","unstructured":"Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, and Ping Li. 2020. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In Proceedings of Machine Learning and Systems 2020, MLSys 2020, Austin, TX, USA, March 2--4, 2020. mlsys.org. https:\/\/proceedings.mlsys.org\/book\/315.pdf"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654986"},{"key":"e_1_2_1_92_1","doi-asserted-by":"publisher","DOI":"10.1145\/3677129"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2403.18702"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3725274","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:55:16Z","timestamp":1774983316000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3725274"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":93,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,6,17]]}},"alternative-id":["10.1145\/3725274"],"URL":"https:\/\/doi.org\/10.1145\/3725274","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,17]]}}}