{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,31]],"date-time":"2025-08-31T10:09:33Z","timestamp":1756634973975,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":48,"publisher":"ACM","license":[{"start":{"date-parts":[[2020,10,12]],"date-time":"2020-10-12T00:00:00Z","timestamp":1602460800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Hong Kong RGC","award":["HKU 17204619, 17208920"],"award-info":[{"award-number":["HKU 17204619, 17208920"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2020,10,12]]},"DOI":"10.1145\/3419111.3421307","type":"proceedings-article","created":{"date-parts":[[2020,10,13]],"date-time":"2020-10-13T04:40:25Z","timestamp":1602564025000},"page":"507-521","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["Elastic parameter server load distribution in deep learning clusters"],"prefix":"10.1145","author":[{"given":"Yangrui","family":"Chen","sequence":"first","affiliation":[{"name":"The University of Hong Kong"}]},{"given":"Yanghua","family":"Peng","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}]},{"given":"Yixin","family":"Bao","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}]},{"given":"Chuan","family":"Wu","sequence":"additional","affiliation":[{"name":"The University of Hong Kong"}]},{"given":"Yibo","family":"Zhu","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]},{"given":"Chuanxiong","family":"Guo","sequence":"additional","affiliation":[{"name":"ByteDance Inc."}]}],"member":"320","published-online":{"date-parts":[[2020,10,12]]},"reference":[{"key":"e_1_3_2_2_1_1","unstructured":"2012. ImageNet Dataset. 
http:\/\/www.image-net.org\/."},{"key":"e_1_3_2_2_2_1","unstructured":"2014. ps-lite: A Lightweight Parameter Server Interface. https:\/\/github.com\/dmlc\/ps-lite."},{"key":"e_1_3_2_2_3_1","unstructured":"2019. Alibaba PS-Plus. https:\/\/github.com\/alibaba\/x-deeplearning\/tree\/master\/xdl\/ps-plus."},{"key":"e_1_3_2_2_4_1","unstructured":"2019. AWS EC2 Instance. https:\/\/aws.amazon.com\/ec2\/instance-types\/."},{"key":"e_1_3_2_2_5_1","unstructured":"2019. BytePS: A High Performance and General Framework for Distributed Training. https:\/\/github.com\/bytedance\/byteps\/."},{"key":"e_1_3_2_2_6_1","unstructured":"2019. Linux tc. https:\/\/linux.die.net\/man\/8\/tc."},{"key":"e_1_3_2_2_7_1","unstructured":"2019. NCCL. https:\/\/developer.nvidia.com\/nccl."},{"key":"e_1_3_2_2_8_1","volume-title":"Proc. of USENIX OSDI.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A System for Large-Scale Machine Learning. In Proc. 
of USENIX OSDI."},{"key":"e_1_3_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/2442516.2442538"},{"key":"e_1_3_2_2_10_1","volume-title":"Proc. of USENIX NSDI.","author":"Ananthanarayanan Ganesh","year":"2013","unstructured":"Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica. 2013. Effective Straggler Mitigation: Attack of the Clones. In Proc. of USENIX NSDI."},{"key":"e_1_3_2_2_11_1","volume-title":"Proc. of ICLR.","author":"Bahdanau Dzmitry","year":"2015","unstructured":"Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In Proc. of ICLR."},{"key":"e_1_3_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155446"},{"key":"e_1_3_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737460"},{"key":"e_1_3_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2018.8486422"},{"key":"e_1_3_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737587"},{"key":"e_1_3_2_2_16_1","volume-title":"MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In NIPS Workshop on Machine Learning Systems (LearningSys).","author":"Chen Tianqi","year":"2016","unstructured":"
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2016. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In NIPS Workshop on Machine Learning Systems (LearningSys)."},{"key":"e_1_3_2_2_17_1","volume-title":"Proc. of USENIX OSDI.","author":"Chen Tianqi","year":"2018","unstructured":"Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In Proc. of USENIX OSDI."},{"key":"e_1_3_2_2_18_1","doi-asserted-by":"crossref","unstructured":"Bulpitt Ci. 1987. Confidence Intervals. Lancet (1987).","DOI":"10.1016\/S0140-6736(87)92100-3"},{"key":"e_1_3_2_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3343180.3343192"},{"key":"e_1_3_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638947"},{"key":"e_1_3_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2987550.2987554"},{"key":"e_1_3_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3064176.3064182"},{"key":"e_1_3_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_3_2_2_24_1","volume-title":"Proc. of ICML.","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proc. 
of ICML."},{"key":"e_1_3_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035933"},{"key":"e_1_3_2_2_26_1","volume-title":"Proc. of USENIX OSDI.","author":"Jiang Yimin","year":"2020","unstructured":"Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, and Chuanxiong Guo. 2020. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU\/CPU Clusters. In Proc. of USENIX OSDI."},{"key":"e_1_3_2_2_27_1","volume-title":"Proc. of NIPS.","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Proc. of NIPS."},{"key":"e_1_3_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/2640087.2644155"},{"key":"e_1_3_2_2_29_1","volume-title":"Proc. of ICML.","author":"Lian Xiangru","year":"2018","unstructured":"Xiangru Lian, Wei Zhang, Ce Zhang, and Ji Liu. 2018. Asynchronous Decentralized Parallel Stochastic Gradient Descent. In Proc. of ICML."},{"key":"e_1_3_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304009"},{"key":"e_1_3_2_2_31_1","volume":"202","author":"Or Andrew","unstructured":"Andrew Or, Haoyu Zhang, and Michael J Freedman. 2020. Resource Elasticity in Distributed Deep Learning. In Proc. 
of Machine Learning and Systems (MLSys).","journal-title":"Michael J Freedman."},{"key":"e_1_3_2_2_32_1","volume-title":"Proc. of NIPS Autodiff Workshop.","author":"Paszke Adam","year":"2017","unstructured":"Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic Differentiation in PyTorch. In Proc. of NIPS Autodiff Workshop."},{"key":"e_1_3_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPDPS.2007.370405"},{"key":"e_1_3_2_2_34_1","doi-asserted-by":"crossref","unstructured":"Pitch Patarasuk and Xin Yuan. 2009. Bandwidth Optimal All-reduce Algorithms for Clusters of Workstations. J. Parallel and Distrib. Comput. (2009).","DOI":"10.1016\/j.jpdc.2008.09.002"},{"key":"e_1_3_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3190508.3190517"},{"key":"e_1_3_2_2_36_1","volume-title":"DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters. arXiv preprint arXiv:1909.06040","author":"Peng Yanghua","year":"2019","unstructured":"Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, and Wei Lin. 2019. DL2: A Deep Learning-driven Scheduler for Deep Learning Clusters. arXiv preprint arXiv:1909.06040 (2019)."},{"key":"e_1_3_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359642"},{"key":"e_1_3_2_2_38_1","volume-title":"Proc. 
of USENIX ATC.","author":"Qiao Aurick","year":"2018","unstructured":"Aurick Qiao, Abutalib Aghayev, Weiren Yu, Haoyang Chen, Qirong Ho, Garth A Gibson, and Eric P Xing. 2018. Litz: Elastic Framework for High-Performance Distributed Machine Learning. In Proc. of USENIX ATC."},{"key":"e_1_3_2_2_39_1","volume-title":"A Stochastic Approximation Method. The Annals of Mathematical Statistics","author":"Robbins Herbert","year":"1951","unstructured":"Herbert Robbins and Sutton Monro. 1951. A Stochastic Approximation Method. The Annals of Mathematical Statistics (1951)."},{"key":"e_1_3_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/505202.505215"},{"key":"e_1_3_2_2_41_1","volume-title":"Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso. 2018. Horovod: Fast and Easy Distributed Deep Learning in TensorFlow. arXiv preprint arXiv:1802.05799 (2018)."},{"key":"e_1_3_2_2_42_1","volume-title":"Proc. of ICLR.","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-scale Image Recognition. In Proc. 
of ICLR."},{"key":"e_1_3_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/SMARTCOMP.2017.7947053"},{"key":"e_1_3_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/551283"},{"key":"e_1_3_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_3_2_2_46_1","volume-title":"Proc. of ICML.","author":"Tandon Rashish","year":"2017","unstructured":"Rashish Tandon, Qi Lei, Alexandras G Dimakis, and Nikos Karampatziakis. 2017. Gradient Coding: Avoiding Stragglers in Distributed Learning. In Proc. of ICML."},{"key":"e_1_3_2_2_47_1","volume-title":"Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, and Yaoliang Yu.","author":"Xing Eric P","year":"2015","unstructured":"Eric P Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, and Yaoliang Yu. 2015. Petuum: A New Platform for Distributed Machine Learning on Big Data. 
IEEE Transactions on Big Data (2015)."},{"key":"e_1_3_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCS.2019.00150"}],"event":{"name":"SoCC '20: ACM Symposium on Cloud Computing","sponsor":["SIGMOD ACM Special Interest Group on Management of Data","SIGOPS ACM Special Interest Group on Operating Systems"],"location":"Virtual Event USA","acronym":"SoCC '20"},"container-title":["Proceedings of the 11th ACM Symposium on Cloud Computing"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3419111.3421307","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3419111.3421307","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:32:06Z","timestamp":1750195926000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3419111.3421307"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,12]]},"references-count":48,"alternative-id":["10.1145\/3419111.3421307","10.1145\/3419111"],"URL":"https:\/\/doi.org\/10.1145\/3419111.3421307","relation":{},"subject":[],"published":{"date-parts":[[2020,10,12]]},"assertion":[{"value":"2020-10-12","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}