{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:18:10Z","timestamp":1750220290057,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":17,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,4,5]],"date-time":"2022-04-05T00:00:00Z","timestamp":1649116800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["CNS-2008468"],"award-info":[{"award-number":["CNS-2008468"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,4,5]]},"DOI":"10.1145\/3517207.3526981","type":"proceedings-article","created":{"date-parts":[[2022,3,29]],"date-time":"2022-03-29T22:09:26Z","timestamp":1648591766000},"page":"79-86","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["dSyncPS"],"prefix":"10.1145","author":[{"given":"Yibo","family":"Guo","sequence":"first","affiliation":[{"name":"Case Western Reserve University"}]},{"given":"An","family":"Wang","sequence":"additional","affiliation":[{"name":"Case Western Reserve University"}]}],"member":"320","published-online":{"date-parts":[[2022,4,5]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"2022. Cloud Object Storage - Amazon S3 - Amazon Web Services. https:\/\/aws.amazon.com\/s3  2022. Cloud Object Storage - Amazon S3 - Amazon Web Services. https:\/\/aws.amazon.com\/s3"},{"key":"e_1_3_2_1_2_1","unstructured":"2022. Fashion MNIST. https:\/\/www.kaggle.com\/zalando-research\/fashionmnist  2022. Fashion MNIST. https:\/\/www.kaggle.com\/zalando-research\/fashionmnist"},{"key":"e_1_3_2_1_3_1","volume-title":"Tensorflow: A system for large-scale machine learning. In Procs of USENIX OSDI.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi , Paul Barham , Jianmin Chen , 2016 . Tensorflow: A system for large-scale machine learning. In Procs of USENIX OSDI. Mart\u00edn Abadi, Paul Barham, Jianmin Chen, et al. 2016. Tensorflow: A system for large-scale machine learning. In Procs of USENIX OSDI."},{"key":"e_1_3_2_1_4_1","volume-title":"Performance analysis and comparison of distributed machine learning systems. arXiv:1909.02061","author":"Alqahtani Salem","year":"2019","unstructured":"Salem Alqahtani and Murat Demirbas . 2019. Performance analysis and comparison of distributed machine learning systems. arXiv:1909.02061 ( 2019 ). Salem Alqahtani and Murat Demirbas. 2019. Performance analysis and comparison of distributed machine learning systems. arXiv:1909.02061 (2019)."},{"key":"e_1_3_2_1_5_1","volume-title":"Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274.","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen , Mu Li , Yutian Li , Min Lin , 2015 . Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274. Tianqi Chen, Mu Li, Yutian Li, Min Lin, et al. 2015. Mxnet: A flexible and efficient machine learning library for heterogeneous distributed systems. arXiv:1512.01274."},{"key":"e_1_3_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.17487\/RFC7871"},{"key":"e_1_3_2_1_7_1","unstructured":"Dmitry Duplyakin Robert Ricci Aleksander Maricq Gary Wong Jonathon Duerig Eric Eide Leigh Stoller Mike Hibler David Johnson Kirk Webb etal 2019. The Design and Operation of CloudLab. In Procs of USENIX ATC.  Dmitry Duplyakin Robert Ricci Aleksander Maricq Gary Wong Jonathon Duerig Eric Eide Leigh Stoller Mike Hibler David Johnson Kirk Webb et al. 2019. The Design and Operation of CloudLab. In Procs of USENIX ATC."},{"key":"e_1_3_2_1_8_1","volume-title":"Direct bulk-synchronous parallel algorithms. Journal of parallel and distributed computing","author":"Gerbessiotis Alexandras V","year":"1994","unstructured":"Alexandras V Gerbessiotis and Leslie G Valiant . 1994. Direct bulk-synchronous parallel algorithms. Journal of parallel and distributed computing ( 1994 ). Alexandras V Gerbessiotis and Leslie G Valiant. 1994. Direct bulk-synchronous parallel algorithms. Journal of parallel and distributed computing (1994)."},{"key":"e_1_3_2_1_9_1","volume-title":"Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing.","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho , James Cipar , Henggang Cui , Seunghak Lee , Jin Kyu Kim , Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. 2013 . More effective distributed ml via a stale synchronous parallel parameter server. In Advances in neural information processing systems. Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B Gibbons, Garth A Gibson, Greg Ganger, and Eric P Xing. 2013. More effective distributed ml via a stale synchronous parallel parameter server. In Advances in neural information processing systems."},{"key":"e_1_3_2_1_10_1","volume-title":"Alexander J Smola, Amr Ahmed, Vanja Josifovski, et al.","author":"Li Mu","year":"2014","unstructured":"Mu Li , David G Andersen , Jun Woo Park , Alexander J Smola, Amr Ahmed, Vanja Josifovski, et al. 2014 . Scaling distributed machine learning with the parameter server. In Procs of USENIX OSDI. Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, et al. 2014. Scaling distributed machine learning with the parameter server. In Procs of USENIX OSDI."},{"key":"e_1_3_2_1_11_1","unstructured":"Shijian Li Oren Mangoubi Lijie Xu and Tian Guo. 2021. Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. arXiv:2104.08364.  Shijian Li Oren Mangoubi Lijie Xu and Tian Guo. 2021. Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. arXiv:2104.08364."},{"key":"e_1_3_2_1_12_1","volume-title":"Yuan Wang, et al.","author":"Ooi Beng Chin","year":"2015","unstructured":"Beng Chin Ooi , Kian-Lee Tan , Sheng Wang , Wei Wang , Qingchao Cai , Gang Chen , Jinyang Gao , Zhaojing Luo , Anthony KH Tung , Yuan Wang, et al. 2015 . SINGA : A distributed deep learning platform. In Procs of ACM Multimedia . Beng Chin Ooi, Kian-Lee Tan, Sheng Wang, Wei Wang, Qingchao Cai, Gang Chen, Jinyang Gao, Zhaojing Luo, Anthony KH Tung, Yuan Wang, et al. 2015. SINGA: A distributed deep learning platform. In Procs of ACM Multimedia."},{"key":"e_1_3_2_1_13_1","volume-title":"Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems","author":"Paszke Adam","year":"2019","unstructured":"Adam Paszke , Sam Gross , Francisco Massa , Adam Lerer , James Bradbury , Gregory Chanan , Trevor Killeen , Zeming Lin , Natalia Gimelshein , Luca Antiga , 2019 . Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems (2019). Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. 2019. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems (2019)."},{"key":"e_1_3_2_1_14_1","unstructured":"Yuxin Su Michael Lyu etal 2018. Communication-efficient distributed deep metric learning with hybrid synchronization. In Procs of ACM CIKM.  Yuxin Su Michael Lyu et al. 2018. Communication-efficient distributed deep metric learning with hybrid synchronization. In Procs of ACM CIKM."},{"key":"e_1_3_2_1_15_1","volume-title":"Raul Castro Fernandez, and Peter Pietzuch.","author":"Watcharapichat Pijika","year":"2016","unstructured":"Pijika Watcharapichat , Victoria Lopez Morales , Raul Castro Fernandez, and Peter Pietzuch. 2016 . Ako : Decentralised deep learning with partial gradient exchange. In Procs of ACM SoCC. Pijika Watcharapichat, Victoria Lopez Morales, Raul Castro Fernandez, and Peter Pietzuch. 2016. Ako: Decentralised deep learning with partial gradient exchange. In Procs of ACM SoCC."},{"key":"e_1_3_2_1_16_1","volume-title":"Poseidon: An efficient communication architecture for distributed deep learning on {GPU} clusters. In Procs of USENIX ATC.","author":"Zhang Hao","year":"2017","unstructured":"Hao Zhang , Zeyu Zheng , Shizhen Xu , Wei Dai , Qirong Ho , Xiaodan Liang , Zhiting Hu , Jinliang Wei , Pengtao Xie , and Eric P Xing . 2017 . Poseidon: An efficient communication architecture for distributed deep learning on {GPU} clusters. In Procs of USENIX ATC. Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, and Eric P Xing. 2017. Poseidon: An efficient communication architecture for distributed deep learning on {GPU} clusters. In Procs of USENIX ATC."},{"key":"e_1_3_2_1_17_1","volume-title":"Petrel: Heterogeneity-aware distributed deep learning via hybrid synchronization","author":"Zhou Qihua","year":"2020","unstructured":"Qihua Zhou , Song Guo , Zhihao Qu , Peng Li , Li Li , Minyi Guo , and Kun Wang . 2020 . Petrel: Heterogeneity-aware distributed deep learning via hybrid synchronization . Procs of IEEE TPDS ( 2020). Qihua Zhou, Song Guo, Zhihao Qu, Peng Li, Li Li, Minyi Guo, and Kun Wang. 2020. Petrel: Heterogeneity-aware distributed deep learning via hybrid synchronization. Procs of IEEE TPDS (2020)."}],"event":{"name":"EuroSys '22: Seventeenth European Conference on Computer Systems","sponsor":["SIGOPS ACM Special Interest Group on Operating Systems"],"location":"Rennes France","acronym":"EuroSys '22"},"container-title":["Proceedings of the 2nd European Workshop on Machine Learning and Systems"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3517207.3526981","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3517207.3526981","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3517207.3526981","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:29Z","timestamp":1750188689000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3517207.3526981"}},"subtitle":["delayed synchronization for dynamic deployment of distributed machine learning"],"short-title":[],"issued":{"date-parts":[[2022,4,5]]},"references-count":17,"alternative-id":["10.1145\/3517207.3526981","10.1145\/3517207"],"URL":"https:\/\/doi.org\/10.1145\/3517207.3526981","relation":{},"subject":[],"published":{"date-parts":[[2022,4,5]]},"assertion":[{"value":"2022-04-05","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}