{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T18:01:42Z","timestamp":1775671302811,"version":"3.50.1"},"reference-count":71,"publisher":"Association for Computing Machinery (ACM)","issue":"9","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2023,5]]},"abstract":"<jats:p>\n            The increasing size of both deep learning models and training data necessitates scaling out model training through pipeline-parallel training, which combines pipelined model parallelism and data parallelism. However, most existing approaches assume an ideal homogeneous, dedicated cluster. On real cloud clusters, they suffer from intensive model synchronization overheads caused by dynamic environment heterogeneity. This challenge leaves existing designs in a dilemma: either accept the performance bottleneck of a central parameter server (PS) or suffer severe straggler-induced performance degradation under decentralized synchronization (e.g., All-Reduce). This paper presents SDPipe, a new\n            <jats:italic>semi-decentralized<\/jats:italic>\n            framework that gets the best of both worlds, achieving both high heterogeneity tolerance and convergence efficiency in pipeline-parallel training. To provide high performance, we decentralize the communication-intensive model synchronization, which accounts for the largest proportion of synchronization overhead. In contrast, we centralize group scheduling, which is lightweight but requires a global view to improve performance and convergence speed under heterogeneity. 
We demonstrate via a prototype implementation the significant advantages of SDPipe in performance and scalability across different environments.\n          <\/jats:p>","DOI":"10.14778\/3598581.3598604","type":"journal-article","created":{"date-parts":[[2023,7,10]],"date-time":"2023-07-10T22:19:06Z","timestamp":1689027546000},"page":"2354-2363","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":28,"title":["SDPipe: A Semi-Decentralized Framework for Heterogeneity-Aware Pipeline-parallel Training"],"prefix":"10.14778","volume":"16","author":[{"given":"Xupeng","family":"Miao","sequence":"first","affiliation":[{"name":"Carnegie Mellon University"}]},{"given":"Yining","family":"Shi","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Zhi","family":"Yang","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Peking University"}]},{"given":"Zhihao","family":"Jia","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University"}]}],"member":"320","published-online":{"date-parts":[[2023,7,10]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2017. PyTorch. https:\/\/github.com\/pytorch\/examples\/tree\/master\/imagenet."},
{"key":"e_1_2_1_2_1","unstructured":"2021. NCCL. https:\/\/developer.nvidia.com\/nccl."},
{"key":"e_1_2_1_3_1","unstructured":"2023. Alibaba Cloud Virtual GPU Instance. https:\/\/www.alibabacloud.com\/help\/en\/elastic-gpu-service\/latest\/vgpu-accelerated-instance-families."},
{"key":"e_1_2_1_4_1","unstructured":"2023. SDPipe. https:\/\/github.com\/Hsword\/VLDB2023_SDPipe."},
{"key":"e_1_2_1_5_1","unstructured":"2023. SDPipe Artifacts and Proofs. https:\/\/github.com\/Hsword\/VLDB2023_SDPipe\/blob\/main\/VLDB2023_SDPipe_Artifacts_and_Proofs.pdf."},
{"key":"e_1_2_1_6_1","unstructured":"2023. Vultr. https:\/\/www.vultr.com."},
{"key":"e_1_2_1_7_1","volume-title":"Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng.","author":"Abadi Mart\u00edn","year":"2016","unstructured":"Mart\u00edn Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek Gordon Murray, Benoit Steiner, Paul A. Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265--283."},
{"key":"e_1_2_1_8_1","volume-title":"Proceedings of the Seventeenth European Conference on Computer Systems","author":"Athlur Sanjith","year":"2021","unstructured":"Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, and Nipun Kwatra. 2021. Varuna: scalable, low-cost training of massive deep learning models. Proceedings of the Seventeenth European Conference on Computer Systems (2021)."},
{"key":"e_1_2_1_9_1","unstructured":"Sanjith Athlur, Nitika Saran, Muthian Sivathanu, Ramachandran Ramjee, and Nipun Kwatra. 2022. Varuna: scalable, low-cost training of massive deep learning models. In EuroSys. ACM, 472--487."},
{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3320060"},
{"key":"e_1_2_1_11_1","unstructured":"Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri S. Chatterji, Annie S. Chen, Kathleen Creel, Jared Quincy Davis, Dorottya Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah D. Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark S. Krass, Ranjay Krishna, Rohith Kuditipudi, et al. 2021. On the Opportunities and Risks of Foundation Models. (2021). arXiv:2108.07258"},
{"key":"e_1_2_1_12_1","volume-title":"Petals: Collaborative Inference and Fine-tuning of Large Models. ArXiv abs\/2209.01188","author":"Borzunov Alexander","year":"2022","unstructured":"Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel K. Samygin, and Colin Raffel. 2022. Petals: Collaborative Inference and Fine-tuning of Large Models. ArXiv abs\/2209.01188 (2022)."},
{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1137\/16M1080173"},
{"key":"e_1_2_1_14_1","unstructured":"J. Chen, Rajat Monga, S. Bengio, and R. J\u00f3zefowicz. 2016. Revisiting Distributed Synchronous SGD. ArXiv abs\/1702.05800 (2016)."},
{"key":"e_1_2_1_15_1","volume-title":"MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274","author":"Chen Tianqi","year":"2015","unstructured":"Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. CoRR abs\/1512.01274 (2015). arXiv:1512.01274"},
{"key":"e_1_2_1_16_1","volume-title":"Ubershuffle: Communication-efficient data shuffling for sgd via coding theory. NeurIPS.","author":"Chung Jichan","year":"2017","unstructured":"Jichan Chung, Kangwook Lee, Ramtin Pedarsani, Dimitris Papailiopoulos, and Kannan Ramchandran. 2017. Ubershuffle: Communication-efficient data shuffling for sgd via coding theory. NeurIPS."},
{"key":"e_1_2_1_17_1","doi-asserted-by":"crossref","unstructured":"Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Fei-Fei Li. 2009. ImageNet: A large-scale hierarchical image database. In CVPR. 248--255.","DOI":"10.1109\/CVPR.2009.5206848"},
{"key":"e_1_2_1_18_1","volume-title":"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186.","author":"Devlin Jacob","year":"2019","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT. 4171--4186. 
"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-014-0846-1"},
{"key":"e_1_2_1_20_1","volume-title":"Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers. In MLSys.","author":"Guo Runsheng","year":"2022","unstructured":"Runsheng Guo, Victor Guo, Antonio Kim, Josh Hildred, and Khuzaima Daudjee. 2022. Hydrozoa: Dynamic Hybrid-Parallel DNN Training on Serverless Containers. In MLSys."},
{"key":"e_1_2_1_21_1","unstructured":"William L. Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In NeurIPS. 1024--1034."},
{"key":"e_1_2_1_22_1","doi-asserted-by":"crossref","unstructured":"Aaron Harlap, Henggang Cui, Wei Dai, Jinliang Wei, Gregory R. Ganger, Phillip B. Gibbons, Garth A. Gibson, and Eric P. Xing. 2016. Addressing the straggler problem for iterative convergent parallel ML. In SoCC. 98--111.","DOI":"10.1145\/2987550.2987554"},
{"key":"e_1_2_1_23_1","unstructured":"Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778."},
{"key":"e_1_2_1_24_1","volume-title":"Phillip B. Gibbons, Garth A. Gibson, Gregory R. Ganger, and Eric P. Xing.","author":"Ho Qirong","year":"2013","unstructured":"Qirong Ho, James Cipar, Henggang Cui, Seunghak Lee, Jin Kyu Kim, Phillip B. Gibbons, Garth A. Gibson, Gregory R. Ganger, and Eric P. Xing. 2013. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server. In NeurIPS. 1223--1231."},
{"key":"e_1_2_1_25_1","volume-title":"Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds. In NSDI. 629--647.","author":"Hsieh Kevin","year":"2017","unstructured":"Kevin Hsieh, Aaron Harlap, Nandita Vijaykumar, Dimitris Konomis, Gregory R. Ganger, Phillip B. Gibbons, and Onur Mutlu. 2017. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds. In NSDI. 629--647."},
{"key":"e_1_2_1_26_1","volume-title":"HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen.","author":"Huang Yanping","year":"2019","unstructured":"Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Xu Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen. 2019. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism. In NeurIPS. 103--112."},
{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Jiawei Jiang, Bin Cui, Ce Zhang, and Lele Yu. 2017. Heterogeneity-aware Distributed Parameter Servers. In SIGMOD. 463--478.","DOI":"10.1145\/3035918.3035933"},
{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, and Ce Zhang. 2021. Towards Demystifying Serverless Machine Learning Training. In SIGMOD. ACM, 857--871.","DOI":"10.1145\/3448016.3459240"},
{"key":"e_1_2_1_29_1","volume-title":"USENIX Symposium on Operating Systems Design and Implementation.","author":"Jiang Yimin","year":"2020","unstructured":"Yimin Jiang, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, and Chuanxiong Guo. 2020. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU\/CPU Clusters. In USENIX Symposium on Operating Systems Design and Implementation."},
{"key":"e_1_2_1_30_1","volume-title":"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model. NVIDIA Developer Blog","author":"Kharya Paresh","year":"2021","unstructured":"Paresh Kharya and Ali Alvi. 2021. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model. NVIDIA Developer Blog (2021)."},
{"key":"e_1_2_1_31_1","first-page":"1","article-title":"Parallax","volume":"43","author":"Kim Soojeong","year":"2019","unstructured":"Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, and Byung-Gon Chun. 2019. Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks. In EuroSys. 43:1--43:15.","journal-title":"Sparsity-aware Data Parallel Training of Deep Neural Networks. In EuroSys."},
{"key":"e_1_2_1_32_1","volume-title":"Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su.","author":"Li Mu","year":"2014","unstructured":"Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, and Bor-Yiing Su. 2014. Scaling Distributed Machine Learning with the Parameter Server. In OSDI. 583--598."},
{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415530"},
{"key":"e_1_2_1_34_1","unstructured":"Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, and Ji Liu. 2017. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent. In NeurIPS. 5330--5340."},
{"key":"e_1_2_1_35_1","first-page":"3049","article-title":"Asynchronous Decentralized Parallel Stochastic Gradient Descent","volume":"80","author":"Lian Xiangru","year":"2018","unstructured":"Xiangru Lian, Wei Zhang, Ce Zhang, and Ji Liu. 2018. Asynchronous Decentralized Parallel Stochastic Gradient Descent. In ICML, Vol. 80. 3049--3058.","journal-title":"ICML"},
{"key":"e_1_2_1_36_1","volume-title":"MixML: A Unified Analysis of Weakly Consistent Parallel Learning. CoRR abs\/2005.06706","author":"Lu Yucheng","year":"2020","unstructured":"Yucheng Lu, Jack Nash, and Christopher De Sa. 2020. MixML: A Unified Analysis of Weakly Consistent Parallel Learning. CoRR abs\/2005.06706 (2020). arXiv:2005.06706"},
{"key":"e_1_2_1_37_1","volume-title":"Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training. In ASPLOS. 401--416.","author":"Luo Qinyi","year":"2020","unstructured":"Qinyi Luo, Jiaao He, Youwei Zhuo, and Xuehai Qian. 2020. Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training. In ASPLOS. 401--416."},
{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, and Bin Cui. 2021. Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce. In SIGMOD. ACM, 2262--2270.","DOI":"10.1145\/3448016.3452773"},
{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-022-3581-9"},
{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/3570690.3570697"},
{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.14778\/3489496.3489511"},
{"key":"e_1_2_1_42_1","volume-title":"Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters. OSDI","author":"Mohan Jayashree","year":"2022","unstructured":"Jayashree Mohan, Amar Phanishayee, Janardhan Kulkarni, and Vijay Chidambaram. 2022. Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters. OSDI (2022)."},
{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, and Matei Zaharia. 2019. PipeDream: generalized pipeline parallelism for DNN training. In SOSP. 1--15.","DOI":"10.1145\/3341301.3359646"},
{"key":"e_1_2_1_44_1","unstructured":"Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, and Matei Zaharia. 2021. Memory-Efficient Pipeline-Parallel DNN Training. In ICML. 7937--7947."},
{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2009.2031203"},
{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2008.2009515"},
{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the VLDB Endowment","author":"Nie Xiaonan","year":"2023","unstructured":"Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, and Bin Cui. 2023. Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent. Proceedings of the VLDB Endowment (2023)."},
{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588964"},
{"key":"e_1_2_1_49_1","unstructured":"Jay H. Park, Gyeongchan Yun, Chang M. Yi, Nguyen T. Nguyen, Seungmin Lee, Jaesik Choi, Sam H. Noh, and Young-ri Choi. 2020. HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism. In ATC. 307--321."},
{"key":"e_1_2_1_50_1","volume-title":"PyTorch: An Imperative Style","author":"Paszke Adam","unstructured":"Adam Paszke and Sam Gross. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In NeurIPS. 8024--8035."},
{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jpdc.2008.09.002"},
{"key":"e_1_2_1_52_1","volume-title":"Veeravalli","author":"Ram Sundhar Srinivasan","year":"2009","unstructured":"Sundhar Srinivasan Ram, Angelia Nedic, and Venugopal V. Veeravalli. 2009. Asynchronous gossip algorithms for stochastic optimization. In CDC. IEEE, 3581--3586."},
{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10957-010-9737-7"},
{"key":"e_1_2_1_54_1","volume-title":"Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799","author":"Sergeev Alexander","year":"2018","unstructured":"Alexander Sergeev and Mike Del Balso. 2018. Horovod: fast and easy distributed deep learning in TensorFlow. CoRR abs\/1802.05799 (2018)."},
{"key":"e_1_2_1_55_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR."},
{"key":"e_1_2_1_56_1","unstructured":"Sebastian U. Stich. 2019. Local SGD Converges Fast and Communicates Little. In ICLR. OpenReview.net."},
{"key":"e_1_2_1_57_1","volume-title":"Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs. ArXiv abs\/2204.12013","author":"Thorpe John","year":"2022","unstructured":"John Thorpe, Pengzhan Zhao, Jon Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, and Guoqing Harry Xu. 2022. 
Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs. ArXiv abs\/2204.12013 (2022)."},
{"key":"e_1_2_1_58_1","volume-title":"Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs. NSDI","author":"Thorpe John","year":"2023","unstructured":"John Thorpe, Pengzhan Zhao, Jonathan Eyolfson, Yifan Qiao, Zhihao Jia, Minjia Zhang, Ravi Netravali, and Guoqing Harry Xu. 2023. Bamboo: Making Preemptible Instances Resilient for Affordable Training of Large DNNs. NSDI (2023)."},
{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-021-00174-0"},
{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-022-00202-7"},
{"key":"e_1_2_1_61_1","volume-title":"Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms. In ICML Workshop.","author":"Wang Jianyu","year":"2019","unstructured":"Jianyu Wang and Gauri Joshi. 2019. Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms. In ICML Workshop."},
{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-023-2894-6"},
{"key":"e_1_2_1_63_1","volume-title":"MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters","author":"Weng Qizhen","unstructured":"Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding. 2022. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In NSDI. USENIX Association, 945--960."},
{"key":"e_1_2_1_64_1","volume-title":"Gandiva: Introspective Cluster Scheduling for Deep Learning. In OSDI. 595--610.","author":"Xiao Wencong","year":"2018","unstructured":"Wencong Xiao, Romil Bhardwaj, Ramachandran Ramjee, Muthian Sivathanu, Nipun Kwatra, Zhenhua Han, Pratyush Patel, Xuan Peng, Hanyu Zhao, Quanlu Zhang, Fan Yang, and Lidong Zhou. 2018. Gandiva: Introspective Cluster Scheduling for Deep Learning. In OSDI. 595--610."},
{"key":"e_1_2_1_65_1","volume-title":"Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, and Ce Zhang.","author":"Yuan Binhang","year":"2022","unstructured":"Binhang Yuan, Yongjun He, Jared Quincy Davis, Tianyi Zhang, Tri Dao, Beidi Chen, Percy Liang, Christopher Re, and Ce Zhang. 2022. Decentralized Training of Foundation Models in Heterogeneous Environments. In Advances in Neural Information Processing Systems, Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho (Eds.). https:\/\/openreview.net\/forum?id=UHoGOaGjEq"},
{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1137\/130943170"},
{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.14778\/2732977.2733001"},
{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3503222.3507735"},
{"key":"e_1_2_1_69_1","doi-asserted-by":"crossref","unstructured":"Yihao Zhao, Yuanqiang Liu, Yanghua Peng, Yibo Zhu, Xuanzhe Liu, and Xin Jin. 2022. Multi-resource interleaving for deep learning training. In SIGCOMM. ACM, 428--440.","DOI":"10.1145\/3544216.3544224"},
{"key":"e_1_2_1_70_1","volume-title":"Alpa: Automating Inter-and Intra-Operator Parallelism for Distributed Deep Learning. OSDI","author":"Zheng Lianmin","year":"2022","unstructured":"Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Joseph E. Gonzalez, et al. 2022. Alpa: Automating Inter-and Intra-Operator Parallelism for Distributed Deep Learning. OSDI (2022)."},
{"key":"e_1_2_1_71_1","unstructured":"Martin Zinkevich, M. Weimer, Alex Smola, and L. Li. 2010. Parallelized Stochastic Gradient Descent. In NeurIPS."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3598581.3598604","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,7,19]],"date-time":"2023-07-19T23:01:25Z","timestamp":1689807685000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3598581.3598604"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5]]},"references-count":71,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2023,5]]}},"alternative-id":["10.14778\/3598581.3598604"],"URL":"https:\/\/doi.org\/10.14778\/3598581.3598604","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2023,5]]},"assertion":[{"value":"2023-07-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}