{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:24:17Z","timestamp":1750220657343,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":33,"publisher":"ACM","license":[{"start":{"date-parts":[[2021,4,9]],"date-time":"2021-04-09T00:00:00Z","timestamp":1617926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Swiss National Science Foundation NRP75 Dapprox","award":["407540_167266"],"award-info":[{"award-number":["407540_167266"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,4,9]]},"DOI":"10.1145\/3427921.3450233","type":"proceedings-article","created":{"date-parts":[[2021,4,10]],"date-time":"2021-04-10T07:37:01Z","timestamp":1618040221000},"page":"133-144","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Courier: Real-Time Optimal Batch Size Prediction for Latency SLOs in BigDL"],"prefix":"10.1145","author":[{"given":"Diego","family":"Albo Mart\u00ednez","sequence":"first","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sharwin","family":"Bobde","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tomasz","family":"Motyka","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lydia","family":"Chen","sequence":"additional","affiliation":[{"name":"Delft University of Technology, Delft, Netherlands"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,4,9]]},"reference":[{"key":"e_1_3_2_1_1_1","volume-title":"Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021","author":"Aji Alham Fikri","year":"2017","unstructured":"Alham Fikri Aji and Kenneth Heafield . 2017. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021 ( 2017 ). Alham Fikri Aji and Kenneth Heafield. 2017. Sparse communication for distributed gradient descent. arXiv preprint arXiv:1704.05021 (2017)."},{"key":"e_1_3_2_1_2_1","unstructured":"Salem Alqahtani and Murat Demirbas. 2019. Performance Analysis and Comparison of Distributed Machine Learning Systems.  Salem Alqahtani and Murat Demirbas. 2019. Performance Analysis and Comparison of Distributed Machine Learning Systems."},{"key":"e_1_3_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2018.06.032"},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3357223.3362707"},{"key":"e_1_3_2_1_5_1","volume-title":"Sangeetha Abdu Jyothi, and R. Campbell","author":"Hashemi Sayed Hadi","year":"2019","unstructured":"Sayed Hadi Hashemi , Sangeetha Abdu Jyothi, and R. Campbell . 2019 . TicTac: Accelerating Distributed Deep Learning with Communication Scheduling. arXiv: Distributed, Parallel, and Cluster Computing ( 2019). Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, and R. Campbell. 2019. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling. arXiv: Distributed, Parallel, and Cluster Computing (2019)."},{"key":"e_1_3_2_1_6_1","unstructured":"Raj Jain. 2008. The art of computer systems performance analysis .john wiley & sons.  Raj Jain. 2008. The art of computer systems performance analysis .john wiley & sons."},{"key":"e_1_3_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330648"},{"key":"#cr-split#-e_1_3_2_1_8_1.1","doi-asserted-by":"crossref","unstructured":"Daniel Justus John Brennan Stephen Bonner and Andrew Mcgough. 2018. Predicting the Computational Cost of Deep Learning Models. 3873--3882. https:\/\/doi.org\/10.1109\/BigData.2018.8622396 10.1109\/BigData.2018.8622396","DOI":"10.1109\/BigData.2018.8622396"},{"key":"#cr-split#-e_1_3_2_1_8_1.2","doi-asserted-by":"crossref","unstructured":"Daniel Justus John Brennan Stephen Bonner and Andrew Mcgough. 2018. Predicting the Computational Cost of Deep Learning Models. 3873--3882. https:\/\/doi.org\/10.1109\/BigData.2018.8622396","DOI":"10.1109\/BigData.2018.8622396"},{"key":"e_1_3_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3361525.3361545"},{"key":"#cr-split#-e_1_3_2_1_10_1.1","doi-asserted-by":"crossref","unstructured":"Jin Kim Qirong Ho Seunghak Lee Xun Zheng Wei Dai Garth Gibson and Eric Xing. 2016. STRADS: a distributed framework for scheduled model parallel machine learning. 1--16. https:\/\/doi.org\/10.1145\/2901318.2901331 10.1145\/2901318.2901331","DOI":"10.1145\/2901318.2901331"},{"key":"#cr-split#-e_1_3_2_1_10_1.2","doi-asserted-by":"crossref","unstructured":"Jin Kim Qirong Ho Seunghak Lee Xun Zheng Wei Dai Garth Gibson and Eric Xing. 2016. STRADS: a distributed framework for scheduled model parallel machine learning. 1--16. https:\/\/doi.org\/10.1145\/2901318.2901331","DOI":"10.1145\/2901318.2901331"},{"volume-title":"2019 USENIX Annual Technical Conference (USENIX ATC 19)","author":"Kim Jin Kyu","key":"e_1_3_2_1_11_1","unstructured":"Jin Kyu Kim , Abutalib Aghayev , Garth A. Gibson , and Eric P. Xing . 2019 a. STRADS-AP: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model . In 2019 USENIX Annual Technical Conference (USENIX ATC 19) . USENIX Association, Renton, WA, 207--222. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/kim-jin Jin Kyu Kim, Abutalib Aghayev, Garth A. Gibson, and Eric P. Xing. 2019 a. STRADS-AP: Simplifying Distributed Machine Learning Programming without Introducing a New Programming Model. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA, 207--222. https:\/\/www.usenix.org\/conference\/atc19\/presentation\/kim-jin"},{"key":"e_1_3_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303957"},{"key":"e_1_3_2_1_13_1","volume-title":"CROSSBOW: scaling deep learning with small batch sizes on multi-gpu servers. arXiv preprint arXiv:1901.02244","author":"Koliousis Alexandros","year":"2019","unstructured":"Alexandros Koliousis , Pijika Watcharapichat , Matthias Weidlich , Luo Mai , Paolo Costa , and Peter Pietzuch . 2019. CROSSBOW: scaling deep learning with small batch sizes on multi-gpu servers. arXiv preprint arXiv:1901.02244 ( 2019 ). Alexandros Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, and Peter Pietzuch. 2019. CROSSBOW: scaling deep learning with small batch sizes on multi-gpu servers. arXiv preprint arXiv:1901.02244 (2019)."},{"key":"e_1_3_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.5555\/2685048.2685095"},{"key":"e_1_3_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3358960.3379141"},{"key":"e_1_3_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3318464.3386126"},{"key":"e_1_3_2_1_17_1","volume-title":"2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 306--318","author":"Lin S.","year":"2018","unstructured":"S. Lin , M. Paolieri , C. Chou , and L. Golubchik . 2018. A Model-Based Approach to Streamlining Distributed Training for Asynchronous SGD . In 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 306--318 . https:\/\/doi.org\/10.1109\/MASCOTS. 2018 .00037 10.1109\/MASCOTS.2018.00037 S. Lin, M. Paolieri, C. Chou, and L. Golubchik. 2018. A Model-Based Approach to Streamlining Distributed Training for Asynchronous SGD. In 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS). 306--318. https:\/\/doi.org\/10.1109\/MASCOTS.2018.00037"},{"key":"#cr-split#-e_1_3_2_1_18_1.1","doi-asserted-by":"crossref","unstructured":"A.R. Mamidala Jiuxing Liu and D.K. Panda. 2004. Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms. 135-- 144. https:\/\/doi.org\/10.1109\/CLUSTR.2004.1392611 10.1109\/CLUSTR.2004.1392611","DOI":"10.1109\/CLUSTR.2004.1392611"},{"key":"#cr-split#-e_1_3_2_1_18_1.2","doi-asserted-by":"crossref","unstructured":"A.R. Mamidala Jiuxing Liu and D.K. Panda. 2004. Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms. 135-- 144. https:\/\/doi.org\/10.1109\/CLUSTR.2004.1392611","DOI":"10.1109\/CLUSTR.2004.1392611"},{"volume-title":"PipeDream: Generalized Pipeline Parallelism for DNN Training (SOSP '19)","author":"Narayanan Deepak","key":"e_1_3_2_1_19_1","unstructured":"Deepak Narayanan , Aaron Harlap , Amar Phanishayee , Vivek Seshadri , Nikhil R. Devanur , Gregory R. Ganger , Phillip B. Gibbons , and Matei Zaharia . 2019. PipeDream: Generalized Pipeline Parallelism for DNN Training (SOSP '19) . Association for Computing Machinery , New York, NY, USA , 1--15. https:\/\/doi.org\/10.1145\/3341301.3359646 10.1145\/3341301.3359646 Deepak Narayanan, Aaron Harlap, Amar Phanishayee, Vivek Seshadri, Nikhil R. Devanur, Gregory R. Ganger, Phillip B. Gibbons, and Matei Zaharia. 2019. PipeDream: Generalized Pipeline Parallelism for DNN Training (SOSP '19). Association for Computing Machinery, New York, NY, USA, 1--15. https:\/\/doi.org\/10.1145\/3341301.3359646"},{"volume-title":"A Generic Communication Scheduler for Distributed DNN Training Acceleration (SOSP '19)","author":"Peng Yanghua","key":"e_1_3_2_1_20_1","unstructured":"Yanghua Peng , Yibo Zhu , Yangrui Chen , Yixin Bao , Bairen Yi , Chang Lan , Chuan Wu , and Chuanxiong Guo . 2019. A Generic Communication Scheduler for Distributed DNN Training Acceleration (SOSP '19) . Association for Computing Machinery , New York, NY, USA , 16--29. https:\/\/doi.org\/10.1145\/3341301.3359642 10.1145\/3341301.3359642 Yanghua Peng, Yibo Zhu, Yangrui Chen, Yixin Bao, Bairen Yi, Chang Lan, Chuan Wu, and Chuanxiong Guo. 2019. A Generic Communication Scheduler for Distributed DNN Training Acceleration (SOSP '19). Association for Computing Machinery, New York, NY, USA, 16--29. https:\/\/doi.org\/10.1145\/3341301.3359642"},{"key":"e_1_3_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3135974.3135994"},{"key":"e_1_3_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3423211.3425692"},{"key":"e_1_3_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341301.3359658"},{"key":"e_1_3_2_1_24_1","volume-title":"Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. (11","author":"Shi Shaohuai","year":"2017","unstructured":"Shaohuai Shi and Xiaowen Chu . 2017. Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. (11 2017 ). Shaohuai Shi and Xiaowen Chu. 2017. Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. (11 2017)."},{"key":"e_1_3_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3297858.3304072"},{"key":"e_1_3_2_1_26_1","volume-title":"A Survey on Distributed Machine Learning. arXiv preprint arXiv:1912.09789","author":"Verbraeken Joost","year":"2019","unstructured":"Joost Verbraeken , Matthijs Wolting , Jonathan Katzy , Jeroen Kloppenburg , Tim Verbelen , and Jan S Rellermeyer . 2019. A Survey on Distributed Machine Learning. arXiv preprint arXiv:1912.09789 ( 2019 ). Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, and Jan S Rellermeyer. 2019. A Survey on Distributed Machine Learning. arXiv preprint arXiv:1912.09789 (2019)."},{"key":"#cr-split#-e_1_3_2_1_27_1.1","doi-asserted-by":"crossref","unstructured":"G. Wang J. Xu and B. He. 2016. A Novel Method for Tuning Configuration Parameters of Spark Based on Machine Learning. In 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC\/SmartCity\/DSS). 586--593. https:\/\/doi.org\/10.1109\/HPCC-SmartCity-DSS.2016.0088 10.1109\/HPCC-SmartCity-DSS.2016.0088","DOI":"10.1109\/HPCC-SmartCity-DSS.2016.0088"},{"key":"#cr-split#-e_1_3_2_1_27_1.2","doi-asserted-by":"crossref","unstructured":"G. Wang J. Xu and B. He. 2016. A Novel Method for Tuning Configuration Parameters of Spark Based on Machine Learning. In 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC\/SmartCity\/DSS). 586--593. https:\/\/doi.org\/10.1109\/HPCC-SmartCity-DSS.2016.0088","DOI":"10.1109\/HPCC-SmartCity-DSS.2016.0088"},{"key":"e_1_3_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33015289"},{"key":"e_1_3_2_1_29_1","volume-title":"DBS: Dynamic Batch Size For Distributed Deep Neural Network Training. arXiv preprint arXiv:2007.11831","author":"Ye Qing","year":"2020","unstructured":"Qing Ye , Yuhao Zhou , Mingjia Shi , Yanan Sun , and Jiancheng Lv . 2020 . DBS: Dynamic Batch Size For Distributed Deep Neural Network Training. arXiv preprint arXiv:2007.11831 (2020). Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, and Jiancheng Lv. 2020. DBS: Dynamic Batch Size For Distributed Deep Neural Network Training. arXiv preprint arXiv:2007.11831 (2020)."}],"event":{"name":"ICPE '21: ACM\/SPEC International Conference on Performance Engineering","sponsor":["SIGMETRICS ACM Special Interest Group on Measurement and Evaluation","SIGSOFT ACM Special Interest Group on Software Engineering"],"location":"Virtual Event France","acronym":"ICPE '21"},"container-title":["Proceedings of the ACM\/SPEC International Conference on Performance Engineering"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3427921.3450233","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3427921.3450233","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:02:32Z","timestamp":1750197752000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3427921.3450233"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,9]]},"references-count":33,"alternative-id":["10.1145\/3427921.3450233","10.1145\/3427921"],"URL":"https:\/\/doi.org\/10.1145\/3427921.3450233","relation":{},"subject":[],"published":{"date-parts":[[2021,4,9]]},"assertion":[{"value":"2021-04-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}