{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T13:07:14Z","timestamp":1768741634419,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":29,"publisher":"ACM","license":[{"start":{"date-parts":[[2023,12,5]],"date-time":"2023-12-05T00:00:00Z","timestamp":1701734400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2023,12,8]]},"DOI":"10.1145\/3630048.3630187","type":"proceedings-article","created":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T19:42:35Z","timestamp":1701200555000},"page":"85-104","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Server-Side Stepsizes and Sampling Without Replacement Provably Help in Federated Optimization"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6428-1866","authenticated-orcid":false,"given":"Grigory","family":"Malinovsky","sequence":"first","affiliation":[{"name":"King Abdullah University of Science and Technology, Thuwal, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5241-7292","authenticated-orcid":false,"given":"Konstantin","family":"Mishchenko","sequence":"additional","affiliation":[{"name":"Samsung AI Center, Cambridge, United Kingdom"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4380-5848","authenticated-orcid":false,"given":"Peter","family":"Richt\u00e1rik","sequence":"additional","affiliation":[{"name":"King Abdullah University of Science and Technology, Thuwal, Saudi Arabia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,12,5]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Dan Alistarh Torsten Hoefler Mikael Johansson Sarit Khirirat Nikola Konstantinov and C\u00e9dric Renggli. 2018. The convergence of sparsified gradient methods. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_2_1_2_1","volume-title":"Neural Networks: Tricks of the trade","author":"Bengio Yoshua","unstructured":"Yoshua Bengio. 2012. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the trade. Springer, 437--478."},{"key":"e_1_3_2_1_3_1","unstructured":"L\u00e9on Bottou. 2009. Curiously fast convergence of some stochastic gradient descent algorithms. (2009). Unpublished open problem offered to the attendance of the SLDS 2009 conference."},{"key":"e_1_3_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_3_2_1_5_1","volume-title":"On Large-Cohort Training for Federated Learning. arXiv preprint arXiv:2106.07820","author":"Charles Zachary","year":"2021","unstructured":"Zachary Charles, Zachary Garrett, Zhouyuan Huo, Sergei Shmulyian, and Virginia Smith. 2021. On Large-Cohort Training for Federated Learning. arXiv preprint arXiv:2106.07820 (2021)."},{"key":"e_1_3_2_1_6_1","volume-title":"Optimal client sampling for federated learning. arXiv preprint arXiv:2010.13723","author":"Chen Wenlin","year":"2020","unstructured":"Wenlin Chen, Samuel Horvath, and Peter Richt\u00e1rik. 2020. Optimal client sampling for federated learning. arXiv preprint arXiv:2010.13723 (2020)."},{"key":"e_1_3_2_1_7_1","volume-title":"MARINA: Faster Non-Convex Distributed Learning with Compression. 139 (18--24","author":"Gorbunov Eduard","year":"2021","unstructured":"Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, and Peter Richt\u00e1rik. 2021. MARINA: Faster Non-Convex Distributed Learning with Compression. 139 (18--24 Jul 2021), 3788--3798."},{"key":"e_1_3_2_1_8_1","unstructured":"Eduard Gorbunov Filip Hanzely and Peter Richt\u00e1rik. 2020. Local SGD: unified theory and new efficient methods. In NeurIPS."},{"key":"e_1_3_2_1_9_1","volume-title":"Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335","author":"Harry Hsu Tzu-Ming","year":"2019","unstructured":"Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. 2019. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335 (2019)."},{"key":"e_1_3_2_1_10_1","volume-title":"Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al.","author":"Kairouz Peter","year":"2021","unstructured":"Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aur\u00e9lien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. 2021. Advances and open problems in federated learning. Foundations and Trends\u00ae in Machine Learning 14, 1 (2021)."},{"key":"e_1_3_2_1_11_1","volume-title":"International Conference on Machine Learning. PMLR, 5132--5143","author":"Karimireddy Sai Praneeth","year":"2020","unstructured":"Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian U. Stich, and Ananda Theertha Suresh. 2020. SCAFFOLD: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning. PMLR, 5132--5143."},{"key":"e_1_3_2_1_12_1","volume-title":"Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. PMLR, 4519--4529","author":"Khaled Ahmed","year":"2020","unstructured":"Ahmed Khaled, Konstantin Mishchenko, and Peter Richt\u00e1rik. 2020. Tighter Theory for Local SGD on Identical and Heterogeneous Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. PMLR, 4519--4529."},{"key":"e_1_3_2_1_13_1","volume-title":"Better theory for SGD in the nonconvex world. arXiv preprint arXiv:2002.03329","author":"Khaled Ahmed","year":"2020","unstructured":"Ahmed Khaled and Peter Richt\u00e1rik. 2020. Better theory for SGD in the nonconvex world. arXiv preprint arXiv:2002.03329 (2020)."},{"key":"e_1_3_2_1_14_1","volume-title":"International Conference on Machine Learning. PMLR, 5381--5393","author":"Koloskova Anastasia","unstructured":"Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, and Sebastian U. Stich. 2020. A unified theory of decentralized SGD with changing topology and local updates. In International Conference on Machine Learning. PMLR, 5381--5393."},{"key":"e_1_3_2_1_15_1","volume-title":"Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527","author":"Kone\u010dn\u1ef3 Jakub","year":"2016","unstructured":"Jakub Kone\u010dn\u1ef3, H. Brendan McMahan, Daniel Ramage, and Peter Richt\u00e1rik. 2016. Federated optimization: Distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527 (2016)."},{"key":"e_1_3_2_1_16_1","volume-title":"NIPS Private Multi-Party Machine Learning Workshop.","author":"Kone\u010dn\u1ef3 Jakub","year":"2016","unstructured":"Jakub Kone\u010dn\u1ef3, H. Brendan McMahan, Felix Yu, Peter Richt\u00e1rik, Ananda Theertha Suresh, and Dave Bacon. 2016. Federated learning: strategies for improving communication efficiency. In NIPS Private Multi-Party Machine Learning Workshop."},{"key":"e_1_3_2_1_17_1","unstructured":"Alex Krizhevsky Geoffrey Hinton et al. 2009. Learning multiple layers of features from tiny images. (2009)."},{"key":"e_1_3_2_1_18_1","volume-title":"Proceedings of the 37th International Conference on Machine Learning","volume":"119","author":"Malitsky Yura","year":"2020","unstructured":"Yura Malitsky and Konstantin Mishchenko. 2020. Adaptive Gradient Descent without Descent. In Proceedings of the 37th International Conference on Machine Learning, Vol. 119. PMLR, 6702--6712."},{"key":"e_1_3_2_1_19_1","volume-title":"Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. PMLR, 1273--1282","author":"McMahan H. Brendan","year":"2017","unstructured":"H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Ag\u00fcera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics. PMLR, 1273--1282."},{"key":"e_1_3_2_1_20_1","first-page":"17309","article-title":"Random Reshuffling: Simple Analysis with Vast Improvements","volume":"33","author":"Mishchenko Konstantin","year":"2020","unstructured":"Konstantin Mishchenko, Ahmed Khaled, and Peter Richt\u00e1rik. 2020. Random Reshuffling: Simple Analysis with Vast Improvements. Advances in Neural Information Processing Systems 33 (2020), 17309--17320.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_2_1_21_1","volume-title":"Proximal and Federated Random Reshuffling. arXiv preprint arXiv:2102.06704","author":"Mishchenko Konstantin","year":"2021","unstructured":"Konstantin Mishchenko, Ahmed Khaled, and Peter Richt\u00e1rik. 2021. Proximal and Federated Random Reshuffling. arXiv preprint arXiv:2102.06704 (2021)."},{"key":"e_1_3_2_1_22_1","unstructured":"Adam Paszke Sam Gross Soumith Chintala Gregory Chanan Edward Yang Zachary DeVito Zeming Lin Alban Desmaison Luca Antiga and Adam Lerer. 2017. Automatic differentiation in PyTorch. (2017)."},{"key":"e_1_3_2_1_23_1","volume-title":"Adaptive Federated Optimization. In International Conference on Learning Representations.","author":"Reddi Sashank J.","year":"2020","unstructured":"Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Sanjiv Kumar, and H. Brendan McMahan. 2020. Adaptive Federated Optimization. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_24_1","volume-title":"EF21: A new, simpler, theoretically better, and practically faster error feedback. arXiv preprint arXiv:2106.05203","author":"Richt\u00e1rik Peter","year":"2021","unstructured":"Peter Richt\u00e1rik, Igor Sokolov, and Ilyas Fatkhullin. 2021. EF21: A new, simpler, theoretically better, and practically faster error feedback. arXiv preprint arXiv:2106.05203 (2021)."},{"key":"e_1_3_2_1_25_1","volume-title":"On the Origin of Implicit Regularization in Stochastic Gradient Descent. In International Conference on Learning Representations.","author":"Smith Samuel L.","year":"2020","unstructured":"Samuel L. Smith, Benoit Dherin, David Barrett, and Soham De. 2020. On the Origin of Implicit Regularization in Stochastic Gradient Descent. In International Conference on Learning Representations."},{"key":"e_1_3_2_1_26_1","volume-title":"Stich and Sai Praneeth Karimireddy","author":"Sebastian","year":"2019","unstructured":"Sebastian U. Stich and Sai Praneeth Karimireddy. 2019. The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication. arXiv preprint arXiv:1909.05350 (2019)."},{"key":"e_1_3_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/s40305-020-00309--6"},{"key":"e_1_3_2_1_28_1","volume-title":"Advances in Neural Information Processing Systems","volume":"33","author":"Woodworth Blake E.","year":"2020","unstructured":"Blake E. Woodworth, Kumar Kshitij Patel, and Nati Srebro. 2020. Minibatch vs Local SGD for Heterogeneous Distributed Learning. In Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 6281--6292."},{"key":"e_1_3_2_1_29_1","volume-title":"Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond. arXiv preprint arXiv:2110.10342","author":"Yun Chulhee","year":"2021","unstructured":"Chulhee Yun, Shashank Rajput, and Suvrit Sra. 2021. Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond. arXiv preprint arXiv:2110.10342 (2021)."}],"event":{"name":"CoNEXT 2023: The 19th International Conference on emerging Networking EXperiments and Technologies","location":"Paris France","acronym":"CoNEXT 2023","sponsor":["SIGCOMM ACM Special Interest Group on Data Communication"]},"container-title":["Proceedings of the 4th International Workshop on Distributed Machine Learning"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3630048.3630187","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3630048.3630187","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,23]],"date-time":"2025-08-23T20:15:07Z","timestamp":1755980107000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3630048.3630187"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,5]]},"references-count":29,"alternative-id":["10.1145\/3630048.3630187","10.1145\/3630048"],"URL":"https:\/\/doi.org\/10.1145\/3630048.3630187","relation":{},"subject":[],"published":{"date-parts":[[2023,12,5]]},"assertion":[{"value":"2023-12-05","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}