{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,2]],"date-time":"2025-11-02T03:32:53Z","timestamp":1762054373812,"version":"build-2065373602"},"reference-count":71,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,5,26]],"date-time":"2022-05-26T00:00:00Z","timestamp":1653523200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,5,26]],"date-time":"2022-05-26T00:00:00Z","timestamp":1653523200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"EPFL Lausanne"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Distrib. Comput."],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Machine learning (ML) solutions are nowadays distributed, according to the so-called <jats:italic>server\/worker<\/jats:italic> architecture. One <jats:italic>server<\/jats:italic> holds the model parameters while several <jats:italic>workers<\/jats:italic> train the model. Clearly, such architecture is prone to various types of component failures, which can be all encompassed within the spectrum of a Byzantine behavior. Several approaches have been proposed recently to tolerate Byzantine workers. Yet all require trusting a central parameter server. We initiate in this paper the study of the <jats:italic>\u201cgeneral\u201d Byzantine-resilient<\/jats:italic> distributed machine learning problem where no individual component is trusted. In particular, we distribute the parameter server computation on several nodes. 
We show that this problem can be solved in an asynchronous system, despite the presence of <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\frac{1}{3}$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mfrac>\n                    <mml:mn>1<\/mml:mn>\n                    <mml:mn>3<\/mml:mn>\n                  <\/mml:mfrac>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> Byzantine parameter servers (i.e., <jats:inline-formula><jats:alternatives><jats:tex-math>$$n_{ps} &gt; 3f_{ps}+1$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:msub>\n                      <mml:mi>n<\/mml:mi>\n                      <mml:mrow>\n                        <mml:mi>ps<\/mml:mi>\n                      <\/mml:mrow>\n                    <\/mml:msub>\n                    <mml:mo>&gt;<\/mml:mo>\n                    <mml:mn>3<\/mml:mn>\n                    <mml:msub>\n                      <mml:mi>f<\/mml:mi>\n                      <mml:mrow>\n                        <mml:mi>ps<\/mml:mi>\n                      <\/mml:mrow>\n                    <\/mml:msub>\n                    <mml:mo>+<\/mml:mo>\n                    <mml:mn>1<\/mml:mn>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>) and <jats:inline-formula><jats:alternatives><jats:tex-math>$$\\frac{1}{3}$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mfrac>\n                    <mml:mn>1<\/mml:mn>\n                    <mml:mn>3<\/mml:mn>\n                  <\/mml:mfrac>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula> Byzantine workers (i.e., <jats:inline-formula><jats:alternatives><jats:tex-math>$$n_w &gt; 3f_w$$<\/jats:tex-math><mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\">\n                  <mml:mrow>\n                    <mml:msub>\n                      <mml:mi>n<\/mml:mi>\n                      <mml:mi>w<\/mml:mi>\n                    <\/mml:msub>\n                    <mml:mo>&gt;<\/mml:mo>\n                    <mml:mn>3<\/mml:mn>\n                    <mml:msub>\n                      <mml:mi>f<\/mml:mi>\n                      <mml:mi>w<\/mml:mi>\n                    <\/mml:msub>\n                  <\/mml:mrow>\n                <\/mml:math><\/jats:alternatives><\/jats:inline-formula>), which is asymptotically optimal. We present a new algorithm, <jats:italic>ByzSGD<\/jats:italic>, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes. The first, <jats:italic>scatter\/gather<\/jats:italic>, is a communication scheme whose goal is to bound the maximum drift among models on correct servers. The second, <jats:italic>distributed median contraction<\/jats:italic> (DMC), leverages the geometric properties of the median in high dimensional spaces to bring parameters within the correct servers back close to each other, ensuring safe and lively learning. The third, <jats:italic>Minimum-diameter averaging<\/jats:italic> (<jats:italic>MDA<\/jats:italic>), is a statistically-robust gradient aggregation rule whose goal is to tolerate Byzantine workers. 
<jats:italic>MDA<\/jats:italic> requires a loose bound on the variance of non-Byzantine gradient estimates, compared to existing alternatives [e.g., Krum (Blanchard et al., in: Neural information processing systems, pp 118-128, 2017)]. Interestingly, <jats:italic>ByzSGD<\/jats:italic> ensures Byzantine resilience without adding communication rounds (on a normal path), compared to vanilla non-Byzantine alternatives. <jats:italic>ByzSGD<\/jats:italic> requires, however, a larger number of messages which, we show, can be reduced if we assume synchrony. We implemented <jats:italic>ByzSGD<\/jats:italic> on top of both TensorFlow and PyTorch, and we report on our evaluation results. In particular, we show that <jats:italic>ByzSGD<\/jats:italic> guarantees convergence with around 32% overhead compared to vanilla SGD. Furthermore, we show that <jats:italic>ByzSGD<\/jats:italic>\u2019s throughput overhead is 24\u2013176% in the synchronous case and 28\u2013220% in the asynchronous case.<\/jats:p>","DOI":"10.1007\/s00446-022-00427-9","type":"journal-article","created":{"date-parts":[[2022,5,26]],"date-time":"2022-05-26T06:04:03Z","timestamp":1653545043000},"page":"305-331","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Genuinely distributed Byzantine machine learning"],"prefix":"10.1007","volume":"35","author":[{"given":"El-Mahdi","family":"El-Mhamdi","sequence":"first","affiliation":[]},{"given":"Rachid","family":"Guerraoui","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0898-0387","authenticated-orcid":false,"given":"Arsany","family":"Guirguis","sequence":"additional","affiliation":[]},{"given":"L\u00ea-Nguy\u00ean","family":"Hoang","sequence":"additional","affiliation":[]},{"given":"S\u00e9bastien","family":"Rouault","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,5,26]]},"reference":[{"key":"427_CR1","unstructured":"Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et\u00a0al. Tensorflow: a system for large-scale machine learning. In: 12th $$USENIX$$ Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265\u2013283 (2016)"},{"key":"427_CR2","doi-asserted-by":"crossref","unstructured":"Abraham, I., Amit, Y., Dolev, D.: Optimal resilience asynchronous approximate agreement. In: International Conference on Principles of Distributed Systems, pp. 229\u2013239. Springer (2004)","DOI":"10.1007\/11516798_17"},{"key":"427_CR3","unstructured":"Alain, G., Lamb, A., Sankar, C., Courville, A., Bengio, Y.: Variance reduction in SGD by distributed importance sampling (2015). arXiv preprint arXiv:1511.06481"},{"key":"427_CR4","unstructured":"Alistarh, D., Allen-Zhu, Z., Li, J.: Byzantine stochastic gradient descent. In: Advances in Neural Information Processing Systems, pp. 4613\u20134623 (2018)"},{"key":"427_CR5","unstructured":"Alistarh, D., Li, J., Tomioka, R., Vojnovic, M.: QSGD: randomized quantization for communication-optimal stochastic gradient descent (2016). arXiv preprint arXiv:1610.02132"},{"key":"427_CR6","unstructured":"Bartlett, P.L., Foster, D.J., Telgarsky, M.J.: Spectrally-normalized margin bounds for neural networks. In: Neural Information Processing Systems, pp. 6241\u20136250 (2017)"},{"key":"427_CR7","unstructured":"Baruch, G., Baruch, M., Goldberg, Y.: A little is enough: circumventing defenses for distributed learning. 
In: Advances in Neural Information Processing Systems, 32 (2019)"},{"key":"427_CR8","unstructured":"Bernstein, J., Zhao, J., Azizzadenesheli, K., Anandkumar, A.: SignSGD with majority vote is communication efficient and fault tolerant (2018). arXiv preprint arXiv:1810.05291"},{"key":"427_CR9","unstructured":"Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines (2012). arXiv preprint arXiv:1206.6389"},{"key":"427_CR10","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1016\/j.patcog.2018.07.023","volume":"84","author":"B Biggio","year":"2018","unstructured":"Biggio, B., Roli, F.: Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognit. 84, 317\u2013331 (2018)","journal-title":"Pattern Recognit."},{"key":"427_CR11","unstructured":"Blanchard, P., El\u00a0Mhamdi, E.M., Guerraoui, R., Stainer, J.: Machine learning with adversaries: byzantine tolerant gradient descent. In: Neural Information Processing Systems, pp. 118\u2013128 (2017)"},{"issue":"9","key":"427_CR12","first-page":"142","volume":"17","author":"L Bottou","year":"1998","unstructured":"Bottou, L.: Online learning and stochastic approximations. Online Learn. Neural Netw. 17(9), 142 (1998)","journal-title":"Online Learn. Neural Netw."},{"key":"427_CR13","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15260-3","volume-title":"Introduction to Reliable and Secure Distributed Programming","author":"C Cachin","year":"2011","unstructured":"Cachin, C., Guerraoui, R., Rodrigues, L.: Introduction to Reliable and Secure Distributed Programming. Springer, Berlin (2011)"},{"key":"427_CR14","unstructured":"Castro, M., Liskov, B., et al.: Practical Byzantine fault tolerance. In: USENIX Symposium on Operating Systems Design and Implementation, vol. 99, pp. 173\u2013186 (1999)"},{"key":"427_CR15","unstructured":"Chen, L., Wang, H., Charles, Z., Papailiopoulos, D.: Draco: byzantine-resilient distributed training via redundant gradients. In: International Conference on Machine Learning, pp. 902\u2013911 (2018)"},{"key":"427_CR16","unstructured":"Chilimbi, T.M., Suzue, Y., Apacible, J., Kalyanaraman, K.: Project adam: building an efficient and scalable deep learning training system. In: USENIX Symposium on Operating Systems Design and Implementation, vol.\u00a014, pp. 571\u2013582 (2014)"},{"key":"427_CR17","unstructured":"Damaskinos, G., El\u00a0Mhamdi, E.M., Guerraoui, R., Guirguis, A., Rouault, S.: Aggregathor: byzantine machine learning via robust gradient aggregation. In: The Conference on Systems and Machine Learning (MLSys) (2019)"},{"key":"427_CR18","unstructured":"Damaskinos, G., El\u00a0Mhamdi, E.M., Guerraoui, R., Patra, R., Taziki, M.: Asynchronous byzantine machine learning (the case of SGD). In: International Conference on Machine Learning, pp. 1153\u20131162 (2018)"},{"key":"427_CR19","unstructured":"Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding (2018). arXiv preprint arXiv:1810.04805"},{"key":"427_CR20","doi-asserted-by":"crossref","unstructured":"Diakonikolas, I., Kamath, G., Kane, D.M., Li, J., Moitra, A., Stewart, A.: Robustly learning a gaussian: getting optimal error, efficiently. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 
2683\u20132702 (2018)","DOI":"10.1137\/1.9781611975031.171"},{"issue":"3","key":"427_CR21","doi-asserted-by":"publisher","first-page":"499","DOI":"10.1145\/5925.5931","volume":"33","author":"D Dolev","year":"1986","unstructured":"Dolev, D., Lynch, N.A., Pinter, S.S., Stark, E.W., Weihl, W.E.: Reaching approximate agreement in the presence of faults. J. ACM JACM 33(3), 499\u2013516 (1986)","journal-title":"J. ACM JACM"},{"key":"427_CR22","unstructured":"El-Mhamdi, E.-M., Farhadkhani, S., Guerraoui, R., Guirguis, A., Hoang, L.N., Rouault, S.: Collaborative learning in the jungle. In: Advances in Neural Information Processing Systems (2021)"},{"key":"427_CR23","doi-asserted-by":"crossref","unstructured":"El-Mhamdi, E.-M., Guerraoui, R., Guirguis, A., Hoang, L.N., Rouault, S.: Genuinely distributed Byzantine machine learning. In: Proceedings of the 39th Symposium on Principles of Distributed Computing, pp. 355\u2013364 (2020)","DOI":"10.1145\/3382734.3405695"},{"key":"427_CR24","unstructured":"El-Mhamdi, E.-M., Guerraoui, R., Rouault, S.: The hidden vulnerability of distributed learning in byzantium. In: International Conference on Machine Learning, pp. 3521\u20133530 (2018)"},{"issue":"7639","key":"427_CR25","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1038\/nature21056","volume":"542","author":"A Esteva","year":"2017","unstructured":"Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115 (2017)","journal-title":"Nature"},{"key":"427_CR26","doi-asserted-by":"crossref","unstructured":"Fekete, A.D.: Asymptotically optimal algorithms for approximate agreement. In: Proceedings of the Fifth Annual ACM Symposium on Principles of Distributed Computing, pp. 73\u201387 (1986)","DOI":"10.1145\/41840.41846"},{"key":"427_CR27","doi-asserted-by":"crossref","unstructured":"Fekete, A.D.: Asynchronous approximate agreement. In: Proceedings of the Sixth Annual ACM Symposium on Principles of Distributed Computing, pp. 64\u201376 (1987)","DOI":"10.1145\/41840.41846"},{"issue":"2","key":"427_CR28","doi-asserted-by":"publisher","first-page":"374","DOI":"10.1145\/3149.214121","volume":"32","author":"MJ Fischer","year":"1985","unstructured":"Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM (JACM) 32(2), 374\u2013382 (1985)","journal-title":"J. ACM (JACM)"},{"key":"427_CR29","unstructured":"Gilmer, J., Metz, L., Faghri, F., et\u00a0al. Adversarial spheres (2018). arXiv preprint arXiv:1801.02774"},{"key":"427_CR30","unstructured":"Grid5000. Grid5000. https:\/\/www.grid5000.fr\/"},{"key":"427_CR31","doi-asserted-by":"crossref","unstructured":"Guerraoui, R., Guirguis, A., Plassmann, J., Ragot, A., Rouault, S.: Garfield: system support for Byzantine machine learning. In: 2021 51st Annual IEEE\/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 39\u201351. IEEE (2021)","DOI":"10.1109\/DSN48987.2021.00021"},{"key":"427_CR32","unstructured":"Guo, S., Zhang, T., Xie, X., Ma, L., Xiang, T., Liu, Y.: Towards byzantine-resilient learning in decentralized systems (2020). arXiv preprint arXiv:2002.08569"},{"key":"427_CR33","unstructured":"Hashemi, H., Wang, Y., Guo, C., Annavaram, M.: Byzantine-robust and privacy-preserving framework for FEDML (2021). 
arXiv preprint arXiv:2105.02295"},{"key":"427_CR34","unstructured":"He, L., Karimireddy, S.P., Jaggi, M.: Secure byzantine-robust machine learning (2020). arXiv preprint arXiv:2006.04747"},{"key":"427_CR35","doi-asserted-by":"crossref","unstructured":"Hecht-Nielsen, R.: Theory of the backpropagation neural network. In: Neural Networks for Perception, pp. 65\u201393. Elsevier (1992)","DOI":"10.1016\/B978-0-12-741252-8.50010-8"},{"key":"427_CR36","unstructured":"Hsieh, K., Harlap, A., Vijaykumar, N., et\u00a0al.: Gaia: Geo-distributed machine learning approaching LAN speeds. In: USENIX Symposium on Networked Systems Design and Implementation, pp. 629\u2013647 (2017)"},{"key":"427_CR37","unstructured":"Karimireddy, S.P., He, L., Jaggi, M.: Byzantine-robust learning on heterogeneous datasets via resampling (2020). arXiv preprint arXiv:2006.09365"},{"key":"427_CR38","unstructured":"Karimireddy, S.P., He, L., Jaggi, M.: Learning from history for byzantine robust optimization. In: Meila, M., Zhang, T., (eds) Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18\u201324 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, pp. 5311\u20135319. PMLR (2021)"},{"key":"427_CR39","unstructured":"Kim, L. How many ads does google serve in a day? (2012). http:\/\/goo.gl\/oIidXO"},{"key":"427_CR40","unstructured":"Kone\u010dn\u1ef3, J., McMahan, B., Ramage, D.: Federated optimization: distributed optimization beyond the datacenter (2015). arXiv preprint arXiv:1511.03575"},{"key":"427_CR41","unstructured":"Krizhevsky, A., Nair, V., Hinton, G.: Cifar dataset. https:\/\/www.cs.toronto.edu\/~kriz\/cifar.html"},{"issue":"3","key":"427_CR42","doi-asserted-by":"publisher","first-page":"382","DOI":"10.1145\/357172.357176","volume":"4","author":"L Lamport","year":"1982","unstructured":"Lamport, L., Shostak, R., Pease, M.: The Byzantine generals problem. ACM Trans. Program. Lang. Syst. TOPLAS 4(3), 382\u2013401 (1982)","journal-title":"ACM Trans. Program. Lang. Syst. TOPLAS"},{"issue":"7553","key":"427_CR43","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436\u2013444 (2015)","journal-title":"Nature"},{"key":"427_CR44","unstructured":"Lecunn, Y.: MNIST dataset (1998). http:\/\/yann.lecun.com\/exdb\/mnist\/"},{"key":"427_CR45","unstructured":"Li, M., Andersen, D.G., Park, J.W., et\u00a0al.: Scaling distributed machine learning with the parameter server. In: 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 583\u2013598 (2014)"},{"key":"427_CR46","unstructured":"Li, M., Zhou, L., Yang, Z., et\u00a0al.: Parameter server for distributed machine learning. In: Big Learning NeurIPS Workshop, vol.\u00a06, pp.\u00a02 (2013)"},{"key":"427_CR47","doi-asserted-by":"crossref","unstructured":"Liu, S., Gupta, N., Vaidya, N.H.: Approximate byzantine fault-tolerance in distributed optimization. In: Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing, pp. 379\u2013389 (2021)","DOI":"10.1145\/3465084.3467902"},{"issue":"6","key":"427_CR48","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1007\/s00446-014-0240-5","volume":"28","author":"H Mendes","year":"2015","unstructured":"Mendes, H., Herlihy, M., Vaidya, N.H., Garg, V.K.: Multidimensional agreement in Byzantine systems. Distrib. Comput. 28(6), 423\u2013441 (2015)","journal-title":"Distrib. 
Comput."},{"issue":"1","key":"427_CR49","first-page":"1235","volume":"17","author":"X Meng","year":"2016","unstructured":"Meng, X., Bradley, J., Yavuz, B., et al.: Mllib: machine learning in apache spark. J. Mach. Learn. Res. 17(1), 1235\u20131241 (2016)","journal-title":"J. Mach. Learn. Res."},{"key":"427_CR50","doi-asserted-by":"crossref","unstructured":"Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Asia Conference on Computer and Communications Security, pp. 506\u2013519 (2017)","DOI":"10.1145\/3052973.3053009"},{"key":"427_CR51","first-page":"8026","volume":"32","author":"A Paszke","year":"2019","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al.: Pytorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8026\u20138037 (2019)","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"427_CR52","unstructured":"Rajput, S., Wang, H., Charles, Z., Papailiopoulos, D.: Detox: a redundancy-based framework for faster and more robust gradient aggregation. In: Advances in Neural Information Processing Systems, vol. 32 (2019)"},{"key":"427_CR53","doi-asserted-by":"crossref","unstructured":"Rosasco, L., De Vito, E., Caponnetto, A., Piana, M., Verri, A.: Are loss functions all the same? Neural Comput. 16(5), 1063\u20131076 (2004)","DOI":"10.1162\/089976604773135104"},{"key":"427_CR54","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1007\/978-94-009-5438-0_20","volume":"8","author":"PJ Rousseeuw","year":"1985","unstructured":"Rousseeuw, P.J.: Multivariate estimation with high breakdown point. Math. Stat. Appl. 8, 283\u2013297 (1985)","journal-title":"Math. Stat. Appl."},{"issue":"6088","key":"427_CR55","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1038\/323533a0","volume":"323","author":"DE Rumelhart","year":"1986","unstructured":"Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533\u2013536 (1986)","journal-title":"Nature"},{"issue":"4","key":"427_CR56","doi-asserted-by":"publisher","first-page":"299","DOI":"10.1145\/98163.98167","volume":"22","author":"FB Schneider","year":"1990","unstructured":"Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. 22(4), 299\u2013319 (1990)","journal-title":"ACM Comput. Surv."},{"key":"427_CR57","doi-asserted-by":"publisher","first-page":"2168","DOI":"10.1109\/JSAC.2020.3041404","volume":"39","author":"J So","year":"2020","unstructured":"So, J., G\u00fcler, B., Avestimehr, A.S.: Byzantine-resilient secure federated learning. IEEE J. Sel. Areas Commun. 39, 2168\u20132181 (2020)","journal-title":"IEEE J. Sel. Areas Commun."},{"issue":"1","key":"427_CR58","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929\u20131958 (2014)","journal-title":"J. Mach. Learn. Res."},{"key":"427_CR59","unstructured":"Su, L.: Defending distributed systems against adversarial attacks: consensus, consensus-based learning, and statistical learning. 
PhD thesis, University of Illinois at Urbana-Champaign (2017)"},{"issue":"9","key":"427_CR60","doi-asserted-by":"publisher","first-page":"3758","DOI":"10.1109\/TAC.2019.2951686","volume":"65","author":"L Su","year":"2019","unstructured":"Su, L., Shahrampour, S.: Finite-time guarantees for byzantine-resilient distributed state estimation with noisy measurements. IEEE Trans. Autom. Control 65(9), 3758\u20133771 (2019)","journal-title":"IEEE Trans. Autom. Control"},{"key":"427_CR61","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998\u20136008 (2017)"},{"key":"427_CR62","unstructured":"Virmaux, A., Scaman, K.: Lipschitz regularity of deep neural networks: analysis and efficient estimation. In: Advances in Neural Information Processing Systems, pp. 3835\u20133844 (2018)"},{"key":"427_CR63","unstructured":"Vyavahare, P., Su, L., Vaidya, N.H.: Distributed learning with adversarial agents under relaxed network condition (2019). arXiv preprint arXiv:1901.01943"},{"key":"427_CR64","unstructured":"Xiao, H., Biggio, B., Brown, G., Fumera, G., Eckert, C., Roli, F.: Is feature selection secure against training data poisoning? In: International Conference on Machine Learning, pp. 1689\u20131698 (2015)"},{"key":"427_CR65","unstructured":"Xie, C., Koyejo, and O., Gupta, I.: Generalized Byzantine-tolerant SGD (2018). arXiv preprint arXiv:1802.10116"},{"key":"427_CR66","unstructured":"Xie, C., Koyejo, O., Gupta, I.: Phocas: dimensional byzantine-resilient stochastic gradient descent (2018). arXiv preprint arXiv:1805.09682"},{"key":"427_CR67","unstructured":"Xie, C., Koyejo, O., Gupta, I.: Zeno: Byzantine-suspicious stochastic gradient descent (2018). arXiv preprint arXiv:1805.10032"},{"key":"427_CR68","doi-asserted-by":"crossref","unstructured":"Xu, H., Ho, C-Y., Abdelmoniem, A.M., Dutta, A., Bergou, E.H., Karatsenidis, K., Canini, M., Kalnis, P.: Grace: a compressed communication framework for distributed machine learning. In 2021 IEEE 41st international conference on distributed computing systems (ICDCS), pp. 561\u2013572. IEEE (2021)","DOI":"10.1109\/ICDCS51616.2021.00060"},{"key":"427_CR69","doi-asserted-by":"crossref","unstructured":"Yang, H., Zhang, X., Fang, M., Liu, J.: Byzantine-resilient stochastic gradient descent for distributed learning: a lipschitz-inspired coordinate-wise median approach. In: 2019 IEEE 58th Conference on Decision and Control (CDC), pp. 5832\u20135837. IEEE (2019)","DOI":"10.1109\/CDC40024.2019.9029245"},{"key":"427_CR70","unstructured":"Zhang, H., Zheng, Z., Xu, S., Dai, W., Ho, O., Liang, X., Hu, Z., Wei, J., Xie, P., Xing, E.P.: Poseidon: an efficient communication architecture for distributed deep learning on GPU clusters. In: USENIX Annual Technical Conference, pp. 181\u2013193 (2017)"},{"key":"427_CR71","unstructured":"Zhang, S., Choromanska, A.E., LeCun, Y.: Deep learning with elastic averaging SGD. In: Neural Information Processing Systems, pp. 
685\u2013693 (2015)"}],"container-title":["Distributed Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00446-022-00427-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00446-022-00427-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00446-022-00427-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,22]],"date-time":"2022-07-22T06:05:31Z","timestamp":1658469931000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00446-022-00427-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,26]]},"references-count":71,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["427"],"URL":"https:\/\/doi.org\/10.1007\/s00446-022-00427-9","relation":{},"ISSN":["0178-2770","1432-0452"],"issn-type":[{"type":"print","value":"0178-2770"},{"type":"electronic","value":"1432-0452"}],"subject":[],"published":{"date-parts":[[2022,5,26]]},"assertion":[{"value":"1 December 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 April 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 May 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}