{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T12:19:16Z","timestamp":1768738756424,"version":"3.49.0"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"8-9","license":[{"start":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T00:00:00Z","timestamp":1561420800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T00:00:00Z","timestamp":1561420800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2019,9]]},"DOI":"10.1007\/s10994-019-05802-5","type":"journal-article","created":{"date-parts":[[2019,6,25]],"date-time":"2019-06-25T20:58:39Z","timestamp":1561496319000},"page":"1523-1560","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Efficient learning with robust gradient descent"],"prefix":"10.1007","volume":"108","author":[{"given":"Matthew J.","family":"Holland","sequence":"first","affiliation":[]},{"given":"Kazushi","family":"Ikeda","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2019,6,25]]},"reference":[{"key":"5802_CR1","unstructured":"Abramowitz, M., & Stegun, I. A. (1964). Handbook of mathematical functions with formulas, graphs, and mathematical tables, National Bureau of Standards Applied Mathematics Series, vol\u00a055. US National Bureau of Standards."},{"issue":"4","key":"5802_CR2","doi-asserted-by":"publisher","first-page":"615","DOI":"10.1145\/263867.263927","volume":"44","author":"N Alon","year":"1997","unstructured":"Alon, N., Ben-David, S., Cesa-Bianchi, N., & Haussler, D. (1997). Scale-sensitive dimensions, uniform convergence, and learnability. Journal of the ACM, 44(4), 615\u2013631.","journal-title":"Journal of the ACM"},{"key":"5802_CR3","volume-title":"Probability and measure theory","author":"RB Ash","year":"2000","unstructured":"Ash, R. B., & Doleans-Dade, C. (2000). Probability and measure theory. Cambridge: Academic Press."},{"issue":"3","key":"5802_CR4","doi-asserted-by":"publisher","first-page":"434","DOI":"10.1006\/jcss.1996.0033","volume":"52","author":"PL Bartlett","year":"1996","unstructured":"Bartlett, P. L., Long, P. M., & Williamson, R. C. (1996). Fat-shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52(3), 434\u2013452.","journal-title":"Journal of Computer and System Sciences"},{"key":"5802_CR5","first-page":"463","volume":"3","author":"PL Bartlett","year":"2003","unstructured":"Bartlett, P. L., & Mendelson, S. (2003). Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3, 463\u2013482.","journal-title":"Journal of Machine Learning Research"},{"issue":"6","key":"5802_CR6","doi-asserted-by":"publisher","first-page":"2507","DOI":"10.1214\/15-AOS1350","volume":"43","author":"C Brownlees","year":"2015","unstructured":"Brownlees, C., Joly, E., & Lugosi, G. (2015). Empirical risk minimization for heavy-tailed losses. Annals of Statistics, 43(6), 2507\u20132536.","journal-title":"Annals of Statistics"},{"key":"5802_CR7","unstructured":"Catoni, O. (2009). High confidence estimates of the mean of heavy-tailed real random variables. arXiv preprint \n                    arXiv:0909.5366\n                    \n                  ."},{"issue":"4","key":"5802_CR8","doi-asserted-by":"publisher","first-page":"1148","DOI":"10.1214\/11-AIHP454","volume":"48","author":"O Catoni","year":"2012","unstructured":"Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Annales de l\u2019Institut Henri Poincar\u00e9, Probabilit\u00e9s et Statistiques, 48(4), 1148\u20131185.","journal-title":"Annales de l\u2019Institut Henri Poincar\u00e9, Probabilit\u00e9s et Statistiques"},{"key":"5802_CR9","doi-asserted-by":"crossref","unstructured":"Chen, Y., Su, L., & Xu, J. (2017a). Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. arXiv preprint \n                    arXiv:1705.05491\n                    \n                  .","DOI":"10.1145\/3219617.3219655"},{"issue":"2","key":"5802_CR10","first-page":"44","volume":"1","author":"Y Chen","year":"2017","unstructured":"Chen, Y., Su, L., & Xu, J. (2017b). Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 1(2), 44.","journal-title":"Proceedings of the ACM on Measurement and Analysis of Computing Systems"},{"key":"5802_CR11","unstructured":"Daniely, A., & Shalev-Shwartz, S. (2014). Optimal learners for multiclass problems. In 27th annual conference on learning theory, proceedings of machine learning research (vol. 35, pp. 287\u2013316)."},{"key":"5802_CR12","unstructured":"Devroye, L., Lerasle, M., Lugosi, G., & Oliveira, R. I. (2015). Sub-Gaussian mean estimators. arXiv preprint \n                    arXiv:1509.05845\n                    \n                  ."},{"key":"5802_CR13","first-page":"2121","volume":"12","author":"J Duchi","year":"2011","unstructured":"Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121\u20132159.","journal-title":"Journal of Machine Learning Research"},{"key":"5802_CR14","first-page":"3576","volume":"29","author":"V Feldman","year":"2016","unstructured":"Feldman, V. (2016). Generalization of ERM in stochastic convex optimization: The dimension strikes back. Advances in Neural Information Processing Systems, 29, 3576\u20133584.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5802_CR15","volume-title":"Extreme values in finance, telecommunications, and the environment","year":"2003","unstructured":"Finkenst\u00e4dt, B., & Rootz\u00e9n, H. (Eds.). (2003). Extreme values in finance, telecommunications, and the environment. Boca Raton: CRC Press."},{"key":"5802_CR16","unstructured":"Frostig, R., Ge, R., Kakade, S. M., & Sidford, A. (2015). Competing with the empirical risk minimizer in a single pass. arXiv preprint \n                    arXiv:1412.6606\n                    \n                  ."},{"key":"5802_CR17","unstructured":"Holland, M. J., & Ikeda, K. (2017a). Efficient learning with robust gradient descent. arXiv preprint \n                    arXiv:1706.00182\n                    \n                  ."},{"issue":"9","key":"5802_CR18","doi-asserted-by":"publisher","first-page":"1643","DOI":"10.1007\/s10994-017-5653-5","volume":"106","author":"MJ Holland","year":"2017","unstructured":"Holland, M. J., & Ikeda, K. (2017b). Robust regression using biased objectives. Machine Learning, 106(9), 1643\u20131679. \n                    https:\/\/doi.org\/10.1007\/s10994-017-5653-5\n                    \n                  .","journal-title":"Machine Learning"},{"issue":"18","key":"5802_CR19","first-page":"1","volume":"17","author":"D Hsu","year":"2016","unstructured":"Hsu, D., & Sabato, S. (2016). Loss minimization and parameter estimation with heavy tails. Journal of Machine Learning Research, 17(18), 1\u201340.","journal-title":"Journal of Machine Learning Research"},{"key":"5802_CR20","doi-asserted-by":"publisher","DOI":"10.1002\/9780470434697","volume-title":"Robust statistics","author":"PJ Huber","year":"2009","unstructured":"Huber, P. J., & Ronchetti, E. M. (2009). Robust statistics (2nd ed.). New York: Wiley.","edition":"2"},{"key":"5802_CR21","first-page":"315","volume":"26","author":"R Johnson","year":"2013","unstructured":"Johnson, R., & Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems, 26, 315\u2013323.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5802_CR22","doi-asserted-by":"publisher","first-page":"464","DOI":"10.1016\/S0022-0000(05)80062-5","volume":"48","author":"MJ Kearns","year":"1994","unstructured":"Kearns, M. J., & Schapire, R. E. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48, 464\u2013497.","journal-title":"Journal of Computer and System Sciences"},{"key":"5802_CR23","unstructured":"Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint \n                    arXiv:1412.6980\n                    \n                  ."},{"key":"5802_CR24","unstructured":"Kolmogorov, A. N. (1993). $$\\varepsilon $$-entropy and $$\\varepsilon $$-capacity of sets in functional spaces. In A. N. Shiryayev (Ed.), Selected works of A.\u00a0N.\u00a0Kolmogorov, volume III: Information theory and the theory of algorithms (pp. 86\u2013170). Berlin: Springer."},{"key":"5802_CR25","first-page":"2663","volume":"25","author":"N Le Roux","year":"2012","unstructured":"Le Roux, N., Schmidt, M., & Bach, F. R. (2012). A stochastic gradient method with an exponential convergence rate for finite training sets. Advances in Neural Information Processing Systems, 25, 2663\u20132671.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5802_CR26","unstructured":"Lecu\u00e9, G., & Lerasle, M.(2017). Learning from MOM\u2019s principles. arXiv preprint \n                    arXiv:1701.01961\n                    \n                  ."},{"key":"5802_CR27","unstructured":"Lecu\u00e9, G., Lerasle, M., & Mathieu, T. (2018). Robust classification via MOM minimization. arXiv preprint \n                    arXiv:1808.03106\n                    \n                  ."},{"key":"5802_CR28","unstructured":"Lerasle, M., & Oliveira, R. I. (2011). Robust empirical mean estimators. arXiv preprint \n                    arXiv:1112.3914\n                    \n                  ."},{"key":"5802_CR29","first-page":"4556","volume":"29","author":"J Lin","year":"2016","unstructured":"Lin, J., & Rosasco, L. (2016). Optimal learning for multi-pass stochastic gradient methods. Advances in Neural Information Processing Systems, 29, 4556\u20134564.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"5802_CR30","volume-title":"Optimization by vector space methods","author":"DG Luenberger","year":"1969","unstructured":"Luenberger, D. G. (1969). Optimization by vector space methods. New York: Wiley."},{"key":"5802_CR31","unstructured":"Lugosi, G., & Mendelson, S. (2016). Risk minimization by median-of-means tournaments. arXiv preprint \n                    arXiv:1608.00757\n                    \n                  ."},{"key":"5802_CR32","unstructured":"Minsker, S., & Strawn, N. (2017). Distributed statistical estimation and rates of convergence in normal approximation. arXiv preprint \n                    arXiv:1704.02658\n                    \n                  ."},{"issue":"4","key":"5802_CR33","doi-asserted-by":"publisher","first-page":"2308","DOI":"10.3150\/14-BEJ645","volume":"21","author":"S Minsker","year":"2015","unstructured":"Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli, 21(4), 2308\u20132335.","journal-title":"Bernoulli"},{"key":"5802_CR34","unstructured":"Murata, T., & Suzuki, T. (2016). Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems. arXiv preprint \n                    arXiv:1603.02412\n                    \n                  ."},{"key":"5802_CR35","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4419-8853-9","volume-title":"Introductory lectures on convex optimization: A basic course","author":"Y Nesterov","year":"2004","unstructured":"Nesterov, Y. (2004). Introductory lectures on convex optimization: A basic course. Berlin: Springer."},{"key":"5802_CR36","series-title":"Springer Series in Operations Research","doi-asserted-by":"publisher","DOI":"10.1007\/b98874","volume-title":"Numerical optimization","author":"J Nocedal","year":"1999","unstructured":"Nocedal, J., & Wright, S. (1999). Numerical optimization., Springer Series in Operations Research Berlin: Springer."},{"key":"5802_CR37","unstructured":"Prasad, A., Suggala, A. S., Balakrishnan, S., & Ravikumar, P. (2018). Robust estimation via robust gradient estimation. arXiv preprint \n                    arXiv:1802.06485\n                    \n                  ."},{"key":"5802_CR38","unstructured":"Rakhlin, A., Shamir, O., & Sridharan, K. (2012). Making gradient descent optimal for strongly convex stochastic optimization. In Proceedings of the 29th international conference on machine learning (pp. 449\u2013456)."},{"key":"5802_CR39","first-page":"567","volume":"14","author":"S Shalev-Shwartz","year":"2013","unstructured":"Shalev-Shwartz, S., & Zhang, T. (2013). Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14, 567\u2013599.","journal-title":"Journal of Machine Learning Research"},{"issue":"6","key":"5802_CR40","doi-asserted-by":"publisher","first-page":"544","DOI":"10.1080\/00029890.2001.11919782","volume":"108","author":"E Talvila","year":"2001","unstructured":"Talvila, E. (2001). Necessary and sufficient conditions for differentiating under the integral sign. American Mathematical Monthly, 108(6), 544\u2013548.","journal-title":"American Mathematical Monthly"},{"key":"5802_CR41","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511802256","volume-title":"Asymptotic statistics","author":"AW van der Vaart","year":"1998","unstructured":"van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press."},{"issue":"4","key":"5802_CR42","doi-asserted-by":"publisher","first-page":"1423","DOI":"10.1073\/pnas.97.4.1423","volume":"97","author":"Y Vardi","year":"2000","unstructured":"Vardi, Y., & Zhang, C. H. (2000). The multivariate $$L_{1}$$-median and associated data depth. Proceedings of the National Academy of Sciences, 97(4), 1423\u20131426.","journal-title":"Proceedings of the National Academy of Sciences"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-019-05802-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s10994-019-05802-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-019-05802-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,6,25]],"date-time":"2020-06-25T00:12:03Z","timestamp":1593043923000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s10994-019-05802-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,6,25]]},"references-count":42,"journal-issue":{"issue":"8-9","published-print":{"date-parts":[[2019,9]]}},"alternative-id":["5802"],"URL":"https:\/\/doi.org\/10.1007\/s10994-019-05802-5","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,6,25]]},"assertion":[{"value":"13 September 2018","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 April 2019","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 June 2019","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}