{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T14:44:40Z","timestamp":1775745880410,"version":"3.50.1"},"reference-count":46,"publisher":"Emerald","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,3,29]]},"abstract":"<jats:p>Online learning is a well established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It also became of great interest to practitioners due the recent emergence of large scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey.<\/jats:p>","DOI":"10.1561\/2200000018","type":"journal-article","created":{"date-parts":[[2012,3,29]],"date-time":"2012-03-29T09:23:33Z","timestamp":1333013013000},"page":"107-194","source":"Crossref","is-referenced-by-count":977,"title":["Online Learning and Online Convex Optimization"],"prefix":"10.1561","volume":"4","author":[{"given":"Shai","family":"Shalev-Shwartz","sequence":"first","affiliation":[{"name":"Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem ,","place":["Israel"]}]}],"member":"140","published-online":{"date-parts":[[2012,3,29]]},"reference":[{"key":"2026033014014596200_ref001","article-title":"Optimal strategies and minimax lower bounds for online convex games","author":"Abernethy","year":"2008","journal-title":"Proceedings of the 21st Annual Conference on Learning Theory (COLT)"},{"issue":"3","key":"2026033014014596200_ref002","article-title":"Competing in the dark: An efficient algorithm for bandit linear optimization","author":"Abernethy","year":"2008","journal-title":"Proceedings of the 21st Annual Conference on Learning Theory (COLT)"},{"issue":"3","key":"2026033014014596200_ref003","doi-asserted-by":"crossref","first-page":"382","DOI":"10.4153\/CJM-1954-037-2","article-title":"The relaxation method for linear inequalities","volume":"6","author":"Agmon","year":"1954","journal-title":"Canadian Journal of Mathematics"},{"issue":"1","key":"2026033014014596200_ref004","doi-asserted-by":"crossref","first-page":"48","DOI":"10.1137\/S0097539701398375","article-title":"The nonstochastic multiarmed bandit problem","volume":"32","author":"Auer","year":"2002","journal-title":"SIAM Journal on Computing"},{"issue":"2","key":"2026033014014596200_ref005","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1023\/A:1007472513967","article-title":"Tracking the best disjunction","volume":"32","author":"Auer","year":"1998","journal-title":"Machine Learning"},{"issue":"3","key":"2026033014014596200_ref006","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1023\/A:1010896012157","article-title":"Relative loss bounds for on-line density estimation with the exponential family of distributions","volume":"43","author":"Azoury","year":"2001","journal-title":"Machine Learning"},{"key":"2026033014014596200_ref007","article-title":"Agnostic online learning","author":"Ben-David","year":"2009","journal-title":"Proceedings of the 22nd Annual Conference on Learning Theory (COLT)"},{"key":"2026033014014596200_ref008","doi-asserted-by":"crossref","first-page":"306","DOI":"10.1007\/BFb0029575","volume-title":"Online Algorithms: The State of the Art","author":"Blum","year":"1998"},{"key":"2026033014014596200_ref009","doi-asserted-by":"crossref","DOI":"10.1007\/978-0-387-31256-9","volume-title":"Convex Analysis and Nonlinear Optimization: Theory and Examples","author":"Borwein","year":"2006"},{"issue":"9","key":"2026033014014596200_ref010","doi-asserted-by":"crossref","first-page":"2050","DOI":"10.1109\/TIT.2004.833339","article-title":"On the generalization ability of on-line learning algorithms","volume":"50","author":"Cesa-Bianchi","year":"2004","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033014014596200_ref011","article-title":"Improved risk tail bounds for on-line algorithms","author":"Cesa-Bianchi","year":"2006","journal-title":"Advances in Neural Information Processing Systems 18 (NIPS 2005)"},{"key":"2026033014014596200_ref012","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511546921","volume-title":"Prediction, Learning, and Games","author":"Cesa-Bianchi","year":"2006"},{"key":"2026033014014596200_ref013","first-page":"263","article-title":"Behavior of sequential predictors of binary sequences","author":"Cover","year":"1965","journal-title":"Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes"},{"key":"2026033014014596200_ref014","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511801389","volume-title":"An Introduction to Support Vector Machines and Other Kernel-based Learning Methods","author":"Cristianini","year":"2000"},{"key":"2026033014014596200_ref015","first-page":"345","article-title":"The price of bandit information for online optimization","author":"Dani","year":"2008","journal-title":"Advances in Neural Information Processing Systems 20"},{"key":"2026033014014596200_ref016","first-page":"385","article-title":"Online convex optimization in the bandit setting: Gradient descent without a gradient","author":"Flaxman","year":"2005","journal-title":"Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)"},{"issue":"3","key":"2026033014014596200_ref017","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1023\/A:1007662407062","article-title":"Large margin classification using the perceptron algorithm","volume":"37","author":"Freund","year":"1999","journal-title":"Machine Learning"},{"issue":"3","key":"2026033014014596200_ref018","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1023\/A:1026319107706","article-title":"The robustness of the p-norm algorithms","volume":"53","author":"Gentile","year":"2003","journal-title":"Machine Learning"},{"key":"2026033014014596200_ref019","doi-asserted-by":"crossref","DOI":"10.1145\/307400.307405","article-title":"The robustness of the p-norm algorithms","author":"Gentile","year":"1999","journal-title":"Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT)"},{"key":"2026033014014596200_ref020","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1145\/307400.307410","article-title":"Regret bounds for prediction problems","author":"Gordon","year":"1999","journal-title":"Proceedings of the 12th Annual Conference on Computational Learning Theory (COLT)"},{"issue":"3","key":"2026033014014596200_ref021","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1023\/A:1010844028087","article-title":"General convergence results for linear discriminant updates","volume":"43","author":"Grove","year":"2001","journal-title":"Machine Learning"},{"key":"2026033014014596200_ref022","volume-title":"Optimization for Machine Learning","author":"Hazan","year":"2011"},{"issue":"3","key":"2026033014014596200_ref023","doi-asserted-by":"crossref","first-page":"551","DOI":"10.1006\/jcss.1995.1044","article-title":"On weak learning","volume":"50","author":"Helmbold","year":"1995","journal-title":"Journal of Computer and System Sciences"},{"key":"2026033014014596200_ref024","first-page":"1865","article-title":"Regularization techniques for learning with matrices","volume":"13","author":"Kakade","year":"2012","journal-title":"Journal of Machine Learning Research"},{"issue":"3","key":"2026033014014596200_ref025","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1016\/j.jcss.2004.10.016","article-title":"Efficient algorithms for online decision problems","volume":"71","author":"Kalai","year":"2005","journal-title":"Journal of Computer and System Sciences"},{"issue":"1","key":"2026033014014596200_ref026","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1006\/inco.1996.2612","article-title":"Exponentiated gradient versus gradient descent for linear predictors","volume":"132","author":"Kivinen","year":"1997","journal-title":"Information and Computation"},{"key":"2026033014014596200_ref027","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1145\/225058.225121","article-title":"Additive versus exponentiated gradient updates for linear prediction","author":"Kivinen","year":"1995","journal-title":"Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing (STOC)"},{"issue":"3","key":"2026033014014596200_ref028","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1023\/A:1017938623079","article-title":"Relative loss bounds for multidimensional regression problems","volume":"45","author":"Kivinen","year":"2001","journal-title":"Machine Learning"},{"issue":"4","key":"2026033014014596200_ref029","doi-asserted-by":"crossref","first-page":"285","DOI":"10.1023\/A:1022869011914","article-title":"Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm","volume":"2","author":"Littlestone","year":"1988","journal-title":"Machine Learning"},{"key":"2026033014014596200_ref030","doi-asserted-by":"crossref","first-page":"269","DOI":"10.1016\/B978-0-08-094829-4.50022-2","article-title":"From on-line to batch learning","author":"Littlestone","year":"1989","journal-title":"Proceedings of the Second Annual Workshop on Computational Learning Theory (COLT)"},{"key":"2026033014014596200_ref031","volume-title":"Ph.D. dissertation","author":"Littlestone","year":"1989"},{"issue":"2","key":"2026033014014596200_ref032","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1006\/inco.1994.1009","article-title":"The weighted majority algorithm","volume":"108","author":"Littlestone","year":"1994","journal-title":"Information and Computation"},{"key":"2026033014014596200_ref033","volume-title":"Perceptrons: An Introduction to Computational Geometry","author":"Minsky","year":"1969"},{"key":"2026033014014596200_ref034","article-title":"Lecture Notes on Online Learning","author":"Rakhlin","year":"2009"},{"key":"2026033014014596200_ref035","first-page":"1984","article-title":"Online learning: Random averages, combinatorial parameters, and learnability","author":"Rakhlin","year":"2010","journal-title":"Advances in Neural Information Processing Systems 23 (NIPS 2010)"},{"issue":"5","key":"2026033014014596200_ref036","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1090\/S0002-9904-1952-09620-8","article-title":"Some aspects of the sequential design of experiments","volume":"58","author":"Robbins","year":"1952","journal-title":"Bulletin of the American Mathematical Society"},{"issue":"6","key":"2026033014014596200_ref037","doi-asserted-by":"crossref","first-page":"386","DOI":"10.1037\/h0042519","article-title":"The perceptron: A probabilistic model for information storage and organization in the brain","volume":"65","author":"Rosenblatt","year":"1958","journal-title":"Psychological Review"},{"key":"2026033014014596200_ref038","volume-title":"Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond","author":"Sch\u00f6lkopf","year":"2002"},{"key":"2026033014014596200_ref039","article-title":"Online learning: Theory, algorithms, and applications","author":"Shalev-Shwartz","year":"2007","journal-title":"Ph.D. dissertation"},{"issue":"2-3","key":"2026033014014596200_ref040","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1007\/s10994-007-5014-x","article-title":"A primal-dual perspective of online learning algorithms","volume":"69","author":"Shalev-Shwartz","year":"2007","journal-title":"Machine Learning"},{"key":"2026033014014596200_ref041","doi-asserted-by":"crossref","DOI":"10.1002\/0471722138","volume-title":"Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control","author":"Spall","year":"2003"},{"key":"2026033014014596200_ref042","volume-title":"Statistical Learning Theory","author":"Vapnik","year":"1998"},{"key":"2026033014014596200_ref043","first-page":"371","volume-title":"Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT)","author":"Vovk","year":"1990"},{"key":"2026033014014596200_ref044","doi-asserted-by":"crossref","DOI":"10.1142\/5021","volume-title":"Convex Analysis in General Vector Spaces","author":"Zalinescu","year":"2002"},{"key":"2026033014014596200_ref045","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1007\/11503415_12","article-title":"Data dependent concentration bounds for sequential prediction algorithms","author":"Zhang","year":"2005","journal-title":"Proceedings of the 18th Annual Conference on Learning Theory (COLT)"},{"key":"2026033014014596200_ref046","first-page":"928","article-title":"Online convex programming and generalized infinitesimal gradient ascent","author":"Zinkevich","year":"2003","journal-title":"Proceedings of the Twentieth International Conference on Machine Learning (ICML)"}],"container-title":["Foundations and Trends\u00ae in Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/4\/2\/107\/11147287\/2200000018en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/4\/2\/107\/11147287\/2200000018en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,30]],"date-time":"2026-03-30T18:02:02Z","timestamp":1774893722000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftmal\/article\/4\/2\/107\/1332161\/Online-Learning-and-Online-Convex-Optimization"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,3,29]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,3,29]]}},"URL":"https:\/\/doi.org\/10.1561\/2200000018","relation":{},"ISSN":["1935-8237","1935-8245"],"issn-type":[{"value":"1935-8237","type":"print"},{"value":"1935-8245","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,3,29]]}}}