{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,4]],"date-time":"2026-06-04T03:05:33Z","timestamp":1780542333877,"version":"3.54.1"},"reference-count":132,"publisher":"Emerald","issue":"3-4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,12,4]]},"abstract":"<jats:p>A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks.<\/jats:p>\n                  <jats:p>The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround to this has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However this approach may be lossy and nevertheless presents significant challenges for large scale optimization.<\/jats:p>\n                  <jats:p>On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques - popular heuristics include projected gradient descent and alternating minimization. 
However, these are often poorly understood in terms of their convergence and other properties.<\/jats:p>\n                  <jats:p>This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. We hope that an insight into the inner workings of these methods will allow the reader to appreciate the unique marriage of task structure and generative models that allow these heuristic techniques to (provably) succeed. The monograph will lead the reader through several widely used non-convex optimization techniques, as well as applications thereof. The goal of this monograph is both to introduce the rich literature in this area and to equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.<\/jats:p>","DOI":"10.1561\/2200000058","type":"journal-article","created":{"date-parts":[[2017,12,4]],"date-time":"2017-12-04T05:44:31Z","timestamp":1512366271000},"page":"142-336","source":"Crossref","is-referenced-by-count":290,"title":["Non-convex Optimization for Machine Learning"],"prefix":"10.1108","volume":"10","author":[{"given":"Prateek","family":"Jain","sequence":"first","affiliation":[{"name":"Microsoft Research India"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Purushottam","family":"Kar","sequence":"additional","affiliation":[{"name":"IIT Kanpur","place":["India"]}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"140","published-online":{"date-parts":[[2017,12,4]]},"reference":[{"issue":"5","key":"2026033012240200500_ref001","doi-asserted-by":"crossref","first-page":"2452","DOI":"10.1214\/12-AOS1032","article-title":"Fast global convergence of gradient methods for high-dimensional statistical recovery","volume":"40","author":"Agarwal","year":"2012","journal-title":"The Annals of 
Statistics"},{"issue":"4","key":"2026033012240200500_ref002","doi-asserted-by":"crossref","first-page":"2775","DOI":"10.1137\/140979861","article-title":"Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization","volume":"26","author":"Agarwal","year":"2016","journal-title":"SIAM Journal on Optimization"},{"key":"2026033012240200500_ref003","volume-title":"Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC)","author":"Agarwal","year":"2017"},{"key":"2026033012240200500_ref004","first-page":"81","volume-title":"Proceedings of the 29th Conference on Learning Theory (COLT)","author":"Anandkumar","year":"2016"},{"key":"2026033012240200500_ref005","first-page":"2773","article-title":"Tensor Decompositions for Learning Latent Variable Models","volume":"15","author":"Anandkumar","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012240200500_ref006","first-page":"1","article-title":"Convergence of an Alternating Maximization Procedure","volume":"17","author":"Andresen","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012240200500_ref007","volume-title":"Proceedings of The 27th Conference on Learning Theory (COLT)","author":"Arora","year":"2014"},{"key":"2026033012240200500_ref008","volume-title":"Proceedings of the 29th Conference on Learning Theory (COLT)","author":"Azizzadenesheli","year":"2016"},{"issue":"1","key":"2026033012240200500_ref009","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1214\/16-AOS1435","article-title":"Statistical Guarantees for the EM Algorithm: From Population to Sample-based Analysis","volume":"45","author":"Balakrishnan","year":"2017","journal-title":"Annals of Statistics"},{"issue":"3","key":"2026033012240200500_ref010","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1007\/s00365-007-9003-x","article-title":"A Simple Proof of the Restricted Isometry Property for Random 
Matrices","volume":"28","author":"Baraniuk","year":"2008","journal-title":"Constructive Approximation"},{"key":"2026033012240200500_ref011","volume-title":"Nonlinear Programming","author":"Bertsekas","year":"2016"},{"key":"2026033012240200500_ref012","volume-title":"Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Bhatia","year":"2015"},{"key":"2026033012240200500_ref013","volume-title":"Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS)","author":"Bhatia","year":"2017"},{"key":"2026033012240200500_ref014","volume-title":"Proceedings of the 31st International Conference on Machine Learning (ICML)","author":"Bhojanapalli","year":"2014"},{"issue":"7","key":"2026033012240200500_ref015","doi-asserted-by":"crossref","first-page":"4660","DOI":"10.1109\/TIT.2011.2146550","article-title":"Sampling and Reconstructing Signals From a Union of Linear Subspaces","volume":"57","author":"Blumensath","year":"2011","journal-title":"IEEE Transactions on Information Theory"},{"issue":"1","key":"2026033012240200500_ref016","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1215\/00127094-1384809","article-title":"Explicit constructions of RIP matrices and related problems","volume":"159","author":"Bourgain","year":"2011","journal-title":"Duke Mathematical Journal"},{"key":"2026033012240200500_ref017","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511804441","volume-title":"Convex Optimization","author":"Boyd","year":"2004"},{"key":"2026033012240200500_ref018","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Brutzkus","year":"2017"},{"issue":"34","key":"2026033012240200500_ref019","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1561\/2200000050","article-title":"Convex Optimization: Algorithms and Complexity","volume":"8","author":"Bubeck","year":"2015","journal-title":"Foundations and Trends\u00ae in Machine 
Learning"},{"issue":"4","key":"2026033012240200500_ref020","doi-asserted-by":"crossref","first-page":"1956","DOI":"10.1137\/080738970","article-title":"A Singular Value Thresholding Algorithm for Matrix Completion","volume":"20","author":"Cai","year":"2010","journal-title":"SIAM Journal on Optimization"},{"issue":"12","key":"2026033012240200500_ref021","doi-asserted-by":"crossref","first-page":"4203","DOI":"10.1109\/TIT.2005.858979","article-title":"Decoding by Linear Programming","volume":"51","author":"Candes","year":"2005","journal-title":"IEEE Transactions on Information Theory"},{"issue":"9\u201310","key":"2026033012240200500_ref022","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1016\/j.crma.2008.03.014","article-title":"The Restricted Isometry Property and Its Implications for Compressed Sensing","volume":"346","author":"Candes","year":"2008","journal-title":"Comptes Rendus Mathematique"},{"issue":"5","key":"2026033012240200500_ref023","doi-asserted-by":"crossref","first-page":"1017","DOI":"10.1007\/s10208-013-9162-z","article-title":"Solving Quadratic Equations via PhaseLift When There Are About as Many Equations as Unknowns","volume":"14","author":"Candes","year":"2014","journal-title":"Foundations of Computational Mathematics"},{"issue":"6","key":"2026033012240200500_ref024","doi-asserted-by":"crossref","first-page":"717","DOI":"10.1007\/s10208-009-9045-5","article-title":"Exact Matrix Completion via Convex Optimization","volume":"9","author":"Candes","year":"2009","journal-title":"Foundations of Computational Mathematics"},{"issue":"5","key":"2026033012240200500_ref025","doi-asserted-by":"crossref","first-page":"2053","DOI":"10.1109\/TIT.2010.2044061","article-title":"The power of convex relaxation: Near-optimal matrix completion","volume":"56","author":"Candes","year":"2009","journal-title":"IEEE Transactions on Information 
Theory"},{"issue":"8","key":"2026033012240200500_ref026","doi-asserted-by":"crossref","first-page":"1207","DOI":"10.1002\/cpa.20124","article-title":"Stable Signal Recovery from Incomplete and Inaccurate Measurements","volume":"59","author":"Candes","year":"2006","journal-title":"Communications on Pure and Applied Mathematics"},{"issue":"4","key":"2026033012240200500_ref027","doi-asserted-by":"crossref","first-page":"1985","DOI":"10.1109\/TIT.2015.2399924","article-title":"Phase Retrieval via Wirtinger Flow: Theory and Algorithms","volume":"61","author":"Candes","year":"2015","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref028","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Carmon","year":"2017"},{"issue":"10","key":"2026033012240200500_ref029","first-page":"707","article-title":"Exact Reconstruction of Sparse Signals via Nonconvex Minimization","volume":"14","author":"Chartrand","year":"2007","journal-title":"IEEE Signal Processing Letters"},{"issue":"1","key":"2026033012240200500_ref030","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1093\/imanum\/drq039","article-title":"Matrix Completion via an Alternating Direction Method","volume":"32","author":"Chen","year":"2012","journal-title":"IMA Journal of Numerical Analysis"},{"key":"2026033012240200500_ref031","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Chen","year":"2015"},{"key":"2026033012240200500_ref032","volume-title":"Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Chen","year":"2012"},{"key":"2026033012240200500_ref033","volume-title":"Proceedings of the 30th International Conference on Machine Learning 
(ICML)","author":"Chen","year":"2013"},{"issue":"1","key":"2026033012240200500_ref034","doi-asserted-by":"crossref","first-page":"503","DOI":"10.1109\/TIT.2015.2499247","article-title":"Matrix Completion with Column Manipulation: Near-Optimal Sample-Robustness-Rank Tradeoffs","volume":"62","author":"Chen","year":"2016","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref035","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Cherapanamjeri","year":"2017"},{"key":"2026033012240200500_ref036","volume-title":"Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS)","author":"Choromanska","year":"2015"},{"issue":"1","key":"2026033012240200500_ref037","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1090\/S0894-0347-08-00610-3","article-title":"Compressed Sensing and Best k-term Approximation","volume":"22","author":"Cohen","year":"2009","journal-title":"Journal of the American Mathematical Society"},{"key":"2026033012240200500_ref038","volume-title":"Proceedings in Computational Statistics (COMPSTAT)","author":"Croux","year":"2008"},{"key":"2026033012240200500_ref039","first-page":"2933","article-title":"Identifying and attacking the saddle point problem in high-dimensional non-convex optimization","author":"Dauphin","year":"2014","journal-title":"Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS)"},{"issue":"1","key":"2026033012240200500_ref040","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum Likelihood from Incomplete Data via the EM Algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"Journal of the Royal Statistical Society, Series B"},{"issue":"4","key":"2026033012240200500_ref041","doi-asserted-by":"crossref","first-page":"1289","DOI":"10.1109\/TIT.2006.871582","article-title":"Compressed 
Sensing","volume":"52","author":"Donoho","year":"2006","journal-title":"IEEE Transactions on Information Theory"},{"issue":"45","key":"2026033012240200500_ref042","doi-asserted-by":"crossref","first-page":"18914","DOI":"10.1073\/pnas.0909892106","article-title":"Message Passing Algorithms for Compressed Sensing: I. Motivation and Construction","volume":"106","author":"Donoho","year":"2009","journal-title":"Proceedings of the National Academy of Sciences USA"},{"key":"2026033012240200500_ref043","volume-title":"Proceedings of the 25th International Conference on Machine Learning (ICML)","author":"Duchi","year":"2008"},{"key":"2026033012240200500_ref044","first-page":"1871","article-title":"LIBLINEAR: A Library for Large Linear Classification","volume":"9","author":"Fan","year":"2008","journal-title":"Journal of Machine Learning Research"},{"issue":"3","key":"2026033012240200500_ref045","doi-asserted-by":"crossref","first-page":"946","DOI":"10.1137\/110853996","article-title":"Hankel matrix rank minimization with applications in system identification and realization","volume":"34","author":"Fazel","year":"2013","journal-title":"SIAM Journal on Matrix Analysis and Applications"},{"issue":"6","key":"2026033012240200500_ref046","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1145\/358669.358692","article-title":"Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography","volume":"24","author":"Fischler","year":"1981","journal-title":"Communications of the ACM"},{"issue":"1","key":"2026033012240200500_ref047","doi-asserted-by":"crossref","first-page":"97","DOI":"10.1016\/j.acha.2009.10.004","article-title":"A Note on Guaranteed Sparse Recovery via l1-minimization","volume":"29","author":"Foucart","year":"2010","journal-title":"Applied and Computational Harmonic 
Analysis"},{"issue":"6","key":"2026033012240200500_ref048","doi-asserted-by":"crossref","first-page":"2543","DOI":"10.1137\/100806278","article-title":"Hard Thresholding Pursuit: an Algorithm for Compressive Sensing","volume":"49","author":"Foucart","year":"2011","journal-title":"SIAM Journal on Numerical Analysis"},{"issue":"3","key":"2026033012240200500_ref049","doi-asserted-by":"crossref","first-page":"395","DOI":"10.1016\/j.acha.2008.09.001","article-title":"Sparsest solutions of underdetermined linear systems via lq\u2013minimization for 0 &lt; q \u2264 1","volume":"26","author":"Foucart","year":"2009","journal-title":"Applied and Computational Harmonic Analysis"},{"key":"2026033012240200500_ref050","doi-asserted-by":"crossref","first-page":"156","DOI":"10.1038\/172156a0","article-title":"Evidence for 2-Chain Helix in Crystalline Structure of Sodium Deoxyribonucleate","volume":"172","author":"Franklin","year":"1953","journal-title":"Nature"},{"key":"2026033012240200500_ref051","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1038\/171740a0","article-title":"Molecular Configuration in Sodium Thymonucleate","volume":"171","author":"Franklin","year":"1953","journal-title":"Nature"},{"key":"2026033012240200500_ref052","volume-title":"Proceedings of the 26th International Conference on Machine Learning (ICML)","author":"Garg","year":"2009"},{"key":"2026033012240200500_ref053","first-page":"797","volume-title":"Proceedings of The 28th Conference on Learning Theory (COLT)","author":"Ge","year":"2015"},{"key":"2026033012240200500_ref054","volume-title":"Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Ge","year":"2016"},{"issue":"2","key":"2026033012240200500_ref055","first-page":"237","article-title":"A Practical Algorithm for the Determination of Phase from Image and Diffraction Plane 
Pictures","volume":"35","author":"Gerchberg","year":"1972","journal-title":"Optik"},{"key":"2026033012240200500_ref056","author":"Goel","year":"2017"},{"issue":"2","key":"2026033012240200500_ref057","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1007\/s10208-011-9084-6","article-title":"Convergence of Fixed-Point Continuation Algorithms for Matrix Rank Minimization","volume":"11","author":"Goldfarb","year":"2011","journal-title":"Foundations of Computational Mathematics"},{"key":"2026033012240200500_ref058","volume-title":"Matrix Computations","author":"Golub","year":"1996"},{"issue":"11","key":"2026033012240200500_ref059","doi-asserted-by":"crossref","first-page":"6298","DOI":"10.1109\/TIT.2015.2472522","article-title":"Sparse and Spurious: Dictionary Learning With Noise and Outliers","volume":"61","author":"Gribonval","year":"2015","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref060","volume-title":"Proceedings of the 55th IEEE Annual Symposium on Foundations of Computer Science (FOCS)","author":"Hardt","year":"2014"},{"key":"2026033012240200500_ref061","volume-title":"Proceedings of The 27th Conference on Learning Theory (COLT)","author":"Hardt","year":"2014"},{"key":"2026033012240200500_ref062","volume-title":"Proceedings of The 27th Conference on Learning Theory (COLT)","author":"Hardt","year":"2014"},{"key":"2026033012240200500_ref063","volume-title":"Statistical Learning with Sparsity: The Lasso and Generalizations","author":"Hastie","year":"2016"},{"key":"2026033012240200500_ref064","doi-asserted-by":"crossref","first-page":"163","DOI":"10.1007\/978-3-319-45282-1_11","volume-title":"Geometric Aspects of Functional Analysis","author":"Haviv","year":"2017"},{"key":"2026033012240200500_ref065","doi-asserted-by":"crossref","DOI":"10.1002\/9780470434697","volume-title":"Robust Statistics","author":"Huber","year":"2009"},{"key":"2026033012240200500_ref066","volume-title":"Proceedings of the IEEE International 
Symposium on Information Theory (ISIT)","author":"Jaganathan","year":"2013"},{"key":"2026033012240200500_ref067","volume-title":"Proceedings of The 28th Conference on Learning Theory (COLT)","author":"Jain","year":"2015"},{"key":"2026033012240200500_ref068","volume-title":"Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Jain","year":"2015"},{"key":"2026033012240200500_ref069","volume-title":"Proceedings of the 24th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Jain","year":"2010"},{"key":"2026033012240200500_ref070","volume-title":"Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Jain","year":"2011"},{"key":"2026033012240200500_ref071","first-page":"665","volume-title":"Proceedings of the 45th annual ACM Symposium on Theory of Computing (STOC)","author":"Jain","year":"2013"},{"key":"2026033012240200500_ref072","volume-title":"Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Jain","year":"2014"},{"key":"2026033012240200500_ref073","first-page":"1935","volume-title":"Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Jalali","year":"2011"},{"key":"2026033012240200500_ref074","first-page":"1724","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Jin","year":"2017"},{"issue":"4(184)","key":"2026033012240200500_ref075","first-page":"251","article-title":"The diameters of octahedra","volume":"30","author":"Kashin","year":"1975","journal-title":"Uspekhi Matematicheskikh Nauk"},{"key":"2026033012240200500_ref076","volume-title":"Efficient Algorithms for Collaborative Filtering","author":"Keshavan","year":"2012"},{"issue":"6","key":"2026033012240200500_ref077","doi-asserted-by":"crossref","first-page":"2980","DOI":"10.1109\/TIT.2010.2046205","article-title":"Matrix Completion from a Few 
Entries","volume":"56","author":"Keshavan","year":"2010","journal-title":"IEEE Transactions on Information Theory"},{"issue":"8","key":"2026033012240200500_ref078","doi-asserted-by":"crossref","first-page":"30","DOI":"10.1109\/MC.2009.263","article-title":"Matrix Factorization Techniques for Recommender Systems","volume":"42","author":"Koren","year":"2009","journal-title":"IEEE Computer"},{"key":"2026033012240200500_ref079","first-page":"1246","volume-title":"Proceedings of the 29th Conference on Learning Theory (COLT)","author":"Lee","year":"2016"},{"issue":"9","key":"2026033012240200500_ref080","doi-asserted-by":"crossref","first-page":"4402","DOI":"10.1109\/TIT.2010.2054251","article-title":"ADMiRA: Atomic Decomposition for Minimum Rank Approximation","volume":"56","author":"Lee","year":"2010","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref081","volume-title":"Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS)","author":"Li","year":"2017"},{"issue":"2","key":"2026033012240200500_ref082","doi-asserted-by":"crossref","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","article-title":"Least squares quantization in PCM","volume":"28","author":"Lloyd","year":"1982","journal-title":"IEEE Transactions on Information Theory"},{"issue":"5","key":"2026033012240200500_ref083","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1021\/ed085p737","article-title":"A-DNA and B-DNA: Comparing Their Historical X-ray Fiber Diffraction Images","volume":"85","author":"Lucas","year":"2008","journal-title":"Journal of Chemical Education"},{"issue":"1","key":"2026033012240200500_ref084","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1007\/BF00939948","article-title":"On the Convergence of the Coordinate Descent Method for Convex Differentiable Minimization","volume":"72","author":"Luo","year":"1992","journal-title":"Journal of Optimization Theory and 
Applications"},{"issue":"1","key":"2026033012240200500_ref085","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1007\/BF02096261","article-title":"Error bounds and convergence analysis of feasible descent methods: A general approach","volume":"46","author":"Luo","year":"1993","journal-title":"Annals of Operations Research"},{"key":"2026033012240200500_ref086","doi-asserted-by":"crossref","DOI":"10.1002\/0470010940","volume-title":"Robust Statistics: Theory and Methods","author":"Maronna","year":"2006"},{"key":"2026033012240200500_ref087","author":"Martin","year":"1978"},{"key":"2026033012240200500_ref088","volume-title":"Proceedings of the 25th International Conference on Machine Learning (ICML)","author":"Meka","year":"2008"},{"issue":"2","key":"2026033012240200500_ref089","doi-asserted-by":"crossref","first-page":"227","DOI":"10.1137\/S0097539792240406","article-title":"Sparse approximate solutions to linear systems","volume":"24","author":"Natarajan","year":"1995","journal-title":"SIAM Journal on Computing"},{"key":"2026033012240200500_ref090","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1016\/j.acha.2008.07.002","article-title":"CoSaMP: Iterative Signal Recovery from Incomplete and Inaccurate Samples","volume":"26","author":"Needell","year":"2008","journal-title":"Applied and Computational Harmonic Analysis"},{"issue":"4","key":"2026033012240200500_ref091","doi-asserted-by":"crossref","first-page":"538","DOI":"10.1214\/12-STS400","article-title":"A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers","volume":"27","author":"Negahban","year":"2012","journal-title":"Statistical Science"},{"key":"2026033012240200500_ref092","volume-title":"Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA)","author":"Nelson","year":"2014"},{"key":"2026033012240200500_ref093","volume-title":"Introductory Lectures on Convex Optimization: A Basic 
Course","author":"Nesterov","year":"2003"},{"issue":"2","key":"2026033012240200500_ref094","doi-asserted-by":"crossref","first-page":"341","DOI":"10.1137\/100802001","article-title":"Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems","volume":"22","author":"Nesterov","year":"2012","journal-title":"SIAM Journal on Optimization"},{"issue":"1","key":"2026033012240200500_ref095","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1007\/s10107-006-0706-8","article-title":"Cubic regularization of Newton method and its global performance","volume":"108","author":"Nesterov","year":"2006","journal-title":"Mathematical Programming"},{"key":"2026033012240200500_ref096","volume-title":"Proceedings of the 27th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Netrapalli","year":"2013"},{"issue":"4","key":"2026033012240200500_ref097","doi-asserted-by":"crossref","first-page":"2036","DOI":"10.1109\/TIT.2012.2232347","volume":"59","author":"Nguyen","year":"2013","journal-title":"IEEE Transactions on Information Theory"},{"issue":"4","key":"2026033012240200500_ref098","doi-asserted-by":"crossref","first-page":"2036","DOI":"10.1109\/TIT.2012.2232347","article-title":"Robust Lasso With Missing and Grossly Corrupted Observations","volume":"59","author":"Nguyen","year":"2013","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref099","author":"Oymak","year":"2015"},{"key":"2026033012240200500_ref100","first-page":"2241","article-title":"Restricted Eigenvalue Properties for Correlated Gaussian Designs","volume":"11","author":"Raskutti","year":"2010","journal-title":"Journal of Machine Learning Research"},{"key":"2026033012240200500_ref101","first-page":"3413","article-title":"A Simpler Approach to Matrix Completion","volume":"12","author":"Recht","year":"2011","journal-title":"Journal of Machine Learning 
Research"},{"issue":"3","key":"2026033012240200500_ref102","doi-asserted-by":"crossref","first-page":"471","DOI":"10.1137\/070697835","article-title":"Guaranteed Minimum Rank Solutions to Linear Matrix Equations via Nuclear Norm Minimization","volume":"52","author":"Recht","year":"2010","journal-title":"SIAM Review"},{"key":"2026033012240200500_ref103","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML)","author":"Reddi","year":"2016"},{"issue":"388","key":"2026033012240200500_ref104","doi-asserted-by":"crossref","first-page":"871","DOI":"10.1080\/01621459.1984.10477105","article-title":"Least Median of Squares Regression","volume":"79","author":"Rousseeuw","year":"1984","journal-title":"Journal of the American Statistical Association"},{"key":"2026033012240200500_ref105","doi-asserted-by":"crossref","DOI":"10.1002\/0471725382","volume-title":"Robust Regression and Outlier Detection","author":"Rousseeuw","year":"1987"},{"issue":"1","key":"2026033012240200500_ref106","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1137\/110840054","article-title":"On the Non-asymptotic Convergence of Cyclic Coordinate Descent Methods","volume":"23","author":"Saha","year":"2013","journal-title":"SIAM Journal on Optimization"},{"key":"2026033012240200500_ref107","author":"Sedghi","year":"2016"},{"key":"2026033012240200500_ref108","first-page":"567","article-title":"Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization","volume":"14","author":"Shalev-Shwartz","year":"2013","journal-title":"Journal of Machine Learning Research"},{"issue":"494","key":"2026033012240200500_ref109","doi-asserted-by":"crossref","first-page":"626","DOI":"10.1198\/jasa.2011.tm10390","article-title":"Outlier Detection Using Nonconvex Penalized Regression","volume":"106","author":"She","year":"2011","journal-title":"Journal of the American Statistical Association"},{"key":"2026033012240200500_ref110","volume-title":"Proceedings of the 25th 
Annual Conference on Learning Theory (COLT)","author":"Spielman","year":"2012"},{"key":"2026033012240200500_ref111","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/8996.001.0001","volume-title":"Optimization for Machine Learning","author":"Sra","year":"2011"},{"issue":"7","key":"2026033012240200500_ref112","first-page":"1","article-title":"Robust time series analysis: A survey","volume":"23","author":"Stockinger","year":"1987","journal-title":"Kybernetika"},{"key":"2026033012240200500_ref113","author":"Sun","year":"2015"},{"key":"2026033012240200500_ref114","volume-title":"Proceedings of the 56th IEEE Annual Symposium on Foundations of Computer Science (FOCS)","author":"Sun","year":"2015"},{"key":"2026033012240200500_ref115","volume-title":"Proceedings of the 25th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Tewari","year":"2011"},{"issue":"4","key":"2026033012240200500_ref116","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1007\/s10208-011-9099-z","article-title":"User-Friendly Tail Bounds for Sums of Random Matrices","volume":"12","author":"Tropp","year":"2012","journal-title":"Foundations of Computational Mathematics"},{"issue":"12","key":"2026033012240200500_ref117","doi-asserted-by":"crossref","first-page":"4655","DOI":"10.1109\/TIT.2007.909108","article-title":"Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit","volume":"53","author":"Tropp","journal-title":"IEEE Transactions on Information Theory"},{"issue":"11","key":"2026033012240200500_ref118","doi-asserted-by":"crossref","first-page":"7255","DOI":"10.1109\/TIT.2011.2159959","article-title":"On the Performance of Sparse Recovery Via lp-Minimization (0 &lt; p &lt; 1)","volume":"57","author":"Wang","year":"2011","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref119","volume-title":"Proceedings of the 29th Annual Conference on Neural Information Processing Systems 
(NIPS)","author":"Wang","year":"2015"},{"issue":"8","key":"2026033012240200500_ref120","doi-asserted-by":"crossref","first-page":"2151","DOI":"10.2337\/diabetes.52.8.2151","article-title":"Microarray Analysis of Gene Expression in the Kidneys of New- and Post-Onset Diabetic NOD Mice","volume":"52","author":"Wilson","year":"2003","journal-title":"Diabetes"},{"issue":"2","key":"2026033012240200500_ref121","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1109\/TPAMI.2008.79","article-title":"Robust Face Recognition via Sparse Representation","volume":"31","author":"Wright","year":"2009","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"2026033012240200500_ref122","volume-title":"Numerical Optimization","author":"Wright","year":"1999"},{"key":"2026033012240200500_ref123","first-page":"95","volume-title":"The Annals of Statistics","author":"","year":"1983"},{"issue":"8","key":"2026033012240200500_ref124","doi-asserted-by":"crossref","first-page":"3234","DOI":"10.1109\/TIP.2013.2262292","article-title":"Fast l1-Minimization Algorithms for Robust Face Recognition","volume":"22","author":"Yang","year":"2013","journal-title":"IEEE Transactions on Image Processing"},{"key":"2026033012240200500_ref125","volume-title":"Proceedings of the 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)","author":"Yang","year":"2015"},{"key":"2026033012240200500_ref126","volume-title":"Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS)","author":"Yi","year":"2015"},{"key":"2026033012240200500_ref127","volume-title":"Proceedings of the 31st International Conference on Machine Learning (ICML)","author":"Yi","year":"2014"},{"issue":"1","key":"2026033012240200500_ref128","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1007\/s10107-015-0893-2","article-title":"Recent advances in trust region 
algorithms","volume":"151","author":"Yuan","year":"2015","journal-title":"Mathematical Programming"},{"key":"2026033012240200500_ref129","doi-asserted-by":"crossref","first-page":"4689","DOI":"10.1109\/TIT.2011.2146690","article-title":"Adaptive Forward-Backward Greedy Algorithm for Learning Sparse Representations","volume":"57","author":"Zhang","year":"2011","journal-title":"IEEE Transactions on Information Theory"},{"key":"2026033012240200500_ref130","volume-title":"Proceedings of the 30th Conference on Learning Theory","author":"Zhang","year":"2017"},{"key":"2026033012240200500_ref131","volume-title":"Proceedings of the 34th International Conference on Machine Learning (ICML)","author":"Zhong","year":"2017"},{"key":"2026033012240200500_ref132","volume-title":"Proceedings of the 4th International Conference on Algorithmic Aspects in Information and Management (AAIM)","author":"Zhou","year":"2008"}],"container-title":["Foundations and Trends\u00ae in Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/10\/3-4\/142\/11446918\/2200000058en.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/www.emerald.com\/ftmal\/article-pdf\/10\/3-4\/142\/11446918\/2200000058en.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:10:52Z","timestamp":1777486252000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.emerald.com\/ftmal\/article\/10\/3-4\/142\/1332392\/Non-convex-Optimization-for-Machine-Learning"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,4]]},"references-count":132,"journal-issue":{"issue":"3-4","published-print":{"date-parts":[[2017,12,4]]}},"URL":"https:\/\/doi.org\/10.1561\/2200000058","relation":{},"ISSN":["1935-8237","1935-8245"],"issn-type":[{"value":"1935-8237","type":"print"},{"value":"
1935-8245","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,12,4]]}}}