{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,8]],"date-time":"2026-03-08T03:12:36Z","timestamp":1772939556169,"version":"3.50.1"},"reference-count":57,"publisher":"Institute for Operations Research and the Management Sciences (INFORMS)","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Mathematics of OR"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:p>Nonconvex constrained stochastic optimization has emerged in many important application areas. Subject to general functional constraints, it minimizes the sum of an expectation function and a nonsmooth regularizer. Main challenges arise because of the stochasticity in the random integrand and the possibly nonconvex functional constraints. To address these issues, we propose a momentum-based linearized augmented Lagrangian method (MLALM). MLALM adopts a single-loop framework and incorporates a recursive momentum scheme to compute the stochastic gradient, which enables the construction of a stochastic approximation to the augmented Lagrangian function. We provide an analysis of global convergence of MLALM. Under mild conditions and with unbounded penalty parameters, we show that the sequences of average stationarity measure and constraint violations are convergent in expectation. Under a constraint qualification assumption, the sequences of average constraint violation and complementary slackness measure converge to zero in expectation. We also explore properties of those related metrics when penalty parameters are bounded. Furthermore, we investigate oracle complexities of MLALM in terms of the total number of stochastic gradient evaluations to find an \u03f5-stationary point and an \u03f5-Karush\u2013Kuhn\u2013Tucker point when assuming the constraint qualification. 
Numerical experiments on two types of test problems reveal promising performances of the proposed algorithm.<\/jats:p>\n                  <jats:p>Funding: This work was supported by the National Natural Science Foundation of China [Grant 12271278], the Major Key Project of PCL [Grant PCL2022A05], and the Natural Science Foundation of Shanghai [Grant 21ZR1442800].<\/jats:p>","DOI":"10.1287\/moor.2022.0193","type":"journal-article","created":{"date-parts":[[2025,1,23]],"date-time":"2025-01-23T11:17:42Z","timestamp":1737631062000},"page":"92-133","source":"Crossref","is-referenced-by-count":6,"title":["A Momentum-Based Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Optimization"],"prefix":"10.1287","volume":"51","author":[{"given":"Qiankun","family":"Shi","sequence":"first","affiliation":[{"name":"School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; and Department of AI Computing, Pengcheng Laboratory, Shenzhen 518000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3492-9235","authenticated-orcid":false,"given":"Xiao","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of AI Computing, Pengcheng Laboratory, Shenzhen 518000, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8821-7260","authenticated-orcid":false,"given":"Hao","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China"}]}],"member":"109","reference":[{"key":"B1","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-022-01822-7"},{"key":"B2","doi-asserted-by":"publisher","DOI":"10.1137\/20M1354556"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-017-1174-z"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-021-01742-y"},{"key":"B5","doi-asserted-by":"crossref","unstructured":"Boob D, Deng Q, Lan G (2024) Level constrained first order methods for function constrained 
optimization.\n                      Math. Programming\n                      , 1\u201361.","DOI":"10.1007\/s10107-024-02057-4"},{"key":"B6","doi-asserted-by":"publisher","DOI":"10.1137\/16M1080173"},{"key":"B7","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-012-0617-9"},{"key":"B8","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.2015.1123157"},{"key":"B10","doi-asserted-by":"crossref","unstructured":"Chen C, Tung F, Vedula N, Mori G (2018) Constraint-aware deep neural network compression. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds.\n                      Computer Vision ECCV 2018\n                      , Lecture Notes in Computer Science, vol. 11212 (Springer, Cham, Switzerland), 400\u2013415.","DOI":"10.1007\/978-3-030-01237-3_25"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-023-01981-1"},{"key":"B12","doi-asserted-by":"crossref","unstructured":"Curtis FE, Robinson DP, Zhou B (2024) A stochastic inexact sequential quadratic optimization algorithm for nonlinear equality-constrained optimization.\n                      INFORMS J. Optimization\n                      6(3\u20134):173\u2013195.","DOI":"10.1287\/ijoo.2022.0008"},{"key":"B13","doi-asserted-by":"crossref","unstructured":"Curtis FE, Robinson DP, Zhou B (2023) Sequential quadratic optimization for stochastic optimization with deterministic nonlinear inequality and equality constraints.\n                      SIAM J. Optimization\n                      34(4):3592\u20133622.","DOI":"10.1137\/23M1556149"},{"key":"B14","unstructured":"Cutkosky A, Orabona F (2019) Momentum-based variance reduction in non-convex SGD.\n                      NeurIPS\n                      , vol. 
32 (Curran Associates, Red Hook, NY)."},{"key":"B15","unstructured":"Defazio A, Bach F, Lacoste-Julien S (2014) Saga: A fast incremental gradient method with support for non-strongly convex composite objectives.\n                      NeurIPS\n                      , vol. 27 (Curran Associates, Red Hook, NY)."},{"key":"B16","unstructured":"Fang C, Li CJ, Lin Z, Zhang T (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator.\n                      NeurIPS\n                      , vol. 31 (Curran Associates, Red Hook, NY)."},{"key":"B17","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1991.10475100"},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.1137\/120880811"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-014-0846-1"},{"key":"B20","unstructured":"Jia Z, Grimmer B (2022) First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points. Preprint, submitted December 2, https:\/\/arxiv.org\/abs\/2212.00927."},{"key":"B21","doi-asserted-by":"publisher","DOI":"10.1007\/s10589-022-00384-w"},{"key":"B22","doi-asserted-by":"crossref","unstructured":"Jin L, Wang X (2024) Stochastic nested primal-dual method for nonconvex constrained composition optimization.\n                      Math. Comput.\n                      94(351):305\u2013358.","DOI":"10.1090\/mcom\/3965"},{"key":"B23","unstructured":"Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction.\n                      NeurIPS\n                      , vol. 26 (Curran Associates, Red Hook, NY)."},{"key":"B24","doi-asserted-by":"publisher","DOI":"10.1287\/ijoo.2021.0052"},{"key":"B25","unstructured":"Li Z, Chen PY, Liu S, Lu S, Xu Y (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization.\n                      Proc. 24th Internat. Conf. Artificial Intelligence Statist. 
(AISTATS) 2021\n                      , vol. 130 (PMLR, New York), 2170\u20132178."},{"key":"B26","doi-asserted-by":"publisher","DOI":"10.1007\/s10589-022-00358-y"},{"key":"B27","unstructured":"Ma R, Lin Q, Yang T (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints.\n                      Internat. Conf. Machine Learn\n                      . (PMLR, New York), 6554\u20136564."},{"key":"B28","unstructured":"M\u00e1rquez-Neila P, Salzmann M, Fua P (2017) Imposing hard constraints on deep networks: Promises and limitations. Preprint, submitted."},{"key":"B29","unstructured":"Na S, Mahoney MW (2024) Statistical inference of constrained stochastic optimization via sketched sequential quadratic programming. Preprint, submitted April 13, https:\/\/arxiv.org\/abs\/2205.13687."},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-022-01846-z"},{"key":"B31","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-023-01935-7"},{"key":"B32","unstructured":"Nandwani Y, Pathak A, Mausam, Singla P (2019) A primal dual formulation for deep learning with constraints.\n                      NeurIPS\n                      , vol. 32 (Curran Associates, Red Hook, NY)."},{"key":"B33","first-page":"2613","volume":"70","author":"Nguyen LM","year":"2017","journal-title":"Proc. Machine Learn. Res."},{"issue":"110","key":"B34","first-page":"1","volume":"21","author":"Pham NH","year":"2020","journal-title":"J. Machine Learn. Res."},{"key":"B35","doi-asserted-by":"publisher","DOI":"10.1007\/BF02346160"},{"key":"B36","doi-asserted-by":"publisher","DOI":"10.1137\/0327068"},{"key":"B37","doi-asserted-by":"crossref","unstructured":"Ravi SN, Dinh T, Lokhande VS, Singh V (2019) Explicitly imposing constraints in deep networks via conditional gradients gives improved generalization and faster convergence.\n                      AAAI\u201919\/IAAI\u201919\/EAAI\u201919 Proc. Thirty-Third AAAI Conf. 
Artificial Intelligence Thirty-First Innovative Appl. Artificial Intelligence Conf. Ninth AAAI Sympos. Educational Adv. Artificial Intelligence\n                      , vol. 33 (AAAI Press, Palo Alto, CA), 4772\u20134779.","DOI":"10.1609\/aaai.v33i01.33014772"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.1007\/BF00934777"},{"key":"B39","doi-asserted-by":"crossref","unstructured":"Roy SK, Mhammedi Z, Harandi M (2018) Geometry aware constrained optimization techniques for deep learning.\n                      CVPR\n                      , 4460\u20134469.","DOI":"10.1109\/CVPR.2018.00469"},{"key":"B40","unstructured":"Sahin MF, Eftekhari A, Alacaoglu A, Latorre F, Cevher V (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints.\n                      NeurIPS\n                      , vol. 32 (Curran Associates, Red Hook, NY)."},{"issue":"1","key":"B41","first-page":"112","volume":"83","author":"Schmidt M","year":"2017","journal-title":"Math. Programming"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2016.2637317"},{"key":"B43","first-page":"815","volume":"95","author":"Shang F","year":"2018","journal-title":"Proc. Machine Learn. 
Res."},{"key":"B44","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611976595"},{"key":"B45","doi-asserted-by":"crossref","unstructured":"Tomar VS, Rose RC (2014) Manifold regularized deep neural networks.\n                      Interspeech 2014\n                      .","DOI":"10.21437\/Interspeech.2014-82"},{"key":"B46","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-020-01583-1"},{"key":"B47","doi-asserted-by":"publisher","DOI":"10.1080\/10556788.2014.940947"},{"key":"B48","doi-asserted-by":"publisher","DOI":"10.1080\/10556788.2015.1004332"},{"key":"B49","doi-asserted-by":"publisher","DOI":"10.1090\/mcom\/3178"},{"key":"B50","unstructured":"Wang Z, Ji K, Zhou Y, Liang Y, Tarokh V (2019) Spiderboost and momentum: Faster variance reduction algorithms.\n                      NeurIPS\n                      , vol. 32 (Curran Associates, Red Hook, NY)."},{"key":"B51","volume-title":"Numerical Optimization","author":"Wright S","year":"2006"},{"key":"B52","doi-asserted-by":"publisher","DOI":"10.1137\/140961791"},{"key":"B53","doi-asserted-by":"publisher","DOI":"10.1137\/18M1229869"},{"key":"B54","doi-asserted-by":"publisher","DOI":"10.1287\/ijoo.2019.0033"},{"key":"B55","doi-asserted-by":"publisher","DOI":"10.1007\/s10957-022-02132-w"},{"key":"B56","doi-asserted-by":"publisher","DOI":"10.1007\/s10898-014-0242-7"},{"key":"B57","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2019.05.024"}],"container-title":["Mathematics of Operations 
Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/pubsonline.informs.org\/doi\/pdf\/10.1287\/moor.2022.0193","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T09:41:20Z","timestamp":1772876480000},"score":1,"resource":{"primary":{"URL":"https:\/\/pubsonline.informs.org\/doi\/10.1287\/moor.2022.0193"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1]]},"references-count":57,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["10.1287\/moor.2022.0193"],"URL":"https:\/\/doi.org\/10.1287\/moor.2022.0193","relation":{},"ISSN":["0364-765X","1526-5471"],"issn-type":[{"value":"0364-765X","type":"print"},{"value":"1526-5471","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1]]}}}