{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,20]],"date-time":"2026-02-20T21:41:54Z","timestamp":1771623714366,"version":"3.50.1"},"reference-count":60,"publisher":"American Mathematical Society (AMS)","issue":"351","license":[{"start":{"date-parts":[[2025,4,10]],"date-time":"2025-04-10T00:00:00Z","timestamp":1744243200000},"content-version":"am","delay-in-days":365,"URL":"https:\/\/www.ams.org\/publications\/copyright-and-permissions"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Math. Comp."],"abstract":"<p>In this paper we study the nonconvex constrained composition optimization, in which the objective contains a composition of two expected-value functions whose accurate information is normally expensive to calculate. We propose a STochastic nEsted Primal-dual (STEP) method for such problems. In each iteration, with an auxiliary variable introduced to track the inner layer function values we compute stochastic gradients of the nested function using a subsampling strategy. To alleviate difficulties caused by possibly nonconvex constraints, we construct a stochastic approximation to the linearized augmented Lagrangian function to update the primal variable, which further motivates to update the dual variable in a weighted-average way. Moreover, to better understand the asymptotic dynamics of the update schemes we consider a deterministic continuous-time system from the perspective of ordinary differential equation (ODE). We analyze the Karush-Kuhn-Tucker measure at the output by the STEP method with constant parameters and establish its iteration and sample complexities to find an <inline-formula content-type=\"math\/mathml\">\n<mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" alttext=\"epsilon\">\n  <mml:semantics>\n    <mml:mi>\u03f5<\/mml:mi>\n    <mml:annotation encoding=\"application\/x-tex\">\\epsilon<\/mml:annotation>\n  <\/mml:semantics>\n<\/mml:math>\n<\/inline-formula>-stationary point, ensuring that expected stationarity, feasibility as well as complementary slackness are below accuracy <inline-formula content-type=\"math\/mathml\">\n<mml:math xmlns:mml=\"http:\/\/www.w3.org\/1998\/Math\/MathML\" alttext=\"epsilon\">\n  <mml:semantics>\n    <mml:mi>\u03f5<\/mml:mi>\n    <mml:annotation encoding=\"application\/x-tex\">\\epsilon<\/mml:annotation>\n  <\/mml:semantics>\n<\/mml:math>\n<\/inline-formula>. To leverage the benefit of the (near) initial feasibility in the STEP method, we propose a two-stage framework incorporating a feasibility-seeking phase, aiming to locate a nearly feasible initial point. Moreover, to enhance the adaptivity of the STEP algorithm, we propose an adaptive variant by adaptively adjusting its parameters, along with a complexity analysis. Numerical results on a risk-averse portfolio optimization problem and an orthogonal nonnegative matrix decomposition reveal the effectiveness of the proposed algorithms.<\/p>","DOI":"10.1090\/mcom\/3965","type":"journal-article","created":{"date-parts":[[2024,3,13]],"date-time":"2024-03-13T20:40:03Z","timestamp":1710362403000},"page":"305-358","source":"Crossref","is-referenced-by-count":3,"title":["Stochastic nested primal-dual method for nonconvex constrained composition optimization"],"prefix":"10.1090","volume":"94","author":[{"given":"Lingzi","family":"Jin","sequence":"first","affiliation":[]},{"given":"Xiao","family":"Wang","sequence":"additional","affiliation":[]}],"member":"14","published-online":{"date-parts":[[2024,4,10]]},"reference":[{"issue":"2","key":"1","doi-asserted-by":"publisher","first-page":"519","DOI":"10.1137\/21M1406222","article-title":"Stochastic multilevel composition optimization algorithms with level-independent convergence rates","volume":"32","author":"Balasubramanian, Krishnakumar","year":"2022","journal-title":"SIAM J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/1052-6234","issn-type":"print"},{"issue":"1-2","key":"2","doi-asserted-by":"publisher","first-page":"595","DOI":"10.1007\/s10107-022-01859-8","article-title":"Probability maximization via Minkowski functionals: convex representations and tractable resolution","volume":"199","author":"Bardakci, I. E.","year":"2023","journal-title":"Math. Program.","ISSN":"https:\/\/id.crossref.org\/issn\/0025-5610","issn-type":"print"},{"issue":"2","key":"3","doi-asserted-by":"publisher","first-page":"1352","DOI":"10.1137\/20M1354556","article-title":"Sequential quadratic optimization for nonlinear equality constrained stochastic optimization","volume":"31","author":"Berahas, Albert S.","year":"2021","journal-title":"SIAM J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/1052-6234","issn-type":"print"},{"issue":"1","key":"4","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1007\/s10107-021-01742-y","article-title":"Stochastic first-order methods for convex and nonconvex functional constrained optimization","volume":"197","author":"Boob, Digvijay","year":"2023","journal-title":"Math. Program.","ISSN":"https:\/\/id.crossref.org\/issn\/0025-5610","issn-type":"print"},{"key":"5","unstructured":"T. Chen, Y. Sun, Q. Xiao, and W. Yin, A single-timescale method for stochastic bilevel optimization, 25th AISTATS, PMLR, vol. 151, PMLR, 2022, pp. 2466\u20132488."},{"key":"6","unstructured":"T. Chen, Y. Sun, and W. Yin, Closing the gap: Tighter analysis of alternating stochastic gradient methods for bilevel problems, NIPS, vol. 34, Curran Associates, Inc., 2021, pp. 25294\u201325307."},{"key":"7","doi-asserted-by":"publisher","first-page":"4937","DOI":"10.1109\/TSP.2021.3092377","article-title":"Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization","volume":"69","author":"Chen, Tianyi","year":"2021","journal-title":"IEEE Trans. Signal Process.","ISSN":"https:\/\/id.crossref.org\/issn\/1053-587X","issn-type":"print"},{"issue":"20","key":"8","doi-asserted-by":"publisher","first-page":"5239","DOI":"10.1109\/TSP.2019.2937282","article-title":"Nonconvex optimization meets low-rank matrix factorization: an overview","volume":"67","author":"Chi, Yuejie","year":"2019","journal-title":"IEEE Trans. Signal Process.","ISSN":"https:\/\/id.crossref.org\/issn\/1053-587X","issn-type":"print"},{"key":"9","doi-asserted-by":"crossref","unstructured":"F. E. Curtis, M. J. O\u2019Neill, and D. P. Robinson, Worst-case complexity of an sqp method for nonlinear equality constrained stochastic optimization, Math. Prog. (2023), DOI: \\url{https:\/\/doi.org\/10.1007\/s10107-023-01981-1}.","DOI":"10.1007\/s10107-023-01981-1"},{"key":"10","unstructured":"B. Dai, N. He, Y. Pan, B. Boots, and L. Song, Learning from conditional distributions via dual embeddings, 20th AISTATS, PMLR, vol. 54, PMLR, 2017, pp. 1458\u20131467."},{"key":"11","unstructured":"A. M. Devraj and J. Chen, Stochastic variance reduced primal dual algorithms for empirical composition optimization, NIPS, vol. 32, Curran Associates, Inc., 2019."},{"key":"12","doi-asserted-by":"crossref","unstructured":"C. Ding, T. Li, W. Peng, and H. Park, Orthogonal nonnegative matrix t-factorizations for clustering, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, 2006, pp. 126\u2013135.","DOI":"10.1145\/1150402.1150420"},{"key":"13","first-page":"38","article-title":"Two numerical methods for solving nonlinear programming problems","volume":"215","author":"Evtu\u0161enko, Ju. G.","year":"1974","journal-title":"Dokl. Akad. Nauk SSSR","ISSN":"https:\/\/id.crossref.org\/issn\/0002-3264","issn-type":"print"},{"key":"14","unstructured":"R. Ge, F. Huang, C. Jin, and Y. Yuan, Escaping from saddle points\u2014online stochastic gradient for tensor decomposition, Proceedings of The 28th Conference on Learning Theory, Proceedings of Machine Learning Research, vol. 40, PMLR, 2015, pp. 797\u2013842."},{"issue":"1","key":"15","doi-asserted-by":"publisher","first-page":"960","DOI":"10.1137\/18M1230542","article-title":"A single timescale stochastic approximation method for nested stochastic optimization","volume":"30","author":"Ghadimi, Saeed","year":"2020","journal-title":"SIAM J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/1052-6234","issn-type":"print"},{"key":"16","series-title":"Grundlehren Text Editions","isbn-type":"print","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-56468-0","volume-title":"Fundamentals of convex analysis","author":"Hiriart-Urruty, Jean-Baptiste","year":"2001","ISBN":"https:\/\/id.crossref.org\/isbn\/3540422056"},{"key":"17","unstructured":"W. Hu, C. J. Li, X. Lian, J. Liu, and H. Yuan, Efficient smooth non-convex stochastic compositional optimization via stochastic recursive gradient descent, NIPS, vol. 32, Curran Associates, Inc., 2019."},{"issue":"1","key":"18","doi-asserted-by":"publisher","first-page":"143","DOI":"10.1007\/s10589-022-00384-w","article-title":"A stochastic primal-dual method for a class of nonconvex constrained optimization","volume":"83","author":"Jin, Lingzi","year":"2022","journal-title":"Comput. Optim. Appl.","ISSN":"https:\/\/id.crossref.org\/issn\/0926-6003","issn-type":"print"},{"key":"19","unstructured":"D. P. Kingma and J. Ba, Adam: a method for stochastic optimization, ICLR (2015)."},{"issue":"2","key":"20","doi-asserted-by":"publisher","first-page":"461","DOI":"10.1007\/s10589-020-00179-x","article-title":"Algorithms for stochastic optimization with function or expectation constraints","volume":"76","author":"Lan, Guanghui","year":"2020","journal-title":"Comput. Optim. Appl.","ISSN":"https:\/\/id.crossref.org\/issn\/0926-6003","issn-type":"print"},{"key":"21","doi-asserted-by":"crossref","unstructured":"Q. Li, Z. Zhu, G. Tang, and M. B. Wakin, The geometry of equality-constrained global consensus problems, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 7928\u20137932.","DOI":"10.1109\/ICASSP.2019.8682568"},{"key":"22","unstructured":"Z. Li, P.-Y. Chen, S. Liu, S. Lu, and Y. Xu, Rate-improved inexact augmented lagrangian method for constrained nonconvex optimization, 24th AISTATS, PMLR, vol. 130, PMLR, 13\u201315 Apr 2021, pp. 2170\u20132178."},{"key":"23","unstructured":"X. Lian, M. Wang, and J. Liu, Finite-sum composition optimization via variance reduced gradient descent, 20th AISTATS, PMLR, vol. 54, PMLR, 2017, pp. 1159\u20131167."},{"issue":"1","key":"24","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1007\/s10589-022-00358-y","article-title":"Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization","volume":"82","author":"Lin, Qihang","year":"2022","journal-title":"Comput. Optim. Appl.","ISSN":"https:\/\/id.crossref.org\/issn\/0926-6003","issn-type":"print"},{"key":"25","unstructured":"T. Lin, C. Jin, and M. Jordan, On gradient descent ascent for nonconvex-concave minimax problems, 37th ICML, PMLR, vol. 119, PMLR, 2020, pp. 6083\u20136093."},{"key":"26","series-title":"Cowles Foundation for Research in Economics at Yale University, Monograph 16","volume-title":"Portfolio selection: Efficient diversification of investments","author":"Markowitz, Harry M.","year":"1959"},{"key":"27","unstructured":"Y. Nandwani, A. Pathak, Mausam, and P. Singla, A primal dual formulation for deep learning with constraints, NIPS, vol. 32, Curran Associates, Inc., 2019."},{"key":"28","series-title":"Springer Series in Operations Research and Financial Engineering","isbn-type":"print","volume-title":"Numerical optimization","author":"Nocedal, Jorge","year":"2006","ISBN":"https:\/\/id.crossref.org\/isbn\/9780387303031","edition":"2"},{"issue":"2","key":"29","doi-asserted-by":"publisher","first-page":"856","DOI":"10.1137\/16M1107863","article-title":"Orthogonal nonnegative matrix factorization by sparsity and nuclear norm optimization","volume":"39","author":"Pan, Junjun","year":"2018","journal-title":"SIAM J. Matrix Anal. Appl.","ISSN":"https:\/\/id.crossref.org\/issn\/0895-4798","issn-type":"print"},{"issue":"1","key":"30","doi-asserted-by":"publisher","first-page":"B55--B81","DOI":"10.1137\/19M1294708","article-title":"Orthogonal nonnegative Tucker decomposition","volume":"43","author":"Pan, Junjun","year":"2021","journal-title":"SIAM J. Sci. Comput.","ISSN":"https:\/\/id.crossref.org\/issn\/1064-8275","issn-type":"print"},{"key":"31","unstructured":"C. Paquette, H. Lin, D. Drusvyatskiy, J. Mairal, and Z. Harchaoui, Catalyst acceleration for gradient-based non-convex optimization,  arXiv:1703.10993 (2018)."},{"key":"32","doi-asserted-by":"crossref","unstructured":"N. Parikh and S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (2014), no. 3, 123\u2013231.","DOI":"10.1561\/2400000003"},{"key":"33","doi-asserted-by":"crossref","unstructured":"F. Pompili, N. Gillis, P.-A. Absil, and F. Glineur, Two algorithms for orthogonal nonnegative matrix factorization with application to clustering, Neurocomputing 141 (2014), 15\u201325.","DOI":"10.1016\/j.neucom.2014.02.018"},{"issue":"3","key":"34","doi-asserted-by":"publisher","first-page":"1087","DOI":"10.1080\/10556788.2021.1895152","article-title":"Weakly-convex-concave min-max optimization: provable algorithms and applications in machine learning","volume":"37","author":"Rafique, Hassan","year":"2022","journal-title":"Optim. Methods Softw.","ISSN":"https:\/\/id.crossref.org\/issn\/1055-6788","issn-type":"print"},{"key":"35","unstructured":"R. T. Rockafellar and R. J-B. Wets, Variational Analysis, Springer Science & Business Media, 2009."},{"key":"36","doi-asserted-by":"publisher","first-page":"555","DOI":"10.1007\/BF00934777","article-title":"The multiplier method of Hestenes and Powell applied to convex programming","volume":"12","author":"Rockafellar, R. T.","year":"1973","journal-title":"J. Optim. Theory Appl.","ISSN":"https:\/\/id.crossref.org\/issn\/0022-3239","issn-type":"print"},{"issue":"3","key":"37","doi-asserted-by":"publisher","first-page":"2301","DOI":"10.1137\/20M1312952","article-title":"A stochastic subgradient method for nonsmooth nonconvex multilevel composition optimization","volume":"59","author":"Ruszczy\u0144ski, Andrzej","year":"2021","journal-title":"SIAM J. Control Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/0363-0129","issn-type":"print"},{"key":"38","unstructured":"M. F. Sahin, A. Eftekhari, A. Alacaoglu, F. Latorre, and V. Cevher, An inexact augmented lagrangian framework for nonconvex optimization with nonlinear constraints, NeurIPS (2019)."},{"key":"39","unstructured":"Q. Shi, X. Wang, and H. Wang, A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization, optimization-online.org (2022), \\url{https:\/\/optimization-online.org\/?p=19870}."},{"key":"40","doi-asserted-by":"crossref","unstructured":"R. Stubbs and D. Vandenbussche, Constraint attribution, J. Portfolio Manage. 36 (2010), 48\u201359.","DOI":"10.3905\/jpm.2010.36.4.048"},{"key":"41","unstructured":"Q. Tran-Dinh, N. Pham, and L. Nguyen, Stochastic Gauss-Newton algorithms for nonconvex compositional optimization, 37th ICML, PMLR, vol. 119, PMLR, 2020, pp. 9572\u20139582."},{"key":"42","unstructured":"R. Tutunov, M. Li, A. I. Cowen-Rivers, J. Wang, and H. Bou-Ammar, Compositional adam: an adaptive compositional solver,  arXiv:2002.03755v2 [cs.LG] 24 Apr (2020)."},{"issue":"1-2","key":"43","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1007\/s10107-016-1017-3","article-title":"Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions","volume":"161","author":"Wang, Mengdi","year":"2017","journal-title":"Math. Program.","ISSN":"https:\/\/id.crossref.org\/issn\/0025-5610","issn-type":"print"},{"key":"44","doi-asserted-by":"crossref","unstructured":"M. Wang and J. Liu, A stochastic compositional gradient method using markov samples, WSC, 2016, pp. 702\u2013713.","DOI":"10.1109\/WSC.2016.7822134"},{"key":"45","first-page":"Paper No. 105, 23","article-title":"Accelerating stochastic composition optimization","volume":"18","author":"Wang, Mengdi","year":"2017","journal-title":"J. Mach. Learn. Res.","ISSN":"https:\/\/id.crossref.org\/issn\/1532-4435","issn-type":"print"},{"issue":"306","key":"46","doi-asserted-by":"publisher","first-page":"1793","DOI":"10.1090\/mcom\/3178","article-title":"Penalty methods with stochastic approximation for stochastic nonlinear programming","volume":"86","author":"Wang, Xiao","year":"2017","journal-title":"Math. Comp.","ISSN":"https:\/\/id.crossref.org\/issn\/0025-5718","issn-type":"print"},{"issue":"3","key":"47","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1080\/10556788.2014.940947","article-title":"An augmented Lagrangian trust region method for equality constrained optimization","volume":"30","author":"Wang, Xiao","year":"2015","journal-title":"Optim. Methods Softw.","ISSN":"https:\/\/id.crossref.org\/issn\/1055-6788","issn-type":"print"},{"issue":"5","key":"48","doi-asserted-by":"publisher","first-page":"934","DOI":"10.1080\/10556788.2015.1004332","article-title":"An augmented Lagrangian affine scaling method for nonlinear programming","volume":"30","author":"Wang, Xiao","year":"2015","journal-title":"Optim. Methods Softw.","ISSN":"https:\/\/id.crossref.org\/issn\/1055-6788","issn-type":"print"},{"issue":"2","key":"49","doi-asserted-by":"publisher","first-page":"1664","DOI":"10.1137\/18M1229869","article-title":"Primal-dual stochastic gradient method for convex programs with many functional constraints","volume":"30","author":"Xu, Yangyang","year":"2020","journal-title":"SIAM J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/1052-6234","issn-type":"print"},{"issue":"1","key":"50","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1287\/ijoo.2019.0033","article-title":"First-order methods for constrained convex programming based on linearized augmented Lagrangian function","volume":"3","author":"Xu, Yangyang","year":"2021","journal-title":"INFORMS J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/2575-1484","issn-type":"print"},{"key":"51","unstructured":"J. Yang, A. Orvieto, A. Lucchi, and N. He, Faster single-loop algorithms for minimax optimization without strong concavity, 25th AISTATS, PMLR, vol. 151, PMLR, 28\u201330 Mar 2022, pp. 5485\u20135517."},{"key":"52","doi-asserted-by":"crossref","unstructured":"S. Yang, X. Li, and G. Lan, Data-driven minimax optimization with expectation constraints, Operations Research (2024), DOI:10.1287\/opre.2022.0110.","DOI":"10.1287\/opre.2022.0110"},{"issue":"1","key":"53","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1137\/18M1164846","article-title":"Multilevel stochastic gradient methods for nested composition optimization","volume":"29","author":"Yang, Shuoguang","year":"2019","journal-title":"SIAM J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/1052-6234","issn-type":"print"},{"key":"54","doi-asserted-by":"crossref","unstructured":"Z. Yang and E. Oja, Linear and nonlinear projective nonnegative matrix factorization, IEEE Trans. Neural Netw. 21 (2010), no. 5, 734\u2013749.","DOI":"10.1109\/TNN.2010.2041361"},{"key":"55","doi-asserted-by":"crossref","unstructured":"Y. Yu and L. Huang, Fast stochastic variance reduced admm for stochastic composition optimization, 26th IJCAI, 2017, pp. 3364\u20133370.","DOI":"10.24963\/ijcai.2017\/470"},{"key":"56","unstructured":"A. Zhang, Z. C. Lipton, M. Li, and A. J. Smola, Dive into Deep Learning, Cambridge University Press, 2023."},{"key":"57","unstructured":"J. Zhang and L. Xiao, A composite randomized incremental gradient method, 36th ICML, PMLR, vol. 97, PMLR, 2019, pp. 7454\u20137462."},{"issue":"2","key":"58","doi-asserted-by":"publisher","first-page":"1131","DOI":"10.1137\/19M1285457","article-title":"Multilevel composite stochastic optimization via nested variance reduction","volume":"31","author":"Zhang, Junyu","year":"2021","journal-title":"SIAM J. Optim.","ISSN":"https:\/\/id.crossref.org\/issn\/1052-6234","issn-type":"print"},{"issue":"1-2","key":"59","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1007\/s10107-021-01709-z","article-title":"Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization","volume":"195","author":"Zhang, Junyu","year":"2022","journal-title":"Math. Program.","ISSN":"https:\/\/id.crossref.org\/issn\/0025-5610","issn-type":"print"},{"key":"60","unstructured":"Zhe Zhang and Guanghui Lan, Optimal algorithms for convex nested stochastic composite optimization, arXiv::2011.10076v5 [math.OC] 21 Jun (2021)."}],"container-title":["Mathematics of Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.ams.org\/mcom\/2025-94-351\/S0025-5718-2024-03965-0\/S0025-5718-2024-03965-0.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,10]],"date-time":"2024-10-10T17:50:32Z","timestamp":1728582632000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.ams.org\/mcom\/2025-94-351\/S0025-5718-2024-03965-0\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,10]]},"references-count":60,"journal-issue":{"issue":"351","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["S0025-5718-2024-03965-0"],"URL":"https:\/\/doi.org\/10.1090\/mcom\/3965","archive":["CLOCKSS","Portico"],"relation":{},"ISSN":["0025-5718","1088-6842"],"issn-type":[{"value":"0025-5718","type":"print"},{"value":"1088-6842","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,10]]}}}