{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T16:02:53Z","timestamp":1740153773537,"version":"3.37.3"},"reference-count":33,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T00:00:00Z","timestamp":1726444800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T00:00:00Z","timestamp":1726444800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Math. Prog. Comp."],"published-print":{"date-parts":[[2024,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Conjugate gradient minimization methods (CGM) and their accelerated variants are widely used. We focus on the use of cubic regularization to improve the CGM direction independent of the step length computation. In this paper, we propose the Hybrid Cubic Regularization of CGM, where regularized steps are used selectively. Using Shanno\u2019s reformulation of CGM as a memoryless BFGS method, we derive new formulas for the regularized step direction. We show that the regularized step direction uses the same order of computational burden per iteration as its non-regularized version. Moreover, the Hybrid Cubic Regularization of CGM exhibits global convergence with fewer assumptions. In numerical experiments, the new step directions are shown to require fewer iteration counts, improve runtime, and reduce the need to reset the step direction. 
Overall, the Hybrid Cubic Regularization of CGM exhibits the same memoryless and matrix-free properties, while outperforming CGM as a memoryless BFGS method in iterations and runtime.<\/jats:p>","DOI":"10.1007\/s12532-024-00265-9","type":"journal-article","created":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T07:02:25Z","timestamp":1726470145000},"page":"629-664","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Regularized step directions in nonlinear conjugate gradient methods"],"prefix":"10.1007","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4157-4273","authenticated-orcid":false,"given":"Cassidy K.","family":"Buhler","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5554-9928","authenticated-orcid":false,"given":"Hande Y.","family":"Benson","sequence":"additional","affiliation":[]},{"given":"David F.","family":"Shanno","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,9,16]]},"reference":[{"key":"265_CR1","doi-asserted-by":"crossref","unstructured":"Benson, H., Shanno, D.: Interior-point methods for nonconvex nonlinear programming: cubic regularization. Comput. Optim. Appl. 58 (2014)","DOI":"10.1007\/s10589-013-9626-8"},{"issue":"4","key":"265_CR2","doi-asserted-by":"publisher","first-page":"457","DOI":"10.1007\/s12532-018-0136-7","volume":"10","author":"HY Benson","year":"2018","unstructured":"Benson, H.Y., Shanno, D.F.: Cubic regularization in symmetric rank-1 quasi-Newton methods. Math. Program. Comput. 10(4), 457\u2013486 (2018)","journal-title":"Math. Program. Comput."},{"key":"265_CR3","doi-asserted-by":"publisher","unstructured":"Buhler, C.K.: Conmin-CG (2024). 
https:\/\/doi.org\/10.5281\/zenodo.13315592","DOI":"10.5281\/zenodo.13315592"},{"key":"265_CR4","doi-asserted-by":"publisher","unstructured":"Cartis, C., Gould, N., Toint, P.: Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Math. Program. 127, 245\u2013295 (2011). https:\/\/doi.org\/10.1007\/s10107-009-0286-5","DOI":"10.1007\/s10107-009-0286-5"},{"issue":"4","key":"265_CR5","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1147\/rd.164.0431","volume":"16","author":"H Crowder","year":"1972","unstructured":"Crowder, H., Wolfe, P.: Linear convergence of the conjugate gradient method. IBM J. Res. Dev. 16(4), 431\u2013433 (1972)","journal-title":"IBM J. Res. Dev."},{"issue":"4","key":"265_CR6","doi-asserted-by":"publisher","first-page":"406","DOI":"10.1093\/comjnl\/10.4.406","volume":"10","author":"WC Davidon","year":"1968","unstructured":"Davidon, W.C.: Variance algorithm for minimization. Comput. J. 10(4), 406\u2013410 (1968)","journal-title":"Comput. J."},{"key":"265_CR7","first-page":"2121","volume":"12","author":"JC Duchi","year":"2011","unstructured":"Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121\u20132159 (2011)","journal-title":"J. Mach. Learn. Res."},{"issue":"2","key":"265_CR8","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1093\/comjnl\/7.2.149","volume":"7","author":"R Fletcher","year":"1964","unstructured":"Fletcher, R., Reeves, C.M.: Function minimization by conjugate gradients. Comput. J. 7(2), 149\u2013154 (1964)","journal-title":"Comput. J."},{"key":"265_CR9","unstructured":"Fourer, R., Gay, D., Kernighan, B.: AMPL: A Modeling Language for Mathematical Programming. 
Scientific Press (1993)"},{"key":"265_CR10","doi-asserted-by":"publisher","first-page":"545","DOI":"10.1007\/s10589-014-9687-3","volume":"60","author":"NI Gould","year":"2015","unstructured":"Gould, N.I., Orban, D., Toint, P.L.: CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization. Comput. Optim. Appl. 60, 545\u2013557 (2015)","journal-title":"Comput. Optim. Appl."},{"key":"265_CR11","unstructured":"Griewank, A.: The modification of Newton\u2019s method for unconstrained optimization by bounding cubic terms. Technical Report NA\/12, University of Cambridge (1981)"},{"key":"265_CR12","doi-asserted-by":"crossref","unstructured":"Hestenes, M.R., Stiefel, E.: Methods of Conjugate Gradients for Solving Linear Systems, vol.\u00a049. NBS (1952)","DOI":"10.6028\/jres.049.044"},{"key":"265_CR13","unstructured":"Hinton, G., Srivastava, N., Swersky, K.: Neural networks for machine learning lecture 6a overview of mini-batch gradient descent. (2012). Retrieved April 27, 2021, from https:\/\/www.cs.toronto.edu\/~hinton\/coursera\/lecture6\/lec6.pdf"},{"key":"265_CR14","unstructured":"Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of the International Conference on Learning Representations (ICLR) (2015)"},{"key":"265_CR15","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1007\/BF01584673","volume":"4","author":"M Lenard","year":"1973","unstructured":"Lenard, M.: Practical convergence conditions for unconstrained optimization. Math. Program. 4, 309\u2013323 (1973)","journal-title":"Math. Program."},{"key":"265_CR16","doi-asserted-by":"publisher","first-page":"164","DOI":"10.1090\/qam\/10666","volume":"2","author":"K Levenberg","year":"1944","unstructured":"Levenberg, K.: A method for the solution of certain problems in least squares. Q. Appl. Math. 2, 164\u2013168 (1944)","journal-title":"Q. Appl. 
Math."},{"key":"265_CR17","doi-asserted-by":"publisher","first-page":"431","DOI":"10.1137\/0111030","volume":"11","author":"D Marquardt","year":"1963","unstructured":"Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM J. Appl. Math. 11, 431\u2013441 (1963)","journal-title":"SIAM J. Appl. Math."},{"key":"265_CR18","unstructured":"MathWorks: Choose a multilayer neural network training function. MathWorks Documentation. https:\/\/www.mathworks.com\/help\/deeplearning\/ug\/choose-a-multilayer-neural-network-training-function.html. Retrieved April 27, 2021"},{"issue":"4","key":"265_CR19","doi-asserted-by":"publisher","first-page":"525","DOI":"10.1016\/S0893-6080(05)80056-5","volume":"6","author":"MF M\u00f8ller","year":"1993","unstructured":"M\u00f8ller, M.F.: A scaled conjugate gradient algorithm for fast supervised learning. Neural Netw. 6(4), 525\u2013533 (1993)","journal-title":"Neural Netw."},{"key":"265_CR20","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1007\/s10107-006-0706-8","volume":"108","author":"Y Nesterov","year":"2006","unstructured":"Nesterov, Y., Polyak, B.: Cubic regularization of Newton method and its global performance. Math. Program. 108, 177\u2013205 (2006). https:\/\/doi.org\/10.1007\/s10107-006-0706-8","journal-title":"Math. Program."},{"issue":"1","key":"265_CR21","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1007\/BF01580654","volume":"10","author":"SS Oren","year":"1976","unstructured":"Oren, S.S., Spedicato, E.: Optimal conditioning of self-scaling variable metric algorithms. Math. Program. 10(1), 70\u201390 (1976)","journal-title":"Math. Program."},{"issue":"6","key":"265_CR22","doi-asserted-by":"publisher","first-page":"1073","DOI":"10.1287\/opre.26.6.1073","volume":"26","author":"A Perry","year":"1978","unstructured":"Perry, A.: Technical note-A modified conjugate gradient algorithm. Oper. Res. 26(6), 1073\u20131078 (1978). 
https:\/\/doi.org\/10.1287\/opre.26.6.1073","journal-title":"Oper. Res."},{"key":"265_CR23","doi-asserted-by":"crossref","unstructured":"Polak, E., Rib\u00ecere, G.: Note sur la convergence de m\u00e9thodes de directions conjugu\u00e9es. Revue fran\u00e7aise d\u2019informatique et de recherche op\u00e9rationnelle. S\u00e9rie rouge 16, 35\u201343 (1969)","DOI":"10.1051\/m2an\/196903R100351"},{"issue":"1","key":"265_CR24","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1007\/BF01580369","volume":"11","author":"MJD Powell","year":"1976","unstructured":"Powell, M.J.D.: Some convergence properties of the conjugate gradient method. Math. Program. 11(1), 42\u201349 (1976)","journal-title":"Math. Program."},{"issue":"1","key":"265_CR25","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/BF01593790","volume":"12","author":"MJD Powell","year":"1977","unstructured":"Powell, M.J.D.: Restart procedures for the conjugate gradient method. Math. Program. 12(1), 241\u2013254 (1977)","journal-title":"Math. Program."},{"issue":"6","key":"265_CR26","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.1137\/0715085","volume":"15","author":"D Shanno","year":"1978","unstructured":"Shanno, D.: On the convergence of a new conjugate gradient algorithm. SIAM J. Numer. Anal. 15(6), 1247\u20131257 (1978)","journal-title":"SIAM J. Numer. Anal."},{"issue":"3","key":"265_CR27","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1287\/moor.3.3.244","volume":"3","author":"DF Shanno","year":"1978","unstructured":"Shanno, D.F.: Conjugate gradient methods with inexact searches. Math. Oper. Res. 3(3), 244\u2013256 (1978)","journal-title":"Math. Oper. Res."},{"issue":"1","key":"265_CR28","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1145\/355666.355673","volume":"2","author":"DF Shanno","year":"1976","unstructured":"Shanno, D.F., Phua, K.H.: Algorithm 500: Minimization of unconstrained multivariate functions [e4]. ACM Trans. Math. Softw. 
(TOMS) 2(1), 87\u201394 (1976)","journal-title":"ACM Trans. Math. Softw. (TOMS)"},{"key":"265_CR29","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1007\/BF01588962","volume":"14","author":"DF Shanno","year":"1978","unstructured":"Shanno, D.F., Phua, K.H.: Matrix conditioning and nonlinear optimization. Math. Program. 14, 149\u2013160 (1978)","journal-title":"Math. Program."},{"key":"265_CR30","unstructured":"Sherman, J., Morrison, W.J.: Adjustment of an inverse matrix corresponding to changes in the elements of a given column or a given row of the original matrix. In: Annals of Mathematical Statistics, vol. 20(4), pp. 621\u2013621 (1949)"},{"key":"265_CR31","unstructured":"Vanderbei, R.J.: AMPL models (1997). https:\/\/vanderbei.princeton.edu\/ampl\/nlmodels\/cute\/index.html"},{"issue":"3","key":"265_CR32","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1080\/10556780600605129","volume":"22","author":"M Weiser","year":"2007","unstructured":"Weiser, M., Deuflhard, P., Erdmann, B.: Affine conjugate adaptive Newton methods for nonlinear elastomechanics. Optim. Methods Softw. 22(3), 413\u2013431 (2007)","journal-title":"Optim. Methods Softw."},{"key":"265_CR33","unstructured":"Zeiler, M.D.: Adadelta: An adaptive learning rate method. In: Proceedings of the International Conference on Machine Learning (ICML), vol.\u00a028, pp. 105\u2013112 (2012). 
https:\/\/www.jmlr.org\/proceedings\/papers\/v28\/zeiler13.pdf"}],"container-title":["Mathematical Programming Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12532-024-00265-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12532-024-00265-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12532-024-00265-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,7]],"date-time":"2024-11-07T14:14:25Z","timestamp":1730988865000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12532-024-00265-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,16]]},"references-count":33,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,12]]}},"alternative-id":["265"],"URL":"https:\/\/doi.org\/10.1007\/s12532-024-00265-9","relation":{},"ISSN":["1867-2949","1867-2957"],"issn-type":[{"type":"print","value":"1867-2949"},{"type":"electronic","value":"1867-2957"}],"subject":[],"published":{"date-parts":[[2024,9,16]]},"assertion":[{"value":"19 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 August 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 September 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no relevant financial or non-financial interests to 
disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"The software <tt>Conmin-CG<\/tt> is open source and available for download.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Code availability"}}]}}