{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T15:05:00Z","timestamp":1753887900903,"version":"3.41.2"},"reference-count":42,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,11,10]],"date-time":"2021-11-10T00:00:00Z","timestamp":1636502400000},"content-version":"vor","delay-in-days":313,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61602494"],"award-info":[{"award-number":["61602494"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computational Intelligence and Neuroscience"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic optimization. AdaCN dynamically captures the curvature of the loss landscape by diagonally approximated Hessian plus the norm of difference between previous two estimates. It only requires at most first order gradients and updates with linear complexity for both time and memory. In order to reduce the variance introduced by the stochastic nature of the problem, AdaCN hires the first and second moment to implement and exponential moving average on iteratively updated stochastic gradients and approximated stochastic Hessians, respectively. 
We validate AdaCN in extensive experiments, showing that it outperforms other stochastic first-order methods (including SGD, Adam, and AdaBound) and a stochastic quasi\u2010Newton method (i.e., Apollo), in terms of both convergence speed and generalization performance.<\/jats:p>","DOI":"10.1155\/2021\/5790608","type":"journal-article","created":{"date-parts":[[2021,11,11]],"date-time":"2021-11-11T05:50:10Z","timestamp":1636609810000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["AdaCN: An Adaptive Cubic Newton Method for Nonconvex Stochastic Optimization"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2015-4301","authenticated-orcid":false,"given":"Yan","family":"Liu","sequence":"first","affiliation":[]},{"given":"Maojun","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Zhiwei","family":"Zhong","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1424-9523","authenticated-orcid":false,"given":"Xiangrong","family":"Zeng","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,11,10]]},"reference":[{"key":"e_1_2_8_1_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729586"},{"key":"e_1_2_8_2_2","article-title":"A method of solving a convex programming problem with convergence rate o (1\/k^2)","volume":"269","author":"Nesterov Y.","year":"1983","journal-title":"Soviet Mathematics-Doklady"},{"key":"e_1_2_8_3_2","unstructured":"SutskeverI. MartensJ. DahlG. andHintonG. On the importance of initialization and momentum in deep learning 28 Proceedings of the 30th International Conference on Machine Learning 2013 no. 
3 1139\u20131147."},{"key":"e_1_2_8_4_2","doi-asserted-by":"publisher","DOI":"10.1016\/0041-5553(64)90137-5"},{"key":"e_1_2_8_5_2","first-page":"2121","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi J.","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_8_6_2","unstructured":"ZeilerM. D. Adadelta: an adaptive learning rate method 2012 http:\/\/arxiv.org\/abs\/1212.5701."},{"key":"e_1_2_8_7_2","unstructured":"GravesA. Generating sequences with recurrent neural networks 2013 http:\/\/arxiv.org\/abs\/1308.0850."},{"key":"e_1_2_8_8_2","unstructured":"KingmaD. P.andBaJ. Adam: a method for stochastic optimization Proceedings of the 3rd International Conference on Learning Representations ICLR 2015 May 2015 San Diego CA USA."},{"key":"e_1_2_8_9_2","unstructured":"RuderS. An overview of gradient descent optimization algorithms 2016 http:\/\/arxiv.org\/abs\/1609.04747."},{"key":"e_1_2_8_10_2","unstructured":"LoshchilovI.andHutterF. Decoupled weight decay regularization Proceedings of the 7th International Conference on Learning Representations ICLR 2019 May 2019 New Orleans LA USA."},{"key":"e_1_2_8_11_2","doi-asserted-by":"crossref","unstructured":"HuangH. WangC. andDongB. Nostalgic adam: weighting more of the past gradients when designing the adaptive learning rate Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence IJCAI 2019 August 2019 Macao China 2556\u20132562 https:\/\/doi.org\/10.24963\/ijcai.2019\/355.","DOI":"10.24963\/ijcai.2019\/355"},{"key":"e_1_2_8_12_2","unstructured":"ReddiS. J. KaleS. andKumarS. On the convergence of adam and beyond 2019 http:\/\/arxiv.org\/abs\/1904.09237."},{"key":"e_1_2_8_13_2","unstructured":"LuoL. XiongY. LiuY. andSunX. 
Adaptive gradient methods with dynamic bound of learning rate Proceedings of the 7th International Conference on Learning Representations ICLR 2019 May 2019 New Orleans LA USA."},{"key":"e_1_2_8_14_2","unstructured":"LiuL. JiangH. HeP. ChenW. LiuX. GaoJ. andHanJ. On the variance of the adaptive learning rate and beyond 2019 http:\/\/arxiv.org\/abs\/1908.03265."},{"key":"e_1_2_8_15_2","unstructured":"WangG. LuS. TuW. andZhangL. Sadam: a variant of adam for strongly convex functions Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence IJCAI 2019 August 2019 Macao China 2556\u20132562."},{"key":"e_1_2_8_16_2","unstructured":"LiW. ZhangZ. WangX. andLuoP. Adax: adaptive gradient descent with exponential long term memory 2020 http:\/\/arxiv.org\/abs\/2004.09740."},{"key":"e_1_2_8_17_2","unstructured":"WilsonA. C. RoelofsR. SternM. SrebroN. andRechtB. The marginal value of adaptive gradient methods in machine learning Proceedings of the Annual Conference on Neural Information Processing Systems 2017 December 2017 Long Beach CA USA 4148\u20134158."},{"key":"e_1_2_8_18_2","unstructured":"KeskarN. S.andBerahasA. S. Adaqn: an adaptive quasi-newton algorithm for training rnns Proceedings of the Annual Conference on Neural Information Processing Systems 2017 December 2017 Long Beach CA USA 4148\u20134158."},{"key":"e_1_2_8_19_2","doi-asserted-by":"publisher","DOI":"10.1137\/15m1053141"},{"key":"e_1_2_8_20_2","unstructured":"MaX. Apollo: an adaptive parameter-wise diagonal quasi-newton method for nonconvex stochastic optimization 2020 http:\/\/arxiv.org\/abs\/2009.13586."},{"key":"e_1_2_8_21_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1994.6.1.147"},{"key":"e_1_2_8_22_2","unstructured":"YaoZ. GholamiA. ShenS. KeutzerK. andMahoneyM. W. 
Adahessian: an adaptive second order optimizer for machine learning 2020 http:\/\/arxiv.org\/abs\/2006.00719."},{"key":"e_1_2_8_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10107-006-0706-8"},{"key":"e_1_2_8_24_2","unstructured":"TripuraneniN. SternM. JinC. RegierJ. andJordanM. I. Stochastic cubic regularization for fast nonconvex optimization 2017 http:\/\/arxiv.org\/abs\/1711.02838."},{"key":"e_1_2_8_25_2","unstructured":"KohlerJ. M.andLucchiA. Sub-sampled cubic regularization for non-convex optimization 2017 http:\/\/arxiv.org\/abs\/1705.05933."},{"key":"e_1_2_8_26_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10957-018-1341-2"},{"key":"e_1_2_8_27_2","doi-asserted-by":"crossref","unstructured":"XuP. RoostaF. andMahoneyM. W. Second-order optimization for nonconvex machine learning: an empirical study Proceedings of the 2020 SIAM International Conference on Data Mining May 2020 Cincinnati OH USA https:\/\/doi.org\/10.1137\/1.9781611976236.23.","DOI":"10.1137\/1.9781611976236.23"},{"key":"e_1_2_8_28_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10957-019-01624-6"},{"key":"e_1_2_8_29_2","unstructured":"ReddiS. J. ZaheerM. SraS. PoczosB. BachF. R. SalakhutdinovR. andSmolaA. J. A generic approach for escaping saddle points 84 Proceedings of the International Conference on Artificial Intelligence and Statistics AISTATS 2018 April 2018 Playa Blanca Lanzarote Canary Islands Spain 1233\u20131242."},{"key":"e_1_2_8_30_2","unstructured":"XuY. RongJ. andYangT. First-order stochastic algorithms for escaping from saddle points in almost linear time Proceedings of the Annual Conference on Neural Information Processing Systems 2018 NeurIPS 2018 December 2018 Montreal Canada 5535\u20135545."},{"key":"e_1_2_8_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_8_32_2","unstructured":"SimonyanK.andZissermanA. 
Very deep convolutional networks for large-scale image recognition Proceedings of the 3rd International Conference on Learning Representations ICLR 2015 May 2015 San Diego CA USA."},{"key":"e_1_2_8_33_2","doi-asserted-by":"crossref","unstructured":"HeK. ZhangX. RenS. andSunJ. Deep residual learning for image recognition Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition CVPR 2016 June 2016 Las Vegas NV USA https:\/\/doi.org\/10.1109\/CVPR.2016.90 2-s2.0-84986274465.","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_8_34_2","doi-asserted-by":"crossref","unstructured":"HuangG. LiuZ. Van Der MaatenL. andWeinbergerK. Q. Densely connected convolutional networks Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition CVPR 2017 July 2017 Honolulu HI USA https:\/\/doi.org\/10.1109\/CVPR.2017.243 2-s2.0-85035343801.","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_2_8_35_2","article-title":"Learning multiple layers of features from tiny images","author":"Krizhevsky A.","year":"2009","journal-title":"Technical Report, University of Toronto"},{"key":"e_1_2_8_36_2","unstructured":"HochreiterS.andSchmidhuberJ. LSTM can solve hard long time lag problems 9 Proceedings of the 9th International Conference on Neural Information Processing Systems December 1996 Denver CO USA 473\u2013479."},{"key":"e_1_2_8_37_2","first-page":"313","article-title":"Building a large annotated corpus of English: the Penn Treebank","volume":"19","author":"Marcus M. P.","year":"1993","journal-title":"Computational Linguistics"},{"key":"e_1_2_8_38_2","first-page":"99","volume-title":"Statistical Learning Theory","author":"Vapnik V. N.","year":"1998"},{"key":"e_1_2_8_39_2","doi-asserted-by":"publisher","DOI":"10.1145\/1944345.1944349"},{"key":"e_1_2_8_40_2","first-page":"11","article-title":"If quasi-newton then why not quasi-cauchy","volume":"6","author":"Nazareth J. 
L.","year":"1995","journal-title":"SIAG\/OPT Views-and-News"},{"key":"e_1_2_8_41_2","doi-asserted-by":"publisher","DOI":"10.1137\/0730067"},{"key":"e_1_2_8_42_2","doi-asserted-by":"publisher","DOI":"10.1137\/S1052623498331793"}],"container-title":["Computational Intelligence and Neuroscience"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/5790608.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/5790608.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/5790608","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T12:25:32Z","timestamp":1722947132000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/5790608"}},"subtitle":[],"editor":[{"given":"Paolo","family":"Gastaldo","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/5790608"],"URL":"https:\/\/doi.org\/10.1155\/2021\/5790608","archive":["Portico"],"relation":{},"ISSN":["1687-5265","1687-5273"],"issn-type":[{"type":"print","value":"1687-5265"},{"type":"electronic","value":"1687-5273"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2021-07-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-10-15","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-11-10","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"5790608"}}