{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,4]],"date-time":"2025-06-04T18:03:40Z","timestamp":1749060220131,"version":"3.37.3"},"reference-count":29,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2022,10,15]],"date-time":"2022-10-15T00:00:00Z","timestamp":1665792000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"},{"start":{"date-parts":[[2022,10,15]],"date-time":"2022-10-15T00:00:00Z","timestamp":1665792000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springernature.com\/gp\/researchers\/text-and-data-mining"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Appl Intell"],"published-print":{"date-parts":[[2023,6]]},"DOI":"10.1007\/s10489-022-04206-8","type":"journal-article","created":{"date-parts":[[2022,10,15]],"date-time":"2022-10-15T06:04:49Z","timestamp":1665813889000},"page":"13741-13762","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Gradient-only surrogate to resolve learning rates for robust and consistent training of deep neural networks"],"prefix":"10.1007","volume":"53","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1603-7097","authenticated-orcid":false,"given":"Younghwan","family":"Chae","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Daniel N.","family":"Wilke","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dominic","family":"Kafka","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,10,15]]},"reference":[{"key":"4206_CR1","unstructured":"Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro 
C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Man\u00e9 D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Vi\u00e9gas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X (2015) TensorFlow: large-scale machine learning on heterogeneous systems. software available from tensorflow.org. https:\/\/www.tensorflow.org\/. Accessed 10 Aug 2021"},{"issue":"1","key":"4206_CR2","first-page":"4148","volume":"18","author":"N Agarwal","year":"2017","unstructured":"Agarwal N, Bullins B, Hazan E (2017) Second-order stochastic optimization for machine learning in linear time. J Mach Learn Res 18(1):4148\u20134187","journal-title":"J Mach Learn Res"},{"key":"4206_CR3","doi-asserted-by":"crossref","unstructured":"Bengio Y (2012) Practical recommendations for gradient-based training of deep architectures. In: Neural networks tricks of the trade. Springer, pp 437\u2013478","DOI":"10.1007\/978-3-642-35289-8_26"},{"key":"4206_CR4","unstructured":"Bergou Eh, Diouane Y, Kunc V, Kungurtsev V, Royer CW (2018) A subsampling line search method with second-order results. arXiv:181007211"},{"issue":"4","key":"4206_CR5","doi-asserted-by":"publisher","first-page":"3312","DOI":"10.1137\/17M1154679","volume":"28","author":"R Bollapragada","year":"2018","unstructured":"Bollapragada R, Byrd R, Nocedal J (2018) Adaptive sampling strategies for stochastic optimization. 
SIAM J Optim 28(4):3312\u20133343","journal-title":"SIAM J Optim"},{"key":"4206_CR6","unstructured":"Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler D, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskever I, Amodei D (2020) Language models are few-shot learners. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H (eds) Advances in neural information processing systems, Curran associates, Inc., vol 33, pp 1877\u20131901. https:\/\/proceedings.neurips.cc\/paper\/2020\/file\/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf. Accessed 04 Oct 2021"},{"issue":"1","key":"4206_CR7","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1007\/s10107-012-0572-5","volume":"134","author":"RH Byrd","year":"2012","unstructured":"Byrd RH, Chin GM, Nocedal J, Wu Y (2012) Sample size selection in optimization methods for machine learning. Math Program 134(1):127\u2013155. https:\/\/doi.org\/10.1007\/s10107-012-0572-5","journal-title":"Math Program"},{"key":"4206_CR8","unstructured":"Chae Y, Wilke DN (2019) Empirical study towards understanding line search approximations for training neural networks. arXiv:190906893"},{"issue":"3","key":"4206_CR9","doi-asserted-by":"publisher","first-page":"A1380","DOI":"10.1137\/110830629","volume":"34","author":"MP Friedlander","year":"2012","unstructured":"Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J Sci Comput 34(3):A1380\u2013A1405","journal-title":"SIAM J Sci Comput"},{"key":"4206_CR10","unstructured":"Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. 
In: Proceedings of the thirteenth international conference on artificial intelligence and statistics, JMLR workshop and conference proceedings, pp 249\u2013256"},{"key":"4206_CR11","unstructured":"Goodfellow I, Bengio Y, Courville A (2016) Deep learning (Adaptive computation and machine learning series). The MIT Press"},{"key":"4206_CR12","doi-asserted-by":"crossref","unstructured":"Gupta RK (2019) Numerical methods: fundamentals and applications. Cambridge University Press","DOI":"10.1017\/9781108685306"},{"key":"4206_CR13","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"4206_CR14","unstructured":"Kafka D, Wilke DN (2019) Gradient-only line searches: an alternative to probabilistic line searches. arXiv:190309383"},{"issue":"1","key":"4206_CR15","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/s10898-020-00921-z","volume":"79","author":"D Kafka","year":"2021","unstructured":"Kafka D, Wilke DN (2021) Resolving learning rates adaptively by locating stochastic non-negative associated gradient projection points using line searches. J Glob Optim 79(1):111\u2013152","journal-title":"J Glob Optim"},{"key":"4206_CR16","unstructured":"Krizhevsky A (2009) Learning multiple layers of features from tiny images. Tech rep, Department of Computer Science, University of Toronto"},{"issue":"11","key":"4206_CR17","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y LeCun","year":"1998","unstructured":"LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324","journal-title":"Proc IEEE"},{"key":"4206_CR18","unstructured":"Liu K (2020) 95.16% on CIFAR10 with PyTorch. https:\/\/github.com\/kuangliu\/pytorch-cifar. 
Accessed 12 June 2021"},{"key":"4206_CR19","unstructured":"Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference track proceedings, OpenReview.net. https:\/\/openreview.net\/forum?id=Skq89Scxx"},{"issue":"3","key":"4206_CR20","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1080\/00207179208934253","volume":"55","author":"AM Lyapunov","year":"1992","unstructured":"Lyapunov AM (1992) The general problem of the stability of motion. Int J Control 55(3):531\u2013534","journal-title":"Int J Control"},{"issue":"1","key":"4206_CR21","first-page":"4262","volume":"18","author":"M Mahsereci","year":"2017","unstructured":"Mahsereci M, Hennig P (2017) Probabilistic line searches for stochastic optimization. J Mach Learn Res 18(1):4262\u20134320","journal-title":"J Mach Learn Res"},{"key":"4206_CR22","unstructured":"Masters D, Luschi C (2018) Revisiting small batch training for deep neural networks. arXiv:180407612"},{"key":"4206_CR23","first-page":"5405","volume":"33","author":"M Mutschler","year":"2020","unstructured":"Mutschler M, Zell A (2020) Parabolic approximation line search for dnns. Adv Neural Inf Process Syst 33:5405\u20135416","journal-title":"Adv Neural Inf Process Syst"},{"key":"4206_CR24","unstructured":"Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) PyTorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, d'Alch\u00e9-Buc F, Fox E, Garnett R (eds) Advances in neural information processing systems 32, Curran associates, Inc., pp 8024\u20138035. 
http:\/\/papers.neurips.cc\/paper\/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf. Accessed 06 Aug 2021"},{"key":"4206_CR25","doi-asserted-by":"crossref","unstructured":"Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat, pp 400\u2013407","DOI":"10.1214\/aoms\/1177729586"},{"key":"4206_CR26","doi-asserted-by":"crossref","unstructured":"Snyman J, Wilke D (2018) Practical mathematical optimization: basic optimization theory and gradient-based algorithms. Springer optimization and its applications, Springer international publishing. https:\/\/books.google.co.kr\/books?id=n1dLswEACAAJ. Accessed 27 July 2021","DOI":"10.1007\/978-3-319-77586-9_5"},{"key":"4206_CR27","doi-asserted-by":"crossref","unstructured":"Strubell E, Ganesh A, McCallum A (2020) Energy and policy considerations for modern deep learning research. In: Proceedings of the AAAI conference on artificial intelligence vol 34, pp 13693\u201313696","DOI":"10.1609\/aaai.v34i09.7123"},{"key":"4206_CR28","unstructured":"Tan M, Le Q (2019) Efficientnet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning, PMLR, pp 6105\u20136114"},{"issue":"3","key":"4206_CR29","doi-asserted-by":"publisher","first-page":"1460","DOI":"10.1007\/s10489-020-01892-0","volume":"51","author":"R Yedida","year":"2021","unstructured":"Yedida R, Saha S, Prashanth T (2021) LipschitzLR: using theoretically computed adaptive learning rates for fast convergence. 
Appl Intell 51(3):1460\u20131478","journal-title":"Appl Intell"}],"container-title":["Applied Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-04206-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10489-022-04206-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10489-022-04206-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,31]],"date-time":"2023-05-31T21:03:53Z","timestamp":1685567033000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10489-022-04206-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,15]]},"references-count":29,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2023,6]]}},"alternative-id":["4206"],"URL":"https:\/\/doi.org\/10.1007\/s10489-022-04206-8","relation":{},"ISSN":["0924-669X","1573-7497"],"issn-type":[{"type":"print","value":"0924-669X"},{"type":"electronic","value":"1573-7497"}],"subject":[],"published":{"date-parts":[[2022,10,15]]},"assertion":[{"value":"25 September 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 October 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}