{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T15:31:27Z","timestamp":1780759887704,"version":"3.54.1"},"reference-count":28,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,4,22]],"date-time":"2020-04-22T00:00:00Z","timestamp":1587513600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2017R1E1A1A03070311"],"award-info":[{"award-number":["NRF-2017R1E1A1A03070311"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>The process of machine learning is to find parameters that minimize the cost function constructed by learning the data. This is called optimization and the parameters at that time are called the optimal parameters in neural networks. In the process of finding the optimization, there were attempts to solve the symmetric optimization or initialize the parameters symmetrically. Furthermore, in order to obtain the optimal parameters, the existing methods have used methods in which the learning rate is decreased over the iteration time or is changed according to a certain ratio. These methods are a monotonically decreasing method at a constant rate according to the iteration time. Our idea is to make the learning rate changeable unlike the monotonically decreasing method. We introduce a method to find the optimal parameters which adaptively changes the learning rate according to the value of the cost function. Therefore, when the cost function is optimized, the learning is complete and the optimal parameters are obtained. This paper proves that the method ensures convergence to the optimal parameters. This means that our method achieves a minimum of the cost function (or effective learning). Numerical experiments demonstrate that learning is good effective when using the proposed learning rate schedule in various situations.<\/jats:p>","DOI":"10.3390\/sym12040660","type":"journal-article","created":{"date-parts":[[2020,4,23]],"date-time":"2020-04-23T02:10:52Z","timestamp":1587607852000},"page":"660","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":40,"title":["A Novel Learning Rate Schedule in Optimization for Neural Networks and It\u2019s Convergence"],"prefix":"10.3390","volume":"12","author":[{"given":"Jieun","family":"Park","sequence":"first","affiliation":[{"name":"Seongsan Liberal Arts College, Daegu University, Kyungsan 38453, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dokkyun","family":"Yi","sequence":"additional","affiliation":[{"name":"Seongsan Liberal Arts College, Daegu University, Kyungsan 38453, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3930-985X","authenticated-orcid":false,"given":"Sangmin","family":"Ji","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Natural Sciences, Chungnam National University, Daejeon 34134, Korea"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2020,4,22]]},"reference":[{"key":"ref_1","unstructured":"Bishop, C.M., and Wheeler, T. (2006). Pattern Recognition and Machine Learning, Springer."},{"key":"ref_2","unstructured":"Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R.R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv."},{"key":"ref_3","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, The MIT Press Cambridge."},{"key":"ref_4","unstructured":"Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Le, Q., Mao, M., Ranzato, M., Senior, A., and Tucker, P. (2012, January 3\u20136). Large scale distributed deep networks. Proceedings of the 25th International Conference on Neural Information Processing Systems\u2014NIPS 2012, Lake Tahoe, NV, USA."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"251","DOI":"10.1162\/089976698300017746","article-title":"Natural gradient works efficiently in learning","volume":"10","author":"Amari","year":"1998","journal-title":"Neural Comput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1109\/MSP.2012.2205597","article-title":"Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups","volume":"29","author":"Hinton","year":"2012","journal-title":"Signal Process. Mag."},{"key":"ref_7","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, January 3\u20136). Imagenet classification with deep convolutional neural networks. Proceedings of the Neural Information Processing Systems\u2013NIPS 2012, Lake Tahoe, NV, USA."},{"key":"ref_8","unstructured":"Pascanu, R., and Bengio, Y. (2013). Revisiting natural gradient for deep networks. arXiv."},{"key":"ref_9","unstructured":"Sutskever, I., Martens, J., Dahl, G., and Hinton, G.E. (2013, January 16\u201321). On the importance of initialization and momentum in deep learning. Proceedings of the 30th International Conference on Machine Learning\u2014ICML 2013, Atlanta, GA, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Forst, W., and Hoffmann, D. (2010). Optimization\u2014Theory and Practice, Springer.","DOI":"10.1007\/978-0-387-78977-4"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course, Springer.","DOI":"10.1007\/978-1-4419-8853-9"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. arXiv.","DOI":"10.1007\/978-3-642-35289-8_26"},{"key":"ref_13","unstructured":"Ge, R., Kakade, S.M., Kidambi, R., and Netrapalli, P. (2019). The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares. arXiv."},{"key":"ref_14","unstructured":"Li, Z., and Arora, S. (2019). An Exponential Learning Rate Schedule for Deep Learning. arXiv."},{"key":"ref_15","first-page":"2121","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_16","unstructured":"Tieleman, T., and Hinton, G.E. (2012). Lecture 6.5\u2014RMSProp, COURSERA: Neural Networks for Machine Learning, University of Toronto. Technical Report."},{"key":"ref_17","unstructured":"Zeiler, M.D. (2012). Adadelta: An adaptive learning rate method. arXiv."},{"key":"ref_18","unstructured":"Kingma, D.P., and Ba, J. (2015, January 7\u20139). ADAM: A method for stochastic optimization. Proceedings of the 3rd International Conference for Learning Representations\u2014ICLR 2015, San Diego, CA, USA."},{"key":"ref_19","unstructured":"Reddi, S.J., Kale, S., and Kumar, S. (2019). On the Convergence of ADAM and Beyond. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Yi, D., Ahn, J., and Ji, S. (2020). An Effective Optimization Method for Machine Learning Based on ADAM. Appl. Sci., 10.","DOI":"10.3390\/app10031073"},{"key":"ref_21","unstructured":"Kochenderfer, M., and Wheeler, T. (2019). Algorithms for Optimization, The MIT Press Cambridge."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1137\/16M1080173","article-title":"Optimization Methods for Large-Scale Machine Learning","volume":"60","author":"Bottou","year":"2018","journal-title":"SIAM Rev."},{"key":"ref_23","unstructured":"Roux, N.L., and Fitzgibbon, A.W. (2010, January 21\u201324). A fast natural newton method. Proceedings of the 27th International Conference on Machine Learning\u2014ICML 2010, Haifa, Israel."},{"key":"ref_24","unstructured":"Sohl-Dickstein, J., Poole, B., and Ganguli, S. (2014, January 21\u201324). Fast large-scale optimization by unifying stochastic gradient and quasi-newton methods. Proceedings of the 31st International Conference on Machine Learning\u2014ICML 2014, Beijing, China."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"533","DOI":"10.1038\/323533a0","article-title":"Learning representations by back-propagating errors","volume":"323","author":"Rumelhart","year":"1986","journal-title":"Nature"},{"key":"ref_26","unstructured":"Becker, S., and LeCun, Y. (1988). Improving the Convergence of Back-Propagation Learning with Second Order Methods, Department of Computer Science, University of Toronto. Technical Report."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kelley, C.T. (1995). Iterative methods for linear and nonlinear equations. Frontiers in Applied Mathematics, SIAM.","DOI":"10.1137\/1.9781611970944"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Kelley, C.T. (1999). Iterative Methods for Optimization. Frontiers in Applied Mathematics, SIAM.","DOI":"10.1137\/1.9781611970920"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/4\/660\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:08:59Z","timestamp":1760364539000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/4\/660"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,22]]},"references-count":28,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,4]]}},"alternative-id":["sym12040660"],"URL":"https:\/\/doi.org\/10.3390\/sym12040660","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,22]]}}}