{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T15:35:33Z","timestamp":1760369733976,"version":"build-2065373602"},"reference-count":32,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2021,6,19]],"date-time":"2021-06-19T00:00:00Z","timestamp":1624060800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Optimization methods are of great importance for the efficient training of neural networks. There are many articles in the literature that propose particular variants of existing optimizers. In our article, we propose the use of the combination of two very different optimizers that, when used simultaneously, can exceed the performance of the single optimizers in very different problems. We propose a new optimizer called ATMO (AdapTive Meta Optimizers), which integrates two different optimizers simultaneously weighing the contributions of both. Rather than trying to improve each single one, we leverage both at the same time, as a meta-optimizer, by taking the best of both. We have conducted several experiments on the classification of images and text documents, using various types of deep neural models, and we have demonstrated through experiments that the proposed ATMO produces better performance than the single optimizers.<\/jats:p>","DOI":"10.3390\/a14060186","type":"journal-article","created":{"date-parts":[[2021,6,20]],"date-time":"2021-06-20T22:00:02Z","timestamp":1624226402000},"page":"186","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":14,"title":["Combining Optimization Methods Using an Adaptive Meta Optimizer"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0565-7496","authenticated-orcid":false,"given":"Nicola","family":"Landro","sequence":"first","affiliation":[{"name":"Department of Theoretical and Applied Sciences, University of Insubria, 21100 Varese, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7076-8328","authenticated-orcid":false,"given":"Ignazio","family":"Gallo","sequence":"additional","affiliation":[{"name":"Department of Theoretical and Applied Sciences, University of Insubria, 21100 Varese, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4355-0366","authenticated-orcid":false,"given":"Riccardo","family":"La Grassa","sequence":"additional","affiliation":[{"name":"Department of Theoretical and Applied Sciences, University of Insubria, 21100 Varese, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2021,6,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","article-title":"A stochastic approximation method","volume":"22","author":"Robbins","year":"1951","journal-title":"Ann. Math. Stat."},{"key":"ref_2","unstructured":"Kingma, D.P., and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv."},{"key":"ref_3","unstructured":"Zaheer, M., Reddi, S., Sachan, D., Kale, S., and Kumar, S. (2018). Adaptive methods for nonconvex optimization. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_4","unstructured":"Luo, L., Xiong, Y., Liu, Y., and Sun, X. (May, January 30). Adaptive Gradient Methods with Dynamic Bound of Learning Rate. 
Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"2664","DOI":"10.1080\/01431161.2019.1694725","article-title":"Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification","volume":"41","author":"Bera","year":"2020","journal-title":"Int. J. Remote Sens."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv.","DOI":"10.1007\/978-3-642-24797-2_3"},{"key":"ref_7","first-page":"2121","article-title":"Adaptive subgradient methods for online learning and stochastic optimization","volume":"12","author":"Duchi","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_8","unstructured":"Zeiler, M.D. (2012). Adadelta: An adaptive learning rate method. arXiv."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Kobayashi, T. (2020, January 25\u201328). SCW-SGD: Stochastically Confidence-Weighted SGD. Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates.","DOI":"10.1109\/ICIP40778.2020.9190992"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Zhang, Z. (2018, January 4\u20136). Improved adam optimizer for deep neural networks. Proceedings of the 2018 IEEE\/ACM 26th International Symposium on Quality of Service (IWQoS), Banff, AB, Canada.","DOI":"10.1109\/IWQoS.2018.8624183"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Pawe\u0142czyk, K., Kawulok, M., and Nalepa, J. (2018, January 15\u201319). Genetically-trained deep neural networks. Proceedings of the Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan.","DOI":"10.1145\/3205651.3208763"},{"key":"ref_12","unstructured":"Landro Nicola, G.I., and Riccardo, L.G. (2021, June 18). Mixing ADAM and SGD: A Combined Optimization Method with Pytorch. Available online: https:\/\/gitlab.com\/nicolalandro\/multi_optimizer."},{"key":"ref_13","unstructured":"Keskar, N.S., and Socher, R. (2017). Improving generalization performance by switching from adam to sgd. arXiv."},{"key":"ref_14","unstructured":"Cui, X., Zhang, W., T\u00fcske, Z., and Picheny, M. (2018). Evolutionary stochastic gradient descent for optimization of deep neural networks. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_15","unstructured":"Loshchilov, I., and Hutter, F. (2017). Decoupled weight decay regularization. arXiv."},{"key":"ref_16","unstructured":"Loshchilov, I., and Hutter, F. (2020, June 18). Fixing Weight Decay Regularization in Adam. Available online: https:\/\/openreview.net\/forum?id=rk6qdGgCZ."},{"key":"ref_17","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_18","unstructured":"Chen, J., Zhou, D., Tang, Y., Yang, Z., and Gu, Q. (2018). Closing the generalization gap of adaptive gradient methods in training deep neural networks. arXiv."},{"key":"ref_19","unstructured":"Krogh, A., and Hertz, J.A. (1992). A simple weight decay can improve generalization. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_20","unstructured":"Sutskever, I., Martens, J., Dahl, G., and Hinton, G. (2013, January 16\u201321). On the importance of initialization and momentum in deep learning. 
Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_21","unstructured":"Damaskinos, G., Mhamdi, E.M.E., Guerraoui, R., Patra, R., and Taziki, M. (2018). Asynchronous Byzantine machine learning (the case of SGD). arXiv."},{"key":"ref_22","unstructured":"Liu, C., and Belkin, M. (2018). Accelerating SGD with momentum for over-parameterized learning. arXiv."},{"key":"ref_23","unstructured":"Reddi, S.J., Kale, S., and Kumar, S. (2019). On the convergence of adam and beyond. arXiv."},{"key":"ref_24","unstructured":"Lee, J.D., Simchowitz, M., Jordan, M.I., and Recht, B. (2016, January 23\u201326). Gradient descent only converges to minimizers. Proceedings of the Conference on Learning Theory, New York, NY, USA."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1093\/comjnl\/3.3.175","article-title":"An automatic method for finding the greatest or least value of a function","volume":"3","author":"Rosenbrock","year":"1960","journal-title":"Comput. J."},{"key":"ref_26","unstructured":"Krizhevsky, A., and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, Citeseer."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Warstadt, A., Singh, A., and Bowman, S.R. (2018). Neural Network Acceptability Judgments. arXiv.","DOI":"10.1162\/tacl_a_00290"},{"key":"ref_28","unstructured":"Gulli, A. (2020, October 15). AG\u2019s Corpus of News Articles. Available online: http:\/\/groups.di.unipi.it\/~gulli\/\\AG_corpus_of_news_articles.html."},{"key":"ref_29","unstructured":"Zhang, X., Zhao, J., and LeCun, Y. (2015). Character-level convolutional networks for text classification. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_31","unstructured":"Targ, S., Almeida, D., and Lyman, K. (2016). Resnet in resnet: Generalizing residual architectures. arXiv."},{"key":"ref_32","unstructured":"Huggingface.co (2020, October 15). Bert Base Uncased Pre-Trained Model. Available online: https:\/\/huggingface.co\/bert-base-uncased."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/6\/186\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T06:19:06Z","timestamp":1760163546000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/6\/186"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,19]]},"references-count":32,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2021,6]]}},"alternative-id":["a14060186"],"URL":"https:\/\/doi.org\/10.3390\/a14060186","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2021,6,19]]}}}
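The abstract above describes the general idea of ATMO: running two different optimizers simultaneously and weighting the contributions of both, rather than switching between them. The sketch below is only a minimal illustration of that idea in PyTorch (the framework used in ref_12), blending the updates proposed by plain SGD and Adam with fixed weights; it is not the authors' ATMO implementation, and the class name `CombinedOptimizer`, the weights `w_sgd`/`w_adam`, and the learning rates are assumptions chosen for the example.

```python
# Minimal sketch (not the paper's ATMO code): apply a weighted combination of
# the parameter updates proposed by SGD and Adam on the same gradients.
import torch


class CombinedOptimizer:
    def __init__(self, params, lr_sgd=1e-2, lr_adam=1e-3, w_sgd=0.5, w_adam=0.5):
        self.params = list(params)
        self.w_sgd, self.w_adam = w_sgd, w_adam
        # Two independent inner optimizers sharing the same parameter tensors.
        self.sgd = torch.optim.SGD(self.params, lr=lr_sgd)
        self.adam = torch.optim.Adam(self.params, lr=lr_adam)

    def zero_grad(self):
        self.sgd.zero_grad()
        self.adam.zero_grad()

    @torch.no_grad()
    def step(self):
        # Snapshot the current parameters, let each optimizer step from the
        # same starting point, then apply the weighted sum of the two updates.
        start = [p.detach().clone() for p in self.params]

        self.sgd.step()
        delta_sgd = [p.detach() - s for p, s in zip(self.params, start)]

        # Restore the parameters before taking the Adam step.
        for p, s in zip(self.params, start):
            p.copy_(s)
        self.adam.step()
        delta_adam = [p.detach() - s for p, s in zip(self.params, start)]

        # Final update: weighted combination of the two proposed updates.
        for p, s, d1, d2 in zip(self.params, start, delta_sgd, delta_adam):
            p.copy_(s + self.w_sgd * d1 + self.w_adam * d2)


# Tiny usage example on a linear model with random data.
model = torch.nn.Linear(4, 1)
opt = CombinedOptimizer(model.parameters())
x, y = torch.randn(8, 4), torch.randn(8, 1)
for _ in range(5):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```

With equal weights the combined step is simply the average of the SGD and Adam updates; the paper's contribution, per the abstract, is making such a weighting adaptive, which this fixed-weight sketch does not attempt.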