{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:06:04Z","timestamp":1777734364834,"version":"3.51.4"},"reference-count":21,"publisher":"World Scientific Pub Co Pte Ltd","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Patt. Recogn. Artif. Intell."],"published-print":{"date-parts":[[2020,12,15]]},"abstract":"<jats:p> In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks going deeper and datasets becoming bigger. Therefore, more advanced optimization algorithms have been proposed over the past years. In this study, widely used optimization algorithms for deep learning are examined in detail. To this end, these algorithms called adaptive gradient methods are implemented for both supervised and unsupervised tasks. The behavior of the algorithms during training and results on four image datasets, namely, MNIST, CIFAR-10, Kaggle Flowers and Labeled Faces in the Wild are compared by pointing out their differences against basic optimization algorithms. <\/jats:p>","DOI":"10.1142\/s0218001420520138","type":"journal-article","created":{"date-parts":[[2020,2,6]],"date-time":"2020-02-06T06:10:18Z","timestamp":1580969418000},"page":"2052013","source":"Crossref","is-referenced-by-count":171,"title":["A Comparison of Optimization Algorithms for Deep Learning"],"prefix":"10.1142","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3212-6711","authenticated-orcid":false,"given":"Derya","family":"Soydaner","sequence":"first","affiliation":[{"name":"Statistics Department, Mimar Sinan Fine Arts University, \u0130stanbul 34380, Turkey"}]}],"member":"219","published-online":{"date-parts":[[2020,4,30]]},"reference":[{"key":"S0218001420520138BIB001","volume-title":"Introduction to Machine Learning","author":"Alpayd\u0131n E.","year":"2014"},{"key":"S0218001420520138BIB002","first-page":"3","volume-title":"Proc. 9th Python in Science Conference","author":"Bergstra J.","year":"2010"},{"issue":"2","key":"S0218001420520138BIB003","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1137\/16M1080173","volume":"60","author":"Bottou L.","year":"2018","journal-title":"Siam Rev."},{"key":"S0218001420520138BIB004","author":"Dauphin Y.","year":"2015","journal-title":"CoRR abs\/1502.04390"},{"key":"S0218001420520138BIB005","first-page":"2013","volume-title":"Int. Conference on Learning Representations","volume":"1","author":"Dozat T.","year":"2016"},{"key":"S0218001420520138BIB006","first-page":"2121","volume":"12","author":"Duchi J.","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"S0218001420520138BIB007","volume-title":"Hands-on Machine Learning with Scikit-Learn and Tensorflow","author":"Geron A.","year":"2017"},{"key":"S0218001420520138BIB008","volume-title":"Deep Learning","author":"Goodfellow I.","year":"2016"},{"key":"S0218001420520138BIB010","volume-title":"Neural Networks for Machine Learning","author":"Hinton G.","year":"2012"},{"key":"S0218001420520138BIB015","first-page":"1097","volume":"25","author":"Krizhevsky A.","year":"2012","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"S0218001420520138BIB016","first-page":"265","volume-title":"Proc. 28th Int. Conf. Machine Learning","author":"Le Q. V.","year":"2011"},{"issue":"11","key":"S0218001420520138BIB017","doi-asserted-by":"crossref","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"LeCun Y.","year":"1998","journal-title":"Proc. IEEE"},{"key":"S0218001420520138BIB018","volume-title":"Int. Conf. Learning Representations","author":"Luo L.","year":"2019"},{"key":"S0218001420520138BIB020","first-page":"2545","volume":"70","author":"Mukkamala M. C.","year":"2017","journal-title":"Proc. 34th Int. Conf. Mach. Learn."},{"key":"S0218001420520138BIB021","first-page":"372","volume":"27","author":"Nesterov Y.","year":"1983","journal-title":"Sov. Math. Doklady"},{"issue":"5","key":"S0218001420520138BIB022","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0041-5553(64)90137-5","volume":"4","author":"Polyak B. T.","year":"1964","journal-title":"USSR Comput. Math. Math. Phys."},{"key":"S0218001420520138BIB023","volume-title":"Int. Conf. Learning Representations","author":"Reddi S. J.","year":"2018"},{"key":"S0218001420520138BIB024","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1214\/aoms\/1177729586","volume":"22","author":"Robbins H.","year":"1951","journal-title":"Ann. Math. Statist."},{"key":"S0218001420520138BIB027","volume-title":"Int. Conf. Learning Representations","author":"Simonyan K.","year":"2015"},{"key":"S0218001420520138BIB029","first-page":"1139","volume-title":"Int. Conf. Machine Learning","author":"Sutskever I.","year":"2013"},{"key":"S0218001420520138BIB030","first-page":"9793","volume-title":"Adv. Neural Inf. Process. Syst.","author":"Zaheer M.","year":"2018"}],"container-title":["International Journal of Pattern Recognition and Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0218001420520138","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,7]],"date-time":"2020-12-07T10:26:41Z","timestamp":1607336801000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0218001420520138"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,30]]},"references-count":21,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2020,12,15]]}},"alternative-id":["10.1142\/S0218001420520138"],"URL":"https:\/\/doi.org\/10.1142\/s0218001420520138","relation":{},"ISSN":["0218-0014","1793-6381"],"issn-type":[{"value":"0218-0014","type":"print"},{"value":"1793-6381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,30]]}}}