{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,20]],"date-time":"2026-05-20T16:25:26Z","timestamp":1779294326110,"version":"3.51.4"},"reference-count":306,"publisher":"MIT Press","issue":"9","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Neural Computation"],"published-print":{"date-parts":[[2017,9]]},"abstract":"<jats:p> Convolutional neural networks (CNNs) have been applied to visual tasks since the late 1980s. However, despite a few scattered applications, they were dormant until the mid-2000s when developments in computing power and the advent of large amounts of labeled data, supplemented by improved algorithms, contributed to their advancement and brought them to the forefront of a neural network renaissance that has seen rapid progression since 2012. In this review, which focuses on the application of CNNs to image classification tasks, we cover their development, from their predecessors up to recent state-of-the-art deep learning systems. Along the way, we analyze (1) their early successes, (2) their role in the deep learning renaissance, (3) selected symbolic works that have contributed to their recent popularity, and (4) several improvement attempts by reviewing contributions and challenges of over 300 publications. We also introduce some of their current trends and remaining challenges. 
<\/jats:p>","DOI":"10.1162\/neco_a_00990","type":"journal-article","created":{"date-parts":[[2017,6,10]],"date-time":"2017-06-10T00:55:19Z","timestamp":1497056119000},"page":"2352-2449","source":"Crossref","is-referenced-by-count":3001,"title":["Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review"],"prefix":"10.1162","volume":"29","author":[{"given":"Waseem","family":"Rawat","sequence":"first","affiliation":[{"name":"Department of Electrical and Mining Engineering, University of South Africa, Florida 1710, South Africa"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zenghui","family":"Wang","sequence":"additional","affiliation":[{"name":"Department of Electrical and Mining Engineering, University of South Africa, Florida 1710, South Africa"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","reference":[{"key":"B1","first-page":"1","volume-title":"Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition","author":"Abdulkader A.","year":"2006"},{"key":"B2","volume-title":"Learning activation functions to improve deep neural networks","author":"Agostinelli F.","year":"2014"},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88690-7_6"},{"key":"B4","first-page":"584","volume-title":"Proceedings of the 31st International Conference Machine Learning","author":"Arora S.","year":"2014"},{"key":"B5","first-page":"3084","volume-title":"Advances in neural information processing systems, 26","author":"Ba J.","year":"2013"},{"key":"B6","first-page":"1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Ba J.","year":"2015"},{"key":"B7","first-page":"4826","volume-title":"Advances in neural information processing systems, 29","author":"Bachman P.","year":"2016"},{"key":"B8","first-page":"3084","volume-title":"Advances in neural information processing systems, 26","author":"Baldi 
P.","year":"2013"},{"key":"B9","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2014.02.004"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.246"},{"key":"B11","first-page":"2613","volume-title":"Advances in neural information processing systems","volume":"29","author":"Bastani O.","year":"2016"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2016.7727306"},{"key":"B13","first-page":"2399","volume":"7","author":"Belkin M.","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"B14","doi-asserted-by":"publisher","DOI":"10.1145\/2766959"},{"key":"B15","doi-asserted-by":"publisher","DOI":"10.1561\/2200000006"},{"key":"B16","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-39593-2_1"},{"key":"B17","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2013.50"},{"key":"B18","first-page":"2814","volume-title":"Advances in neural information processing systems, 19","author":"Bengio Y.","year":"2006"},{"key":"B19","doi-asserted-by":"publisher","DOI":"10.1162\/NECO_a_00934"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1109\/72.279181"},{"key":"B21","first-page":"226","volume-title":"Proceedings of the 31st International Conference Machine Learning","author":"Bengio Y.","year":"2014"},{"issue":"9","key":"B22","first-page":"142","volume":"17","author":"Bottou L.","year":"1998","journal-title":"On-Line Learning in Neural Networks"},{"key":"B23","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-7908-2604-3_16"},{"key":"B24","first-page":"111","volume-title":"Proceedings of the 27th International Conference on Machine Learning","author":"Boureau Y.","year":"2010"},{"key":"B25","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001493000339"},{"key":"B26","first-page":"81","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Bulo S.","year":"2014"},{"key":"B27","volume-title":"Signal recovery from pooling representations","author":"Bruna 
J.","year":"2013"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.398"},{"key":"B29","volume-title":"Batch-normalized maxout network in network","author":"Chang J.","year":"2015"},{"key":"B30","volume-title":"Return of the devil in the details: Delving deep into convolutional nets","author":"Chatfield K.","year":"2014"},{"key":"B31","volume-title":"Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition.","author":"Chellapilla K.","year":"2006"},{"key":"B32","first-page":"6067","volume-title":"Proceedings of the 18th Annual Symposium on Electronic Imaging","author":"Chellapilla K.","year":"2006"},{"key":"B33","first-page":"1","volume-title":"Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition","author":"Chellapilla K.","year":"2006"},{"key":"B34","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.389"},{"key":"B35","volume-title":"Compressing neural networks with the hashing trick","author":"Chen W.","year":"2015"},{"key":"B36","volume-title":"Fast neural networks with circulant projections","author":"Cheng Y.","year":"2015"},{"key":"B37","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.327"},{"key":"B38","volume-title":"Training binary multilayer neural networks for image classification using expectation backpropagation","author":"Cheng Z.","year":"2015"},{"key":"B39","volume-title":"Xception: Deep learning with depthwise separable convolutions","author":"Chollet F.","year":"2016"},{"key":"B40","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.202"},{"key":"B41","first-page":"192","volume-title":"Proceedings 18th International Conference on Artificial Intelligence and Statistics","author":"Choromanska A.","year":"2015"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.1162\/NECO_a_00052"},{"key":"B43","first-page":"1237","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence","author":"Ciresan D. 
C.","year":"2011"},{"key":"B44","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2012.6248110"},{"key":"B45","first-page":"1","volume-title":"Proceedings of the 4th International Conference on Learning Representations","author":"Clevert D.","year":"2016"},{"key":"B46","first-page":"215","volume-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics","author":"Coates A.","year":"2011"},{"key":"B47","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2004.1327161"},{"key":"B48","first-page":"1687","volume":"7","author":"Collobert R.","year":"2006","journal-title":"Journal of Machine Learning Research"},{"key":"B49","first-page":"3123","volume-title":"Advances in neural information processing systems, 28","author":"Courbariaux M.","year":"2015"},{"key":"B50","first-page":"1","volume-title":"Advances in neural information processing systems, 29","author":"Courbariaux M.","year":"2016"},{"key":"B51","first-page":"1223","volume-title":"Advances in neural information processing systems","volume":"25","author":"Dean J.","year":"2012"},{"key":"B52","doi-asserted-by":"publisher","DOI":"10.1023\/A:1012454411458"},{"issue":"2","key":"B53","first-page":"1","volume":"3","author":"Deng L.","year":"2014","journal-title":"APSIPA Transactions on Signal and Information Processing"},{"key":"B54","doi-asserted-by":"publisher","DOI":"10.1561\/2000000039"},{"key":"B55","first-page":"2148","volume-title":"Advances in neural information processing systems","volume":"26","author":"Denil M.","year":"2013"},{"key":"B56","first-page":"1269","volume-title":"Advances in neural information processing systems","volume":"27","author":"Denton E. 
L.","year":"2014"},{"key":"B57","first-page":"1","volume-title":"Proceedings of the 4th International Conference on Learning Representations","author":"Dettmers T.","year":"2016"},{"key":"B58","doi-asserted-by":"publisher","DOI":"10.1016\/0022-247X(62)90004-5"},{"key":"B59","first-page":"2121","volume":"12","author":"Duchi J.","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"B60","volume-title":"A guide to convolution arithmetic for deep learning","author":"Dumoulin V.","year":"2016"},{"key":"B61","first-page":"625","volume-title":"Journal of Machine Learning Research","volume":"11","author":"Erhan D.","year":"2010"},{"key":"B62","volume-title":"Scene parsing with multiscale feature learning, purity trees, and optimal covers","author":"Farabet C.","year":"2012"},{"key":"B63","volume-title":"Analysis of classifiers' robustness to adversarial perturbations","author":"Fawzi A.","year":"2015"},{"key":"B64","first-page":"1","volume-title":"Proceedings of the 32nd International Conference on Machine Learning","author":"Fawzi A.","year":"2015"},{"key":"B65","first-page":"11","volume-title":"Proceedings of the 4th International Conference on Development and Learning","author":"Fei-Fei L.","year":"2006"},{"key":"B66","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.79"},{"key":"B67","first-page":"1","volume-title":"Proceedings of the XXII Brazilian Symposium on Computer Graphics and Image Processing","author":"Filho D. P.","year":"2009"},{"key":"B68","unstructured":"Finn, C., Tan, X. Y., Duan, Y., Darrell, T., Levine, S. & Abbeel, P. (2015). Deep spatial autoencoders for visuomotor learning. 
arXiv 1509.06113."},{"key":"B69","volume-title":"Bio-inspired artificial intelligence: Theories, methods, and technologies.","author":"Floreano D.","year":"2008"},{"key":"B70","first-page":"291","volume-title":"Proceedings of the 6th International Joint Conference on Artificial Intelligence","author":"Fukushima K.","year":"1979"},{"key":"B71","doi-asserted-by":"publisher","DOI":"10.1007\/BF00344251"},{"key":"B72","doi-asserted-by":"publisher","DOI":"10.1016\/0031-3203(82)90024-3"},{"key":"B73","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2002.1048232"},{"key":"B74","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.81"},{"key":"B75","first-page":"249","volume-title":"Proceedings of the 13th International Conference on Artificial Intelligence and Statistics","author":"Glorot X.","year":"2010"},{"key":"B76","first-page":"315","volume-title":"Proceedings of the 14th International Conference on Artificial Intelligence and Statistics","author":"Glorot X.","year":"2011"},{"key":"B77","volume-title":"Compressing deep convolutional networks using vector quantization","author":"Gong Y.","year":"2014"},{"key":"B78","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10584-0_26"},{"key":"B79","volume-title":"Deep learning","author":"Goodfellow I.","year":"2016"},{"key":"B80","first-page":"2672","volume-title":"Advances in neural information processing systems, 27","author":"Goodfellow I.","year":"2014"},{"key":"B81","first-page":"1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Goodfellow I. J.","year":"2015"},{"key":"B82","first-page":"1319","volume-title":"Proceedings of the 30th International Conference Machine Learning","author":"Goodfellow I. 
J.","year":"2013"},{"key":"B83","volume-title":"Fractional max-pooling","author":"Graham B.","year":"2014"},{"key":"B84","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2005.239"},{"key":"B85","volume-title":"Caltech-256 object category dataset","author":"Griffin G.","year":"2007"},{"key":"B86","volume-title":"Recent advances in convolutional neural networks","author":"Gu J.","year":"2015"},{"key":"B87","volume-title":"Towards deep neural network architectures robust to adversarial examples","author":"Gu S.","year":"2014"},{"key":"B88","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-44848-9_34"},{"key":"B89","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2015.09.116"},{"key":"B90","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.100"},{"key":"B91","doi-asserted-by":"publisher","DOI":"10.1002\/rob.20276"},{"key":"B92","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"B93","first-page":"1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Han S.","year":"2016"},{"key":"B94","volume-title":"Identity matters in deep learning","author":"Hardt M.","year":"2016"},{"key":"B95","doi-asserted-by":"publisher","DOI":"10.1038\/nn.3917"},{"key":"B96","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299173"},{"key":"B97","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10578-9_23"},{"key":"B98","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S. & Sun, J. (2015a). Deep residual learning for image recognition. 
arXiv 1512.03385.","DOI":"10.1109\/CVPR.2016.90"},{"key":"B99","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.123"},{"key":"B100","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"B101","doi-asserted-by":"publisher","DOI":"10.1016\/0004-3702(89)90049-0"},{"key":"B102","doi-asserted-by":"publisher","DOI":"10.1162\/089976602760128018"},{"key":"B103","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2006.18.7.1527"},{"key":"B104","doi-asserted-by":"publisher","DOI":"10.1126\/science.1127647"},{"key":"B105","volume-title":"Improving neural networks by preventing co-adaptation of feature detectors","author":"Hinton G. E.","year":"2012"},{"key":"B106","volume-title":"Distilling the knowledge in a neural network","author":"Hinton G.","year":"2015"},{"key":"B107","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"B108","first-page":"284","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Huang F. J.","year":"2006"},{"key":"B109","volume-title":"Labeled faces in the wild: A database for studying face recognition in unconstrained environments","author":"Huang G. B.","year":"2007"},{"key":"B110","volume-title":"Densely connected convolutional networks","author":"Huang G.","year":"2016"},{"key":"B111","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_39"},{"key":"B112","volume-title":"Learning with a strong adversary","author":"Huang R.","year":"2016"},{"key":"B113","doi-asserted-by":"publisher","DOI":"10.1113\/jphysiol.1959.sp006308"},{"key":"B114","doi-asserted-by":"publisher","DOI":"10.1113\/jphysiol.1962.sp006837"},{"key":"B115","doi-asserted-by":"publisher","DOI":"10.1080\/09548980701418942"},{"key":"B116","volume-title":"SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 1MB model size","author":"Iandola F. 
N.","year":"2016"},{"key":"B117","volume-title":"Batch renormalization: Towards reducing minibatch dependence in batch-normalized models","author":"Ioffe S.","year":"2017"},{"key":"B118","first-page":"448","volume-title":"Proceedings of the 32nd International Conference Machine Learning","author":"Ioffe S.","year":"2015"},{"key":"B119","volume-title":"Cybernetic predicting devices.","author":"Ivakhnenko A. G.","year":"1966"},{"key":"B120","first-page":"2017","volume-title":"Advances in neural information processing systems","volume":"28","author":"Jaderberg M.","year":"2015"},{"key":"B121","doi-asserted-by":"publisher","DOI":"10.5244\/C.28.88"},{"key":"B122","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459469"},{"key":"B123","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2011.235"},{"key":"B124","doi-asserted-by":"publisher","DOI":"10.1145\/2647868.2654889"},{"key":"B125","first-page":"1","volume-title":"Proceedings of the 4th International Conference on Learning Representations","author":"Jin J.","year":"2016"},{"key":"B126","volume-title":"Deep learning with S-shaped rectified linear activation units","author":"Jin X.","year":"2015"},{"key":"B127","volume-title":"Grid long short-term memory","author":"Kalchbrenner N.","year":"2015"},{"key":"B128","volume-title":"Neural machine translation in linear time","author":"Kalchbrenner N.","year":"2016"},{"key":"B129","volume-title":"CS231n: Convolutional neural networks for visual recognition","author":"Karpathy A.","year":"2016"},{"key":"B130","first-page":"3128","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Karpathy A.","year":"2016"},{"key":"B131","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206545"},{"key":"B132","volume-title":"Fast inference in sparse coding algorithms with applications to object recognition","author":"Kavukcuoglu K.","year":"2010"},{"key":"B133","first-page":"39","volume-title":"Proceedings of the 6th 
International Symposium on Micro Machine and Human Science","author":"Kennedy J.","year":"1995"},{"key":"B134","volume-title":"Bitwise neural networks","author":"Kim M.","year":"2016"},{"key":"B135","volume-title":"Compression of deep convolutional neural networks for fast and low power mobile applications","author":"Kim Y.","year":"2015"},{"key":"B136","volume-title":"Adam: A method for stochastic optimization","author":"Kingma D.","year":"2014"},{"key":"B137","volume-title":"Auto-encoding variational Bayes","author":"Kingma D. P.","year":"2014"},{"key":"B138","volume-title":"Understanding convolutional neural networks","author":"Koushik J.","year":"2016"},{"key":"B139","volume-title":"Learning multiple layers of features from tiny images.","author":"Krizhevsky A.","year":"2009"},{"key":"B140","volume-title":"One weird trick for parallelizing convolutional neural networks","author":"Krizhevsky A.","year":"2014"},{"key":"B141","first-page":"1097","volume-title":"Advances in neural information processing systems, 25","author":"Krizhevsky A.","year":"2012"},{"key":"B142","first-page":"2539","volume-title":"Advances in neural information processing systems","volume":"28","author":"Kulkarni T. 
D.","year":"2015"},{"key":"B143","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459250"},{"key":"B144","volume-title":"TI-POOLING: Transformation-invariant pooling for feature learning in convolutional neural networks","author":"Laptev D.","year":"2016"},{"key":"B145","doi-asserted-by":"publisher","DOI":"10.1145\/1273496.1273556"},{"key":"B146","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.435"},{"key":"B147","doi-asserted-by":"publisher","DOI":"10.1109\/72.554195"},{"key":"B148","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2005.151"},{"key":"B149","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.68"},{"key":"B150","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-25958-1_8"},{"key":"B151","volume-title":"Speeding-up convolutional neural networks using fine-tuned CP-decomposition","author":"Lebedev V.","year":"2014"},{"key":"B152","first-page":"143","volume-title":"Connections in perspective","author":"LeCun Y.","year":"1989"},{"key":"B153","doi-asserted-by":"publisher","DOI":"10.1038\/nature14539"},{"key":"B154","first-page":"396","volume-title":"Advances in neural information processing systems, 2","author":"LeCun Y.","year":"1989"},{"key":"B155","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1989.1.4.541"},{"key":"B156","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"B157","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2004.1315150"},{"key":"B158","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2010.5537907"},{"key":"B159","first-page":"464","volume-title":"Proceedings of the 19th International Conference on Artificial Intelligence and Statistics","author":"Lee C.","year":"2016"},{"key":"B160","first-page":"562","volume-title":"Proceedings of the 18th International Conference on Artificial Intelligence and Statistics","author":"Lee C.","year":"2015"},{"key":"B161","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553453"},{"key":"B162","first-page":"711","volume-title":"Advances in neural 
information processing systems, 24","author":"Leibo J. Z.","year":"2011"},{"issue":"39","key":"B163","first-page":"1","volume":"17","author":"Levine S.","year":"2016","journal-title":"Journal of Machine Learning Research"},{"key":"B164","volume-title":"Ternary weight networks","author":"Li F.","year":"2016"},{"key":"B165","volume-title":"Demystifying ResNet","author":"Li S.","year":"2016"},{"key":"B166","first-page":"1","volume-title":"Advances in neural information processing systems","author":"Li Z.","year":"2016"},{"key":"B167","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298958"},{"key":"B168","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2016.7477624"},{"key":"B169","volume-title":"Deephash: Getting regularization, depth and fine-tuning right","author":"Lin J.","year":"2015"},{"key":"B170","volume-title":"Network in network","author":"Lin M.","year":"2013"},{"key":"B171","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995477"},{"key":"B172","volume-title":"The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors","author":"Linnainmaa S.","year":"1970"},{"key":"B173","volume-title":"The loss surface of residual networks: Ensembles and the role of batch normalization","author":"Littwin E.","year":"2016"},{"key":"B174","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.238"},{"key":"B175","first-page":"507","volume-title":"Proceedings of the 33rd International Conference Machine Learning","author":"Liu W.","year":"2016"},{"key":"B176","first-page":"1","volume-title":"Proceedings of the 30th International Conference Machine Learning","author":"Maas A. L.","year":"2013"},{"key":"B177","volume-title":"Improving the adversarial robustness of ConvNets by reduction of input dimensionality","author":"Maharaj A. 
V.","year":"2015"},{"key":"B178","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587652"},{"key":"B179","volume-title":"Learnable pooling regions for image classification","author":"Malinowski M.","year":"2013"},{"key":"B180","doi-asserted-by":"publisher","DOI":"10.1002\/cpa.21413"},{"key":"B181","first-page":"52","volume-title":"Proceedings of the 21st International Conference on Artificial Neural Networks","author":"Masci J.","year":"2011"},{"key":"B182","volume-title":"Fast training of convolutional networks through FFTs","author":"Mathieu M.","year":"2013"},{"key":"B183","first-page":"1","volume-title":"Proceedings of the 4th International Conference on Learning Representations","author":"Mishkin D.","year":"2016"},{"key":"B184","first-page":"1","volume-title":"Proceedings of the 5th International Conference on Learning Representations","author":"Miyato T.","year":"2016"},{"key":"B185","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-35289-8"},{"key":"B186","first-page":"739","volume-title":"Advances in neural information processing systems, 18","author":"Muller U.","year":"2005"},{"key":"B187","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2012.14"},{"key":"B188","doi-asserted-by":"publisher","DOI":"10.1109\/ICSIPA.2011.6144164"},{"key":"B189","first-page":"1339","volume-title":"Advances in neural information processing systems","volume":"22","author":"Nair V.","year":"2009"},{"key":"B190","first-page":"807","volume-title":"Proceedings of the 27th International Conference on Machine Learning","author":"Nair V.","year":"2010"},{"key":"B191","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-03767-2_10"},{"key":"B192","volume-title":"Kaggle.com","author":"National Data Science Bowl | Kaggle","year":"2016"},{"key":"B193","first-page":"1","volume-title":"Advances in neural information processing systems, 24 (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning","author":"Netzer 
Y.","year":"2011"},{"key":"B194","first-page":"1279","volume-title":"Advances in neural information processing systems","volume":"23","author":"Ngiam J.","year":"2010"},{"key":"B195","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298640"},{"key":"B196","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2005.852470"},{"key":"B197","volume-title":"Advances in neural information processing systems, 28","author":"Novikov A.","year":"2015"},{"key":"B198","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1992.4.4.473"},{"key":"B199","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2004.01.013"},{"key":"B200","volume-title":"Wavenet: A generative model for raw audio","author":"Oord A.","year":"2016"},{"key":"B201","unstructured":"Orhan, A. E. (2017). Skip connections as effective symmetry-breaking. arXiv 1701.09175."},{"key":"B202","doi-asserted-by":"publisher","DOI":"10.1137\/090752286"},{"key":"B203","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33718-5_5"},{"key":"B204","volume-title":"GPU asynchronous stochastic gradient descent to speed up neural network training","author":"Paine T.","year":"2013"},{"key":"B205","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2016.41"},{"key":"B206","volume-title":"Regularizing neural networks by penalizing confident output distributions","author":"Pereyra G.","year":"2017"},{"key":"B207","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15561-1_11"},{"key":"B208","volume-title":"Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours","author":"Pinto L.","year":"2015"},{"key":"B209","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.3850"},{"key":"B210","doi-asserted-by":"publisher","DOI":"10.1016\/S0893-6080(98)00116-6"},{"key":"B211","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2007.383157"},{"key":"B212","first-page":"1137","volume-title":"Advances in neural information processing systems, 19","author":"Ranzato 
M.","year":"2006"},{"key":"B213","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390256"},{"key":"B214","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_32"},{"key":"B215","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2014.131"},{"key":"B216","first-page":"693","volume-title":"Advances in neural information processing systems","volume":"24","author":"Recht B.","year":"2011"},{"key":"B217","first-page":"1746","volume-title":"Proceedings of the 30th International Conference on Machine Learning","author":"Rippel O.","year":"2014"},{"key":"B218","first-page":"2449","volume-title":"Advances in neural information processing systems","volume":"28","author":"Rippel O.","year":"2015"},{"key":"B219","first-page":"1","volume-title":"Proceedings of the 3rd International Conference on Learning Representations","author":"Romero A.","year":"2015"},{"key":"B220","doi-asserted-by":"publisher","DOI":"10.1038\/323533a0"},{"key":"B221","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-015-0816-y"},{"key":"B222","first-page":"1","volume-title":"Proceedings of the 4th International Conference on Learning Representations","author":"Sabour S.","year":"2016"},{"key":"B223","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2013.6707749"},{"key":"B224","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6638949"},{"key":"B225","first-page":"412","volume-title":"Proceedings of the 11th International Conference on Artificial Intelligence and Statistics","author":"Salakhutdinov R.","year":"2007"},{"key":"B226","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995504"},{"key":"B227","volume-title":"Exact solutions to the nonlinear dynamics of learning in deep linear neural networks","author":"Saxe A. 
M.","year":"2013"},{"key":"B228","volume-title":"Dense prediction on sequences with time-dilated convolutions for speech recognition","author":"Sercu T.","year":"2016"},{"key":"B229","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-15825-4_10"},{"key":"B230","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2014.09.003"},{"key":"B231","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298682"},{"key":"B232","first-page":"3288","volume-title":"Proceedings of the 21st International Conference on Pattern Recognition","author":"Sermanet P.","year":"2012"},{"key":"B233","doi-asserted-by":"publisher","DOI":"10.1109\/ICDAR.2003.1227801"},{"key":"B234","doi-asserted-by":"publisher","DOI":"10.1016\/S0042-6989(97)00183-1"},{"key":"B235","volume-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan K.","year":"2014"},{"key":"B236","doi-asserted-by":"publisher","DOI":"10.1016\/j.aasri.2014.05.013"},{"key":"B237","first-page":"2951","volume-title":"Advances in neural information processing systems, 25","author":"Snoek J.","year":"2012"},{"key":"B238","first-page":"69","volume":"168","author":"Sontag E. D.","year":"1998","journal-title":"NATO ASI Series F Computer and Systems Sciences"},{"key":"B239","volume-title":"Striving for simplicity: The all convolutional net","author":"Springenberg J. T.","year":"2014"},{"key":"B240","unstructured":"Springenberg, J. T. & Riedmiller, M. (2013). Improving deep neural networks with probabilistic maxout units. 
arXiv 1312.6116."},{"key":"B241","doi-asserted-by":"publisher","DOI":"10.3389\/frobt.2015.00036"},{"issue":"1","key":"B242","first-page":"1929","volume":"15","author":"Srivastava N.","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"B243","first-page":"2094","volume-title":"Advances in neural information processing systems, 26","author":"Srivastava N.","year":"2013"},{"key":"B244","first-page":"2377","volume-title":"Advances in neural information processing systems, 28","author":"Srivastava R. K.","year":"2015"},{"key":"B245","volume-title":"Highway networks","author":"Srivastava R. K.","year":"2015"},{"key":"B246","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2011.6033395"},{"key":"B247","first-page":"1115","volume-title":"Proceedings of the 8th International Conference on Document Analysis and Recognition","author":"Steinkrau D.","year":"2005"},{"key":"B248","first-page":"3545","volume-title":"Advances in neural information processing systems, 27","author":"Stollenga M. 
F.","year":"2014"},{"key":"B249","first-page":"1988","volume-title":"Advances in neural information processing systems, 27","author":"Sun Y.","year":"2014"},{"key":"B250","volume-title":"Deepid3: Face recognition with very deep neural networks","author":"Sun Y.","year":"2015"},{"key":"B251","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.244"},{"key":"B252","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298907"},{"key":"B253","volume-title":"Random walk initialization for training very deep feedforward networks","author":"Sussillo D.","year":"2014"},{"key":"B254","first-page":"1139","volume-title":"Proceedings of the 30th International Conference Machine Learning","author":"Sutskever I.","year":"2013"},{"key":"B255","volume-title":"Inception-v4, Inception-Resnet and the impact of residual connections on learning","author":"Szegedy C.","year":"2016"},{"key":"B256","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"B257","volume-title":"Rethinking the Inception architecture for computer vision","author":"Szegedy C.","year":"2015"},{"key":"B258","first-page":"1","volume-title":"Proceedings of the 1st International Conference on Learning Representations","author":"Szegedy C.","year":"2014"},{"key":"B259","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2016.7727230"},{"key":"B260","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.220"},{"key":"B261","volume-title":"Deep learning using linear support vector machines","author":"Tang 
Y.","year":"2013"},{"key":"B262","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298664"},{"key":"B263","doi-asserted-by":"publisher","DOI":"10.1007\/BF02289464"},{"key":"B264","doi-asserted-by":"publisher","DOI":"10.1162\/neco.2009.10-08-881"},{"key":"B265","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-30447-2_2"},{"key":"B266","doi-asserted-by":"publisher","DOI":"10.1198\/10618600152418584"},{"key":"B267","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-2440-0"},{"key":"B268","doi-asserted-by":"publisher","DOI":"10.1137\/1116025"},{"key":"B269","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"B270","first-page":"351","volume-title":"Advances in neural information processing systems","volume":"26","author":"Wager S.","year":"2013"},{"key":"B271","first-page":"1058","volume-title":"Proceedings of the 30th International Conference Machine Learning","author":"Wan L.","year":"2013"},{"key":"B272","volume-title":"CNN-RNN: A unified framework for multi-label image classification","author":"Wang J.","year":"2016"},{"key":"B273","first-page":"118","volume-title":"Proceedings of the 30th International Conference Machine Learning","author":"Wang S. I.","year":"2013"},{"key":"B274","first-page":"1","volume-title":"Advances in neural information processing systems, 29","author":"Wang Y.","year":"2016"},{"key":"B275","first-page":"40","volume-title":"Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence","author":"Wang Z.","year":"2015"},{"key":"B276","volume-title":"An empirical analysis of dropout in piecewise linear networks","author":"Warde-Farley D.","year":"2013"},{"key":"B277","first-page":"1473","volume-title":"Advances in neural information processing systems, 18","author":"Weinberger K. 
Q.","year":"2005"},{"key":"B278","volume-title":"Beyond regression: New tools for prediction and analysis in the behavioral sciences","author":"Werbos P.","year":"1974"},{"key":"B279","doi-asserted-by":"publisher","DOI":"10.1007\/BFb0006203"},{"key":"B280","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390303"},{"key":"B281","volume-title":"A mathematical theory of deep convolutional neural networks for feature extraction","author":"Wiatowski T.","year":"2015"},{"key":"B282","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611970364"},{"key":"B283","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995566"},{"key":"B284","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-26532-2_6"},{"key":"B285","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298968"},{"key":"B286","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.514"},{"key":"B287","volume-title":"Aggregated residual transformations for deep neural networks","author":"Xie S.","year":"2016"},{"key":"B288","volume-title":"Empirical evaluation of rectified activations in convolutional network","author":"Xu B.","year":"2015"},{"key":"B289","volume-title":"Multi-GPU training of convnets","author":"Yadan O.","year":"2014"},{"key":"B290","first-page":"1794","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Yang J.","year":"2009"},{"key":"B291","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-11740-9_34"},{"key":"B292","volume-title":"Multi-scale context aggregation by dilated convolutions","author":"Yu F.","year":"2015"},{"key":"B293","doi-asserted-by":"publisher","DOI":"10.5244\/C.28.109"},{"key":"B294","volume-title":"Visualizing and comparing convolutional neural networks","author":"Yu W.","year":"2014"},{"key":"B295","volume-title":"Wide residual networks","author":"Zagoruyko S.","year":"2017"},{"key":"B296","volume-title":"ADADELTA: An adaptive learning rate method","author":"Zeiler M. 
D.","year":"2012"},{"key":"B297","volume-title":"Stochastic pooling for regularization of deep convolutional neural networks","author":"Zeiler M. D.","year":"2013"},{"key":"B298","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"B299","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126474"},{"key":"B300","first-page":"1","volume-title":"Advances in neural information processing systems, 29","author":"Zhai S.","year":"2016"},{"key":"B301","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2006.301"},{"key":"B302","volume-title":"Suppressing the unusual: Towards robust CNNs using symmetric activation functions","author":"Zhao Q.","year":"2016"},{"key":"B303","volume-title":"Naive-deep face recognition: Touching the limit of LFW benchmark or not?","author":"Zhou E.","year":"2015"},{"key":"B304","volume-title":"Recover canonical-view faces in the wild with deep neural networks","author":"Zhu Z.","year":"2014"},{"key":"B305","doi-asserted-by":"publisher","DOI":"10.1145\/2507157.2507164"},{"key":"B306","first-page":"2595","volume-title":"Advances in neural information processing systems, 23","author":"Zinkevich M.","year":"2010"}],"container-title":["Neural 
Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mitpressjournals.org\/doi\/pdf\/10.1162\/neco_a_00990","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,12]],"date-time":"2021-03-12T21:42:02Z","timestamp":1615585322000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/neco\/article\/29\/9\/2352-2449\/8292"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,9]]},"references-count":306,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2017,9]]}},"alternative-id":["10.1162\/neco_a_00990"],"URL":"https:\/\/doi.org\/10.1162\/neco_a_00990","relation":{},"ISSN":["0899-7667","1530-888X"],"issn-type":[{"value":"0899-7667","type":"print"},{"value":"1530-888X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2017,9]]}}}