{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:14:22Z","timestamp":1775229262168,"version":"3.50.1"},"reference-count":100,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T00:00:00Z","timestamp":1711152000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100012456","name":"National Social Science Fund of China","doi-asserted-by":"publisher","award":["22BTJ057"],"award-info":[{"award-number":["22BTJ057"]}],"id":[{"id":"10.13039\/501100012456","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In computer vision, a series of exemplary advances have been made in several areas involving image classification, semantic segmentation, object detection, and image super-resolution reconstruction with the rapid development of deep convolutional neural network (CNN). The CNN has superior features for autonomous learning and expression, and feature extraction from original input data can be realized by means of training CNN models that match practical applications. Due to the rapid progress in deep learning technology, the structure of CNN is becoming more and more complex and diverse. Consequently, it gradually replaces the traditional machine learning methods. This paper presents an elementary understanding of CNN components and their functions, including input layers, convolution layers, pooling layers, activation functions, batch normalization, dropout, fully connected layers, and output layers. On this basis, this paper gives a comprehensive overview of the past and current research status of the applications of CNN models in computer vision fields, e.g., image classification, object detection, and video prediction. In addition, we summarize the challenges and solutions of the deep CNN, and future research directions are also discussed.<\/jats:p>","DOI":"10.1007\/s10462-024-10721-6","type":"journal-article","created":{"date-parts":[[2024,3,23]],"date-time":"2024-03-23T07:10:03Z","timestamp":1711177803000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":806,"title":["A review of convolutional neural networks in computer vision"],"prefix":"10.1007","volume":"57","author":[{"given":"Xia","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Limin","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Yufei","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Xuming","family":"Han","sequence":"additional","affiliation":[]},{"given":"Muhammet","family":"Deveci","sequence":"additional","affiliation":[]},{"given":"Milan","family":"Parmar","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,23]]},"reference":[{"key":"10721_CR2","doi-asserted-by":"crossref","unstructured":"Al-Haija QA, Smadi M, Al-Bataineh OM (2021) Identifying phasic dopamine releases using darknet-19 convolutional neural network. In: 2021 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS), pp. 1\u20135.","DOI":"10.1109\/IEMTRONICS52119.2021.9422617"},{"issue":"1","key":"10721_CR1","doi-asserted-by":"publisher","first-page":"333","DOI":"10.1007\/s00521-021-06372-1","volume":"34","author":"MAS Al Husaini","year":"2022","unstructured":"Al Husaini MAS, Habaebi MH, Gunawan TS, Islam MR, Elsheikh EA, Suliman F (2022) Thermal-based early breast cancer detection using inception v3, inception v4 and modified inception mv4. Neural Comput Appl 34(1):333\u2013348","journal-title":"Neural Comput Appl"},{"key":"10721_CR3","unstructured":"Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Van\u00a0Esesn BC, Awwal AAS, Asari VK (2018) The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164"},{"issue":"9","key":"10721_CR4","doi-asserted-by":"publisher","first-page":"4895","DOI":"10.3390\/su14094895","volume":"14","author":"J Ankrah","year":"2022","unstructured":"Ankrah J, Monteiro A, Madureira H (2022) Bibliometric analysis of data sources and tools for shoreline change analysis and detection. Sustainability 14(9):4895","journal-title":"Sustainability"},{"issue":"6","key":"10721_CR5","first-page":"3237","volume":"63","author":"L Anuj","year":"2020","unstructured":"Anuj L, Gopalakrishna M (2020) ResNet50-YOLOv2-convolutional neural network based hybrid deep structural learning for moving vehicle tracking under occlusion. Solid State Technol 63(6):3237\u20133258","journal-title":"Solid State Technol"},{"key":"10721_CR6","unstructured":"Baldi P (2012) Autoencoders, unsupervised learning, and deep architectures. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 37\u201349. JMLR Workshop and Conference Proceedings"},{"key":"10721_CR7","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1007\/11744023_32","volume":"3951","author":"H Bay","year":"2006","unstructured":"Bay H, Tuytelaars T, Van Gool L (2006) Surf: speeded up robust features. Lecture Notes Comput Sci 3951:404\u2013417","journal-title":"Lecture Notes Comput Sci"},{"issue":"20","key":"10721_CR8","doi-asserted-by":"publisher","first-page":"2470","DOI":"10.3390\/electronics10202470","volume":"10","author":"D Bhatt","year":"2021","unstructured":"Bhatt D, Patel C, Talsania H, Patel J, Vaghela R, Pandya S, Modi K, Ghayvat H (2021) CNN variants for computer vision: history, architecture, application, challenges and future scope. Electronics 10(20):2470","journal-title":"Electronics"},{"key":"10721_CR9","unstructured":"Bochkovskiy A, Wang C-Y, Liao H-YM (2020) Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934"},{"key":"10721_CR10","unstructured":"Bouvrie, J (2006) Introduction Notes on Convolutional Neural Networks,\u201d (1)"},{"key":"10721_CR11","doi-asserted-by":"crossref","unstructured":"Cao J, Cholakkal H, Anwer RM, Khan FS, Pang Y, Shao L (2020) D2det: Towards high quality object detection and instance segmentation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 11485\u201311494","DOI":"10.1109\/CVPR42600.2020.01150"},{"key":"10721_CR12","doi-asserted-by":"crossref","unstructured":"Castrejon L, Ballas N, Courville A (2019) Improved conditional vrnns for video prediction. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, pp. 7608\u20137617","DOI":"10.1109\/ICCV.2019.00770"},{"key":"10721_CR14","doi-asserted-by":"crossref","unstructured":"Chan ER, Lin CZ, Chan MA, Nagano K, Pan B, De\u00a0Mello S, Gallo O, Guibas LJ., Tremblay J, Khamis S (2022) Efficient geometry-aware 3d generative adversarial networks. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 16123\u201316133","DOI":"10.1109\/CVPR52688.2022.01565"},{"issue":"1","key":"10721_CR13","doi-asserted-by":"publisher","first-page":"749","DOI":"10.1007\/s10462-022-10183-8","volume":"56","author":"JY-L Chan","year":"2023","unstructured":"Chan JY-L, Bea KT, Leow SMH, Phoong SW, Cheng WK (2023) State of the art: a review of sentiment analysis based on sequential transfer learning. Artif Intell Rev 56(1):749\u2013780","journal-title":"Artif Intell Rev"},{"key":"10721_CR15","first-page":"1","volume":"13","author":"MA Chandra","year":"2021","unstructured":"Chandra MA, Bedi S (2021) Survey on SVM and their application in image classification. Int J Inf Technol 13:1\u201311","journal-title":"Int J Inf Technol"},{"key":"10721_CR16","unstructured":"Chang Z, Zhang X, Wang S, Ma S, Gao W (2022) Stau: A spatiotemporal-aware unit for video prediction and beyond. arXiv preprint arXiv:2204.09456"},{"key":"10721_CR17","doi-asserted-by":"crossref","unstructured":"Chen Y, Dai X, Chen D, Liu M, Dong X, Yuan L, Liu Z (2022) Mobile-former: Bridging mobilenet and transformer. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 5270\u20135279","DOI":"10.1109\/CVPR52688.2022.00520"},{"issue":"1","key":"10721_CR18","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1109\/MSP.2017.2765202","volume":"35","author":"A Creswell","year":"2018","unstructured":"Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: an overview. IEEE Signal Process Mag 35(1):53\u201365","journal-title":"IEEE Signal Process Mag"},{"key":"10721_CR19","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"10721_CR20","doi-asserted-by":"publisher","first-page":"3835","DOI":"10.1109\/TIP.2020.2965299","volume":"29","author":"C Dhiman","year":"2020","unstructured":"Dhiman C, Vishwakarma DK (2020) View-invariant deep architecture for human action recognition using two-stream motion and shape temporal dynamics. IEEE Trans Image Process 29:3835\u20133844","journal-title":"IEEE Trans Image Process"},{"issue":"9","key":"10721_CR21","first-page":"1563","volume":"15","author":"W Dicong","year":"2021","unstructured":"Dicong W, Chenshuai B, Kaijun W (2021) Survey of video object detection based on deep learning. J Front Comput Sci Technol 15(9):1563","journal-title":"J Front Comput Sci Technol"},{"key":"10721_CR22","doi-asserted-by":"crossref","unstructured":"Ding X, Zhang X, Ma N, Han J, Ding G, Sun J (2021) Repvgg: Making vgg-style convnets great again. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 13733\u201313742","DOI":"10.1109\/CVPR46437.2021.01352"},{"key":"10721_CR23","doi-asserted-by":"crossref","unstructured":"Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) Centripetalnet: Pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10519\u201310528","DOI":"10.1109\/CVPR42600.2020.01053"},{"issue":"3","key":"10721_CR24","doi-asserted-by":"publisher","first-page":"210","DOI":"10.1017\/S0140525X1200218X","volume":"36","author":"T Egner","year":"2013","unstructured":"Egner T, Summerfield C (2013) Grounding predictive coding models in empirical neuroscience research. Behav Brain Sci 36(3):210\u2013211","journal-title":"Behav Brain Sci"},{"key":"10721_CR25","doi-asserted-by":"crossref","unstructured":"Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 178\u2013178.","DOI":"10.1109\/CVPR.2004.383"},{"issue":"4","key":"10721_CR26","doi-asserted-by":"publisher","first-page":"2205","DOI":"10.1109\/LRA.2023.3247175","volume":"8","author":"Z Feng","year":"2023","unstructured":"Feng Z, Guo Y, Sun Y (2023) CEKD: Cross-modal edge-privileged knowledge distillation for semantic scene understanding using only thermal images. IEEE Robot Autom Lett 8(4):2205\u20132212","journal-title":"IEEE Robot Autom Lett"},{"key":"10721_CR27","doi-asserted-by":"publisher","first-page":"2891","DOI":"10.1007\/s10462-020-09916-4","volume":"54","author":"S Fernandes","year":"2021","unstructured":"Fernandes S, Fanaee-T H, Gama J (2021) Tensor decomposition for analysing time-evolving social networks: an overview. Artif Intell Rev 54:2891\u20132916","journal-title":"Artif Intell Rev"},{"key":"10721_CR28","doi-asserted-by":"crossref","unstructured":"Gao Z, Tan C, Wu L, Li SZ (2022) Simvp: Simpler yet better video prediction. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3170\u20133180","DOI":"10.1109\/CVPR52688.2022.00317"},{"key":"10721_CR29","unstructured":"Ge Z, Liu S, Wang F, Li Z, Sun J (2021) Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430"},{"key":"10721_CR30","unstructured":"Gevorgyan Z (2022) Siou loss: More powerful learning for bounding box regression. arXiv preprint arXiv:2205.12740"},{"key":"10721_CR32","doi-asserted-by":"crossref","unstructured":"Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440\u20131448","DOI":"10.1109\/ICCV.2015.169"},{"key":"10721_CR31","doi-asserted-by":"crossref","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580\u2013587","DOI":"10.1109\/CVPR.2014.81"},{"issue":"1","key":"10721_CR33","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1007\/s44267-023-00003-0","volume":"1","author":"G Guo","year":"2023","unstructured":"Guo G, Han L, Wang L, Zhang D, Han J (2023) Semantic-aware knowledge distillation with parameter-free feature uniformization. Visual Intell 1(1):6","journal-title":"Visual Intell"},{"key":"10721_CR34","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"issue":"5786","key":"10721_CR35","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1126\/science.1127647","volume":"313","author":"GE Hinton","year":"2006","unstructured":"Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504\u2013507","journal-title":"Science"},{"key":"10721_CR36","unstructured":"Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861"},{"issue":"3","key":"10721_CR37","doi-asserted-by":"publisher","first-page":"1833","DOI":"10.1007\/s10462-022-10210-8","volume":"56","author":"K Hu","year":"2023","unstructured":"Hu K, Jin J, Zheng F, Weng L, Ding Y (2023) Overview of behavior recognition based on deep learning. Artif Intell Rev 56(3):1833\u20131865","journal-title":"Artif Intell Rev"},{"issue":"8","key":"10721_CR38","doi-asserted-by":"publisher","first-page":"5171","DOI":"10.1109\/TII.2021.3122801","volume":"18","author":"C Huang","year":"2021","unstructured":"Huang C, Wu Z, Wen J, Xu Y, Jiang Q, Wang Y (2021) Abnormal event detection using deep contrastive learning for intelligent video surveillance system. IEEE Trans Industr Inform 18(8):5171\u20135179","journal-title":"IEEE Trans Industr Inform"},{"key":"10721_CR39","doi-asserted-by":"crossref","unstructured":"Huang L, Qin J, Zhou Y, Zhu F, Liu L, Shao L (2023) Normalization techniques in training dnns: Methodology, analysis and application. IEEE Transactions on Pattern Analysis and Machine Intelligence","DOI":"10.1109\/TPAMI.2023.3250241"},{"issue":"1","key":"10721_CR40","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1113\/jphysiol.1968.sp008455","volume":"195","author":"DH Hubel","year":"1968","unstructured":"Hubel DH, Wiesel TN (1968) Receptive fields and functional architecture of monkey striate cortex. J Physiol 195(1):215\u2013243","journal-title":"J Physiol"},{"key":"10721_CR41","unstructured":"Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448\u2013456. pmlr"},{"issue":"11","key":"10721_CR42","doi-asserted-by":"publisher","first-page":"5713","DOI":"10.3390\/app12115713","volume":"12","author":"J Isabona","year":"2022","unstructured":"Isabona J, Imoize AL, Ojo S, Karunwi O, Kim Y, Lee C-C, Li C-T (2022) Development of a multilayer perceptron neural network for optimal predictive modeling in urban microcellular radio environments. Appl Sci 12(11):5713","journal-title":"Appl Sci"},{"key":"10721_CR43","doi-asserted-by":"publisher","DOI":"10.1016\/j.jmatprotec.2021.117064","volume":"292","author":"X Ji","year":"2021","unstructured":"Ji X, Yan Q, Huang D, Wu B, Xu X, Zhang A, Liao G, Zhou J, Wu M (2021) Filtered selective search and evenly distributed convolutional neural networks for casting defects recognition. J Mater Process Technol 292:117064","journal-title":"J Mater Process Technol"},{"key":"10721_CR44","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.108159","volume":"121","author":"X Jin","year":"2022","unstructured":"Jin X, Xie Y, Wei X-S, Zhao B-R, Chen Z-M, Tan X (2022) Delving deep into spatial pooling for squeeze-and-excitation networks. Pattern Recognit 121:108159","journal-title":"Pattern Recognit"},{"key":"10721_CR45","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1007\/s11416-018-0324-z","volume":"15","author":"RU Khan","year":"2019","unstructured":"Khan RU, Zhang X, Kumar R (2019) Analysis of ResNet and GoogleNet models for malware detection. J Comput Virol Hacking Tech 15:29\u201337","journal-title":"J Comput Virol Hacking Tech"},{"key":"10721_CR46","unstructured":"Krizhevsky A, Hinton G, et al (2009) Learning multiple layers of features from tiny images"},{"issue":"11","key":"10721_CR47","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y LeCun","year":"1998","unstructured":"LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324","journal-title":"Proc IEEE"},{"key":"10721_CR49","doi-asserted-by":"crossref","unstructured":"Li Z, Liu F, Yang W, Peng S, Zhou J (2021) A survey of convolutional neural networks: analysis, applications, and prospects. IEEE transactions on neural networks and learning systems","DOI":"10.1109\/TNNLS.2021.3084827"},{"key":"10721_CR48","doi-asserted-by":"crossref","unstructured":"Li J et al. (2022) Recent advances in end-to-end automatic speech recognition. APSIPA Transactions on Signal and Information Processing 11(1)","DOI":"10.1561\/116.00000050"},{"key":"10721_CR50","unstructured":"Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, et al (2022) Yolov6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976"},{"key":"10721_CR52","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: Computer Vision\u2013ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740\u2013755. Springer","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"10721_CR51","doi-asserted-by":"crossref","unstructured":"Lin T-Y, Doll\u00e1r P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117\u20132125","DOI":"10.1109\/CVPR.2017.106"},{"issue":"1","key":"10721_CR53","doi-asserted-by":"publisher","first-page":"681","DOI":"10.1109\/TPAMI.2021.3139918","volume":"45","author":"Z Liu","year":"2022","unstructured":"Liu Z, Wu S, Jin S, Ji S, Liu Q, Lu S, Cheng L (2022) Investigating pose representations and motion contexts modeling for 3d motion prediction. IEEE Transn Pattern Anal Mach Intell 45(1):681\u2013697","journal-title":"IEEE Transn Pattern Anal Mach Intell"},{"key":"10721_CR54","unstructured":"Lotter W, Kreiman G, Cox D (2016) Deep predictive coding networks for video prediction and unsupervised learning. arXiv preprint arXiv:1605.08104"},{"key":"10721_CR55","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","volume":"60","author":"DG Lowe","year":"2004","unstructured":"Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60:91\u2013110","journal-title":"Int J Comput Vis"},{"key":"10721_CR56","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2020.103448","volume":"293","author":"W Luo","year":"2021","unstructured":"Luo W, Xing J, Milan A, Zhang X, Liu W, Kim T-K (2021) Multiple object tracking: a literature review. Artif intell 293:103448","journal-title":"Artif intell"},{"key":"10721_CR57","doi-asserted-by":"publisher","first-page":"3048","DOI":"10.1109\/TMM.2021.3068576","volume":"23","author":"X Ma","year":"2021","unstructured":"Ma X, Guo J, Sansom A, McGuire M, Kalaani A, Chen Q, Tang S, Yang Q, Fu S (2021) Spatial pyramid attention for deep convolutional neural networks. IEEE Trans Multimedia 23:3048\u20133058","journal-title":"IEEE Trans Multimedia"},{"issue":"2","key":"10721_CR58","doi-asserted-by":"publisher","first-page":"1627","DOI":"10.1007\/s10462-022-10209-1","volume":"56","author":"P Ma","year":"2023","unstructured":"Ma P, Li C, Rahaman MM, Yao Y, Zhang J, Zou S, Zhao X, Grzegorzek M (2023) A state-of-the-art survey of object detection techniques in microorganism image analysis: from classical methods to deep learning approaches. Artif Intell Rev 56(2):1627\u20131698","journal-title":"Artif Intell Rev"},{"key":"10721_CR59","first-page":"64","volume":"5","author":"LR Medsker","year":"2001","unstructured":"Medsker LR, Jain L (2001) Recurrent neural networks. Des Appl 5:64\u201367","journal-title":"Des Appl"},{"issue":"7","key":"10721_CR60","first-page":"3523","volume":"44","author":"S Minaee","year":"2021","unstructured":"Minaee S, Boykov Y, Porikli F, Plaza A, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning: a survey. IEEE Trans Pattern Anal Mach Intell 44(7):3523\u20133542","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"10721_CR61","unstructured":"Nwankpa C, Ijomah W, Gachagan A, Marshall S (2018) Activation functions: Comparison of trends in practice and research for deep learning. arXiv preprint arXiv:1811.03378"},{"key":"10721_CR62","doi-asserted-by":"crossref","unstructured":"Papageorgiou CP, Oren M, Poggio T (1998) A general framework for object detection. In: Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), pp. 555\u2013562. IEEE","DOI":"10.1109\/ICCV.1998.710772"},{"issue":"5","key":"10721_CR63","doi-asserted-by":"publisher","first-page":"1780","DOI":"10.3390\/s22051780","volume":"22","author":"C Patel","year":"2022","unstructured":"Patel C, Bhatt D, Sharma U, Patel R, Pandya S, Modi K, Cholli N, Patel A, Bhatt U, Khan MA (2022) DBGC: dimension-based generic convolution block for object recognition. Sensors 22(5):1780","journal-title":"Sensors"},{"key":"10721_CR64","unstructured":"Patraucean V, Handa A, Cipolla R (2015) Spatio-temporal video autoencoder with differentiable memory. arXiv preprint arXiv:1511.06309"},{"key":"10721_CR66","doi-asserted-by":"crossref","unstructured":"Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263\u20137271","DOI":"10.1109\/CVPR.2017.690"},{"key":"10721_CR67","unstructured":"Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767"},{"key":"10721_CR65","doi-asserted-by":"crossref","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779\u2013788","DOI":"10.1109\/CVPR.2016.91"},{"key":"10721_CR68","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in neural information processing systems 28"},{"key":"10721_CR69","doi-asserted-by":"crossref","unstructured":"Ren J, Zheng Q, Zhao Y, Xu X, Li C (2022) Dlformer: Discrete latent transformer for video inpainting. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3511\u20133520","DOI":"10.1109\/CVPR52688.2022.00350"},{"key":"10721_CR70","doi-asserted-by":"crossref","unstructured":"Sainath TN, Kingsbury B, Mohamed A-r, Dahl GE, Saon G, Soltau H, Beran T, Aravkin AY, Ramabhadran B (2013) Improvements to deep convolutional neural networks for lvcsr. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 315\u2013320. IEEE","DOI":"10.1109\/ASRU.2013.6707749"},{"key":"10721_CR71","doi-asserted-by":"crossref","unstructured":"Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) Mobilenetv2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510\u20134520","DOI":"10.1109\/CVPR.2018.00474"},{"key":"10721_CR72","doi-asserted-by":"publisher","first-page":"95","DOI":"10.3389\/fnins.2019.00095","volume":"13","author":"A Sengupta","year":"2019","unstructured":"Sengupta A, Ye Y, Wang R, Liu C, Roy K (2019) Going deeper in spiking neural networks: VGG and residual architectures. Front Neurosci 13:95","journal-title":"Front Neurosci"},{"key":"10721_CR73","unstructured":"Shetty S (2016) Application of convolutional neural network for image classification on pascal voc challenge 2012 dataset. arXiv preprint arXiv:1607.03785"},{"key":"10721_CR74","unstructured":"Shi X, Chen Z, Wang H, Yeung D-Y, Wong W-K, Woo W-c (2015) Convolutional lstm network: A machine learning approach for precipitation nowcasting. Advances in neural information processing systems 28"},{"key":"10721_CR75","doi-asserted-by":"publisher","first-page":"1107","DOI":"10.1007\/s10462-018-9651-1","volume":"52","author":"T Singh","year":"2019","unstructured":"Singh T, Vishwakarma DK (2019) Video benchmarks of human action datasets: a review. Artif Intell Rev 52:1107\u20131154","journal-title":"Artif Intell Rev"},{"key":"10721_CR76","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1007\/s00521-020-05018-y","volume":"33","author":"T Singh","year":"2021","unstructured":"Singh T, Vishwakarma DK (2021) A deeply coupled convnet for human activity recognition using dynamic and RGB images. Neural Comput Appl 33:469\u2013485","journal-title":"Neural Comput Appl"},{"issue":"1","key":"10721_CR77","first-page":"1929","volume":"15","author":"N Srivastava","year":"2014","unstructured":"Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929\u20131958","journal-title":"J Mach Learn Res"},{"key":"10721_CR78","unstructured":"Srivastava RK, Greff K, Schmidhuber J (2015) Highway networks. arXiv preprint arXiv:1505.00387"},{"key":"10721_CR79","doi-asserted-by":"publisher","DOI":"10.1016\/j.cam.2022.114980","volume":"424","author":"S Stepanov","year":"2023","unstructured":"Stepanov S, Spiridonov D, Mai T (2023) Prediction of numerical homogenization using deep learning for the Richards equation. J Comput Appl Math 424:114980","journal-title":"J Comput Appl Math"},{"key":"10721_CR80","doi-asserted-by":"crossref","unstructured":"Sui X, Li S, Geng X, Wu Y, Xu X, Liu Y, Goh R, Zhu H (2022) Craft: Cross-attentional flow transformer for robust optical flow. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 17602\u201317611","DOI":"10.1109\/CVPR52688.2022.01708"},{"key":"10721_CR81","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1\u20139","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"10721_CR82","doi-asserted-by":"crossref","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818\u20132826","DOI":"10.1109\/CVPR.2016.308"},{"key":"10721_CR83","doi-asserted-by":"crossref","unstructured":"Szegedy C, Ioffe S, Vanhoucke V, Alemi A (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"10721_CR85","unstructured":"Tan M, Le Q (2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105\u20136114."},{"key":"10721_CR84","doi-asserted-by":"crossref","unstructured":"Tan M, Pang R, Le QV (2020) Efficientdet: Scalable and efficient object detection. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781\u201310790","DOI":"10.1109\/CVPR42600.2020.01079"},{"issue":"4","key":"10721_CR86","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1080\/02564602.2020.1740615","volume":"38","author":"MP Uddin","year":"2021","unstructured":"Uddin MP, Mamun MA, Hossain MA (2021) PCA-based feature reduction for hyperspectral remote sensing image classification. IETE Tech Rev 38(4):377\u2013396","journal-title":"IETE Tech Rev"},{"key":"10721_CR87","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser \u0141, Polosukhin I (2017) Attention is all you need. Advances in neural information processing systems 30"},{"key":"10721_CR88","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1016\/j.aeue.2019.05.023","volume":"107","author":"DK Vishwakarma","year":"2019","unstructured":"Vishwakarma DK, Singh T (2019) A visual cognizance based multi-resolution descriptor for human action recognition using key pose. AEU-Int J Electron Commun 107:157\u2013169","journal-title":"AEU-Int J Electron Commun"},{"key":"10721_CR90","unstructured":"Wang Y, Long M, Wang J, Gao Z, Yu PS (2017) Predrnn: Recurrent neural networks for predictive learning using spatiotemporal lstms. Advances in neural information processing systems 30"},{"key":"10721_CR91","unstructured":"Wang Y, Jiang L, Yang M-H, Li L-J, Long M, Fei-Fei L (2019) Eidetic 3d lstm: A model for video prediction and beyond. In: International Conference on Learning Representations"},{"issue":"2","key":"10721_CR89","doi-asserted-by":"publisher","first-page":"2208","DOI":"10.1109\/TPAMI.2022.3165153","volume":"45","author":"Y Wang","year":"2022","unstructured":"Wang Y, Wu H, Zhang J, Gao Z, Wang J, Philip SY, Long M (2022) Predrnn: a recurrent neural network for spatiotemporal predictive learning. IEEE Trans Pattern Anal Mach Intell 45(2):2208\u20132225","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"10721_CR92","unstructured":"Wightman R, Touvron H, J\u00e9gou H (2021) Resnet strikes back: An improved training procedure in timm. arXiv preprint arXiv:2110.00476"},{"key":"10721_CR93","doi-asserted-by":"crossref","unstructured":"Xiao J, Hays J, Ehinger KA, Oliva A, Torralba A (2010) Sun database: Large-scale scene recognition from abbey to zoo. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3485\u20133492.","DOI":"10.1109\/CVPR.2010.5539970"},{"issue":"1","key":"10721_CR94","doi-asserted-by":"publisher","first-page":"216","DOI":"10.1038\/s41377-021-00658-8","volume":"10","author":"J Xiong","year":"2021","unstructured":"Xiong J, Hsiang E-L, He Z, Zhan T, Wu S-T (2021) Augmented reality and virtual reality displays: emerging technologies and future perspectives. Light Sci Appl 10(1):216","journal-title":"Light Sci Appl"},{"key":"10721_CR95","doi-asserted-by":"crossref","unstructured":"Yan S, Xiong X, Arnab A, Lu Z, Zhang M, Sun C, Schmid C (2022) Multiview transformers for video recognition. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 3333\u20133343","DOI":"10.1109\/CVPR52688.2022.00333"},{"issue":"1","key":"10721_CR96","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1038\/s41746-023-00805-y","volume":"6","author":"J Yang","year":"2023","unstructured":"Yang J, Soltan AA, Eyre DW, Yang Y, Clifton DA (2023) An adversarial training framework for mitigating algorithmic biases in clinical machine learning. NPJ Digit Med 6(1):55","journal-title":"NPJ Digit Med"},{"key":"10721_CR97","doi-asserted-by":"publisher","first-page":"1897","DOI":"10.1007\/s10462-023-10566-5","volume":"56","author":"W Yang","year":"2023","unstructured":"Yang W, Yu H, Cui B, Sui R, Gu T (2023) Deep neural network pruning method based on sensitive layers and reinforcement learning. Artif Intell Rev 56:1897\u2013917","journal-title":"Artif Intell Rev"},{"issue":"9","key":"10721_CR98","first-page":"1799","volume":"50","author":"K Yu","year":"2013","unstructured":"Yu K, Jia L, Chen Y, Xu W (2013) Deep learning: yesterday, today, and tomorrow. J Comput Res Dev 50(9):1799\u20131804","journal-title":"J Comput Res Dev"},{"key":"10721_CR99","unstructured":"Yu W, Lu Y, Easterbrook S, Fidler S (2020) Efficient and information-preserving future frame prediction and beyond"},{"issue":"10","key":"10721_CR100","doi-asserted-by":"publisher","first-page":"2425","DOI":"10.1007\/s11263-022-01657-x","volume":"130","author":"\u00c9 Zablocki","year":"2022","unstructured":"Zablocki \u00c9, Ben-Younes H, P\u00e9rez P, Cord M (2022) Explainability of deep vision-based autonomous driving systems: review and challenges. Int J Comput Vision 130(10):2425\u20132452","journal-title":"Int J Comput Vision"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10721-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-024-10721-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-024-10721-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,14]],"date-time":"2024-11-14T22:43:58Z","timestamp":1731624238000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-024-10721-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,23]]},"references-count":100,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["10721"],"URL":"https:\/\/doi.org\/10.1007\/s10462-024-10721-6","relation":{},"ISSN":["1573-7462"],"issn-type":[{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,23]]},"assertion":[{"value":"4 February 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"99"}}