{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,20]],"date-time":"2026-03-20T21:41:52Z","timestamp":1774042912908,"version":"3.50.1"},"reference-count":120,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2020,1,24]],"date-time":"2020-01-24T00:00:00Z","timestamp":1579824000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2020,1,24]],"date-time":"2020-01-24T00:00:00Z","timestamp":1579824000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Multimed Info Retr"],"published-print":{"date-parts":[[2020,6]]},"DOI":"10.1007\/s13735-019-00190-x","type":"journal-article","created":{"date-parts":[[2020,1,24]],"date-time":"2020-01-24T07:02:30Z","timestamp":1579849350000},"page":"81-101","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":32,"title":["A study on deep learning spatiotemporal models and feature extraction techniques for video understanding"],"prefix":"10.1007","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0668-926X","authenticated-orcid":false,"given":"M.","family":"Suresha","sequence":"first","affiliation":[]},{"given":"S.","family":"Kuppa","sequence":"additional","affiliation":[]},{"given":"D. S.","family":"Raghukumar","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,1,24]]},"reference":[{"issue":"3","key":"190_CR1","doi-asserted-by":"publisher","first-page":"292","DOI":"10.3390\/electronics8030292","volume":"8","author":"MZ Alom","year":"2019","unstructured":"Alom MZ, Taha TM, Yakopcic C, Westberg S, Sidike P, Nasrin MS, Hasan M, Van Essen BC, Awwal AAS, Asari VK (2019) A state-of-the-art survey on deep learning theory and architectures. Electronics 8(3):292","journal-title":"Electronics"},{"issue":"4","key":"190_CR2","doi-asserted-by":"publisher","first-page":"83","DOI":"10.1145\/3161602","volume":"51","author":"G Atluri","year":"2018","unstructured":"Atluri G, Karpatne A, Kumar V (2018) Spatio-temporal data mining: a survey of problems and methods. ACM Comput Surv: CSUR 51(4):83","journal-title":"ACM Comput Surv: CSUR"},{"issue":"1","key":"190_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s11263-010-0390-2","volume":"92","author":"S Baker","year":"2011","unstructured":"Baker S, Scharstein D, Lewis JP, Roth S, Black MJ, Szeliski R (2011) A database and evaluation methodology for optical flow. Int J Comput Vis 92(1):1\u201331","journal-title":"Int J Comput Vis"},{"key":"190_CR4","unstructured":"Barrett B (2018) Inside the olympics opening ceremony world-record drone show. In: wired. https:\/\/www.wired.com\/story\/olympics-opening-ceremony-drone-show\/"},{"issue":"3","key":"190_CR5","doi-asserted-by":"publisher","first-page":"197","DOI":"10.1007\/s13735-018-0152-4","volume":"7","author":"SB Bhorge","year":"2018","unstructured":"Bhorge SB, Manthalkar RR (2018) Three-dimensional spatio-temporal trajectory descriptor for human action recognition. Int J Multimed Inf Retr 7(3):197\u2013205","journal-title":"Int J Multimed Inf Retr"},{"key":"190_CR6","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/34.910878","volume":"3","author":"AF Bobick","year":"2001","unstructured":"Bobick AF, Davis JW (2001) The recognition of human movement using temporal templates. IEEE Trans Pattern Anal Mach Intell 3:257\u2013267","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"15","key":"190_CR7","doi-asserted-by":"publisher","first-page":"1861","DOI":"10.1016\/j.patrec.2013.01.024","volume":"34","author":"GJ Burghouts","year":"2013","unstructured":"Burghouts GJ, Schutte K (2013) Spatio-temporal layout of human actions for improved bag-of-words action detection. Pattern Recogn Lett 34(15):1861\u20131869","journal-title":"Pattern Recogn Lett"},{"key":"190_CR8","unstructured":"Chalapathy R, Chawla S (2019) Deep learning for anomaly detection: a survey. arXiv:1901.03407"},{"key":"190_CR9","unstructured":"Chaudhry R, Ravichandran A, Hager G, Vidal R (2009) June. Histograms of oriented optical flow and Binet\u2013Cauchy kernels on nonlinear dynamical systems for the recognition of human actions. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 1932\u20131939"},{"issue":"1","key":"190_CR10","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41746-019-0122-0","volume":"2","author":"D Chen","year":"2019","unstructured":"Chen D, Liu S, Kingsbury P, Sohn S, Storlie CB, Habermann EB, Naessens JM, Larson DW, Liu H (2019) Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit Med 2(1):1\u20135","journal-title":"NPJ Digit Med"},{"issue":"1","key":"190_CR11","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1007\/s13735-017-0139-6","volume":"7","author":"K Chen","year":"2018","unstructured":"Chen K, Kovvuri R, Gao J, Nevatia R (2018) MSRC: multimodal spatial regression with semantic context for phrase grounding. Int J Multimed Inf Retr 7(1):17\u201328","journal-title":"Int J Multimed Inf Retr"},{"key":"190_CR12","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/978-3-319-06160-3_2","volume-title":"Smart City","author":"Annalisa Cocchia","year":"2014","unstructured":"Cocchia A (2014) Smart and digital city: a systematic literature review. In: Dameri RP, Rosenthal-Sabroux C (eds) Smart city. Progress in IS. Springer, Cham, pp 13\u201343. https:\/\/doi.org\/10.1007\/978-3-319-06160-3_2"},{"issue":"4","key":"190_CR13","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1007\/s13735-018-0155-1","volume":"7","author":"Y Deldjoo","year":"2018","unstructured":"Deldjoo Y, Elahi M, Quadrana M, Cremonesi P (2018) Using visual features based on MPEG-7 and deep learning for movie recommendation. Int J Multimed Inf Retr 7(4):207\u2013219","journal-title":"Int J Multimed Inf Retr"},{"key":"190_CR14","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1007\/978-3-030-01270-0_23","volume-title":"Computer Vision \u2013 ECCV 2018","author":"Yang Du","year":"2018","unstructured":"Du Y, Yuan C, Li B, Zhao L, Li Y, Hu W (2018) Interaction-aware spatio-temporal pyramid attention networks for action classification. In: Proceedings of the European conference on computer vision (ECCV), pp 373\u2013389"},{"issue":"6","key":"190_CR15","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1038\/s41558-019-0481-1","volume":"9","author":"Darrick Evensen","year":"2019","unstructured":"Evensen D (2019) The rhetorical limitations of the #FridaysForFuture movement. Nat Clim Chang 9:428\u2013430. https:\/\/doi.org\/10.1038\/s41558-019-0481-1","journal-title":"Nature Climate Change"},{"key":"190_CR16","unstructured":"Fan J, Ma C, Zhong Y (2019) A selective overview of deep learning. arXiv:1904.05526"},{"key":"190_CR17","unstructured":"Federal Highway Administration (2015) Video analytics research projects. U.S Department of Transportation. 16 p"},{"key":"190_CR18","doi-asserted-by":"crossref","unstructured":"Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1933\u20131941","DOI":"10.1109\/CVPR.2016.213"},{"key":"190_CR19","doi-asserted-by":"crossref","unstructured":"Gammulle H, Denman S, Sridharan S, Fookes C (2017) March. Two stream lstm: a deep fusion framework for human action recognition. In: 2017 IEEE Winter conference on applications of computer vision (WACV). IEEE, pp 177\u2013186","DOI":"10.1109\/WACV.2017.27"},{"key":"190_CR20","doi-asserted-by":"publisher","DOI":"10.1201\/9781420010749","volume-title":"Handbook of approximation algorithms and metaheuristics","author":"TF Gonzalez","year":"2007","unstructured":"Gonzalez TF (2007) Handbook of approximation algorithms and metaheuristics. Chapman and Hall, London"},{"issue":"1","key":"190_CR21","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1016\/0166-2236(92)90344-8","volume":"15","author":"MA Goodale","year":"1992","unstructured":"Goodale MA, Milner AD (1992) Separate visual pathways for perception and action. Trends Neurosci 15(1):20\u201325","journal-title":"Trends Neurosci"},{"key":"190_CR22","doi-asserted-by":"crossref","unstructured":"Guiming D, Xia W, Guangyan W, Yan Z, Dan L (2016) Speech recognition based on convolutional neural networks. In: 2016 IEEE international conference on signal and image processing (ICSIP). IEEE, pp 708\u2013711","DOI":"10.1109\/SIPROCESS.2016.7888355"},{"issue":"2","key":"190_CR23","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1007\/s13735-017-0141-z","volume":"7","author":"Y Guo","year":"2018","unstructured":"Guo Y, Liu Y, Georgiou T, Lew MS (2018) A review of semantic segmentation using deep neural networks. Int J Multimed Inf Retr 7(2):87\u201393","journal-title":"Int J Multimed Inf Retr"},{"key":"190_CR24","doi-asserted-by":"publisher","first-page":"24411","DOI":"10.1109\/ACCESS.2018.2830661","volume":"6","author":"WG Hatcher","year":"2018","unstructured":"Hatcher WG, Yu W (2018) A survey of deep learning: platforms, applications and emerging research trends. IEEE Access 6:24411\u201324432","journal-title":"IEEE Access"},{"key":"190_CR25","unstructured":"He D, Li F, Zhao Q, Long X, Fu Y, Wen S (2018) Exploiting spatial-temporal modelling and multi-modal fusion for human action recognition. arXiv:1806.10319"},{"key":"190_CR26","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"190_CR27","doi-asserted-by":"crossref","unstructured":"Hoang VD, Hoang DH, Hieu CL (2018) Action recognition based on sequential 2D-CNN for surveillance systems. In: IECON 2018-44th annual conference of the IEEE industrial electronics society. IEEE, pp 3225\u20133230","DOI":"10.1109\/IECON.2018.8591338"},{"key":"190_CR28","unstructured":"Honda (2018) Cooperative merge. In: Honda news. http:\/\/www.multivu.com\/players\/English\/7988331-honda-ces-cooperative-mobility-ecosystem\/"},{"key":"190_CR29","doi-asserted-by":"crossref","unstructured":"Hou R, Chen C, Shah M (2017) Tube convolutional neural network (T-CNN) for action detection in videos. In: Proceedings of the IEEE international conference on computer vision, pp 5822\u20135831","DOI":"10.1109\/ICCV.2017.620"},{"key":"190_CR30","unstructured":"Huang H, Yu PS, Wang C (2018) An introduction to image synthesis with generative adversarial nets. arXiv:1803.04469"},{"key":"190_CR31","doi-asserted-by":"crossref","unstructured":"Hui TW, Tang X, Change Loy C (2018) Liteflownet: a lightweight convolutional neural network for optical flow estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8981\u20138989","DOI":"10.1109\/CVPR.2018.00936"},{"key":"190_CR32","unstructured":"Ilg E, Mayer N, Saikia T, Keuper M, Dosovitskiy A, Brox T (2017) Flownet 2.0: evolution of optical flow estimation with deep networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2462\u20132470"},{"issue":"11","key":"190_CR33","doi-asserted-by":"publisher","first-page":"3137","DOI":"10.1109\/TMM.2018.2823900","volume":"20","author":"YG Jiang","year":"2018","unstructured":"Jiang YG, Wu Z, Tang J, Li Z, Xue X, Chang SF (2018) Modeling multimodal clues in a hybrid deep learning framework for video classification. IEEE Trans Multimed 20(11):3137\u20133147","journal-title":"IEEE Trans Multimed"},{"issue":"2","key":"190_CR34","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1109\/TPAMI.2017.2670560","volume":"40","author":"YG Jiang","year":"2017","unstructured":"Jiang YG, Wu Z, Wang J, Xue X, Chang SF (2017) Exploiting feature and class relationships in video categorization with regularized deep neural networks. IEEE Trans Pattern Anal Mach Intell 40(2):352\u2013364","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"190_CR35","unstructured":"Kahn J (2018) Meet \u2018Millie\u2019 the Avatar. She\u2019d like to sell you a pair of sunglasses. In: Bloomberg. https:\/\/www.bloomberg.com\/news\/articles\/2018-12-15\/meet-millie-the-avatar-she-d-like-to-sell-you-a-pair-of-sunglasses"},{"issue":"2","key":"190_CR36","doi-asserted-by":"publisher","first-page":"255","DOI":"10.1007\/s11760-017-1153-0","volume":"12","author":"L Kangwei","year":"2018","unstructured":"Kangwei L, Jianhua W, Zhongzhi H (2018) Abnormal event detection and localization using level set based on hybrid features. Signal Image Video Process 12(2):255\u2013261","journal-title":"Signal Image Video Process"},{"key":"190_CR37","doi-asserted-by":"crossref","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1725\u20131732","DOI":"10.1109\/CVPR.2014.223"},{"issue":"1","key":"190_CR38","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1186\/s12916-019-1426-2","volume":"17","author":"CJ Kelly","year":"2019","unstructured":"Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D (2019) Key challenges for delivering clinical impact with artificial intelligence. BMC Med 17(1):195","journal-title":"BMC Med"},{"key":"190_CR39","unstructured":"Kong Y, Fu Y (2018) Human action recognition and prediction: a survey. arXiv:1806.11230"},{"issue":"8","key":"190_CR40","doi-asserted-by":"publisher","first-page":"1847","DOI":"10.1109\/TPAMI.2012.272","volume":"35","author":"N Kruger","year":"2012","unstructured":"Kruger N, Janssen P, Kalkan S, Lappe M, Leonardis A, Piater J, Rodriguez-Sanchez AJ, Wiskott L (2012) Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Trans Pattern Anal Mach Intell 35(8):1847\u20131871","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"190_CR41","unstructured":"Kumaran SK, Dogra DP, Roy PP (2019) Anomaly detection in road traffic using visual surveillance: a survey. arXiv:1901.08292"},{"issue":"2\u20133","key":"190_CR42","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1007\/s11263-005-1838-7","volume":"64","author":"I Laptev","year":"2005","unstructured":"Laptev I (2005) On space-time interest points. Int J Comput Vis 64(2\u20133):107\u2013123","journal-title":"Int J Comput Vis"},{"key":"190_CR43","doi-asserted-by":"crossref","unstructured":"Laptev I, Marsza\u0142ek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: CVPR\u2014IEEE conference on computer vision & pattern recognition, Jun 2008, Anchorage, USA, pp 1\u20138","DOI":"10.1109\/CVPR.2008.4587756"},{"issue":"7553","key":"190_CR44","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436\u2013444","journal-title":"Nature"},{"key":"190_CR45","doi-asserted-by":"crossref","unstructured":"Lenz I, Gemici M, Saxena A (2012) Low-power parallel algorithms for single image based obstacle avoidance in aerial robots. In: 2012 IEEE\/RSJ international conference on intelligent robots and systems. IEEE, pp 772\u2013779","DOI":"10.1109\/IROS.2012.6386146"},{"issue":"4\u20135","key":"190_CR46","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1177\/0278364917710318","volume":"37","author":"S Levine","year":"2018","unstructured":"Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D (2018) Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int J Robot Res 37(4\u20135):421\u2013436","journal-title":"Int J Robot Res"},{"key":"190_CR47","unstructured":"Li F, Du J (2012) October. Local spatio-temporal interest point detection for human action recognition. In: 2012 IEEE fifth international conference on advanced computational intelligence (ICACI). IEEE, pp 579\u2013582"},{"issue":"1","key":"190_CR48","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1007\/s13735-016-0117-4","volume":"6","author":"Q Li","year":"2017","unstructured":"Li Q, Qiu Z, Yao T, Mei T, Rui Y, Luo J (2017) Learning hierarchical video representation for action recognition. Int J Multimed Inf Retr 6(1):85\u201398","journal-title":"Int J Multimed Inf Retr"},{"key":"190_CR49","doi-asserted-by":"crossref","unstructured":"Li X, Pang T, Liu W, Wang T (2017) Fall detection for elderly person care using convolutional neural networks. In: 2017 10th international congress on image and signal processing, biomedical engineering and informatics (CISP-BMEI). IEEE, pp 1\u20136","DOI":"10.1109\/CISP-BMEI.2017.8302004"},{"issue":"9","key":"190_CR50","doi-asserted-by":"publisher","first-page":"3436","DOI":"10.1007\/s10489-019-01459-8","volume":"49","author":"J Liu","year":"2019","unstructured":"Liu J, Sun C, Xu X, Xu B, Yu S (2019) A spatial and temporal features mixture model with body parts for video-based person re-identification. Appl Intell 49(9):3436\u20133446","journal-title":"Appl Intell"},{"key":"190_CR51","unstructured":"Livni R, Shalev-Shwartz S, Shamir O (2014) On the computational efficiency of training neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27. Curran Associates, Inc., pp 855\u2013863. http:\/\/papers.nips.cc\/paper\/5267-on-the-computational-efficiency-of-training-neural-networks.pdf"},{"issue":"2","key":"190_CR52","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1023\/B:VISI.0000029664.99615.94","volume":"60","author":"DG Lowe","year":"2004","unstructured":"Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91\u2013110","journal-title":"Int J Comput Vis"},{"key":"190_CR53","doi-asserted-by":"crossref","unstructured":"Lu C, Shi J, Jia J (2013) Abnormal event detection at 150 fps in matlab. In: Proceedings of the IEEE international conference on computer vision, pp 2720\u20132727","DOI":"10.1109\/ICCV.2013.338"},{"issue":"5","key":"190_CR54","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1021\/acs.molpharmaceut.5b00982","volume":"13","author":"P Mamoshina","year":"2016","unstructured":"Mamoshina P, Vieira A, Putin E, Zhavoronkov A (2016) Applications of deep learning in biomedicine. Mol Pharm 13(5):1445\u20131454","journal-title":"Mol Pharm"},{"key":"190_CR55","unstructured":"Marcus G (2018) Deep learning: a critical appraisal. arXiv:1801.00631"},{"issue":"15","key":"190_CR56","doi-asserted-by":"publisher","first-page":"1990","DOI":"10.1016\/j.patrec.2013.04.025","volume":"34","author":"R Melfi","year":"2013","unstructured":"Melfi R, Kondra S, Petrosino A (2013) Human activity modeling by spatio temporal textural appearance. Pattern Recogn Lett 34(15):1990\u20131994","journal-title":"Pattern Recogn Lett"},{"key":"190_CR57","doi-asserted-by":"crossref","unstructured":"Menze M, Geiger A (2015) Object scene flow for autonomous vehicles. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3061\u20133070","DOI":"10.1109\/CVPR.2015.7298925"},{"issue":"1","key":"190_CR58","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1007\/s13735-018-00166-3","volume":"8","author":"NC Mithun","year":"2019","unstructured":"Mithun NC, Li J, Metze F, Roy-Chowdhury AK (2019) Joint embeddings with multimodal cues for video-text retrieval. Int J Multimed Inf Retr 8(1):3\u201318","journal-title":"Int J Multimed Inf Retr"},{"issue":"1","key":"190_CR59","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s40537-014-0007-7","volume":"2","author":"MM Najafabadi","year":"2015","unstructured":"Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. J Big Data 2(1):1","journal-title":"J Big Data"},{"key":"190_CR60","doi-asserted-by":"publisher","first-page":"48231","DOI":"10.1109\/ACCESS.2018.2863036","volume":"6","author":"S Naseer","year":"2018","unstructured":"Naseer S, Saleem Y, Khalid S, Bashir MK, Han J, Iqbal MM, Han K (2018) Enhanced network anomaly detection based on deep neural networks. IEEE Access 6:48231\u201348246","journal-title":"IEEE Access"},{"key":"190_CR61","doi-asserted-by":"crossref","unstructured":"Ouadiay FZ, Bouftaih H, Bouyakhf EH, Himmi MM (2018) Simultaneous object detection and localization using convolutional neural networks. In: 2018 international conference on intelligent systems and computer vision (ISCV). IEEE, pp 1\u20138","DOI":"10.1109\/ISACV.2018.8354045"},{"key":"190_CR62","doi-asserted-by":"crossref","unstructured":"Palmer R, West G, Tan T (2012) Scale proportionate histograms of oriented gradients for object detection in co-registered visual and range data. In: 2012 international conference on digital image computing techniques and applications (DICTA). IEEE, pp 1\u20138","DOI":"10.1109\/DICTA.2012.6411699"},{"issue":"16","key":"190_CR63","doi-asserted-by":"publisher","first-page":"3503","DOI":"10.3390\/s19163503","volume":"19","author":"Konstantinos Papadopoulos","year":"2019","unstructured":"Papadopoulos K, Demisse G, Ghorbel E, Antunes M, Aouada D, Ottersten B (2019) Localized trajectories for 2D and 3D action recognition. arXiv:1904.05244","journal-title":"Sensors"},{"key":"190_CR64","doi-asserted-by":"crossref","unstructured":"Papernot N, McDaniel P, Jha S, Fredrikson M, Celik ZB, Swami A (2016) The limitations of deep learning in adversarial settings. In: 2016 IEEE European symposium on security and privacy (EuroS&P). IEEE, pp 372\u2013387","DOI":"10.1109\/EuroSP.2016.36"},{"key":"190_CR65","doi-asserted-by":"crossref","unstructured":"Peng K, Chen X, Zhou D, Liu Y (2009) 3D reconstruction based on SIFT and Harris feature points. In: 2009 IEEE international conference on robotics and biomimetics (ROBIO). IEEE, pp 960\u2013964","DOI":"10.1109\/ROBIO.2009.5420735"},{"issue":"3","key":"190_CR66","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1109\/TCSVT.2018.2808685","volume":"29","author":"Y Peng","year":"2018","unstructured":"Peng Y, Zhao Y, Zhang J (2018) Two-stream collaborative learning with spatial-temporal attention for video classification. IEEE Trans Circuits Syst Video Technol 29(3):773\u2013786","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"issue":"4","key":"190_CR67","doi-asserted-by":"publisher","first-page":"939","DOI":"10.1109\/TMM.2017.2759504","volume":"20","author":"Z Qiu","year":"2017","unstructured":"Qiu Z, Yao T, Mei T (2017) Learning deep spatio-temporal dependence for semantic video segmentation. IEEE Trans Multimed 20(4):939\u2013949","journal-title":"IEEE Trans Multimed"},{"key":"190_CR68","doi-asserted-by":"publisher","first-page":"662","DOI":"10.1016\/j.jvcir.2018.12.002","volume":"58","author":"KS Ray","year":"2019","unstructured":"Ray KS, Chakraborty S (2019) Object detection by spatio-temporal analysis and tracking of the detected objects in a video with variable background. J Vis Commun Image Represent 58:662\u2013674","journal-title":"J Vis Commun Image Represent"},{"key":"190_CR69","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in neural information processing systems, vol 28. Curran Associates, Inc., pp 91\u201399 http:\/\/papers.nips.cc\/paper\/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf"},{"key":"190_CR70","unstructured":"Rolnick D, Donti PL, Kaack LH, Kochanski K, Lacoste A, Sankaran K, Ross AS, Milojevic-Dupont N, Jaques N, Waldman-Brown A, Luccioni A (2019) Tackling climate change with machine learning. arXiv:1906.05433"},{"issue":"3","key":"190_CR71","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211\u2013252","journal-title":"Int J Comput Vis"},{"key":"190_CR72","doi-asserted-by":"crossref","unstructured":"Scovanner P, Ali S, Shah M (2007) A 3-dimensional sift descriptor and its application to action recognition. In: Proceedings of the 15th ACM international conference on multimedia. ACM, pp 357\u2013360","DOI":"10.1145\/1291233.1291311"},{"key":"190_CR73","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1016\/j.patrec.2015.06.029","volume":"65","author":"M Sekma","year":"2015","unstructured":"Sekma M, Mejdoub M, Amar CB (2015) Human action recognition based on multi-layer fisher vector encoding method. Pattern Recogn Lett 65:37\u201343","journal-title":"Pattern Recogn Lett"},{"key":"190_CR74","unstructured":"Seligman L (2016) How swarming drones could change the face of air warfare. In: Def. News. https:\/\/www.defensenews.com\/2016\/05\/17\/how-swarming-drones-could-change-the-face-of-air-warfare\/"},{"key":"190_CR75","unstructured":"Sermanet P, Chintala S, LeCun Y (2012) Convolutional neural networks applied to house numbers digit classification. arXiv:1204.3968"},{"key":"190_CR76","doi-asserted-by":"crossref","unstructured":"Shou Z, Lin X, Kalantidis Y, Sevilla-Lara L, Rohrbach M, Chang SF, Yan Z (2019) Dmc-net: generating discriminative motion cues for fast compressed video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1268\u20131277","DOI":"10.1109\/CVPR.2019.00136"},{"key":"190_CR77","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556"},{"key":"190_CR78","unstructured":"Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KD (eds) Advances in neural information processing systems, vol 27. Curran Associates, Inc., pp 568\u2013576. http:\/\/papers.nips.cc\/paper\/5353-two-stream-convolutional-networks-for-action-recognition-in-videos.pdf"},{"key":"190_CR79","doi-asserted-by":"crossref","unstructured":"Singh B, Marks TK, Jones M, Tuzel O, Shao M (2016) A multi-stream bi-directional recurrent neural network for fine-grained action detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1961\u20131970","DOI":"10.1109\/CVPR.2016.216"},{"key":"190_CR80","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1016\/j.jbusres.2016.08.001","volume":"70","author":"U Sivarajah","year":"2017","unstructured":"Sivarajah U, Kamal MM, Irani Z, Weerakkody V (2017) Critical analysis of Big Data challenges and analytical methods. J Bus Res 70:263\u2013286","journal-title":"J Bus Res"},{"key":"190_CR81","unstructured":"Soomro K, Zamir AR, Shah M (2012) A dataset of 101 human action classes from videos in the wild. Center for Research in Computer Vision"},{"issue":"1","key":"190_CR82","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1186\/s40537-019-0212-5","volume":"6","author":"G Sreenu","year":"2019","unstructured":"Sreenu G, Durai MS (2019) Intelligent video surveillance: a review through deep learning techniques for crowd analysis. J Big Data 6(1):48","journal-title":"J Big Data"},{"key":"190_CR83","doi-asserted-by":"crossref","unstructured":"Sun C, Shetty S, Sukthankar R, Nevatia R (2015) Temporal localization of fine-grained actions in videos by domain transfer from web images. In: Proceedings of the 23rd ACM international conference on multimedia. ACM, pp 371\u2013380","DOI":"10.1145\/2733373.2806226"},{"key":"190_CR84","doi-asserted-by":"crossref","unstructured":"Sun D, Yang X, Liu MY, Kautz J (2018) PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8934\u20138943","DOI":"10.1109\/CVPR.2018.00931"},{"key":"190_CR85","doi-asserted-by":"crossref","unstructured":"Sun L, Jia K, Yeung DY, Shi BE (2015) Human action recognition using factorized spatio-temporal convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4597\u20134605","DOI":"10.1109\/ICCV.2015.522"},{"key":"190_CR86","doi-asserted-by":"crossref","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818\u20132826","DOI":"10.1109\/CVPR.2016.308"},{"key":"190_CR87","doi-asserted-by":"publisher","first-page":"270","DOI":"10.1007\/978-3-030-01424-7_27","volume-title":"Artificial Neural Networks and Machine Learning \u2013 ICANN 2018","author":"Chuanqi Tan","year":"2018","unstructured":"Tan C, Sun F, Kong T, Zhang W, Yang C, Liu C (2018) A survey on deep transfer learning. In: International conference on artificial neural networks. Springer, Cham, pp 270\u2013279"},{"key":"190_CR88","unstructured":"Tan M, Le QV (2019) EfficientNet: rethinking model scaling for convolutional neural networks. arXiv:1905.11946"},{"key":"190_CR89","unstructured":"Thakkar K, Narayanan PJ (2018) Part-based graph convolutional network for action recognition. arXiv:1809.04983"},{"key":"190_CR90","doi-asserted-by":"crossref","unstructured":"Tian Y, Pei K, Jana S, Ray B (2018) Deeptest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th international conference on software engineering. ACM, pp 303\u2013314","DOI":"10.1145\/3180155.3180220"},{"key":"190_CR91","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 4489\u20134497","DOI":"10.1109\/ICCV.2015.510"},{"issue":"2","key":"190_CR92","doi-asserted-by":"publisher","first-page":"283","DOI":"10.1007\/s10462-017-9545-7","volume":"50","author":"RK Tripathi","year":"2018","unstructured":"Tripathi RK, Jalal AS, Agrawal SC (2018) Suspicious human activity recognition: a review. Artif Intell Rev 50(2):283\u2013339","journal-title":"Artif Intell Rev"},{"key":"190_CR93","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1109\/ACCESS.2017.2778011","volume":"6","author":"A Ullah","year":"2017","unstructured":"Ullah A, Ahmad J, Muhammad K, Sajjad M, Baik SW (2017) Action recognition in video sequences using deep bi-directional LSTM with CNN features. IEEE Access 6:1155\u20131166","journal-title":"IEEE Access"},{"key":"190_CR94","doi-asserted-by":"crossref","unstructured":"Wang H, Kl\u00e4ser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. CVPR. In: IEEE conference on computer vision & pattern recognition, June 2011. Colorado Springs, United States, pp 3169\u20133176","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"190_CR95","doi-asserted-by":"crossref","unstructured":"Wang H, Schmid C (2013) Action recognition with improved trajectories. In: Proceedings of the IEEE international conference on computer vision, pp 3551\u20133558","DOI":"10.1109\/ICCV.2013.441"},{"key":"190_CR96","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1016\/j.patrec.2017.04.004","volume":"92","author":"L Wang","year":"2017","unstructured":"Wang L, Ge L, Li R, Fang Y (2017) Three-stream CNNs for action recognition. Pattern Recogn Lett 92:33\u201340","journal-title":"Pattern Recogn Lett"},{"issue":"3","key":"190_CR97","doi-asserted-by":"publisher","first-page":"585","DOI":"10.1016\/S0031-3203(02)00100-0","volume":"36","author":"L Wang","year":"2003","unstructured":"Wang L, Hu W, Tan T (2003) Recent developments in human motion analysis. Pattern Recogn 36(3):585\u2013601","journal-title":"Pattern Recogn"},{"key":"190_CR98","doi-asserted-by":"crossref","unstructured":"Wang L, Qiao Y, Tang X (2015) Action recognition with trajectory-pooled deep-convolutional descriptors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4305\u20134314","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"190_CR99","doi-asserted-by":"publisher","first-page":"118","DOI":"10.1016\/j.cviu.2018.04.007","volume":"171","author":"P Wang","year":"2018","unstructured":"Wang P, Li W, Ogunbona P, Wan J, Escalera S (2018) RGB-D-based human motion recognition with deep learning: a survey. Comput Vis Image Underst 171:118\u2013139","journal-title":"Comput Vis Image Underst"},{"key":"190_CR100","doi-asserted-by":"crossref","unstructured":"Wang T, Snoussi H (2012) Histograms of optical flow orientation for visual abnormal events detection. In: 2012 IEEE ninth international conference on advanced video and signal-based surveillance. IEEE, pp 13\u201318","DOI":"10.1109\/AVSS.2012.39"},{"key":"190_CR101","doi-asserted-by":"crossref","unstructured":"Wang Y, Long M, Wang J, Yu PS (2017) Spatiotemporal pyramid network for video action recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1529\u20131538","DOI":"10.1109\/CVPR.2017.226"},{"key":"190_CR102","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1016\/j.neucom.2018.01.076","volume":"287","author":"Z Wang","year":"2018","unstructured":"Wang Z, Ren J, Zhang D, Sun M, Jiang J (2018) A deep-learning based feature hybrid framework for spatiotemporal saliency detection inside videos. Neurocomputing 287:68\u201383","journal-title":"Neurocomputing"},{"key":"190_CR103","unstructured":"Weng X (2019) On the importance of video action recognition for visual lipreading. arXiv:1903.09616"},{"key":"190_CR104","unstructured":"Wu Z, Jiang YG, Wang J, Pu J, Xue X (2014) November. Exploring inter-feature and inter-class relationships with deep neural networks for video classification. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 167\u2013176"},{"key":"190_CR105","doi-asserted-by":"crossref","unstructured":"Wu Z, Wang X, Jiang YG, Ye H, Xue X (2015) Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In: Proceedings of the 23rd ACM international conference on multimedia. ACM, pp 461\u2013470","DOI":"10.1145\/2733373.2806222"},{"key":"190_CR106","unstructured":"Wu Z, Yao T, Fu Y, Jiang YG (2016) Deep learning for video classification and captioning. arXiv:1609.06782"},{"key":"190_CR107","doi-asserted-by":"crossref","unstructured":"Xu Z, Yang Y, Hauptmann AG (2015) A discriminative CNN video representation for event detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1798\u20131807","DOI":"10.1109\/CVPR.2015.7298789"},{"key":"190_CR108","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1155\/2016\/1760172","volume":"2016","author":"Li Yao","year":"2016","unstructured":"Yao L (2016) Extract the relational information of static features and motion features for human activities recognition in videos. Intell Neurosci 2016:3. https:\/\/doi.org\/10.1155\/2016\/1760172","journal-title":"Computational Intelligence and Neuroscience"},{"key":"190_CR109","doi-asserted-by":"crossref","unstructured":"Ye H, Wu Z, Zhao RW, Wang X, Jiang YG, Xue X (2015) Evaluating two-stream CNN for video classification. In: Proceedings of the 5th ACM on international conference on multimedia retrieval. ACM, pp 435\u2013442","DOI":"10.1145\/2671188.2749406"},{"key":"190_CR110","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1016\/j.patcog.2016.02.022","volume":"59","author":"Y Yuan","year":"2016","unstructured":"Yuan Y, Zheng X, Lu X (2016) A discriminative representation for human action recognition. Pattern Recogn 59:88\u201397","journal-title":"Pattern Recogn"},{"issue":"4","key":"190_CR111","first-page":"13","volume":"8","author":"M Zab\u0142ocki","year":"2014","unstructured":"Zab\u0142ocki M, Go\u015bciewska K, Frejlichowski D, Hofman R (2014) Intelligent video surveillance systems for public spaces\u2014a survey. J Theor Appl Comput Sci 8(4):13\u201327","journal-title":"J Theor Appl Comput Sci"},{"key":"190_CR112","doi-asserted-by":"crossref","unstructured":"Zhan F, Zhu H, Lu S (2019) Spatial fusion gan for image synthesis. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3653\u20133662","DOI":"10.1109\/CVPR.2019.00377"},{"key":"190_CR113","unstructured":"Zhang C, Vinyals O, Munos R, Bengio S (2018) A study on overfitting in deep reinforcement learning. arXiv:1804.06893"},{"key":"190_CR114","doi-asserted-by":"crossref","unstructured":"Zhang H, Liu D, Xiong Z (2019) Two-stream oriented video super-resolution for action recognition. arXiv:1903.05577","DOI":"10.1109\/ICCV.2019.00889"},{"issue":"1","key":"190_CR115","doi-asserted-by":"publisher","first-page":"56","DOI":"10.3390\/s19010056","volume":"19","author":"J Zhang","year":"2019","unstructured":"Zhang J, Feng Z, Su Y, Xing M, Xue W (2019) Riemannian spatio-temporal features of locomotion for individual recognition. Sensors 19(1):56","journal-title":"Sensors"},{"issue":"1","key":"190_CR116","doi-asserted-by":"publisher","first-page":"8","DOI":"10.3390\/a12010008","volume":"12","author":"W Zhang","year":"2019","unstructured":"Zhang W, Luo Y, Chen Z, Du Y, Zhu D, Liu P (2019) A robust visual tracking algorithm based on spatial-temporal context hierarchical response fusion. Algorithms 12(1):8","journal-title":"Algorithms"},{"key":"190_CR117","doi-asserted-by":"publisher","first-page":"9227","DOI":"10.1609\/aaai.v33i01.33019227","volume":"33","author":"Xiao-Yu Zhang","year":"2019","unstructured":"Zhang XY, Shi H, Li C, Zheng K, Zhu X, Duan L (2019) Learning transferable self-attentive representations for action recognition in untrimmed videos with weak supervision. In: Proceedings of the 33rd AAAI conference on artificial intelligence, pp 1\u20138","journal-title":"Proceedings of the AAAI Conference on Artificial Intelligence"},{"key":"190_CR118","doi-asserted-by":"crossref","unstructured":"Zhao R, Ali H, Van der Smagt P (2017) Two-stream RNN\/CNN for action recognition in 3D videos. In: 2017 IEEE\/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 4260\u20134267","DOI":"10.1109\/IROS.2017.8206288"},{"key":"190_CR119","doi-asserted-by":"crossref","unstructured":"Zhu AZ, Yuan L, Chaney K, Daniilidis K (2018) EV-FlowNet: self-supervised optical flow estimation for event-based cameras. arXiv:1802.06898","DOI":"10.15607\/RSS.2018.XIV.062"},{"key":"190_CR120","doi-asserted-by":"crossref","unstructured":"Zhu JY, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE international conference on computer vision, pp 2223\u20132232","DOI":"10.1109\/ICCV.2017.244"}],"container-title":["International Journal of Multimedia Information Retrieval"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s13735-019-00190-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1007\/s13735-019-00190-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1007\/s13735-019-00190-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,23]],"date-time":"2021-01-23T00:46:09Z","timestamp":1611362769000},"score":1,"resource":{"primary":{"URL":"http:\/\/link.springer.com\/10.1007\/s13735-019-00190-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,24]]},"references-count":120,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2020,6]]}},"alternative-id":["190"],"URL":"https:\/\/doi.org\/10.1007\/s13735-019-00190-x","relation":{},"ISSN":["2192-6611","2192-662X"],"issn-type":[{"value":"2192-6611","type":"print"},{"value":"2192-662X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,24]]},"assertion":[{"value":"20 May 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 December 2019","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 December 2019","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 January 2020","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}