{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:34:11Z","timestamp":1775579651743,"version":"3.50.1"},"reference-count":76,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,11,13]],"date-time":"2021-11-13T00:00:00Z","timestamp":1636761600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,11,13]],"date-time":"2021-11-13T00:00:00Z","timestamp":1636761600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years. A typical OAD system mainly consists of three modules: a frame-level feature extractor, which is usually based on pre-trained deep Convolutional Neural Networks (CNNs), a temporal modeling module, and an action classifier. Among them, the temporal modeling module is crucial, as it aggregates discriminative information from historical and current features. Though many temporal modeling methods have been developed for OAD and other topics, their effects have not been fairly investigated on OAD. This paper aims to provide an empirical study on temporal modeling for OAD covering four meta types of temporal modeling methods, <jats:italic>i<\/jats:italic>.<jats:italic>e<\/jats:italic>., temporal pooling, temporal convolution, recurrent neural networks, and temporal attention, and to uncover some good practices for producing a state-of-the-art OAD system. Many of these methods are explored in OAD for the first time and are extensively evaluated with various hyperparameters. Furthermore, based on our empirical study, we present several hybrid temporal modeling methods. Our best networks, <jats:italic>i<\/jats:italic>.<jats:italic>e<\/jats:italic>., the hybridization of DCC, LSTM, and M-NL, and the hybridization of DCC and M-NL, outperform previously published results by sizable margins on the THUMOS-14 dataset (48.6% vs. 47.2%) and the TVSeries dataset (84.3% vs. 83.7%).<\/jats:p>","DOI":"10.1007\/s40747-021-00534-3","type":"journal-article","created":{"date-parts":[[2021,11,13]],"date-time":"2021-11-13T02:02:11Z","timestamp":1636768931000},"page":"1803-1817","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["An empirical study on temporal modeling for online action detection"],"prefix":"10.1007","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3737-3201","authenticated-orcid":false,"given":"Wen","family":"Wang","sequence":"first","affiliation":[]},{"given":"Xiaojiang","family":"Peng","sequence":"additional","affiliation":[]},{"given":"Yu","family":"Qiao","sequence":"additional","affiliation":[]},{"given":"Jian","family":"Cheng","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,11,13]]},"reference":[{"key":"534_CR1","first-page":"280","volume":"1","author":"MS Aliakbarian","year":"2017","unstructured":"Aliakbarian MS, Saleh FS, Salzmann M, Fernando B, Petersson L, Andersson L (2017) Encouraging lstms to anticipate actions very early. ICCV 1:280\u2013289","journal-title":"ICCV"},{"key":"534_CR2","unstructured":"Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. 
arXiv preprint arXiv: 1409.0473"},{"key":"534_CR3","unstructured":"Bai S, Kolter JZ, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv: 1803.01271"},{"key":"534_CR4","first-page":"961","volume":"1","author":"F Caba","year":"2015","unstructured":"Caba F, Escorcia V, Ghanem B, Niebles JC (2015) Activitynet: A large-scale video benchmark for human activity understanding. CVPR 1:961\u2013970","journal-title":"CVPR"},{"key":"534_CR5","first-page":"6299","volume":"1","author":"J Carreira","year":"2017","unstructured":"Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. CVPR 1:6299\u20136308","journal-title":"CVPR"},{"key":"534_CR6","first-page":"1130","volume":"1","author":"YW Chao","year":"2018","unstructured":"Chao YW, Vijayanarasimhan S, Seybold B, Ross DA, Deng J, Sukthankar R (2018) Rethinking the faster r-cnn architecture for temporal action localization. CVPR 1:1130\u20131139","journal-title":"CVPR"},{"key":"534_CR7","doi-asserted-by":"crossref","unstructured":"Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259","DOI":"10.3115\/v1\/W14-4012"},{"key":"534_CR8","first-page":"933","volume":"70","author":"YN Dauphin","year":"2017","unstructured":"Dauphin YN, Fan A, Auli M, Grangier D (2017) Language modeling with gated convolutional networks. ICML 70:933\u2013941","journal-title":"ICML"},{"key":"534_CR9","first-page":"2067","volume":"1","author":"A Dave","year":"2017","unstructured":"Dave A, Russakovsky O, Ramanan D (2017) Predictive-corrective networks for action detection. CVPR 1:2067\u20132076","journal-title":"Predictive-corrective networks for action detection. 
CVPR"},{"key":"534_CR10","first-page":"269","volume":"9909","author":"R De Geest","year":"2016","unstructured":"De Geest R, Gavves E, Ghodrati A, Li Z, Snoek C, Tuytelaars T (2016) Online action detection. ECCV 9909:269\u2013284","journal-title":"Online action detection. ECCV"},{"key":"534_CR11","first-page":"1549","volume":"1","author":"R De Geest","year":"2018","unstructured":"De Geest R, Tuytelaars T (2018) Modeling temporal structure with lstm for online action detection. WACV 1:1549\u20131557","journal-title":"WACV"},{"key":"534_CR12","first-page":"2625","volume":"1","author":"J Donahue","year":"2015","unstructured":"Donahue J, Hendricks LA, Rohrbach M, Venugopalan S, Guadarrama S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. CVPR 1:2625\u20132634","journal-title":"CVPR"},{"key":"534_CR13","first-page":"4489","volume":"1","author":"T Du","year":"2015","unstructured":"Du T, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. ICCV 1:4489\u20134497","journal-title":"ICCV"},{"key":"534_CR14","first-page":"6450","volume":"1","author":"T Du","year":"2018","unstructured":"Du T, Wang H, Torresani L, Ray J, Lecun Y (2018) A closer look at spatiotemporal convolutions for action recognition. CVPR 1:6450\u20136459","journal-title":"CVPR"},{"key":"534_CR15","first-page":"806","volume":"1","author":"H Eun","year":"2020","unstructured":"Eun H, Moon J, Park J, Jung C, Kim C (2020) Learning to discriminate information for online action detection. CVPR 1:806\u2013815","journal-title":"CVPR"},{"key":"534_CR16","first-page":"2261","volume":"32","author":"Q Fan","year":"2019","unstructured":"Fan Q, Chen CFR, Kuehne H, Pistoia M, Cox D (2019) More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation. 
NIPS 32:2261\u20132270","journal-title":"NIPS"},{"key":"534_CR17","first-page":"1933","volume":"1","author":"C Feichtenhofer","year":"2016","unstructured":"Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. CVPR 1:1933\u20131941","journal-title":"CVPR"},{"key":"534_CR18","first-page":"177","volume":"1","author":"H Gammulle","year":"2017","unstructured":"Gammulle H, Denman S, Sridharan S, Fookes C (2017) Two stream lstm: A deep fusion framework for human action recognition. WACV 1:177\u2013186","journal-title":"WACV"},{"key":"534_CR19","first-page":"70","volume":"11206","author":"J Gao","year":"2018","unstructured":"Gao J, Chen K, Nevatia R (2018) Ctap: Complementary temporal action proposal generation. ECCV 11206:70\u201385","journal-title":"ECCV"},{"key":"534_CR20","first-page":"1","volume":"92","author":"J Gao","year":"2017","unstructured":"Gao J, Yang Z, Nevatia R (2017) RED: reinforced encoder-decoder networks for action anticipation. BMVC 92:1\u201311","journal-title":"BMVC"},{"key":"534_CR21","first-page":"5541","volume":"1","author":"M Gao","year":"2019","unstructured":"Gao M, Xu M, Davis LS, Socher R, Xiong C (2019) Startnet: Online detection of action start in untrimmed videos. ICCV 1:5541\u20135550","journal-title":"ICCV"},{"key":"534_CR22","first-page":"580","volume":"1","author":"RB Girshick","year":"2014","unstructured":"Girshick RB, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 1:580\u2013587","journal-title":"CVPR"},{"key":"534_CR23","doi-asserted-by":"crossref","unstructured":"Gkioxari G, Malik J (2014) Finding action tubes. 
arXiv preprint arXiv: 1411.6031","DOI":"10.1109\/CVPR.2015.7298676"},{"key":"534_CR24","first-page":"6047","volume":"1","author":"C Gu","year":"2018","unstructured":"Gu C, Sun C, Ross DA, Vondrick C, Pantofaru C, Li Y, Vijayanarasimhan S, Toderici G, Ricco S, Sukthankar R, Schmid C, Malik J (2018) AVA: A video dataset of spatio-temporally localized atomic visual actions. CVPR 1:6047\u20136056","journal-title":"CVPR"},{"key":"534_CR25","first-page":"6546","volume":"1","author":"K Hara","year":"2017","unstructured":"Hara K, Kataoka H, Satoh Y (2017) Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet? CVPR 1:6546\u20136555","journal-title":"CVPR"},{"key":"534_CR26","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv preprint arXiv: 1512.03385","DOI":"10.1109\/CVPR.2016.90"},{"key":"534_CR27","first-page":"2863","volume":"1","author":"M Hoai","year":"2012","unstructured":"Hoai M, Torre FDL (2012) Max-margin early event detectors. CVPR 1:2863\u20132870","journal-title":"Max-margin early event detectors. CVPR"},{"key":"534_CR28","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9:1735\u20131780","journal-title":"Neural Computation"},{"key":"534_CR29","first-page":"410","volume":"8691","author":"D Huang","year":"2014","unstructured":"Huang D, Yao S, Wang Y, De La Torre F (2014) Sequential max-margin event detectors. ECCV 8691:410\u2013424","journal-title":"Sequential max-margin event detectors. ECCV"},{"key":"534_CR30","unstructured":"Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. 
arXiv preprint arXiv:1502.03167"},{"key":"534_CR31","first-page":"3192","volume":"1","author":"H Jhuang","year":"2013","unstructured":"Jhuang H, Gall J, Zuffi S, Schmid C, Black MJ (2013) Towards understanding action recognition. ICCV 1:3192\u20133199","journal-title":"Towards understanding action recognition. ICCV"},{"key":"534_CR32","unstructured":"Jiang YG, Liu J, Roshan\u00a0Zamir A, Toderici G, Laptev I, Shah M, Sukthankar R (2014) THUMOS challenge: Action recognition with a large number of classes. http:\/\/crcv.ucf.edu\/THUMOS14\/"},{"key":"534_CR33","first-page":"5699","volume":"1","author":"A Kar","year":"2017","unstructured":"Kar A, Rai N, Sikka K, Sharma G (2017) Adascan: Adaptive scan pooling in deep convolutional neural networks for human action recognition in videos. CVPR 1:5699\u20135708","journal-title":"CVPR"},{"key":"534_CR34","first-page":"1725","volume":"1","author":"A Karpathy","year":"2014","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Li FF (2014) Large-scale video classification with convolutional neural networks. CVPR 1:1725\u20131732","journal-title":"CVPR"},{"key":"534_CR35","first-page":"9917","volume":"1","author":"Q Ke","year":"2019","unstructured":"Ke Q, Fritz M, Schiele B (2019) Time-conditioned action anticipation in one shot. CVPR 1:9917\u20139926","journal-title":"CVPR"},{"key":"534_CR36","doi-asserted-by":"publisher","first-page":"1775","DOI":"10.1109\/TPAMI.2014.2303090","volume":"36","author":"Y Kong","year":"2014","unstructured":"Kong Y, Jia Y, Fu Y (2014) Interactive phrases: Semantic descriptions for human interaction recognition. TPAMI 36:1775\u20131788","journal-title":"TPAMI"},{"key":"534_CR37","first-page":"1","volume":"1","author":"I Laptev","year":"2008","unstructured":"Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. 
CVPR 1:1\u20138","journal-title":"CVPR"},{"key":"534_CR38","doi-asserted-by":"crossref","unstructured":"Lea C, Flynn MD, Vidal R, Reiter A, Hager GD (2016) Temporal convolutional networks for action segmentation and detection. arXiv preprint arXiv: 1611.05267","DOI":"10.1109\/CVPR.2017.113"},{"key":"534_CR39","first-page":"3957","volume":"1","author":"J Li","year":"2019","unstructured":"Li J, Wang J, Tian Q, Gao W, Zhang S (2019) Global-local temporal representations for video person re-identification. ICCV 1:3957\u20133966","journal-title":"ICCV"},{"key":"534_CR40","first-page":"203","volume":"9911","author":"Y Li","year":"2016","unstructured":"Li Y, Lan C, Xing J, Zeng W, Yuan C, Liu J (2016) Online human action detection using joint classification-regression recurrent neural networks. ECCV 9911:203\u2013220","journal-title":"ECCV"},{"key":"534_CR41","doi-asserted-by":"crossref","unstructured":"Lin J, Gan C, Han, S (2018) Temporal shift module for efficient video understanding. arXiv preprint arXiv: 1811.08383","DOI":"10.1109\/ICCV.2019.00718"},{"key":"534_CR42","first-page":"3","volume":"11208","author":"T Lin","year":"2018","unstructured":"Lin T, Zhao X, Su H, Wang C, Yang M (2018) Bsn: Boundary sensitive network for temporal action proposal generation. ECCV 11208:3\u201321","journal-title":"ECCV"},{"key":"534_CR43","unstructured":"Lin Z, Feng M, Santos CNd, Yu M, Xiang B, Zhou B, Bengio Y (2017) A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130"},{"key":"534_CR44","first-page":"3671","volume":"1","author":"J Liu","year":"2017","unstructured":"Liu J, Wang G, Hu P, Duan LY, Kot AC (2017) Global context-aware attention lstm networks for 3d action recognition. CVPR 1:3671\u20133680","journal-title":"CVPR"},{"key":"534_CR45","first-page":"21","volume":"9905","author":"W Liu","year":"2016","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed SE, Fu C, Berg AC (2016) SSD: single shot multibox detector. 
ECCV 9905:21\u201337","journal-title":"ECCV"},{"key":"534_CR46","doi-asserted-by":"crossref","unstructured":"Ma, S., Sigal, L., Sclaroff, S.: Learning activity progression in lstms for activity detection and early detection. In: CVPR, vol.\u00a01, pp. 1942\u20131950 (2016)","DOI":"10.1109\/CVPR.2016.214"},{"key":"534_CR47","unstructured":"Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: ICML, pp. 807\u2013814"},{"key":"534_CR48","first-page":"4694","volume":"1","author":"YH Ng","year":"2015","unstructured":"Ng YH, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: Deep networks for video classification. CVPR 1:4694\u20134702","journal-title":"CVPR"},{"key":"534_CR49","unstructured":"Oord AVD, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) Wavenet: A generative model for raw audio. arXiv preprint arXiv: 1609.03499"},{"key":"534_CR50","first-page":"744","volume":"9908","author":"X Peng","year":"2016","unstructured":"Peng X, Schmid C (2016) Multi-region two-stream R-CNN for action detection. ECCV 9908:744\u2013759","journal-title":"ECCV"},{"key":"534_CR51","first-page":"5534","volume":"1","author":"Z Qiu","year":"2017","unstructured":"Qiu Z, Yao T, Tao M (2017) Learning spatio-temporal representation with pseudo-3d residual networks. ICCV 1:5534\u20135542","journal-title":"ICCV"},{"key":"534_CR52","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: Towards real-time object detection with region proposal networks. TPAMI 39:1137\u20131149","journal-title":"TPAMI"},{"key":"534_CR53","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1214\/aoms\/1177729586","volume":"22","author":"H Robbins","year":"1951","unstructured":"Robbins H, Monro S (1951) A stochastic approximation method. 
The annals of mathematical statistics 22:400\u2013407","journal-title":"The annals of mathematical statistics"},{"key":"534_CR54","first-page":"1036","volume":"1","author":"MS Ryoo","year":"2012","unstructured":"Ryoo MS (2012) Human activity prediction: Early recognition of ongoing activities from streaming videos. CVPR 1:1036\u20131043","journal-title":"CVPR"},{"key":"534_CR55","first-page":"1417","volume":"1","author":"Z Shou","year":"2017","unstructured":"Shou Z, Chan J, Zareian A, Miyazawa K, Chang SF (2017) Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. ICCV 1:1417\u20131426","journal-title":"ICCV"},{"key":"534_CR56","first-page":"534","volume":"11207","author":"Z Shou","year":"2018","unstructured":"Shou Z, Pan J, Chan J, Miyazawa K, Mansour H, Vetro A, Giro-I-Nieto X, Chang SF (2018) Online detection of action start in untrimmed, streaming videos. CVPR 11207:534\u2013551","journal-title":"CVPR"},{"key":"534_CR57","first-page":"1049","volume":"1","author":"Z Shou","year":"2016","unstructured":"Shou Z, Wang D, Chang SF (2016) Temporal action localization in untrimmed videos via multi-stage cnns. CVPR 1:1049\u20131058","journal-title":"CVPR"},{"key":"534_CR58","first-page":"568","volume":"1","author":"K Simonyan","year":"2014","unstructured":"Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. NIPS 1:568\u2013576","journal-title":"NIPS"},{"key":"534_CR59","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv: 1409.1556"},{"key":"534_CR60","first-page":"1961","volume":"1","author":"B Singh","year":"2016","unstructured":"Singh B, Marks TK, Jones M, Tuzel O, Ming S (2016) A multi-stream bi-directional recurrent neural network for fine-grained action detection. 
CVPR 1:1961\u20131970","journal-title":"CVPR"},{"key":"534_CR61","unstructured":"Singh G, Saha S, Sapienza M, Torr PH, Cuzzolin F. Online real-time multiple spatiotemporal action localisation and prediction. In: ICCV, vol.\u00a01"},{"key":"534_CR62","doi-asserted-by":"crossref","unstructured":"Soomro K, Idrees H, Shah M (2016) Predicting the where and what of actors and actions through online action localization. CVPR 1:2648\u20132657","DOI":"10.1109\/CVPR.2016.290"},{"key":"534_CR63","unstructured":"Soomro K, Zamir AR, Shah M (2012) Ucf101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402"},{"key":"534_CR64","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. arXiv preprint arXiv: 1706.03762"},{"key":"534_CR65","first-page":"3551","volume":"1","author":"H Wang","year":"2013","unstructured":"Wang H, Schmid C (2013) Action recognition with improved trajectories. ICCV 1:3551\u20133558","journal-title":"Action recognition with improved trajectories. ICCV"},{"key":"534_CR66","unstructured":"Wang L, Xiong Y, Wang Z, Qiao Y (2015) Towards good practices for very deep two-stream convnets. arXiv preprint arXiv:1507.02159"},{"key":"534_CR67","first-page":"7794","volume":"1","author":"X Wang","year":"2018","unstructured":"Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. CVPR 1:7794\u20137803","journal-title":"Non-local neural networks. CVPR"},{"key":"534_CR68","first-page":"284","volume":"1","author":"CY Wu","year":"2019","unstructured":"Wu CY, Feichtenhofer C, Fan H, He K, Krahenbuhl P, Girshick R (2019) Long-term feature banks for detailed video understanding. CVPR 1:284\u2013293","journal-title":"CVPR"},{"key":"534_CR69","unstructured":"Xie S, Sun C, Huang J, Tu Z, Murphy K (2017) Rethinking spatiotemporal feature learning for video understanding. 
arXiv preprint arXiv:1712.04851"},{"key":"534_CR70","unstructured":"Xiong Y, Wang L, Wang Z, Zhang B, Song H, Li W, Lin D, Qiao Y, Van\u00a0Gool L, Tang X (2016) Cuhk ethz siat submission to activitynet challenge 2016. arXiv preprint arXiv:1608.00797"},{"key":"534_CR71","first-page":"5794","volume":"1","author":"H Xu","year":"2017","unstructured":"Xu H, Das A, Saenko K (2017) R-c3d: Region convolutional 3d network for temporal activity detection. ICCV 1:5794\u20135803","journal-title":"ICCV"},{"key":"534_CR72","first-page":"5531","volume":"1","author":"M Xu","year":"2019","unstructured":"Xu M, Gao M, Chen YT, Davis LS, Crandall DJ (2019) Temporal recurrent networks for online action detection. ICCV 1:5531\u20135540","journal-title":"ICCV"},{"key":"534_CR73","first-page":"1480","volume":"1","author":"Z Yang","year":"2016","unstructured":"Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. NAACL 1:1480\u20131489","journal-title":"NAACL"},{"key":"534_CR74","doi-asserted-by":"publisher","first-page":"375","DOI":"10.1007\/s11263-017-1013-y","volume":"126","author":"S Yeung","year":"2018","unstructured":"Yeung S, Russakovsky O, Jin N, Andriluka M, Mori G, Fei-Fei L (2018) Every moment counts: Dense detailed labeling of actions in complex videos. IJCV 126:375\u2013389","journal-title":"IJCV"},{"key":"534_CR75","first-page":"2658","volume":"1","author":"C Yu","year":"2013","unstructured":"Yu C, Barrett D, Barbu A, Narayanaswamy S, Song W (2013) Recognize human activities from partially observed videos. CVPR 1:2658\u20132665","journal-title":"CVPR"},{"key":"534_CR76","first-page":"7093","volume":"1","author":"R Zeng","year":"2019","unstructured":"Zeng R, Huang W, Tan M, Rong Y, Zhao P, Huang J, Gan C (2019) Graph convolutional networks for temporal action localization. 
ICCV 1:7093\u20137102","journal-title":"ICCV"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00534-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-021-00534-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-021-00534-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,9]],"date-time":"2023-02-09T06:22:09Z","timestamp":1675923729000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-021-00534-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,13]]},"references-count":76,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["534"],"URL":"https:\/\/doi.org\/10.1007\/s40747-021-00534-3","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,13]]},"assertion":[{"value":"10 April 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 September 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 November 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding authors state that there is no conflict of 
interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}