{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,21]],"date-time":"2025-08-21T16:31:35Z","timestamp":1755793895372,"version":"3.37.3"},"reference-count":66,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T00:00:00Z","timestamp":1617062400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"},{"start":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T00:00:00Z","timestamp":1617062400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.springer.com\/tdm"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int. J. Mach. Learn. &amp; Cyber."],"published-print":{"date-parts":[[2021,8]]},"DOI":"10.1007\/s13042-021-01301-z","type":"journal-article","created":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T11:02:28Z","timestamp":1617102148000},"page":"2199-2211","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["A two-stage temporal proposal network for precise action localization in untrimmed video"],"prefix":"10.1007","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8296-8039","authenticated-orcid":false,"given":"Fei","family":"Wang","sequence":"first","affiliation":[]},{"given":"Guorui","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Yuxuan","family":"Du","sequence":"additional","affiliation":[]},{"given":"Zhenquan","family":"He","sequence":"additional","affiliation":[]},{"given":"Yong","family":"Jiang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,3,30]]},"reference":[{"key":"1301_CR1","doi-asserted-by":"publisher","unstructured":"Yeung S, Russakovsky O, Mori G, Fei-Fei L (2016) End-to-end learning of action detection from frame glimpses in videos. In: IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 2016, pp 2678\u20132687. https:\/\/doi.org\/10.1109\/CVPR.2016.293","DOI":"10.1109\/CVPR.2016.293"},{"issue":"5","key":"1301_CR2","doi-asserted-by":"publisher","first-page":"1423","DOI":"10.1109\/TCSVT.2018.2830102","volume":"29","author":"Z Tu","year":"2019","unstructured":"Tu Z, Xie W, Dauwels J, Li B, Yuan J (2019) Semantic cues enhanced multimodality multistream CNN for action recognition. IEEE Trans Circuits Syst Video Technol 29(5):1423\u20131437","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"1301_CR3","doi-asserted-by":"publisher","unstructured":"Liu K, Gao L, Khan NM, Qi L, Guan L (2020) A multi-stream graph convolutional networks-hidden conditional random field model for skeleton-based action recognition. In: IEEE Transactions on Multimedia, vol 23, 2021, pp 64\u201376. https:\/\/doi.org\/10.1109\/TMM.2020.2974323","DOI":"10.1109\/TMM.2020.2974323"},{"key":"1301_CR4","doi-asserted-by":"publisher","unstructured":"Lee I, Kim D, Lee S (2020) 3D human behavior understanding using generalized TS-LSTM networks. In: IEEE Transactions on Multimedia, vol 23, 2021, pp415\u2013428. https:\/\/doi.org\/10.1109\/TMM.2020.2978637","DOI":"10.1109\/TMM.2020.2978637"},{"key":"1301_CR5","doi-asserted-by":"publisher","unstructured":"Yang J, Liu W, Yuan J, Mei T (2020) Hierarchical soft quantization for skeleton-based human action recognition. In: IEEE Transactions on Multimedia, vol 23, 2021,  pp 883\u2013898. https:\/\/doi.org\/10.1109\/TMM.2020.2990082","DOI":"10.1109\/TMM.2020.2990082"},{"key":"1301_CR6","doi-asserted-by":"crossref","unstructured":"Escorcia V, Heilbron FC, Niebles JC, Ghanem B (2016) Daps: deep action proposals for action understanding. In: European conference on computer vision. Springer, Cham, pp 768\u2013784","DOI":"10.1007\/978-3-319-46487-9_47"},{"key":"1301_CR7","doi-asserted-by":"publisher","unstructured":"Shou Z, Wang D, Chang S (2016) Temporal action localization in untrimmed videos via multi-stage CNNs. In: IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 2016, pp 1049\u20131058. https:\/\/doi.org\/10.1109\/CVPR.2016.119","DOI":"10.1109\/CVPR.2016.119"},{"key":"1301_CR8","doi-asserted-by":"publisher","unstructured":"Gao J, Yang Z, Sun C, Chen K, Nevatia R (2017) Turn tap: temporal unit regression network for temporal action proposals. In: IEEE international conference on computer vision (ICCV), Venice, Italy, 2017, pp 3648\u20133656. https:\/\/doi.org\/10.1109\/ICCV.2017.392","DOI":"10.1109\/ICCV.2017.392"},{"key":"1301_CR9","doi-asserted-by":"publisher","unstructured":"Buch S, Escorcia V, Shen C, Ghanem B, Niebles JC (2017) SST: single-stream temporal action proposals. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, 2017, pp 6373\u20136382. https:\/\/doi.org\/10.1109\/CVPR.2017.675","DOI":"10.1109\/CVPR.2017.675"},{"issue":"2","key":"1301_CR10","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1109\/TMM.2019.2929923","volume":"22","author":"H Liu","year":"2020","unstructured":"Liu H, Wang S, Wang W, Cheng J (2020) Multi-scale based context-aware net for action detection. IEEE Trans Multimed 22(2):337\u2013348","journal-title":"IEEE Trans Multimed"},{"key":"1301_CR11","doi-asserted-by":"crossref","unstructured":"Gao J, Chen K, Nevatia R (2018) CTAP: complementary temporal action proposal generation. In: Proceedings of the European conference on computer vision (ECCV), pp 68\u201383","DOI":"10.1007\/978-3-030-01216-8_5"},{"key":"1301_CR12","doi-asserted-by":"crossref","unstructured":"Zhao Y, Xiong Y, Wang L, Wu Z, Tang X, Lin D (2017) Temporal action detection with structured segment networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2914\u20132923","DOI":"10.1109\/ICCV.2017.317"},{"issue":"8","key":"1301_CR13","doi-asserted-by":"publisher","first-page":"2650","DOI":"10.1109\/TCSVT.2019.2923712","volume":"30","author":"J Huang","year":"2020","unstructured":"Huang J, Li N, Li T, Liu S, Li G (2020) Spatial-temporal context-aware online action detection and prediction. IEEE Trans Circuits Syst Video Technol 30(8):2650\u20132662","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"1301_CR14","doi-asserted-by":"publisher","unstructured":"Gong G, Wang X, Mu Y, Tian Q (2020) Learning temporal co-attention models for unsupervised video action localization. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp 9816\u20139825. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00984","DOI":"10.1109\/CVPR42600.2020.00984"},{"key":"1301_CR15","doi-asserted-by":"publisher","unstructured":"Oneata D, Verbeek J, Schmid C (2013) Action and event recognition with fisher vectors on a compact feature set. In: IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 2013, pp 1817\u20131824. https:\/\/doi.org\/10.1109\/ICCV.2013.228","DOI":"10.1109\/ICCV.2013.228"},{"key":"1301_CR16","doi-asserted-by":"publisher","unstructured":"Oneata D, Verbeek J, Schmid C (2014) Efficient action localization with approximately normalized fisher vectors. In: IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp 2545\u20132552. https:\/\/doi.org\/10.1109\/CVPR.2014.326","DOI":"10.1109\/CVPR.2014.326"},{"key":"1301_CR17","doi-asserted-by":"publisher","unstructured":"Jain M, Gemert Jv, J\u00e9gou H, Bouthemy P, Snoek CGM (2014) In: IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp 740\u2013747. https:\/\/doi.org\/10.1109\/CVPR.2014.100","DOI":"10.1109\/CVPR.2014.100"},{"key":"1301_CR18","doi-asserted-by":"publisher","unstructured":"Tang K, Yao B, Fei-Fei L, Koller D (2013) Combining the right features for complex event recognition. In:  IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 2013, pp 2696\u20132703. https:\/\/doi.org\/10.1109\/ICCV.2013.335","DOI":"10.1109\/ICCV.2013.335"},{"key":"1301_CR19","unstructured":"Jiang YG, Liu J, Roshan Zamir A, Toderici G, Laptev I, Shah M, Sukthankar R (2014) Thumos challenge: action recognition with a large number of classes. http:\/\/crcv.ucf.edu\/THUMOS14\/"},{"key":"1301_CR20","doi-asserted-by":"publisher","unstructured":"Heilbron FC, Escorcia V, Ghanem B, Niebles JC (2015) ActivityNet: a large-scale video benchmark for human activity understanding. In:  IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 2015, pp 961\u2013970. https:\/\/doi.org\/10.1109\/CVPR.2015.7298698","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"1301_CR21","doi-asserted-by":"crossref","unstructured":"Sigurdsson GA, Varol G, Wang X, Farhadi A, Laptev I, Gupta A (2016) Hollywood in homes: crowd sourcing data collection for activity understanding. In: European Conference on Computer Vision, Springer, Cham, pp 510\u2013526","DOI":"10.1007\/978-3-319-46448-0_31"},{"key":"1301_CR22","doi-asserted-by":"publisher","unstructured":"Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van\u00a0Gool L (2016) Temporal segment networks: towards good practices for deep action recognition. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision\u2014ECCV 2016.  Lecture Notes in Computer Science, vol 9912. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-319-46484-8_2","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"1301_CR23","unstructured":"Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In:  Proceedings of the 27th international conference on neural information processing systems, vol 1. pp 568\u2013576"},{"key":"1301_CR24","doi-asserted-by":"publisher","unstructured":"Feichtenhofer C, Fan H, Malik J, He K (2018) Slowfast networks for video recognition. In: IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp 6201\u20136210. https:\/\/doi.org\/10.1109\/ICCV.2019.00630","DOI":"10.1109\/ICCV.2019.00630"},{"key":"1301_CR25","doi-asserted-by":"crossref","unstructured":"Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 4489\u20134497","DOI":"10.1109\/ICCV.2015.510"},{"key":"1301_CR26","doi-asserted-by":"publisher","unstructured":"Diba A, Sharma V, Van\u00a0Gool L, Stiefelhagen R (2019) Dynamonet: dynamic action and motion network. In: IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp 6191\u20136200. https:\/\/doi.org\/10.1109\/ICCV.2019.00629","DOI":"10.1109\/ICCV.2019.00629"},{"key":"1301_CR27","doi-asserted-by":"publisher","unstructured":"Girdhar R, Tran D, Torresani L, Ramanan D (2019) DistInit: learning video representations without a single labeled video. In: IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp 852\u2013861. https:\/\/doi.org\/10.1109\/ICCV.2019.00094","DOI":"10.1109\/ICCV.2019.00094"},{"issue":"10","key":"1301_CR28","doi-asserted-by":"publisher","first-page":"2504","DOI":"10.1109\/TMM.2019.2907060","volume":"21","author":"T Yu","year":"2019","unstructured":"Yu T, Wang L, Da C, Gu H, Xiang S, Pan C (2019) Weakly semantic guided action recognition. IEEE Trans Multimed 21(10):2504\u20132517","journal-title":"IEEE Trans Multimed"},{"key":"1301_CR29","doi-asserted-by":"publisher","unstructured":"Chen G, Zhang C, Zou Y (2020) AFNet: temporal locality-aware network with dual structure for accurate and fast action detection. In: IEEE Transactions on Multimedia. https:\/\/doi.org\/10.1109\/TMM.2020.3014555","DOI":"10.1109\/TMM.2020.3014555"},{"issue":"9","key":"1301_CR30","doi-asserted-by":"publisher","first-page":"2293","DOI":"10.1109\/TMM.2019.2953814","volume":"22","author":"H Wu","year":"2020","unstructured":"Wu H, Ma X, Li Y (2020) Convolutional networks with channel and STIPs attention model for action recognition in videos. IEEE Trans Multimed 22(9):2293\u20132306","journal-title":"IEEE Trans Multimed"},{"key":"1301_CR31","doi-asserted-by":"publisher","unstructured":"Zhang T, Zheng W, Cui Z, Zong Y, Li C, Zhou X, Yang J (2020) Deep manifold-to-manifold transforming network for skeleton-based action recognition. In: IEEE Transactions on Multimedia, vol 22(11), pp 2926\u20132937. https:\/\/doi.org\/10.1109\/TMM.2020.2966878","DOI":"10.1109\/TMM.2020.2966878"},{"key":"1301_CR32","doi-asserted-by":"publisher","unstructured":"Y. An, Y. Wang, Z. Li, Q. Yang, Yu (2019) PA3D: pose-action 3D machine for video recognition. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019, pp 7922\u20137931. https:\/\/doi.org\/10.1109\/CVPR.2019.00811","DOI":"10.1109\/CVPR.2019.00811"},{"key":"1301_CR33","doi-asserted-by":"publisher","unstructured":"Choutas V, Weinzaepfel P, Revaud J, Schmid C (2018) Potion: pose motion representation for action recognition. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp 7024\u20137033. https:\/\/doi.org\/10.1109\/CVPR.2018.00734","DOI":"10.1109\/CVPR.2018.00734"},{"key":"1301_CR34","doi-asserted-by":"crossref","unstructured":"Marcon M, Paracchini MBM, Tubaro S (2019) A framework for interpreting, modeling and recognizing human body gestures through 3D eigenpostures. Int J Mach Learn Cybern 10(5):1205\u20131226","DOI":"10.1007\/s13042-018-0801-1"},{"key":"1301_CR35","doi-asserted-by":"crossref","unstructured":"Zhang S, Callaghan V (2021) Real-time human posture recognition using an adaptive hybrid classifier. Int J Mach Learn Cybern 12(2):489\u2013499","DOI":"10.1007\/s13042-020-01182-8"},{"issue":"2","key":"1301_CR36","doi-asserted-by":"publisher","first-page":"154","DOI":"10.1007\/s11263-013-0620-5","volume":"104","author":"JRR Uijlings","year":"2013","unstructured":"Uijlings JRR, Sande KEAVD, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154\u2013171","journal-title":"Int J Comput Vis"},{"key":"1301_CR37","doi-asserted-by":"crossref","unstructured":"Zitnick,CL, Dollar P (2014) Edge boxes: locating object proposals from edges. In:  European conference on computer vision, Springer, Cham, pp 391\u2013405","DOI":"10.1007\/978-3-319-10602-1_26"},{"key":"1301_CR38","doi-asserted-by":"publisher","unstructured":"Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014,  pp 580\u2013587. https:\/\/doi.org\/10.1109\/CVPR.2014.81","DOI":"10.1109\/CVPR.2014.81"},{"issue":"6","key":"1301_CR39","doi-asserted-by":"publisher","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","volume":"39","author":"S Ren","year":"2017","unstructured":"Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137\u20131149","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1301_CR40","doi-asserted-by":"publisher","unstructured":"Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 2016, pp 779\u2013788. https:\/\/doi.org\/10.1109\/CVPR.2016.91","DOI":"10.1109\/CVPR.2016.91"},{"key":"1301_CR41","doi-asserted-by":"crossref","unstructured":"Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision,  Springer, Cham pp 21\u201337","DOI":"10.1007\/978-3-319-46448-0_2"},{"issue":"4","key":"1301_CR42","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1109\/TCSVT.2015.2424054","volume":"26","author":"S Cho","year":"2016","unstructured":"Cho S, Byun H (2016) A space-time graph optimization approach based on maximum cliques for action detection. IEEE Trans Circuits Syst Video Technol 26(4):661\u2013672","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"1301_CR43","doi-asserted-by":"crossref","unstructured":"Gao J, Yang Z, Nevatia R (2017) Cascaded boundary regression for temporal action detection. arXiv:1705.01180","DOI":"10.5244\/C.31.52"},{"key":"1301_CR44","doi-asserted-by":"publisher","unstructured":"Buch S, Escorcia V, Shen C, Ghanem B, Niebles JC (2017) SST: single-stream temporal action proposals. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, 2017, pp 6373\u20136382. https:\/\/doi.org\/10.1109\/CVPR.2017.675","DOI":"10.1109\/CVPR.2017.675"},{"key":"1301_CR45","unstructured":"Lin T, Zhao X, Shou Z (2017) Temporal convolution based action proposal: submission to activitynet 2017. arxiv.org\/abs\/1707.06750"},{"key":"1301_CR46","doi-asserted-by":"publisher","unstructured":"Xu H, Das A, Saenko K (2017) R-C3D: region convolutional 3d network for temporal activity detection. In: IEEE international conference on computer vision (ICCV), Venice, Italy,  2017, pp 5794\u20135803. https:\/\/doi.org\/10.1109\/ICCV.2017.617","DOI":"10.1109\/ICCV.2017.617"},{"key":"1301_CR47","doi-asserted-by":"publisher","unstructured":"Xu M, Zhao C, Rojas DS, Thabet A, Ghanem B (2020) G-TAD: sub-graph localization for temporal action detection. In: IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2020, pp 10153\u201310162. https:\/\/doi.org\/10.1109\/CVPR42600.2020.01017","DOI":"10.1109\/CVPR42600.2020.01017"},{"key":"1301_CR48","doi-asserted-by":"publisher","unstructured":"Fan L, Huang W, Gan C, Ermon S, Gong B, Huang J (2018) End-to-end learning of motion representation for video understanding. In: IEEE\/CVF conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 2018, pp 6016\u20136025. https:\/\/doi.org\/10.1109\/CVPR.2018.00630","DOI":"10.1109\/CVPR.2018.00630"},{"issue":"1","key":"1301_CR49","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1109\/TPAMI.2016.2537320","volume":"39","author":"J Pont-Tuset","year":"2017","unstructured":"Pont-Tuset J, Arbel\u00e1ez P, Barron JT, Marques F, Malik J (2017) Multiscale combinatorial grouping for image segmentation and object proposal generation. IEEE Trans Pattern Anal Mach Intell 39(1):128\u2013140","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"1301_CR50","doi-asserted-by":"publisher","unstructured":"Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: IEEE computer society conference on computer vision and pattern recognition (CVPR\u201906), New York, NY, USA, pp 2169\u20132178. https:\/\/doi.org\/10.1109\/CVPR.2006.68","DOI":"10.1109\/CVPR.2006.68"},{"key":"1301_CR51","doi-asserted-by":"publisher","unstructured":"Heilbron FC, Niebles JC, Ghanem B (2016) Fast temporal activity proposals for efficient detection of human actions in untrimmed videos. In: IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA,  2016, pp 1914\u20131923. https:\/\/doi.org\/10.1109\/CVPR.2016.211","DOI":"10.1109\/CVPR.2016.211"},{"key":"1301_CR52","doi-asserted-by":"crossref","unstructured":"Lin T, Zhao X, Su H, Wang C, Yang M (2018) BSN: boundary sensitive network for temporal action proposal generation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp 3\u201319","DOI":"10.1007\/978-3-030-01225-0_1"},{"key":"1301_CR53","doi-asserted-by":"publisher","unstructured":"Yuan J, Ni B, Yang X, Kassim AA (2016) Temporal action localization with pyramid of score distribution features. In: IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA,  2016, pp 3093\u20133102. https:\/\/doi.org\/10.1109\/CVPR.2016.337","DOI":"10.1109\/CVPR.2016.337"},{"key":"1301_CR54","doi-asserted-by":"publisher","unstructured":"Yu T, Ren Z, Li Y, Yan E, Xu N, Yuan J (2019) Temporal structure mining for weakly supervised action detection. In: IEEE\/CVF international conference on computer vision (ICCV), Seoul, Korea (South),  2019, pp 5521\u20135530. https:\/\/doi.org\/10.1109\/ICCV.2019.00562","DOI":"10.1109\/ICCV.2019.00562"},{"key":"1301_CR55","doi-asserted-by":"publisher","unstructured":"Narayan S, Cholakkal H, Khan FS, Shao L (2019) 3C-Net: Category count and center loss for weakly-supervised action localization. In: IEEE\/CVF international conference on computer vision (ICCV), Seoul, Korea (South), 2019, pp 8678\u20138686. https:\/\/doi.org\/10.1109\/ICCV.2019.00877","DOI":"10.1109\/ICCV.2019.00877"},{"key":"1301_CR56","doi-asserted-by":"publisher","unstructured":"Nguyen PX, Ramanan D, Fowlkes CC (2019) Weakly-supervised action localization with background modeling. In: IEEE\/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019, pp 5501\u20135510. https:\/\/doi.org\/10.1109\/ICCV.2019.00560","DOI":"10.1109\/ICCV.2019.00560"},{"issue":"1","key":"1301_CR57","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1109\/TCSVT.2018.2887061","volume":"30","author":"J Wang","year":"2020","unstructured":"Wang J, Wang W, Gao W (2020) Fast and accurate action detection in videos with motion-centric attention model. IEEE Trans Circuits Syst Video Technol 30(1):117\u2013130","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"1301_CR58","doi-asserted-by":"publisher","unstructured":"Shi B, Dai Q, Mu Y, Wang J (2020) Weakly-supervised action localization by generative attention modeling. In: IEEE\/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA,  2020, pp 1006\u20131016. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00109","DOI":"10.1109\/CVPR42600.2020.00109"},{"key":"1301_CR59","doi-asserted-by":"publisher","unstructured":"Liu Z, Wang L, Zhang Q, Gao Z, Niu Z, Zheng N, Hua G (2019) Weakly supervised temporal action localization through contrast based evaluation networks. In: IEEE\/CVF international conference on computer vision (ICCV), Seoul, Korea (South), 2019, pp 3898\u20133907. https:\/\/doi.org\/10.1109\/ICCV.2019.00400","DOI":"10.1109\/ICCV.2019.00400"},{"key":"1301_CR60","doi-asserted-by":"publisher","unstructured":"Shou Z, Chan J, Zareian A, Miyazawa K, Chang S (2017) CDC: convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA,  2017, pp 1417\u20131426. https:\/\/doi.org\/10.1109\/CVPR.2017.155","DOI":"10.1109\/CVPR.2017.155"},{"key":"1301_CR61","doi-asserted-by":"publisher","unstructured":"Dai X, Singh B, Zhang G, Davis LS, Chen YQ (2017) Temporal context network for activity localization in videos. In: IEEE international conference on computer vision (ICCV), Venice, Italy, 2017, pp 5727\u20135736. https:\/\/doi.org\/10.1109\/ICCV.2017.610","DOI":"10.1109\/ICCV.2017.610"},{"key":"1301_CR62","doi-asserted-by":"publisher","unstructured":"Heilbron FC, Barrios W, Escorcia V, Ghanem B (2017) SCC: semantic context cascade for efficient action detection. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA,  2017, pp 3175\u20133184. https:\/\/doi.org\/10.1109\/CVPR.2017.338","DOI":"10.1109\/CVPR.2017.338"},{"key":"1301_CR63","doi-asserted-by":"publisher","unstructured":"Zeng R, Huang W, Gan C, Tan M, Rong Y, Zhao P, Huang J (2019) Graph convolutional networks for temporal action localization. In: IEEE\/CVF international conference on computer vision (ICCV), Seoul, Korea (South),  2019, pp 7093\u20137102. https:\/\/doi.org\/10.1109\/ICCV.2019.00719","DOI":"10.1109\/ICCV.2019.00719"},{"key":"1301_CR64","doi-asserted-by":"publisher","unstructured":"Lin T, Liu X, Li X, Ding E, Wen S (2019) BMN: boundary-matching network for temporal action proposal generation. In: IEEE\/CVF international conference on computer vision (ICCV), Seoul, Korea (South), 2019, pp 3888\u20133897. https:\/\/doi.org\/10.1109\/ICCV.2019.00399","DOI":"10.1109\/ICCV.2019.00399"},{"key":"1301_CR65","doi-asserted-by":"publisher","unstructured":"Sigurdsson GA, Divvala S, Farhadi A, Gupta A (2017) Asynchronous temporal fields for action recognition. In: IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, 2017, pp 5650\u20135659. https:\/\/doi.org\/10.1109\/CVPR.2017.599","DOI":"10.1109\/CVPR.2017.599"},{"key":"1301_CR66","doi-asserted-by":"publisher","unstructured":"Piergiovanni A, Ryoo MS (2018) Learning latent super-events to detect multiple activities in videos. In: IEEE\/CVF conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 2018, pp 5304\u20135313. https:\/\/doi.org\/10.1109\/CVPR.2018.00556","DOI":"10.1109\/CVPR.2018.00556"}],"container-title":["International Journal of Machine Learning and Cybernetics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-021-01301-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s13042-021-01301-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s13042-021-01301-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,13]],"date-time":"2021-07-13T02:30:09Z","timestamp":1626143409000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s13042-021-01301-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,3,30]]},"references-count":66,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2021,8]]}},"alternative-id":["1301"],"URL":"https:\/\/doi.org\/10.1007\/s13042-021-01301-z","relation":{},"ISSN":["1868-8071","1868-808X"],"issn-type":[{"type":"print","value":"1868-8071"},{"type":"electronic","value":"1868-808X"}],"subject":[],"published":{"date-parts":[[2021,3,30]]},"assertion":[{"value":"17 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"8 March 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"30 March 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}