{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T18:16:46Z","timestamp":1780424206299,"version":"3.54.1"},"reference-count":58,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,3,6]],"date-time":"2024-03-06T00:00:00Z","timestamp":1709683200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,6]],"date-time":"2024-03-06T00:00:00Z","timestamp":1709683200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61971007"],"award-info":[{"award-number":["61971007"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Reinforcement of motor features is necessary in action recognition tasks. In this work, we propose an efficient feature reinforcement model, termed as Feature Selection and Enhancement Networks (FEASE-Net). 
The core of our FEASE-Net is the use of the FEASE module to adaptively capture input features at multiple scales and reinforce them globally. The FEASE module is composed of two sub-modules, Feature Selection (FS) and Feature Enhancement (FE). The FS focuses on adaptive attention and selection of input features through a multi-scale structure with an attention mechanism, and FE employs channel attention to enhance the global useful feature information. To assess the effectiveness of FEASE-Net, we undertake a series of extensive experiments on two benchmark datasets, namely Kinetics 400 and Something-Something V2. Our proposed FEASE-Net can achieve a competitive performance compared with previous state-of-the-art methods that use similar backbones.<\/jats:p>","DOI":"10.1007\/s11063-024-11547-7","type":"journal-article","created":{"date-parts":[[2024,3,6]],"date-time":"2024-03-06T13:02:21Z","timestamp":1709730141000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["FEASE: Feature Selection and Enhancement Networks for Action Recognition"],"prefix":"10.1007","volume":"56","author":[{"given":"Lu","family":"Zhou","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yuanyao","family":"Lu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haiyang","family":"Jiang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,3,6]]},"reference":[{"key":"11547_CR1","first-page":"6201","volume":"2019","author":"C Feichtenhofer","year":"2018","unstructured":"Feichtenhofer C, Fan H, Malik J, He K (2018) Slowfast networks for video recognition. 
IEEE\/CVF Int Conf Comput Vision (ICCV) 2019:6201\u20136210","journal-title":"IEEE\/CVF Int Conf Comput Vision (ICCV)"},{"key":"11547_CR2","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s11063-020-10248-1","volume":"52","author":"M Koohzadi","year":"2020","unstructured":"Koohzadi M, Charkari NM (2020) A context based deep temporal embedding network in action recognition. Neural Process Lett 52:187\u2013220","journal-title":"Neural Process Lett"},{"key":"11547_CR3","doi-asserted-by":"publisher","DOI":"10.1007\/s11063-022-11138-4","author":"B Li","year":"2023","unstructured":"Li B, Pan Y-T, Liu R, Zhu Y (2023) Separately guided context-aware network for weakly supervised temporal action detection. Neural Process Lett. https:\/\/doi.org\/10.1007\/s11063-022-11138-4","journal-title":"Neural Process Lett"},{"key":"11547_CR4","first-page":"1725","volume":"2014","author":"A Karpathy","year":"2014","unstructured":"Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. IEEE Conf Comput Visi Patt Recognit 2014:1725\u20131732","journal-title":"IEEE Conf Comput Visi Patt Recognit"},{"key":"11547_CR5","first-page":"7445","volume":"2017","author":"C Feichtenhofer","year":"2017","unstructured":"Feichtenhofer C, Pinz A, Wildes RP (2017) Spatiotemporal multiplier networks for video action recognition. IEEE Conf Comput Vision Patt Recognit (CVPR) 2017:7445\u20137454","journal-title":"IEEE Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR6","doi-asserted-by":"crossref","unstructured":"Zhou B, Andonian A, Torralba A (2017) Temporal relational reasoning in videos. 
In: European conference on computer vision","DOI":"10.1007\/978-3-030-01246-5_49"},{"key":"11547_CR7","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1007\/s11063-019-10091-z","volume":"51","author":"Z Liao","year":"2019","unstructured":"Liao Z, Hu H, Liu Y (2019) Action recognition with multiple relative descriptors of trajectories. Neural Process Lett 51:287\u2013302","journal-title":"Neural Process Lett"},{"key":"11547_CR8","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1007\/s11063-018-9932-3","volume":"50","author":"H Hu","year":"2018","unstructured":"Hu H, Liao Z, Xiao X (2018) Action recognition using multiple pooling strategies of CNN features. Neural Process Lett 50:379\u2013396","journal-title":"Neural Process Lett"},{"key":"11547_CR9","doi-asserted-by":"crossref","first-page":"1501","DOI":"10.1007\/s11063-020-10320-w","volume":"52","author":"Z Gao","year":"2020","unstructured":"Gao Z, Wang P, Wang H, Xu M, Li W (2020) A review of dynamic maps for 3d human motion recognition using convnets and its improvement. Neural Process Lett 52:1501\u20131515","journal-title":"Neural Process Lett"},{"key":"11547_CR10","first-page":"4724","volume":"2017","author":"J Carreira","year":"2017","unstructured":"Carreira J, Zisserman A (2017) Quo vadis, action recognition? a new model and the kinetics dataset. IEEE Conf Comput Vision Patt Recognit (CVPR) 2017:4724\u20134733","journal-title":"IEEE Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR11","first-page":"4489","volume":"2015","author":"D Tran","year":"2015","unstructured":"Tran D, Bourdev LD, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3d convolutional networks. 
IEEE Int Conf Comput Vision (ICCV) 2015:4489\u20134497","journal-title":"IEEE Int Conf Comput Vision (ICCV)"},{"key":"11547_CR12","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1109\/TPAMI.2012.59","volume":"35","author":"S Ji","year":"2010","unstructured":"Ji S, Xu W, Yang M, Yu K (2010) 3d convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35:221\u2013231","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11547_CR13","unstructured":"Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. arXiv:1406.2199"},{"key":"11547_CR14","first-page":"12018","volume":"2019","author":"L Shi","year":"2019","unstructured":"Shi L, Zhang Y, Cheng J, Lu H (2019) Two-stream adaptive graph convolutional networks for skeleton-based action recognition. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2019:12018\u201312027","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR15","first-page":"2000","volume":"2019","author":"B Jiang","year":"2019","unstructured":"Jiang B, Wang M, Gan W, Wu W, Yan J (2019) Stm: Spatiotemporal and motion encoding for action recognition. IEEE\/CVF Int Conf Comput Vision (ICCV) 2019:2000\u20132009","journal-title":"IEEE\/CVF Int Conf Comput Vision (ICCV)"},{"key":"11547_CR16","first-page":"1089","volume":"2020","author":"X Li","year":"2020","unstructured":"Li X, Wang Y, Zhou Z, Qiao Y (2020) Smallbignet: integrating core and contextual views for video classification. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2020:1089\u20131098","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR17","first-page":"13688","volume":"2021","author":"Z Liu","year":"2021","unstructured":"Liu Z, Wang L, Wu W, Qian C, Lu T (2021) Tam: Temporal adaptive module for video recognition. 
IEEE\/CVF Int Conf Comput Vision (ICCV) 2021:13688\u201313698","journal-title":"IEEE\/CVF Int Conf Comput Vision (ICCV)"},{"key":"11547_CR18","doi-asserted-by":"crossref","unstructured":"Liu Z, Luo D, Wang Y, Wang L, Tai Y, Wang C, Li J, Huang F, Lu T (2019) Teinet: towards an efficient architecture for video recognition. arXiv:1911.09435","DOI":"10.1609\/aaai.v34i07.6836"},{"key":"11547_CR19","first-page":"7082","volume":"2019","author":"J Lin","year":"2019","unstructured":"Lin J, Gan C, Han S (2019) Tsm: temporal shift module for efficient video understanding. IEEE\/CVF Int Conf Comput Vision (ICCV) 2019:7082\u20137092","journal-title":"IEEE\/CVF Int Conf Comput Vision (ICCV)"},{"key":"11547_CR20","unstructured":"Chung J, Wu Y, Russakovsky O (2022) Enabling detailed action recognition evaluation through video dataset augmentation. In: Neural information processing systems. https:\/\/api.semanticscholar.org\/CorpusID:252494043"},{"key":"11547_CR21","doi-asserted-by":"crossref","first-page":"1045","DOI":"10.1109\/TPAMI.2017.2691321","volume":"40","author":"A Shahroudy","year":"2016","unstructured":"Shahroudy A, Ng T-T, Gong Y, Wang G (2016) Deep multimodal feature analysis for action recognition in rgb+d videos. IEEE Trans Pattern Anal Mach Intell 40:1045\u20131058","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11547_CR22","first-page":"10627","volume":"2023","author":"M Zhao","year":"2023","unstructured":"Zhao M, Yu Wang X, Yang L, Niu D (2023) Search-map-search: a frame selection paradigm for action recognition. IEEE\/CVF conference on computer vision and pattern recognition (CVPR) 2023:10627\u201310636","journal-title":"IEEE\/CVF conference on computer vision and pattern recognition (CVPR)"},{"key":"11547_CR23","doi-asserted-by":"crossref","unstructured":"Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Gool LV (2016) Temporal segment networks: towards good practices for deep action recognition. 
In: European conference on computer vision","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"11547_CR24","first-page":"1895","volume":"2021","author":"L Wang","year":"2021","unstructured":"Wang L, Tong Z, Ji B, Wu G (2021) Tdn: temporal difference networks for efficient action recognition. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2021:1895\u20131904","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR25","first-page":"770","volume":"2016","author":"K He","year":"2016","unstructured":"He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. IEEE Conf Comput Vision Patt Recognit (CVPR) 2016:770\u2013778","journal-title":"IEEE Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR26","unstructured":"Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev A, Suleyman M, Zisserman A (2017) The kinetics human action video dataset. arXiv:1705.06950"},{"key":"11547_CR27","first-page":"5843","volume":"2017","author":"R Goyal","year":"2017","unstructured":"Goyal R, Kahou SE, Michalski V, Materzynska J, Westphal S, Kim H, Haenel V, Fr\u00fcnd I, Yianilos PN, Mueller-Freitag M, Hoppe F, Thurau C, Bax I, Memisevic R (2017) The \u201csomething something\" video database for learning and evaluating visual common sense. IEEE Int Conf Comput Vision (ICCV) 2017:5843\u20135851","journal-title":"IEEE Int Conf Comput Vision (ICCV)"},{"issue":"9","key":"11547_CR28","doi-asserted-by":"publisher","first-page":"5174","DOI":"10.1109\/TCSVT.2023.3250646","volume":"33","author":"Z Li","year":"2023","unstructured":"Li Z, Li J, Ma Y, Wang R, Shi Z, Ding Y, Liu X (2023) Spatio-temporal adaptive network with bidirectional temporal difference for action recognition. IEEE Trans Circuits Syst Video Technol 33(9):5174\u20135185. 
https:\/\/doi.org\/10.1109\/TCSVT.2023.3250646","journal-title":"IEEE Trans Circuits Syst Video Technol"},{"key":"11547_CR29","first-page":"906","volume":"2020","author":"Y Li","year":"2020","unstructured":"Li Y, Ji B, Shi X, Zhang J, Kang B, Wang L (2020) Tea: Temporal excitation and aggregation for action recognition. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2020:906\u2013915","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR30","first-page":"6566","volume":"2018","author":"Y Zhao","year":"2018","unstructured":"Zhao Y, Xiong Y, Lin D (2018) Recognize actions by disentangling components of dynamics. IEEE\/CVF Conf Comput Vision Patt Recognit 2018:6566\u20136575","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit"},{"key":"11547_CR31","first-page":"1587","volume":"2018","author":"JY-H Ng","year":"2018","unstructured":"Ng JY-H, Davis LS (2018) Temporal difference networks for video action recognition. IEEE Winter Conf Appl Comput Vision (WACV) 2018:1587\u20131596","journal-title":"IEEE Winter Conf Appl Comput Vision (WACV)"},{"key":"11547_CR32","first-page":"4651","volume":"2016","author":"Q You","year":"2016","unstructured":"You Q, Jin H, Wang Z, Fang C, Luo J (2016) Image captioning with semantic attention. IEEE Conf Comput Vision Patt Recognit (CVPR) 2016:4651\u20134659","journal-title":"IEEE Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR33","doi-asserted-by":"crossref","first-page":"6171","DOI":"10.1523\/JNEUROSCI.14-10-06171.1994","volume":"14","author":"B Olshausen","year":"1994","unstructured":"Olshausen B, Anderson C, Essen D (1994) A neurobiological model of visual attention and invariant pattern recognition based task. 
J Neurosci 14:6171\u20136186","journal-title":"J Neurosci"},{"key":"11547_CR34","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1038\/35058500","volume":"2","author":"L Itti","year":"2001","unstructured":"Itti L, Koch C (2001) Computational modelling of visual attention. Nat Rev Neurosci 2:194\u2013203","journal-title":"Nat Rev Neurosci"},{"key":"11547_CR35","doi-asserted-by":"crossref","first-page":"1254","DOI":"10.1109\/34.730558","volume":"20","author":"L Itti","year":"1998","unstructured":"Itti L, Koch C, Niebur E (1998) A model of saliency-based visual attention for rapid scene analysis. IEEE Trans Pattern Anal Mach Intell 20:1254\u20131259","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11547_CR36","first-page":"1243","volume":"1","author":"H Larochelle","year":"2010","unstructured":"Larochelle H, Hinton G (2010) Learning to combine foveal glimpses with a third-order Boltzmann machine. Adv Neural Inform Process syst 1:1243\u20131251","journal-title":"Adv Neural Inform Process syst"},{"key":"11547_CR37","unstructured":"Mnih V, Heess NMO, Graves A, Kavukcuoglu K (2014) Recurrent models of visual attention. In: NIPS"},{"issue":"11","key":"11547_CR38","doi-asserted-by":"crossref","first-page":"4700","DOI":"10.1523\/JNEUROSCI.13-11-04700.1993","volume":"13","author":"B Olshausen","year":"1993","unstructured":"Olshausen B, Anderson CH, Essen DC (1993) A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13(11):4700\u20134719","journal-title":"J Neurosci"},{"key":"11547_CR39","doi-asserted-by":"crossref","first-page":"2853","DOI":"10.1007\/s11063-021-10523-9","volume":"53","author":"X Wang","year":"2021","unstructured":"Wang X, Tong J, Wang R (2021) Attention refined network for human pose estimation. 
Neural Process Lett 53:2853\u20132872","journal-title":"Neural Process Lett"},{"key":"11547_CR40","doi-asserted-by":"crossref","first-page":"313","DOI":"10.1007\/s11063-017-9721-4","volume":"48","author":"Y Peng","year":"2018","unstructured":"Peng Y, Li L, Liu S, Lei T, Wu J (2018) A new virtual samples-based CRC method for face recognition. Neural Process Lett 48:313\u2013327","journal-title":"Neural Process Lett"},{"key":"11547_CR41","doi-asserted-by":"crossref","first-page":"2011","DOI":"10.1109\/TPAMI.2019.2913372","volume":"42","author":"J Hu","year":"2017","unstructured":"Hu J, Shen L, Albanie S, Sun G, Wu E (2017) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42:2011\u20132023","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11547_CR42","first-page":"13209","volume":"2021","author":"Z Wang","year":"2021","unstructured":"Wang Z, She Q, Smolic A (2021) Action-net: multipath excitation for action recognition. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2021:13209\u201313218","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"issue":"9","key":"11547_CR43","doi-asserted-by":"crossref","first-page":"7","DOI":"10.1167\/15.9.7","volume":"15","author":"L Spillmann","year":"2015","unstructured":"Spillmann L, Dresp-Langley B, Dresp B, Tseng C-h (2015) Beyond the classical receptive field: the effect of contextual stimuli. J Vision 15(9):7","journal-title":"J Vision"},{"key":"11547_CR44","unstructured":"Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. https:\/\/api.semanticscholar.org\/CorpusID:5808102"},{"key":"11547_CR45","doi-asserted-by":"crossref","unstructured":"Szegedy C, Liu W, Jia Y, Sermanet P, Reed SE, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. 
In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), 1\u20139, 1(2)","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"11547_CR46","first-page":"2818","volume":"2016","author":"C Szegedy","year":"2015","unstructured":"Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. IEEE Conf Comput Vision Patt Recognit (CVPR) 2016:2818\u20132826","journal-title":"IEEE Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR47","doi-asserted-by":"crossref","unstructured":"Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2016) Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv:1602.07261","DOI":"10.1609\/aaai.v31i1.11231"},{"key":"11547_CR48","first-page":"510","volume":"2019","author":"X Li","year":"2019","unstructured":"Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2019:510\u2013519","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR49","unstructured":"Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: International conference on machine learning"},{"key":"11547_CR50","unstructured":"Zhang S, Guo S, Huang W, Scott MR, Wang L (2020) V4d: 4d convolutional neural networks for video-level representations learning. In: International conference on learning representations"},{"key":"11547_CR51","first-page":"248","volume":"2009","author":"J Deng","year":"2009","unstructured":"Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. IEEE Conf Comput Vision Patt Recognit 2009:248\u2013255","journal-title":"IEEE Conf Comput Vision Patt Recognit"},{"key":"11547_CR52","first-page":"7794","volume":"2018","author":"X Wang","year":"2018","unstructured":"Wang X, Girshick RB, Gupta AK, He K (2018) Non-local neural networks. 
IEEE\/CVF Conf Comput Vision Patt Recognit 2018:7794\u20137803","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit"},{"key":"11547_CR53","first-page":"5511","volume":"2019","author":"C Luo","year":"2019","unstructured":"Luo C, Yuille AL (2019) Grouped spatial-temporal aggregation for efficient action recognition. IEEE\/CVF Int Conf Comput Vision (ICCV) 2019:5511\u20135520","journal-title":"IEEE\/CVF Int Conf Comput Vision (ICCV)"},{"key":"11547_CR54","doi-asserted-by":"crossref","unstructured":"Weng J, Luo D, Wang Y, Tai Y, Wang C, Li J, Huang F, Jiang X, Yuan J (2020) Temporal distinct representation learning for action recognition. arXiv:2007.07626","DOI":"10.1007\/978-3-030-58571-6_22"},{"key":"11547_CR55","first-page":"433","volume":"2019","author":"Y Chen","year":"2019","unstructured":"Chen Y, Rohrbach M, Yan Z, Yan S, Feng J, Kalantidis Y (2019) Graph-based global reasoning networks. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2019:433\u2013442","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR56","first-page":"349","volume":"2020","author":"H Wang","year":"2020","unstructured":"Wang H, Tran D, Torresani L, Feiszli M (2020) Video modeling with correlation networks. IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR) 2020:349\u2013358","journal-title":"IEEE\/CVF Conf Comput Vision Patt Recognit (CVPR)"},{"key":"11547_CR57","doi-asserted-by":"publisher","unstructured":"Li B, Chen J, Zhang D, Bao X, Huang D (2022) Representation learning for compressed video action recognition via attentive cross-modal interaction with motion enhancement. In: Raedt, L.D. (ed.) Proceedings of the thirty-first international joint conference on artificial intelligence, IJCAI-22, pp.1060\u20131066. International joint conferences on artificial intelligence organization. https:\/\/doi.org\/10.24963\/ijcai.2022\/148 . Main Track. 
https:\/\/doi.org\/10.24963\/ijcai.2022\/148","DOI":"10.24963\/ijcai.2022\/148"},{"key":"11547_CR58","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1007\/s11263-019-01228-7","volume":"128","author":"RR Selvaraju","year":"2016","unstructured":"Selvaraju RR, Das A, Vedantam R, Cogswell M, Parikh D, Batra D (2016) Grad-cam: visual explanations from deep networks via gradient-based localization. Int J Comput Vision 128:336\u2013359","journal-title":"Int J Comput Vision"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11547-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11547-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11547-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T20:31:08Z","timestamp":1715891468000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11547-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,6]]},"references-count":58,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,4]]}},"alternative-id":["11547"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11547-7","relation":{},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,3,6]]},"assertion":[{"value":"28 January 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"6 March 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"87"}}