{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T15:03:02Z","timestamp":1753887782701,"version":"3.41.2"},"reference-count":52,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T00:00:00Z","timestamp":1617062400000},"content-version":"vor","delay-in-days":88,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004731","name":"Natural Science Foundation of Zhejiang Province","doi-asserted-by":"publisher","award":["LGF21F020007","LGF20F020003"],"award-info":[{"award-number":["LGF21F020007","LGF20F020003"]}],"id":[{"id":"10.13039\/501100004731","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100008867","name":"Department of Education of Zhejiang Province","doi-asserted-by":"publisher","award":["Y202043874"],"award-info":[{"award-number":["Y202043874"]}],"id":[{"id":"10.13039\/501100008867","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computational Intelligence and Neuroscience"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>Convolutional neural network (CNN) has been leaping forward in recent years. However, the high dimensionality, rich human dynamic characteristics, and various kinds of background interference increase difficulty for traditional CNNs in capturing complicated motion data in videos. A novel framework named the attention\u2010based temporal encoding network (ATEN) with background\u2010independent motion mask (BIMM) is proposed to achieve video action recognition here. Initially, we introduce one motion segmenting approach on the basis of boundary prior by associating with the minimal geodesic distance inside a weighted graph that is not directed. 
Then, we propose one dynamic contrast segmenting strategic procedure for segmenting the object that moves within complicated environments. Subsequently, we build the BIMM for enhancing the object that moves based on the suppression of the not relevant background inside the respective frame. Furthermore, we design one long\u2010range attention system inside ATEN, capable of effectively remedying the dependency of sophisticated actions that are not periodic in a long term based on the more automatic focus on the semantical vital frames other than the equal process for overall sampled frames. For this reason, the attention mechanism is capable of suppressing the temporal redundancy and highlighting the discriminative frames. Lastly, the framework is assessed by using HMDB51 and UCF101 datasets. As revealed from the experimentally achieved results, our ATEN with BIMM gains 94.5% and 70.6% accuracy, respectively, which outperforms a number of existing methods on both datasets.<\/jats:p>","DOI":"10.1155\/2021\/8890808","type":"journal-article","created":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T20:20:10Z","timestamp":1617135610000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Attention\u2010Based Temporal Encoding Network with Background\u2010Independent Motion Mask for Action 
Recognition"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6704-5049","authenticated-orcid":false,"given":"Zhengkui","family":"Weng","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhipeng","family":"Jin","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuangxi","family":"Chen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Quanquan","family":"Shen","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xiangyang","family":"Ren","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wuzhao","family":"Li","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"311","published-online":{"date-parts":[[2021,3,30]]},"reference":[{"key":"e_1_2_10_1_2","doi-asserted-by":"crossref","unstructured":"WangA. K. SchmidC. andLiuC.-L. Action recognition by dense trajectories Proceedings of the CVPR June 2011 Colorado Springs CO USA 3169\u20133176.","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"e_1_2_10_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2013.01.013"},{"key":"e_1_2_10_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/lsp.2016.2611485"},{"key":"e_1_2_10_4_2","doi-asserted-by":"crossref","unstructured":"SunL. JiaK. YeungD. Y. andBertramE. S. Human action recognition using factorized spatio-temporal convolution networks Proceedings of the ICCV December 2015 Santiago Chile 4597\u20134605.","DOI":"10.1109\/ICCV.2015.522"},{"key":"e_1_2_10_5_2","doi-asserted-by":"crossref","unstructured":"LeaC. ReiterA. VidalR. andHagerG. D. 
Segmental Spatiotemporal CNNs for fine-grained action segmentation Proceedings of the ECCV October 2016 Amsterdam The Netherlands 36\u201352.","DOI":"10.1007\/978-3-319-46487-9_3"},{"key":"e_1_2_10_6_2","unstructured":"KrizhevskyA. SutskeverI. andHintonG. E. ImageNet classification with deep convolution neural networks Proceedings of the NIPS December 2012 Lake Tahoe NV USA 1097\u20131105."},{"key":"e_1_2_10_7_2","unstructured":"LaptevI. MarszalekM. SchmidC. andRozenfeldB. Learning realistic human actions from movies Proceedings of the CVPR June 2018 Salt Lake City UT USA 1\u20138."},{"key":"e_1_2_10_8_2","first-page":"391","article-title":"Motion keypoint trajectory and covariance descriptor for human action recognition","volume":"34","author":"Yun Y.","year":"2018","journal-title":"Vision Computing"},{"key":"e_1_2_10_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2018.01.020"},{"key":"e_1_2_10_10_2","doi-asserted-by":"crossref","unstructured":"TuZ. LiY. CaoJ. andLiB. MSR-CNN: applying motion salient region based descriptors for action recognition Proceedings of the ICPR December 2016 Cancun Mexico 3524\u20133529.","DOI":"10.1109\/ICPR.2016.7900180"},{"key":"e_1_2_10_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2012.59"},{"key":"e_1_2_10_12_2","unstructured":"WangH.andSchmidC. Action recognition with improved trajectories Proceedings of the ICCV September 2014 Zurich Switzerland 3551\u20133558."},{"key":"e_1_2_10_13_2","doi-asserted-by":"crossref","unstructured":"DonahueJ. Anne HendricksL. GuadarramaS.et al. Long-term recurrent convolutional networks for visual recognition and description Proceedings of the CVPR June 2015 Boston MA USA 2625\u20132634.","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_10_14_2","doi-asserted-by":"crossref","unstructured":"CheronG. LaptevI. andSchmidC. 
P-CNN: pose-based CNN features for action recognition Proceedings of the ICCV December 2015 Santiago Chile 3218\u20133226.","DOI":"10.1109\/ICCV.2015.368"},{"key":"e_1_2_10_15_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-015-2819-7"},{"key":"e_1_2_10_16_2","article-title":"Temporal pyramid pooling based convolution neural network for action recognition","volume":"1","author":"Wang P.","year":"2017","journal-title":"IEEE Transactions on Circuits and Systems for Video Technology"},{"key":"e_1_2_10_17_2","unstructured":"NgJ. Y.-H. HausknechtM. VijayanarasimhanS. VinyalsO. MongaR. andTodericiG. Beyond short snippets: deep networks for video classification Proceedings of the CVPR June 2015 Boston MA USA 4694\u20134702."},{"key":"e_1_2_10_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2011.272"},{"key":"e_1_2_10_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2017.2662005"},{"key":"e_1_2_10_20_2","doi-asserted-by":"crossref","unstructured":"DaniilidisK. MaragosP. andParagiosN. Improving the fisher kernel for large-scale image classification Proceedings of the ECCV September 2010 Crete Greece 143\u2013156.","DOI":"10.1007\/978-3-642-15561-1_11"},{"key":"e_1_2_10_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13042-010-0001-0"},{"key":"e_1_2_10_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-012-0594-8"},{"key":"e_1_2_10_23_2","doi-asserted-by":"crossref","unstructured":"VigE. DorrM. andCoxD. 
Space-variant descriptor sampling for action recognition based on saliency and eye movements Proceedings of the ECCV October 2012 Firenze Italy 84\u201397 https:\/\/doi.org\/10.1007\/978-3-642-33786-4_7 2-s2.0-84867884391.","DOI":"10.1007\/978-3-642-33786-4_7"},{"key":"e_1_2_10_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2013.05.002"},{"key":"e_1_2_10_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2016.09.106"},{"key":"e_1_2_10_26_2","doi-asserted-by":"publisher","DOI":"10.1186\/s13640-018-0250-5"},{"key":"e_1_2_10_27_2","unstructured":"SimonyanK.andZissermanA. Two-Stream convolution networks for action recognition in videos Proceedings of the NIPS December 2014 Montreal Canada 568\u2013576."},{"key":"e_1_2_10_28_2","doi-asserted-by":"crossref","unstructured":"TranD. BourdevL. D. FergusR. TorresaniL. andPaluriM. Learning spatiotemporal features with 3d convolutional networks Proceedings of the ICCV December 2015 Santiago Chile 4489\u20134497.","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_10_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2017.04.004"},{"key":"e_1_2_10_30_2","doi-asserted-by":"crossref","unstructured":"WangL. QiaoY. andTangX. Action recognition with trajectory-pooled deep-convolution descriptors Proceedings of the CVPR June 2015 Boston MA USA 4305\u20134314.","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"e_1_2_10_31_2","doi-asserted-by":"crossref","unstructured":"FeichtenhoferC. PinzA. andZissermanA. Convolutional twostream network fusion for video action recognition Proceedings of the CVPR June 2016 Las Vegas NV USA 1933\u20131941.","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_2_10_32_2","unstructured":"RuiH. ChenC. andShahM. 
Tube convolutional neural network (T-CNN) for action detection in videos Proceedings of the ICCV October 2017 Venice Italy 5823\u20135832."},{"key":"e_1_2_10_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/tmm.2017.2666540"},{"key":"e_1_2_10_34_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_2_10_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2712608"},{"key":"e_1_2_10_36_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2017.11.005"},{"key":"e_1_2_10_37_2","unstructured":"XuK. BaJ. KirosR.et al. Show attend and tell: neural image caption generation with visual attention Proceedings of the International Conference on Machine Learning July 2015 Lille France 2048\u20132057."},{"key":"e_1_2_10_38_2","unstructured":"BahdanauD. ChoK. andBengioY. Neural machine translation by jointly learning to align and translate 2014 https:\/\/arxiv.org\/abs\/1409.0473."},{"key":"e_1_2_10_39_2","doi-asserted-by":"publisher","DOI":"10.1049\/iet-ipr.2019.0963"},{"key":"e_1_2_10_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/tmm.2018.2862341"},{"key":"e_1_2_10_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2020.2979549"},{"key":"e_1_2_10_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/tip.2020.2984904"},{"key":"e_1_2_10_43_2","doi-asserted-by":"crossref","unstructured":"VersaciM. CalcagnoS. andMorabitoF. C. Image contrast enhancement by distances among points in fuzzy hyper-cubes Proceedings of the International Conference on Computer Analysis of Images and Patterns September 2015 Valletta Malta 494\u2013505.","DOI":"10.1007\/978-3-319-23117-4_43"},{"key":"e_1_2_10_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2017.2776349"},{"key":"e_1_2_10_45_2","doi-asserted-by":"crossref","unstructured":"VersaciM. CalcagnoS. andMorabitoF. C. 
Fuzzy geometrical approach based on unit hyper-cubes for image contrast enhancement Proceedings of the IEEE International Conference on Signal and Image Processing Applications (ICSIPA) October 2015 Malaysia Kuala Lumpur 488\u2013493.","DOI":"10.1109\/ICSIPA.2015.7412240"},{"key":"e_1_2_10_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.1986.4767851"},{"key":"e_1_2_10_47_2","doi-asserted-by":"crossref","unstructured":"ZachC. PockT. andBischofH. A duality based approach for realtime TV-L1 optical flow Proceedings of the Symposium on Pattern Recognition September 2007 Heidelberg Germany 214\u2013223.","DOI":"10.1007\/978-3-540-74936-3_22"},{"key":"e_1_2_10_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2012.120"},{"key":"e_1_2_10_49_2","doi-asserted-by":"crossref","unstructured":"KuehneH. HMDB: a large video database for human motion recognition Proceedings of the ICCV November 2011 Barcelona Spain 2556\u20132563.","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"e_1_2_10_50_2","unstructured":"SoomroK. ZamirA. R. andShahM. UCF101: a data set of 101 human actions classes from videos in the wild 2012 UCF Center for Research in Computer Vision Orlando FL USA Technical report CRCV-TR-12-01."},{"key":"e_1_2_10_51_2","unstructured":"IoffeS.andSzegedyC. Batch normalization: accelerating deep network training by reducing internal covariate shift Proceedings of the ICML July 2015 Lille France 448\u2013456."},{"key":"e_1_2_10_52_2","doi-asserted-by":"crossref","unstructured":"DengJ. DongW. SocherR. LiL. J. LiK. andFei-FeiL. 
Imagenet: a large-scale hierarchical image database Proceedings of the CVPR June 2009 Miami FL USA 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"}],"container-title":["Computational Intelligence and Neuroscience"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/8890808.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/8890808.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/8890808","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T11:39:44Z","timestamp":1722944384000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/8890808"}},"subtitle":[],"editor":[{"given":"Mario","family":"Versaci","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":52,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/8890808"],"URL":"https:\/\/doi.org\/10.1155\/2021\/8890808","archive":["Portico"],"relation":{},"ISSN":["1687-5265","1687-5273"],"issn-type":[{"type":"print","value":"1687-5265"},{"type":"electronic","value":"1687-5273"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2020-09-12","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-03-12","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2021-03-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"8890808"}}