{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T09:08:01Z","timestamp":1777626481627,"version":"3.51.4"},"reference-count":40,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2019,8,8]],"date-time":"2019-08-08T00:00:00Z","timestamp":1565222400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"National Key R8D Program of China","award":["2018YFB1601101"],"award-info":[{"award-number":["2018YFB1601101"]}]},{"name":"Science and Technology Program of Guangzhou","award":["201704020180"],"award-info":[{"award-number":["201704020180"]}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61673402, 61273270, and 60802069"],"award-info":[{"award-number":["61673402, 61273270, and 60802069"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities of China","doi-asserted-by":"crossref","award":["17lgzd08"],"award-info":[{"award-number":["17lgzd08"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003453","name":"Natural Science Foundation of Guangdong","doi-asserted-by":"crossref","award":["2017A030311029"],"award-info":[{"award-number":["2017A030311029"]}],"id":[{"id":"10.13039\/501100003453","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2019,8,31]]},"abstract":"<jats:p>Recently, many deep learning approaches have shown remarkable progress on human action recognition. 
However, it remains unclear how to extract the useful information in videos since only video-level labels are available in the training phase. To address this limitation, many efforts have been made to improve the performance of action recognition by applying the visual attention mechanism in the deep learning model. In this article, we propose a novel deep model called Moving Foreground Attention (MFA) that enhances the performance of action recognition by guiding the model to focus on the discriminative foreground targets. In our work, MFA detects the moving foreground through a proposed variance-based algorithm. Meanwhile, an unsupervised proposal is utilized to mine the action-related key volumes and generate corresponding correlation scores. Based on these scores, a newly proposed stochastic-out scheme is exploited to train the MFA. Experiment results show that action recognition performance can be significantly improved by using our proposed techniques, and our model achieves state-of-the-art performance on UCF101 and HMDB51.<\/jats:p>","DOI":"10.1145\/3321511","type":"journal-article","created":{"date-parts":[[2019,8,8]],"date-time":"2019-08-08T12:30:31Z","timestamp":1565267431000},"page":"1-16","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Moving Foreground-Aware Visual Attention and Key Volume Mining for Human Action Recognition"],"prefix":"10.1145","volume":"15","author":[{"given":"Junxuan","family":"Zhang","sequence":"first","affiliation":[{"name":"School of Electronic and Information Technology, Sun Yat-sen University, Guangzhou, People's Republic of China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4884-323X","authenticated-orcid":false,"given":"Haifeng","family":"Hu","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Technology, Sun Yat-sen University, Guangzhou, People's Republic of 
China"}]},{"given":"Xinlong","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Technology, Sun Yat-sen University, Guangzhou, Peoples Republic of China"}]}],"member":"320","published-online":{"date-parts":[[2019,8,8]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Yue-Hei Ng Joe"},{"key":"e_1_2_1_2_1","volume-title":"Advances in Neural Information Processing Systems 15","author":"Andrews Stuart"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW.2013.73"},{"key":"e_1_2_1_4_1","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Bilen H."},{"key":"e_1_2_1_5_1","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Carreira J."},{"key":"e_1_2_1_6_1","volume-title":"2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 65--72","author":"Dollar P."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2016.2599174"},{"key":"e_1_2_1_8_1","volume-title":"Wildes","author":"Feichtenhofer Christoph","year":"2016"},{"key":"e_1_2_1_9_1","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 
1933","author":"Feichtenhofer C.","year":"1941"},{"key":"e_1_2_1_10_1","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Huang G."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"key":"e_1_2_1_12_1","volume-title":"High Performance Computing in Science and Engineering\u201912, Wolfgang E","author":"Kuehne Hilde"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-1838-7"},{"key":"e_1_2_1_14_1","volume-title":"Action recognition with coarse-to-fine deep feature integration and asynchronous fusion. CoRR abs\/1711.07430","author":"Lin Weiyao","year":"2017"},{"key":"e_1_2_1_15_1","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Liu J."},{"key":"e_1_2_1_16_1","volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 1159--1168","author":"Liu M."},{"key":"e_1_2_1_17_1","volume-title":"Recurrent models of visual attention. CoRR abs\/1406.6247","author":"Mnih Volodymyr","year":"2014"},{"key":"e_1_2_1_18_1","volume-title":"2017 IEEE International Conference on Computer Vision (ICCV\u201917)","author":"Qiu Z."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1080\/135062800394667"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-013-0662-8"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291311"},{"key":"e_1_2_1_22_1","volume-title":"Action recognition using visual attention. CoRR abs\/1511.04119","author":"Sharma Shikhar","year":"2015"},{"key":"e_1_2_1_23_1","volume-title":"Two-stream convolutional networks for action recognition in videos. CoRR abs\/1406.2199","author":"Simonyan Karen","year":"2014"},{"key":"e_1_2_1_24_1","volume-title":"Very deep convolutional networks for large-scale image recognition. 
CoRR abs\/1409.1556","author":"Simonyan Karen","year":"2014"},{"key":"e_1_2_1_25_1","volume-title":"An end-to-end spatio-temporal attention model for human action recognition from skeleton data. CoRR abs\/1611.06067","author":"Song Sijie","year":"2016"},{"key":"e_1_2_1_26_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2712608"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-012-0588-6"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_2_1_31_1","volume-title":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201915)","author":"Wang L."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_2_1_33_1","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"Wang X."},{"key":"e_1_2_1_34_1","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201917)","author":"Wang Y.","year":"2097"},{"key":"e_1_2_1_35_1","volume-title":"2018 25th IEEE International Conference on Image Processing (ICIP\u201918)","author":"Wang Z."},{"key":"e_1_2_1_36_1","volume-title":"attend and tell: Neural image caption generation with visual attention. CoRR abs\/1502.03044","author":"Xu Kelvin","year":"2015"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.image.2017.11.005"},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"C. Zach T. Pock and H. Bischof. 2007. A duality based approach for realtime tv-l1 optical flow. In Pattern Recognition Fred A. Hamprecht Christoph Schn\u00f6rr and Bernd J\u00e4hne (Eds.). Springer Berlin 214--223.","DOI":"10.1007\/978-3-540-74936-3_22"},{"key":"e_1_2_1_39_1","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916). 1991","author":"Zhu W.","year":"1999"},{"key":"e_1_2_1_40_1","volume-title":"Computer Vision -- ECCV","author":"Lawrence Zitnick C.","year":"2014"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3321511","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3321511","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:54:38Z","timestamp":1750204478000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3321511"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,8,8]]},"references-count":40,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2019,8,31]]}},"alternative-id":["10.1145\/3321511"],"URL":"https:\/\/doi.org\/10.1145\/3321511","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,8,8]]},"assertion":[{"value":"2018-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-03-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-08-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}