{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,2]],"date-time":"2025-07-02T04:27:06Z","timestamp":1751430426798,"version":"3.41.0"},"reference-count":80,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,6,29]],"date-time":"2021-06-29T00:00:00Z","timestamp":1624924800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100007065","name":"NVIDIA","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100007065","id-type":"DOI","asserted-by":"crossref"}]},{"name":"TITAN Xp GPU"},{"name":"MSRA"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Intell. Syst. Technol."],"published-print":{"date-parts":[[2021,6,30]]},"abstract":"<jats:p>In this article, we study the problem of video-based action recognition. We improve the action recognition performance by finding an effective temporal and appearance representation. For capturing the temporal representation, we introduce two temporal learning techniques for improving long-term temporal information modeling, specifically Temporal Relational Network and Temporal Second-Order Pooling-based Network. Moreover, we harness the representation using complementary learning techniques, specifically Global-Local Network and Fuse-Inception Network. Performance evaluation on three datasets (UCF101, HMDB-51, and Mini-Kinetics-200) demonstrated the superiority of the proposed framework compared to the 2D Deep ConvNets-based state-of-the-art techniques.<\/jats:p>","DOI":"10.1145\/3447686","type":"journal-article","created":{"date-parts":[[2021,6,30]],"date-time":"2021-06-30T00:23:56Z","timestamp":1625012636000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Improving Action Recognition via Temporal and Complementary Learning"],"prefix":"10.1145","volume":"12","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4887-7050","authenticated-orcid":false,"given":"Nour Eldin","family":"Elmadany","sequence":"first","affiliation":[{"name":"Ryerson University and Vector Institute"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yifeng","family":"He","sequence":"additional","affiliation":[{"name":"Ryerson University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ling","family":"Guan","sequence":"additional","affiliation":[{"name":"Ryerson University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,6,29]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.4108\/icst.bodynets.2014.257036"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the 4th International Joint Conference on Pattern Recognition. 579\u2013583","author":"Beaudet P. R.","year":"1978","unstructured":"P. R. Beaudet . 1978 . Rotationally invariant image operators . In Proceedings of the 4th International Joint Conference on Pattern Recognition. 579\u2013583 . P. R. Beaudet. 1978. Rotationally invariant image operators. In Proceedings of the 4th International Joint Conference on Pattern Recognition. 579\u2013583."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.331"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2005.28"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2012.6239175"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/34.910878"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.502"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206821"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_22"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.352"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2005.177"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/11744047_33"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 1117\u20131121","author":"Diba Ali","year":"2018","unstructured":"Ali Diba , Mohsen Fayyaz , Vivek Sharma , A. Hossein Karami , M. Mahdi Arzani , Rahman Yousefzadeh , and Luc Van Gool . 2018 a. Temporal 3D ConvNets using temporal transition layer . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 1117\u20131121 . Ali Diba, Mohsen Fayyaz, Vivek Sharma, A. Hossein Karami, M. Mahdi Arzani, Rahman Yousefzadeh, and Luc Van Gool. 2018a. Temporal 3D ConvNets using temporal transition layer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 1117\u20131121."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_18"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.168"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.5555\/1259587.1259830"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2855438"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157486"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4768\u20134777","author":"Feichtenhofer Christoph","key":"e_1_2_1_20_1","unstructured":"Christoph Feichtenhofer , Axel Pinz , and Richard P. Wildes . 2017. Spatiotemporal multiplier networks for video action recognition . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4768\u20134777 . Christoph Feichtenhofer, Axel Pinz, and Richard P. Wildes. 2017. Spatiotemporal multiplier networks for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4768\u20134777."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.41"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00685"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.62"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.243"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2013.330"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540039"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.223"},{"volume-title":"Proceedings of the British Machine Vision Conference (BMVC\u201915)","author":"Klaser A.","key":"e_1_2_1_31_1","unstructured":"A. Klaser , M. Marszalek , and C. Schmid . 2015. A SpatioTemporal descriptor based on 3D-gradients . In Proceedings of the British Machine Vision Conference (BMVC\u201915) . A. Klaser, M. Marszalek, and C. Schmid. 2015. A SpatioTemporal descriptor based on 3D-gradients. In Proceedings of the British Machine Vision Conference (BMVC\u201915)."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.5555\/645530.655813"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/11676959_8"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-005-1838-7"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01231-1_19"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.33018674"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2017.2715045"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00718"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.170"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2018.10.095"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459154"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-010-0677-x"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.228"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSVT.2018.2808685"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.5555\/1888089.1888101"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2246148"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/3295222.3295250"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/1291233.1291311"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the International Conference on Pattern Recognition (ICPR\u201912)","author":"Sermanet Pierre","year":"2012","unstructured":"Pierre Sermanet , Soumith Chintala , and Yann LeCun . 2012 . Convolutional neural networks applied to house numbers digit classification . In Proceedings of the International Conference on Pattern Recognition (ICPR\u201912) . Pierre Sermanet, Soumith Chintala, and Yann LeCun. 2012. Convolutional neural networks applied to house numbers digit classification. In Proceedings of the International Conference on Pattern Recognition (ICPR\u201912)."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.5555\/2968826.2968890"},{"key":"e_1_2_1_52_1","volume-title":"Amir Roshan Zamir, and Mubarak Shah","author":"Soomro Khurram","year":"2012","unstructured":"Khurram Soomro , Amir Roshan Zamir, and Mubarak Shah . 2012 . A dataset of 101 human action classes from videos in the wild. 2, 11, Center for Research in Computer Vision. Khurram Soomro, Amir Roshan Zamir, and Mubarak Shah. 2012. A dataset of 101 human action classes from videos in the wild. 2, 11, Center for Research in Computer Vision."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00151"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.308"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.510"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2017.2712608"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.441"},{"key":"e_1_2_1_60_1","article-title":"Fast and accurate action detection in videos with motion-centric attention model","volume":"30","author":"Wang Jinzhuo","year":"2018","unstructured":"Jinzhuo Wang , Wenmin Wang , and Wen Gao . 2018 b. Fast and accurate action detection in videos with motion-centric attention model . IEEE Trans. Circ. Syst. Vid. Technol. 30 , 1 (2018). 117\u2013130. Jinzhuo Wang, Wenmin Wang, and Wen Gao. 2018b. Fast and accurate action detection in videos with motion-centric attention model. IEEE Trans. Circ. Syst. Vid. Technol. 30, 1 (2018). 117\u2013130.","journal-title":"IEEE Trans. Circ. Syst. Vid. Technol."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2010.5540018"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7299059"},{"key":"e_1_2_1_63_1","volume-title":"Towards good practices for very deep two-stream convnets. arXiv:1507.02159","author":"Wang Limin","year":"2015","unstructured":"Limin Wang , Yuanjun Xiong , Zhe Wang , and Yu Qiao . 2015b. Towards good practices for very deep two-stream convnets. arXiv:1507.02159 ( 2015 ). Retrieved from https:\/\/arxiv.org\/abs\/1507.02159. Limin Wang, Yuanjun Xiong, Zhe Wang, and Yu Qiao. 2015b. Towards good practices for very deep two-stream convnets. arXiv:1507.02159 (2015). Retrieved from https:\/\/arxiv.org\/abs\/1507.02159."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00813"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01228-1_25"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155402"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-88688-4_48"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.634"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01267-0_19"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01234-2_37"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.5555\/1771530.1771554"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.5244\/C.30.87"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.297"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2014.265"},{"volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.1325\u20131334","author":"Zhang Xiaolin","key":"e_1_2_1_76_1","unstructured":"Xiaolin Zhang , Yunchao Wei , Jiashi Feng , Yi Yang , and Thomas S. Huang . 2018. Adversarial complementary learning for weakly supervised object localization . In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.1325\u20131334 . Xiaolin Zhang, Yunchao Wei, Jiashi Feng, Yi Yang, and Thomas S. Huang. 2018. Adversarial complementary learning for weakly supervised object localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.1325\u20131334."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.5555\/2354409.2354957"},{"key":"e_1_2_1_78_1","unstructured":"Liang Zheng Yali Zhao Shengjin Wang Jingdong Wang and Qi Tian. 2016. Good Practice in CNN Feature Transfer. arXiv:1604.00133. Retrieved from https:\/\/arxiv.org\/abs\/1604.00133.  Liang Zheng Yali Zhao Shengjin Wang Jingdong Wang and Qi Tian. 2016. Good Practice in CNN Feature Transfer. arXiv:1604.00133. Retrieved from https:\/\/arxiv.org\/abs\/1604.00133."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240511"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01246-5_49"}],"container-title":["ACM Transactions on Intelligent Systems and Technology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447686","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3447686","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T22:41:10Z","timestamp":1750200070000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3447686"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,29]]},"references-count":80,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,6,30]]}},"alternative-id":["10.1145\/3447686"],"URL":"https:\/\/doi.org\/10.1145\/3447686","relation":{},"ISSN":["2157-6904","2157-6912"],"issn-type":[{"type":"print","value":"2157-6904"},{"type":"electronic","value":"2157-6912"}],"subject":[],"published":{"date-parts":[[2021,6,29]]},"assertion":[{"value":"2019-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-06-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}