{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,20]],"date-time":"2025-07-20T04:09:58Z","timestamp":1752984598762,"version":"3.41.0"},"publisher-location":"New York, NY, USA","reference-count":26,"publisher":"ACM","license":[{"start":{"date-parts":[[2022,12,1]],"date-time":"2022-12-01T00:00:00Z","timestamp":1669852800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Honors Program, University of Science, Vietnam National University - Ho Chi Minh City"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2022,12]]},"DOI":"10.1145\/3568562.3568653","type":"proceedings-article","created":{"date-parts":[[2022,11,29]],"date-time":"2022-11-29T00:25:01Z","timestamp":1669681501000},"page":"302-308","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Motion Embedded Image: A combination of spatial and temporal features for action recognition"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2546-5651","authenticated-orcid":false,"given":"Tri","family":"Le","sequence":"first","affiliation":[{"name":"Faculty of Information Technology, University of Science - VNUHCM, Viet Nam and Viet Nam National University, Ho Chi Minh City, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8480-8773","authenticated-orcid":false,"given":"Nham","family":"Huynh-Duc","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of Science - VNUHCM, Viet Nam and Viet Nam National University, Ho Chi Minh City, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4159-3311","authenticated-orcid":false,"given":"Chung Thai","family":"Nguyen","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, University of 
Science - VNUHCM, Viet Nam and Viet Nam National University, Ho Chi Minh City, Vietnam"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3046-3041","authenticated-orcid":false,"given":"Minh-Triet","family":"Tran","sequence":"additional","affiliation":[{"name":"Faculty of Information Technology, Software Engineering Lab, John von Neumann Institute, University of Science - VNUHCM, Viet Nam and Viet Nam National University, Ho Chi Minh City, Vietnam"}]}],"member":"320","published-online":{"date-parts":[[2022,12]]},"reference":[{"key":"e_1_3_2_1_1_1","unstructured":"Jo\u00e3o Carreira and Andrew Zisserman. 2017. Quo Vadis Action Recognition? A New Model and the Kinetics Dataset. CoRR abs\/1705.07750(2017). arXiv:1705.07750http:\/\/arxiv.org\/abs\/1705.07750"},{"key":"e_1_3_2_1_2_1","unstructured":"Christoph Feichtenhofer. 2020. X3D: Expanding Architectures for Efficient Video Recognition. CoRR abs\/2004.04730(2020). arXiv:2004.04730https:\/\/arxiv.org\/abs\/2004.04730"},{"key":"e_1_3_2_1_3_1","unstructured":"Christoph Feichtenhofer Haoqi Fan Jitendra Malik and Kaiming He. 2018. SlowFast Networks for Video Recognition. CoRR abs\/1812.03982(2018). arXiv:1812.03982http:\/\/arxiv.org\/abs\/1812.03982"},{"key":"e_1_3_2_1_4_1","unstructured":"Christoph Feichtenhofer Axel Pinz and Andrew Zisserman. 2016. Convolutional Two-Stream Network Fusion for Video Action Recognition. CoRR abs\/1604.06573(2016). 
arXiv:1604.06573http:\/\/arxiv.org\/abs\/1604.06573"},{"key":"e_1_3_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Raghav Goyal Samira\u00a0Ebrahimi Kahou Vincent Michalski Joanna Materzynska Susanne Westphal Heuna Kim Valentin Haenel Ingo Fr\u00fcnd Peter Yianilos Moritz Mueller-Freitag Florian Hoppe Christian Thurau Ingo Bax and Roland Memisevic. 2017. The \"something something\" video database for learning and evaluating visual common sense. CoRR abs\/1706.04261(2017). arXiv:1706.04261http:\/\/arxiv.org\/abs\/1706.04261","DOI":"10.1109\/ICCV.2017.622"},{"key":"e_1_3_2_1_6_1","unstructured":"Charles Han Chao Wang Evelyn Mei Joseph Redmon Santosh\u00a0Kumar Divvala Zuxuan Wu Xi Wang Yu-Gang Jiang Hao Ye and X. Xue. 2017. YOLO-based Adaptive Window Two-stream Convolutional Neural Network for Video Classification."},{"key":"e_1_3_2_1_7_1","unstructured":"Kensho Hara Hirokatsu Kataoka and Yutaka Satoh. 2017. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?CoRR abs\/1711.09577(2017). arXiv:1711.09577http:\/\/arxiv.org\/abs\/1711.09577"},
{"key":"e_1_3_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298698"},{"key":"e_1_3_2_1_9_1","unstructured":"Cordelia Schmid Liu Cheng-Lin Heng\u00a0Wang Alexander\u00a0Kl\u00e4ser. 2011. Action Recognition by Dense Trajectories. (2011). https:\/\/hal.inria.fr\/inria-00583818\/document"},{"key":"e_1_3_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2012.59"},{"key":"#cr-split#-e_1_3_2_1_11_1.1","doi-asserted-by":"crossref","unstructured":"M.\u00a0Esat Kalfaoglu Sinan Kalkan and A.\u00a0Aydin Alatan. 2020. Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition. https:\/\/doi.org\/10.48550\/ARXIV.2008.01232 10.48550\/ARXIV.2008.01232","DOI":"10.1007\/978-3-030-68238-5_48"},{"key":"#cr-split#-e_1_3_2_1_11_1.2","doi-asserted-by":"crossref","unstructured":"M.\u00a0Esat Kalfaoglu Sinan Kalkan and A.\u00a0Aydin Alatan. 2020. Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition. https:\/\/doi.org\/10.48550\/ARXIV.2008.01232","DOI":"10.1007\/978-3-030-68238-5_48"},{"key":"e_1_3_2_1_12_1","volume-title":"Large-Scale Video Classification with Convolutional Neural Networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1725\u20131732","author":"Karpathy Andrej","year":"2014","unstructured":"Andrej Karpathy , George Toderici , Sanketh Shetty , Thomas Leung , Rahul Sukthankar , and Li Fei-Fei . 2014 . Large-Scale Video Classification with Convolutional Neural Networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1725\u20131732 . 
https:\/\/doi.org\/10.1109\/CVPR.2014.223 10.1109\/CVPR.2014.223 Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. Large-Scale Video Classification with Convolutional Neural Networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition. 1725\u20131732. https:\/\/doi.org\/10.1109\/CVPR.2014.223"},{"key":"e_1_3_2_1_13_1","unstructured":"Will Kay Jo\u00e3o Carreira Karen Simonyan Brian Zhang Chloe Hillier Sudheendra Vijayanarasimhan Fabio Viola Tim Green Trevor Back Paul Natsev Mustafa Suleyman and Andrew Zisserman. 2017. The Kinetics Human Action Video Dataset. CoRR abs\/1705.06950(2017). arXiv:1705.06950http:\/\/arxiv.org\/abs\/1705.06950"},{"volume-title":"Advances in Neural Information Processing Systems 25, F.\u00a0Pereira, C.\u00a0J.\u00a0C. Burges, L.\u00a0Bottou, and K.\u00a0Q","author":"Krizhevsky Alex","key":"e_1_3_2_1_14_1","unstructured":"Alex Krizhevsky , Ilya Sutskever , and Geoffrey\u00a0 E. Hinton . 2012. ImageNet Classification with Deep Convolutional Neural Networks . In Advances in Neural Information Processing Systems 25, F.\u00a0Pereira, C.\u00a0J.\u00a0C. Burges, L.\u00a0Bottou, and K.\u00a0Q . Weinberger (Eds.). Curran Associates, Inc. , 1097\u20131105. http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf Alex Krizhevsky, Ilya Sutskever, and Geoffrey\u00a0E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems 25, F.\u00a0Pereira, C.\u00a0J.\u00a0C. Burges, L.\u00a0Bottou, and K.\u00a0Q. Weinberger (Eds.). 
Curran Associates, Inc., 1097\u20131105. http:\/\/papers.nips.cc\/paper\/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf"},{"key":"e_1_3_2_1_15_1","volume-title":"Proceedings Ninth IEEE International Conference on Computer Vision. 432\u2013439","volume":"1","author":"Lindeberg Laptev","year":"2003","unstructured":"Laptev and Lindeberg . 2003 . Space-time interest points . In Proceedings Ninth IEEE International Conference on Computer Vision. 432\u2013439 vol. 1 . https:\/\/doi.org\/10.1109\/ICCV.2003.1238378 10.1109\/ICCV.2003.1238378 Laptev and Lindeberg. 2003. Space-time interest points. In Proceedings Ninth IEEE International Conference on Computer Vision. 432\u2013439 vol.1. https:\/\/doi.org\/10.1109\/ICCV.2003.1238378"},{"key":"e_1_3_2_1_16_1","unstructured":"Ji Lin Chuang Gan and Song Han. 2018. Temporal Shift Module for Efficient Video Understanding. CoRR abs\/1811.08383(2018). arXiv:1811.08383http:\/\/arxiv.org\/abs\/1811.08383"},{"key":"e_1_3_2_1_17_1","unstructured":"Joe\u00a0Yue-Hei Ng Jonghyun Choi Jan Neumann and Larry\u00a0S. Davis. 2016. ActionFlowNet: Learning Motion Representation for Action Recognition. CoRR abs\/1612.03052(2016). arXiv:1612.03052http:\/\/arxiv.org\/abs\/1612.03052"},{"key":"e_1_3_2_1_18_1","doi-asserted-by":"crossref","unstructured":"Joe\u00a0Yue-Hei Ng Matthew\u00a0J. Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals Rajat Monga and George Toderici. 2015. Beyond Short Snippets: Deep Networks for Video Classification. CoRR abs\/1503.08909(2015). 
arXiv:1503.08909http:\/\/arxiv.org\/abs\/1503.08909","DOI":"10.1109\/CVPR.2015.7299101"},{"key":"e_1_3_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2008.4587727"},{"key":"e_1_3_2_1_20_1","unstructured":"Karen Simonyan and Andrew Zisserman. 2014. Two-Stream Convolutional Networks for Action Recognition in Videos. CoRR abs\/1406.2199(2014). arXiv:1406.2199http:\/\/arxiv.org\/abs\/1406.2199"},{"key":"e_1_3_2_1_21_1","unstructured":"Khurram Soomro Amir\u00a0Roshan Zamir and Mubarak Shah. 2012. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild. CoRR abs\/1212.0402(2012). arXiv:1212.0402http:\/\/arxiv.org\/abs\/1212.0402"},
{"key":"e_1_3_2_1_22_1","unstructured":"Limin Wang Yuanjun Xiong Zhe Wang and Yu Qiao. 2015. Towards Good Practices for Very Deep Two-Stream ConvNets. CoRR abs\/1507.02159(2015). arXiv:1507.02159http:\/\/arxiv.org\/abs\/1507.02159"},{"key":"e_1_3_2_1_23_1","first-page":"214","article-title":"A Duality Based Approach for Realtime TV-L1 Optical Flow","volume":"4713","author":"Zach Christopher","year":"2007","unstructured":"Christopher Zach , Thomas Pock , and Horst Bischof . 2007 . A Duality Based Approach for Realtime TV-L1 Optical Flow . Pattern Recognition 4713 , 214 \u2013 223 . https:\/\/doi.org\/10.1007\/978-3-540-74936-3_22 10.1007\/978-3-540-74936-3_22 Christopher Zach, Thomas Pock, and Horst Bischof. 2007. A Duality Based Approach for Realtime TV-L1 Optical Flow. Pattern Recognition 4713, 214\u2013223. https:\/\/doi.org\/10.1007\/978-3-540-74936-3_22","journal-title":"Pattern Recognition"},{"key":"e_1_3_2_1_24_1","unstructured":"Bowen Zhang Limin Wang Zhe Wang Yu Qiao and Hanli Wang. 2016. Real-time Action Recognition with Enhanced Motion Vector CNNs. CoRR abs\/1604.07669(2016). arXiv:1604.07669http:\/\/arxiv.org\/abs\/1604.07669"},{"key":"e_1_3_2_1_25_1","unstructured":"Yi Zhu Zhen-Zhong Lan Shawn\u00a0D. Newsam and Alexander\u00a0G. Hauptmann. 2017. Hidden Two-Stream Convolutional Networks for Action Recognition. CoRR abs\/1704.00389(2017). arXiv:1704.00389http:\/\/arxiv.org\/abs\/1704.00389"}],
"event":{"name":"SoICT 2022: The 11th International Symposium on Information and Communication Technology","acronym":"SoICT 2022","location":"Hanoi Vietnam"},"container-title":["The 11th International Symposium on Information and Communication Technology"],"original-title":[],"link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568562.3568653","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3568562.3568653","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:00:40Z","timestamp":1750186840000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3568562.3568653"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12]]},"references-count":26,"alternative-id":["10.1145\/3568562.3568653","10.1145\/3568562"],"URL":"https:\/\/doi.org\/10.1145\/3568562.3568653","relation":{},"subject":[],"published":{"date-parts":[[2022,12]]},"assertion":[{"value":"2022-12-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}