{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,9]],"date-time":"2026-01-09T18:52:14Z","timestamp":1767984734893,"version":"3.49.0"},"reference-count":34,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2019,3,20]],"date-time":"2019-03-20T00:00:00Z","timestamp":1553040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003725","name":"National Research Foundation of Korea","doi-asserted-by":"publisher","award":["NRF-2017M3C4A7069370"],"award-info":[{"award-number":["NRF-2017M3C4A7069370"]}],"id":[{"id":"10.13039\/501100003725","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In action recognition research, two primary types of information are appearance and motion information that is learned from RGB images through visual sensors. However, depending on the action characteristics, contextual information, such as the existence of specific objects or globally-shared information in the image, becomes vital information to define the action. For example, the existence of the ball is vital information distinguishing \u201ckicking\u201d from \u201crunning\u201d. Furthermore, some actions share typical global abstract poses, which can be used as a key to classify actions. Based on these observations, we propose the multi-stream network model, which incorporates spatial, temporal, and contextual cues in the image for action recognition. We experimented on the proposed method using C3D or inflated 3D ConvNet (I3D) as a backbone network, regarding two different action recognition datasets. As a result, we observed overall improvement in accuracy, demonstrating the effectiveness of our proposed method.<\/jats:p>","DOI":"10.3390\/s19061382","type":"journal-article","created":{"date-parts":[[2019,3,21]],"date-time":"2019-03-21T04:11:56Z","timestamp":1553141516000},"page":"1382","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Contextual Action Cues from Camera Sensor for Multi-Stream Action Recognition"],"prefix":"10.3390","volume":"19","author":[{"given":"Jongkwang","family":"Hong","sequence":"first","affiliation":[{"name":"Department of Computer Science, Yonsei University, Seoul 03722, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bora","family":"Cho","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Yonsei University, Seoul 03722, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yong Won","family":"Hong","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Yonsei University, Seoul 03722, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3082-3214","authenticated-orcid":false,"given":"Hyeran","family":"Byun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Yonsei University, Seoul 03722, Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,3,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1679","DOI":"10.1109\/TPAMI.2015.2496209","article-title":"Probabilistic Social Behavior Analysis by Exploring Body Motion-Based Patterns","volume":"38","author":"Roudposhti","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Lu, C., Shi, J., Wang, W., and Jia, J. (2018). Fast Abnormal Event Detection. Int. J. Comput. Vis.","DOI":"10.1007\/s11263-018-1129-8"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1016\/j.patcog.2017.01.030","article-title":"Accurate object detection using memory-based models in surveillance scenes","volume":"67","author":"Li","year":"2017","journal-title":"Pattern Recognit."},{"key":"ref_4","unstructured":"Simonyan, K., and Zisserman, A. (2014, January 8\u201313). Two-Stream Convolutional Networks for Action Recognition in Videos. Proceedings of the Advances in Neural Information Processing Systems 27, Annual Conference on Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Peng, X., and Schmid, C. (2016, January 11\u201314). Multi-region Two-Stream R-CNN for Action Detection. Proceedings of the Computer Vision\u2014ECCV 2016\u201414th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46493-0_45"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Nie, B.X., Xiong, C., and Zhu, S. (2015, January 7\u201312). Joint action recognition and pose estimation from video. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298734"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Du, W., Wang, Y., and Qiao, Y. (2017, January 22\u201329). RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy.","DOI":"10.1109\/ICCV.2017.402"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Chao, Y., Liu, Y., Liu, X., Zeng, H., and Deng, J. (2018, January 12\u201315). Learning to Detect Human-Object Interactions. Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA.","DOI":"10.1109\/WACV.2018.00048"},{"key":"ref_9","unstructured":"Sun, J., Wu, X., Yan, S., Cheong, L.F., Chua, T., and Li, J. (2009, January 20\u201325). Hierarchical spatio-temporal context modeling for action recognition. Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Torralba, A., Murphy, K.P., Freeman, W.T., and Rubin, M.A. (2003, January 14\u201317). Context-based vision system for place and object recognition. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), Nice, France.","DOI":"10.1109\/ICCV.2003.1238354"},{"key":"ref_11","unstructured":"Soomro, K., Zamir, A.R., and Shah, M. (arXiv, 2012). UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild, arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., and Paluri, M. (2015, January 7\u201313). Learning Spatiotemporal Features with 3D Convolutional Networks. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Carreira, J., and Zisserman, A. (2017, January 21\u201326). Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Sivic, J., and Zisserman, A. (2003, January 14\u201317). Video Google: A Text Retrieval Approach to Object Matching in Videos. Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV 2003), Nice, France.","DOI":"10.1109\/ICCV.2003.1238663"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, H., Kl\u00e4ser, A., Schmid, C., and Liu, C. (2011, January 20\u201325). Action recognition by dense trajectories. Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, CO, USA.","DOI":"10.1109\/CVPR.2011.5995407"},{"key":"ref_16","unstructured":"Dalal, N., and Triggs, B. (2005, January 20\u201326). Histograms of Oriented Gradients for Human Detection. Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, USA."},{"key":"ref_17","unstructured":"Zach, C., Pock, T., and Bischof, H. (2007, January 12\u201314). A Duality Based Approach for Realtime TV-L1 Optical Flow. Proceedings of the 29th DAGM Symposium on Pattern Recognition, Heidelberg, Germany."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Wang, L., Xiong, Y., Wang, Z., Qiao, Y., Lin, D., Tang, X., and Gool, L.V. (2016, January 11\u201314). Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. Proceedings of the Computer Vision\u2014ECCV 2016\u201414th European Conference, Amsterdam, The Netherlands.","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long Short-Term Memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_20","unstructured":"Ng, J.Y., Hausknecht, M.J., Vijayanarasimhan, S., Vinyals, O., Monga, R., and Toderici, G. (2015, January 7\u201312). Beyond short snippets: Deep networks for video classification. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), Boston, MA, USA."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Gammulle, H., Denman, S., Sridharan, S., and Fookes, C. (2017, January 24\u201331). Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition. Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV 2017), Santa Rosa, CA, USA.","DOI":"10.1109\/WACV.2017.27"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Qiu, Z., Yao, T., and Mei, T. (2017, January 22\u201329). Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy.","DOI":"10.1109\/ICCV.2017.590"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., and Paluri, M. (2018, January 18\u201322). A Closer Look at Spatiotemporal Convolutions for Action Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00675"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Chao, Y., Wang, Z., He, Y., Wang, J., and Deng, J. (2015, January 7\u201313). HICO: A Benchmark for Recognizing Human-Object Interactions in Images. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile.","DOI":"10.1109\/ICCV.2015.122"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Gkioxari, G., Girshick, R.B., Doll\u00e1r, P., and He, K. (2018, January 18\u201322). Detecting and Recognizing Human-Object Interactions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2018), Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00872"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Lin, T., Maire, M., Belongie, S.J., Hays, J., Perona, P., Ramanan, D., Doll\u00e1r, P., and Zitnick, C.L. (2014, January 6\u201312). Microsoft COCO: Common Objects in Context. Proceedings of the Computer Vision\u2014ECCV 2014\u201413th European Conference, Zurich, Switzerland.","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1137","DOI":"10.1109\/TPAMI.2016.2577031","article-title":"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks","volume":"39","author":"Ren","year":"2017","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"He, K., Gkioxari, G., Doll\u00e1r, P., and Girshick, R.B. (2017, January 22\u201329). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy.","DOI":"10.1109\/ICCV.2017.322"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Kuehne, H., Jhuang, H., Garrote, E., Poggio, T.A., and Serre, T. (2011, January 6\u201313). HMDB: A large video database for human motion recognition. Proceedings of the IEEE International Conference on Computer Vision (ICCV 2011), Barcelona, Spain.","DOI":"10.1109\/ICCV.2011.6126543"},{"key":"ref_30","unstructured":"Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., Viola, F., Green, T., Back, T., and Natsev, P. (arXiv, 2017). The Kinetics Human Action Video Dataset, arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., and Li, F. (2014, January 23\u201328). Large-Scale Video Classification with Convolutional Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA.","DOI":"10.1109\/CVPR.2014.223"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Zisserman, A. (2016, January 27\u201330). Convolutional Two-Stream Network Fusion for Video Action Recognition. Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA.","DOI":"10.1109\/CVPR.2016.213"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Feichtenhofer, C., Pinz, A., and Wildes, R.P. (2017, January 21\u201326). Spatiotemporal Multiplier Networks for Video Action Recognition. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.787"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/6\/1382\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:39:19Z","timestamp":1760186359000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/19\/6\/1382"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,3,20]]},"references-count":34,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2019,3]]}},"alternative-id":["s19061382"],"URL":"https:\/\/doi.org\/10.3390\/s19061382","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,3,20]]}}}