{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:44:35Z","timestamp":1760147075447,"version":"build-2065373602"},"reference-count":39,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T00:00:00Z","timestamp":1673481600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>Human pose prediction is vital for robot applications such as human\u2013robot interaction and autonomous control of robots. Recent prediction methods often use deep learning and are based on a 3D human skeleton sequence to predict future poses. Even when the starting motions of two 3D human skeleton sequences are very similar, their future poses can differ. This makes it difficult to predict future poses from a given human skeleton sequence alone. Meanwhile, careful observation of human motions shows that they are often affected by objects or other people around the target person. We consider the presence of surrounding objects to be an important clue for the prediction. This paper proposes a method for predicting the future skeleton sequence by incorporating the surrounding situation into the prediction model. The proposed method uses a feature of an image of the area around the target person as the surrounding information. We confirmed the performance improvement of the proposed method through evaluations on publicly available datasets. 
As a result, the prediction accuracy was improved for object-related and human-related motions.<\/jats:p>","DOI":"10.3390\/s23020876","type":"journal-article","created":{"date-parts":[[2023,1,12]],"date-time":"2023-01-12T04:29:38Z","timestamp":1673497778000},"page":"876","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation"],"prefix":"10.3390","volume":"23","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3352-4225","authenticated-orcid":false,"given":"Tomohiro","family":"Fujita","sequence":"first","affiliation":[{"name":"Guardian Robot Project R-IH, RIKEN, Advanced Telecommunications Research Institute International, 3rd Floor, 2-2-2 Hikaridai, Seika-cho, Sorakugun, Kyoto 619-0288, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3799-4550","authenticated-orcid":false,"given":"Yasutomo","family":"Kawanishi","sequence":"additional","affiliation":[{"name":"Guardian Robot Project R-IH, RIKEN, Advanced Telecommunications Research Institute International, 3rd Floor, 2-2-2 Hikaridai, Seika-cho, Sorakugun, Kyoto 619-0288, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,12]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1007\/s12369-009-0037-z","article-title":"Probabilistic Autonomous Robot Navigation in Dynamic Environments with Human Motion Prediction","volume":"2","author":"Foka","year":"2010","journal-title":"Int. J. Soc. Robot."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Koppula, H.S., and Saxena, A. (2013, January 3\u20137). Anticipating human activities for reactive robotic response. 
Proceedings of the 2013 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan.","DOI":"10.1109\/IROS.2013.6696634"},{"key":"ref_3","unstructured":"Gong, H., Sim, J., Likhachev, M., and Shi, J. (2011, January 6\u201313). Multi-hypothesis motion planning for visual object tracking. Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1016\/j.jmsy.2017.04.009","article-title":"Human motion prediction for human-robot collaboration","volume":"44","author":"Liu","year":"2017","journal-title":"J. Manuf. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Gui, L.Y., Zhang, K., Wang, Y.X., Liang, X., Moura, J.M.F., and Veloso, M. (2018, January 1\u20135). Teaching Robots to Predict Human Motion. Proceedings of the 2018 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain.","DOI":"10.1109\/IROS.2018.8594452"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Brand, M., and Hertzmann, A. (2000, January 23\u201328). Style Machines. Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA.","DOI":"10.1145\/344779.344865"},{"key":"ref_7","unstructured":"Taylor, G.W., Hinton, G.E., and Roweis, S.T. (2006, January 4\u20137). Modeling Human Motion Using Binary Latent Variables. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Fragkiadaki, K., Levine, S., Felsen, P., and Malik, J. (2015, January 7\u201313). Recurrent network models for human dynamics. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.494"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Martinez, J., Black, M.J., and Romero, J. (2017, January 21\u201326). 
On human motion prediction using recurrent neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.497"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Tang, Y., Ma, L., Liu, W., and Zheng, W.S. (2018, January 13\u201319). Long-Term Human Motion Prediction by Modeling Motion Context and Enhancing Motion Dynamics. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, Stockholm, Sweden.","DOI":"10.24963\/ijcai.2018\/130"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Wang, B., Adeli, E., Chiu, H.K., Huang, D.A., and Niebles, J.C. (2019, October 27\u2013November 2). Imitation Learning for Human Pose Prediction. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00722"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Li, M., Chen, S., Zhao, Y., Zhang, Y., Wang, Y., and Tian, Q. (2020, January 14\u201319). Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00029"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Mao, W., Liu, M., Salzmann, M., and Li, H. (2019, October 27\u2013November 2). Learning Trajectory Dependencies for Human Motion Prediction. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00958"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Cui, Q., Sun, H., and Yang, F. (2020, January 14\u201319). Learning Dynamic Relationships for 3D Human Motion Prediction. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00655"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Sofianos, T., Sampieri, A., Franco, L., and Galasso, F. (2021, January 11\u201317). Space-Time-Separable Graph Convolutional Network for Pose Forecasting. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01102"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Dang, L., Nie, Y., Long, C., Zhang, Q., and Li, G. (2021, January 11\u201317). MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction. Proceedings of the 2021 IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01127"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Fujita, T., and Kawanishi, Y. (2022, January 21\u201325). Toward Surroundings-aware Temporal Prediction of 3D Human Skeleton Sequence. Proceedings of the Towards a Complete Analysis of People: From Face and Body to Clothes (T-CAP), Montreal, QC, Canada.","DOI":"10.1007\/978-3-031-37660-3_10"},{"key":"ref_18","unstructured":"Wang, J., Hertzmann, A., and Fleet, D.J. (2006, January 4\u20137). Gaussian Process Dynamical Models. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 2\u20137). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the Thirty-Second AAAI conference on artificial intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Chan, W., Tian, Z., and Wu, Y. (2020). Gas-gcn: Gated action-specific graph convolutional networks for skeleton-based action recognition. 
Sensors, 20.","DOI":"10.3390\/s20123499"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"38472","DOI":"10.1109\/ACCESS.2020.2973039","article-title":"Global Relation Reasoning Graph Convolutional Networks for Human Pose Estimation","volume":"8","author":"Wang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Azizi, N., Possegger, H., Rodol\u00e0, E., and Bischof, H. (2022, January 23\u201327). 3D Human Pose Estimation Using M\u00f6bius Graph Convolutional Networks. Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19769-7_10"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, Z., Jiang, Z., Feng, W., and Feng, H. (2020, January 6\u201310). OD-GCN: Object Detection Boosted by Knowledge GCN. Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops, London, UK.","DOI":"10.1109\/ICMEW46912.2020.9105952"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Li, Z., Du, X., and Cao, Y. (2020, January 1\u20135). GAR: Graph Assisted Reasoning for Object Detection. Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA.","DOI":"10.1109\/WACV45572.2020.9093559"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Chopin, B., Otberdout, N., Daoudi, M., and Bartolo, A. (2022). 3D Skeleton-based Human Motion Prediction with Manifold-Aware GAN. IEEE Trans. Biom. Behav. Identity Sci.","DOI":"10.1109\/TBIOM.2022.3215067"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Sampieri, A., di Melendugno, G.M.D., Avogaro, A., Cunico, F., Setti, F., Skenderi, G., Cristani, M., and Galasso, F. (2022, January 23\u201327). Pose Forecasting in Industrial Human-Robot Collaboration. 
Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19839-7_4"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Corona, E., Pumarola, A., Alenya, G., and Moreno-Noguer, F. (2020, January 14\u201319). Context-Aware Human Motion Prediction. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00702"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"6033","DOI":"10.1109\/LRA.2020.3010742","article-title":"Socially and Contextually Aware Human Motion and Pose Forecasting","volume":"5","author":"Adeli","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Chao, Y.W., Yang, J., Price, B., Cohen, S., and Deng, J. (2017, January 21\u201326). Forecasting human dynamics from static images. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.388"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Zhang, J.Y., Felsen, P., Kanazawa, A., and Malik, J. (2019, October 27\u2013November 2). Predicting 3D human dynamics from video. Proceedings of the 2019 IEEE\/CVF International Conference on Computer Vision, Seoul, Korea.","DOI":"10.1109\/ICCV.2019.00721"},{"key":"ref_31","unstructured":"Tan, M., and Le, Q. (2019, January 9\u201315). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"2684","DOI":"10.1109\/TPAMI.2019.2916873","article-title":"NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding","volume":"42","author":"Liu","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. 
Intell."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Liu, C., Hu, Y., Li, Y., Song, S., and Liu, J. (2017, January 27). PKU-MMD: A large scale benchmark for skeleton-based human action understanding. Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities, Mountain View, CA, USA.","DOI":"10.1145\/3132734.3132739"},{"key":"ref_34","unstructured":"Ultralytics (2022, November 21). YOLOv5. Available online: https:\/\/github.com\/ultralytics\/yolov5."},{"key":"ref_35","unstructured":"Kingma, D., and Ba, J. (2015, January 7\u20139). Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"1325","DOI":"10.1109\/TPAMI.2013.248","article-title":"Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments","volume":"36","author":"Ionescu","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Chen, C.H., and Ramanan, D. (2017, January 21\u201326). 3D Human Pose Estimation = 2D Pose Estimation + Matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.610"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Habibie, I., Xu, W., Mehta, D., Pons-Moll, G., and Theobalt, C. (2019, January 16\u201320). In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.01116"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Li, W., Liu, H., Tang, H., Wang, P., and Van Gool, L. (2022, January 18\u201324). MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation. 
Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.01280"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/2\/876\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:03:58Z","timestamp":1760119438000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/23\/2\/876"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,12]]},"references-count":39,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["s23020876"],"URL":"https:\/\/doi.org\/10.3390\/s23020876","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2023,1,12]]}}}