{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T16:21:33Z","timestamp":1778170893892,"version":"3.51.4"},"reference-count":41,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2024,1,18]],"date-time":"2024-01-18T00:00:00Z","timestamp":1705536000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Fundamental Research Funds for the Central Universities","award":["22120220666"],"award-info":[{"award-number":["22120220666"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>In the process of human\u2013robot collaborative assembly, robots need to recognize and predict human behaviors accurately, and then perform autonomous control and work route planning in real-time. To support the judgment of human intervention behaviors and meet the need of real-time human\u2013robot collaboration, the Fast Spatial\u2013Temporal Transformer Network (FST-Trans), an accurate prediction method of human assembly actions, is proposed. We tried to maximize the symmetry between the prediction results and the actual action while meeting the real-time requirement. With concise and efficient structural design, FST-Trans can learn about the spatial\u2013temporal interactions of human joints during assembly in the same latent space and capture more complex motion dynamics. Considering the inconsistent assembly rates of different individuals, the network is forced to learn more motion variations by introducing velocity\u2013acceleration loss, realizing accurate prediction of assembly actions. An assembly dataset was collected and constructed for detailed comparative experiments and ablation studies, and the experimental results demonstrate the effectiveness of the proposed method.<\/jats:p>","DOI":"10.3390\/sym16010118","type":"journal-article","created":{"date-parts":[[2024,1,18]],"date-time":"2024-01-18T11:28:46Z","timestamp":1705577326000},"page":"118","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["An Accurate Prediction Method of Human Assembly Motion for Human\u2013Robot Collaboration"],"prefix":"10.3390","volume":"16","author":[{"given":"Yangzheng","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering, Tongji University, Shanghai 200092, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1112-0895","authenticated-orcid":false,"given":"Liang","family":"Luo","sequence":"additional","affiliation":[{"name":"Sino-German College of Postgraduate Studies, Tongji University, Shanghai 200092, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5519-5990","authenticated-orcid":false,"given":"Pengzhong","family":"Li","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Tongji University, Shanghai 200092, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2024,1,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1109\/THMS.2021.3092684","article-title":"Cobots in industry 4.0: A roadmap for future practice studies on human\u2013robot collaboration","volume":"51","author":"Weiss","year":"2021","journal-title":"IEEE Trans. Hum. Mach. Syst."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1007\/978-3-030-14730-3_68","article-title":"A brief overview of the use of collaborative robots in industry 4.0: Human role and safety","volume":"202","author":"Costa","year":"2019","journal-title":"Occup. Environ. Saf. Health"},{"key":"ref_3","unstructured":"Goel, R., and Gupta, P. (2020). Sharp Business and Sustainable Development, Springer."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Lea, C., Vidal, R., Reiter, A., and Hager, G.D. (2016, January 8\u201310). Temporal convolutional networks: A unified approach to action segmentation. Proceedings of the Computer Vision\u2013ECCV 2016 Workshops, Amsterdam, The Netherlands. Part III 14.","DOI":"10.1007\/978-3-319-49409-8_7"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Choi, A., Jawed, M.K., and Joo, J. (2022, January 23\u201327). Preemptive motion planning for human-to-robot indirect placement handovers. Proceedings of the 2022 International Conference on Robotics and Automation, Philadelphia, PA, USA.","DOI":"10.1109\/ICRA46639.2022.9811558"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2394","DOI":"10.1109\/LRA.2018.2812906","article-title":"Human-aware robotic assistant for collaborative assembly: Integrating human motion prediction with planning in time","volume":"3","author":"Unhelkar","year":"2018","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Khawaja, F.I., Kanazawa, A., Kinugawa, J., and Kosuge, K. (2021). A Human-Following Motion Planning and Control Scheme for Collaborative Robots Based on Human Motion Prediction. Sensors, 21.","DOI":"10.20944\/preprints202111.0181.v1"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"6046","DOI":"10.1109\/LRA.2021.3086666","article-title":"Human posture prediction during physical human-robot interaction","volume":"6","author":"Vianello","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Cheng, Y., Zhao, W., Liu, C., and Tomizuka, M. (2019, January 10\u201312). Human motion prediction using semi-adaptable neural networks. Proceedings of the 2019 American Control Conference, Philadelphia, PA, USA.","DOI":"10.23919\/ACC.2019.8814980"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2602","DOI":"10.1109\/LRA.2020.2972874","article-title":"Towards efficient human-robot collaboration with robust plan recognition and trajectory prediction","volume":"5","author":"Cheng","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Fragkiadaki, K., Levine, S., Felsen, P., and Malik, J. (2015, January 7\u201313). Recurrent network models for human dynamics. Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.494"},{"key":"ref_12","unstructured":"Kratzer, P., Midlagajni, N.B., Toussaint, M., and Mainprice, J. (September, January 31). Anticipating human intention for full-body motion prediction in object grasping and placing tasks. Proceedings of the 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1016\/j.cirp.2020.04.077","article-title":"Recurrent neural network for motion trajectory prediction in human-robot collaborative assembly","volume":"69","author":"Zhang","year":"2020","journal-title":"CIRP Ann."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1109\/LRA.2020.3043163","article-title":"Multimodal deep generative models for trajectory prediction: A conditional variational autoencoder approach","volume":"6","author":"Ivanovic","year":"2020","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"1853","DOI":"10.1109\/TCDS.2022.3215093","article-title":"Efficient and collision-free human-robot collaboration based on intention and trajectory prediction","volume":"15","author":"Lyu","year":"2023","journal-title":"IEEE Trans. Cogn. Dev. Syst."},{"key":"ref_16","unstructured":"Kipf, T.N., and Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yan, S., Xiong, Y., and Lin, D. (2018, January 27). Spatial temporal graph convolutional networks for skeleton-based action recognition. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"ref_18","unstructured":"Mao, W., Liu, M., Salzmann, M., and Li, H. (November, January 27). Learning trajectory dependencies for human motion prediction. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Seoul, Republic of Korea."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Cui, Q., Sun, H., and Yang, F. (2020, January 19). Learning dynamic relationships for 3d human motion prediction. Proceedings of the 2020 IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00655"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Dang, L., Nie, Y., Long, C., Zhang, Q., and Li, G. (2021, January 10\u201317). Msr-gcn: Multi-scale residual graph convolution networks for human motion prediction. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01127"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Sofianos, T., Sampieri, A., Franco, L., and Galasso, F. (2021, January 10\u201317). Space-time-separable graph convolutional network for pose forecasting. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, QC, Canada.","DOI":"10.1109\/ICCV48922.2021.01102"},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, M., Chen, S., Zhang, Z., Xie, L., Tian, Q., and Zhang, Y. (2022, January 23\u201327). Skeleton-parted graph scattering networks for 3d human motion prediction. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20068-7_2"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Sampieri, A., di Melendugno, G.M.D.A., Avogaro, A., Cunico, F., Setti, F., Skenderi, G., Cristani, M., and Galasso, F. (2022, January 23\u201327). Pose forecasting in industrial human-robot collaboration. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-19839-7_4"},{"key":"ref_24","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, \u0141., and Polosukhin, I. (2017, January 4\u20139). Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lucas, T., Baradel, F., Weinzaepfel, P., and Rogez, G. (2022, January 23\u201327). Posegpt: Quantization-based 3d human motion generation and forecasting. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20068-7_24"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Mao, W., Liu, M., and Salzmann, M. (2022, January 18\u201324). Weakly-supervised action transition learning for stochastic human motion prediction. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA.","DOI":"10.1109\/CVPR52688.2022.00798"},{"key":"ref_27","unstructured":"Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., and Bermano, A.H. (2022). Human motion diffusion model. arXiv."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhu, W., Ma, X., Liu, Z., Liu, L., Wu, W., and Wang, Y. (2023, January 1\u20136). Motionbert: A unified perspective on learning human motion representations. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Paris, France.","DOI":"10.1109\/ICCV51070.2023.01385"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Yu, C., Ma, X., Ren, J., Zhao, H., and Yi, S. (2020, January 23\u201328). Spatio-temporal graph transformer networks for pedestrian trajectory prediction. Proceedings of the European Conference on Computer Vision, Glasgow, UK. Part XII 16.","DOI":"10.1007\/978-3-030-58610-2_30"},{"key":"ref_30","unstructured":"Akhter, I., Sheikh, Y., Khan, S., and Kanade, T. (2008, January 8\u201310). Nonrigid structure from motion in trajectory space. Proceedings of the 21st International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_31","unstructured":"Ba, J.L., Kiros, J.R., and Hinton, G.E. (2016). Layer normalization. arXiv."},{"key":"ref_32","unstructured":"Hendrycks, D., and Gimpel, K. (2016). Gaussian error linear units (gelus). arXiv."},{"key":"ref_33","unstructured":"Hayou, S., Clerico, E., He, B., Deligiannidis, G., Doucet, A., and Rousseau, J. (2021, January 13\u201315). Stable resnet. Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual."},{"key":"ref_34","unstructured":"He, B., Martens, J., Zhang, G., Botev, A., Brock, A., Smith, S.L., and Teh, Y.W. (2022, January 25\u201329). Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation. Proceedings of the Eleventh International Conference on Learning Representations, Virtual."},{"key":"ref_35","unstructured":"He, B., and Hofmann, T. (2023). Simplifying Transformer Blocks. arXiv."},{"key":"ref_36","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., and Antiga, L. (2019, January 8\u201314). Pytorch: An imperative style, high-performance deep learning library. Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Mohammed, W.M., Nejman, M., Casta\u00f1o, F., Lastra, J.L.M., Strzelczak, S., and Villalonga, A. (2020, January 10\u201312). Training an Under-actuated Gripper for Grasping Shallow Objects Using Reinforcement Learning. Proceedings of the 2020 IEEE Conference on Industrial Cyberphysical Systems (ICPS), Tampere, Finland.","DOI":"10.1109\/ICPS48405.2020.9274727"},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Goel, P., Mehta, S., Kumar, R., and Casta\u00f1o, F. (2022). Sustainable Green Human Resource management practices in educational institutions: An interpretive structural modelling and analytic hierarchy process approach. Sustainability, 14.","DOI":"10.3390\/su141912853"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"102227","DOI":"10.1016\/j.rcim.2021.102227","article-title":"A reinforcement learning method for human-robot collaboration in assembly tasks","volume":"73","author":"Zhang","year":"2022","journal-title":"Robot. Comput. Integr. Manuf."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"326","DOI":"10.1016\/j.jmsy.2020.06.018","article-title":"Reinforcement learning for facilitating human-robot-interaction in manufacturing","volume":"56","author":"Oliff","year":"2020","journal-title":"J. Manuf. Syst."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"El-Shamouty, M., Wu, X., Yang, S., Albus, M., and Huber, M.F. (August, January 31). Towards safe human-robot collaboration using deep reinforcement learning. Proceedings of the 2020 IEEE international conference on robotics and automation (ICRA), Paris, France.","DOI":"10.1109\/ICRA40945.2020.9196924"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/16\/1\/118\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T13:50:01Z","timestamp":1760104201000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/16\/1\/118"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,18]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,1]]}},"alternative-id":["sym16010118"],"URL":"https:\/\/doi.org\/10.3390\/sym16010118","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,18]]}}}