{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T13:20:11Z","timestamp":1774444811967,"version":"3.50.1"},"reference-count":40,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2022,7,29]],"date-time":"2022-07-29T00:00:00Z","timestamp":1659052800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,29]],"date-time":"2022-07-29T00:00:00Z","timestamp":1659052800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Key Program of NSFC","award":["U1908214"],"award-info":[{"award-number":["U1908214"]}]},{"DOI":"10.13039\/501100012269","name":"Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province","doi-asserted-by":"publisher","award":["LT2020015"],"award-info":[{"award-number":["LT2020015"]}],"id":[{"id":"10.13039\/501100012269","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Support Plan for Key Field Innovation Team of Dalian","award":["2021RT06"],"award-info":[{"award-number":["2021RT06"]}]},{"name":"the Science and Technology Innovation Fund of Dalian","award":["2020JJ25CY001"],"award-info":[{"award-number":["2020JJ25CY001"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Vis. Comput. Ind. Biomed. Art"],"published-print":{"date-parts":[[2022,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent years, human motion prediction has become an active research topic in computer vision. However, owing to the complexity and stochastic nature of human motion, it remains a challenging problem. 
In previous works, human motion prediction has always been treated as a typical inter-sequence problem, and most works have aimed to capture the temporal dependence between successive frames. However, although these approaches focused on the effects of the temporal dimension, they rarely considered the correlation between different joints in space. Thus, the spatio-temporal coupling of human joints is considered, to propose a novel spatio-temporal network based on a transformer and a graph convolutional network (GCN) (STTG-Net). The temporal transformer is used to capture the global temporal dependencies, and the spatial GCN module is used to establish local spatial correlations between the joints for each frame. To overcome the problems of error accumulation and discontinuity in the motion prediction, a revision method based on fusion strategy is also proposed, in which the current prediction frame is fused with the previous frame. The experimental results show that the proposed prediction method has less prediction error and the prediction motion is smoother than previous prediction methods. 
The effectiveness of the proposed method is also demonstrated by comparing it with the state-of-the-art method on the Human3.6\u2009M dataset.<\/jats:p>","DOI":"10.1186\/s42492-022-00112-5","type":"journal-article","created":{"date-parts":[[2022,7,29]],"date-time":"2022-07-29T10:04:33Z","timestamp":1659089073000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["STTG-net: a Spatio-temporal network for human motion prediction based on transformer and graph convolution network"],"prefix":"10.1186","volume":"5","author":[{"given":"Lujing","family":"Chen","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5200-640X","authenticated-orcid":false,"given":"Rui","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Xin","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Dongsheng","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Qiang","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Xiaopeng","family":"Wei","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,29]]},"reference":[{"key":"112_CR1","doi-asserted-by":"publisher","unstructured":"Wang H, Wang L (2017) Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks. Paper presented at 2017 IEEE conference on computer vision and pattern recognition, IEEE, Honolulu, 21-27 July 2017. https:\/\/doi.org\/10.1109\/CVPR.2017.387","DOI":"10.1109\/CVPR.2017.387"},{"key":"112_CR2","doi-asserted-by":"publisher","unstructured":"Liu J, Shahroudy A, Xu D, Kot AC, Wang G (2018) Skeleton-based action recognition using spatio-temporal LSTM network with trust gates. IEEE Trans Pattern Anal Mach Intelli, 40(12): 3007-3021. 
https:\/\/doi.org\/10.1109\/TPAMI.2017.2771306","DOI":"10.1109\/TPAMI.2017.2771306"},{"key":"112_CR3","doi-asserted-by":"publisher","unstructured":"Li C, Zhang Z, Lee W S, Lee G H (2018) Convolutional sequence to sequence model for human dynamics. Paper presented at 2018 IEEE\/CVF conference on computer vision and pattern recognition, IEEE, Salt Lake City, 18-23 June 2018.https:\/\/doi.org\/10.1109\/CVPR.2018.00548","DOI":"10.1109\/CVPR.2018.00548"},{"key":"112_CR4","doi-asserted-by":"publisher","unstructured":"Mao W, Liu MM, Salzmann M, Li HD (2019) Learning trajectory dependencies for human motion prediction. Paper presented at 2019 IEEE\/CVF international conference on computer vision, IEEE, Seoul, 27-28 October 2019.https:\/\/doi.org\/10.1109\/ICCV.2019.00958","DOI":"10.1109\/ICCV.2019.00958"},{"key":"112_CR5","doi-asserted-by":"publisher","unstructured":"Tanco LM, Hilton A (2000) Realistic synthesis of novel human movements from a database of motion capture examples. Paper presented at Workshop on Human Motion, IEEE, Austin, 7-8 December. https:\/\/doi.org\/10.1109\/HUMO.2000.897383","DOI":"10.1109\/HUMO.2000.897383"},{"key":"112_CR6","unstructured":"Pavlovic V, Rehg JM, MacCormick J (2000) Learning switching linear models of human motion. Paper presented at 13th international conference on neural information processing systems, MIT Press, Denver, 1 January 2000."},{"key":"112_CR7","doi-asserted-by":"publisher","unstructured":"Arikan O, Forsyth D A, O'Brien J F (2003) Motion synthesis from annotations. Paper presented at ACM SIGGRAPH, ACM, New York, 27-31 July 2003.https:\/\/doi.org\/10.1145\/1201775.882284","DOI":"10.1145\/1201775.882284"},{"key":"112_CR8","doi-asserted-by":"publisher","unstructured":"Treuille A, Lee Y, Popovi\u0107 Z (2007) Near-optimal character animation with continuous control. ACM Trans Graph 26(3):7-es. 
https:\/\/doi.org\/10.1145\/1275808.1276386","DOI":"10.1145\/1275808.1276386"},{"key":"112_CR9","doi-asserted-by":"publisher","unstructured":"Wang J M, Fleet D J, Hertzmann A (2007) Gaussian process dynamical models for human motion. IEEE Trans Pattern Anal Mach Intelli 30(2): 283-298.https:\/\/doi.org\/10.1109\/TPAMI.2007.1167","DOI":"10.1109\/TPAMI.2007.1167"},{"key":"112_CR10","doi-asserted-by":"publisher","unstructured":"Akhter I, Simon T, Khan S, Matthews I, Sheikh Y (2012) Bilinear spatiotemporal basis models. ACM Trans Graph 31(2): 17. https:\/\/doi.org\/10.1145\/2159516.2159523","DOI":"10.1145\/2159516.2159523"},{"key":"112_CR11","doi-asserted-by":"crossref","unstructured":"Taylor G W, Hinton G E, Roweis S T (2007) Modeling human motion using binary latent variables. Paper presented at 20th annual conference on neural information processing systems, MIT Press, Vancouver, 4-7 December 2006.","DOI":"10.7551\/mitpress\/7503.003.0173"},{"key":"112_CR12","doi-asserted-by":"publisher","unstructured":"Fragkiadaki K, Levine S, Felsen P, Malik J (2015) Recurrent Network Models for Human Dynamics. Paper presented at 2015 IEEE international conference on computer vision, IEEE, Santiago, 7-13 December 2015.https:\/\/doi.org\/10.1109\/ICCV.2015.494","DOI":"10.1109\/ICCV.2015.494"},{"key":"112_CR13","doi-asserted-by":"publisher","unstructured":"Jain A, Zamir A R, Savarese S, Saxena A (2016) Structural-RNN: Deep learning on spatio-temporal graphs. Paper presented at 2016 IEEE conference on computer vision and pattern recognition, IEEE, Las Vegas, 27-30 June 2016.https:\/\/doi.org\/10.1109\/CVPR.2016.573","DOI":"10.1109\/CVPR.2016.573"},{"key":"112_CR14","doi-asserted-by":"publisher","unstructured":"Martinez J, Black M J, Romero J (2017) On human motion prediction using recurrent neural networks. 
Paper presented at 2017 IEEE conference on computer vision and pattern recognition, IEEE, Honolulu, 21-26 July 2017.https:\/\/doi.org\/10.1109\/cvpr.2017.497","DOI":"10.1109\/cvpr.2017.497"},{"key":"112_CR15","unstructured":"Zhou Y, Li ZM, Xiao SJ, He C, Huang Z, Li H (2017) Auto-conditioned recurrent networks for extended complex human motion synthesis. Paper presented at 6th international conference on learning representations, OpenReview, Vancouver, 30 April-3 May 2017."},{"key":"112_CR16","doi-asserted-by":"publisher","unstructured":"Tang YL, Ma L, Liu W, Zheng WS (2018) Long-term human motion prediction by modeling motion context and enhancing motion dynamic. Paper presented at the 27th international joint conference on artificial intelligence, IJCAL, Stockholm, 13-19 July 2018. https:\/\/doi.org\/10.24963\/ijcai.2018\/130","DOI":"10.24963\/ijcai.2018\/130"},{"key":"112_CR17","doi-asserted-by":"publisher","unstructured":"Gopalakrishnan A, Mali A, Kifer D, Giles L, Ororbia AG (2019) A neural temporal model for human motion prediction. Paper presented at 2019 IEEE conference on computer vision and pattern recognition, IEEE, Long Beach, 15-20 June 2019.https:\/\/doi.org\/10.1109\/CVPR.2019.01239","DOI":"10.1109\/CVPR.2019.01239"},{"key":"112_CR18","doi-asserted-by":"publisher","unstructured":"Liu ZG, Wu S, Jin SY, Liu Q, Lu SJ, Zimmermann R et al (2019) Towards natural and accurate future motion prediction of humans and animals. Paper presented at 2019 IEEE conference on computer vision and pattern recognition, IEEE, Long Beach, 15-20 June 2019. https:\/\/doi.org\/10.1109\/CVPR.2019.01024","DOI":"10.1109\/CVPR.2019.01024"},{"key":"112_CR19","doi-asserted-by":"publisher","unstructured":"Corona E, Pumarola A, Aleny\u00e0 G, Moreno-Noguer F (2020) Context-aware Human Motion Prediction. 
Paper presented at 2019 IEEE\/CVF conference on computer vision and pattern recognition, IEEE, Seattle, 13-19 June 2020.https:\/\/doi.org\/10.1109\/CVPR42600.2020.00702","DOI":"10.1109\/CVPR42600.2020.00702"},{"key":"112_CR20","doi-asserted-by":"publisher","unstructured":"Adeli V, Adeli E, Reid I, Niebles JC, Rezatofighi H (2020) Socially and contextually aware human motion and pose forecasting. IEEE Robot Autom Lett 5(4): 6033-6040.https:\/\/doi.org\/10.1109\/LRA.2020.3010742","DOI":"10.1109\/LRA.2020.3010742"},{"key":"112_CR21","doi-asserted-by":"publisher","unstructured":"Guo X, Choi J (2019) Human Motion Prediction via Learning Local Structure Representations and Temporal Dependencies. Paper presented at the thirty-third AAAI conference on artificial intelligence and thirty-first innovative applications of artificial intelligence conference and ninth symposium on educational advances in artificial intelligence AAAI, Honolulu, 27 January-1 February 2019. https:\/\/doi.org\/10.1609\/aaai.v33i01.33012580","DOI":"10.1609\/aaai.v33i01.33012580"},{"key":"112_CR22","doi-asserted-by":"publisher","unstructured":"Li MS, Chen SH, Zhao YH, Zhang Y, Wang YF, Tian Q (2020) Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction. Paper presented at 2020 IEEE\/CVF conference on computer vision and pattern recognition, IEEE, Seattle, 13-19 June 2020. https:\/\/doi.org\/10.1109\/CVPR42600.2020.00029","DOI":"10.1109\/CVPR42600.2020.00029"},{"key":"112_CR23","doi-asserted-by":"publisher","unstructured":"Barsoum E, Kender J, Liu ZC (2018) HP-GAN: Probabilistic 3D Human Motion Prediction via GAN. 
Paper presented at 2018 IEEE\/CVF conference on computer vision and pattern recognition workshops, IEEE, Salt Lake, 18-22 June 2020.https:\/\/doi.org\/10.1109\/CVPRW.2018.00191","DOI":"10.1109\/CVPRW.2018.00191"},{"key":"112_CR24","doi-asserted-by":"publisher","unstructured":"Gui LY, Wang YX, Liang X, Moura JMF (2018) Adversarial Geometry-Aware Human Motion Prediction. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision-ECCV 2018. ECCV 2018. Lecture Notes in Computer Science(), vol 11208. Springer, Cham. https:\/\/doi.org\/10.1007\/978-3-030-01225-0_48","DOI":"10.1007\/978-3-030-01225-0_48"},{"key":"112_CR25","doi-asserted-by":"publisher","unstructured":"Wang BR, Adeli E, Chiu HK, Huang DA, Niebles JC (2019) Imitation learning for human pose prediction. Paper presented at 2019 IEEE international conference on computer vision, Seoul, 27 October-2 November 2019.https:\/\/doi.org\/10.1109\/ICCV.2019.00722","DOI":"10.1109\/ICCV.2019.00722"},{"key":"112_CR26","doi-asserted-by":"publisher","unstructured":"Pavllo D, Feichtenhofer C, Auli M, Grangier D (2020) Modeling human motion with quaternion-based neural networks. Int J Comput Vis 128(4): 855-872.https:\/\/doi.org\/10.1007\/s11263-019-01245-6","DOI":"10.1007\/s11263-019-01245-6"},{"key":"112_CR27","doi-asserted-by":"publisher","unstructured":"Mao W, Liu MM, Salzmann M (2020) History repeats itself: Human motion prediction via motion attention. Paper presented at 2020 16th European conference on computer vision, Springer, Cham, 23-28 August 2020.https:\/\/doi.org\/10.1007\/978-3-030-58568-6_28","DOI":"10.1007\/978-3-030-58568-6_28"},{"key":"112_CR28","doi-asserted-by":"publisher","unstructured":"Mao W, Liu MM, Salzmann M, Li HD (2021) Multi-level motion attention for human motion prediction. 
Int J Comput Vis 129(9): 2513-2535.https:\/\/doi.org\/10.1007\/s11263-021-01483-7","DOI":"10.1007\/s11263-021-01483-7"},{"key":"112_CR29","doi-asserted-by":"crossref","unstructured":"Hermes L, Hammer B, Schilling M (2021) Application of Graph Convolutions in a Lightweight Model for Skeletal Human Motion Forecasting. arXiv preprint arXiv:2110.04810. https:\/\/arxiv.org\/abs\/2110.04810","DOI":"10.14428\/esann\/2021.ES2021-145"},{"key":"112_CR30","doi-asserted-by":"publisher","unstructured":"Mart\u00ednez-Gonz\u00e1lez A, Villamizar M, Odobez J M (2021) Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers. Paper presented at 2021 IEEE\/CVF international conference on computer vision Workshops, IEEE, Montreal, 11-17 October 2021. https:\/\/doi.org\/10.1109\/ICCVW54120.2021.00257","DOI":"10.1109\/ICCVW54120.2021.00257"},{"key":"112_CR31","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al (2017) Attention is all you need. Paper presented at the 31st international conference on neural information processing systems, ACM, Long Beach, 4-9 December 2017."},{"key":"112_CR32","doi-asserted-by":"publisher","unstructured":"Jiang T, Camg\u00d6z NC, Bowden R (2021) Skeletor: Skeletal Transformers for Robust Body-Pose Estimation. Paper presented at 2021 IEEE\/CVF conference on computer vision and pattern recognition workshops, IEEE, Nashville, 19-25 June 2021.https:\/\/doi.org\/10.1109\/CVPRW53098.2021.00378","DOI":"10.1109\/CVPRW53098.2021.00378"},{"key":"112_CR33","unstructured":"Mao WA, Ge YT, Shen CH, Tian Z, Wang XL, Wang ZB (2021) Tfpose: Direct human pose estimation with transformers. arXiv preprint arXiv:2103.15320.https:\/\/arxiv.org\/abs\/2103.15320"},{"key":"112_CR34","doi-asserted-by":"publisher","unstructured":"Aksan E, Kaufmann M, Cao P, Hilliges O (2021) A Spatio-temporal Transformer for 3D Human Motion Prediction. 
Paper presented at the 2021 international conference on 3D Vision, IEEE, London, 1-3 December 2021.https:\/\/doi.org\/10.1109\/3DV53792.2021.00066","DOI":"10.1109\/3DV53792.2021.00066"},{"key":"112_CR35","unstructured":"Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. Paper presented at the 5th international conference on learning representations, OpenReview, Toulon, 24-26 April 2017."},{"key":"112_CR36","doi-asserted-by":"publisher","unstructured":"He KM, Zhang XY, Ren SQ, Sun J (2016) Deep residual learning for image recognition. Paper presented at 2016 IEEE conference on computer vision and pattern recognition, IEEE, Las Vegas, 27-30 June 2016.https:\/\/doi.org\/10.1109\/cvpr.2016.90","DOI":"10.1109\/cvpr.2016.90"},{"key":"112_CR37","unstructured":"Kingma DP, Ba J (2015) Adam: A Method for Stochastic Optimization. Paper presented at 3rd international conference on learning representations, ICLR, San Diego, 7-9 May 2015."},{"key":"112_CR38","doi-asserted-by":"publisher","unstructured":"Ionescu C, Papava D, Olaru V, Sminchisescu C (2014) Human3.6m: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Trans Pattern Anal Mach Intell 36(7): 1325-1339.https:\/\/doi.org\/10.1109\/TPAMI.2013.248","DOI":"10.1109\/TPAMI.2013.248"},{"key":"112_CR39","doi-asserted-by":"crossref","unstructured":"Liu ZG, Lyu K, Wu S, Chen HP, Hao YB, Ji SL (2021) Aggregated Multi-GANs for Controlled 3D Human Motion Prediction. Proc AAAI Conf Artif Intell 35(3): 2225-2232.","DOI":"10.1609\/aaai.v35i3.16321"},{"key":"112_CR40","doi-asserted-by":"publisher","unstructured":"Bourached A, Griffiths RR, Gray R, Jha A, Nachev P (2022) Generative Model-Enhanced Human Motion Prediction. Appl AI Lett 3(2):e63. 
https:\/\/doi.org\/10.1002\/ail2.63","DOI":"10.1002\/ail2.63"}],"container-title":["Visual Computing for Industry, Biomedicine, and Art"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42492-022-00112-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42492-022-00112-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42492-022-00112-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,25]],"date-time":"2023-11-25T03:33:05Z","timestamp":1700883185000},"score":1,"resource":{"primary":{"URL":"https:\/\/vciba.springeropen.com\/articles\/10.1186\/s42492-022-00112-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,29]]},"references-count":40,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2022,12]]}},"alternative-id":["112"],"URL":"https:\/\/doi.org\/10.1186\/s42492-022-00112-5","relation":{},"ISSN":["2524-4442"],"issn-type":[{"value":"2524-4442","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,29]]},"assertion":[{"value":"6 February 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 May 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 July 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
interests"}}],"article-number":"19"}}