{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T11:27:41Z","timestamp":1761650861923,"version":"build-2065373602"},"reference-count":55,"publisher":"Institution of Engineering and Technology (IET)","issue":"7","license":[{"start":{"date-parts":[[2024,7,11]],"date-time":"2024-07-11T00:00:00Z","timestamp":1720656000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["ietresearch.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["IET Computer Vision"],"published-print":{"date-parts":[[2024,10]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Human actions are predominantly presented in 2D format in video surveillance scenarios, which hinders the accurate determination of action details not apparent in 2D data. Depth estimation can aid human action recognition tasks, enhancing accuracy with neural networks. However, reliance on images for depth estimation requires extensive computational resources and cannot utilise the connectivity between human body structures. Besides, the depth information may not accurately reflect actual depth ranges, necessitating improved reliability. Therefore, a 2D human skeleton action recognition method with spatial constraints (2D\u2010SCHAR) is introduced. 2D\u2010SCHAR employs graph convolution networks to process graph\u2010structured human action skeleton data comprising three parts: depth estimation, spatial transformation, and action recognition. The initial two components, which infer 3D information from 2D human skeleton actions and generate spatial transformation parameters to correct abnormal deviations in action data, support the latter in the model to enhance the accuracy of action recognition. The model is designed in an end\u2010to\u2010end, multitasking manner, allowing parameter sharing among these three components to boost performance. The experimental results validate the model's effectiveness and superiority in human skeleton action recognition.<\/jats:p>","DOI":"10.1049\/cvi2.12296","type":"journal-article","created":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T00:09:09Z","timestamp":1720742949000},"page":"968-981","update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["2D human skeleton action recognition with spatial constraints"],"prefix":"10.1049","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0912-6059","authenticated-orcid":false,"given":"Lei","family":"Wang","sequence":"first","affiliation":[{"name":"School of Aeronautics and Astronautics Sichuan University  Chengdu Sichuan China"},{"name":"School of Aeronautical Manufacturing Industry Chengdu Aeronautic Polytechnic  Chengdu Sichuan China"}]},{"given":"Jianwei","family":"Zhang","sequence":"additional","affiliation":[{"name":"College of Computer Science Sichuan University  Chengdu Sichuan China"}]},{"given":"Wenbing","family":"Yang","sequence":"additional","affiliation":[{"name":"Chengdu Army Equipment Department The 3rd Office of Military Delegate  Chengdu Sichuan China"}]},{"given":"Song","family":"Gu","sequence":"additional","affiliation":[{"name":"School of Aeronautical Manufacturing Industry Chengdu Aeronautic Polytechnic  Chengdu Sichuan China"}]},{"given":"Shanmin","family":"Yang","sequence":"additional","affiliation":[{"name":"School of Computer Science Chengdu University of Information Technology  Chengdu Sichuan China"}]}],"member":"265","published-online":{"date-parts":[[2024,7,11]]},"reference":[{"key":"e_1_2_11_2_1","doi-asserted-by":"crossref","unstructured":"Zhou Z. Tulsiani S.:Sparsefusion: distilling view\u2010conditioned diffusion for 3d reconstruction 12588\u201312597(2023)","DOI":"10.1109\/CVPR52729.2023.01211"},{"key":"e_1_2_11_3_1","article-title":"Depth map prediction from a single image using a multi\u2010scale deep network","volume":"27","author":"Eigen D.","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_2_11_4_1","doi-asserted-by":"crossref","unstructured":"Lee J.H. Kim C.S.:Monocular depth estimation using relative depth maps 9729\u20139738(2019)","DOI":"10.1109\/CVPR.2019.00996"},{"key":"e_1_2_11_5_1","doi-asserted-by":"crossref","unstructured":"CS Kumar A. Bhandarkar S.M. Prasad M.:Depthnet: a recurrent neural network architecture for monocular depth prediction 283\u2013291(2018)","DOI":"10.1109\/CVPRW.2018.00066"},{"key":"e_1_2_11_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11633\u2010023\u20101458\u20100"},{"key":"e_1_2_11_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11263\u2010023\u201001799\u20106"},{"key":"e_1_2_11_8_1","first-page":"503","volume-title":"PMLR","author":"Guizilini V.","year":"2020"},{"key":"e_1_2_11_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/lra.2021.3061343"},{"key":"e_1_2_11_10_1","doi-asserted-by":"crossref","unstructured":"Poggi M. et\u00a0al.:On the uncertainty of self\u2010supervised monocular depth estimation 3227\u20133237(2020)","DOI":"10.1109\/CVPR42600.2020.00329"},{"key":"e_1_2_11_11_1","doi-asserted-by":"crossref","unstructured":"Johnston A. Carneiro G.:Self\u2010supervised monocular trained depth estimation using self\u2010attention and discrete disparity volume 4756\u20134765(2020)","DOI":"10.1109\/CVPR42600.2020.00481"},{"key":"e_1_2_11_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2017.8019532"},{"key":"e_1_2_11_13_1","doi-asserted-by":"crossref","unstructured":"Rossi M. et\u00a0al.:Joint graph\u2010based depth refinement and normal estimation 12154\u201312163(2020)","DOI":"10.1109\/CVPR42600.2020.01217"},{"key":"e_1_2_11_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/tcsvt.2019.2954948"},{"key":"e_1_2_11_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/access.2019.2961606"},{"key":"e_1_2_11_16_1","doi-asserted-by":"crossref","unstructured":"Chen Y. et\u00a0al.:Monopair: monocular 3d object detection using pairwise spatial relationships 12093\u201312102(2020)","DOI":"10.1109\/CVPR42600.2020.01211"},{"key":"e_1_2_11_17_1","doi-asserted-by":"crossref","unstructured":"Rey\u2010Area M. Yuan M. Richardt C.:360monodepth: high\u2010resolution 360deg monocular depth estimation 3762\u20133772(2022)","DOI":"10.1109\/CVPR52688.2022.00374"},{"key":"e_1_2_11_18_1","doi-asserted-by":"crossref","unstructured":"Yuan W. et\u00a0al.:Neural window fully\u2010connected crfs for monocular depth estimation 3916\u20133925(2022)","DOI":"10.1109\/CVPR52688.2022.00389"},{"key":"e_1_2_11_19_1","article-title":"Others. Spatial transformer networks","volume":"28","author":"Jaderberg M.","year":"2015","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_2_11_20_1","first-page":"17605","article-title":"Learning invariances in neural networks from training data","volume":"33","author":"Benton G.","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"e_1_2_11_21_1","first-page":"6086","volume-title":"Equivariant Transformer Networks","author":"Tai K.S.","year":"2019"},{"key":"e_1_2_11_22_1","doi-asserted-by":"crossref","unstructured":"Chaman A. Dokmanic I.:Truly shift\u2010invariant convolutional neural networks 3773\u20133783(2021)","DOI":"10.1109\/CVPR46437.2021.00377"},{"key":"e_1_2_11_23_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-19772-7_11"},{"key":"e_1_2_11_24_1","first-page":"3870","volume-title":"Video Anomaly Prediction: Problem, Dataset and Method","author":"Wang Y.","year":"2024"},{"key":"e_1_2_11_25_1","doi-asserted-by":"publisher","DOI":"10.1049\/iet\u2010cvi.2018.5020"},{"key":"e_1_2_11_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.02.027"},{"key":"e_1_2_11_27_1","doi-asserted-by":"crossref","unstructured":"Wu P. et\u00a0al.:Vadclip: adapting vision\u2010language models for weakly supervised video anomaly detection38 6074\u20136082(2024)","DOI":"10.1609\/aaai.v38i6.28423"},{"key":"e_1_2_11_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489\u2010021\u201002517\u2010w"},{"key":"e_1_2_11_29_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v36i3.20191"},{"key":"e_1_2_11_30_1","doi-asserted-by":"crossref","unstructured":"Zhou H. Liu Q. Wang Y.:Learning discriminative representations for skeleton based action recognition 10608\u201310617(2023)","DOI":"10.1109\/CVPR52729.2023.01022"},{"key":"e_1_2_11_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521\u2010023\u201008814\u20104"},{"key":"e_1_2_11_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.07.046"},{"key":"e_1_2_11_33_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.12328"},{"key":"e_1_2_11_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2019.2916873"},{"key":"e_1_2_11_35_1","doi-asserted-by":"crossref","unstructured":"Cao Z. et\u00a0al.:Realtime multi\u2010person 2d pose estimation using part affinity fields 7291\u20137299(2017)","DOI":"10.1109\/CVPR.2017.143"},{"key":"e_1_2_11_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2004.1334462"},{"key":"e_1_2_11_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2018.07.059"},{"key":"e_1_2_11_38_1","first-page":"23","article-title":"Human action recognition in videos using a robust CNN LSTM approach","author":"Orozco C.I.","year":"2020","journal-title":"Ciencia Tecnolog."},{"key":"e_1_2_11_39_1","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.5302"},{"key":"e_1_2_11_40_1","doi-asserted-by":"publisher","DOI":"10.3389\/fnins.2023.994517"},{"key":"e_1_2_11_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/tpami.2022.3222784"},{"key":"e_1_2_11_42_1","doi-asserted-by":"crossref","unstructured":"Wang C. et\u00a0al.:Mancs: a multi\u2010task attentional network with curriculum sampling for person re\u2010identification 365\u2013381(2018)","DOI":"10.1007\/978-3-030-01225-0_23"},{"key":"e_1_2_11_43_1","doi-asserted-by":"crossref","unstructured":"Li M. et\u00a0al.:Actional\u2010structural graph convolutional networks for skeleton\u2010based action recognition 3595\u20133603(2019)","DOI":"10.1109\/CVPR.2019.00371"},{"key":"e_1_2_11_44_1","doi-asserted-by":"crossref","unstructured":"Shi L. et\u00a0al.:Two\u2010stream adaptive graph convolutional networks for skeleton\u2010based action recognition 12026\u201312035(2019)","DOI":"10.1109\/CVPR.2019.01230"},{"key":"e_1_2_11_45_1","doi-asserted-by":"crossref","unstructured":"Shi L. et\u00a0al.:Skeleton\u2010based action recognition with directed graph neural networks 7912\u20137921(2019)","DOI":"10.1109\/CVPR.2019.00810"},{"key":"e_1_2_11_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/tip.2020.3028207"},{"key":"e_1_2_11_47_1","doi-asserted-by":"crossref","unstructured":"Cho S. et\u00a0al.:Self\u2010attention network for skeleton\u2010based human action recognition 635\u2013644(2020)","DOI":"10.1109\/WACV45572.2020.9093639"},{"key":"e_1_2_11_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2021.104141"},{"key":"e_1_2_11_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.02.001"},{"key":"e_1_2_11_50_1","doi-asserted-by":"publisher","DOI":"10.1049\/cit2.12012"},{"key":"e_1_2_11_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/tmm.2022.3168137"},{"key":"e_1_2_11_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10489\u2010022\u201003436\u20100"},{"key":"e_1_2_11_53_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00530\u2010023\u201001082\u20101"},{"key":"e_1_2_11_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/tvcg.2023.3247075"},{"key":"e_1_2_11_55_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2022.109231"},{"key":"e_1_2_11_56_1","doi-asserted-by":"crossref","unstructured":"Zhang J. et\u00a0al.:Mixste: seq2seq mixed spatio\u2010temporal encoder for 3d human pose estimation in video 13232\u201313242(2022)","DOI":"10.1109\/CVPR52688.2022.01288"}],"container-title":["IET Computer Vision"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/pdf\/10.1049\/cvi2.12296","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T11:23:44Z","timestamp":1761650624000},"score":1,"resource":{"primary":{"URL":"https:\/\/ietresearch.onlinelibrary.wiley.com\/doi\/10.1049\/cvi2.12296"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,11]]},"references-count":55,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2024,10]]}},"alternative-id":["10.1049\/cvi2.12296"],"URL":"https:\/\/doi.org\/10.1049\/cvi2.12296","archive":["Portico"],"relation":{},"ISSN":["1751-9632","1751-9640"],"issn-type":[{"type":"print","value":"1751-9632"},{"type":"electronic","value":"1751-9640"}],"subject":[],"published":{"date-parts":[[2024,7,11]]},"assertion":[{"value":"2024-02-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}