{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T12:48:20Z","timestamp":1763988500842,"version":"3.45.0"},"reference-count":28,"publisher":"National Library of Serbia","issue":"4","license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["ComSIS","COMPUT SCI INF SYST","COMPUT SCI INFORM SY","COMPUTER SCI INFORM","COMSIS J"],"published-print":{"date-parts":[[2025]]},"abstract":"<jats:p>This study aims to develop a real-time motion recognition system that translates skeletal human movements into a virtual environment. This will be achieved through the use of advanced tech-niques for the accurate capture of human skeletons and coordinate conversion. This paper investi-gates the acquisition and processing of motion data for virtual characters using depth cameras to obtain depth information. This study identifies six specific actions: left kick, right kick, left punch, right punch, squatting, and sitting. The experimental process successfully integrated RGB+D cameras, Media Pipe, and OpenCV into Unreal Engine models to capture and display human skeletal and joint positions in real-time. The experimental results show that the system achieved a precision of 100% for all motion detections, with an accuracy of more than 94%. 
However, the recall rate for specific actions was lower, reaching 88%.<\/jats:p>","DOI":"10.2298\/csis241002067l","type":"journal-article","created":{"date-parts":[[2025,9,30]],"date-time":"2025-09-30T12:38:46Z","timestamp":1759235926000},"page":"1687-1705","source":"Crossref","is-referenced-by-count":0,"title":["A study of real-time operations by converting human skeleton coordinates to digital avatars"],"prefix":"10.2298","volume":"22","author":[{"given":"Fei-Lung","family":"Lin","sequence":"first","affiliation":[{"name":"Institute of Technical and Vocational Education, National Taipei University of Technology, Taipei, Taiwan"}]},{"given":"Jui-Hung","family":"Kao","sequence":"additional","affiliation":[{"name":"Department of Information Management, Shih Hsin University, Taipei, Taiwan"}]},{"given":"Yu-Yu","family":"Yen","sequence":"additional","affiliation":[{"name":"Center of General Education, Shih Hsin University, Taipei, Taiwan + Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan"}]},{"given":"Kuan-Wen","family":"Liao","sequence":"additional","affiliation":[{"name":"Department of Information Management, Shih Hsin University, Taipei, Taiwan"}]},{"given":"Pu","family":"Huang","sequence":"additional","affiliation":[{"name":"School of political science and law, Shaoguan University, Shaoguan, China"}]}],"member":"1078","reference":[{"key":"ref1","unstructured":"Sathe, P.S., Tracking, Recognizing and Analyzing Human Exercise Activity. University of Akron.(2019)"},{"key":"ref2","unstructured":"Krizhevsky, A., I. Sutskever, and G.E. Hinton, ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems, 25.(2012)"},{"key":"ref3","doi-asserted-by":"crossref","unstructured":"Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Learning representations by backpropagating errors. Nature, 323(6088): p. 
533-536.(1986)","DOI":"10.1038\/323533a0"},{"key":"ref4","doi-asserted-by":"crossref","unstructured":"Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural computation. 9(8): p. 1735-1780.(1997)","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"ref5","doi-asserted-by":"crossref","unstructured":"Cho, K., et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.(2014)","DOI":"10.3115\/v1\/D14-1179"},{"key":"ref6","doi-asserted-by":"crossref","unstructured":"Tran, D., et al. Learning spatiotemporal features with 3d convolutional networks. in Proceedings of the IEEE international conference on computer vision. (2015)","DOI":"10.1109\/ICCV.2015.510"},{"key":"ref7","doi-asserted-by":"crossref","unstructured":"Carreira, J. and A. Zisserman. Quo vadis, action recognition? a new model and the kinetics dataset. in proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. (2017)","DOI":"10.1109\/CVPR.2017.502"},{"key":"ref8","doi-asserted-by":"crossref","unstructured":"Qiu, Z., T. Yao, and T. Mei. Learning spatio-temporal representation with pseudo-3d residual networks. in proceedings of the IEEE International Conference on Computer Vision. (2017)","DOI":"10.1109\/ICCV.2017.590"},{"key":"ref9","doi-asserted-by":"crossref","unstructured":"Xie, S., et al. Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification. in Proceedings of the European conference on computer vision (ECCV) (2018)","DOI":"10.1007\/978-3-030-01267-0_19"},{"key":"ref10","doi-asserted-by":"crossref","unstructured":"Long, J., E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. in Proceedings of the IEEE conference on computer vision and pattern recognition. (2015)","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref11","doi-asserted-by":"crossref","unstructured":"Zhou, B., et al. Temporal relational reasoning in videos. 
in Proceedings of the European conference on computer vision (ECCV).(2018)","DOI":"10.1007\/978-3-030-01246-5_49"},{"key":"ref12","doi-asserted-by":"crossref","unstructured":"Zolfaghari, M., K. Singh, and T. Brox. Eco: Efficient convolutional network for online video understanding. in Proceedings of the European conference on computer vision (ECCV).(2018)","DOI":"10.1007\/978-3-030-01216-8_43"},{"key":"ref13","doi-asserted-by":"crossref","unstructured":"Wang, L., et al. Temporal segment networks: Towards good practices for deep action recognition. in European conference on computer vision. Springer.(2016)","DOI":"10.1007\/978-3-319-46484-8_2"},{"key":"ref14","unstructured":"Bochkovskiy, A., C.-Y.Wang, and H.-Y.M. Liao, Yolov4: Optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934.(2020)"},{"key":"ref15","doi-asserted-by":"crossref","unstructured":"Liu, W., et al. Ssd: Single shot multibox detector. in European conference on computer vision. Springer.(2016)","DOI":"10.1007\/978-3-319-46448-0_2"},{"key":"ref16","doi-asserted-by":"crossref","unstructured":"Cao, Z., et al. Realtime multi-person 2d pose estimation using part affinity fields. in Proceedings of the IEEE conference on computer vision and pattern recognition. (2017)","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref17","doi-asserted-by":"crossref","unstructured":"Fang, H.-S., et al., Alphapose: Whole-body regional multi-person pose estimation and tracking in real-time. IEEE transactions on pattern analysis and machine intelligence, 45(6): p. 7157-7173.(2022)","DOI":"10.1109\/TPAMI.2022.3222784"},{"key":"ref18","doi-asserted-by":"crossref","unstructured":"Mathis, A., et al., DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature neuroscience, 21(9): p. 1281-1289.(2018)","DOI":"10.1038\/s41593-018-0209-y"},{"key":"ref19","doi-asserted-by":"crossref","unstructured":"Izadi, S., et al. 
Kinectfusion: real-time 3d reconstruction and interaction using a moving depth camera. in Proceedings of the 24th annual ACM symposium on User interface software and technology. (2011)","DOI":"10.1145\/2047196.2047270"},{"key":"ref20","unstructured":"Lugaresi, C., et al., Mediapipe: A framework for building perception pipelines. arXiv preprint arXiv:1906.08172, (2019)"},{"key":"ref21","unstructured":"Bradski, G. and A. Kaehler, Learning OpenCV: Computer vision with the OpenCV library. O\u2019Reilly Media, Inc.(2008)"},{"key":"ref22","doi-asserted-by":"crossref","unstructured":"Shoemake, K. Animating rotation with quaternion curves. in Proceedings of the 12th annual conference on Computer graphics and interactive techniques.(1985)","DOI":"10.1145\/325334.325242"},{"key":"ref23","doi-asserted-by":"crossref","unstructured":"Alvarado, E., D. Rohmer, and M.P. Cani. Generating Upper-Body Motion for Real-Time Characters Making their Way through Dynamic Environments. in Computer Graphics Forum.Wiley Online Library.(2022)","DOI":"10.1111\/cgf.14633"},{"key":"ref24","unstructured":"Edeline, K., et al., Using UDP for internet transport evolution. arXiv preprint arXiv:1612.07816. (2016)"},{"key":"ref25","doi-asserted-by":"crossref","unstructured":"Qiu, W., et al. Unrealcv: Virtual worlds for computer vision. in Proceedings of the 25th ACM international conference on Multimedia. (2017)","DOI":"10.1145\/3123266.3129396"},{"key":"ref26","doi-asserted-by":"crossref","unstructured":"Huang, X., et al., A systematic review of AR and VR enhanced language learning. Sustainability, 13(9): p. 4639.(2021)","DOI":"10.3390\/su13094639"},{"key":"ref27","unstructured":"Younes, M., Learning and simulation of sport strategies (boxing) for virtual reality training, Universit\u00e9 de Rennes.(2024)"},{"key":"ref28","doi-asserted-by":"crossref","unstructured":"Yan, Z. and J. Yi, Dissecting Latency in 360 Video Camera Sensing Systems. Sensors, 22(16): p. 
6001.(2022)","DOI":"10.3390\/s22166001"}],"container-title":["Computer Science and Information Systems"],"original-title":[],"language":"en","deposited":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T12:35:49Z","timestamp":1763987749000},"score":1,"resource":{"primary":{"URL":"https:\/\/doiserbia.nb.rs\/Article.aspx?ID=1820-02142500067L"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025]]},"references-count":28,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025]]}},"URL":"https:\/\/doi.org\/10.2298\/csis241002067l","relation":{},"ISSN":["1820-0214","2406-1018"],"issn-type":[{"type":"print","value":"1820-0214"},{"type":"electronic","value":"2406-1018"}],"subject":[],"published":{"date-parts":[[2025]]}}}