{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,27]],"date-time":"2025-11-27T16:20:05Z","timestamp":1764260405824,"version":"3.41.0"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100009092","name":"Universidad de Alicante","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100009092","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Virtual Reality"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Humans engage in a multitude of actions, some of which are rare but essential for data collection. Synthetic generation techniques are particularly effective in these scenarios, enriching the data for such uncommon actions. To address this need, we introduce a novel framework developed within Unreal Engine 5, designed to generate human action video data in hyper-realistic virtual environments. Our framework mitigates the scarcity and limited diversity of existing datasets for infrequent actions or routine tasks by utilizing synthetic motion generation through text-guided generative motion models, Gaussian splatting 3D reconstruction, and MetaHuman avatars. The utility of the framework is demonstrated by producing a synthetic video dataset depicting various human actions in diverse settings. To validate the effectiveness of the generated data, we trained VideoMAE, a state-of-the-art action recognition model, on the extended UCF101 dataset, incorporating both synthetic and real fall data, obtaining F1-scores of 0.95 and 0.97 when evaluated on the URFall and MCF datasets, respectively. The quality of the RGB-D videos generated represent a significant advance in the field. Additionally, a graph is generated from the rendered scene, detecting objects and their relationships, thus adding valuable contextual information to the video data. This capability to generate data across a wide range of actions and environments positions our framework as a valuable tool for broader applications, including digital twin creation and dataset augmentation.<\/jats:p>","DOI":"10.1007\/s10055-025-01146-9","type":"journal-article","created":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T05:54:39Z","timestamp":1745906079000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Unrealgensyn: a framework for generating synthetic videos of Unfrequent human events"],"prefix":"10.1007","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1712-7265","authenticated-orcid":false,"given":"David","family":"Mulero-P\u00e9rez","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8809-8476","authenticated-orcid":false,"given":"Manuel","family":"Benavent-Lledo","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7798-3055","authenticated-orcid":false,"given":"Jose","family":"Garcia-Rodriguez","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2799-491X","authenticated-orcid":false,"given":"Markus","family":"Vincze","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,4,29]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Ahn H, Ha T, Choi Y, Yoo H, Oh S (2018) Text2action: generative adversarial synthesis from language to action. In: 2018 IEEE ICRA. IEEE, pp 5915\u20135920","key":"1146_CR1","DOI":"10.1109\/ICRA.2018.8460608"},{"doi-asserted-by":"crossref","unstructured":"Ahuja C, Morency L-P (2019) Language2pose: natural language grounded pose forecasting. In: 2019 international conference on 3D Vision (3DV). IEEE, pp 719\u2013728","key":"1146_CR2","DOI":"10.1109\/3DV.2019.00084"},{"doi-asserted-by":"crossref","unstructured":"Athanasiou N, Petrovich M, Black MJ, Varol G (2022) Teach: temporal action composition for 3D humans. In: 2022 international conference on 3D vision (3DV). IEEE, pp 414\u2013423","key":"1146_CR3","DOI":"10.1109\/3DV57658.2022.00053"},{"unstructured":"Auvinet E, Rougier C, Meunier J, St-Arnaud A, Rousseau J (2010) Multiple cameras fall dataset. DIRO-Universit\u00e9 de Montr\u00e9al, Tech. Rep 1350, 24","key":"1146_CR4"},{"doi-asserted-by":"publisher","unstructured":"Binh\u00a0Do PN, Chi\u00a0Nguyen Q (2019) A review of stereo-photogrammetry method for 3-D reconstruction in computer vision. In: 2019 19th international symposium on communications and information technologies (ISCIT), pp 138\u2013143. https:\/\/doi.org\/10.1109\/ISCIT.2019.8905144","key":"1146_CR5","DOI":"10.1109\/ISCIT.2019.8905144"},{"doi-asserted-by":"crossref","unstructured":"Cai Z, Jiang J, Qing Z, et al (2024) Digital life project: autonomous 3D characters with social intelligence. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 582\u2013592","key":"1146_CR7","DOI":"10.1109\/CVPR52733.2024.00062"},{"doi-asserted-by":"crossref","unstructured":"Cen Z, Pi H, Peng S, Shen Z, Yang M, Zhu S, Bao H, Zhou X (2024) Generating human motion in 3D scenes from text descriptions. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1855\u20131866","key":"1146_CR9","DOI":"10.1109\/CVPR52733.2024.00182"},{"doi-asserted-by":"crossref","unstructured":"Cervantes P, Sekikawa Y, Sato I, Shinoda K (2022) Implicit neural representations for variable length human motion generation. In: ECCV. Springer, pp 356\u2013372","key":"1146_CR10","DOI":"10.1007\/978-3-031-19790-1_22"},{"unstructured":"Chen M, Zhou Y, Jian W, Wan P, Wang Z (2023) Temporal-aware refinement for video-based human pose and shape recovery. arXiv e-prints, 2311","key":"1146_CR11"},{"doi-asserted-by":"crossref","unstructured":"Chen S, Chen X, et al (2024) Ll3da: Visual interactive instruction tuning for omni-3d understanding reasoning and planning. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 26428\u201326438","key":"1146_CR6","DOI":"10.1109\/CVPR52733.2024.02496"},{"doi-asserted-by":"crossref","unstructured":"Cui J, Liu T, Liu N, Yang Y, Zhu Y, Huang S (2024) Anyskill: learning open-vocabulary physical skill for interactive agents. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 852\u2013862","key":"1146_CR8","DOI":"10.1109\/CVPR52733.2024.00087"},{"doi-asserted-by":"crossref","unstructured":"Esser P, Chiu J, Atighehchian P, Granskog J, Germanidis A (2023) Structure and content-guided video synthesis with diffusion models. arXiv e-prints, 2302","key":"1146_CR12","DOI":"10.1109\/ICCV51070.2023.00675"},{"doi-asserted-by":"crossref","unstructured":"Fiche G, Sevestre V, Gonzalez-Barral C, Leglaive S, S\u00e9guier R (2023) Swimxyz: a large-scale dataset of synthetic swimming motions and videos. In: Proceedings of the 16th ACM SIGGRAPH conference on motion, interaction and games, pp 1\u20137","key":"1146_CR13","DOI":"10.1145\/3623264.3624440"},{"doi-asserted-by":"crossref","unstructured":"Greff K, Belletti F, Beyer L, Doersch C, Du Y, Duckworth D, Fleet DJ, Gnanapragasam D, et al (2022) Kubric: a scalable dataset generator. In: 2022 IEEE\/CVF CVPR. IEEE, pp 3739\u20133751","key":"1146_CR14","DOI":"10.1109\/CVPR52688.2022.00373"},{"doi-asserted-by":"crossref","unstructured":"Gu Q, Kuwajerwala A, Jatavallabhula KM, Sen B, Agarwal A, Rivera C, Paul W, Chellappa R, Gan C, Melo CM, et al (2023) Conceptgraphs: open-vocabulary 3D scene graphs for perception and planning. In: 2nd workshop on language and robot learning: language as grounding","key":"1146_CR15","DOI":"10.1109\/ICRA57147.2024.10610243"},{"doi-asserted-by":"crossref","unstructured":"Guo C, Zou S, Zuo X, Wang S, Ji W, Li X, Cheng L (2022) Generating diverse and natural 3D human motions from text. In: IEEE\/CVF CVPR, pp 5152\u20135161","key":"1146_CR16","DOI":"10.1109\/CVPR52688.2022.00509"},{"issue":"4","key":"1146_CR17","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3528223.3530094","volume":"41","author":"F Hong","year":"2022","unstructured":"Hong F, Zhang M, Pan L, Cai Z, Yang L, Liu Z (2022) Avatarclip: zero-shot text-driven generation and animation of 3D avatars. ACM Trans Graph (TOG) 41(4):1\u201319","journal-title":"ACM Trans Graph (TOG)"},{"unstructured":"Jiang B, Chen X, Liu W, Yu J, Yu G, Chen T (2023) Motiongpt: human motion as a foreign language. Preprint at arXiv:2306.14795","key":"1146_CR18"},{"unstructured":"Kay W, Carreira J, Simonyan K, Zhang B, Hillier C, Vijayanarasimhan S, Viola F, Green T, Back T, Natsev P, et al (2017) The kinetics human action video dataset. Preprint at arXiv:1705.06950","key":"1146_CR20"},{"doi-asserted-by":"crossref","unstructured":"Ke B, Obukhov A, Huang S, Metzger N, Daudt RC, Schindler K (2024) Repurposing diffusion-based image generators for monocular depth estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9492\u20139502","key":"1146_CR25","DOI":"10.1109\/CVPR52733.2024.00907"},{"issue":"4","key":"1146_CR22","doi-asserted-by":"publisher","first-page":"139","DOI":"10.1145\/3592433","volume":"42","author":"B Kerbl","year":"2023","unstructured":"Kerbl B, Kopanas G, Leimk\u00fchler T, Drettakis G (2023) 3D Gaussian splatting for real-time radiance field rendering. ACM Trans Graph 42(4):139","journal-title":"ACM Trans Graph"},{"doi-asserted-by":"crossref","unstructured":"Khachatryan L, Movsisyan A, Tadevosyan V, Henschel R, Wang Z, Navasardyan S, Shi H (2023) Text2video-zero: text-to-image diffusion models are zero-shot video generators. Preprint at arXiv:2303.13439","key":"1146_CR24","DOI":"10.1109\/ICCV51070.2023.01462"},{"doi-asserted-by":"crossref","unstructured":"Kirillov A, Mintun E, Ravi N, Mao H, Rolland C, Gustafson L, Xiao T, Whitehead S, Berg AC, Lo W-Y, et al (2023) Segment anything. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 4015\u20134026","key":"1146_CR23","DOI":"10.1109\/ICCV51070.2023.00371"},{"doi-asserted-by":"crossref","unstructured":"Kocabas M, Athanasiou N, Black MJ (2020) Vibe: video inference for human body pose and shape estimation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5253\u20135263","key":"1146_CR19","DOI":"10.1109\/CVPR42600.2020.00530"},{"issue":"3","key":"1146_CR21","doi-asserted-by":"publisher","first-page":"489","DOI":"10.1016\/j.cmpb.2014.09.005","volume":"117","author":"B Kwolek","year":"2014","unstructured":"Kwolek B, Kepski M (2014) Human fall detection on embedded platform using depth maps and wireless accelerometer. Comput Methods Programs Biomed 117(3):489\u2013501","journal-title":"Comput Methods Programs Biomed"},{"unstructured":"Liu H, Li C, Wu Q, Lee YJ (2024) Visual instruction tuning. Adv Neural Inf Process Syst 36:34892\u201334916","key":"1146_CR28"},{"issue":"4","key":"1146_CR30","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3306346.3323020","volume":"38","author":"S Lombardi","year":"2019","unstructured":"Lombardi S, Simon T, Saragih J, Schwartz G, Lehrmann A, Sheikh Y (2019) Neural volumes: learning dynamic renderable volumes from images. ACM Trans Graph 38(4):1\u201314","journal-title":"ACM Trans Graph"},{"issue":"6","key":"1146_CR29","first-page":"248","volume":"34","author":"M Loper","year":"2015","unstructured":"Loper M, Mahmood N, Romero J, Pons-Moll G, Black MJ (2015) SMPL: a skinned multi-person linear model. ACM Trans Graph (Proc SIGGRAPH Asia) 34(6):248\u2013124816","journal-title":"ACM Trans Graph (Proc SIGGRAPH Asia)"},{"doi-asserted-by":"crossref","unstructured":"Lucas T, Baradel F, Weinzaepfel P, Rogez G (2022) Posegpt: quantization-based 3D human motion generation and forecasting. In: ECCV. Springer, pp 417\u2013435","key":"1146_CR26","DOI":"10.1007\/978-3-031-20068-7_24"},{"unstructured":"Luma AI (2024) Luma Unreal Engine plugin (0.4). https:\/\/lumaai.notion.site\/Luma-Unreal-Engine-Plugin-0-4-8005919d93444c008982346185e933a1. Accessed 06 Jan 2024","key":"1146_CR31"},{"doi-asserted-by":"crossref","unstructured":"Luo Z, Chen D, Zhang Y, Huang Y, Wang L, Shen Y, Zhao D, Zhou J, Tan T (2023) Videofusion: decomposed diffusion models for high-quality video generation. In: IEEE\/CVF CVPR","key":"1146_CR27","DOI":"10.1109\/CVPR52729.2023.10308948"},{"doi-asserted-by":"crossref","unstructured":"Martinez-Gonzalez P, Oprea S, Castro-Vargas JA, Garcia-Garcia A, Orts-Escolano S et al (2021) Unrealrox+: an improved tool for acquiring synthetic data from virtual 3D environments. In: 2021 IJCNN. IEEE, pp 1\u20138","key":"1146_CR33","DOI":"10.1109\/IJCNN52387.2021.9534447"},{"doi-asserted-by":"crossref","unstructured":"Miangoleh SMH, Dille S, Mai L, Paris S, Aksoy Y (2021) Boosting monocular depth estimation models to high-resolution via content-adaptive multi-resolution merging. In: Proceedings of CVPR","key":"1146_CR32","DOI":"10.1109\/CVPR46437.2021.00956"},{"issue":"1","key":"1146_CR35","doi-asserted-by":"publisher","first-page":"99","DOI":"10.1145\/3503250","volume":"65","author":"B Mildenhall","year":"2021","unstructured":"Mildenhall B, Srinivasan PP, Tancik M, Barron JT, Ramamoorthi R, Ng R (2021) Nerf: representing scenes as neural radiance fields for view synthesis. ACM Commun 65(1):99\u2013106","journal-title":"ACM Commun"},{"issue":"1","key":"1146_CR34","doi-asserted-by":"publisher","first-page":"23","DOI":"10.1007\/s42454-022-00041-x","volume":"4","author":"ME Miller","year":"2022","unstructured":"Miller ME, Spatz E (2022) A unified view of a human digital twin. Hum Intell Syst Integr 4(1):23\u201333","journal-title":"Hum Intell Syst Integr"},{"unstructured":"OpenAI Achiam J, Adler S et al (2024) GPT-4 Technical report. https:\/\/arxiv.org\/abs\/2303.08774","key":"1146_CR36"},{"unstructured":"Qu H, Guo Z, Liu J (2024) Gpt-connect: interaction between text-driven human motion generator and 3D scenes in a training-free manner. arXiv e-prints, 2403","key":"1146_CR37"},{"unstructured":"Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, et al (2021) Learning transferable visual models from natural language supervision. In: International conference on machine learning. PMLR, pp 8748\u20138763","key":"1146_CR38"},{"key":"1146_CR39","doi-asserted-by":"publisher","DOI":"10.1016\/j.cviu.2023.103907","volume":"240","author":"H Schieber","year":"2024","unstructured":"Schieber H, Demir KC, Kleinbeck C et al (2024) Indoor synthetic data generation: a systematic review. Comput Vis Image Underst 240:103907","journal-title":"Comput Vis Image Underst"},{"doi-asserted-by":"crossref","unstructured":"Schneider D, Keller M, Zhong Z, Peng K, Roitberg A, Beyerer J, Stiefelhagen R (2024) Synthact: towards generalizable human action recognition based on synthetic data. In: 2024 IEEE ICRA, pp 13038\u201313045","key":"1146_CR41","DOI":"10.1109\/ICRA57147.2024.10611486"},{"doi-asserted-by":"crossref","unstructured":"Sch\u00f6nberger JL, Zheng E, Pollefeys M, Frahm J-M (2016) Pixelwise view selection for unstructured multi-view stereo. In: European conference on computer vision (ECCV)","key":"1146_CR43","DOI":"10.1007\/978-3-319-46487-9_31"},{"doi-asserted-by":"crossref","unstructured":"Shin S, Kim J, Halilaj E, Black MJ (2023) Wham: reconstructing world-grounded humans with accurate 3D motion. Preprint at arXiv:2312.07531","key":"1146_CR40","DOI":"10.1109\/CVPR52733.2024.00202"},{"doi-asserted-by":"crossref","unstructured":"Sitzmann V, Thies J, Heide F, NieBner M, Wetzstein G, Zollhofer M (2019) Deepvoxels: learning persistent 3D feature embeddings. In: 2019 IEEE\/CVF CVPR. IEEE Computer Society, pp 2432\u20132441","key":"1146_CR42","DOI":"10.1109\/CVPR.2019.00254"},{"unstructured":"Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. CRCV-TR-12-01","key":"1146_CR44"},{"doi-asserted-by":"crossref","unstructured":"Tang Y, Liu J, Liu A, Yang B, Dai W, Rao Y, Lu J, Zhou J, Li X (2023) Flag3d: a 3D fitness activity dataset with language instruction. In: IEEE\/CVF CVPR, pp 22106\u201322117","key":"1146_CR46","DOI":"10.1109\/CVPR52729.2023.02117"},{"doi-asserted-by":"crossref","unstructured":"Tevet G, Gordon B, Hertz A, Bermano AH, Cohen-Or D (2022a) Motionclip: exposing human motion generation to clip space. In: ECCV. Springer, pp 358\u2013374","key":"1146_CR45","DOI":"10.1007\/978-3-031-20047-2_21"},{"unstructured":"Tevet G, Raab S, Gordon B et al (2022b) Human motion diffusion model. In: The eleventh international conference on learning representations","key":"1146_CR47"},{"key":"1146_CR48","first-page":"10078","volume":"35","author":"Z Tong","year":"2022","unstructured":"Tong Z, Song Y, Wang J, Wang L (2022) Videomae: masked autoencoders are data-efficient learners for self-supervised video pre-training. NIPS 35:10078\u201310093","journal-title":"NIPS"},{"key":"1146_CR49","doi-asserted-by":"publisher","DOI":"10.1016\/j.rcim.2023.102626","volume":"85","author":"B Wang","year":"2024","unstructured":"Wang B, Zhou H, Li X, Yang G, Zheng P, Song C, Yuan Y, Wuest T, Yang H, Wang L (2024) Human digital twin in the context of industry 5.0. Robot Comput Integr Manuf 85:102626","journal-title":"Robot Comput Integr Manuf"},{"unstructured":"Xu S, Wang Z, Wang Y-X, Gui L-Y (2024) Interdreamer: zero-shot text to 3D dynamic human-object interaction. Preprint at arXiv:2403.19652","key":"1146_CR51"},{"unstructured":"XVERSE Technology Inc (2024) XV3DGS-UEPlugin. https:\/\/github.com\/xverse-engine\/XV3DGS-UEPlugin, Zhenshen, China. Accessed 10 Jan 2024","key":"1146_CR50"},{"doi-asserted-by":"crossref","unstructured":"Yang F, Gu K, Yao A (2024) Kitro: Refining human mesh by 2d clues and kinematic-tree rotation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 1052\u20131061","key":"1146_CR52","DOI":"10.1109\/CVPR52733.2024.00106"}],"container-title":["Virtual Reality"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10055-025-01146-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10055-025-01146-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10055-025-01146-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,21]],"date-time":"2025-06-21T09:58:20Z","timestamp":1750499900000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10055-025-01146-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":52,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2025,6]]}},"alternative-id":["1146"],"URL":"https:\/\/doi.org\/10.1007\/s10055-025-01146-9","relation":{},"ISSN":["1434-9957"],"issn-type":[{"type":"electronic","value":"1434-9957"}],"subject":[],"published":{"date-parts":[[2025,4,29]]},"assertion":[{"value":"14 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 April 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 April 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no Conflict of interest","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"76"}}