{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T16:13:51Z","timestamp":1769184831633,"version":"3.49.0"},"reference-count":58,"publisher":"MDPI AG","issue":"21","license":[{"start":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T00:00:00Z","timestamp":1667001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100004329","name":"Slovenian Research Agency","doi-asserted-by":"publisher","award":["P2-0069"],"award-info":[{"award-number":["P2-0069"]}],"id":[{"id":"10.13039\/501100004329","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004329","name":"Slovenian Research Agency","doi-asserted-by":"publisher","award":["6316-3\/2018-255"],"award-info":[{"award-number":["6316-3\/2018-255"]}],"id":[{"id":"10.13039\/501100004329","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004329","name":"Slovenian Research Agency","doi-asserted-by":"publisher","award":["603-1\/2018-16"],"award-info":[{"award-number":["603-1\/2018-16"]}],"id":[{"id":"10.13039\/501100004329","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Young Researcher Funding","award":["P2-0069"],"award-info":[{"award-number":["P2-0069"]}]},{"name":"Young Researcher Funding","award":["6316-3\/2018-255"],"award-info":[{"award-number":["6316-3\/2018-255"]}]},{"name":"Young Researcher Funding","award":["603-1\/2018-16"],"award-info":[{"award-number":["603-1\/2018-16"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"<jats:p>In order to recreate viable and human-like conversational responses, the artificial entity, i.e., an embodied conversational agent, must express correlated speech (verbal) and gestures (non-verbal) responses in spoken social interaction. 
Most of the existing frameworks focus on intent planning and behavior planning. The realization, however, is left to a limited set of static 3D representations of conversational expressions. In addition to functional and semantic synchrony between verbal and non-verbal signals, the final believability of the displayed expression is sculpted by the physical realization of non-verbal expressions. A major challenge of most conversational systems capable of reproducing gestures is the diversity in expressiveness. In this paper, we propose a method for capturing gestures automatically from videos and transforming them into 3D representations stored as part of the conversational agent\u2019s repository of motor skills. The main advantage of the proposed method is ensuring the naturalness of the embodied conversational agent\u2019s gestures, which results in a higher quality of human-computer interaction. The method is based on a Kanade\u2013Lucas\u2013Tomasi tracker, a Savitzky\u2013Golay filter, a Denavit\u2013Hartenberg-based kinematic model and the EVA framework. Furthermore, we designed an objective method based on cosine similarity instead of a subjective evaluation of synthesized movement. 
The proposed method resulted in a 96% similarity.<\/jats:p>","DOI":"10.3390\/s22218318","type":"journal-article","created":{"date-parts":[[2022,10,30]],"date-time":"2022-10-30T10:47:57Z","timestamp":1667126877000},"page":"8318","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kaneda\u2013Lucas\u2013Tomasi Tracker and Denavit\u2013Hartenberg-Based Kinematic Model"],"prefix":"10.3390","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6697-8931","authenticated-orcid":false,"given":"Grega","family":"Mo\u010dnik","sequence":"first","affiliation":[{"name":"Faculty of Electrical Engineering and Computer Science, University of Maribor, Koro\u0161ka c. 46, 2000 Maribor, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zdravko","family":"Ka\u010di\u010d","sequence":"additional","affiliation":[{"name":"Faculty of Electrical Engineering and Computer Science, University of Maribor, Koro\u0161ka c. 46, 2000 Maribor, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6856-7992","authenticated-orcid":false,"given":"Riko","family":"\u0160afari\u010d","sequence":"additional","affiliation":[{"name":"Faculty of Electrical Engineering and Computer Science, University of Maribor, Koro\u0161ka c. 46, 2000 Maribor, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4910-1879","authenticated-orcid":false,"given":"Izidor","family":"Mlakar","sequence":"additional","affiliation":[{"name":"Faculty of Electrical Engineering and Computer Science, University of Maribor, Koro\u0161ka c. 
46, 2000 Maribor, Slovenia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,10,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1016\/j.cognition.2018.04.003","article-title":"Communicative intent modulates production and comprehension of actions and gestures: A Kinect study","volume":"180","author":"Trujillo","year":"2018","journal-title":"Cognition"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"260","DOI":"10.1177\/0956797609357327","article-title":"Two Sides of the Same Coin: Speech and Gesture Mutually Interact to Enhance Comprehension","volume":"21","author":"Kelly","year":"2010","journal-title":"Psychol. Sci."},{"key":"ref_3","first-page":"67","article-title":"Embodied Conversational Agents: Representation and Intelligence in User Interfaces","volume":"22","author":"Cassell","year":"2001","journal-title":"AI Mag."},{"key":"ref_4","unstructured":"Birdwhistell, R.L. (2010). Kinesics and Context: Essays on Body Motion Communication, University of Pennsylvania Press."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"102409","DOI":"10.1016\/j.ijhcs.2020.102409","article-title":"Design Features of Embodied Conversational Agents in eHealth: A Literature Review","volume":"138","author":"Kramer","year":"2020","journal-title":"Int. J. Hum.-Comput. Stud."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1038\/s41746-019-0213-y","article-title":"Trust and acceptance of a virtual psychiatric interview between embodied conversational agents and outpatients","volume":"3","author":"Philip","year":"2020","journal-title":"NPJ Digit. Med."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ruttkay, Z. (2004). From Brows to Trust: Evaluating Embodied Conversational Agents, Kluwer Academic Publisher. 
Human-Computer Interaction Series.","DOI":"10.1007\/1-4020-2730-3"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1016\/j.engappai.2016.01.010","article-title":"Associating gesture expressivity with affective representations","volume":"51","author":"Malatesta","year":"2016","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1016\/j.chb.2017.03.041","article-title":"Assessment with computer agents that engage in conversational dialogues and trialogues with learners","volume":"76","author":"Graesser","year":"2017","journal-title":"Comput. Hum. Behav."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1742","DOI":"10.1109\/TVCG.2017.2690433","article-title":"Virtual Character Animation Based on Affordable Motion Capture and Reconfigurable Tangible Interfaces","volume":"24","author":"Lamberti","year":"2017","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1080\/09540091.2015.1130021","article-title":"What makes virtual agents believable?","volume":"28","author":"Bogdanovych","year":"2016","journal-title":"Connect. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3150976","article-title":"Perceptual Validation for the Generation of Expressive Movements from End-Effector Trajectories","volume":"8","author":"Carreno","year":"2018","journal-title":"ACM Trans. Interact. Intell. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Neff, M. (2018). Hand Gesture Synthesis for Conversational Characters. Handbook of Human Motion, Springer.","DOI":"10.1007\/978-3-319-14418-4_5"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Lee, J., and Marsella, S. (2006). 
Nonverbal Behavior Generator for Embodied Conversational Agents, Springer.","DOI":"10.1007\/11821830_20"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Bozkurt, E., Erzin, E., and Yemez, Y. (2015, June 29\u2013July 3). Affect-expressive hand gestures synthesis and animation. Proceedings of the 2015 IEEE International Conference on Multimedia and Expo (ICME), Turin, Italy.","DOI":"10.1109\/ICME.2015.7177478"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.engappai.2016.10.006","article-title":"The TTS-driven affective embodied conversational agent EVA, based on a novel conversational-behavior generation algorithm","volume":"57","author":"Rojc","year":"2017","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"546","DOI":"10.1109\/TAFFC.2017.2754365","article-title":"Audio-Driven Laughter Behavior Controller","volume":"8","author":"Ding","year":"2017","journal-title":"IEEE Trans. Affect. Comput."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Larboulette, C., and Gibet, S. (2016, January 5\u20136). I Am a Tree: Embodiment Using Physically Based Animation Driven by Expressive Descriptors of Motion. Proceedings of the 3rd International Symposium on Movement and Computing, Thessaloniki, Greece.","DOI":"10.1145\/2948910.2948939"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1109\/MCG.2017.3271459","article-title":"Animation of Natural Virtual Characters","volume":"37","author":"Neff","year":"2017","journal-title":"IEEE Comput. Graph. Appl."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Mlakar, I., Kacic, Z., Borko, M., Markus, A., and Rojc, M. (2019). 
Development of a Repository of Virtual 3D Conversational Gestures and Expressions, Springer.","DOI":"10.1007\/978-3-030-21507-1_16"},{"key":"ref_21","first-page":"15","article-title":"A Novel Realizer of Conversational Behavior for Affective and Personalized Human Machine Interaction\u2014EVA U-Realizer","volume":"14","author":"Mlakar","year":"2018","journal-title":"WSEAS Trans. Environ. Dev."},{"key":"ref_22","unstructured":"Sadoughi, N., and Busso, C. (2017). Speech-driven Animation with Meaningful Behaviors. arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1007\/978-3-642-15892-6_11","article-title":"Individualized Gesturing Outperforms Average Gesturing\u2014Evaluating Gesture Production in Virtual Humans","volume":"Volume 6356","author":"Allbeck","year":"2010","journal-title":"Intelligent Virtual Agents"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"82:1","DOI":"10.1145\/2601097.2601112","article-title":"Tangible and modular input device for character articulation","volume":"33","author":"Jacobson","year":"2014","journal-title":"ACM Trans. Graph."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"149","DOI":"10.1007\/s10055-018-0333-8","article-title":"Semantic framework for interactive animation generation and its application in virtual shadow play performance","volume":"22","author":"Liang","year":"2018","journal-title":"Virtual Real."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1111\/cgf.12325","article-title":"Interactive motion mapping for real-time character control","volume":"33","author":"Rhodin","year":"2014","journal-title":"Comput. Graph. Forum"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1339","DOI":"10.3758\/s13428-019-01319-w","article-title":"Motion capture-based animated characters for the study of speech\u2013gesture integration","volume":"52","author":"Nirme","year":"2020","journal-title":"Behav. Res. 
Methods"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"45651","DOI":"10.1109\/ACCESS.2019.2905879","article-title":"Fragmentation Guided Human Shape Reconstruction","volume":"7","author":"Zhang","year":"2019","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1618452.1618520","article-title":"Dynamic Shape Capture using Multi-View Photometric Stereo","volume":"28","author":"Vlasic","year":"2009","journal-title":"ACM Trans. Graph."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"17534","DOI":"10.1109\/ACCESS.2017.2743068","article-title":"Balancing and Reconstruction of Segmented Postures for Humanoid Robots in Imitation of Motion","volume":"5","author":"Lin","year":"2017","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2897824.2925969","article-title":"Fusion4D: Real-time performance capture of challenging scenes","volume":"35","author":"Dou","year":"2016","journal-title":"ACM Trans. Graph."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Slavcheva, M., Baust, M., Cremers, D., and Ilic, S. (2017, January 21\u201326). KillingFusion: Non-rigid 3D Reconstruction without Correspondences. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.581"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Leroy, V., Franco, J.-S., and Boyer, E. (2017, January 22\u201329). Multi-view Dynamic Shape Refinement Using Local Temporal Integration. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.336"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"8264","DOI":"10.1109\/ACCESS.2016.2629987","article-title":"Heterogeneous Multi-View Information Fusion: Review of 3-D Reconstruction Methods and a New Registration with Uncertainty Modeling","volume":"4","author":"Aliakbarpour","year":"2016","journal-title":"IEEE Access"},{"key":"ref_35","unstructured":"Pelachaud, C. (2015, January 4\u20138). Greta, an Interactive Expressive Embodied Conversational Agent. Proceedings of the 14th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2015), Istanbul, Turkey."},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Sun, X., Lichtenauer, J., Valstar, M., Nijholt, A., and Pantic, M. (2011). A Multimodal Database for Mimicry Analysis, Springer.","DOI":"10.1007\/978-3-642-24600-5_40"},{"key":"ref_37","unstructured":"Knight, D. (2011). Multimodality and Active Listenership: A Corpus Approach, Continuum. Research in Corpus and Discourse."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Rogez, G., Weinzaepfel, P., and Schmid, C. (2017, January 21\u201326). LCR-Net: Localization-Classification-Regression for Human Pose. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.134"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Zhou, X., Huang, Q., Sun, X., Xue, X., and Wei, Y. (2017, January 22\u201329). Towards 3D Human Pose Estimation in the Wild: A Weakly-Supervised Approach. 
Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy.","DOI":"10.1109\/ICCV.2017.51"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"14:1","DOI":"10.1145\/3311970","article-title":"LiveCap: Real-Time Human Performance Capture From Monocular Video","volume":"38","author":"Habermann","year":"2019","journal-title":"ACM Trans. Graph."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"143076","DOI":"10.1109\/ACCESS.2020.3013917","article-title":"An Adaptive Viewpoint Transformation Network for 3D Human Pose Estimation","volume":"8","author":"Liang","year":"2020","journal-title":"IEEE Access"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Cao, Z., Simon, T., Wei, S.-E., and Sheikh, Y. (2017, January 21\u201326). Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields. Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.143"},{"key":"ref_43","unstructured":"Marcus, G. (2018). Deep Learning: A Critical Appraisal. arXiv."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"143769","DOI":"10.1109\/ACCESS.2020.3014186","article-title":"Applying Pose Estimation to Predict Amateur Golf Swing Performance Using Edge Processing","volume":"8","author":"Kim","year":"2020","journal-title":"IEEE Access"},{"key":"ref_45","unstructured":"(2022, August 01). KLT: Kanade-Lucas-Tomasi Feature Tracker. Available online: https:\/\/cecas.clemson.edu\/~stb\/klt\/."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1115\/1.4011045","article-title":"A kinematic notation for lower-pair mechanisms based on matrices","volume":"22","author":"Denavit","year":"1955","journal-title":"Trans ASME E J. Appl. Mech."},{"key":"ref_47","unstructured":"Godler, J., and Urankar, D. (2022, August 01). Gospoda. 
Available online: https:\/\/www.youtube.com\/c\/Gospodapodcast."},{"key":"ref_48","unstructured":"Hanke, T. (2004, January 26\u201328). HamNoSys\u2014Representing Sign Language Data in Language Resources and Language Processing Contexts. Proceedings of the 4th International Conference on Language Resources and Evaluation, Lisbon, Portugal."},{"key":"ref_49","unstructured":"Shi, J. (1994, January 21\u201323). Good features to track. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition CVPR-94, Seattle, WA, USA."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Quan, M., Mu, B., and Chai, Z. (2019, January 18\u201320). IMRL: An Improved Inertial-Aided KLT Feature Tracker. Proceedings of the 2019 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM), Bangkok, Thailand.","DOI":"10.1109\/CIS-RAM47153.2019.9095829"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"52202","DOI":"10.1109\/ACCESS.2019.2912199","article-title":"Self-Similarity and Symmetry With SIFT for Multi-Modal Image Registration","volume":"7","author":"Lv","year":"2019","journal-title":"IEEE Access"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1021\/ac60214a047","article-title":"Smoothing and Differentiation of Data by Simplified Least Squares Procedures","volume":"36","author":"Savitzky","year":"1964","journal-title":"Anal. Chem."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"015003","DOI":"10.1117\/1.NPh.5.1.015003","article-title":"Motion artifact detection and correction in functional near-infrared spectroscopy: A new hybrid method based on spline interpolation method and Savitzky-Golay filtering","volume":"5","author":"Jahani","year":"2018","journal-title":"Neurophotonics"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1109\/MSP.2011.941097","article-title":"What Is a Savitzky-Golay Filter? 
[Lecture Notes]","volume":"28","author":"Schafer","year":"2011","journal-title":"IEEE Signal Process. Mag."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"e01053","DOI":"10.1016\/j.heliyon.2018.e01053","article-title":"Development of an 8DOF quadruped robot and implementation of Inverse Kinematics using Denavit-Hartenberg convention","volume":"4","author":"Atique","year":"2018","journal-title":"Heliyon"},{"key":"ref_56","unstructured":"R\u00f6der, T. (2007). Similarity, Retrieval, and Classification of Motion Capture Data. [Ph.D. Thesis, Rheinische Friedrich-Wilhelms-Universit\u00e4t]."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"559","DOI":"10.1145\/1015706.1015760","article-title":"Automated extraction and parameterization of motions in large data sets","volume":"23","author":"Kovar","year":"2004","journal-title":"ACM Trans. Graph."},{"key":"ref_58","doi-asserted-by":"crossref","unstructured":"Chen, S., Sun, Z., Li, Y., and Li, Q. (2012, January 23\u201325). Partial Similarity Human Motion Retrieval Based on Relative Geometry Features. 
Proceedings of the 2012 Fourth International Conference on Digital Home, Guangzhou, China.","DOI":"10.1109\/ICDH.2012.91"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/21\/8318\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T01:05:56Z","timestamp":1760144756000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/22\/21\/8318"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,10,29]]},"references-count":58,"journal-issue":{"issue":"21","published-online":{"date-parts":[[2022,11]]}},"alternative-id":["s22218318"],"URL":"https:\/\/doi.org\/10.3390\/s22218318","relation":{},"ISSN":["1424-8220"],"issn-type":[{"value":"1424-8220","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,10,29]]}}}