{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T14:21:16Z","timestamp":1753885276822,"version":"3.41.2"},"reference-count":22,"publisher":"World Scientific Pub Co Pte Ltd","issue":"05","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Model. Simul. Sci. Comput."],"published-print":{"date-parts":[[2023,10]]},"abstract":"<jats:p> Visual speech is hard to recreate by hand because animation itself is a time-consuming task: both precision and detail must be considered and must match the expectations of the developers and, above all, those of the audience. To solve this problem, some approaches have been designed to help accelerate the animation of characters' faces, such as procedural animation or speech-lip synchronization; the most common research areas for these methods are Computer Vision and Machine Learning. However, in general, these tools can have any of these main problems: difficulty in adapting to another language, subject or animation software, high hardware requirements, or results that can be perceived as robotic. Our work presents a Deep Learning model for automatic expressive facial animation using audio. We extract generic audio features from expressive audio speeches rich in phonemes for language-independent speech processing and emotion recognition. From videos used for training, we extracted the landmarks for frame-speech targeting and had the model learn animation for phoneme pronunciation. We evaluated four variants of our model (two loss functions, each with and without emotion conditioning) by using a user perception survey, where the variant using a Reconstruction Loss Function with emotion conditioning during training obtained more natural results and a higher synchronization score, with the approval of the majority of interviewees. 
For perception of naturalness, it obtained 38.89% of the total approval votes, and for language synchronization it obtained the highest average score, 65.55% (98.33 out of 150 total points), across English, German and Korean. <\/jats:p>","DOI":"10.1142\/s1793962324500028","type":"journal-article","created":{"date-parts":[[2022,11,28]],"date-time":"2022-11-28T07:03:25Z","timestamp":1669619005000},"source":"Crossref","is-referenced-by-count":0,"title":["Emotional 3D speech visualization from 2D audio visual data"],"prefix":"10.1142","volume":"14","author":[{"given":"Luis","family":"Guillermo","sequence":"first","affiliation":[{"name":"Universidad Peruana de Ciencias Aplicadas (UPC), Prolongaci\u00f3n Primavera 2390, Lima, Lima 15023, Peru"}]},{"given":"Jose-Maria","family":"Rojas","sequence":"additional","affiliation":[{"name":"Universidad Peruana de Ciencias Aplicadas (UPC), Prolongaci\u00f3n Primavera 2390, Lima, Lima 15023, Peru"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7510-618X","authenticated-orcid":false,"given":"Willy","family":"Ugarte","sequence":"additional","affiliation":[{"name":"Universidad Peruana de Ciencias Aplicadas (UPC), Prolongaci\u00f3n Primavera 2390, Lima, Lima 15023, 
Peru"}]}],"member":"219","published-online":{"date-parts":[[2022,11,26]]},"reference":[{"key":"S1793962324500028BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/MRA.2012.2192811"},{"key":"S1793962324500028BIB003","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073640"},{"key":"S1793962324500028BIB004","doi-asserted-by":"publisher","DOI":"10.1145\/3388767.3407339"},{"key":"S1793962324500028BIB005","doi-asserted-by":"publisher","DOI":"10.1145\/2897824.2925984"},{"key":"S1793962324500028BIB006","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073699"},{"key":"S1793962324500028BIB007","doi-asserted-by":"publisher","DOI":"10.1145\/3072959.3073658"},{"key":"S1793962324500028BIB008","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-58589-1_42"},{"key":"S1793962324500028BIB009","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.01034"},{"key":"S1793962324500028BIB010","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2766843"},{"key":"S1793962324500028BIB012","first-page":"1","volume":"37","author":"Zhou Y.","year":"2018","journal-title":"ACM Trans. 
Graph."},{"key":"S1793962324500028BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2887027"},{"key":"S1793962324500028BIB014","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2020.3022017"},{"key":"S1793962324500028BIB015","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107231"},{"key":"S1793962324500028BIB016","doi-asserted-by":"publisher","DOI":"10.3390\/s21041249"},{"volume-title":"Deep Learning","year":"2016","author":"Goodfellow I.","key":"S1793962324500028BIB017"},{"key":"S1793962324500028BIB018","doi-asserted-by":"publisher","DOI":"10.1117\/3.633187"},{"key":"S1793962324500028BIB019","series-title":"Machine Learning Mastery","volume-title":"Generative Adversarial Networks with Python: Deep Learning Generative Models for Image Synthesis and Image Translation","author":"Brownlee J.","year":"2019"},{"key":"S1793962324500028BIB020","series-title":"Machine Learning Mastery","volume-title":"Long Short-Term Memory Networks with Python: Develop Sequence Prediction Models with Deep Learning","author":"Brownlee J.","year":"2017"},{"key":"S1793962324500028BIB021","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2013.2279659"},{"key":"S1793962324500028BIB022","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12604-8_6"},{"key":"S1793962324500028BIB023","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2019.2935843"},{"key":"S1793962324500028BIB024","first-page":"1","volume-title":"2016 IEEE Symp. 
Series on Computational Intelligence (SSCI)","author":"Zhang L.","year":"2016"}],"container-title":["International Journal of Modeling, Simulation, and Scientific Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S1793962324500028","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,21]],"date-time":"2023-12-21T13:26:31Z","timestamp":1703165191000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/10.1142\/S1793962324500028"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,26]]},"references-count":22,"journal-issue":{"issue":"05","published-print":{"date-parts":[[2023,10]]}},"alternative-id":["10.1142\/S1793962324500028"],"URL":"https:\/\/doi.org\/10.1142\/s1793962324500028","relation":{},"ISSN":["1793-9623","1793-9615"],"issn-type":[{"type":"print","value":"1793-9623"},{"type":"electronic","value":"1793-9615"}],"subject":[],"published":{"date-parts":[[2022,11,26]]},"article-number":"2450002"}}