{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,13]],"date-time":"2026-06-13T19:33:49Z","timestamp":1781379229910,"version":"3.54.1"},"reference-count":104,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T00:00:00Z","timestamp":1654041600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Research Council of Norway","award":["#314690"],"award-info":[{"award-number":["#314690"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["BDCC"],"abstract":"<jats:p>When responding to allegations of child sexual, physical, and psychological abuse, Child Protection Service (CPS) workers and police personnel need to elicit detailed and accurate accounts of the abuse to assist in decision-making and prosecution. Current research emphasizes the importance of the interviewer\u2019s ability to follow empirically based guidelines. In doing so, it is essential to implement economical and scientific training courses for interviewers. Due to recent advances in artificial intelligence, we propose to generate a realistic and interactive child avatar, aiming to mimic a child. Our ongoing research involves the integration and interaction of different components with each other, including how to handle the language, auditory, emotional, and visual components of the avatar. This paper presents three subjective studies that investigate and compare various state-of-the-art methods for implementing multiple aspects of the child avatar. The first user study evaluates the whole system and shows that the system is well received by the expert and highlights the importance of its realism. 
The second user study investigates the emotional component and how it can be integrated with video and audio, and the third user study investigates realism in the auditory and visual components of the avatar created by different methods. The insights and feedback from these studies have contributed to the refined and improved architecture of the child avatar system which we present here.<\/jats:p>","DOI":"10.3390\/bdcc6020062","type":"journal-article","created":{"date-parts":[[2022,6,1]],"date-time":"2022-06-01T12:48:31Z","timestamp":1654087711000},"page":"62","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":28,"title":["Synthesizing a Talking Child Avatar to Train Interviewers Working with Maltreated Children"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9149-3829","authenticated-orcid":false,"given":"Pegah","family":"Salehi","sequence":"first","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3830-9869","authenticated-orcid":false,"given":"Syed Zohaib","family":"Hassan","sequence":"additional","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0501-7421","authenticated-orcid":false,"given":"Myrthe","family":"Lammerse","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Oslo Metropolitan University, 0130 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5348-8546","authenticated-orcid":false,"given":"Saeed Shafiee","family":"Sabet","sequence":"additional","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ingvild","family":"Riiser","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Oslo 
Metropolitan University, 0130 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2439-9397","authenticated-orcid":false,"given":"Ragnhild Klingenberg","family":"R\u00f8ed","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Oslo Metropolitan University, 0130 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Miriam S.","family":"Johnson","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Oslo Metropolitan University, 0130 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6026-0929","authenticated-orcid":false,"given":"Vajira","family":"Thambawita","sequence":"additional","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3332-1201","authenticated-orcid":false,"given":"Steven A.","family":"Hicks","sequence":"additional","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5092-1308","authenticated-orcid":false,"given":"Martine","family":"Powell","sequence":"additional","affiliation":[{"name":"Centre for Investigative Interviewing, Griffith Criminology Institute, Griffith University, Brisbane 4122, Australia"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6792-3526","authenticated-orcid":false,"given":"Michael E.","family":"Lamb","sequence":"additional","affiliation":[{"name":"Department of Psychology, University of Cambridge, Cambridge CB2 3RQ, UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1322-5316","authenticated-orcid":false,"given":"Gunn Astrid","family":"Baugerud","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Oslo Metropolitan University, 
0130 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2073-7029","authenticated-orcid":false,"given":"P\u00e5l","family":"Halvorsen","sequence":"additional","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"},{"name":"Department of Computer Science, Oslo Metropolitan University, 0130 Oslo, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3153-2064","authenticated-orcid":false,"given":"Michael A.","family":"Riegler","sequence":"additional","affiliation":[{"name":"SimulaMet, 0167 Oslo, Norway"},{"name":"Department of Computer Science, University of Troms\u00f8, 9037 Troms\u00f8, Norway"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"1968","published-online":{"date-parts":[[2022,6,1]]},"reference":[{"key":"ref_1","unstructured":"Sethi, D., Bellis, M., Hughes, K., Gilbert, R., Mitis, F., and Galea, G. (2013). European Report on Preventing Child Maltreatment, World Health Organization, Regional Office for Europe."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Widom, C.S. (2014). Longterm consequences of child maltreatment. Handbook of Child Maltreatment, Springer.","DOI":"10.1007\/978-94-007-7208-3_12"},{"key":"ref_3","unstructured":"World Health Organization (2009). Global Health Risks: Mortality and Burden of Disease Attributable to Selected Major Risks, World Health Organization."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Dixon, L., Perkins, D.F., Hamilton-Giachritsis, C., and Craig, L.A. (2017). 
The Wiley Handbook of What Works in Child Maltreatment: An Evidence-Based Approach to Assessment and Intervention in Child Protection, John Wiley & Sons.","DOI":"10.1002\/9781118976111"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1002\/acp.3511","article-title":"Forks in the road, routes chosen, and journeys that beckon: A selective review of scholarship on children\u2019s testimony","volume":"33","author":"Brown","year":"2019","journal-title":"Appl. Cogn. Psychol."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lamb, M.E., La Rooy, D.J., Malloy, L.C., and Katz, C. (2011). Children\u2019s Testimony: A Handbook of Psychological Research and Forensic Practice, John Wiley & Sons.","DOI":"10.1002\/9781119998495"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1016\/j.jpag.2017.12.011","article-title":"Interpretation of Medical Findings in Suspected Child Sexual Abuse: An Update for 2018","volume":"31","author":"Adams","year":"2018","journal-title":"J. Pediatr. Adolesc. Gynecol."},{"key":"ref_8","unstructured":"Newlin, C., Steele, L.C., Chamberlin, A., Anderson, J., Kenniston, J., Russell, A., Stewart, H., and Vaughan-Eden, V. (2015). Child Forensic Interviewing: Best Practices."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Lamb, M.E., Brown, D.A., Hershkowitz, I., Orbach, Y., and Esplin, P.W. (2018). Tell Me What Happened: Questioning Children about Abuse, John Wiley & Sons.","DOI":"10.1002\/9781118881248"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1201","DOI":"10.1016\/j.chiabu.2007.03.021","article-title":"A structured forensic interview protocol improves the quality and informativeness of investigative interviews with children: A review of research using the NICHD Investigative Interview Protocol","volume":"31","author":"Lamb","year":"2007","journal-title":"Child Abus. 
Negl."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"645","DOI":"10.1111\/ap.12468","article-title":"The origin, experimental basis, and application of the standard interview method: An information-gathering framework","volume":"55","author":"Powell","year":"2020","journal-title":"Aust. Psychol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1146\/annurev-lawsocsci-110413-030913","article-title":"Interviewing children","volume":"10","author":"Lyon","year":"2014","journal-title":"Annu. Rev. Law Soc. Sci."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1080\/15614263.2012.704170","article-title":"The relationship between investigative interviewing experience and open-ended question usage","volume":"15","author":"Powell","year":"2014","journal-title":"Police Pract. Res."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"710","DOI":"10.1037\/amp0000039","article-title":"Difficulties translating research on forensic interview practices to practitioners: Finding water, leading horses, but can we get them to drink?","volume":"71","author":"Lamb","year":"2016","journal-title":"Am. Psychol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1080\/10439463.2014.942850","article-title":"Improving child investigative interviewer performance through computer-based learning activities","volume":"26","author":"Powell","year":"2016","journal-title":"Polic. Soc."},{"key":"ref_16","first-page":"4","article-title":"Actors, avatars and agents: Potentials and implications of natural face technology for the creation of realistic visual presence","volume":"19","author":"Seymour","year":"2018","journal-title":"J. Assoc. Inf. Syst."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Hassan, S.Z., Salehi, P., R\u00f8ed, R.K., Halvorsen, P., Baugerud, G.A., Johnson, M.S., Lison, P., Riegler, M., Lamb, M.E., and Griwodz, C. (2022, January 14\u201317). 
Towards an AI-Driven Talking Avatar in Virtual Reality for Investigative Interviews of Children. Proceedings of the 2nd Edition of the Game Systems Workshop (GameSys \u201922), Athlone, Ireland.","DOI":"10.1145\/3534085.3534340"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Salehi, P., Hassan, S.Z., Sabet, S.S., Baugerud, G.A., Johnson, M.S., Riegler, M., and Halvorsen, P. (2022, January 27\u201330). Is More Realistic Better? A Comparison of Game Engine and GAN-based Avatars for Investigative Interviews of Children. Proceedings of the ICDAR Workshop, ACM ICMR 2022, Newark, NJ, USA.","DOI":"10.1145\/3512731.3534209"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.1016\/S0145-2134(00)00183-6","article-title":"Investigative interviews of child witnesses in Sweden","volume":"24","author":"Cederborg","year":"2000","journal-title":"Child Abus. Negl."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"654","DOI":"10.1002\/acp.3647","article-title":"Forensic interviews with preschool children: An analysis of extended interviews in Norway (2015\u20132017)","volume":"34","author":"Baugerud","year":"2020","journal-title":"Appl. Cogn. Psychol."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1111\/j.1467-9450.2006.00498.x","article-title":"Dynamics of verbal interaction between interviewer and child in interviews with alleged victims of child sexual abuse","volume":"47","author":"Korkman","year":"2006","journal-title":"Scand. J. Psychol."},{"key":"ref_22","first-page":"449","article-title":"Use of a structured investigative protocol enhances the quality of investigative interviews with alleged victims of child sexual abuse in Britain","volume":"23","author":"Lamb","year":"2009","journal-title":"Appl. Cogn. Psychol. Off. J. Soc. Appl. Res. Mem. 
Cogn."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1037\/law0000332","article-title":"Teaching child investigative interviewing skills: Long-term retention requires cumulative training","volume":"28","author":"Brubacher","year":"2021","journal-title":"Psychol. Public Policy Law"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1002\/acp.3316","article-title":"The Effects of Feedback and Reflection on the Questioning Style of Untrained Interviewers in Simulated Child Sexual Abuse Interviews","volume":"31","author":"Krause","year":"2017","journal-title":"Appl. Cogn. Psychol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"998","DOI":"10.3389\/fpsyg.2020.00998","article-title":"Online simulation training of child sexual abuse interviews with feedback improves interview quality in Japanese university students","volume":"11","author":"Haginoya","year":"2020","journal-title":"Front. Psychol."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"105013","DOI":"10.1016\/j.chiabu.2021.105013","article-title":"The combination of feedback and modeling in online simulation training of child sexual abuse interviews improves interview quality in clinical psychologists","volume":"115","author":"Haginoya","year":"2021","journal-title":"Child Abus. Negl."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1080\/1068316X.2014.915323","article-title":"Simulations of child sexual abuse interviews using avatars paired with feedback improves interview quality","volume":"21","author":"Pompedda","year":"2015","journal-title":"Psychol. Crime Law"},{"key":"ref_28","unstructured":"Mayer, J.D., and Salovey, P. (1997). What is emotional intelligence?. 
Emotional Development and Emotional Intelligence: Educational Implications, Basic Books."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1037\/a0017286","article-title":"Emotional intelligence: An integrative meta-analysis and cascading model","volume":"95","author":"Joseph","year":"2010","journal-title":"J. Appl. Psychol."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Hochschild, A.R. (2012). The Managed Heart: Commercialization of Human Feeling, University of California Press.","DOI":"10.1525\/9780520951853"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"410","DOI":"10.1080\/15228932.2016.1234143","article-title":"Emotional Intelligence in Police Interviews\u2014Approach, Training and the Usefulness of the Concept","volume":"16","author":"Risan","year":"2016","journal-title":"J. Forensic Psychol. Pract."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"231","DOI":"10.1177\/1049732317734828","article-title":"Walking Children Through a Minefield: How Professionals Experience Exploring Adverse Childhood Experiences","volume":"28","author":"Albaek","year":"2018","journal-title":"Qual. Health Res."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/BF00992253","article-title":"A new pan-cultural facial expression of emotion","volume":"10","author":"Ekman","year":"1986","journal-title":"Motiv. Emot."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/BF00993116","article-title":"The universality of a contempt expression: A replication","volume":"12","author":"Ekman","year":"1988","journal-title":"Motiv. Emot."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1007\/BF00992972","article-title":"More evidence for the universality of a contempt expression","volume":"16","author":"Matsumoto","year":"1992","journal-title":"Motiv. 
Emot."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1111\/j.1467-9507.2007.00388.x","article-title":"Maternal meta-emotion philosophy and adolescent depressive symptomatology","volume":"16","author":"Katz","year":"2007","journal-title":"Soc. Dev."},{"key":"ref_37","first-page":"10775595211063497","article-title":"Nonverbal Emotions While Disclosing Child Abuse: The Role of Interviewer Support","volume":"29","author":"Hershkowitz","year":"2021","journal-title":"Child Maltreatment"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1002\/jts.22087","article-title":"Numbing of Positive, Negative, and General Emotions: Associations with Trauma Exposure, Posttraumatic Stress, and Depressive Symptoms Among Justice-Involved Youth: Numbing of Positive, Negative, or General Emotions","volume":"29","author":"Kerig","year":"2016","journal-title":"J. Trauma. Stress"},{"key":"ref_39","first-page":"55","article-title":"Recent trends in deep learning based natural language processing","volume":"13","author":"Young","year":"2018","journal-title":"CIM"},{"key":"ref_40","unstructured":"Vinyals, O., and Le, Q. (2015). A neural conversational model. arXiv."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Zhou, H., Huang, M., Zhang, T., Zhu, X., and Liu, B. (2018, January 2\u20137). Emotional chatting machine: Emotional conversation generation with internal and external memory. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11325"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Zhou, L., Gao, J., Li, D., and Shum, H.Y. (2019). The Design and Implementation of XiaoIce, an Empathetic Social Chatbot. arXiv.","DOI":"10.1162\/coli_a_00368"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Li, J., Galley, M., Brockett, C., Spithourakis, G.P., Gao, J., and Dolan, B. (2016). A persona-based neural conversation model. 
arXiv.","DOI":"10.18653\/v1\/P16-1094"},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Tachibana, H., Uenoyama, K., and Aihara, S. (2018, January 15\u201320). Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461829"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Amberkar, A., Awasarmol, P., Deshmukh, G., and Dave, P. (2018, January 1\u20133). Speech recognition using recurrent neural networks. Proceedings of the 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), Coimbatore, India.","DOI":"10.1109\/ICCTCT.2018.8551185"},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"2410","DOI":"10.1109\/TASLP.2017.2756440","article-title":"Toward Human Parity in Conversational Speech Recognition","volume":"25","author":"Xiong","year":"2017","journal-title":"IEEE\/ACM Trans. Audio Speech Lang. Proc."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Park, D.S., Chan, W., Zhang, Y., Chiu, C.C., Zoph, B., Cubuk, E.D., and Le, Q.V. (2019). Specaugment: A simple data augmentation method for automatic speech recognition. arXiv.","DOI":"10.21437\/Interspeech.2019-2680"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Sadjadi, O., Greenberg, C., Singer, E., Mason, L., and Reynolds, D. (2021). NIST 2021 Speaker Recognition Evaluation Plan.","DOI":"10.21437\/Odyssey.2022-45"},{"key":"ref_49","unstructured":"Zhang, Y., Qin, J., Park, D.S., Han, W., Chiu, C.C., Pang, R., Le, Q.V., and Wu, Y. (2020). Pushing the limits of semi-supervised learning for automatic speech recognition. arXiv."},{"key":"ref_50","doi-asserted-by":"crossref","unstructured":"Chung, Y.A., Zhang, Y., Han, W., Chiu, C.C., Qin, J., Pang, R., and Wu, Y. (2021). 
W2v-bert: Combining contrastive learning and masked language modeling for self-supervised speech pre-training. arXiv.","DOI":"10.1109\/ASRU51503.2021.9688253"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1007\/s10579-008-9076-6","article-title":"IEMOCAP: Interactive emotional dyadic motion capture database","volume":"42","author":"Busso","year":"2008","journal-title":"Lang. Resour. Eval."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"312","DOI":"10.1016\/j.bspc.2018.08.035","article-title":"Speech emotion recognition using deep 1D & 2D CNN LSTM networks","volume":"47","author":"Zhao","year":"2019","journal-title":"Biomed. Signal Process. Control"},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1016\/j.neunet.2017.02.013","article-title":"Evaluating deep learning architectures for Speech Emotion Recognition","volume":"92","author":"Fayek","year":"2017","journal-title":"Neural Netw."},{"key":"ref_54","doi-asserted-by":"crossref","unstructured":"Shen, J., Pang, R., Weiss, R.J., Schuster, M., Jaitly, N., Yang, Z., Chen, Z., Zhang, Y., Wang, Y., and Skerry-Ryan, R. (2018, January 15\u201320). Natural tts synthesis by conditioning wavenet on mel spectrogram predictions. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada.","DOI":"10.1109\/ICASSP.2018.8461368"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Prenger, R., Valle, R., and Catanzaro, B. (2019, January 12\u201319). Waveglow: A flow-based generative network for speech synthesis. Proceedings of the ICASSP 2019\u20132019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK.","DOI":"10.1109\/ICASSP.2019.8683143"},{"key":"ref_56","unstructured":"Kalchbrenner, N., Elsen, E., Simonyan, K., Noury, S., Casagrande, N., Lockhart, E., Stimberg, F., Oord, A.V.D., Dieleman, S., and Kavukcuoglu, K. (2018). 
Efficient Neural Audio Synthesis. arXiv."},{"key":"ref_57","first-page":"2672","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow","year":"2014","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_58","unstructured":"Karras, T., Aila, T., Laine, S., and Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and variation. arXiv."},{"key":"ref_59","doi-asserted-by":"crossref","unstructured":"Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., and Aila, T. (2020, January 14\u201319). Analyzing and improving the image quality of stylegan. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.00813"},{"key":"ref_60","doi-asserted-by":"crossref","unstructured":"Choi, Y., Choi, M., Kim, M., Ha, J.W., Kim, S., and Choo, J. (2018, January 18\u201322). Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00916"},{"key":"ref_61","unstructured":"Agarwal, S., Farid, H., Gu, Y., He, M., Nagano, K., and Li, H. (2019, January 16\u201320). Protecting World Leaders Against Deep Fakes. Proceedings of the CVPR Workshops, Long Beach, CA, USA."},{"key":"ref_62","unstructured":"Perov, I., Gao, D., Chervoniy, N., Liu, K., Marangonda, S., Um\u00e9, C., Dpfks, M., Facenheim, C.S., RP, L., and Jiang, J. (2020). DeepFaceLab: Integrated, flexible and extensible face-swapping framework. arXiv."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3072959.3073640","article-title":"Synthesizing obama: Learning lip sync from audio","volume":"36","author":"Suwajanakorn","year":"2017","journal-title":"ACM Trans. Graph. ToG"},{"key":"ref_64","doi-asserted-by":"crossref","unstructured":"Chen, L., Maddox, R.K., Duan, Z., and Xu, C. 
(2019, January 15\u201320). Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00802"},{"key":"ref_65","first-page":"1","article-title":"MakeItTalk: Speaker-aware talking-head animation","volume":"39","author":"Zhou","year":"2020","journal-title":"ACM Trans. Graph. TOG"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Meshry, M., Suri, S., Davis, L.S., and Shrivastava, A. (2021, January 11\u201317). Learned Spatial Representations for Few-shot Talking-Head Synthesis. Proceedings of the IEEE\/CVF International Conference on Computer Vision, Montreal, BC, Canada.","DOI":"10.1109\/ICCV48922.2021.01357"},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3478513.3480484","article-title":"Live speech portraits: Real-time photorealistic talking-head animation","volume":"40","author":"Lu","year":"2021","journal-title":"ACM Trans. Graph. TOG"},{"key":"ref_68","unstructured":"Yi, R., Ye, Z., Zhang, J., Bao, H., and Liu, Y.J. (2020). Audio-driven talking face video generation with learning-based personalized head pose. arXiv."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Thies, J., Elgharib, M., Tewari, A., Theobalt, C., and Nie\u00dfner, M. (2020, January 23\u201328). Neural voice puppetry: Audio-driven facial reenactment. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58517-4_42"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Chen, L., Cui, G., Liu, C., Li, Z., Kou, Z., Xu, Y., and Xu, C. (2020, January 23\u201328). Talking-head generation with rhythmic head motion. 
Proceedings of the European Conference on Computer Vision, Glasgow, UK.","DOI":"10.1007\/978-3-030-58545-7_3"},{"key":"ref_71","doi-asserted-by":"crossref","unstructured":"Richard, A., Lea, C., Ma, S., Gall, J., De la Torre, F., and Sheikh, Y. (2021, January 11). Audio-and gaze-driven facial animation of codec avatars. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV48630.2021.00009"},{"key":"ref_72","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1109\/TIFS.2022.3146783","article-title":"Everybody\u2019s talkin\u2019: Let me talk as you want","volume":"17","author":"Song","year":"2022","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_73","unstructured":"Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., and Nie\u00dfner, M. (July, January 26). Face2face: Real-time face capture and reenactment of rgb videos. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_74","doi-asserted-by":"crossref","unstructured":"Tripathy, S., Kannala, J., and Rahtu, E. (2020, January 4\u20138). Icface: Interpretable and controllable face reenactment using gans. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA.","DOI":"10.1109\/WACV45572.2020.9093474"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Zhou, H., Sun, Y., Wu, W., Loy, C.C., Wang, X., and Liu, Z. (2021, January 20\u201325). Pose-controllable talking face generation by implicitly modularized audio-visual representation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.00416"},{"key":"ref_76","unstructured":"Zhou, H., Liu, Y., Liu, Z., Luo, P., and Wang, X. (February, January 27). Talking face generation by adversarially disentangled audio-visual representation. 
Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Wiles, O., Koepke, A., and Zisserman, A. (2018, January 8\u201314). X2face: A network for controlling face generation using images, audio, and pose codes. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01261-8_41"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Ha, S., Kersner, M., Kim, B., Seo, S., and Kim, D. (2020, January 7\u201312). Marionette: Few-shot face reenactment preserving identity of unseen targets. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i07.6721"},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Bansal, A., Ma, S., Ramanan, D., and Sheikh, Y. (2018, January 8\u201314). Recycle-gan: Unsupervised video retargeting. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01228-1_8"},{"key":"ref_80","first-page":"1","article-title":"Deep video portraits","volume":"37","author":"Kim","year":"2018","journal-title":"ACM Trans. Graph. TOG"},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"1474","DOI":"10.3389\/fpsyg.2017.01474","article-title":"A combination of outcome and process feedback enhances performance in simulations of child sexual abuse interviews using avatars","volume":"8","author":"Pompedda","year":"2017","journal-title":"Front. Psychol."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"43","DOI":"10.1080\/19012276.2020.1788417","article-title":"Transfer of simulated interview training effects into interviews with children exposed to a mock event","volume":"73","author":"Pompedda","year":"2020","journal-title":"Nordic Psychol."},{"key":"ref_83","doi-asserted-by":"crossref","unstructured":"Pompedda, F., Zhang, Y., Haginoya, S., and Santtila, P. (2022). 
A Mega-Analysis of the Effects of Feedback on the Quality of Simulated Child Sexual Abuse Interviews with Avatars. J. Police Crim. Psychol., 1\u201314.","DOI":"10.21203\/rs.3.rs-1121518\/v1"},{"key":"ref_84","unstructured":"Dalli, K.C. (2021). Technological Acceptance of an Avatar Based Interview Training Application: The Development and Technological Acceptance Study of the AvBIT Application. [Master\u2019s Thesis, Linnaeus University]."},{"key":"ref_85","unstructured":"Johansson, D. (2015). Design and Evaluation of an Avatar-Mediated System for Child Interview Training. [Master\u2019s Thesis, Line University]."},{"key":"ref_86","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., and Devin, M. (2016). Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv."},{"key":"ref_87","unstructured":"Bunk, T., Varshneya, D., Vlasov, V., and Nichol, A. (2020). Diet: Lightweight language understanding for dialogue systems. arXiv."},{"key":"ref_88","unstructured":"ITU-T Recommendation P.809 (2018). Subjective Evaluation Methods for Gaming Quality, International Telecommunication Union."},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., and Bowman, S. (2018, January 1). GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. 
Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium.","DOI":"10.18653\/v1\/W18-5446"},{"key":"ref_91","unstructured":"Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., and Askell, A. (2020). Language Models are Few-Shot Learners. arXiv."},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1016\/j.cub.2013.11.064","article-title":"Dynamic Facial Expressions of Emotion Transmit an Evolving Hierarchy of Signals over Time","volume":"24","author":"Jack","year":"2014","journal-title":"Curr. Biol."},{"key":"ref_93","unstructured":"(2022, May 20). Deepfakes. github. Available online: https:\/\/github.com\/deepfakes\/faceswap."},{"key":"ref_94","unstructured":"Sha, T., Zhang, W., Shen, T., Li, Z., and Mei, T. (2021). Deep Person Generation: A Survey from the Perspective of Face, Pose and Cloth Synthesis. arXiv."},{"key":"ref_95","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/s11633-021-1293-0","article-title":"Deep audio-visual learning: A survey","volume":"18","author":"Zhu","year":"2021","journal-title":"Int. J. Autom. Comput."},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1016\/j.inffus.2020.06.014","article-title":"Deepfakes and beyond: A survey of face manipulation and fake detection","volume":"64","author":"Tolosana","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_97","unstructured":"Kumar, R., Sotelo, J., Kumar, K., de Br\u00e9bisson, A., and Bengio, Y. (2018). ObamaNet: Photo-realistic lip-sync from text. arXiv."},{"key":"ref_98","doi-asserted-by":"crossref","unstructured":"Baugerud, G.A., Johnson, M.S., Klingenberg R\u00f8ed, R., Lamb, M.E., Powell, M., Thambawita, V., Hicks, S.A., Salehi, P., Hassan, S.Z., and Halvorsen, P. (2021, January 21). Multimodal virtual avatars for investigative interviews with children. 
Proceedings of the 2021 Workshop on Intelligent Cross-Data Analysis and Retrieval, Taipei, Taiwan.","DOI":"10.1145\/3463944.3469269"},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Chung, J.S., Nagrani, A., and Zisserman, A. (2018). Voxceleb2: Deep speaker recognition. arXiv.","DOI":"10.21437\/Interspeech.2018-1929"},{"key":"ref_100","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.jcm.2016.02.012","article-title":"A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research","volume":"15","author":"Koo","year":"2016","journal-title":"J. Chiropr. Med."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Karras, T., Laine, S., and Aila, T. (2019, January 16\u201319). A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00453"},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1109\/MRA.2012.2192811","article-title":"The uncanny valley [from the field]","volume":"19","author":"Mori","year":"2012","journal-title":"IEEE Robot. Autom. Mag."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1016\/j.chb.2008.12.026","article-title":"Too real for comfort? Uncanny responses to computer generated faces","volume":"25","author":"MacDorman","year":"2009","journal-title":"Comput. Hum. Behav."},{"key":"ref_104","unstructured":"Brunnstr\u00f6m, K., Beker, S.A., De Moor, K., Dooms, A., Egger, S., Garcia, M.N., Hossfeld, T., Jumisko-Pyykk\u00f6, S., Keimel, C., and Larabi, M.C. (2013). 
Qualinet White Paper on Definitions of Quality of Experience, HAL."}],"container-title":["Big Data and Cognitive Computing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/2\/62\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T23:23:26Z","timestamp":1760138606000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-2289\/6\/2\/62"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,1]]},"references-count":104,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,6]]}},"alternative-id":["bdcc6020062"],"URL":"https:\/\/doi.org\/10.3390\/bdcc6020062","relation":{},"ISSN":["2504-2289"],"issn-type":[{"value":"2504-2289","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,1]]}}}