{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,19]],"date-time":"2026-05-19T14:55:40Z","timestamp":1779202540954,"version":"3.51.4"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2024,1,22]],"date-time":"2024-01-22T00:00:00Z","timestamp":1705881600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU"},{"name":"Sustainable Mobility Center"},{"name":"Centro Nazionale per la Mobilit\u00e0 Sostenibile, CNMS","award":["CN_00000023"],"award-info":[{"award-number":["CN_00000023"]}]},{"name":"Dottorati e contratti di ricerca su tematiche dell\u2019innovazione","award":["1062 on 10.08.2021"],"award-info":[{"award-number":["1062 on 10.08.2021"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Multimedia Comput. Commun. Appl."],"published-print":{"date-parts":[[2024,5,31]]},"abstract":"<jats:p>The utilization of user\u2019s facial- and speech-related features for the estimation of the Quality of Experience (QoE) of multimedia services is still underinvestigated despite its potential. Currently, only the use of either facial or speech features individually has been proposed, and relevant limited experiments have been performed. To advance in this respect, in this study, we focused on WebRTC-based videoconferencing, where it is often possible to capture both the facial expressions and vocal speech characteristics of the users. First, we performed thorough statistical analysis to identify the most significant facial- and speech-related features for QoE estimation, which we extracted from the participants\u2019 audio-video data collected during a subjective assessment. 
Second, we trained individual QoE estimation machine learning-based models on the separated facial and speech datasets. Finally, we employed data fusion techniques to combine the facial and speech datasets into a single dataset to enhance the QoE estimation performance due to the integrated knowledge provided by the fusion of facial and speech features. The obtained results demonstrate that the data fusion technique based on the Improved Centered Kernel Alignment (ICKA) allows for reaching a mean QoE estimation accuracy of 0.93, whereas the values of 0.78 and 0.86 are reached when using only facial or speech features, respectively.<\/jats:p>","DOI":"10.1145\/3638251","type":"journal-article","created":{"date-parts":[[2023,12,21]],"date-time":"2023-12-21T11:51:31Z","timestamp":1703159491000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["QoE Estimation of WebRTC-based Audio-visual Conversations from Facial and Speech Features"],"prefix":"10.1145","volume":"20","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-8959-112X","authenticated-orcid":false,"given":"G\u00fclnaziye","family":"Bing\u00f6l","sequence":"first","affiliation":[{"name":"DIEE, University of Cagliari, Italy and CNIT, University of Cagliari, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0792-1200","authenticated-orcid":false,"given":"Simone","family":"Porcu","sequence":"additional","affiliation":[{"name":"DIEE, University of Cagliari, Italy and CNIT, University of Cagliari, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8745-1327","authenticated-orcid":false,"given":"Alessandro","family":"Floris","sequence":"additional","affiliation":[{"name":"DIEE, University of Cagliari, Italy and CNIT, University of Cagliari, 
Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1350-3574","authenticated-orcid":false,"given":"Luigi","family":"Atzori","sequence":"additional","affiliation":[{"name":"DIEE, University of Cagliari, Italy and CNIT, University of Cagliari, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,1,22]]},"reference":[{"key":"e_1_3_2_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-014-2331-5"},{"key":"e_1_3_2_3_2","first-page":"1","volume-title":"International Symposium on Programming and Systems (ISPS\u201918)","author":"Amour L.","year":"2018","unstructured":"L. Amour, M. I. Boulabiar, S. Souihi, and A. Mellouk. 2018. An improved QoE estimation method based on QoS and affective computing. In International Symposium on Programming and Systems (ISPS\u201918). 1\u20136."},{"key":"e_1_3_2_4_2","first-page":"1","volume-title":"11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG\u201915)","volume":"06","author":"Baltrus\u0306aitis T.","year":"2015","unstructured":"T. Baltrus\u0306aitis, M. Mahmoud, and P. Robinson. 2015. Cross-dataset learning and person-specific normalisation for automatic Action Unit detection. In 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG\u201915), Vol. 06. 1\u20136."},{"key":"e_1_3_2_5_2","first-page":"59","volume-title":"13th IEEE International Conference on Automatic Face Gesture Recognition (FG\u201918)","author":"Baltrus\u0306aitis T.","year":"2018","unstructured":"T. Baltrus\u0306aitis, A. Zadeh, Y. C. Lim, and L. Morency. 2018. OpenFace 2.0: Facial behavior analysis toolkit. In 13th IEEE International Conference on Automatic Face Gesture Recognition (FG\u201918). 
59\u201366."},{"issue":"6","key":"e_1_3_2_6_2","article-title":"Survey of research on Quality of experience modelling for web browsing","author":"Barakovic Sabina","year":"2017","unstructured":"Sabina Barakovic and Lea Skorin-Kapov. 2017. Survey of research on Quality of experience modelling for web browsing. Qual. User Exper. 6 (2017).","journal-title":"Qual. User Exper."},{"issue":"1","key":"e_1_3_2_7_2","article-title":"Quality of experience evaluation of voice communication: An affect-based approach","volume":"2","author":"Bhattacharya Abhishek","year":"2012","unstructured":"Abhishek Bhattacharya, Wanmin Wu, and Zhenyu Yang. 2012. Quality of experience evaluation of voice communication: An affect-based approach. Hum.-Cent. Comput. Inf. Sci. 2, 1 (2012).","journal-title":"Hum.-Cent. Comput. Inf. Sci."},{"key":"e_1_3_2_8_2","first-page":"577","volume-title":"16th International Conference on Signal-Image Technology & Internet-based Systems (SITIS\u201922)","author":"Bing\u00f6l G\u00fclnaziye","year":"2022","unstructured":"G\u00fclnaziye Bing\u00f6l, Simone Porcu, Alessandro Floris, and Luigi Atzori. 2022. QoE estimation of WebRTC-based audiovisual conversations from facial expressions. In 16th International Conference on Signal-Image Technology & Internet-based Systems (SITIS\u201922). 577\u2013584."},{"key":"e_1_3_2_9_2","first-page":"1","volume-title":"14th International Conference on Quality of Multimedia Experience (QoMEX\u201922)","author":"Bing\u00f6l G\u00fclnaziye","year":"2022","unstructured":"G\u00fclnaziye Bing\u00f6l, Luigi Serreli, Simone Porcu, Alessandro Floris, and Luigi Atzori. 2022. The impact of network impairments on the QoE of WebRTC applications: A subjective study. In 14th International Conference on Quality of Multimedia Experience (QoMEX\u201922). 
1\u20136."},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/2910017.2910605"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2016.2537645"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2015.2461216"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2014.2363139"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.comcom.2021.06.029"},{"key":"e_1_3_2_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/QoMEX.2017.7965665"},{"key":"e_1_3_2_16_2","first-page":"1","volume-title":"8th International Conference on Quality of Multimedia Experience (QoMEX\u201916)","author":"Egan Darragh","year":"2016","unstructured":"Darragh Egan, Sean Brennan, John Barrett, Yuansong Qiao, Christian Timmerer, and Niall Murray. 2016. An evaluation of heart rate and electrodermal activity as an objective QoE evaluation method for immersive virtual reality environments. In 8th International Conference on Quality of Multimedia Experience (QoMEX\u201916). 1\u20136."},{"key":"e_1_3_2_17_2","doi-asserted-by":"publisher","DOI":"10.1037\/h0030377"},{"key":"e_1_3_2_18_2","doi-asserted-by":"crossref","unstructured":"Paul Ekman and Wallace V. Friesen. 1978. Facial Action Coding System: A Technique for the Measurement of Facial Movement. 
Consulting Psychologists Press Palo Alto.","DOI":"10.1037\/t27734-000"},{"key":"e_1_3_2_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2016.2609843"},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502224"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1873951.1874246"},{"key":"e_1_3_2_22_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics9030462"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.clindermatol.2019.07.010"},{"key":"e_1_3_2_24_2","first-page":"1322","volume-title":"IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence)","author":"He Haibo","year":"2008","unstructured":"Haibo He, Yang Bai, Edwardo A. Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence). 1322\u20131328."},{"key":"e_1_3_2_25_2","unstructured":"ITU. 1996. Methods for subjective determination of transmission quality. Recommendation ITU-T P.800. https:\/\/www.itu.int\/rec\/T-REC-P.800-199608-I"},{"key":"e_1_3_2_26_2","unstructured":"ITU. 2015. The E-model: A computational model for use in transmission planning. Recommendation ITU-T G.107. https:\/\/www.itu.int\/rec\/T-REC-G.107-201506-I\/en"},{"key":"e_1_3_2_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/SURV.2011.120811.00063"},{"key":"e_1_3_2_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2936124"},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","unstructured":"N. Khan and M. G. Martini. 2016. QoE-driven multi-user scheduling and rate adaptation with reduced cross-layer signaling for scalable video streaming over LTE wireless systems. EURASIP Journal on Wireless Communications and Networking 93 (2016). 
DOI:10.1186\/s13638-016-0584-6","DOI":"10.1186\/s13638-016-0584-6"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-011-9125-1"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2019.2926720"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-03243-2_649-1"},{"key":"e_1_3_2_33_2","volume-title":"Qualinet White Paper on Definitions of Quality of Experience","author":"Callet Patrick Le","year":"2012","unstructured":"Patrick Le Callet, Sebastian M\u00f6ller, and Andrew Perkis. 2012. Qualinet White Paper on Definitions of Quality of Experience. European Network on Quality of Experience in Multimedia Systems and Services (COST Action IC 1003), Lausanne, Switzerland, Version 1.2, March 2013."},{"key":"e_1_3_2_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2020.2981446"},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.03.062"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2019.2934425"},{"key":"e_1_3_2_37_2","first-page":"18","volume-title":"14th Python in Science Conference.","author":"McFee Brian","year":"2015","unstructured":"Brian McFee, Colin Raffel, Dawen Liang, Daniel P. W. Ellis, Matt McVicar, Eric Battenberg, and Oriol Nieto. 2015. librosa: Audio and music signal analysis in Python. In 14th Python in Science Conference. 18\u201325."},{"key":"e_1_3_2_38_2","first-page":"204","volume-title":"Human Vision and Electronic Imaging XIX","author":"Moor Katrien De","year":"2014","unstructured":"Katrien De Moor, Filippo Mazza, Isabelle Hupont, Miguel R\u00edos Quintero, Toni M\u00e4ki, and Mart\u00edn Varela. 2014. Chamber QoE: A multi-instrumental approach to explore affective aspects in relation to quality of experience. In Human Vision and Electronic Imaging XIX, Bernice E. Rogowitz, Thrasyvoulos N. Pappas, and Huib de Ridder (Eds.), Vol. 9014. 
SPIE, 204\u2013217."},{"key":"e_1_3_2_39_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-018-9493-x"},{"issue":"1","key":"e_1_3_2_40_2","article-title":"Analysis of the quality of remote working experience: a speech-based approach","volume":"7","author":"Porcu Simone","year":"2022","unstructured":"Simone Porcu, Alessandro Floris, and Luigi Atzori. 2022. Analysis of the quality of remote working experience: a speech-based approach. Qual. User Exper. 7, 1 (2022).","journal-title":"Qual. User Exper."},{"key":"e_1_3_2_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2020.3018303"},{"issue":"5","key":"e_1_3_2_42_2","article-title":"Reducing videoconferencing fatigue through facial emotion recognition","volume":"13","author":"R\u00f6\u00dfler Jannik","year":"2021","unstructured":"Jannik R\u00f6\u00dfler, Jiachen Sun, and Peter Gloor. 2021. Reducing videoconferencing fatigue through facial emotion recognition. Fut. Internet 13, 5 (2021).","journal-title":"Fut. Internet"},{"key":"e_1_3_2_43_2","doi-asserted-by":"publisher","DOI":"10.5281\/zenodo.3243139"},{"key":"e_1_3_2_44_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1008228"},{"key":"e_1_3_2_45_2","unstructured":"Sandvine. 2023. 2023 Global Internet Phenomena Report. Retrieved from https:\/\/www.sandvine.com\/global-internet-phenomena-report-2023"},{"key":"e_1_3_2_46_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2017.2777466"},{"key":"e_1_3_2_47_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-129"},{"key":"e_1_3_2_48_2","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1145\/2733373.2806254","volume-title":"23rd ACM International Conference on Multimedia","author":"Scott Michael James","year":"2015","unstructured":"Michael James Scott, Sharath Chandra Guntuku, Yang Huan, Weisi Lin, and Gheorghita Ghinea. 2015. Modelling human factors in perceptual multimedia quality: On the role of personality and culture. 
In 23rd ACM International Conference on Multimedia. ACM, 481\u2013490."},{"key":"e_1_3_2_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/COMST.2014.2360940"},{"key":"e_1_3_2_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3176648"},{"key":"e_1_3_2_51_2","first-page":"1","volume-title":"14th International Conference on Quality of Multimedia Experience (QoMEX\u201922)","author":"Tiotsop Lohic Fotio","year":"2022","unstructured":"Lohic Fotio Tiotsop, Antonio Servetti, Marcus Barkowsky, and Enrico Masala. 2022. Regularized maximum likelihood estimation of the subjective quality from noisy individual ratings. In 14th International Conference on Quality of Multimedia Experience (QoMEX\u201922). 1\u20134."},{"key":"e_1_3_2_52_2","doi-asserted-by":"publisher","DOI":"10.1109\/IFIPNetworking.2015.7145309"},{"issue":"1","key":"e_1_3_2_53_2","first-page":"1","article-title":"A survey on parametric QoE estimation for popular services","volume":"77","author":"Tsolkas Dimitris","year":"2017","unstructured":"Dimitris Tsolkas, Eirini Liotou, Nikos Passas, and Lazaros Merakos. 2017. A survey on parametric QoE estimation for popular services. J. Netw. Comput. Applic. 77, 1 (2017), 1\u201317.","journal-title":"J. Netw. Comput. Applic."},{"key":"e_1_3_2_54_2","first-page":"1","article-title":"A survey of challenges and methods for quality of experience assessment of interactive VR applications","author":"Vlahovic Sara","year":"2022","unstructured":"Sara Vlahovic, Mirko Suznjevic, and Lea Skorin-Kapov. 2022. A survey of challenges and methods for quality of experience assessment of interactive VR applications. J. Multimod. User Interf. (042022), 1\u201335.","journal-title":"J. Multimod. User Interf."},{"key":"e_1_3_2_55_2","doi-asserted-by":"crossref","first-page":"459","DOI":"10.1007\/978-3-030-05710-7_38","volume-title":"MultiMedia Modeling","author":"Vu\u010di\u0107 Dunja","year":"2019","unstructured":"Dunja Vu\u010di\u0107 and Lea Skorin-Kapov. 2019. 
The impact of packet loss and Google congestion control on QoE for WebRTC-based mobile multiparty audiovisual telemeetings. In MultiMedia Modeling, Ioannis Kompatsiaris, Benoit Huet, Vasileios Mezaris, Cathal Gurrin, Wen-Huang Cheng, and Stefanos Vrochidis (Eds.). Springer International Publishing, Cham, 459\u2013470."},{"key":"e_1_3_2_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3000467"},{"key":"e_1_3_2_57_2","doi-asserted-by":"crossref","first-page":"59","DOI":"10.21437\/PQS.2016-13","volume-title":"5th ISCA\/DEGA Workshop on Perceptual Quality of Systems (PQS\u201916)","author":"Vu\u010di\u0107 Dunja","year":"2016","unstructured":"Dunja Vu\u010di\u0107, Lea Skorin-Kapov, and Mirko Su\u017enjevi\u0107. 2016. The impact of bandwidth limitations and video resolution size on QoE for WebRTC-based mobile multi-party video conferencing. In 5th ISCA\/DEGA Workshop on Perceptual Quality of Systems (PQS\u201916). 59\u201363."},{"key":"e_1_3_2_58_2","first-page":"305","volume-title":"20th ACM Conference on Embedded Networked Sensor Systems","author":"Wang Chaowei","year":"2023","unstructured":"Chaowei Wang, Huadi Zhu, and Ming Li. 2023. SpeechQoE: A novel personalized QoE assessment model for voice services via speech sensing. In 20th ACM Conference on Embedded Networked Sensor Systems. ACM, 305\u2013319."},{"key":"e_1_3_2_59_2","doi-asserted-by":"publisher","DOI":"10.1007\/b95439"},{"key":"e_1_3_2_60_2","article-title":"On the acoustics of emotion in audio: What speech, music, and sound have in common","volume":"4","author":"Weninger Felix","year":"2013","unstructured":"Felix Weninger, Florian Eyben, Bj\u00f6rn Schuller, Marcello Mortillaro, and Klaus Scherer. 2013. On the acoustics of emotion in audio: What speech, music, and sound have in common. Front. Psychol. 4 (2013).","journal-title":"Front. 
Psychol."},{"key":"e_1_3_2_61_2","first-page":"3756","volume-title":"IEEE International Conference on Computer Vision (ICCV\u201915)","author":"Wood E.","year":"2015","unstructured":"E. Wood, T. Baltrus\u0306aitis, X. Zhang, Y. Sugano, P. Robinson, and A. Bulling. 2015. Rendering of eyes for eye-shape registration and gaze estimation. In IEEE International Conference on Computer Vision (ICCV\u201915). 3756\u20133764."},{"key":"e_1_3_2_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNSM.2020.3043482"}],"container-title":["ACM Transactions on Multimedia Computing, Communications, and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3638251","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3638251","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T22:53:35Z","timestamp":1750287215000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3638251"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,22]]},"references-count":61,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2024,5,31]]}},"alternative-id":["10.1145\/3638251"],"URL":"https:\/\/doi.org\/10.1145\/3638251","relation":{},"ISSN":["1551-6857","1551-6865"],"issn-type":[{"value":"1551-6857","type":"print"},{"value":"1551-6865","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,22]]},"assertion":[{"value":"2023-07-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-17","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-01-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}