{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T01:05:27Z","timestamp":1776387927593,"version":"3.51.2"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T00:00:00Z","timestamp":1564099200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100007601","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["IoTCrawler Project (contract 779852)"],"award-info":[{"award-number":["IoTCrawler Project (contract 779852)"]}],"id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2019,8,31]]},"abstract":"<jats:p>Modern human-computer interaction systems may not only be based on interpreting natural language but also on detecting speaker interpersonal characteristics in order to determine dialog strategies. This may be of high interest in different fields such as telephone marketing or automatic voice-based interactive services. However, when such systems encounter signals transmitted over a communication network instead of clean speech, e.g., in call centers, the speaker characterization accuracy might be impaired by the degradations caused in the speech signal by the encoding and communication processes. This article addresses a binary classification of high versus low warm--attractive speakers over different channel and encoding conditions. The ground truth is derived from ratings given to clean speech extracted from an extensive subjective test. 
Our results show that, under the considered conditions, the AMR-WB+ codec permits good levels of classification accuracy, comparable to the classification with clean, non-degraded speech. This is especially notable for the case of a Random Forest-based classifier, which presents the best performance among the set of evaluated algorithms. The impact of different packet loss rates has been examined, whereas jitter effects have been found to be negligible.<\/jats:p>","DOI":"10.1145\/3332146","type":"journal-article","created":{"date-parts":[[2019,7,26]],"date-time":"2019-07-26T13:17:18Z","timestamp":1564147038000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["On the Impact of Voice Encoding and Transmission on the Predictions of Speaker Warmth and Attractiveness"],"prefix":"10.1145","volume":"13","author":[{"given":"Laura Fern\u00e1ndez","family":"Gallardo","sequence":"first","affiliation":[{"name":"Technische Universit\u00e4t Berlin"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0069-3017","authenticated-orcid":false,"given":"Ramon","family":"Sanchez-Iborra","sequence":"additional","affiliation":[{"name":"University of Murcia, Spain"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2019,7,26]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"55","article-title":"Effect of speech compression on the automatic recognition of emotions","volume":"4","author":"Albahri A.","year":"2016","unstructured":"A. Albahri , M. Lech , and E. Cheng . 2016 . Effect of speech compression on the automatic recognition of emotions . International Journal of Signal Processing Systems 4 , 1 (2016), 55 -- 61 . A. Albahri, M. Lech, and E. Cheng. 2016. Effect of speech compression on the automatic recognition of emotions. 
International Journal of Signal Processing Systems 4, 1 (2016), 55--61.","journal-title":"International Journal of Signal Processing Systems"},{"key":"e_1_2_1_2_1","volume-title":"8th ACM\/IEEE International Conference on Human-Robot Interaction (HRI\u201913)","author":"Aly A.","unstructured":"A. Aly and A. Tapus . 2013. A model for synthesizing a combined verbal and nonverbal behavior based on personality traits in human-robot interaction . In 8th ACM\/IEEE International Conference on Human-Robot Interaction (HRI\u201913) . 325--332. A. Aly and A. Tapus. 2013. A model for synthesizing a combined verbal and nonverbal behavior based on personality traits in human-robot interaction. In 8th ACM\/IEEE International Conference on Human-Robot Interaction (HRI\u201913). 325--332."},{"key":"e_1_2_1_3_1","volume-title":"Computing and Communication Workshop and Conference (CCWC\u201918)","author":"Argal A.","unstructured":"A. Argal , S. Gupta , A. Modi , P. Pandey , S. Shim , and C. Choo . 2018. Intelligent travel chatbot for predictive recommendation in echo platform . In Computing and Communication Workshop and Conference (CCWC\u201918) . A. Argal, S. Gupta, A. Modi, P. Pandey, S. Shim, and C. Choo. 2018. Intelligent travel chatbot for predictive recommendation in echo platform. In Computing and Communication Workshop and Conference (CCWC\u201918)."},{"key":"e_1_2_1_4_1","unstructured":"E. Aronson T. D. Wilson and R. M. Akert. 2009. Social Psychology (7th ed.). Prentice Hall.  E. Aronson T. D. Wilson and R. M. Akert. 2009. Social Psychology (7th ed.). Prentice Hall."},{"key":"e_1_2_1_5_1","volume-title":"Natural Interaction with Robots, Knowbots and Smartphones","author":"Bellegarda J. R.","unstructured":"J. R. Bellegarda . 2014. Spoken Language Understanding for Natural Interaction: The Siri Experience . In Natural Interaction with Robots, Knowbots and Smartphones . Springer , New York, NY , 3--14. J. R. Bellegarda. 2014. 
Spoken Language Understanding for Natural Interaction: The Siri Experience. In Natural Interaction with Robots, Knowbots and Smartphones. Springer, New York, NY, 3--14."},{"key":"e_1_2_1_6_1","volume-title":"Annual Conference of Interspeech. 1557--1560","author":"Burkhardt F.","unstructured":"F. Burkhardt , B. Schuller , B. Weiss , and F. Weninger . 2011. \u2018Would you buy a car from me?\u2019 -- On the likability of telephone voices . In Annual Conference of Interspeech. 1557--1560 . F. Burkhardt, B. Schuller, B. Weiss, and F. Weninger. 2011. \u2018Would you buy a car from me?\u2019 -- On the likability of telephone voices. In Annual Conference of Interspeech. 1557--1560."},{"key":"e_1_2_1_7_1","volume-title":"Annual Conference of the International Speech Communication Association (Interspeech\u201917)","author":"Burmania A.","unstructured":"A. Burmania and C. Busso . 2017. A stepwise analysis of aggregated crowdsourced labels describing multimodal emotional behaviors . In Annual Conference of the International Speech Communication Association (Interspeech\u201917) . 152--156. A. Burmania and C. Busso. 2017. A stepwise analysis of aggregated crowdsourced labels describing multimodal emotional behaviors. In Annual Conference of the International Speech Communication Association (Interspeech\u201917). 152--156."},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","unstructured":"A. S. Chowdhury and G. Riccardi. 2017. A deep learning approach to modeling competitiveness in spoken conversations. In ICASSP. 5680--5684.  A. S. Chowdhury and G. Riccardi. 2017. A deep learning approach to modeling competitiveness in spoken conversations. In ICASSP. 5680--5684.","DOI":"10.1109\/ICASSP.2017.7953244"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"M. Esk\u00e9nazi G. A. Levow H. Meng G. Parent and D. Suendermann. 2013. Crowdsourcing for Speech Processing: Applications to Data Collection Transcription and Assessment. Wiley.   M. Esk\u00e9nazi G. A. 
Levow H. Meng G. Parent and D. Suendermann. 2013. Crowdsourcing for Speech Processing: Applications to Data Collection Transcription and Assessment. Wiley.","DOI":"10.1002\/9781118541241"},{"key":"e_1_2_1_10_1","volume-title":"Speech Processing, Transmission and Quality Aspects (STQ)","author":"ETSI EG","unstructured":"ETSI EG 202 396-2. 2006. Speech Processing, Transmission and Quality Aspects (STQ) ; Speech quality performance in the presence of background noise; Part 2: Background noise transmission -- Network simulation -- Subjective test database and results. European Telecommunications Standards Institute . ETSI EG 202 396-2. 2006. Speech Processing, Transmission and Quality Aspects (STQ); Speech quality performance in the presence of background noise; Part 2: Background noise transmission -- Network simulation -- Subjective test database and results. European Telecommunications Standards Institute."},{"key":"e_1_2_1_11_1","volume-title":"Speech and Multimedia Transmission Quality (STQ)","author":"ETSI TR","unstructured":"ETSI TR 103 138. 2016. Speech and Multimedia Transmission Quality (STQ) ; Speech Samples and Their Use for QoS Testing. European Telecommunications Standards Institute . ETSI TR 103 138. 2016. Speech and Multimedia Transmission Quality (STQ); Speech Samples and Their Use for QoS Testing. European Telecommunications Standards Institute."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2015.2457417"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/2502081.2502224"},{"key":"e_1_2_1_14_1","volume-title":"Human and Automatic Speaker Recognition Over Telecommunication Channels","author":"Gallardo L. Fern\u00e1ndez","unstructured":"L. Fern\u00e1ndez Gallardo . 2016. Human and Automatic Speaker Recognition Over Telecommunication Channels . Springer-Verlag , Singapore . L. Fern\u00e1ndez Gallardo. 2016. Human and Automatic Speaker Recognition Over Telecommunication Channels. 
Springer-Verlag, Singapore."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/QoMEX.2018.8463395"},{"key":"e_1_2_1_16_1","doi-asserted-by":"crossref","unstructured":"L. Fern\u00e1ndez Gallardo S. M\u00f6ller and J. G. Beerends. 2017. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores. In Interspeech. 2939--2943.  L. Fern\u00e1ndez Gallardo S. M\u00f6ller and J. G. Beerends. 2017. Predicting automatic speech recognition performance over communication channels from instrumental speech quality and intelligibility scores. In Interspeech. 2939--2943.","DOI":"10.21437\/Interspeech.2017-36"},{"key":"e_1_2_1_17_1","unstructured":"L. Fern\u00e1ndez-Gallardo and B. Weiss. 2017. Perceived interpersonal speaker attributes and their acoustic features. In Phonetik und Phonologie im Deutschsprachigen Raum (PundP\u201913). 61--64.  L. Fern\u00e1ndez-Gallardo and B. Weiss. 2017. Perceived interpersonal speaker attributes and their acoustic features. In Phonetik und Phonologie im Deutschsprachigen Raum (PundP\u201913). 61--64."},{"key":"e_1_2_1_18_1","unstructured":"L. Fern\u00e1ndez Gallardo and B. Weiss. 2018. The nautilus speaker characterization corpus: Speech recordings and labels of speaker characteristics and voice descriptions. In Submitted to International Conference on Language Resources and Evaluation (LREC\u201918).  L. Fern\u00e1ndez Gallardo and B. Weiss. 2018. The nautilus speaker characterization corpus: Speech recordings and labels of speaker characteristics and voice descriptions. In Submitted to International Conference on Language Resources and Evaluation (LREC\u201918)."},{"key":"e_1_2_1_19_1","unstructured":"ffmpeg. 2018. Retrieved from https:\/\/www.ffmpeg.org\/.  ffmpeg. 2018. 
Retrieved from https:\/\/www.ffmpeg.org\/."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2016.2604038"},{"key":"e_1_2_1_21_1","volume-title":"20th Symposium on Signal Processing, Images and Computer Vision (STSIVA\u201915)","author":"Garc\u00eda N.","unstructured":"N. Garc\u00eda , J. C. V\u00e1squez-Correa , J. D. Arias-Londo\u00f1o , J. F. V\u00e1rgas-Bonilla , and J. R. Orozco-Arroyave . 2015. Automatic emotion recognition in compressed speech using acoustic and non-linear features . In 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA\u201915) . N. Garc\u00eda, J. C. V\u00e1squez-Correa, J. D. Arias-Londo\u00f1o, J. F. V\u00e1rgas-Bonilla, and J. R. Orozco-Arroyave. 2015. Automatic emotion recognition in compressed speech using acoustic and non-linear features. In 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA\u201915)."},{"key":"e_1_2_1_22_1","volume-title":"Annual Conference of the International Speech Communication Association (Interspeech\u201917)","author":"Gideon J.","unstructured":"J. Gideon , S. Khorram , Z. Aldeneh , D. Dimitriadis , and E. M. Provost . 2017. Progressive neural networks for transfer learning in emotion recognition . In Annual Conference of the International Speech Communication Association (Interspeech\u201917) . 1098--1102. J. Gideon, S. Khorram, Z. Aldeneh, D. Dimitriadis, and E. M. Provost. 2017. Progressive neural networks for transfer learning in emotion recognition. In Annual Conference of the International Speech Communication Association (Interspeech\u201917). 1098--1102."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/LWC.2018.2806442"},{"key":"e_1_2_1_24_1","volume-title":"Software Tools for Speech and Audio Coding Standardization","author":"ITU-T","unstructured":"ITU-T Recommendation G.191. 2000. Software Tools for Speech and Audio Coding Standardization . International Telecommunication Union . ITU-T Recommendation G.191. 
2000. Software Tools for Speech and Audio Coding Standardization. International Telecommunication Union."},{"key":"e_1_2_1_25_1","volume-title":"Specification for an Intermediate Reference System","author":"ITU-T","unstructured":"ITU-T Recommendation P.48. 1988. Specification for an Intermediate Reference System . International Telecommunication Union . ITU-T Recommendation P.48. 1988. Specification for an Intermediate Reference System. International Telecommunication Union."},{"key":"e_1_2_1_26_1","volume-title":"Objective Measurement of Active Speech Level","author":"ITU-T","unstructured":"ITU-T Recommendation P.56. 1993. Objective Measurement of Active Speech Level . International Telecommunication Union , CH- Geneva . ITU-T Recommendation P.56. 1993. Objective Measurement of Active Speech Level. International Telecommunication Union, CH-Geneva."},{"key":"e_1_2_1_27_1","first-page":"145","article-title":"Interpersonale adjektivliste (IAL)","volume":"51","author":"Jacobs I.","year":"2005","unstructured":"I. Jacobs and W. Scholl . 2005 . Interpersonale adjektivliste (IAL) . Diagnostica -- Zeitschrift f\u00fcr Psychologische Diagnostik und Differentielle Psychologie 51 , 3 (2005), 145 -- 155 . I. Jacobs and W. Scholl. 2005. Interpersonale adjektivliste (IAL). Diagnostica -- Zeitschrift f\u00fcr Psychologische Diagnostik und Differentielle Psychologie 51, 3 (2005), 145--155.","journal-title":"Diagnostica -- Zeitschrift f\u00fcr Psychologische Diagnostik und Differentielle Psychologie"},{"key":"e_1_2_1_28_1","volume-title":"Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL\u201914)","author":"Litman D.","unstructured":"D. Litman and K. Forbes-Riley . 2014. Evaluating a spoken dialogue system that detects and adapts to user affective states . In Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL\u201914) . 181--185. D. Litman and K. Forbes-Riley. 2014. Evaluating a spoken dialogue system that detects and adapts to user affective states. 
In Annual SIGdial Meeting on Discourse and Dialogue (SIGDIAL\u201914). 181--185."},{"key":"e_1_2_1_29_1","volume-title":"Advances in Intelligent Systems and Computing","author":"Masche J.","unstructured":"J. Masche and N.-T. Le. 2018. A review of technologies for conversational systems . In Advances in Intelligent Systems and Computing . Springer , 212--225. J. Masche and N.-T. Le. 2018. A review of technologies for conversational systems. In Advances in Intelligent Systems and Computing. Springer, 212--225."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/T-AFFC.2012.5"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.883262"},{"key":"e_1_2_1_32_1","volume-title":"Annual Conference of the International Speech Communication Association (Interspeech\u201917)","author":"Chasaide A. N\u00ed","unstructured":"A. N\u00ed Chasaide , I. Yanushevskaya , and C. Gobl . 2017. Voice-to-affect mapping: Inferences on language voice baseline settings . In Annual Conference of the International Speech Communication Association (Interspeech\u201917) . 1258--1262. A. N\u00ed Chasaide, I. Yanushevskaya, and C. Gobl. 2017. Voice-to-affect mapping: Inferences on language voice baseline settings. In Annual Conference of the International Speech Communication Association (Interspeech\u201917). 1258--1262."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.nicl.2014.04.004"},{"key":"e_1_2_1_34_1","volume-title":"Annual Conference of the International Speech Communication Association (Interspeech\u201917)","author":"Parthasarathy S.","unstructured":"S. Parthasarathy and C. Busso . 2017. Jointly predicting arousal, valence and dominance with multi-task learning . In Annual Conference of the International Speech Communication Association (Interspeech\u201917) . 1103--1107. S. Parthasarathy and C. Busso. 2017. Jointly predicting arousal, valence and dominance with multi-task learning. 
In Annual Conference of the International Speech Communication Association (Interspeech\u201917). 1103--1107."},{"key":"e_1_2_1_35_1","doi-asserted-by":"crossref","unstructured":"J. Pohjalainen and P. Alku. 2014. Multi-scale modulation filtering in automatic detection of emotions in telephone speech. In ICASSP. 980--984.  J. Pohjalainen and P. Alku. 2014. Multi-scale modulation filtering in automatic detection of emotions in telephone speech. In ICASSP. 980--984.","DOI":"10.1109\/ICASSP.2014.6853743"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2016.2617341"},{"key":"e_1_2_1_37_1","volume-title":"Annual Conference of the International Speech Communication Association (Interspeech). 495--499","author":"Schmitt M.","unstructured":"M. Schmitt , F. Ringeval , and B. Schuller . 2016. At the border of acoustics and linguistics: Bag-of-audio-words for the recognition of emotions in speech . In Annual Conference of the International Speech Communication Association (Interspeech). 495--499 . M. Schmitt, F. Ringeval, and B. Schuller. 2016. At the border of acoustics and linguistics: Bag-of-audio-words for the recognition of emotions in speech. In Annual Conference of the International Speech Communication Association (Interspeech). 495--499."},{"key":"e_1_2_1_38_1","volume-title":"International Workshop on Grounding Language Understanding (GLU\u201917)","author":"Silber-Varod V.","unstructured":"V. Silber-Varod , A. Lerner , and O. Jokisch . 2017. Automatic speaker\u2019s role classification with a bottom-up acoustic feature selection . In International Workshop on Grounding Language Understanding (GLU\u201917) . 52--56. V. Silber-Varod, A. Lerner, and O. Jokisch. 2017. Automatic speaker\u2019s role classification with a bottom-up acoustic feature selection. In International Workshop on Grounding Language Understanding (GLU\u201917). 
52--56."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1145\/3139491.3139492"},{"key":"e_1_2_1_40_1","volume-title":"12th Digital Signal Processing Workshop - 4th Signal Processing Education Workshop. 251--256","author":"Teng Y.","unstructured":"Y. Teng and R. F. Kubichek . 2006. Speech intelligibility evaluation of low bit rate speech codecs . In 12th Digital Signal Processing Workshop - 4th Signal Processing Education Workshop. 251--256 . Y. Teng and R. F. Kubichek. 2006. Speech intelligibility evaluation of low bit rate speech codecs. In 12th Digital Signal Processing Workshop - 4th Signal Processing Education Workshop. 251--256."},{"key":"e_1_2_1_41_1","volume-title":"IEEE International Carnahan Conference on Security Technology (ICCST\u201915)","author":"Vasquez-Correa J. C.","unstructured":"J. C. Vasquez-Correa , N. Garcia , J. R. Orozco-Arroyave , J. D. Arias-Londo\u00f1o , J. F. Vargas-Bonilla , and E. N\u00f6th . 2015. Emotion recognition from speech under environmental noise conditions using wavelet decomposition . In IEEE International Carnahan Conference on Security Technology (ICCST\u201915) . 247--252. J. C. Vasquez-Correa, N. Garcia, J. R. Orozco-Arroyave, J. D. Arias-Londo\u00f1o, J. F. Vargas-Bonilla, and E. N\u00f6th. 2015. Emotion recognition from speech under environmental noise conditions using wavelet decomposition. In IEEE International Carnahan Conference on Security Technology (ICCST\u201915). 247--252."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.trit.2016.12.004"},{"key":"e_1_2_1_43_1","doi-asserted-by":"crossref","unstructured":"B. Weiss and F. Burkhardt. 2012. Is \u2018not bad\u2019 good enough? aspects of unknown voices\u2019 likability. In Interspeech. 510--513.  B. Weiss and F. Burkhardt. 2012. Is \u2018not bad\u2019 good enough? aspects of unknown voices\u2019 likability. In Interspeech. 
510--513.","DOI":"10.21437\/Interspeech.2012-97"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1207\/s15327906mbr2304_8"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011174108613"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3332146","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3332146","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:53:26Z","timestamp":1750204406000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3332146"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,7,26]]},"references-count":45,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2019,8,31]]}},"alternative-id":["10.1145\/3332146"],"URL":"https:\/\/doi.org\/10.1145\/3332146","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,7,26]]},"assertion":[{"value":"2018-12-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-05-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-07-26","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}