{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,25]],"date-time":"2026-04-25T15:19:46Z","timestamp":1777130386116,"version":"3.51.4"},"reference-count":121,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,9,27]],"date-time":"2023-09-27T00:00:00Z","timestamp":1695772800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["2124285"],"award-info":[{"award-number":["2124285"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,9,27]]},"abstract":"<jats:p>Smart speaker voice assistants (VAs) such as Amazon Echo and Google Home have been widely adopted due to their seamless integration with smart home devices and the Internet of Things (IoT) technologies. These VA services raise privacy concerns, especially due to their access to our speech. This work considers one such use case: the unaccountable and unauthorized surveillance of a user's emotion via speech emotion recognition (SER). This paper presents DARE-GP, a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech. DARE-GP does this by using a constrained genetic programming approach to learn the spectral frequency traits that depict target users' emotional content, and then generating a universal adversarial audio perturbation that provides this privacy protection. Unlike existing works, DARE-GP provides: a) real-time protection of previously unheard utterances, b) against previously unseen black-box SER classifiers, c) while protecting speech transcription, and d) does so in a realistic, acoustic environment. Further, this evasion is robust against defenses employed by a knowledgeable adversary. The evaluations in this work culminate with acoustic evaluations against two off-the-shelf commercial smart speakers using a small-form-factor (raspberry pi) integrated with a wake-word system to evaluate the efficacy of its real-world, real-time deployment.<\/jats:p>","DOI":"10.1145\/3610887","type":"journal-article","created":{"date-parts":[[2023,9,27]],"date-time":"2023-09-27T15:45:03Z","timestamp":1695829503000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2349-9564","authenticated-orcid":false,"given":"Brian","family":"Testa","sequence":"first","affiliation":[{"name":"Syracuse University, Syracuse, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5261-5440","authenticated-orcid":false,"given":"Yi","family":"Xiao","sequence":"additional","affiliation":[{"name":"Syracuse University, Syracuse, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7016-6220","authenticated-orcid":false,"given":"Harshit","family":"Sharma","sequence":"additional","affiliation":[{"name":"Syracuse University, Syracuse, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9535-3974","authenticated-orcid":false,"given":"Avery","family":"Gump","sequence":"additional","affiliation":[{"name":"Syracuse University, Syracuse, New York, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0807-8967","authenticated-orcid":false,"given":"Asif","family":"Salekin","sequence":"additional","affiliation":[{"name":"Syracuse University, Syracuse, New York, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,9,27]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2022. Normal conversation loudness. https:\/\/tinyurl.com\/9r4xrz24"},{"key":"e_1_2_1_2_1","volume-title":"Patrick Cardinal, and Alessandro L Koerich.","author":"Abdoli Sajjad","year":"2019","unstructured":"Sajjad Abdoli, Luiz G Hafemann, Jerome Rony, Ismail Ben Ayed, Patrick Cardinal, and Alessandro L Koerich. 2019. Universal adversarial audio perturbations. arXiv preprint arXiv:1908.03173 (2019)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP40001.2021.00014"},{"key":"e_1_2_1_4_1","volume-title":"Ultimate Alexa Command Guide: 200+ Voice Commands You Need to Know for Your Echo. Retrieved","author":"Aguilar Nelson","year":"2023","unstructured":"Nelson Aguilar. 2022. Ultimate Alexa Command Guide: 200+ Voice Commands You Need to Know for Your Echo. Retrieved February 9, 2023 from https:\/\/www.cnet.com\/home\/smart-home\/ultimate-alexa-command-guide-200-voice-commands-you-need-to-know-for-your-echo\/"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3362743.3362960"},{"key":"e_1_2_1_6_1","volume-title":"Emotionless: Privacy-preserving speech analysis for voice assistants. arXiv preprint arXiv:1908.03632","author":"Aloufi Ranya","year":"2019","unstructured":"Ranya Aloufi, Hamed Haddadi, and David Boyle. 2019. Emotionless: Privacy-preserving speech analysis for voice assistants. arXiv preprint arXiv:1908.03632 (2019)."},{"key":"e_1_2_1_7_1","volume-title":"Paralinguistic privacy protection at the edge. ACM Transactions on Privacy and Security","author":"Aloufi Ranya","year":"2020","unstructured":"Ranya Aloufi, Hamed Haddadi, and David Boyle. 2020. Paralinguistic privacy protection at the edge. ACM Transactions on Privacy and Security (2020)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411495.3421355"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-020-01449-0"},{"key":"e_1_2_1_10_1","volume-title":"Did you hear that? adversarial examples against automatic speech recognition. arXiv preprint arXiv:1801.00554","author":"Alzantot Moustafa","year":"2018","unstructured":"Moustafa Alzantot, Bharathan Balaji, and Mani Srivastava. 2018. Did you hear that? adversarial examples against automatic speech recognition. arXiv preprint arXiv:1801.00554 (2018)."},{"key":"e_1_2_1_11_1","volume-title":"Retrieved","year":"2022","unstructured":"Amazon. 2022. What Is Automatic Speech Recognition? Retrieved May 14, 2022 from https:\/\/developer.amazon.com\/en-US\/alexa\/alexa-skills-kit\/asr"},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Tawfiq Ammari Jofish Kaye Janice Y Tsai and Frank Bentley. 2019. Music Search and IoT: How People (Really) Use Voice Assistants.","DOI":"10.1145\/3311956"},{"key":"e_1_2_1_13_1","volume-title":"17--1","author":"Trans ACM","year":"2019","unstructured":"ACM Trans. Comput. Hum. Interact. 26, 3 (2019), 17--1."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.33215\/sbr.v1i3.660"},{"key":"e_1_2_1_15_1","unstructured":"Brooke Auxier Lee Rainie Monica Anderson Andrew Perrin Madhu Kumar and Erica Turner. 2019. Americans and privacy: Concerned confused and feeling lack of control over their personal information. (2019)."},{"key":"e_1_2_1_16_1","volume-title":"wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems 33","author":"Baevski Alexei","year":"2020","unstructured":"Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. wav2vec 2.0: A framework for self-supervised learning of speech representations. Advances in neural information processing systems 33 (2020), 12449--12460."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.3390\/sym12010021"},{"key":"e_1_2_1_18_1","unstructured":"Nick Barney and Ivy Wigmore. 2022. What is surveillance capitalism? - definition from whatis.com. https:\/\/www.techtarget.com\/whatis\/definition\/surveillance-capitalism"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3264901"},{"key":"e_1_2_1_20_1","first-page":"918","article-title":"Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information","volume":"5","author":"Blum Thomas L","year":"1999","unstructured":"Thomas L Blum, Douglas F Keislar, James A Wheaton, and Erling H Wold. 1999. Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information. US Patent 5,918,223.","journal-title":"US Patent"},{"key":"e_1_2_1_21_1","volume-title":"IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation 42","author":"Busso Carlos","year":"2008","unstructured":"Carlos Busso, Murtaza Bulut, Chi-Chun Lee, Abe Kazemzadeh, Emily Mower, Samuel Kim, Jeannette N Chang, Sungbok Lee, and Shrikanth S Narayanan. 2008. IEMOCAP: Interactive emotional dyadic motion capture database. Language resources and evaluation 42 (2008), 335--359."},{"key":"e_1_2_1_22_1","volume-title":"Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE security and privacy workshops (SPW)","author":"Carlini Nicholas","unstructured":"Nicholas Carlini and David Wagner. 2018. Audio adversarial examples: Targeted attacks on speech-to-text. In 2018 IEEE security and privacy workshops (SPW). IEEE, 1--7."},{"key":"e_1_2_1_23_1","volume-title":"Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457","author":"Cheng Minhao","year":"2018","unstructured":"Minhao Cheng, Thong Le, Pin-Yu Chen, Jinfeng Yi, Huan Zhang, and Cho-Jui Hsieh. 2018. Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457 (2018)."},{"key":"e_1_2_1_24_1","volume-title":"Sign-opt: A query-efficient hard-label adversarial attack. arXiv preprint arXiv:1909.10773","author":"Cheng Minhao","year":"2019","unstructured":"Minhao Cheng, Simranjit Singh, Patrick Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. 2019. Sign-opt: A query-efficient hard-label adversarial attack. arXiv preprint arXiv:1909.10773 (2019)."},{"key":"e_1_2_1_25_1","volume-title":"The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC genomics 21, 1","author":"Chicco Davide","year":"2020","unstructured":"Davide Chicco and Giuseppe Jurman. 2020. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC genomics 21, 1 (2020), 1--13."},{"key":"e_1_2_1_26_1","volume-title":"The band pass filter. international economic review 44, 2","author":"Christiano Lawrence J","year":"2003","unstructured":"Lawrence J Christiano and Terry J Fitzgerald. 2003. The band pass filter. international economic review 44, 2 (2003), 435--465."},{"key":"e_1_2_1_27_1","volume-title":"Comparison of techniques for environmental sound recognition. Pattern recognition letters 24, 15","author":"Cowling Michael","year":"2003","unstructured":"Michael Cowling and Renate Sitte. 2003. Comparison of techniques for environmental sound recognition. Pattern recognition letters 24, 15 (2003), 2895--2907."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jretconser.2016.03.009"},{"key":"e_1_2_1_29_1","volume-title":"Retrieved","year":"2019","unstructured":"data flair.training. 2019. Python Mini Project -- Speech Emotion Recognition with librosa. Retrieved May 14, 2022 from https:\/\/data-flair.training\/blogs\/python-mini-project-speech-emotion-recognition\/"},{"key":"e_1_2_1_30_1","unstructured":"Ambra Demontis Marco Melis Maura Pintor Jagielski Matthew Battista Biggio Oprea Alina Nita-Rotaru Cristina Fabio Roli et al. 2019. Why do adversarial attacks transfer? explaining transferability of evasion and poisoning attacks. In 28th USENIX security symposium. USENIX Association 321--338."},{"key":"e_1_2_1_31_1","volume-title":"Susan Van Hemel","author":"Dobie Robert A","year":"2004","unstructured":"Robert A Dobie, Susan Van Hemel, National Research Council, et al. 2004. Basics of sound, the ear, and hearing. In Hearing Loss: Determining Eligibility for Social Security Benefits. National Academies Press (US)."},{"key":"e_1_2_1_32_1","unstructured":"Kate Dupuis and M Kathleen Pichora-Fuller. 2010. Toronto emotional speech set (TESS)-Younger talker_Happy. (2010)."},{"key":"e_1_2_1_33_1","volume-title":"Retrieved","author":"Engler Alex","year":"2021","unstructured":"Alex Engler. 2021. Why President Biden should ban affective computing in federal law enforcement. Retrieved May 14, 2022 from https:\/\/www.brookings.edu\/blog\/techtank\/2021\/08\/04\/why-president-biden-should-ban-affective-computing-in-federal-law-enforcement\/"},{"key":"e_1_2_1_34_1","volume-title":"Kyoto, Japan","author":"Erickson Donna","year":"2008","unstructured":"Donna Erickson, Albert Rilliard, Takaaki Shochi, Jonghye Han, Hideki Kawahara, and K Sakakibara. 2008. A cross-linguistic comparison of perception to formant frequency cues in emotional speech. COCOSDA, Kyoto, Japan (2008), 163--167."},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.854103"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2008.2005601"},{"key":"e_1_2_1_37_1","volume-title":"Compulsive buying disorder. Textbook of Addiction Treatment: International Perspectives","author":"Filomensky Tatiana Zambrano","year":"2021","unstructured":"Tatiana Zambrano Filomensky and Hermano Tavares. 2021. Compulsive buying disorder. Textbook of Addiction Treatment: International Perspectives (2021), 979--994."},{"key":"e_1_2_1_38_1","doi-asserted-by":"crossref","unstructured":"Nico H Frijda. 1988. The Laws of Emotion. (1988).","DOI":"10.1037\/\/0003-066X.43.5.349"},{"key":"e_1_2_1_39_1","unstructured":"Sidney Fussell. 2019. Alexa wants to know how you're feeling today. https:\/\/www.theatlantic.com\/technology\/archive\/2018\/10\/alexa-emotion-detection-ai-surveillance\/572884\/"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13636-022-00254-7"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3596256"},{"key":"e_1_2_1_42_1","volume-title":"Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572","author":"Goodfellow Ian J","year":"2014","unstructured":"Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)."},{"key":"e_1_2_1_43_1","volume-title":"Retrieved","year":"2022","unstructured":"Google. 2022. Analyzing Sentiment | Cloud Natural Language API | Google Cloud. Retrieved November 14, 2022 from https:\/\/cloud.google.com\/natural-language\/docs\/analyzing-sentiment"},{"key":"e_1_2_1_44_1","volume-title":"Police facial recognition robot identifies anger and distress. Retrieved","author":"Hamilton Fiona","year":"2023","unstructured":"Fiona Hamilton. 2020. Police facial recognition robot identifies anger and distress. Retrieved February 8, 2023 from https:\/\/www.thetimes.co.uk\/article\/police-facial-recognition-robot-identifies-anger-and-distress-65h0xfrkg"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3122291"},{"key":"e_1_2_1_46_1","volume-title":"30th USENIX Security Symposium (USENIX Security 21)","author":"Hussain Shehzeen","year":"2021","unstructured":"Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian McAuley, and Farinaz Koushanfar. 2021. {WaveGuard}: Understanding and Mitigating Audio Adversarial Examples. In 30th USENIX Security Symposium (USENIX Security 21). 2273--2290."},{"key":"e_1_2_1_47_1","volume-title":"International conference on machine learning. PMLR, 2137--2146","author":"Ilyas Andrew","year":"2018","unstructured":"Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. 2018. Black-box adversarial attacks with limited queries and information. In International conference on machine learning. PMLR, 2137--2146."},{"key":"e_1_2_1_48_1","volume-title":"Surrey audio-visual expressed emotion (savee) database","author":"Jackson Philip","year":"2014","unstructured":"Philip Jackson and SJUoSG Haq. 2014. Surrey audio-visual expressed emotion (savee) database. University of Surrey: Guildford, UK (2014)."},{"key":"e_1_2_1_49_1","first-page":"096","article-title":"Voice-based determination of physical and emotional characteristics of users","volume":"10","author":"Jin Huafeng","year":"2018","unstructured":"Huafeng Jin and Shuo Wang. 2018. Voice-based determination of physical and emotional characteristics of users. US Patent 10,096,319.","journal-title":"US Patent"},{"key":"e_1_2_1_50_1","volume-title":"Retrieved","author":"Jovanovic Kosta","year":"2021","unstructured":"Kosta Jovanovic. 2021. GitHub - Data-Science-kosta\/Speech-Emotion-Classification-with-PyTorch. Retrieved May 14, 2022 from https:\/\/github.com\/Data-Science-kosta\/Speech-Emotion-Classification-with-PyTorch"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/ODYSSEY.2006.248123"},{"key":"e_1_2_1_52_1","volume-title":"Understanding emotions","author":"Keltner Dacher","unstructured":"Dacher Keltner, Keith Oatley, and Jennifer M Jenkins. 2014. Understanding emotions. Wiley Hoboken, NJ."},{"key":"e_1_2_1_53_1","volume-title":"Retrieved","author":"Klobuchar Amy","year":"2020","unstructured":"Amy Klobuchar. 2020. Following Privacy Concerns Surrounding Amazon Halo, Klobuchar Urges Administration to Take Action to Protect Personal Health Data. Retrieved May 14, 2022 from https:\/\/www.klobuchar.senate.gov\/public\/index.cfm\/2020\/12\/following-privacy-concerns-surrounding-amazon-halo-klobuchar-urges-administration-to-take-action-to-protect-personal-health-data"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053079"},{"key":"e_1_2_1_55_1","volume-title":"Search methodologies","author":"Koza John R","unstructured":"John R Koza and Riccardo Poli. 2005. Genetic programming. In Search methodologies. Springer, 127--164."},{"key":"e_1_2_1_56_1","first-page":"2020","article-title":"Impulse buying and post-purchase regret: a study of shopping behavior for the purchase of grocery products. Abhishek Kumar, Sumana Chaudhuri, Aparna Bhardwaj and Pallavi Mishra, Emotional Intelligence and its Impact on Team Building through Mediation of Leadership Effectiveness","volume":"11","author":"Kumar Abhishek","year":"2021","unstructured":"Abhishek Kumar, Dr Chaudhuri, Dr Bhardwaj, Pallavi Mishra, et al. 2021. Impulse buying and post-purchase regret: a study of shopping behavior for the purchase of grocery products. Abhishek Kumar, Sumana Chaudhuri, Aparna Bhardwaj and Pallavi Mishra, Emotional Intelligence and its Impact on Team Building through Mediation of Leadership Effectiveness, International Journal of Management 11, 12 (2021), 2020.","journal-title":"International Journal of Management"},{"key":"e_1_2_1_57_1","unstructured":"John Laidler. 2019. Harvard professor says surveillance capitalism is undermining democracy. https:\/\/news.harvard.edu\/gazette\/story\/ 2019\/03\/harvard-professor-says-surveillance-capitalism-is-undermining-democracy\/"},{"key":"e_1_2_1_58_1","volume-title":"Retrieved Fec 14","author":"Lauren Rhue","year":"2019","unstructured":"Rhue Lauren. 2019. Emotion-reading tech fails the racial bias test. Retrieved Fec 14, 2023 from https:\/\/theconversation.com\/emotion-reading-tech-fails-the-racial-bias-test-108404"},{"key":"e_1_2_1_59_1","volume-title":"Emotion and adaptation","author":"Lazarus Richard S","unstructured":"Richard S Lazarus. 1991. Emotion and adaptation. Oxford University Press."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.650310"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1098\/rspa.2015.0624"},{"key":"e_1_2_1_62_1","volume-title":"Emotion and decision making. Annual review of psychology 66","author":"Lerner Jennifer S","year":"2015","unstructured":"Jennifer S Lerner, Ye Li, Piercarlo Valdesolo, and Karim S Kassam. 2015. Emotion and decision making. Annual review of psychology 66 (2015), 799--823."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430713"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3376897.3377856"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372297.3423348"},{"key":"e_1_2_1_66_1","unstructured":"Mark Lippett. 2023. Council post: How can we make the smart speaker feel at home? https:\/\/www.forbes.com\/sites\/forbestechcouncil\/ 2023\/03\/06\/how-can-we-make-the-smart-speaker-feel-at-home\/?sh=4e80ef0d689a"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796920"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0196391"},{"key":"e_1_2_1_69_1","volume-title":"In International Symposium on Music Information Retrieval. Citeseer.","author":"Logan Beth","year":"2000","unstructured":"Beth Logan. 2000. Mel frequency cepstral coefficients for music modeling. In In International Symposium on Music Information Retrieval. Citeseer."},{"key":"e_1_2_1_70_1","volume-title":"A unified approach to interpreting model predictions. Advances in neural information processing systems 30","author":"Lundberg Scott M","year":"2017","unstructured":"Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_71_1","volume-title":"Retrieved","author":"Macaulay Thomas","year":"2020","unstructured":"Thomas Macaulay. 2020. British police to trial facial recognition system that detects your mood. Retrieved May 14, 2022 from https:\/\/thenextweb.com\/news\/british-police-to-trial-facial-recognition-system-that-detects-your-mood"},{"key":"e_1_2_1_72_1","volume-title":"Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083","author":"Madry Aleksander","year":"2017","unstructured":"Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU51503.2021.9688036"},{"key":"e_1_2_1_74_1","volume-title":"Retrieved","author":"Mallawaarachchi Vijini","year":"2017","unstructured":"Vijini Mallawaarachchi. 2017. Introduction to Genetic Algorithms --- Including Example Code. Retrieved May 14, 2022 from https:\/\/towardsdatascience.com\/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3549865.3549904"},{"key":"e_1_2_1_76_1","unstructured":"Stephen Edward McAdams. 1984. Spectral fusion spectral parsing and the formation of auditory images. Stanford university."},{"key":"e_1_2_1_77_1","doi-asserted-by":"publisher","DOI":"10.1177\/2053951720904386"},{"key":"e_1_2_1_78_1","volume-title":"Retrieved","author":"Mikulin Eric","year":"2022","unstructured":"Eric Mikulin. 2022. GitHub - Picovoice\/porcupine: On-device wake word detection powered by deep learning. Retrieved June 20, 2022 from https:\/\/github.com\/Picovoice\/porcupine"},{"key":"e_1_2_1_79_1","volume-title":"Speech processing in the auditory system","author":"Mogran Nelson","unstructured":"Nelson Mogran, Herv\u00e9 Bourlard, and Hynek Hermansky. 2004. Automatic speech recognition: An auditory perspective. In Speech processing in the auditory system. Springer, 309--338."},{"key":"e_1_2_1_80_1","volume-title":"Retrieved","author":"Stanford","year":"2022","unstructured":"Stanford NLP. 2022. Sentiment Analysis API | DeepAI. Retrieved November 14, 2022 from https:\/\/deepai.org\/machine-learning-model\/sentiment-analysis"},{"key":"e_1_2_1_81_1","volume-title":"IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society.","author":"Povey Daniel","year":"2011","unstructured":"Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, et al. 2011. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding. IEEE Signal Processing Society."},{"key":"e_1_2_1_82_1","volume-title":"Retrieved","author":"Puthran Mitesh","year":"2021","unstructured":"Mitesh Puthran. 2021. GitHub - MiteshPuthran\/Speech-Emotion-Analyzer: The neural network model is capable of detecting five different male\/female emotions from audio speeches. (Deep Learning, NLP, Python). Retrieved October 20, 2022 from https:\/\/github.com\/MiteshPuthran\/Speech-Emotion-Analyzer"},{"key":"e_1_2_1_83_1","volume-title":"Proc. 4th Int. Multi-Conf. Systems, Signals & Devices. Amirhossein Rajabi and Carsten Witt.","author":"Rabaoui Asma","year":"2021","unstructured":"Asma Rabaoui, Zied Lachiri, and Noureddine Ellouze. 2007. Towards an optimal feature set for robustness improvement of sounds classification in a HMM-based classifier adapted to real world background noise. In Proc. 4th Int. Multi-Conf. Systems, Signals & Devices. Amirhossein Rajabi and Carsten Witt. 2021. Stagnation detection with randomized local search. In European Conference on Evolutionary Computation in Combinatorial Optimization (Part of EvoStar). Springer, 152--168."},{"key":"e_1_2_1_84_1","volume-title":"gender and compulsive buying among early adolescents. Young Consumers","author":"Roberts James A","year":"2012","unstructured":"James A Roberts and Camille Roberts. 2012. Stress, gender and compulsive buying among early adolescents. Young Consumers (2012)."},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.3389\/fpsyg.2021.697080"},{"key":"e_1_2_1_86_1","volume-title":"Retrieved","author":"Salo Jackie","year":"2021","unstructured":"Jackie Salo. 2021. China using 'emotion recognition technology' for surveillance. Retrieved May 14, 2022 from https:\/\/nypost.com\/2021\/03\/04\/china-using-emotion-recognition-technology-in-surveillance\/"},{"key":"e_1_2_1_87_1","volume-title":"Bias and Fairness on Multimodal Emotion Detection Algorithms. arXiv preprint arXiv:2205.08383","author":"Schmitz Matheus","year":"2022","unstructured":"Matheus Schmitz, Rehan Ahmed, and Jimi Cao. 2022. Bias and Fairness on Multimodal Emotion Detection Algorithms. arXiv preprint arXiv:2205.08383 (2022)."},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3427228.3427276"},{"key":"e_1_2_1_89_1","unstructured":"Eric Hal Schwartz. 2021. Microsoft patents AI Emotion Detection System for xbox. https:\/\/voicebot.ai\/2021\/09\/15\/microsoft-patents-ai-emotion-detection-system\/"},{"key":"e_1_2_1_90_1","volume-title":"Circularity in judgments of relative pitch. The journal of the acoustical society of America 36, 12","author":"Shepard Roger N","year":"1964","unstructured":"Roger N Shepard. 1964. Circularity in judgments of relative pitch. The journal of the acoustical society of America 36, 12 (1964), 2346--2353."},{"key":"e_1_2_1_91_1","volume-title":"Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, and Yinnon Haviv.","author":"Shor Joel","year":"2020","unstructured":"Joel Shor, Aren Jansen, Ronnie Maor, Oran Lang, Omry Tuval, Felix de Chaumont Quitry, Marco Tagliasacchi, Ira Shavitt, Dotan Emanuel, and Yinnon Haviv. 2020. Towards learning a universal non-semantic representation of speech. arXiv preprint arXiv:2002.12764 (2020)."},{"key":"e_1_2_1_92_1","volume-title":"Smart speakers offer new legal challenges as privacy goes public. Retrieved","author":"Smith Katherine Snow","year":"2023","unstructured":"Katherine Snow Smith. 2020. Smart speakers offer new legal challenges as privacy goes public. Retrieved Feb 7, 2023 from https:\/\/www.legalexaminer.com\/home-family\/smart-speakers-offer-new-legal-challenges-as-privacy-goes-public\/"},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.5120\/21780-5056"},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1201\/9781420036879"},{"key":"e_1_2_1_95_1","volume-title":"The best voice assistant | ZDNET. Retrieved","year":"2023","unstructured":"Reviews.com Staff. 2021. The best voice assistant | ZDNET. Retrieved February 8, 2023 from https:\/\/www.zdnet.com\/home-and-office\/smart-home\/the-best-voice-assistant\/"},{"key":"e_1_2_1_96_1","first-page":"249","article-title":"Comparison of different impulse response measurement techniques","volume":"50","author":"Stan Guy-Bart","year":"2002","unstructured":"Guy-Bart Stan, Jean-Jacques Embrechts, and Dominique Archambeau. 2002. Comparison of different impulse response measurement techniques. Journal of the Audio engineering society 50, 4 (2002), 249--262.","journal-title":"Journal of the Audio engineering society"},{"key":"e_1_2_1_97_1","volume-title":"A scale for the measurement of the psychological magnitude pitch. The journal of the acoustical society of america 8, 3","author":"Stevens Stanley Smith","year":"1937","unstructured":"Stanley Smith Stevens, John Volkmann, and Edwin Broomell Newman. 1937. A scale for the measurement of the psychological magnitude pitch. The journal of the acoustical society of america 8, 3 (1937), 185--190."},{"key":"e_1_2_1_98_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430727"},{"key":"e_1_2_1_99_1","doi-asserted-by":"publisher","DOI":"10.23919\/APSIPAASC55919.2022.9980214"},{"key":"e_1_2_1_100_1","volume-title":"A taxonomy and terminology of adversarial machine learning. NIST IR","author":"Tabassi Elham","year":"2019","unstructured":"Elham Tabassi, Kevin J Burns, Michael Hadjimichael, Andres D Molina-Markham, and Julian T Sexton. 2019. A taxonomy and terminology of adversarial machine learning. NIST IR (2019), 1--29."},{"key":"e_1_2_1_101_1","doi-asserted-by":"publisher","DOI":"10.21437\/Blizzard.2015-7"},{"key":"e_1_2_1_102_1","volume-title":"Targeted adversarial examples for black box audio systems. In 2019 IEEE security and privacy workshops (SPW)","author":"Taori Rohan","unstructured":"Rohan Taori, Amog Kamsetty, Brenton Chu, and Nikita Vemuri. 2019. Targeted adversarial examples for black box audio systems. In 2019 IEEE security and privacy workshops (SPW). IEEE, 15--20."},{"key":"e_1_2_1_103_1","doi-asserted-by":"publisher","DOI":"10.15388\/infedu.2020.21"},{"key":"e_1_2_1_104_1","doi-asserted-by":"publisher","DOI":"10.1080\/14015430500456739"},{"key":"e_1_2_1_105_1","doi-asserted-by":"publisher","DOI":"10.1159\/000151762"},{"key":"e_1_2_1_106_1","unstructured":"Najah Walker. 2022. Spotify patented emotional recognition technology to recommend songs based on user's emotions. https:\/\/jolt.richmond.edu\/2022\/01\/11\/spotify-patented-emotional-recognition-technology-to-recommend-songs-based-on-users-emotions\/"},{"key":"e_1_2_1_107_1","unstructured":"Julian Wallis. 2022. The Tech behind Amazon alexa. https:\/\/webo.digital\/blog\/the-tech-behind-amazon-alexa\/"},{"key":"e_1_2_1_108_1","volume-title":"Communications standard dictionary","author":"Weik Martin","unstructured":"Martin Weik. 2012. Communications standard dictionary. Springer Science & Business Media."},{"key":"e_1_2_1_109_1","volume-title":"Proceedings of the 23rd Annual Network and Distributed System Security Symposium.","author":"Weilin Xu","year":"2016","unstructured":"Xu Weilin, Qi Yanjun, and Evans David. 2016. Automatically evading classifiers: A case study on PDF malware classifiers. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium."},{"key":"e_1_2_1_110_1","volume-title":"Neuropathology of drug addictions and substance misuse","author":"Weinstein Aviv","unstructured":"Aviv Weinstein, Aniko Maraz, Mark D Griffiths, Michel Lejoyeux, and Zsolt Demetrovics. 2016. Compulsive buying---features and characteristics of addiction. In Neuropathology of drug addictions and substance misuse. Elsevier, 993--1007."},{"key":"e_1_2_1_111_1","volume-title":"On the acoustics of emotion in audio: what speech, music, and sound have in common. Frontiers in psychology 4","author":"Weninger Felix","year":"2013","unstructured":"Felix Weninger, Florian Eyben, Bj\u00f6rn W Schuller, Marcello Mortillaro, and Klaus R Scherer. 2013. On the acoustics of emotion in audio: what speech, music, and sound have in common. Frontiers in psychology 4 (2013), 292."},{"key":"e_1_2_1_112_1","volume-title":"Meet the Star Witness: Your Smart Speaker. Retrieved Fec 14","author":"WIRED.","year":"2020","unstructured":"WIRED. 2020. Meet the Star Witness: Your Smart Speaker. Retrieved Fec 14, 2023 from https:\/\/www.wired.com\/story\/star-witness-your-smart-speaker\/"},{"key":"e_1_2_1_113_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.852988"},{"key":"e_1_2_1_114_1","doi-asserted-by":"publisher","DOI":"10.1109\/DySPAN.2019.8935789"},{"key":"e_1_2_1_115_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414635"},{"key":"e_1_2_1_116_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-65414-6_35"},{"key":"e_1_2_1_117_1","volume-title":"Alexa is always listening --- and so are Amazon workers - ABC News. Retrieved","author":"Youn Soo","year":"2023","unstructured":"Soo Youn. 2019. Alexa is always listening --- and so are Amazon workers - ABC News. Retrieved February 8, 2023 from https:\/\/abcnews.go.com\/Technology\/alexa-listening-amazon-workers\/story?id=62331191"},{"key":"e_1_2_1_118_1","volume-title":"SMACK: Semantically Meaningful Adversarial Audio Attack.","author":"Yu Zhiyuan","year":"2022","unstructured":"Zhiyuan Yu, Yuanhaur Chang, Ning Zhang, and Chaowei Xiao. 2022. SMACK: Semantically Meaningful Adversarial Audio Attack. (2022)."},{"key":"e_1_2_1_119_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134052"},{"key":"e_1_2_1_120_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.chb.2019.09.012"},{"key":"e_1_2_1_121_1","doi-asserted-by":"publisher","DOI":"10.1057\/jit.2015.5"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610887","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3610887","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3610887","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,28]],"date-time":"2025-07-28T16:26:29Z","timestamp":1753719989000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3610887"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,9,27]]},"references-count":121,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,9,27]]}},"alternative-id":["10.1145\/3610887"],"URL":"https:\/\/doi.org\/10.1145\/3610887","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,9,27]]},"assertion":[{"value":"2023-09-27","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}