{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T00:41:36Z","timestamp":1776127296198,"version":"3.50.1"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,12]],"date-time":"2023-06-12T00:00:00Z","timestamp":1686528000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2020AAA0107700"],"award-info":[{"award-number":["2020AAA0107700"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["2021FZZX001-27"],"award-info":[{"award-number":["2021FZZX001-27"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62102354, 62032021, 62122066, 62172359, 61972348, 62172277"],"award-info":[{"award-number":["62102354, 62032021, 62122066, 62172359, 61972348, 62172277"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,6,12]]},"abstract":"<jats:p>Faced with the threat of identity leakage during voice data publishing, users are engaged in a privacy-utility dilemma when enjoying the utility of voice services. 
Existing machine-centric studies employ direct modification or text-based re-synthesis to de-identify users' voices but cause inconsistent audibility for human participants in emerging online communication scenarios, such as virtual meetings. In this paper, we propose a human-centric voice de-identification system, VoiceCloak, which uses adversarial examples to balance the privacy and utility of voice services. Instead of typical additive examples, which induce perceivable distortions, we design a novel convolutional adversarial example that modulates perturbations into real-world room impulse responses. As a result, VoiceCloak can protect user identity from exposure by Automatic Speaker Identification (ASI) while retaining perceptual voice quality for non-intrusive de-identification. Moreover, VoiceCloak learns a compact speaker distribution through a conditional variational auto-encoder to synthesize diverse targets on demand. Guided by these pseudo targets, VoiceCloak constructs adversarial examples in an input-specific manner, enabling any-to-any identity transformation for robust de-identification. 
Experimental results show that VoiceCloak could achieve over 92% and 84% successful de-identification on mainstream ASIs and commercial systems with excellent voiceprint consistency, speech integrity, and audio quality.<\/jats:p>","DOI":"10.1145\/3596266","type":"journal-article","created":{"date-parts":[[2023,6,12]],"date-time":"2023-06-12T18:58:16Z","timestamp":1686596296000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":17,"title":["VoiceCloak"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4775-5107","authenticated-orcid":false,"given":"Meng","family":"Chen","sequence":"first","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Key Laboratory of Blockchain and Cyberspace Governance of Zhejiang Province, Hangzhou, China and ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5230-3749","authenticated-orcid":false,"given":"Li","family":"Lu","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Key Laboratory of Blockchain and Cyberspace Governance of Zhejiang Province, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-2373-4185","authenticated-orcid":false,"given":"Junhao","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0207-9643","authenticated-orcid":false,"given":"Jiadi","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3994-766X","authenticated-orcid":false,"given":"Yingying","family":"Chen","sequence":"additional","affiliation":[{"name":"Rutgers University, WINLAB, Department of Electrical 
and Computer Engineering, Piscataway, NJ, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5804-3279","authenticated-orcid":false,"given":"Zhibo","family":"Wang","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0921-8869","authenticated-orcid":false,"given":"Zhongjie","family":"Ba","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5240-5200","authenticated-orcid":false,"given":"Feng","family":"Lin","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3441-6277","authenticated-orcid":false,"given":"Kui","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,6,12]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2019.23362"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/EUSIPCO.2015.7362755"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of USENIX Security. Virtual Event, 2703--2720","author":"Ahmed Shimaa","year":"2020","unstructured":"Shimaa Ahmed, Amrita Roy Chowdhury, Kassem Fawaz, and Parmesh Ramanathan. 2020. Preech: A System for Privacy-Preserving Speech Transcription. In Proceedings of USENIX Security. Virtual Event, 2703--2720."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3351230"},{"key":"e_1_2_1_5_1","unstructured":"Alibaba Cloud. 2017. Voiceprint Recognition System --- Not Just a Powerful Authentication Tool. https:\/\/alibaba-cloud.medium.com\/voiceprint-recognition-system-not-just-a-powerful-authentication-tool-6b3702b5c5a."},{"key":"e_1_2_1_6_1","unstructured":"Apple. 2022. 
Apple Siri. https:\/\/machinelearning.apple.com\/research\/personalized-hey-siri."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397332"},{"key":"e_1_2_1_8_1","volume-title":"Proceedings of IEEE S&P","author":"Carlini Nicholas","unstructured":"Nicholas Carlini and David A. Wagner. 2017. Towards Evaluating the Robustness of Neural Networks. In Proceedings of IEEE S&P. San Jose, CA, USA, 39--57."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP40001.2021.00004"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796934"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3495243.3558260"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560905.3568518"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2019-2663"},{"key":"e_1_2_1_14_1","unstructured":"Veaux Christophe Yamagishi Junichi and MacDonald Kirsten. 2016. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit. https:\/\/datashare.ed.ac.uk\/handle\/10283\/2119."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2010.2064307"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2020-2650"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1996.541103"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of USENIX Security. 2309--2326","author":"Eisenhofer Thorsten","year":"2021","unstructured":"Thorsten Eisenhofer, Lea Sch\u00f6nherr, Joel Frank, Lars Speckemeier, Dorothea Kolossa, and Thorsten Holz. 2021. Dompteur: Taming Audio Adversarial Examples. In Proceedings of USENIX Security. 2309--2326."},{"key":"e_1_2_1_19_1","volume-title":"Hern\u00e1ndez G\u00f3mez","author":"Espinoza-Cuadros Fernando M.","year":"2020","unstructured":"Fernando M. Espinoza-Cuadros, Juan M. Perero-Codosero, Javier Ant\u00f3n-Mart\u00edn, and Luis A. Hern\u00e1ndez G\u00f3mez. 2020. 
Speaker De-identification System using Autoencoders and Adversarial Training. CoRR abs\/2011.04696 (2020). arXiv:2011.04696"},{"key":"e_1_2_1_20_1","volume-title":"Speaker Anonymization Using X-vector and Neural Waveform Models. CoRR abs\/1905.13561","author":"Fang Fuming","year":"2019","unstructured":"Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas W. D. Evans, and Jean-Fran\u00e7ois Bonastre. 2019. Speaker Anonymization Using X-vector and Neural Waveform Models. CoRR abs\/1905.13561 (2019). arXiv:1905.13561"},{"key":"e_1_2_1_21_1","unstructured":"Haytham M. Fayek. 2016. Speech Processing for Machine Learning: Filter banks Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between. https:\/\/haythamfayek.com\/2016\/04\/21\/speech-processing-for-machine-learning.html."},{"key":"e_1_2_1_22_1","unstructured":"Forbes. 2021. Apple Just Gave 1.5 Billion iPad iPhone Users A Reason To Leave. https:\/\/www.forbes.com\/sites\/gordonkelly\/2022\/02\/12\/apple-iphone-ipad-siri-audio-recordings-iphone-privacy\/?sh=68fc85bd4193."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of ICLR","author":"Goodfellow Ian J.","year":"2015","unstructured":"Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2015. Explaining and Harnessing Adversarial Examples. In Proceedings of ICLR. San Diego, CA, USA."},{"key":"e_1_2_1_24_1","volume-title":"Google Meet. https:\/\/www.bluestacks.com\/apps\/communication\/google-meet-on-pc.html.","year":"2022","unstructured":"Google. 2022. Google Meet. https:\/\/www.bluestacks.com\/apps\/communication\/google-meet-on-pc.html."},{"key":"e_1_2_1_25_1","unstructured":"Google Privacy & Terms. 2022. How Google Voice works. https:\/\/policies.google.com\/technologies\/voice?hl=en-US."},{"key":"e_1_2_1_26_1","unstructured":"CMU Speech Group. 2012. Statistical parametirc sythesis and voice conversion techniques. 
http:\/\/festvox.org\/11752\/slides\/lecture11a.pdf."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME46284.2020.9102875"},{"key":"e_1_2_1_28_1","volume-title":"Proceedings of USENIX Security. 2273--2290","author":"Hussain Shehzeen","year":"2021","unstructured":"Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian J. McAuley, and Farinaz Koushanfar. 2021. WaveGuard: Understanding and Mitigating Audio Adversarial Examples. In Proceedings of USENIX Security. 2273--2290."},{"key":"e_1_2_1_29_1","unstructured":"iFLYTEK Open Platform. 2022. Voiceprint Recognition. https:\/\/www.xfyun.cn\/service\/isv."},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of IEEE ICASSP","author":"Jin Qin","unstructured":"Qin Jin, Arthur R. Toth, Tanja Schultz, and Alan W. Black. 2009. Voice convergin: Speaker de-identification by voice transformation. In Proceedings of IEEE ICASSP. Taipei, Taiwan, 3909--3912."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/FG.2015.7285021"},{"key":"e_1_2_1_32_1","volume-title":"Kingma and Max Welling","author":"Diederik","year":"2014","unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-Encoding Variational Bayes. In Proceedings of ICLR. Banff, AB, Canada."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1186\/s13634-016-0306-6"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2017-452"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372297.3423348"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of ICLR","author":"Madry Aleksander","year":"2018","unstructured":"Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2018. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of ICLR. 
Vancouver, BC, Canada."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2017.05.001"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPLIM.2016.7528408"},{"key":"e_1_2_1_39_1","unstructured":"Microsoft. 2022. How does Microsoft protect my privacy while improving its speech recognition technology? https:\/\/support.microsoft.com\/en-us\/windows\/how-does-microsoft-protect-my-privacy-while-improving-its-speech-recognition-technology-f465d7a7--4a4f-40b7--9441-f0e6e97e24ec."},{"key":"e_1_2_1_40_1","unstructured":"Microsoft. 2022. Microsoft Teams. https:\/\/www.microsoft.com\/en-us\/microsoft-teams\/group-chat-software."},{"key":"e_1_2_1_41_1","unstructured":"Microsoft Azure Congnitive Service. 2022. Speaker recognition. https:\/\/azure.microsoft.com\/en-us\/services\/cognitive-services\/speaker-recognition\/."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2017.01.008"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/MIPRO.2014.6859761"},{"key":"e_1_2_1_45_1","unstructured":"Popular Mechanics. 2018. Hundreds of Apps Can Eavesdrop Through Phone Microphones to Target Ads. https:\/\/www.popularmechanics.com\/technology\/security\/a14533262\/alphonso-audio-ad-targeting\/."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274783.3274855"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2018.8486250"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of ICML","volume":"97","author":"Qin Yao","year":"2019","unstructured":"Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, and Colin Raffel. 2019. Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition. In Proceedings of ICML, Vol. 97. 
Long Beach, California, 5231--5240."},{"key":"e_1_2_1_49_1","volume-title":"Renato De Mori, and Yoshua Bengio","author":"Ravanelli Mirco","year":"2021","unstructured":"Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, Fran\u00e7ois Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, and Yoshua Bengio. 2021. SpeechBrain: A General-Purpose Speech Toolkit. CoRR abs\/2106.04624 (2021). arXiv:2106.04624"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2019.23288"},{"key":"e_1_2_1_51_1","volume-title":"Proceedings of NIPS. Vancouver and Whistler","author":"Schultz Matthew","year":"2003","unstructured":"Matthew Schultz and Thorsten Joachims. 2003. Learning a Distance Metric from Relative Comparisons. In Proceedings of NIPS. Vancouver and Whistler, British Columbia, Canada, 41--48."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461375"},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of ISCA Interspeech","author":"Lal Brij Mohan","year":"2019","unstructured":"Brij Mohan Lal Srivastava, Aur\u00e9lien Bellet, Marc Tommasi, and Emmanuel Vincent. 2019. Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?. In Proceedings of ISCA Interspeech. Graz, Austria, 3700--3704."},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of ISCA Interspeech. Virtual Event","author":"Lal Brij Mohan","year":"2020","unstructured":"Brij Mohan Lal Srivastava, Natalia A. Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aur\u00e9lien Bellet, and Marc Tommasi. 2020. Design Choices for X-Vector Based Speaker Anonymization. In Proceedings of ISCA Interspeech. 
Virtual Event, Shanghai, China, 1713--1717."},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of IEEE ICASSP","author":"Lal Brij Mohan","year":"2020","unstructured":"Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aur\u00e9lien Bellet, Marc Tommasi, and Emmanuel Vincent. 2020. Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers. In Proceedings of IEEE ICASSP. Barcelona, Spain, 2802--2806."},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of IEEE ICASSP","author":"Lal Brij Mohan","year":"2020","unstructured":"Brij Mohan Lal Srivastava, Nathalie Vauquier, Md. Sahidullah, Aur\u00e9lien Bellet, Marc Tommasi, and Emmanuel Vincent. 2020. Evaluating Voice Conversion-Based Privacy Protection against Informed Attackers. In Proceedings of IEEE ICASSP. Barcelona, Spain, 2802--2806."},{"key":"e_1_2_1_57_1","unstructured":"The New York Times. 2019. Amazon's Alexa Never Stops Listening to You. Should You Worry? https:\/\/www.nytimes.com\/wirecutter\/blog\/amazons-alexa- never- stops- listening-to-you\/."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2021.101318"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/SPW.2019.00026"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2020-1955"},{"key":"e_1_2_1_61_1","volume-title":"Voiceprint: The New WeChat Password. https:\/\/blog.wechat.com\/2015\/05\/21\/voiceprint-the-new-wechat-password.","author":"Official Wechat","year":"2015","unstructured":"Wechat Official. 2015. Voiceprint: The New WeChat Password. https:\/\/blog.wechat.com\/2015\/05\/21\/voiceprint-the-new-wechat-password."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i16.17663"},{"key":"e_1_2_1_63_1","volume-title":"Proceedings of USENIX Security","author":"Yuan Xuejing","unstructured":"Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A. 
Gunter. 2018. CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition. In Proceedings of USENIX Security. Baltimore, MD, USA, 49--64."},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.2983228"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3534618"},{"key":"e_1_2_1_66_1","unstructured":"Zoom. 2022. One platform to connect. https:\/\/zoom.us\/."}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3596266","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3596266","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T04:45:25Z","timestamp":1752468325000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3596266"}},"subtitle":["Adversarial Example Enabled Voice De-Identification with Balanced Privacy and Utility"],"short-title":[],"issued":{"date-parts":[[2023,6,12]]},"references-count":66,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,12]]}},"alternative-id":["10.1145\/3596266"],"URL":"https:\/\/doi.org\/10.1145\/3596266","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,12]]},"assertion":[{"value":"2023-06-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}