{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T01:45:38Z","timestamp":1767836738094,"version":"3.49.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62102354, 62032021, 62172359, 61972348"],"award-info":[{"award-number":["62102354, 62032021, 62172359, 61972348"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012166","name":"National Key Research and Development Program of China","doi-asserted-by":"publisher","award":["2020AAA0107700"],"award-info":[{"award-number":["2020AAA0107700"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["2021FZZX001-27"],"award-info":[{"award-number":["2021FZZX001-27"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,3,27]]},"abstract":"<jats:p>Recently, voice leakage gradually raises more significant concerns of users, due to its underlying sensitive and private information when providing intelligent services. Existing studies demonstrate the feasibility of applying learning-based solutions on built-in sensor measurements to recover voices. However, due to the privacy concerns, large-scale voices-sensor measurements samples for model training are not publicly available, leading to significant efforts in data collection for such an attack. In this paper, we propose a training-free and universal eavesdropping attack on built-in speakers, VoiceListener, which releases the data collection efforts and is able to adapt to various voices, platforms, and domains. In particular, VoiceListener develops an aliasing-corrected super resolution mechanism, including an aliasing-based pitch estimation and an aliasing-corrected voice recovering, to convert the undersampled narrow-band sensor measurements to wide-band voices. Extensive experiments demonstrate that our proposed VoiceListener could accurately recover the voices from undersampled sensor measurements and is robust to different voices, platforms and domains, realizing the universal eavesdropping attack.<\/jats:p>","DOI":"10.1145\/3580789","type":"journal-article","created":{"date-parts":[[2023,3,28]],"date-time":"2023-03-28T14:57:51Z","timestamp":1680015471000},"page":"1-22","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["VoiceListener"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0975-2252","authenticated-orcid":false,"given":"Lei","family":"Wang","sequence":"first","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, China and ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4775-5107","authenticated-orcid":false,"given":"Meng","family":"Chen","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5230-3749","authenticated-orcid":false,"given":"Li","family":"Lu","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0921-8869","authenticated-orcid":false,"given":"Zhongjie","family":"Ba","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5240-5200","authenticated-orcid":false,"given":"Feng","family":"Lin","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3441-6277","authenticated-orcid":false,"given":"Kui","family":"Ren","sequence":"additional","affiliation":[{"name":"Zhejiang University, School of Cyber Science and Technology, Hangzhou, China"}]}],"member":"320","published-online":{"date-parts":[[2023,3,28]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Amazon. 2021. Amazon Alexa - Learn what Alexa can do | Amazon.com. https:\/\/www.amazon.com\/b?node=21576558011. (2021)."},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of IEEE S&P. 1000--1017","author":"Abhishek Anand S","year":"2018","unstructured":"S Abhishek Anand and Nitesh Saxena. 2018. Speechless: Analyzing the threat to speech privacy from smartphone motion sensors. In Proceedings of IEEE S&P. 1000--1017."},{"key":"e_1_2_1_3_1","unstructured":"Apple. 2021. Getting Raw Gyroscope Events. https:\/\/developer.apple.com\/documentation\/coremotion\/getting_raw_gyroscope_events. (2021)."},{"key":"e_1_2_1_4_1","unstructured":"Apple. 2021. Siri - Apple. https:\/\/www.apple.com\/siri\/. (2021)."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2020.24076"},{"key":"e_1_2_1_6_1","volume-title":"Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals. arXiv preprint arXiv:1807.03418","author":"Becker S\u00f6ren","year":"2018","unstructured":"S\u00f6ren Becker, Marcel Ackermann, Sebastian Lapuschkin, Klaus-Robert M\u00fcller, and Wojciech Samek. 2018. Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals. arXiv preprint arXiv:1807.03418 (2018). arXiv:1807.03418"},{"key":"e_1_2_1_7_1","unstructured":"Fox Bussiness. 2019. Apple's Siri is eavesdropping on your conversations putting users at risk: Report. https:\/\/www.foxbusiness.com\/technology\/apples-siri-is-eavesdropping-on-your-conversations-putting-users-at-risk. (2019)."},{"key":"e_1_2_1_8_1","volume-title":"A practical introduction to phonetics","author":"Catford John Cunnison","unstructured":"John Cunnison Catford. 1988. A practical introduction to phonetics. Clarendon Press Oxford."},{"key":"e_1_2_1_9_1","unstructured":"CowBoy Channel. 2021. Voice Assistant Industry Size Market Share: 2021 Market Research with Growth Manufacturers Segments and 2023 Forecasts Research. https:\/\/www.thecowboychannel.com\/story\/43600953\/voice-assistant-industry-size-market-share-2021-market-research-with-growth-manufacturers-segments-and-2023-forecasts-research. (2021)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP40001.2021.00004"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796934"},{"key":"e_1_2_1_12_1","unstructured":"ChinaDialy. 2018. Suit claims Baidu apps illegally tap data. http:\/\/www.chinadaily.com.cn\/a\/201801\/06\/WS5a5016cfa31008cf16da568a.html. (2018)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCFT.1999.781522"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796890"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.6028\/NIST.IR.4930"},{"key":"e_1_2_1_16_1","unstructured":"Google. 2021. Android Developer. https:\/\/developer.android.com\/guide\/topics\/sensors\/sensors_overview. (2021)."},{"key":"e_1_2_1_17_1","unstructured":"Google. 2021. Google Assistant your own personal Google. https:\/\/assistant.google.com\/. (2021)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1976.1162849"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1984.1164317"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3055031.3055088"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.396427"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of USENIX Security. 2273--2290","author":"Hussain Shehzeen","year":"2021","unstructured":"Shehzeen Hussain, Paarth Neekhara, Shlomo Dubnov, Julian J. McAuley, and Farinaz Koushanfar. 2021. WaveGuard: Understanding and Mitigating Audio Adversarial Examples. In Proceedings of USENIX Security. 2273--2290."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2003.1198872"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0165-1684(03)00082-3"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of ICLR.","author":"Kuleshov Volodymyr","year":"2017","unstructured":"Volodymyr Kuleshov, S Zayd Enam, and Stefano Ermon. 2017. Audio super-resolution using neural nets. In Proceedings of ICLR."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apacoust.2011.10.013"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2019-3043"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372297.3423348"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462049"},{"key":"e_1_2_1_30_1","first-page":"428","article-title":"High-frequency regeneration in speech coding systems","volume":"4","author":"Makhoul John","year":"1979","unstructured":"John Makhoul and Michael Berouti. 1979. High-frequency regeneration in speech coding systems. In Proceedings of IEEE ICASSP, Vol. 4. 428--431.","journal-title":"Proceedings of IEEE ICASSP"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/WASPAA.2015.7336890"},{"key":"e_1_2_1_32_1","first-page":"4","volume-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2","author":"Cordourier Maruri H\u00e9ctor A.","year":"2018","unstructured":"H\u00e9ctor A. Cordourier Maruri, Paulo Lopez-Meyer, Jonathan Huang, Willem Marco Beltman, Lama Nachman, and Hong Lu. 2018. V-Speech: Noise-Robust Speech Capturing Glasses Using Vibration Sensors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 4 (2018)."},{"key":"e_1_2_1_33_1","unstructured":"MEMSIC. 2021. MMC3416xPJ. http:\/\/www.memsic.com\/uploadfiles\/2021\/02\/20210210110317113.pdf. (2021)."},{"key":"e_1_2_1_34_1","volume-title":"Proceedings of USENIX Security. 1053--1067","author":"Michalevsky Yan","year":"2014","unstructured":"Yan Michalevsky, Dan Boneh, and Gabi Nakibly. 2014. Gyrophone: Recognizing speech from gyroscope signals. In Proceedings of USENIX Security. 1053--1067."},{"key":"e_1_2_1_35_1","unstructured":"Microsoft. 2021. Cortana - Your personal productivity assistant. https:\/\/www.microsoft.com\/en-us\/cortana. (2021)."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of IEEE International Conference on Communications and Signal Processing. 410--412","author":"Mohan D Murali","year":"2011","unstructured":"D Murali Mohan, Dileep B Karpur, Manoj Narayan, and J Kishore. 2011. Artificial bandwidth extension of narrowband speech using Gaussian mixture model. In Proceedings of IEEE International Conference on Communications and Signal Processing. 410--412."},{"key":"e_1_2_1_37_1","first-page":"1843","article-title":"Narrowband to wideband conversion of speech using GMM based transformation","volume":"3","author":"Park Kun-Youl","year":"2000","unstructured":"Kun-Youl Park and Hyung Soon Kim. 2000. Narrowband to wideband conversion of speech using GMM based transformation. In Proceedings of IEEE ICASSP, Vol. 3. 1843--1846.","journal-title":"Proceedings of IEEE ICASSP"},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of Australian International Conference on Speech Science, Technology. 106--111","author":"Qian Yasheng","year":"2002","unstructured":"Yasheng Qian and Peter Kabal. 2002. Wideband speech recovery from narrowband speech using classified codebook mapping. In Proceedings of Australian International Conference on Speech Science, Technology. 106--111."},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/MLSP52302.2021.9596082"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430781"},{"key":"e_1_2_1_41_1","volume-title":"Samsung Bixby: Your Personal Voice Assistant | Samsung US. https:\/\/www.samsung.com\/us\/explore\/bixby\/.","year":"2021","unstructured":"Samsung. 2021. Samsung Bixby: Your Personal Voice Assistant | Samsung US. https:\/\/www.samsung.com\/us\/explore\/bixby\/. (2021)."},{"key":"e_1_2_1_42_1","first-page":"4","volume-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5","author":"Su Weigao","year":"2022","unstructured":"Weigao Su, Daibo Liu, Taiyuan Zhang, and Hongbo Jiang. 2022. Towards Device Independent Eavesdropping on Telephone Conversations with Built-in Accelerometer. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 4 (2022)."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2010.5495701"},{"key":"e_1_2_1_44_1","unstructured":"The New York Times. 2019. Amazon's Alexa Never Stops Listening to You. Should You Worry? https:\/\/www.nytimes.com\/wirecutter\/blog\/amazons-alexa-never-stops-listening-to-you\/. (2019)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053712"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478102"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790119"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of IEEE ICASSP","volume":"1","author":"Yao Sheng","year":"2005","unstructured":"Sheng Yao and Cheung-Fat Chan. 2005. Block-based bandwidth extension of narrowband speech signal by using CDHMM. In Proceedings of IEEE ICASSP, Vol. 1. I-793."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742658"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580789","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3580789","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T04:43:51Z","timestamp":1752468231000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3580789"}},"subtitle":["A Training-free and Universal Eavesdropping Attack on Built-in Speakers of Mobile Devices"],"short-title":[],"issued":{"date-parts":[[2023,3,27]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,3,27]]}},"alternative-id":["10.1145\/3580789"],"URL":"https:\/\/doi.org\/10.1145\/3580789","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,27]]},"assertion":[{"value":"2023-03-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}