{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T15:10:42Z","timestamp":1767798642912,"version":"3.49.0"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"1","funder":[{"name":"National Key Research and Development Program of China","award":["2024YFC3014300"],"award-info":[{"award-number":["2024YFC3014300"]}]},{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62372378, 72225011, and 72434005"],"award-info":[{"award-number":["62372378, 72225011, and 72434005"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100015401","name":"Key Research and Development Program of ShaanXi","doi-asserted-by":"crossref","award":["2024GX-YBXM-538"],"award-info":[{"award-number":["2024GX-YBXM-538"]}],"id":[{"id":"10.13039\/501100015401","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Priv. Secur."],"published-print":{"date-parts":[[2026,2,28]]},"abstract":"<jats:p>\n                    With the popularity of mobile devices, a variety of motion sensors are integrated to enhance the user experience. Although existing studies demonstrated that non-acoustic motion sensors can be attacked by adversaries, they overlook the limited sampling frequencies of motion sensors (e.g., &lt; 500 Hz) in mobile devices and are evaluated in the controlled laboratory settings. In this article, we explore a new attack model on non-acoustic motion sensors based on the off-the-shelf mobile devices. We propose a general framework named\n                    <jats:italic toggle=\"yes\">VoiceFormer<\/jats:italic>\n                    to synthesize high-fidelity speeches based on the vibrations of accelerometers and gyroscopes with a low sampling frequency. 
Specifically, in\n                    <jats:italic toggle=\"yes\">VoiceFormer<\/jats:italic>\n                    , we introduce a signal alignment approach to remove the time offsets between two nonsynchronous signals, and leverage Time-Interleaved Analog-to-Digital Conversion (TI-ADC) to generate a high-frequency synthetic signal (e.g., &gt; 8 kHz) based on the vibration signals of accelerometers and gyroscopes on the same motherboard. To synthesize the high-fidelity acoustic waveforms, we propose a wavelet-based generative adversarial network to learn the spatiotemporal latent mapping between vibrations and original speech signals. Extensive experimental results demonstrate the feasibility of voice synthesis by spying on the low-frequency non-acoustic motion sensors in off-the-shelf mobile devices.\n                    <jats:italic toggle=\"yes\">VoiceFormer<\/jats:italic>\n                    achieves a Mean Opinion Score of 3.38 on the synthesized acoustic signals. Although hardware settings differ significantly across mobile devices, VoiceFormer shows robust performance in synthesizing intelligible voice signals. 
Our results suggest that eavesdropping an off-the-shelf mobile device remotely by fusing non-acoustic sensors is feasible.\n                  <\/jats:p>","DOI":"10.1145\/3779062","type":"journal-article","created":{"date-parts":[[2025,12,1]],"date-time":"2025-12-01T12:01:37Z","timestamp":1764590497000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["VoiceFormer: Fusing Non-Acoustic Motion Sensors for High-Fidelity Voice Synthesis in Mobile Devices"],"prefix":"10.1145","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8139-2231","authenticated-orcid":false,"given":"Xiaokai","family":"Yan","sequence":"first","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University","place":["Xi'an, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8381-8187","authenticated-orcid":false,"given":"Yunji","family":"Liang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University","place":["Xi'an, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5145-2654","authenticated-orcid":false,"given":"Lei","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University","place":["Xi'an, China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4513-805X","authenticated-orcid":false,"given":"Sagar","family":"Samtani","sequence":"additional","affiliation":[{"name":"Kelley\u2019s Data Science and Artificial Intelligence Lab, Indiana University","place":["Bloomington, United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6097-2467","authenticated-orcid":false,"given":"Bin","family":"Guo","sequence":"additional","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University","place":["Xi'an, 
China"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9905-3238","authenticated-orcid":false,"given":"Zhiwen","family":"Yu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Northwestern Polytechnical University","place":["Xi'an, China"]}]}],"member":"320","published-online":{"date-parts":[[2026,1,6]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2018.00004"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3448300.3468499"},{"key":"e_1_3_3_4_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2020.24076"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP46214.2022.9833568"},{"key":"e_1_3_3_6_2","doi-asserted-by":"crossref","unstructured":"Yetong Cao Fan Li Huijie Chen Xiaochen Liu Shengchun Zhai Song Yang and Yu Wang. 2023. Live speech recognition via earphone motion sensors. IEEE Transactions on Mobile Computing 23 6 (2023) 7284\u20137300.","DOI":"10.1109\/TMC.2023.3333214"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/7.106130"},{"key":"e_1_3_3_8_2","volume-title":"Proceedings of the Network and Distributed System Security Symposium (NDSS)","author":"Cayir Derin","year":"2025","unstructured":"Derin Cayir, Reham Mohamed, Riccardo Lazzeretti, Marco Angelini, Abbas Acar, Mauro Conti, Z. Berkay Celik, and Selcuk Uluagac. 2025. Speak up, I\u2019m listening: Extracting speech from zero-permission VR sensors. In Proceedings of the Network and Distributed System Security Symposium (NDSS)."},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/3365366"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3365366"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472621"},{"key":"e_1_3_3_12_2","doi-asserted-by":"crossref","unstructured":"Shuai Chen Mengmeng Hao Fangyu Ding Dong Jiang Jiping Dong Shize Zhang Qiquan Guo and Chundong Gao. 2023. 
Exploring the global geography of cybercrime and its driving forces. Humanities and Social Sciences Communications 10 1 (2023) 1\u201310.","DOI":"10.1057\/s41599-023-01560-x"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","unstructured":"Yunzhong Chen Jiadi Yu Linghe Kong and Yanmin Zhu. 2025. A comprehensive survey of side-channel sound-sensing methods. IEEE Internet of Things Journal 12 2 (2025) 1554\u20131578. DOI:10.1109\/JIOT.2024.3501334","DOI":"10.1109\/JIOT.2024.3501334"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/9.280746"},{"key":"e_1_3_3_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3507952"},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3696418"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3510583"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jisa.2023.103479"},{"key":"e_1_3_3_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3403947"},{"key":"e_1_3_3_20_2","unstructured":"Krassimir Hristov Denishev and Mihaela Rangelova Petrova. 2007. Accelerometer design. Proceedings of the ELECTRONICS (2007) 159\u2013164."},{"key":"e_1_3_3_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52734.2025.01907"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796890"},{"key":"e_1_3_3_23_2","unstructured":"Yangyang Gu Xianglong Li Haolin Wu Jing Chen Kun He Ruiying Du and Cong Wu. 2025. CSI2Dig: Recovering digit content from smartphone loudspeakers using channel state information. arXiv:2504.14812. Retrieved from https:\/\/arxiv.org\/abs\/2504.14812"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3055031.3055088"},{"key":"e_1_3_3_25_2","first-page":"836","volume-title":"Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP)","author":"Hu Pengfei","year":"2022","unstructured":"Pengfei Hu, Wenhao Li, Riccardo Spolaor, and Xiuzhen Cheng. 2022. 
mmEcho: A mmWave-based acoustic eavesdropping method. In Proceedings of the 2023 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 836\u2013852."},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796940"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP46214.2022.9833716"},{"key":"e_1_3_3_28_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v39i16.33908"},{"key":"e_1_3_3_29_2","doi-asserted-by":"crossref","unstructured":"Wenbin Huang Hanyuan Chen Hangcheng Cao Ju Ren Hongbo Jiang Zhangjie Fu and Yaoxue Zhang. 2024. Manipulating voice assistants eavesdropping via inherent vulnerability unveiling in mobile systems. IEEE Transactions on Mobile Computing 23 12 (2024) 11549\u201311563.","DOI":"10.1109\/TMC.2024.3401096"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1038\/s42256-021-00389-w"},{"key":"e_1_3_3_31_2","first-page":"2410","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Kalchbrenner Nal","year":"2018","unstructured":"Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aaron Oord, Sander Dieleman, and Koray Kavukcuoglu. 2018. Efficient neural audio synthesis. In Proceedings of the International Conference on Machine Learning. PMLR, 2410\u20132419."},{"key":"e_1_3_3_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/IRI.2013.6642539"},{"key":"e_1_3_3_33_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372420"},{"key":"e_1_3_3_34_2","first-page":"17022","article-title":"HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis","volume":"33","author":"Kong Jungil","year":"2020","unstructured":"Jungil Kong, Jaehyeon Kim, and Jaekyoung Bae. 2020. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis. 
Advances in Neural Information Processing Systems 33 (2020), 17022\u201317033.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_35_2","unstructured":"Kundan Kumar Rithesh Kumar Thibault de Boissiere Lucas Gestin Wei Zhen Teoh Jose Sotelo Alexandre de Brebisson Yoshua Bengio and Aaron Courville. 2019. MelGAN: Generative adversarial networks for conditional waveform synthesis. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 14910\u201314921."},{"key":"e_1_3_3_36_2","doi-asserted-by":"crossref","unstructured":"Yunji Liang Yuchen Qin Qi Li Xiaokai Yan Luwen Huangfu Sagar Samtani Bin Guo and Zhiwen Yu. 2022. An escalated eavesdropping attack on mobile devices via low-resolution vibration signals. IEEE Transactions on Dependable and Secure Computing 20 4 (2022) 3037\u20133050.","DOI":"10.1109\/TDSC.2022.3198934"},{"key":"e_1_3_3_37_2","doi-asserted-by":"crossref","unstructured":"Yunji Liang Yuchen Qin Qi Li Xiaokai Yan Zhiwen Yu Bin Guo Sagar Samtani and Yanyong Zhang. 2022. Accmyrinx: Speech synthesis with non-acoustic sensor. In Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies 6 3 (2022) 1\u201324.","DOI":"10.1145\/3550338"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3637063"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2007.366978"},{"key":"e_1_3_3_40_2","article-title":"Everything about STMicroelectronics\u20193-axis digital MEMS gyroscopes","author":"Microelectronics ST","year":"2011","unstructured":"ST Microelectronics. 2011. Everything about STMicroelectronics\u20193-axis digital MEMS gyroscopes. Technical Article TA0343. ST Microelectronics .","journal-title":"Technical Article TA0343. 
ST Microelectronics"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/WASPAA.2013.6701851"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683143"},{"key":"e_1_3_3_44_2","first-page":"163","volume-title":"Proceedings of the 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS)","author":"Qi Xin","year":"2013","unstructured":"Xin Qi, Matthew Keally, Gang Zhou, Yantao Li, and Zhen Ren. 2013. AdaSense: Adapting sampling rates for activity recognition in body sensor networks. In Proceedings of the 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, 163\u2013172."},{"key":"e_1_3_3_45_2","first-page":"26426","article-title":"Gyrophone recognizing speech from gyroscope signals","volume":"21","author":"Ren Xingjing","year":"2014","unstructured":"Xingjing Ren, Xin Zhou, Sheng Yu, Xuezhong Wu, and Dingbang Xiao. 2014. Gyrophone recognizing speech from gyroscope signals. In Proceedings of the 23rd USENIX Security Symposium 21 . 
26426\u201326446.","journal-title":"Proceedings of the 23rd USENIX Security Symposium"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1145\/3712308"},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ast.2014.06.005"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430781"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1145\/3632175"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/3659603"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2023.24077"},{"key":"e_1_3_3_52_2","doi-asserted-by":"publisher","DOI":"10.1126\/sciadv.abl6464"},{"key":"e_1_3_3_53_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCAS.2006.1693352"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796806"},{"key":"e_1_3_3_55_2","first-page":"3997","volume-title":"Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24)","author":"Wang Chao","year":"2024","unstructured":"Chao Wang, Feng Lin, Hao Yan, Tong Wu, Wenyao Xu, and Kui Ren. 2024. VibSpeech: Exploring practical wideband eavesdropping via bandlimited signal of vibration-based side channel. In Proceedings of the 33rd USENIX Security Symposium (USENIX Security 24). 3997\u20134014."},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/3729475"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580789"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1145\/3749463"},{"key":"e_1_3_3_59_2","volume-title":"Proceedings of the 13th International Conference on Learning Representations","author":"Wang Shiyu","year":"2025","unstructured":"Shiyu Wang, Jiawei LI, Xiaoming Shi, Zhou Ye, Baichuan Mo, Wenze Lin, Ju Shengtong, Zhixuan Chu, and Ming Jin. 2025. TimeMixer++: A general time series pattern machine for universal predictive analysis. In Proceedings of the 13th International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=1CLzLXSFNn"},{"key":"e_1_3_3_60_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Wang Shiyu","year":"2024","unstructured":"Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, and Jun Zhou. 2024. TimeMixer: Decomposable multiscale mixing for time series forecasting. In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=7oLshfEIC2"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM52122.2024.10621229"},{"key":"e_1_3_3_62_2","doi-asserted-by":"publisher","DOI":"10.1109\/SLT48900.2021.9383551"},{"key":"e_1_3_3_63_2","first-page":"6619","volume-title":"Proceedings of the 34th USENIX Security Symposium (USENIX Security 25)","author":"Yao Xin","year":"2025","unstructured":"Xin Yao, Kecheng Huang, Yimin Chen, Jiawei Guo, Jie Tang, and Ming Zhao. 2025. EchoLLM: LLM-augmented acoustic eavesdropping attack on bone conduction headphones with mmWave radar. In Proceedings of the 34th USENIX Security Symposium (USENIX Security 25). 6619\u20136638."},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1145\/3701725"},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM52122.2024.10621338"},{"issue":"4","key":"e_1_3_3_66_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3569486","article-title":"I spy you: Eavesdropping continuous speech on smartphones via motion sensors","volume":"6","author":"Zhang Shijia","year":"2023","unstructured":"Shijia Zhang, Yilin Liu, and Mahanth Gowda. 2023. I spy you: Eavesdropping continuous speech on smartphones via motion sensors. 
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 4 (2023), 1\u201331.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"}],"container-title":["ACM Transactions on Privacy and Security"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3779062","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T12:32:13Z","timestamp":1767789133000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3779062"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,6]]},"references-count":65,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,2,28]]}},"alternative-id":["10.1145\/3779062"],"URL":"https:\/\/doi.org\/10.1145\/3779062","relation":{},"ISSN":["2471-2566","2471-2574"],"issn-type":[{"value":"2471-2566","type":"print"},{"value":"2471-2574","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,6]]},"assertion":[{"value":"2025-04-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-11-22","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}