{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T15:25:40Z","timestamp":1774365940529,"version":"3.50.1"},"reference-count":50,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,6,12]],"date-time":"2023-06-12T00:00:00Z","timestamp":1686528000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62172277, 62072304"],"award-info":[{"award-number":["62172277, 62072304"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,6,12]]},"abstract":"<jats:p>Eavesdropping on human voice is one of the most common but harmful threats to personal privacy. Glasses are in direct contact with human face, which could sense facial motions when users speak, so human speech contents could be inferred by sensing the movements of glasses. In this paper, we present a live voice eavesdropping method, RF-Mic, which utilizes common glasses attached with a low-cost RFID tag to sense subtle facial speech dynamics for inferring possible voice contents. When a user with a glasses, which is attached an RFID tag on the glass bridge, is speaking, RF-Mic first collects RF signals through forward propagation and backscattering. Then, body motion interference is eliminated from the collected RF signals through a proposed Conditional Denoising AutoEncoder (CDAE) network. Next, RF-Mic extracts three kinds of facial speech dynamic features (i.e., facial movements, bone-borne vibrations, and airborne vibrations) by designing three different deep-learning models. Based on the extracted features, a facial speech dynamics model is constructed for live voice eavesdropping. 
Extensive experiments in different real environments demonstrate that RF-Mic achieves robust and accurate live voice eavesdropping.<\/jats:p>","DOI":"10.1145\/3596259","type":"journal-article","created":{"date-parts":[[2023,6,12]],"date-time":"2023-06-12T18:58:16Z","timestamp":1686596296000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":20,"title":["RF-Mic"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2389-4188","authenticated-orcid":false,"given":"Yunzhong","family":"Chen","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0207-9643","authenticated-orcid":false,"given":"Jiadi","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9266-3044","authenticated-orcid":false,"given":"Linghe","family":"Kong","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0871-9795","authenticated-orcid":false,"given":"Hao","family":"Kong","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6406-4992","authenticated-orcid":false,"given":"Yanmin","family":"Zhu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0782-4953","authenticated-orcid":false,"given":"Yi-Chao","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, Department of Computer Science and Engineering, Shanghai, China"}]}],"member":"320","published-online":{"date-parts":[[2023,6,12]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proc. IEEE Symposium on Security and Privacy","author":"Abhishek Anand S.","year":"2018","unstructured":"S. Abhishek Anand and Nitesh Saxena. 2018. Speechless: Analyzing the Threat to Speech Privacy from Smartphone Motion Sensors. In Proc. IEEE Symposium on Security and Privacy. San Francisco, USA, 1000--1017."},{"key":"e_1_2_1_2_1","volume-title":"Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer. In Proc. NDSS","author":"Ba Zhongjie","year":"2020","unstructured":"Zhongjie Ba, Tianhang Zheng, Xinyu Zhang, Zhan Qin, Baochun Li, Xue Liu, and Kui Ren. 2020. Learning-based Practical Smartphone Eavesdropping with Built-in Accelerometer. In Proc. NDSS. San Diego, USA, 23--26."},{"key":"e_1_2_1_3_1","unstructured":"C. BYU. 2020. Word frequency: based on the 450 million word COCA corpus. [Online]. Available: https:\/\/www.wordfrequency.info\/."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3483251"},{"key":"e_1_2_1_5_1","unstructured":"Daniel M. Dobkin. 2008. The RF in RFID: Passive UHF RFID in Practice. Elsevier."},
{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/2601097.2601119"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/2809695.2809708"},{"key":"e_1_2_1_8_1","volume-title":"Dynamics of speech production and perception","author":"Divenyi Pierre","unstructured":"Pierre Divenyi, Steven Greenberg, and Georg Meyer. 2006. Dynamics of speech production and perception. Vol. 374. IOS Press."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448101"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3300061.3343379"},{"key":"e_1_2_1_11_1","unstructured":"Google. 2023. Google Assistant, your own personal Google. [Online]. Available: https:\/\/assistant.google.com\/."},{"key":"e_1_2_1_12_1","first-page":"1","article-title":"Towards Unconstrained Vocabulary Eavesdropping With mmWave Radar Using GAN","volume":"01","author":"Hu Pengfei","year":"2022","unstructured":"Pengfei Hu, Wenhao Li, Yifan Ma, Panneer Selvam Santhalingam, Parth Pathak, Hong Li, Huanle Zhang, Guoming Zhang, Xiuzhen Cheng, and Prasant Mohapatra. 2022. Towards Unconstrained Vocabulary Eavesdropping With mmWave Radar Using GAN. IEEE Transactions on Mobile Computing 01 (2022), 1--14.","journal-title":"IEEE Transactions on Mobile Computing"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM48880.2022.9796940"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP46214.2022.9833716"},{"key":"e_1_2_1_15_1","unstructured":"iflytek. 2022. iFlytek Input. [Online]. Available: https:\/\/srf.xunfei.cn\/."},{"key":"e_1_2_1_16_1","volume-title":"Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114","author":"Kingma Diederik P","year":"2013","unstructured":"Diederik P Kingma and Max Welling. 2013. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114 (2013)."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1161\/CIRCULATIONAHA.105.584474"},{"key":"e_1_2_1_18_1","unstructured":"Mike Lenehan. 2021. Impinj Inc. Application Note -- Low Level User Data Support. [Online]. Available: https:\/\/support.impinj.com\/hc\/en-us\/articles\/202755318-Application-Note-Low-Level-User-Data-Support."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2019.8737592"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2019.2963152"},{"key":"e_1_2_1_21_1","first-page":"4","volume-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2","author":"Cordourier Maruri H\u00e9ctor A.","year":"2018","unstructured":"H\u00e9ctor A. Cordourier Maruri, Paulo Lopez-Meyer, Jonathan Huang, Willem Marco Beltman, Lama Nachman, and Hong Lu. 2018. V-Speech: Noise-Robust Speech Capturing Glasses Using Vibration Sensors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 4 (2018), 180:1--180:23."},{"key":"e_1_2_1_22_1","volume-title":"Proc. USENIX","author":"Michalevsky Yan","year":"2014","unstructured":"Yan Michalevsky, Dan Boneh, and Gabi Nakibly. 2014. Gyrophone: Recognizing Speech from Gyroscope Signals. In Proc. USENIX. San Diego, CA, USA, 1053--1067."},{"key":"e_1_2_1_23_1","unstructured":"F. Mavromatis, N. Kargas, and A. Bletsas. 2019. USRP reader. [Online]. Available: https:\/\/github.com\/nkargas\/Gen2-UHF-RFID-Reader."},{"key":"e_1_2_1_24_1","volume-title":"Lamphone: Real-Time Passive Sound Recovery from Light Bulb Vibrations.
Cryptology ePrint Archive, Paper 2020\/708.","author":"Nassi Ben","year":"2020","unstructured":"Ben Nassi, Yaron Pirutin, Adi Shamir, Yuval Elovici, and Boris Zadov. 2020. Lamphone: Real-Time Passive Sound Recovery from Light Bulb Vibrations. Cryptology ePrint Archive, Paper 2020\/708."},{"key":"e_1_2_1_25_1","unstructured":"Louis C.W. Pols. 2011. Speech Dynamics. Plenary Lecture."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.2146113"},{"key":"e_1_2_1_27_1","unstructured":"rfidhy. 2022. The Smallest RFID Tag as Thin as Sand. [Online]. Available: https:\/\/www.rfidhy.com\/the-smallest-rfid-tag-as-thin-as-sand\/."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430781"},{"key":"e_1_2_1_29_1","volume-title":"An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition","author":"Shi Baoguang","year":"2016","unstructured":"Baoguang Shi, Xiang Bai, and Cong Yao. 2016. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 11 (2016), 2298--2304."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3483272"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494969"},{"key":"e_1_2_1_32_1","volume-title":"Attention is all you need. Advances in Neural Information Processing Systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017)."},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3214288","article-title":"RF-ECG: Heart rate variability assessment based on COTS RFID tag array","volume":"2","author":"Wang Chuyu","year":"2018","unstructured":"Chuyu Wang and Lei Xie. 2018. RF-ECG: Heart rate variability assessment based on COTS RFID tag array. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 2 (2018), 1--26.","journal-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."},{"key":"e_1_2_1_34_1","first-page":"4","volume-title":"Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5","author":"Wang Chuyu","year":"2021","unstructured":"Chuyu Wang, Lei Xie, Yuancan Lin, Wei Wang, Yingying Chen, et al. 2021. Thru-the-wall Eavesdropping on Loudspeakers via RFID by Capturing Sub-mm Level Vibration. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 4 (2021), 182:1--182:25."},{"key":"e_1_2_1_35_1","volume-title":"Speech separation by humans and machines","author":"Wang DeLiang","unstructured":"DeLiang Wang. 2005. On ideal binary mask as the computational goal of auditory scene analysis. In Speech separation by humans and machines. Springer, 181--197."},{"key":"e_1_2_1_36_1","volume-title":"We can hear you with Wi-Fi! IEEE Transactions on Mobile Computing 15, 11","author":"Wang Guanhua","year":"2016","unstructured":"Guanhua Wang, Yongpan Zou, Zimu Zhou, Kaishun Wu, and Lionel M Ni. 2016. We can hear you with Wi-Fi! IEEE Transactions on Mobile Computing 15, 11 (2016), 2907--2920."},{"key":"e_1_2_1_37_1","first-page":"1","article-title":"ToothSonic: Earable authentication via acoustic toothprint","volume":"6","author":"Wang Zi","year":"2022","unstructured":"Zi Wang, Yili Ren, Yingying Chen, and Jie Yang. 2022.
ToothSonic: Earable authentication via acoustic toothprint. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 2 (2022), 1--24.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790119"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2016.7524436"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2017.2766526"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430771"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307334.3326073"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2017.2772253"},{"key":"e_1_2_1_44_1","volume-title":"Tagbeat: Sensing mechanical vibration period with COTS RFID systems","author":"Yang Lei","year":"2017","unstructured":"Lei Yang, Yao Li, Qiongzheng Lin, Huanyu Jia, Xiang-Yang Li, and Yunhao Liu. 2017. Tagbeat: Sensing mechanical vibration period with COTS RFID systems. IEEE\/ACM Transactions on Networking 25, 6 (2017), 3823--3835."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2973750.2973759"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155251"},{"key":"e_1_2_1_47_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3090095","article-title":"SoundTrak: Continuous 3D tracking of a finger using active acoustics","volume":"1","author":"Zhang Cheng","year":"2017","unstructured":"Cheng Zhang, Qiuyue Xue, Anandghan Waghmare, Sumeet Jain, Yiming Pu, Sinan Hersek, Kent Lyons, Kenneth A Cunefare, Omer T Inan, and Gregory D Abowd. 2017. SoundTrak: Continuous 3D tracking of a finger using active acoustics.
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 2 (2017), 1--25.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742658"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TII.2019.2943898"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1186\/1687-1499-2014-137"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3596259","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3596259","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T04:46:33Z","timestamp":1752468393000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3596259"}},"subtitle":["Live Voice Eavesdropping via Capturing Subtle Facial Speech Dynamics Leveraging RFID"],"short-title":[],"issued":{"date-parts":[[2023,6,12]]},"references-count":50,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,6,12]]}},"alternative-id":["10.1145\/3596259"],"URL":"https:\/\/doi.org\/10.1145\/3596259","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,12]]},"assertion":[{"value":"2023-06-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}