{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,25]],"date-time":"2025-09-25T18:23:06Z","timestamp":1758824586824,"version":"3.41.0"},"reference-count":44,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,9,9]],"date-time":"2021-09-09T00:00:00Z","timestamp":1631145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100006754","name":"Army Research Laboratory","doi-asserted-by":"publisher","award":["W911NF-17-20196"],"award-info":[{"award-number":["W911NF-17-20196"]}],"id":[{"id":"10.13039\/100006754","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CPS 20-38817"],"award-info":[{"award-number":["CPS 20-38817"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2021,9,9]]},"abstract":"<jats:p>In this paper, we present a novel deep neural network architecture that reconstructs the high-frequency audio of selected spoken human words from low-sampling-rate signals of (ego-)motion sensors, such as accelerometer and gyroscope data, recorded on everyday mobile devices. As the sampling rate of such motion sensors is much lower than the Nyquist rate of ordinary human voice (around 6kHz+), these motion sensor recordings suffer from a significant frequency aliasing effect. In order to recover the original high-frequency audio signal, our neural network introduces a novel layer, called the alias unfolding layer, specialized in expanding the bandwidth of an aliased signal by reversing the frequency folding process in the time-frequency domain. 
While perfect unfolding is known to be unrealizable, we leverage the sparsity of the original signal to arrive at a sufficiently accurate statistical approximation. Comprehensive experiments show that our neural network significantly outperforms the state of the art in audio reconstruction from motion sensor data, effectively reconstructing a pre-trained set of spoken keywords from low-frequency motion sensor signals (with a sampling rate of 100-400 Hz). The approach demonstrates the potential risk of information leakage from motion sensors in smart mobile devices.<\/jats:p>","DOI":"10.1145\/3478102","type":"journal-article","created":{"date-parts":[[2021,9,14]],"date-time":"2021-09-14T22:48:23Z","timestamp":1631659703000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Audio Keyword Reconstruction from On-Device Motion Sensor Signals via Neural Frequency Unfolding"],"prefix":"10.1145","volume":"5","author":[{"given":"Tianshi","family":"Wang","sequence":"first","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shuochao","family":"Yao","sequence":"additional","affiliation":[{"name":"George Mason University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Shengzhong","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jinyang","family":"Li","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dongxin","family":"Liu","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Huajie","family":"Shao","sequence":"additional","affiliation":[{"name":"University of Illinois at 
Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ruijie","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tarek","family":"Abdelzaher","sequence":"additional","affiliation":[{"name":"University of Illinois at Urbana-Champaign"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,14]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2019. Voice frequency. https:\/\/en.wikipedia.org\/wiki\/Voice_frequency"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462362"},{"key":"e_1_2_2_3_1","volume-title":"Spearphone: A speech privacy exploit via accelerometer-sensed reverberations from smartphone loudspeakers. arXiv preprint arXiv:1907.05972","author":"Anand S Abhishek","year":"2019","unstructured":"S Abhishek Anand, Chen Wang, Jian Liu, Nitesh Saxena, and Yingying Chen. 2019. Spearphone: A speech privacy exploit via accelerometer-sensed reverberations from smartphone loudspeakers. arXiv preprint arXiv:1907.05972 (2019)."},{"volume-title":"2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 5927--5931","author":"Atti V.","key":"e_1_2_2_4_1","unstructured":"V. Atti, V. Krishnan, D. Dewasurendra, V. Chebiyyam, S. Subasingha, D. J. Sinder, V. Rajendran, I. Varga, J. Gibbs, L. Miao, V. Grancharov, and H. Pobloth. 2015. Super-wideband bandwidth extension for speech in the 3GPP EVS codec. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 
5927--5931."},{"volume-title":"Proceedings of the Network and Distributed Systems Security (NDSS) Symposium. 23--26","author":"Ba Zhongjie","key":"e_1_2_2_5_1","unstructured":"Zhongjie Ba, Tianhang Zheng, Xinyu Zhang, Zhan Qin, Baochun Li, Xue Liu, and Kui Ren. [n.d.]. Learning-based practical smartphone eavesdropping with built-in accelerometer. In Proceedings of the Network and Distributed Systems Security (NDSS) Symposium. 23--26."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-528"},{"key":"e_1_2_2_7_1","volume-title":"2009 17th European Signal Processing Conference. IEEE","author":"Bauer Patrick","year":"2009","unstructured":"Patrick Bauer and Tim Fingscheidt. 2009. A statistical framework for artificial bandwidth extension exploiting speech waveform and phonetic transcription. In 2009 17th European Signal Processing Conference. IEEE, 1839--1843."},{"key":"e_1_2_2_8_1","volume-title":"Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals. 
CoRR abs\/1807.03418","author":"Becker S\u00f6ren","year":"2018","unstructured":"S\u00f6ren Becker, Marcel Ackermann, Sebastian Lapuschkin, Klaus-Robert M\u00fcller, and Wojciech Samek. 2018. Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals. CoRR abs\/1807.03418 (2018). arXiv:1807.03418"},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.326637"},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2001.940919"},{"key":"e_1_2_2_11_1","volume-title":"Adversarial audio synthesis. arXiv preprint arXiv:1802.04208","author":"Donahue Chris","year":"2018","unstructured":"Chris Donahue, Julian McAuley, and Miller Puckette. 2018. Adversarial audio synthesis. arXiv preprint arXiv:1802.04208 (2018)."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/SCFT.1999.781522"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2886739"},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Yu Gu and Zhen-Hua Ling. 2017. Waveform Modeling Using Stacked Dilated Convolutional Neural Networks for Speech Bandwidth Extension. In INTERSPEECH. 1123--1127.","DOI":"10.21437\/Interspeech.2017-336"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/WASPAA.2019.8937169"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3055031.3055088"},{"key":"e_1_2_2_17_1","unstructured":"Scott Havard. 
[n.d.]. Google Pixel XL Teardown. https:\/\/www.ifixit.com\/Teardown\/Google+Pixel+XL+Teardown\/71237"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-527"},{"key":"e_1_2_2_19_1","volume-title":"Proceedings. Meeting the Challenges of the New Millennium (Cat. No. 00EX421)","author":"Jax Peter","year":"2000","unstructured":"Peter Jax and Peter Vary. 2000. Wideband extension of telephone speech using a hidden Markov model. In 2000 IEEE Workshop on Speech Coding. Proceedings. Meeting the Challenges of the New Millennium (Cat. No. 00EX421). IEEE, 133--135."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0165-1684(03)00082-3"},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.885934"},{"key":"e_1_2_2_22_1","unstructured":"Volodymyr Kuleshov, S Zayd Enam, and Stefano Ermon. 2017. Audio super-resolution using neural nets. In ICLR (Workshop Track)."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2015-555"},{"key":"e_1_2_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462588"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462049"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2798811"},{"key":"e_1_2_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432228"},{"key":"e_1_2_2_28_1","unstructured":"Brittany McCrigler. [n.d.]. Google Nexus 5 Teardown. 
https:\/\/www.ifixit.com\/Teardown\/Nexus+5+Teardown\/19016"},{"key":"e_1_2_2_29_1","volume-title":"Gyrophone: Recognizing speech from gyroscope signals. In 23rd USENIX Security Symposium (USENIX Security 14). 1053--1067.","author":"Michalevsky Yan","year":"2014","unstructured":"Yan Michalevsky, Dan Boneh, and Gabi Nakibly. 2014. Gyrophone: Recognizing speech from gyroscope signals. In 23rd USENIX Security Symposium (USENIX Security 14). 1053--1067."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.21437\/Eurospeech.1997-469"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2011-419"},{"key":"e_1_2_2_32_1","volume-title":"SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452","author":"Pascual Santiago","year":"2017","unstructured":"Santiago Pascual, Antonio Bonafonte, and Joan Serra. 2017. SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452 (2017)."},{"key":"e_1_2_2_33_1","unstructured":"Pery Pearson. [n.d.]. Sound Sampling. http:\/\/www.hitl.washington.edu\/projects\/knowledge_base\/virtual-worlds\/EVE\/I.B.3.a.SoundSampling.html"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.5815\/ijisa.2016.02.06"},{"key":"e_1_2_2_35_1","volume-title":"Australian Int. Conf. Speech Science, Technology. 
106--111","author":"Qian Yasheng","year":"2002","unstructured":"Yasheng Qian and Peter Kabal. 2002. Wideband speech recovery from narrowband speech using classified codebook mapping. In Australian Int. Conf. Speech Science, Technology. 106--111."},{"key":"e_1_2_2_36_1","volume-title":"In Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR. Citeseer.","author":"Raffel Colin","year":"2014","unstructured":"Colin Raffel, Brian McFee, Eric J Humphrey, Justin Salamon, Oriol Nieto, Dawen Liang, Daniel PW Ellis, and C Colin Raffel. 2014. mir_eval: A transparent implementation of common MIR metrics. In Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR. Citeseer."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sigpro.2011.10.007"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/CSNT.2015.233"},{"key":"e_1_2_2_39_1","volume-title":"Speech Commands: A public dataset for single-word speech recognition. Dataset available from http:\/\/download.tensorflow.org\/data\/speechcommandsv0.01.tar.gz","author":"Warden Pete","year":"2017","unstructured":"Pete Warden. 2017. Speech Commands: A public dataset for single-word speech recognition. 
Dataset available from http:\/\/download.tensorflow.org\/data\/speechcommandsv0.01.tar.gz (2017)."},{"key":"e_1_2_2_40_1","volume-title":"MA Tu\u011ftekin Turan, and Engin Erzin","author":"Ya\u011fl Can","year":"2013","unstructured":"Can Ya\u011fl\u0131, MA Tu\u011ftekin Turan, and Engin Erzin. 2013. Artificial bandwidth extension of spectral envelope along a Viterbi path. Speech communication 55, 1 (2013), 111--118."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3038912.3052577"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3308558.3313426"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.21437\/ICSLP.1994-412"},{"key":"e_1_2_2_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742658"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous 
Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478102","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478102","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478102","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:32Z","timestamp":1750188692000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478102"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,9]]},"references-count":44,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,9]]}},"alternative-id":["10.1145\/3478102"],"URL":"https:\/\/doi.org\/10.1145\/3478102","relation":{},"ISSN":["2474-9567"],"issn-type":[{"type":"electronic","value":"2474-9567"}],"subject":[],"published":{"date-parts":[[2021,9,9]]},"assertion":[{"value":"2021-09-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}