{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T17:57:52Z","timestamp":1773511072149,"version":"3.50.1"},"reference-count":96,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,1,11]],"date-time":"2024-01-11T00:00:00Z","timestamp":1704931200000},"content-version":"vor","delay-in-days":386,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS-2008384"],"award-info":[{"award-number":["CNS-2008384"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2022,12,21]]},"abstract":"<jats:p>This paper presents iSpyU, a system that shows the feasibility of recognition of natural speech content played on a phone during conference calls (Skype, Zoom, etc) using a fusion of motion sensors such as accelerometer and gyroscope. While microphones require permissions from the user to be accessible by an app developer, the motion sensors are zero-permission sensors, thus accessible by a developer without alerting the user. This allows a malicious app to potentially eavesdrop on sensitive speech content played by the user's phone. In designing the attack, iSpyU tackles a number of technical challenges including: (i) Low sampling rate of motion sensors (500 Hz in comparison to 44 kHz for a microphone). (ii) Lack of availability of large-scale training datasets to train models for Automatic Speech Recognition (ASR) with motion sensors. 
iSpyU systematically addresses these challenges by a combination of techniques in synthetic training data generation, ASR modeling, and domain adaptation. Extensive measurement studies on modern smartphones show a word level accuracy of 53.3 - 59.9% over a dictionary of 2000-10000 words, and a character level accuracy of 70.0 - 74.8%. We believe such levels of accuracy pose a significant threat when viewed from a privacy perspective.<\/jats:p>","DOI":"10.1145\/3569486","type":"journal-article","created":{"date-parts":[[2023,1,11]],"date-time":"2023-01-11T15:34:01Z","timestamp":1673451241000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["I Spy You"],"prefix":"10.1145","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7304-4571","authenticated-orcid":false,"given":"Shijia","family":"Zhang","sequence":"first","affiliation":[{"name":"The Pennsylvania State University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4322-1818","authenticated-orcid":false,"given":"Yilin","family":"Liu","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5325-5013","authenticated-orcid":false,"given":"Mahanth","family":"Gowda","sequence":"additional","affiliation":[{"name":"The Pennsylvania State University, USA"}]}],"member":"320","published-online":{"date-parts":[[2023,1,11]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"International conference on machine learning. PMLR, 173--182","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. 2016. Deep speech 2: End-to-end speech recognition in english and mandarin. In International conference on machine learning. 
PMLR, 173--182."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2018.00004"},{"key":"e_1_2_1_3_1","volume-title":"Spearphone: A speech privacy exploit via accelerometer-sensed reverberations from smartphone loudspeakers. arXiv preprint arXiv:1907.05972","author":"Anand S Abhishek","year":"2019","unstructured":"S Abhishek Anand, Chen Wang, Jian Liu, Nitesh Saxena, and Yingying Chen. 2019. Spearphone: A speech privacy exploit via accelerometer-sensed reverberations from smartphone loudspeakers. arXiv preprint arXiv:1907.05972 (2019)."},{"key":"e_1_2_1_4_1","unstructured":"Android Sensors 2022. Sensors Overview. https:\/\/developer.android.com\/guide\/topics\/sensors\/sensors_overview."},{"key":"e_1_2_1_5_1","volume-title":"Data Privacy Management, Cryptocurrencies and Blockchain Technology","author":"Azzakhnini Safaa","unstructured":"Safaa Azzakhnini and Ralf C Staudemeyer. 2020. Extracting speech from motion-sensitive sensors. In Data Privacy Management, Cryptocurrencies and Blockchain Technology. Springer, 145--160."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14722\/ndss.2020.24076"},{"key":"e_1_2_1_7_1","unstructured":"Ronald J Baken and Robert F Orlikoff. 2000. Clinical measurement of speech and voice. Cengage Learning."},{"key":"e_1_2_1_8_1","unstructured":"Batterystats 2022. Profile battery usage with Batterystats and Battery Historian. https:\/\/developer.android.com\/topic\/performance\/power\/setup-battery-historian."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-47854-7_30"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1979.1163209"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7472621"},{"key":"e_1_2_1_12_1","volume-title":"31st Annual Symposium on Combinatorial Pattern Matching (CPM","author":"Charalampopoulos Panagiotis","year":"2020","unstructured":"Panagiotis Charalampopoulos, Tomasz Kociumaka, and Shay Mozes. 2020. 
Dynamic string alignment. In 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020). Schloss Dagstuhl-Leibniz-Zentrum f\u00fcr Informatik."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462105"},{"key":"e_1_2_1_14_1","unstructured":"Coriolis Force 2022. Coriolis force. https:\/\/en.wikipedia.org\/wiki\/Coriolis_force."},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Abe Davis Michael Rubinstein Neal Wadhwa Gautham J Mysore Fredo Durand and William T Freeman. 2014. The visual microphone: Passive recovery of sound from video. (2014).","DOI":"10.1145\/2601097.2601119"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_1_17_1","volume-title":"Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805","author":"Devlin Jacob","year":"2018","unstructured":"Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)."},{"key":"e_1_2_1_18_1","volume-title":"S3: Side-Channel Attack on Stylus Pencil through Sensors. arXiv preprint arXiv:2103.05840","author":"Farrukh Habiba","year":"2021","unstructured":"Habiba Farrukh, Tinghan Yang, Hanwen Xu, Yuxuan Yin, He Wang, and Z Berkay Celik. 2021. S3: Side-Channel Attack on Stylus Pencil through Sensors. arXiv preprint arXiv:2103.05840 (2021)."},{"key":"e_1_2_1_19_1","unstructured":"Haytham M. Fayek. 2016. Speech Processing for Machine Learning: Filter banks Mel-Frequency Cepstral Coefficients (MFCCs) and What's In-Between. 
https:\/\/haythamfayek.com\/2016\/04\/21\/speech-processing-for-machine-learning.html"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2006.11.005"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_2_1_22_1","unstructured":"Gyroscope 2010. New Gyroscope Design Will Help Autonomous Cars and Robots Map the World. https:\/\/tinyurl.com\/4s464sy8."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3055031.3055088"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.3390\/rs9080848"},{"key":"e_1_2_1_25_1","volume-title":"Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","author":"Hentschel Michael","unstructured":"Michael Hentschel, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani. 2018. Feature-Based Learning Hidden Unit Contributions for Domain Adaptation of RNN-LMs. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 1692--1696."},{"key":"e_1_2_1_26_1","volume-title":"AccEar: Accelerometer Acoustic Eavesdropping with Unconstrained Vocabulary. In 2022 IEEE Symposium on Security and Privacy (SP). IEEE Computer Society, 1530--1530","author":"Hu Pengfei","year":"2022","unstructured":"Pengfei Hu, Hui Zhuang, Panneer Selvam Santhalingam, Riccardo Spolaor, Parth Pathak, Guoming Zhang, and Xiuzhen Cheng. 2022. AccEar: Accelerometer Acoustic Eavesdropping with Unconstrained Vocabulary. In 2022 IEEE Symposium on Security and Privacy (SP). 
IEEE Computer Society, 1530--1530."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2018.8639563"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2020.3001216"},{"key":"e_1_2_1_29_1","volume-title":"Blackout: Speeding up recurrent neural network language models with very large vocabularies. arXiv preprint arXiv:1511.06909","author":"Ji Shihao","year":"2015","unstructured":"Shihao Ji, SVN Vishwanathan, Nadathur Satish, Michael J Anderson, and Pradeep Dubey. 2015. Blackout: Speeding up recurrent neural network language models with very large vocabularies. arXiv preprint arXiv:1511.06909 (2015)."},{"key":"e_1_2_1_30_1","volume-title":"No Seeing is Also Believing: Electromagnetic-emission-based Application Guessing Attacks via Smartphones","author":"Ji Xiaoyu","year":"2021","unstructured":"Xiaoyu Ji, Yushi Cheng, Wenyuan Xu, Yuehan Chi, Hao Pan, Zhuangdi Zhu, Chuang-Wen You, Yi-Chao Chen, and Lili Qiu. 2021. No Seeing is Also Believing: Electromagnetic-emission-based Application Guessing Attacks via Smartphones. IEEE Transactions on Mobile Computing (2021)."},{"key":"e_1_2_1_31_1","volume-title":"LibriVox: Free public domain audiobooks. Reference Reviews","author":"Kearns Jodi","year":"2014","unstructured":"Jodi Kearns. 2014. LibriVox: Free public domain audiobooks. Reference Reviews (2014)."},{"key":"e_1_2_1_32_1","unstructured":"Suyoun Kim Siddharth Dalmia and Florian Metze. 2018. Situation informed end-to-end asr for chime-5 challenge. In CHiME5 workshop."},{"key":"e_1_2_1_33_1","volume-title":"Missing-feature reconstruction by leveraging temporal spectral correlation for robust speech recognition in background noise conditions","author":"Kim Wooil","year":"2010","unstructured":"Wooil Kim and John HL Hansen. 2010. Missing-feature reconstruction by leveraging temporal spectral correlation for robust speech recognition in background noise conditions. 
IEEE transactions on audio, speech, and language processing 18, 8 (2010), 2111--2120."},{"key":"e_1_2_1_34_1","volume-title":"Int. Conf. on Speech and Computer (SPECOM07)","volume":"2","author":"Kinnunen Tomi","year":"2007","unstructured":"Tomi Kinnunen, Evgenia Chernenko, Marko Tuononen, Pasi Fr\u00e4nti, and Haizhou Li. 2007. Voice activity detection using MFCC features and support vector machine. In Int. Conf. on Speech and Computer (SPECOM07), Moscow, Russia, Vol. 2. 556--561."},{"key":"e_1_2_1_35_1","first-page":"2","article-title":"A Comprehensive Review of the Acoustic Correlate of Duration and Its Linguistic Implications","volume":"10","author":"Koffi Ettien","year":"2021","unstructured":"Ettien Koffi. 2021. A Comprehensive Review of the Acoustic Correlate of Duration and Its Linguistic Implications. Linguistic Portfolios 10, 1 (2021), 2.","journal-title":"Linguistic Portfolios"},{"key":"e_1_2_1_36_1","volume-title":"Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25 (2012)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8462017"},{"key":"e_1_2_1_38_1","volume-title":"Hybrid deep neural network-hidden markov model (dnn-hmm) based speech emotion recognition. In 2013 Humaine association conference on affective computing and intelligent interaction","author":"Li Longfei","unstructured":"Longfei Li, Yong Zhao, Dongmei Jiang, Yanning Zhang, Fengna Wang, Isabel Gonzalez, Enescu Valentin, and Hichem Sahli. 2013. Hybrid deep neural network-hidden markov model (dnn-hmm) based speech emotion recognition. In 2013 Humaine association conference on affective computing and intelligent interaction. 
IEEE, 312--317."},{"key":"e_1_2_1_39_1","unstructured":"Lip Read 2015. This Is What It Really Feels Like To Lip Read. https:\/\/www.bustle.com\/articles\/131261-how-accurate-is-lip-reading-this-is-how-it-feels-to-depend-on-it-every-day."},{"key":"e_1_2_1_40_1","unstructured":"Lip Read Learning 2020. How To Learn To Lip Read. https:\/\/www.connecthearing.com\/blog\/hearing-loss\/lip-reading\/."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/SAHCN.2017.7964907"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.3109\/03005368709077769"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.3109\/03005368909076523"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302506.3310398"},{"key":"e_1_2_1_45_1","unstructured":"MEMS Accelerometers 2017. MEMS Accelerometers | Silicon Sensing. https:\/\/www.siliconsensing.com\/technology\/mems-accelerometers\/."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.2987752"},{"key":"e_1_2_1_47_1","volume-title":"Gyrophone: Recognizing speech from gyroscope signals. In 23rd {USENIX} Security Symposium ({USENIX} Security 14). 1053--1067.","author":"Michalevsky Yan","year":"2014","unstructured":"Yan Michalevsky, Dan Boneh, and Gabi Nakibly. 2014. Gyrophone: Recognizing speech from gyroscope signals. In 23rd {USENIX} Security Symposium ({USENIX} Security 14). 1053--1067."},{"key":"e_1_2_1_48_1","volume-title":"OS 2022. Mobile Operating System Market Share. https:\/\/gs.statcounter.com\/os-market-share\/mobile\/.","author":"Mobile","unstructured":"Mobile OS 2022. Mobile Operating System Market Share. https:\/\/gs.statcounter.com\/os-market-share\/mobile\/."},{"key":"e_1_2_1_49_1","unstructured":"Mobile Processor 2020. Mobile Processor Crosses 3 GHz CPU Clock Speed Mark. https:\/\/tinyurl.com\/2p8ya97w."},{"key":"e_1_2_1_50_1","unstructured":"Moore's Law 2020. Does Moore's Law still apply to smartphones in 2020? 
https:\/\/tinyurl.com\/2p8su4cb."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/PLANS.2014.6851507"},{"key":"e_1_2_1_52_1","volume-title":"Proc. Interspeech.","author":"Nakatani Tomohiro","year":"2019","unstructured":"Tomohiro Nakatani. 2019. Improving transformer-based end-to-end speech recognition with connectionist temporal classification and language model integration. In Proc. Interspeech."},{"key":"e_1_2_1_53_1","volume-title":"Lamphone: Real-Time Passive Sound Recovery from Light Bulb Vibrations. Cryptology ePrint Archive.","author":"Ben Nassi","year":"2020","unstructured":"Ben Nassi et al. 2020. Lamphone: Real-Time Passive Sound Recovery from Light Bulb Vibrations. Cryptology ePrint Archive."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-93000-8_99"},{"key":"e_1_2_1_55_1","unstructured":"Noise Cancellation 2021. Microsoft adds AI-enabled noise cancellation feature to Skype: Here's how you can enable it. https:\/\/indianexpress.com\/article\/technology\/social\/skype-ai-enabled-noise-cancellation-feature-how-to-enable-7230339\/."},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/2162081.2162095"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_2_1_58_1","volume-title":"Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779","author":"Park Daniel S","year":"2019","unstructured":"Daniel S Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D Cubuk, and Quoc V Le. 2019. Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779 (2019)."},{"key":"e_1_2_1_59_1","volume-title":"International conference on machine learning. PMLR, 4055--4064","author":"Parmar Niki","year":"2018","unstructured":"Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, and Dustin Tran. 2018. 
Image transformer. In International conference on machine learning. PMLR, 4055--4064."},{"key":"e_1_2_1_60_1","unstructured":"Perplexity 2020. Perplexity in Language Models. https:\/\/towardsdatascience.com\/perplexity-in-language-models-87a196019a94."},{"key":"e_1_2_1_61_1","volume-title":"Towards a seamless integration of word senses into downstream nlp applications. arXiv preprint arXiv:1710.06632","author":"Pilehvar Mohammad Taher","year":"2017","unstructured":"Mohammad Taher Pilehvar, Jose Camacho-Collados, Roberto Navigli, and Nigel Collier. 2017. Towards a seamless integration of word senses into downstream nlp applications. arXiv preprint arXiv:1710.06632 (2017)."},{"key":"e_1_2_1_62_1","volume-title":"2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, 163--172","author":"Qi Xin","year":"2013","unstructured":"Xin Qi, Matthew Keally, Gang Zhou, Yantao Li, and Zhen Ren. 2013. AdaSense: Adapting sampling rates for activity recognition in body sensor networks. In 2013 IEEE 19th Real-Time and Embedded Technology and Applications Symposium (RTAS). IEEE, 163--172."},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3331184.3331341"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"e_1_2_1_65_1","volume-title":"Best Mobile Processor Ranking List","author":"Ranking","year":"2021","unstructured":"Ranking 2021. Best Mobile Processor Ranking List 2021. https:\/\/www.techcenturion.com\/smartphone-processors-ranking."},{"key":"e_1_2_1_66_1","volume-title":"After embracing remote work","author":"Remote Work","year":"2020","unstructured":"Remote Work 2020. After embracing remote work in 2020, companies face conflicts making it permanent. https:\/\/tinyurl.com\/57yavem8."},{"key":"e_1_2_1_67_1","volume-title":"13th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 16). 671--684.","author":"Roy Nirupam","unstructured":"Nirupam Roy and Romit Roy Choudhury. 
2016. Ripple {II}: Faster communication through physical vibration. In 13th {USENIX} Symposium on Networked Systems Design and Implementation ({NSDI} 16). 671--684."},{"key":"e_1_2_1_68_1","volume-title":"12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15)","author":"Roy Nirupam","year":"2015","unstructured":"Nirupam Roy, Mahanth Gowda, and Romit Roy Choudhury. 2015. Ripple: Communicating through physical vibration. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). 265--278."},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430781"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_2_1_71_1","unstructured":"Sensors 2014. Sensors are Fundamental to Industrial IoT. https:\/\/tinyurl.com\/54pj3268."},{"key":"e_1_2_1_72_1","doi-asserted-by":"crossref","unstructured":"Ilya Sklyar et al. 2021. Streaming multi-speaker asr with rnn-t. In IEEE ICASSP.","DOI":"10.1109\/ICASSP39728.2021.9413471"},{"key":"e_1_2_1_73_1","unstructured":"Smart Phone Market 2022. Global Smartphone Market Share: By Quarter. https:\/\/www.counterpointresearch.com\/global-smartphone-share\/."},{"key":"e_1_2_1_74_1","volume-title":"Gartner Says Worldwide Smartphone Sales Grew 10.8 percent in Second Quarter of","author":"Smartphone Sales","year":"2021","unstructured":"Smartphone Sales 2021. Gartner Says Worldwide Smartphone Sales Grew 10.8 percent in Second Quarter of 2021. https:\/\/www.gartner.com\/en\/newsroom\/press-releases\/2021-09-01-2q21-smartphone-market-share."},{"key":"e_1_2_1_75_1","unstructured":"Sound Vibration 2022. Vibration: Origin Effects Solution. https:\/\/www.gcaudio.com\/tips-tricks\/vibration-origins-effects-solutions\/."},{"key":"e_1_2_1_76_1","unstructured":"Sound Vibration Proof 2016. Practical Sound and Vibration Proofing via Speaker Isolation. 
https:\/\/www.andrehvac.com\/blog\/vibration-control-products\/practical-sound-vibration-proofing-speaker-isolation-pads\/."},{"key":"e_1_2_1_77_1","unstructured":"Speaking 2022. Speaking. http:\/\/www.psy.vanderbilt.edu\/courses\/psy216\/SPEAKING.html."},{"key":"e_1_2_1_78_1","unstructured":"Speech 2021. FACTS ABOUT SPEECH INTELLIGIBILITY. https:\/\/tinyurl.com\/38j2bjbt."},{"key":"e_1_2_1_79_1","unstructured":"Speech Recognition 2022. Speech Recognition on LibriSpeech test-clean. https:\/\/paperswithcode.com\/sota\/speech-recognition-on-librispeech-test-clean."},{"key":"e_1_2_1_80_1","volume-title":"Cold fusion: Training seq2seq models together with language models. arXiv preprint arXiv:1708.06426","author":"Sriram Anuroop","year":"2017","unstructured":"Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, and Adam Coates. 2017. Cold fusion: Training seq2seq models together with language models. arXiv preprint arXiv:1708.06426 (2017)."},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-49409-8_35"},{"key":"e_1_2_1_82_1","volume-title":"Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215 (2014)."},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.316"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.06.066"},{"key":"e_1_2_1_85_1","volume-title":"Attention is all you need. Advances in neural information processing systems 30","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. 
Advances in neural information processing systems 30 (2017)."},{"key":"e_1_2_1_86_1","volume-title":"Conferences","author":"Video","year":"2021","unstructured":"Video Conferences 2021. How Zoom leverages AI to provide the best videoconferencing experience. https:\/\/digital.hbs.edu\/platform-digit\/submission\/how-zoom-leverages-ai-to-provide-the-best-videoconferencing-experience\/."},{"key":"e_1_2_1_87_1","unstructured":"Volume Booster 2019. Volume Booster Tips for Smartphones and Tablets. https:\/\/www.lifewire.com\/boost-volume-on-phone-and-tablet-4142971."},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.683"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/2789168.2790121"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/2971648.2971688"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCCN49398.2020.9209713"},{"key":"e_1_2_1_92_1","unstructured":"Word Error Rate 2019. Word Error Rate Mechanism ASR Transcription and Challenges in Accuracy Measurement. https:\/\/tinyurl.com\/4229u6a3."},{"key":"e_1_2_1_93_1","volume-title":"An exploration of directly using word as acoustic modeling unit for speech recognition. In 2018 IEEE spoken language technology workshop (SLT)","author":"Zhang Chunlei","unstructured":"Chunlei Zhang, Chengzhu Yu, Chao Weng, Jia Cui, and Dong Yu. 2018. An exploration of directly using word as acoustic modeling unit for speech recognition. In 2018 IEEE spoken language technology workshop (SLT). IEEE, 64--69."},{"key":"e_1_2_1_94_1","volume-title":"Pretraining-based natural language generation for text summarization. arXiv preprint arXiv:1902.09243","author":"Zhang Haoyu","year":"2019","unstructured":"Haoyu Zhang, Jianjun Xu, and Ji Wang. 2019. Pretraining-based natural language generation for text summarization. 
arXiv preprint arXiv:1902.09243 (2019)."},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742658"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.506"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3569486","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3569486","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3569486","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T20:54:43Z","timestamp":1752612883000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3569486"}},"subtitle":["Eavesdropping Continuous Speech on Smartphones via Motion Sensors"],"short-title":[],"issued":{"date-parts":[[2022,12,21]]},"references-count":96,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,12,21]]}},"alternative-id":["10.1145\/3569486"],"URL":"https:\/\/doi.org\/10.1145\/3569486","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,12,21]]},"assertion":[{"value":"2023-01-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}