{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:45:56Z","timestamp":1776102356782,"version":"3.50.1"},"reference-count":96,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2023,9,7]],"date-time":"2023-09-07T00:00:00Z","timestamp":1694044800000},"content-version":"vor","delay-in-days":366,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2110193,2132112"],"award-info":[{"award-number":["2110193,2132112"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2022,9,6]]},"abstract":"<jats:p>In this paper, we present MuteIt, an ear-worn system for recognizing unvoiced human commands. MuteIt presents an intuitive alternative to voice-based interactions that can be unreliable in noisy environments, disruptive to those around us, and compromise our privacy. We propose a twin-IMU set up to track the user's jaw motion and cancel motion artifacts caused by head and body movements. MuteIt processes jaw motion during word articulation to break each word signal into its constituent syllables, and further each syllable into phonemes (vowels, visemes, and plosives). Recognizing unvoiced commands by only tracking jaw motion is challenging. As a secondary articulator, jaw motion is not distinctive enough for unvoiced speech recognition. MuteIt combines IMU data with the anatomy of jaw movement as well as principles from linguistics, to model the task of word recognition as an estimation problem. Rather than employing machine learning to train a word classifier, we reconstruct each word as a sequence of phonemes using a bi-directional particle filter, enabling the system to be easily scaled to a large set of words. We validate MuteIt for 20 subjects with diverse speech accents to recognize 100 common command words. MuteIt achieves a mean word recognition accuracy of 94.8% in noise-free conditions. When compared with common voice assistants, MuteIt outperforms them in noisy acoustic environments, achieving higher than 90% recognition accuracy. 
Even in the presence of motion artifacts, such as head movement, walking, and riding in a moving vehicle, MuteIt achieves mean word recognition accuracy of 91% over all scenarios.<\/jats:p>","DOI":"10.1145\/3550281","type":"journal-article","created":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T14:54:27Z","timestamp":1662562467000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":44,"title":["MuteIt"],"prefix":"10.1145","volume":"6","author":[{"given":"Tanmay","family":"Srivastava","sequence":"first","affiliation":[{"name":"Stony Brook University, New York, USA"}]},{"given":"Prerna","family":"Khanna","sequence":"additional","affiliation":[{"name":"Stony Brook University, New York, USA"}]},{"given":"Shijia","family":"Pan","sequence":"additional","affiliation":[{"name":"University of California, Merced, Merced, USA"}]},{"given":"Phuc","family":"Nguyen","sequence":"additional","affiliation":[{"name":"University of Texas at Arlington, Arlington, USA"}]},{"given":"Shubham","family":"Jain","sequence":"additional","affiliation":[{"name":"Stony Brook University, New York, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,9,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"Lu\u00eds Aguiar-Conraria and Maria Joana Soares. 2011. The continuous wavelet transform: A primer. Technical Report. NIPE-Universidade do Minho."},{"key":"e_1_2_1_2_1","volume-title":"Most used voice assistants in the United States","year":"2021","unstructured":"Amazon. 2021. Most used voice assistants in the United States in 2021, by age group. https:\/\/www.statista.com\/statistics\/1274429\/voice-assistants-use-by-age-group-united-states\/"},{"key":"e_1_2_1_3_1","unstructured":"Amazon. 2022. Amazon Alexa. https:\/\/developer.amazon.com\/en-US\/alexa"},{"key":"e_1_2_1_4_1","unstructured":"IoT Analytics. 2021. State of IoT 2021: Number of connected IoT devices growing 9% to 12.3 billion globally cellular IoT now surpassing 2 billion. https:\/\/iot-analytics.com\/number-connected-iot-devices\/"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3126594.3126649"},{"key":"e_1_2_1_6_1","unstructured":"Apple. 2022. Siri Apple. https:\/\/www.apple.com\/siri\/"},{"key":"e_1_2_1_7_1","doi-asserted-by":"crossref","unstructured":"Helen L Bear. 2017. Decoding visemes: improving machine lipreading. arXiv:1710.01288 [cs.CV]","DOI":"10.1109\/ICASSP.2016.7472029"},{"key":"e_1_2_1_8_1","volume-title":"Optics and Photonics for Counterterrorism, Crime Fighting, and Defence X","author":"Bear Helen L","unstructured":"Helen L Bear, Gari Owen, Richard Harvey, and Barry-John Theobald. 2014. Some observations on computer lip-reading: moving from the dream to the reality. In Optics and Photonics for Counterterrorism, Crime Fighting, and Defence X; and Optical Materials and Biomaterials in Security and Defence Systems Technology XI, Vol. 9253. International Society for Optics and Photonics, 92530G."},{"key":"e_1_2_1_9_1","volume-title":"Twelfth Annual Conference of the International Speech Communication Association.","author":"Be\u0148u\u0161 \u0160tefan","year":"2011","unstructured":"\u0160tefan Be\u0148u\u0161 and Marianne Pouplier. 2011. Jaw movement in vowels and liquids forming the syllable nucleus. 
In Twelfth Annual Conference of the International Speech Communication Association."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1044\/jshr.3802.446"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2865609"},{"key":"e_1_2_1_12_1","unstructured":"Hang Chen Jun Du Yu Hu Li-Rong Dai Chin-Hui Lee and Bao-Cai Yin. 2020. Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention. arXiv:2012.14360 [cs.CV]"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3052973.3053005"},{"key":"e_1_2_1_14_1","volume-title":"Ultrasound-based articulatory-to-acoustic mapping with WaveGlow speech synthesis. arXiv preprint arXiv:2008.03152","author":"Csap\u00f3 Tam\u00e1s G\u00e1bor","year":"2020","unstructured":"Tam\u00e1s G\u00e1bor Csap\u00f3, Csaba Zaink\u00f3, L\u00e1szl\u00f3 T\u00f3th, G\u00e1bor Gosztolya, and Alexandra Mark\u00f3. 2020. Ultrasound-based articulatory-to-acoustic mapping with WaveGlow speech synthesis. arXiv preprint arXiv:2008.03152 (2020)."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btw350"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/MMCS.1999.778619"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2006.1660033"},{"key":"e_1_2_1_18_1","article-title":"Physical therapy for patients with TMD: a descriptive study of treatment, disability, and health status","volume":"12","author":"Di Fabio Richard P","year":"1998","unstructured":"Richard P Di Fabio. 1998. Physical therapy for patients with TMD: a descriptive study of treatment, disability, and health status. Journal of orofacial pain 12, 2 (1998).","journal-title":"Journal of orofacial pain"},{"key":"e_1_2_1_19_1","unstructured":"Collins Dictionary. 2021. Collins Dictionary. https:\/\/www.collinsdictionary.com\/"},{"key":"e_1_2_1_20_1","unstructured":"Elago. 2021. AirPods Pro EarHook. https:\/\/www.elago.com\/new\/airpods-pro-earhook-white-lkt4w"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1159\/000066067"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2018.07.002"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1044\/jshr.1104.796"},{"key":"e_1_2_1_24_1","unstructured":"Fortune Business Insights. 2021. Speech and Voice Recognition Market Size. https:\/\/tinyurl.com\/yyyxe4rk"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242603"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411830"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2017.2757263"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/1143844.1143891"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2019.2946593"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2013.2265378"},{"key":"e_1_2_1_31_1","unstructured":"Theodore P. Hill. 2009. Conflations of Probability Distributions. arXiv:0808.1808 [math.PR]"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458709.3458985"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2012.02.001"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.imavis.2019.11.006"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.11.004"},{"key":"e_1_2_1_36_1","volume-title":"Proc. 
of ISSP","author":"Hueber Thomas","year":"2008","unstructured":"Thomas Hueber, G\u00e9rard Chollet, Bruce Denby, and Maureen Stone. 2008. Acquisition of ultrasound, video and acoustic speech data for a silent-speech interface application. Proc. of ISSP (2008), 365--369."},{"key":"e_1_2_1_37_1","unstructured":"Monsoon Solutions Inc. 2022. Monsoon Power Monitor. https:\/\/www.msoon.com\/online-store"},{"key":"e_1_2_1_38_1","volume-title":"Present, Future, and A Proposed Model. undefined","author":"Jefferson Madeline","year":"2019","unstructured":"Madeline Jefferson. 2019. Usability of Automatic Speech Recognition Systems for Individuals with Speech Disorders: Past, Present, Future, and A Proposed Model. undefined (2019). https:\/\/www.semanticscholar.org\/paper\/Usability-of-Automatic-Speech-Recognition-Systems-A-Jefferson\/73eefd141f43750b3ae0648e6ef099597e24c6c9"},{"key":"e_1_2_1_39_1","volume-title":"Statistical methods for speech recognition","author":"Jelinek Frederick","unstructured":"Frederick Jelinek. 1997. Statistical methods for speech recognition. MIT press."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550321"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.9734\/BJAST\/2015\/14975"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3172944.3172977"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3210240.3211113"},{"key":"e_1_2_1_44_1","volume-title":"NZ) 10","author":"Kaye Rachel","year":"2017","unstructured":"Rachel Kaye, Christopher G Tang, and Catherine F Sinclair. 2017. The electrolarynx: voice restoration after total laryngectomy. Medical Devices (Auckland, NZ) 10 (2017), 133."},{"key":"e_1_2_1_45_1","volume-title":"Measuring the effectiveness of voice conversion on speaker identification and automatic speech recognition systems. arXiv preprint arXiv:1905.12531","author":"Keskin Gokce","year":"2019","unstructured":"Gokce Keskin, Tyler Lee, Cory Stephenson, and Oguz H Elibol. 2019. Measuring the effectiveness of voice conversion on speaker identification and automatic speech recognition systems. arXiv preprint arXiv:1905.12531 (2019)."},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications. 44--49","author":"Khanna Prerna","year":"2021","unstructured":"Prerna Khanna, Tanmay Srivastava, Shijia Pan, Shubham Jain, and Phuc Nguyen. 2021. JawSense: recognizing unvoiced sound using a low-cost ear-worn system. In Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications. 44--49."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2019.2901271"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3502015"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3399715.3399852"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300376"},{"key":"e_1_2_1_51_1","unstructured":"Mbient Lab. 2020. Mbient IMU. https:\/\/mbientlab.com\/metamotionr\/"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1080\/10447318.2018.1455307"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3311823.3311831"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","unstructured":"Rochelle Lieber. [n.d.]. Point and manner of articulation of English consonants and vowels. Introducing Morphology ([n. d.]) xii-xii. 
https:\/\/doi.org\/10.1017\/cbo9780511808845.003","DOI":"10.1017\/cbo9780511808845.003"},{"key":"e_1_2_1_55_1","unstructured":"LifeWire. 2021. Top commands. https:\/\/www.lifewire.com\/top-google-assistant-and-google-home-commands-4158256"},{"key":"e_1_2_1_56_1","volume-title":"Voicing and gaps in plosive systems. The world atlas of language structures online","author":"Maddieson Ian","year":"2013","unstructured":"Ian Maddieson. 2013. Voicing and gaps in plosive systems. The world atlas of language structures online (2013)."},{"key":"e_1_2_1_57_1","unstructured":"Magoosh. 2022. 44 Phonemes In English And Other Sound Blends. https:\/\/magoosh.com\/english-speaking\/44-phonemes-in-english-and-other-sound-blends\/"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/765891.765996"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1088\/1741-2552\/aac965"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1088\/1741-2552\/aac965"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3210240.3210322"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1353\/aad.2012.0213"},{"key":"e_1_2_1_63_1","unstructured":"The University of Reading. 2021. The production of speech sounds. http:\/\/www.personal.rdg.ac.uk\/~llsroach\/phon2\/artic-basics.htm"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","unstructured":"John J. Ohala and Haruko Kawasaki-Fukumori. [n.d.]. Alternatives to the sonority hierarchy for explaining segmental sequential constraints. Language and its Ecology ([n.d.]). https:\/\/doi.org\/10.1515\/9783110805369.343","DOI":"10.1515\/9783110805369.343"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445565"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445430"},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2013.23"},{"key":"e_1_2_1_68_1","unstructured":"Physiopedia. 2021. TMJ Anatomy. https:\/\/www.physio-pedia.com\/TMJ_Anatomy"},{"key":"e_1_2_1_69_1","unstructured":"PlayStore. [n.d.]. Sound Meter. https:\/\/play.google.com\/store\/apps\/details?id=com.gamebasic.decibel&hl=en_US&gl=US"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3419197"},{"key":"e_1_2_1_71_1","doi-asserted-by":"publisher","DOI":"10.1145\/3027063.3053246"},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458709.3458941"},{"key":"e_1_2_1_74_1","volume-title":"Speech is 3x faster than typing for english and mandarin text entry on mobile devices. arXiv preprint arXiv:1608.07323","author":"Ruan Sherry","year":"2016","unstructured":"Sherry Ruan, Jacob O Wobbrock, Kenny Liou, Andrew Ng, and James Landay. 2016. Speech is 3x faster than typing for english and mandarin text entry on mobile devices. arXiv preprint arXiv:1608.07323 (2016)."},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/2634317.2634322"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/2634317.2634322"},{"key":"e_1_2_1_77_1","volume-title":"Fast and accurate recurrent neural network acoustic models for speech recognition. arXiv preprint arXiv:1507.06947","author":"Sak Ha\u015fim","year":"2015","unstructured":"Ha\u015fim Sak, Andrew Senior, Kanishka Rao, and Fran\u00e7oise Beaufays. 2015. Fast and accurate recurrent neural network acoustic models for speech recognition. 
arXiv preprint arXiv:1507.06947 (2015)."},{"key":"e_1_2_1_78_1","volume-title":"Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks. arXiv preprint arXiv:2105.13718","author":"Shandiz Amin Honarmandi","year":"2021","unstructured":"Amin Honarmandi Shandiz and L\u00e1szl\u00f3 T\u00f3th. 2021. Voice Activity Detection for Ultrasound-based Silent Speech Interfaces using Convolutional Neural Networks. arXiv preprint arXiv:2105.13718 (2021)."},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.3699209"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2008.924137"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1177\/1525740108328410"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.367"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242599"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8057099"},{"key":"e_1_2_1_85_1","volume-title":"SeLaB: Semantic Labeling with BERT. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--8.","author":"Trabelsi Mohamed","year":"2021","unstructured":"Mohamed Trabelsi, Jin Cao, and Jeff Heflin. 2021. SeLaB: Semantic Labeling with BERT. In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 1--8."},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41698-021-00195-y"},{"key":"e_1_2_1_87_1","doi-asserted-by":"publisher","unstructured":"G. Wang Y. Zou Z. Zhou K. Wu and L. M. Ni. 2016. We Can Hear You with Wi-Fi! IEEE Transactions on Mobile Computing 15 11 (2016) 2907--2920. https:\/\/doi.org\/10.1109\/TMC.2016.2517630","DOI":"10.1109\/TMC.2016.2517630"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369812"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICBASE53849.2021.00133"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCSLP.2018.8706675"},{"key":"e_1_2_1_91_1","doi-asserted-by":"publisher","DOI":"10.1109\/TENCON.2008.4766822"},{"key":"e_1_2_1_92_1","volume-title":"The validity and reliability of phonemic awareness tests. Reading research quarterly","author":"Yopp Hallie Kay","year":"1988","unstructured":"Hallie Kay Yopp. 1988. The validity and reliability of phonemic awareness tests. 
Reading research quarterly (1988), 159--177."},{"key":"e_1_2_1_93_1","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2017.7510508"},{"key":"e_1_2_1_94_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448087"},{"key":"e_1_2_1_95_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494990"},{"key":"e_1_2_1_96_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432192"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3550281","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3550281","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3550281","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T04:41:17Z","timestamp":1752468077000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3550281"}},"subtitle":["Jaw Motion Based Unvoiced Command Recognition Using Earable"],"short-title":[],"issued":{"date-parts":[[2022,9,6]]},"references-count":96,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,6]]}},"alternative-id":["10.1145\/3550281"],"URL":"https:\/\/doi.org\/10.1145\/3550281","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,6]]},"assertion":[{"value":"2022-09-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
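
A minimal sketch of how a record like the one above can be retrieved and parsed. The endpoint is the standard Crossref REST API (api.crossref.org/works/{DOI}), and the field names ("message", "title", "author", "abstract", and so on) come straight from the record itself; the helper name, the use of the requests library, and the JATS-stripping regex are illustrative assumptions, not anything the record prescribes.

import re
import requests

def fetch_crossref_work(doi: str) -> dict:
    # The Crossref REST API wraps the work record in
    # {"status": ..., "message-type": "work", "message": {...}},
    # exactly as in the JSON above.
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    body = resp.json()
    if body.get("status") != "ok" or body.get("message-type") != "work":
        raise ValueError("unexpected Crossref response shape")
    return body["message"]

work = fetch_crossref_work("10.1145/3550281")
title = work["title"][0]          # "MuteIt"
subtitle = work["subtitle"][0]    # present for this record; other works may omit it
authors = [f"{a['given']} {a['family']}" for a in work["author"]]
venue = work["container-title"][0]
citations = work["is-referenced-by-count"]
# Crossref stores abstracts as JATS XML; strip the <jats:*> tags for plain text.
abstract = re.sub(r"</?jats:[^>]*>", "", work.get("abstract", "")).strip()
print(f"{title}: {subtitle}")
print(", ".join(authors))
print(f"{venue} | cited by {citations}")

Note that Crossref asks API clients to identify themselves (for example, a mailto address in the User-Agent header) to be routed to its "polite" pool; that header is omitted here for brevity.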