{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T07:54:17Z","timestamp":1780473257584,"version":"3.54.1"},"reference-count":31,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2018,12,27]],"date-time":"2018-12-27T00:00:00Z","timestamp":1545868800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2018,12,27]]},"abstract":"<jats:p>Smart glasses are often used in public environments or industrial scenarios that are relatively noisy. Background noise and sound from competing speakers deteriorate voice communication or performance of automatic speech recognition (ASR). Typically, signal processing techniques are used to reduce noise and enhance voice quality, but they have limitations in performance, hardware and\/or computing resources. Voice capturing techniques using bone conducting on the head have been proposed in some experimental and commercial devices, with good robustness against environmental noise, but limited by signal distortions inherent to the capturing method. We present V-Speech, a novel sensing and signal processing solution that enables speech recognition and human-to-human communication in very noisy environments. It captures the voice signal with a vibration sensor located in the nasal pads of smart glasses and performs a transformation to the sensor signal in order to mimic that of a regular microphone in low noise conditions. The signal transformation is key, as it eliminates the \"nasal distortion\" that is introduced for nasal phonemes in the speech induced vibrations of the nasal bone. 
The output of V-Speech has low noise, sounds natural, and can be used in voice communication or as input to an off-the-shelf ASR service. We evaluated V-Speech in noise-free and noisy conditions with 30 volunteer speakers uttering 145 phrases and validated its performance on ASR engines and with assessments of voice quality using the Perceptual Evaluation of Speech Quality (PESQ) metric. The results show in extreme noise conditions a mean improvement of 50% for Word Error Rate (WER), and 1.0 on a scale of 5.0 for PESQ. In addition, real life recordings were made under various representative noise conditions, some with sound pressure levels of 93 dBA, which require hearing protection. Subjective listening tests were conducted according to a modified ITU P.835 approach to determine intelligibility, naturalness and overall quality. Under these extreme conditions, where V-Speech achieved 30 dB SNR, subjective results show the speech is intelligible, and the naturalness of the speech is rated as fair to good. This enables clear voice communication in challenging work environments, for example in places with industrial, factory, mining and construction noise. With our proposed smart switching technique between a regular microphone signal and V-Speech, the optimal quality can be maintained from low to high noise conditions.<\/jats:p>","DOI":"10.1145\/3287058","type":"journal-article","created":{"date-parts":[[2018,12,27]],"date-time":"2018-12-27T19:28:03Z","timestamp":1545938883000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":38,"title":["V-Speech"],"prefix":"10.1145","volume":"2","author":[{"given":"H\u00e9ctor A. 
Cordourier","family":"Maruri","sequence":"first","affiliation":[{"name":"Intel Labs, Zapopan, Jalisco, Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Paulo","family":"Lopez-Meyer","sequence":"additional","affiliation":[{"name":"Intel Labs, Zapopan, Jalisco, Mexico"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jonathan","family":"Huang","sequence":"additional","affiliation":[{"name":"Intel Labs, Santa Clara, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Willem Marco","family":"Beltman","sequence":"additional","affiliation":[{"name":"Intel Labs, Hillsboro, Oregon, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lama","family":"Nachman","sequence":"additional","affiliation":[{"name":"Intel Labs, Santa Clara, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hong","family":"Lu","sequence":"additional","affiliation":[{"name":"Intel Labs, Santa Clara, California, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2018,12,27]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of New Directions for Improving Audio Effectiveness, Neuilly-sur-Seine","author":"Acker-Mills Barbara","year":"2005"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1979.1163209"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.3654911"},{"key":"e_1_2_1_4_1","unstructured":"Buhel Soundglasses. 2015. 
Retrieved November 1 2017 from: http:\/\/www.buhel.com\/"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jvoice.2013.12.014"},{"key":"e_1_2_1_6_1","volume-title":"ECMA international","author":"Standard","year":"2017","edition":"14"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1984.1164453"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2647702"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.408535"},{"key":"e_1_2_1_10_1","volume-title":"Letowski","author":"Henry Paula","year":"2007"},{"key":"e_1_2_1_11_1","unstructured":"ITU-T Recommendation P.835 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm."},{"key":"e_1_2_1_12_1","unstructured":"ITU-T Recommendation P.862 2001. PESQ an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs."},{"key":"e_1_2_1_13_1","unstructured":"ITU-T Recommendation P.862.1 2003. Mapping function for transforming P. 862 raw result scores to MOS-LQO."},{"key":"e_1_2_1_14_1","unstructured":"ITU-T Recommendation P.10 2006. Vocabulary for performance and quality of service."},{"key":"e_1_2_1_15_1","unstructured":"Knowles. 2017. Retrieved February 14 2018. 
http:\/\/www.knowles.com\/eng\/Products\/Sensors\/Accelerometers"},{"key":"e_1_2_1_16_1","volume-title":"The Sounds of the World's Languages","author":"Ladefoged Peter","year":"1981"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.876760"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2016.07.002"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1979.11540"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1177\/154193120805200505"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apergo.2010.09.004"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2013.6639038"},{"key":"e_1_2_1_23_1","volume-title":"ISBN 978-0-309-15632-5","author":"Quieter Technology","year":"2010"},{"key":"e_1_2_1_24_1","unstructured":"NeoVictory. 2012. Retrieved November 1 from: http:\/\/neovictory.com\/2012\/index.php?lang=us"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/45.1890"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.855838"},{"key":"e_1_2_1_27_1","unstructured":"Polytec Laser Doppler Vibrometry 2018. Retrieved June 21 2018 from: https:\/\/www.polytec.com\/us\/vibrometry\/technology\/"},{"key":"e_1_2_1_28_1","first-page":"79","volume-title":"Hamidreza Taghavi and M\u00e5ns Eeg-Olofsson","author":"Reinfeldt Sabine","year":"2015"},{"key":"e_1_2_1_29_1","first-page":"205","article-title":"Acoustic Sensor for Health Status Monitoring","volume":"2","author":"Scanlon Michael V.","year":"1998","journal-title":"Proc. IRIS Acoustic Seismic Sensing"},{"key":"e_1_2_1_30_1","unstructured":"Speech API -- Speech Recognition Google Cloud Platform 2018. 
Retrieved January 15 2018 from: https:\/\/cloud.google.com\/speech\/?hl=en"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2364452"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3287058","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3287058","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T01:02:08Z","timestamp":1750208528000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3287058"}},"subtitle":["Noise-Robust Speech Capturing Glasses Using Vibration Sensors"],"short-title":[],"issued":{"date-parts":[[2018,12,27]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2018,12,27]]}},"alternative-id":["10.1145\/3287058"],"URL":"https:\/\/doi.org\/10.1145\/3287058","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,12,27]]},"assertion":[{"value":"2018-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-10-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-12-27","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}