{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T17:46:04Z","timestamp":1776102364971,"version":"3.50.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,3,3]],"date-time":"2025-03-03T00:00:00Z","timestamp":1740960000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["NSFC 62102245"],"award-info":[{"award-number":["NSFC 62102245"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2025,3,3]]},"abstract":"<jats:p>Smart eyewear has rapidly evolved in recent years, yet its mobile and in-the-wild characteristics often make voice interactions on such devices susceptible to external interferences. In this paper, we introduce WearSE, a system that utilizes acoustic signals emitted and received by speakers and microphones mounted on eyewear to perceive facial movements during speech, achieving multimodal speech enhancement. WearSE incorporates three key designs to meet the high demands for real-time operation and robustness on smart eyewear. First, considering the frequent use in mobile scenarios, we design a sensing-enhanced network to amplify the capability of acoustic sensing, eliminating dynamic multipath interferences. Second, we develop a lightweight speech enhancement network that enhances both the amplitude and phase of the speech spectrum. Through a causal network design, computational demands are significantly reduced, ensuring real-time operation on mobile devices. 
Third, addressing the scarcity of paired data, we design a memory-based back-translation mechanism to generate pseudo-acoustic sensing data using a large amount of publicly available speech data for network training. We construct a prototype system and extensively evaluate WearSE through experiments. In multi-speaker scenarios, our approach exhibits much better performance than pure audio speech enhancement methods. Comparisons with commercial smart eyewear also demonstrate that WearSE significantly surpasses existing noise reduction algorithms in these devices. The audio demo of WearSE is available at https:\/\/github.com\/WearSE\/wearse.github.io.<\/jats:p>","DOI":"10.1145\/3712288","type":"journal-article","created":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T12:10:14Z","timestamp":1741090214000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["WearSE: Enabling Streaming Speech Enhancement on Eyewear Using Acoustic Sensing"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7708-8694","authenticated-orcid":false,"given":"Qian","family":"Zhang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-5639-1837","authenticated-orcid":false,"given":"Kaiyi","family":"Guo","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-0233-4394","authenticated-orcid":false,"given":"Yifei","family":"Yang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8444-1636","authenticated-orcid":false,"given":"Dong","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, 
China"}]}],"member":"320","published-online":{"date-parts":[[2025,3,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1400"},{"key":"e_1_2_1_2_1","unstructured":"Ltd Beijing DataTang Technology Co. [n.d.]. aidatatang_200zh a free Chinese Mandarin speech corpus. https:\/\/www.datatang.com"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSDA.2017.8384449"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3570361.3613270"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3636534.3649350"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/3498361.3538933"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2572259"},{"key":"e_1_2_1_8_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3351238","article-title":"ASSV: handwritten signature verification using acoustic signals","volume":"3","author":"Ding Feng","year":"2019","unstructured":"Feng Ding, Dong Wang, Qian Zhang, and Run Zhao. 2019. ASSV: handwritten signature verification using acoustic signals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3, 3 (2019), 1--22.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3550303","article-title":"Ultraspeech: Speech enhancement by interaction between ultrasound and speech","volume":"6","author":"Ding Han","year":"2022","unstructured":"Han Ding, Yizhan Wang, Hao Li, Cui Zhao, Ge Wang, Wei Xi, and Jizhong Zhao. 2022. Ultraspeech: Speech enhancement by interaction between ultrasound and speech. 
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1--25.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201357"},{"key":"e_1_2_1_11_1","unstructured":"Centers for Disease Control and Prevention. 2023. Public Health and Scientific Information. https:\/\/www.cdc.gov\/nceh\/hearing_loss\/public_health_scientific_info.html."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01524"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSTSP.2020.2980956"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3678594"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581791.3596832"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2004.832812"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/2209820.2210675"},{"key":"e_1_2_1_18_1","unstructured":"Custom Market Insights. 2024. Global Smart Eyewear Technology Market 2024-2033. https:\/\/www.custommarketinsights.com\/report\/smart-eyewear-technology-market\/#collapseOne."},{"key":"e_1_2_1_19_1","unstructured":"ITU. 2003. Series G: Transmission Systems and Media Digital Systems and Networks. (2003)."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6288223"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3581641.3584071"},{"key":"e_1_2_1_22_1","first-page":"1","article-title":"EarCommand: \" Hearing\" Your Silent Speech Commands In Ear","volume":"6","author":"Jin Yincheng","year":"2022","unstructured":"Yincheng Jin, Yang Gao, Xuhai Xu, Seokmin Choi, Jiyang Li, Feng Liu, Zhengxiong Li, and Zhanpeng Jin. 2022. EarCommand: \" Hearing\" Your Silent Speech Commands In Ear. 
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 2 (2022), 1--28.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP48485.2024.10447679"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430780"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/3569476"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3560905.3568528"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3636534.3649376"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642613"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3569476"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610882"},{"key":"e_1_2_1_31_1","first-page":"1","article-title":"BlinkListener: \" Listen\" to Your Eye Blink Using Your Smartphone","volume":"5","author":"Liu Jialin","year":"2021","unstructured":"Jialin Liu, Dong Li, Lei Wang, and Jie Xiong. 2021. BlinkListener: \" Listen\" to Your Eye Blink Using Your Smartphone. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2 (2021), 1--27.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2013-130"},{"key":"e_1_2_1_33_1","volume-title":"Conv-tasnet: Surpassing ideal time--frequency magnitude masking for speech separation","author":"Luo Yi","year":"2019","unstructured":"Yi Luo and Nima Mesgarani. 2019. Conv-tasnet: Surpassing ideal time--frequency magnitude masking for speech separation. 
IEEE\/ACM transactions on audio, speech, and language processing 27, 8 (2019), 1256--1266."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3659602"},{"key":"e_1_2_1_35_1","volume-title":"Key-value memory networks for directly reading documents. arXiv preprint arXiv:1606.03126","author":"Miller Alexander","year":"2016","unstructured":"Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston. 2016. Key-value memory networks for directly reading documents. arXiv preprint arXiv:1606.03126 (2016)."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2013.2270369"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2023.3250846"},{"key":"e_1_2_1_38_1","volume-title":"The importance of phase in speech enhancement. speech communication 53, 4","author":"Paliwal Kuldip","year":"2011","unstructured":"Kuldip Paliwal, Kamil W\u00f3jcicki, and Benjamin Shannon. 2011. The importance of phase in speech enhancement. speech communication 53, 4 (2011), 465--494."},{"key":"e_1_2_1_39_1","volume-title":"SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452","author":"Pascual Santiago","year":"2017","unstructured":"Santiago Pascual, Antonio Bonafonte, and Joan Serra. 2017. SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452 (2017)."},{"key":"e_1_2_1_40_1","volume-title":"Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T recommendation","author":"Recommendation ITUT","year":"2003","unstructured":"ITUT Recommendation. 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T recommendation (2003), 835."},{"key":"e_1_2_1_41_1","unstructured":"Grand View Research. 2024. Smart Glasses Market Size Trends. 
https:\/\/www.grandviewresearch.com\/industry-analysis\/smart-glasses-market-report."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448121"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3576842.3582365"},{"key":"e_1_2_1_44_1","volume-title":"Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709","author":"Sennrich Rico","year":"2015","unstructured":"Rico Sennrich. 2015. Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709 (2015)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/IPSN54338.2022.00019"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3448626"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3678541"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2842159"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1982.1163920"},{"key":"e_1_2_1_50_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3550293","article-title":"LoEar: Push the range limit of acoustic sensing for vital sign monitoring","volume":"6","author":"Wang Lei","year":"2022","unstructured":"Lei Wang, Wei Li, Ke Sun, Fusang Zhang, Tao Gu, Chenren Xu, and Daqing Zhang. 2022. LoEar: Push the range limit of acoustic sensing for vital sign monitoring. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1--24.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161188"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94","author":"Wang Wei","year":"2016","unstructured":"Wei Wang, Alex X Liu, and Ke Sun. 2016. Device-free gesture tracking using acoustic signals. 
In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2352935"},{"key":"e_1_2_1_54_1","volume-title":"Complex ratio masking for monaural speech separation","author":"Williamson Donald S","year":"2015","unstructured":"Donald S Williamson, Yuxuan Wang, and DeLiang Wang. 2015. Complex ratio masking for monaural speech separation. IEEE\/ACM transactions on audio, speech, and language processing 24, 3 (2015), 483--492."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448105"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2364452"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3613904.3642437"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6489"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2014.2352352"},{"key":"e_1_2_1_60_1","volume-title":"Strata: Fine-Grained Acoustic-based Device-Free Tracking.","author":"Yun Sangki","year":"2017","unstructured":"Sangki Yun, Yichao Chen, Huihuang Zheng, Lili Qiu, and Wenguang Mao. 2017. Strata: Fine-Grained Acoustic-based Device-Free Tracking. (2017), 15--28."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3659614"},{"key":"e_1_2_1_62_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3659598","article-title":"Sensing to hear through memory: Ultrasound speech enhancement without real ultrasound signals","volume":"8","author":"Zhang Qian","year":"2024","unstructured":"Qian Zhang, Ke Liu, and Dong Wang. 2024. Sensing to hear through memory: Ultrasound speech enhancement without real ultrasound signals. 
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 2 (2024), 1--31.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494990"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494990"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3580801"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432192"},{"key":"e_1_2_1_67_1","volume-title":"Mechanics of human voice production and control. The journal of the acoustical society of america 140, 4","author":"Zhang Zhaoyan","year":"2016","unstructured":"Zhaoyan Zhang. 2016. Mechanics of human voice production and control. The journal of the acoustical society of america 140, 4 (2016), 2614--2635."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3488544"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous 
Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712288","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3712288","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T19:30:19Z","timestamp":1755891019000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3712288"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,3]]},"references-count":68,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,3,3]]}},"alternative-id":["10.1145\/3712288"],"URL":"https:\/\/doi.org\/10.1145\/3712288","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,3]]},"assertion":[{"value":"2025-03-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}