{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T17:57:46Z","timestamp":1773511066798,"version":"3.50.1"},"reference-count":78,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,9,9]],"date-time":"2021-09-09T00:00:00Z","timestamp":1631145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100003399","name":"Science and Technology Commission of Shanghai Municipality","doi-asserted-by":"crossref","award":["19DZ2283800"],"award-info":[{"award-number":["19DZ2283800"]}],"id":[{"id":"10.13039\/501100003399","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2021,9,9]]},"abstract":"<jats:p>Voice interactions and voice messages on mobile phones are rapidly growing in popularity. However, the user experience of these services is still worse than desired in noisy environments, especially in multi-talker scenarios, where the phone can only provide low-quality voice recordings. Speech enhancement using only audio as the input remains a grand challenge in these scenarios. In this paper, we handle this with the help of the emerging acoustic sensing technology. The key insight is that the inaudible acoustic signals emitted by speakers of phones can capture the subtle lip movements when people speak. Instead of enabling lip reading for the classification of limited voice commands, we further unlock the potential of acoustic sensing and leverage the captured lip information to improve the voice recording quality. We propose WaveVoice, a joint audio-sensory deep learning method for end-to-end speech enhancement on mobile phones. 
The model of WaveVoice is structured as an encoder-decoder network, in which audio and acoustic sensing data are processed through two individual CNN branches, respectively, and then fused into a joint network to generate enhanced speech. In addition, to improve the performance on new users, a self-supervised learning methodology is developed to adapt the model to extract speaker-specific features. We construct a dataset to train and evaluate WaveVoice. We also perform online tests under various noisy conditions to show the applicability of our system in real-world scenarios. Experimental results show that WaveVoice can effectively reconstruct the target clean speech from the noisy audio signals, and yield notably superior performance compared with the audio-only encoder-decoder model and the state-of-the-art speech enhancement methods. Given its promising performance, we believe that WaveVoice has made a substantial contribution to the advancement of mobile voice input.<\/jats:p>","DOI":"10.1145\/3478093","type":"journal-article","created":{"date-parts":[[2021,9,14]],"date-time":"2021-09-14T22:48:23Z","timestamp":1631659703000},"page":"1-30","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Sensing to Hear"],"prefix":"10.1145","volume":"5","author":[{"given":"Qian","family":"Zhang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dong","family":"Wang","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Run","family":"Zhao","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yinggang","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Junjie","family":"Shen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,14]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2018-1400"},{"key":"e_1_2_1_2_1","unstructured":"Ltd Beijing DataTang Technology Co. [n.d.]. aidatatang_200zh a free Chinese Mandarin speech corpus. https:\/\/www.datatang.com  Ltd Beijing DataTang Technology Co. [n.d.]. aidatatang_200zh a free Chinese Mandarin speech corpus. https:\/\/www.datatang.com"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSDA.2017.8384449"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2572259"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Zhuo Chen Yan Huang Jinyu Li and Yifan Gong. 2017. Improving mask learning based speech enhancement system with restoration layers and residual connection. In Interspeech. ISCA.  Zhuo Chen Yan Huang Jinyu Li and Yifan Gong. 2017. Improving mask learning based speech enhancement system with restoration layers and residual connection. In Interspeech. 
ISCA.","DOI":"10.21437\/Interspeech.2017-515"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.367"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341162.3343756"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3197517.3201357"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.213"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411830"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2380116.2380184"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1523\/JNEUROSCI.3675-12.2013"},{"key":"e_1_2_1_14_1","unstructured":"Google. [n.d.]. Deploy machine learning models on mobile and IoT devices. https:\/\/www.tensorflow.org\/lite  Google. [n.d.]. Deploy machine learning models on mobile and IoT devices. https:\/\/www.tensorflow.org\/lite"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208331"},{"key":"e_1_2_1_16_1","volume-title":"Deep Residual Learning for Image Recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).","author":"He Kaiming","year":"2016","unstructured":"Kaiming He , Xiangyu Zhang , Shaoqing Ren , and Jian Sun . 2016 . Deep Residual Learning for Image Recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)."},{"key":"e_1_2_1_17_1","volume-title":"The cortical organization of speech processing. Nature reviews neuroscience 8, 5","author":"Hickok Gregory","year":"2007","unstructured":"Gregory Hickok and David Poeppel . 2007. The cortical organization of speech processing. Nature reviews neuroscience 8, 5 ( 2007 ), 393--402. 
Gregory Hickok and David Poeppel. 2007. The cortical organization of speech processing. Nature reviews neuroscience 8, 5 (2007), 393--402."},{"key":"e_1_2_1_18_1","volume-title":"Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531","author":"Hinton Geoffrey","year":"2015","unstructured":"Geoffrey Hinton , Oriol Vinyals , and Jeff Dean . 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 ( 2015 ). Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2017.2784878"},{"key":"e_1_2_1_20_1","volume-title":"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv: Computer Vision and Pattern Recognition","author":"Howard Andrew","year":"2017","unstructured":"Andrew Howard , Menglong Zhu , Bo Chen , Dmitry Kalenichenko , Weijun Wang , Tobias Weyand , M Andreetto , and Hartwig Adam . 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv: Computer Vision and Pattern Recognition ( 2017 ). Andrew Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv: Computer Vision and Pattern Recognition (2017)."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.5555\/2209820.2210675"},{"key":"e_1_2_1_22_1","unstructured":"Google Inc. [n.d.]. Cloud Speech-to-Text. https:\/\/cloud.google.com\/speech-to-text\/  Google Inc. [n.d.]. Cloud Speech-to-Text. https:\/\/cloud.google.com\/speech-to-text\/"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6288223"},{"key":"e_1_2_1_24_1","volume-title":"Adam: A Method for Stochastic Optimization. 
arXiv: Learning","author":"Kingma Diederik P","year":"2014","unstructured":"Diederik P Kingma and Jimmy Ba . 2014 . Adam: A Method for Stochastic Optimization. arXiv: Learning (2014). Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv: Learning (2014)."},{"key":"e_1_2_1_25_1","volume-title":"ICML deep learning workshop","author":"Koch Gregory","unstructured":"Gregory Koch , Richard Zemel , and Ruslan Salakhutdinov . 2015. Siamese neural networks for one-shot image recognition . In ICML deep learning workshop , Vol. 2 . Lille . Gregory Koch, Richard Zemel, and Ruslan Salakhutdinov. 2015. Siamese neural networks for one-shot image recognition. In ICML deep learning workshop, Vol. 2. Lille."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2628641"},{"key":"e_1_2_1_27_1","volume-title":"International Conference on Machine Learning. PMLR, 2965--2974","author":"Lehtinen Jaakko","year":"2018","unstructured":"Jaakko Lehtinen , Jacob Munkberg , Jon Hasselgren , Samuli Laine , Tero Karras , Miika Aittala , and Timo Aila . 2018 . Noise2Noise: Learning Image Restoration without Clean Data . In International Conference on Machine Learning. PMLR, 2965--2974 . Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. 2018. Noise2Noise: Learning Image Restoration without Clean Data. In International Conference on Machine Learning. PMLR, 2965--2974."},{"key":"e_1_2_1_28_1","volume-title":"Implementing quality of service over Cisco MPLS VPNs. Selecting MPLS VPN Services","author":"Lewis Chris","year":"2006","unstructured":"Chris Lewis and Steve Pickavance . 2006. Implementing quality of service over Cisco MPLS VPNs. Selecting MPLS VPN Services ( 2006 ). Chris Lewis and Steve Pickavance. 2006. Implementing quality of service over Cisco MPLS VPNs. 
Selecting MPLS VPN Services (2006)."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430780"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/SLT.2018.8639575"},{"key":"e_1_2_1_31_1","volume-title":"UltraGesture: Fine-Grained Gesture Sensing and Recognition. In 2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON). 1--9.","author":"Ling K.","unstructured":"K. Ling , H. Dai , Y. Liu , and A. X. Liu . 2018 . UltraGesture: Fine-Grained Gesture Sensing and Recognition. In 2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON). 1--9. K. Ling, H. Dai, Y. Liu, and A. X. Liu. 2018. UltraGesture: Fine-Grained Gesture Sensing and Recognition. In 2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON). 1--9."},{"key":"e_1_2_1_32_1","first-page":"1","article-title":"BlinkListener: \" Listen\" to Your Eye Blink Using Your Smartphone","volume":"5","author":"Liu Jialin","year":"2021","unstructured":"Jialin Liu , Dong Li , Lei Wang , and Jie Xiong . 2021 . BlinkListener: \" Listen\" to Your Eye Blink Using Your Smartphone . Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5 , 2 (2021), 1 -- 27 . Jialin Liu, Dong Li, Lei Wang, and Jie Xiong. 2021. BlinkListener: \" Listen\" to Your Eye Blink Using Your Smartphone. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 2 (2021), 1--27.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2019.2891733"},{"key":"e_1_2_1_34_1","first-page":"436","article-title":"Speech enhancement based on deep denoising autoencoder","volume":"2013","author":"Lu Xugang","year":"2013","unstructured":"Xugang Lu , Yu Tsao , Shigeki Matsuda , and Chiori Hori . 2013 . 
Speech enhancement based on deep denoising autoencoder .. In Interspeech , Vol. 2013. 436 -- 440 . Xugang Lu, Yu Tsao, Shigeki Matsuda, and Chiori Hori. 2013. Speech enhancement based on deep denoising autoencoder.. In Interspeech, Vol. 2013. 436--440.","journal-title":"Interspeech"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2973750.2973755"},{"key":"e_1_2_1_36_1","volume-title":"Hearing lips and seeing voices. Nature 264, 5588","author":"McGurk Harry","year":"1976","unstructured":"Harry McGurk and John MacDonald . 1976. Hearing lips and seeing voices. Nature 264, 5588 ( 1976 ), 746--748. Harry McGurk and John MacDonald. 1976. Hearing lips and seeing voices. Nature 264, 5588 (1976), 746--748."},{"key":"e_1_2_1_37_1","volume-title":"Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784","author":"Mirza Mehdi","year":"2014","unstructured":"Mehdi Mirza and Simon Osindero . 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 ( 2014 ). Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784 (2014)."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742674"},{"key":"e_1_2_1_39_1","volume-title":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 7092--7096","author":"Narayanan A.","unstructured":"A. Narayanan and D. Wang . 2013. Ideal ratio mask estimation using deep neural networks for robust speech recognition . In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 7092--7096 . A. Narayanan and D. Wang. 2013. Ideal ratio mask estimation using deep neural networks for robust speech recognition. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 
7092--7096."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683634"},{"key":"e_1_2_1_41_1","volume-title":"SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452","author":"Pascual Santiago","year":"2017","unstructured":"Santiago Pascual , Antonio Bonafonte , and Joan Serra . 2017 . SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452 (2017). Santiago Pascual, Antonio Bonafonte, and Joan Serra. 2017. SEGAN: Speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452 (2017)."},{"key":"e_1_2_1_42_1","volume-title":"Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T recommendation","author":"Recommendation ITUT","year":"2003","unstructured":"ITUT Recommendation . 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. ITU-T recommendation ( 2003 ), 835. ITUT Recommendation. 2003. Subjective test methodology for evaluating speech communication systems that include noise suppression algorithm. 
ITU-T recommendation (2003), 835."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2001.941023"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242599"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447993.3448626"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8057099"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3191768"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683385"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2020\/528"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.4799597"},{"key":"e_1_2_1_51_1","volume-title":"Fundamentals of wireless communication","author":"Tse David","unstructured":"David Tse and Pramod Viswanath . 2005. Fundamentals of wireless communication . Cambridge university press . David Tse and Pramod Viswanath. 2005. Fundamentals of wireless communication. Cambridge university press."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.858005"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.858005"},{"key":"e_1_2_1_54_1","unstructured":"voicebot.ai. 2019. Voice Assistant Consumer Adoption Report. https:\/\/voicebot.ai\/wp-content\/uploads\/2019\/01\/voice-assistant-consumer-adoption-report-2018-voicebot.pdf  voicebot.ai. 2019. Voice Assistant Consumer Adoption Report. 
https:\/\/voicebot.ai\/wp-content\/uploads\/2019\/01\/voice-assistant-consumer-adoption-report-2018-voicebot.pdf"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3191771"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2842159"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2018.2842159"},{"key":"e_1_2_1_58_1","volume-title":"Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94","author":"Wang Wei","year":"2016","unstructured":"Wei Wang , Alex X Liu , and Ke Sun . 2016 . Device-free gesture tracking using acoustic signals . In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94 . Wei Wang, Alex X Liu, and Ke Sun. 2016. Device-free gesture tracking using acoustic signals. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2352935"},{"key":"e_1_2_1_60_1","unstructured":"WeChat. 2017. The 2017 WeChat Data Report. https:\/\/blog.wechat.com\/2017\/11\/09\/the-2017-wechat-data-report\/.  WeChat. 2017. The 2017 WeChat Data Report. https:\/\/blog.wechat.com\/2017\/11\/09\/the-2017-wechat-data-report\/."},{"key":"e_1_2_1_61_1","volume-title":"Complex ratio masking for monaural speech separation","author":"Williamson Donald S","year":"2015","unstructured":"Donald S Williamson , Yuxuan Wang , and DeLiang Wang . 2015. Complex ratio masking for monaural speech separation . IEEE\/ACM transactions on audio, speech, and language processing 24, 3 ( 2015 ), 483--492. Donald S Williamson, Yuxuan Wang, and DeLiang Wang. 2015. Complex ratio masking for monaural speech separation. 
IEEE\/ACM transactions on audio, speech, and language processing 24, 3 (2015), 483--492."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIFS.2019.2944058"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2020.3026622"},{"key":"e_1_2_1_64_1","volume-title":"Listening to Sounds of Silence for Speech Denoising. arXiv preprint arXiv:2010.12013","author":"Xu Ruilin","year":"2020","unstructured":"Ruilin Xu , Rundi Wu , Yuko Ishiwaka , Carl Vondrick , and Changxi Zheng . 2020. Listening to Sounds of Silence for Speech Denoising. arXiv preprint arXiv:2010.12013 ( 2020 ). Ruilin Xu, Rundi Wu, Yuko Ishiwaka, Carl Vondrick, and Changxi Zheng. 2020. Listening to Sounds of Silence for Speech Denoising. arXiv preprint arXiv:2010.12013 (2020)."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2017.8057022"},{"key":"e_1_2_1_66_1","unstructured":"Xiangyu Xu Jiadi Yu Yingying Chen Yanmin Zhu Linghe Kong and Minglu Li. 2019. BreathListener: Fine-grained Breathing Monitoring in Driving Environments Utilizing Acoustic Signals. (2019) 54--66.  Xiangyu Xu Jiadi Yu Yingying Chen Yanmin Zhu Linghe Kong and Minglu Li. 2019. BreathListener: Fine-grained Breathing Monitoring in Driving Environments Utilizing Acoustic Signals. (2019) 54--66."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2014.2364452"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359266"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6489"},{"key":"e_1_2_1_70_1","volume-title":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 241--245","author":"Yu D.","unstructured":"D. Yu , M. Kolb\u00e6k , Z. Tan , and J. Jensen . 2017. Permutation invariant training of deep models for speaker-independent multi-talker speech separation . In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 241--245 . D. 
Yu, M. Kolb\u00e6k, Z. Tan, and J. Jensen. 2017. Permutation invariant training of deep models for speaker-independent multi-talker speech separation. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 241--245."},{"key":"e_1_2_1_71_1","unstructured":"Fisher Yu and Vladlen Koltun. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. (2016).  Fisher Yu and Vladlen Koltun. 2016. Multi-Scale Context Aggregation by Dilated Convolutions. (2016)."},{"key":"e_1_2_1_72_1","volume-title":"Strata: Fine-Grained Acoustic-based Device-Free Tracking.","author":"Yun Sangki","year":"2017","unstructured":"Sangki Yun , Yichao Chen , Huihuang Zheng , Lili Qiu , and Wenguang Mao . 2017 . Strata: Fine-Grained Acoustic-based Device-Free Tracking. (2017), 15--28. Sangki Yun, Yichao Chen, Huihuang Zheng, Lili Qiu, and Wenguang Mao. 2017. Strata: Fine-Grained Acoustic-based Device-Free Tracking. (2017), 15--28."},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/3264958"},{"key":"e_1_2_1_74_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3133962"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494987"},{"key":"e_1_2_1_76_1","doi-asserted-by":"publisher","DOI":"10.1145\/3381008"},{"key":"e_1_2_1_77_1","volume-title":"Two-stage deep learning for noisy-reverberant speech enhancement","author":"Zhao Yan","year":"2018","unstructured":"Yan Zhao , Zhong-Qiu Wang , and DeLiang Wang . 2018. Two-stage deep learning for noisy-reverberant speech enhancement . IEEE\/ACM transactions on audio, speech, and language processing 27, 1 ( 2018 ), 53--62. Yan Zhao, Zhong-Qiu Wang, and DeLiang Wang. 2018. Two-stage deep learning for noisy-reverberant speech enhancement. 
IEEE\/ACM transactions on audio, speech, and language processing 27, 1 (2018), 53--62."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241575"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478093","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478093","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:32Z","timestamp":1750188692000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478093"}},"subtitle":["Speech Enhancement for Mobile Devices Using Acoustic Signals"],"short-title":[],"issued":{"date-parts":[[2021,9,9]]},"references-count":78,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,9]]}},"alternative-id":["10.1145\/3478093"],"URL":"https:\/\/doi.org\/10.1145\/3478093","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,9]]},"assertion":[{"value":"2021-09-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}