{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,14]],"date-time":"2026-03-14T17:57:47Z","timestamp":1773511067785,"version":"3.50.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,3,18]],"date-time":"2020-03-18T00:00:00Z","timestamp":1584489600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100004663","name":"Ministry of Science and Technology of Taiwan","doi-asserted-by":"crossref","award":["MOST108-2636-E-009-011- and 108-2633-E-002-001-"],"award-info":[{"award-number":["MOST108-2636-E-009-011- and 108-2633-E-002-001-"]}],"id":[{"id":"10.13039\/501100004663","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100005799","name":"National Chiao Tung University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005799","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Startup Fund for Youngman Research at SJTU"},{"name":"Joint Key Project of the NSFC","award":["U1736207"],"award-info":[{"award-number":["U1736207"]}]},{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2018YFB2101102"],"award-info":[{"award-number":["2018YFB2101102"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2020,3,18]]},"abstract":"<jats:p>Using silent speech to issue commands has received growing attention, as users can utilize existing command sets from voice-based interfaces without attracting other people's attention. Such interaction maintains privacy and social acceptance from others. 
However, current solutions for recognizing silent speech mainly rely on camera-based data or attaching sensors to the throat. Camera-based solutions require 5.82 times larger power consumption or have potential privacy issues; attaching sensors to the throat is not practical for commercial-off-the-shelf (COTS) devices because additional sensors are required. In this paper, we propose a sensing technique that only needs a microphone and a speaker on COTS devices, which not only consumes little power but also has fewer privacy concerns. By deconstructing the received acoustic signals, a 2D motion profile can be generated. We propose a classifier based on convolutional neural networks (CNN) to identify the corresponding silent command from the 2D motion profiles. The proposed classifier can adapt to users and is robust when tested by environmental factors. Our evaluation shows that the system achieves 92.5% accuracy in classifying 20 commands.<\/jats:p>","DOI":"10.1145\/3381008","type":"journal-article","created":{"date-parts":[[2020,3,18]],"date-time":"2020-03-18T18:54:31Z","timestamp":1584557671000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":40,"title":["Endophasia"],"prefix":"10.1145","volume":"4","author":[{"given":"Yongzhao","family":"Zhang","sequence":"first","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wei-Hsiang","family":"Huang","sequence":"additional","affiliation":[{"name":"National Chiao Tung University, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chih-Yun","family":"Yang","sequence":"additional","affiliation":[{"name":"National Chiao Tung University, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wen-Ping","family":"Wang","sequence":"additional","affiliation":[{"name":"National Chiao Tung University, 
Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi-Chao","family":"Chen","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chuang-Wen","family":"You","sequence":"additional","affiliation":[{"name":"National Taiwan University, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Da-Yuan","family":"Huang","sequence":"additional","affiliation":[{"name":"National Chiao Tung University, Taiwan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guangtao","family":"Xue","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jiadi","family":"Yu","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2020,3,18]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2007.893911"},{"key":"e_1_2_1_2_1","volume-title":"LipNet: End-to-End Sentence-level Lipreading. arXiv: Learning","author":"Assael Yannis M","year":"2017","unstructured":"Yannis M Assael , Brendan Shillingford , Shimon Whiteson , and Nando De Freitas . 2017. LipNet: End-to-End Sentence-level Lipreading. arXiv: Learning ( 2017 ). Yannis M Assael, Brendan Shillingford, Shimon Whiteson, and Nando De Freitas. 2017. LipNet: End-to-End Sentence-level Lipreading. arXiv: Learning (2017)."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2638728.2638774"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1958.10501456"},{"key":"e_1_2_1_5_1","volume-title":"Microphone arrays: signal processing techniques and applications","author":"Brandstein Michael","unstructured":"Michael Brandstein and Darren Ward . 2013. 
Microphone arrays: signal processing techniques and applications . Springer Science & Business Media . Michael Brandstein and Darren Ward. 2013. Microphone arrays: signal processing techniques and applications. Springer Science & Business Media."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2010.01.001"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00288"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2007.904667"},{"key":"e_1_2_1_9_1","doi-asserted-by":"crossref","unstructured":"Joon Son Chung and Andrew Zisserman. 2016. Lip Reading in the Wild. (2016) 87--103.  Joon Son Chung and Andrew Zisserman. 2016. Lip Reading in the Wild. (2016) 87--103.","DOI":"10.1007\/978-3-319-54184-6_6"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.2229005"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.08.002"},{"key":"e_1_2_1_12_1","volume-title":"Retrieved","year":"2020","unstructured":"digikey.com. 2020 . 8 Ohms General Purpose Speaker 700mW 100Hz 20kHz Top Rectangular . Retrieved January 20, 2020 from https:\/\/www.digikey.com\/product-detail\/en\/cui-inc\/CMS-15113-078SP\/102-5644-ND\/8581915 digikey.com. 2020. 8 Ohms General Purpose Speaker 700mW 100Hz 20kHz Top Rectangular. Retrieved January 20, 2020 from https:\/\/www.digikey.com\/product-detail\/en\/cui-inc\/CMS-15113-078SP\/102-5644-ND\/8581915"},{"key":"e_1_2_1_13_1","volume-title":"Retrieved","year":"2020","unstructured":"digikey.com. 2020 . Knowles SPH0641LU4H-1 Microphone . Retrieved January 20, 2020 from https:\/\/www.digikey.com\/product-detail\/en\/knowles\/SPH0641LU4H-1\/423-1402-1-ND\/5332430 digikey.com. 2020. Knowles SPH0641LU4H-1 Microphone. 
Retrieved January 20, 2020 from https:\/\/www.digikey.com\/product-detail\/en\/knowles\/SPH0641LU4H-1\/423-1402-1-ND\/5332430"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcp.2004.09.018"},{"key":"e_1_2_1_15_1","unstructured":"EN ETSI. [n.d.]. 300 908 (GSM 05.02) Digital Cellular Telecommunications System. Multiplexing and Multiple Access on the Radio Path ([n.d.]).  EN ETSI. [n.d.]. 300 908 (GSM 05.02) Digital Cellular Telecommunications System. Multiplexing and Multiple Access on the Radio Path ([n.d.])."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1972.1054829"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242603"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1965.1053828"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.90"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2009.12.001"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2012.02.001"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1044\/jshr.1404.755"},{"key":"e_1_2_1_23_1","volume-title":"Microphone Arrays","author":"Kellermann Walter L","unstructured":"Walter L Kellermann . 2001. Acoustic echo cancellation for beamforming microphone arrays . In Microphone Arrays . Springer , 281--306. Walter L Kellermann. 2001. Acoustic echo cancellation for beamforming microphone arrays. In Microphone Arrays. 
Springer, 281--306."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300376"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2639108.2639142"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3311823.3311831"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2019.2891733"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2018.8486283"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/2973750.2973755"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3210240.3210325"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081362"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2017.2740000"},{"key":"e_1_2_1_33_1","unstructured":"Saeid Motiian Quinn Jones Seyed Iranmanesh and Gianfranco Doretto. 2017. Few-shot adversarial domain adaptation. In Advances in Neural Information Processing Systems. 6670--6680.  Saeid Motiian Quinn Jones Seyed Iranmanesh and Gianfranco Doretto. 2017. Few-shot adversarial domain adaptation. In Advances in Neural Information Processing Systems. 6670--6680."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/3104322.3104425"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1093\/ietisy\/e89-d.1.1"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2003.1200069"},{"key":"e_1_2_1_37_1","first-page":"1","article-title":"Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features","volume":"15","author":"Nguyen Chuong H","year":"2017","unstructured":"Chuong H Nguyen , George K Karavas , and Panagiotis Artemiadis . 2017 . Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features . Journal of Neural Engineering 15 , 1 (nov 2017), 016002. 
https:\/\/doi.org\/10.1088\/1741-2552\/aa8235 10.1088\/1741-2552 Chuong H Nguyen, George K Karavas, and Panagiotis Artemiadis. 2017. Inferring imagined speech using EEG signals: a new approach using Riemannian manifold features. Journal of Neural Engineering 15, 1 (nov 2017), 016002. https:\/\/doi.org\/10.1088\/1741-2552\/aa8235","journal-title":"Journal of Neural Engineering"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1322263.1322265"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.144727"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3214278"},{"key":"e_1_2_1_41_1","first-page":"66","article-title":"Channel estimation modeling","volume":"17","author":"Pukkila Markku","year":"2000","unstructured":"Markku Pukkila . 2000 . Channel estimation modeling . Nokia Research Center 17 (2000), 66 . Markku Pukkila. 2000. Channel estimation modeling. Nokia Research Center 17 (2000), 66.","journal-title":"Nokia Research Center"},{"key":"e_1_2_1_42_1","unstructured":"Theodore S Rappaport etal 1996. Wireless communications: principles and practice. Vol. 2. prentice hall PTR New Jersey.  Theodore S Rappaport et al. 1996. Wireless communications: principles and practice. Vol. 2. prentice hall PTR New Jersey."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2014.131"},{"key":"e_1_2_1_44_1","volume-title":"Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1","author":"Srivastava Nitish","year":"2014","unstructured":"Nitish Srivastava , Geoffrey Hinton , Alex Krizhevsky , Ilya Sutskever , and Ruslan Salakhutdinov . 2014. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research 15, 1 ( 2014 ), 1929--1958. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. 
The journal of machine learning research 15, 1 (2014), 1929--1958."},{"key":"e_1_2_1_45_1","volume-title":"Getting started with sigma-delta digital interface on applicable STM32 microcontrollers. Retrieved","year":"2020","unstructured":"st.com. 2020. Getting started with sigma-delta digital interface on applicable STM32 microcontrollers. Retrieved January 5, 2020 from https:\/\/www.st.com\/content\/ccc\/resource\/technical\/document\/application_note\/group0\/b2\/44\/42\/9d\/46\/b4\/4d\/34\/DM00354333\/files\/DM00354333.pdf\/jcr:content\/translations\/en.DM00354333.pdf st.com. 2020. Getting started with sigma-delta digital interface on applicable STM32 microcontrollers. Retrieved January 5, 2020 from https:\/\/www.st.com\/content\/ccc\/resource\/technical\/document\/application_note\/group0\/b2\/44\/42\/9d\/46\/b4\/4d\/34\/DM00354333\/files\/DM00354333.pdf\/jcr:content\/translations\/en.DM00354333.pdf"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242599"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241568"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3191768"},{"key":"e_1_2_1_49_1","volume-title":"Retrieved","year":"2020","unstructured":"taobao.com. 2020 . STM43L476 Mini Development Board . Retrieved January 20, 2020 from https:\/\/item.taobao.com\/item.htm?spm=a230r.1.14.298.499c2265yYV2qF&id=582824201272&ns=1&abbucket=20#detail taobao.com. 2020. STM43L476 Mini Development Board. Retrieved January 20, 2020 from https:\/\/item.taobao.com\/item.htm?spm=a230r.1.14.298.499c2265yYV2qF&id=582824201272&ns=1&abbucket=20#detail"},{"key":"e_1_2_1_50_1","volume-title":"Retrieved","author":"VICON","year":"2019","unstructured":"VICON vero. 2019 . Vero X, large field of view . Retrieved August 10, 2019 from https:\/\/www.vicon.com\/products\/camera-systems\/vero VICON vero. 2019. Vero X, large field of view. 
Retrieved August 10, 2019 from https:\/\/www.vicon.com\/products\/camera-systems\/vero"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300248"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. ACM, 82--94","author":"Wang Wei","year":"2016","unstructured":"Wei Wang , Alex X Liu , and Ke Sun . 2016 . Device-free gesture tracking using acoustic signals . In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. ACM, 82--94 . Wei Wang, Alex X Liu, and Ke Sun. 2016. Device-free gesture tracking using acoustic signals. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. ACM, 82--94."},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742662"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081356"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241575"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3381008","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3381008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:58Z","timestamp":1750203898000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3381008"}},"subtitle":["Utilizing Acoustic-Based Imaging for Issuing Contact-Free Silent Speech 
Commands"],"short-title":[],"issued":{"date-parts":[[2020,3,18]]},"references-count":55,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,3,18]]}},"alternative-id":["10.1145\/3381008"],"URL":"https:\/\/doi.org\/10.1145\/3381008","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,18]]},"assertion":[{"value":"2020-03-18","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}