{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T12:19:14Z","timestamp":1767183554357,"version":"3.44.0"},"reference-count":90,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,19]],"date-time":"2023-12-19T00:00:00Z","timestamp":1702944000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["2238433"],"award-info":[{"award-number":["2238433"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2023,12,19]]},"abstract":"<jats:p>This paper presents the design and implementation of Scribe, a comprehensive voice processing and handwriting interface for voice assistants. Distinct from prior works, Scribe is a precise tracking interface that can co-exist with the voice interface on low sampling rate voice assistants. Scribe can be used for 3D free-form drawing, writing, and motion tracking for gaming. Taking handwriting as a specific application, it can also capture natural strokes and the individualized style of writing while occupying only a single frequency. The core technique includes an accurate acoustic ranging method called Cross Frequency Continuous Wave (CFCW) sonar, enabling voice assistants to use ultrasound as a ranging signal while using the regular microphone system of voice assistants as a receiver. We also design a new optimization algorithm that only requires a single frequency for time difference of arrival. Scribe prototype achieves 73 \u03bcm of median error for 1D ranging and 1.4 mm of median error in 3D tracking of an acoustic beacon using the microphone array used in voice assistants. Our implementation of an in-air handwriting interface achieves 94.1% accuracy with automatic handwriting-to-text software, similar to writing on paper (96.6%). At the same time, the error rate of voice-based user authentication only increases from 6.26% to 8.28%.<\/jats:p>","DOI":"10.1145\/3631411","type":"journal-article","created":{"date-parts":[[2024,1,12]],"date-time":"2024-01-12T12:52:04Z","timestamp":1705063924000},"page":"1-31","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Scribe"],"prefix":"10.1145","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-4437-9457","authenticated-orcid":false,"given":"Yang","family":"Bai","sequence":"first","affiliation":[{"name":"University of Maryland College Park, Maryland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3963-9966","authenticated-orcid":false,"given":"Irtaza","family":"Shahid","sequence":"additional","affiliation":[{"name":"University of Maryland College Park, Maryland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5454-8413","authenticated-orcid":false,"given":"Harshvardhan","family":"Takawale","sequence":"additional","affiliation":[{"name":"University of Maryland College Park, Maryland, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5261-7780","authenticated-orcid":false,"given":"Nirupam","family":"Roy","sequence":"additional","affiliation":[{"name":"University of Maryland College Park, Maryland, USA"}]}],"member":"320","published-online":{"date-parts":[[2024,1,12]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2021. Roughly 1 in 4 U.S. adults now owns a smart speaker according to New Report. https:\/\/martech.org\/roughly-1-in-4-u-s-adults-now-owns-a-smart-speaker-according-to-new-report\/"},{"key":"e_1_2_1_2_1","unstructured":"2022. 20kHz speaker. https:\/\/www.digikey.com\/en\/products\/detail\/pui-audio-inc.\/ASX05408-HD-R\/7227653utm_adgroup=Speakers&utm_source=google&utm_medium=cpc&utm_campaign=Shopping_Product_Audio."},{"key":"e_1_2_1_3_1","unstructured":"2022. 60kHz ultrasound speaker. https:\/\/www.steminc.com\/PZT\/en\/ultrasonic-air-transducer-60-khz."},{"key":"e_1_2_1_4_1","unstructured":"2022. 80kHz ultrasound speaker. https:\/\/www.steminc.com\/PZT\/en\/ultrasonic-air-transducer-80-khz."},{"key":"e_1_2_1_5_1","unstructured":"2022. Google Speech-To-Text API. https:\/\/cloud.google.com\/speech-to-text."},{"key":"e_1_2_1_6_1","unstructured":"2022. PMM-3738-VM1010-R MEMS Microphone. https:\/\/www.puiaudio.com\/products\/pmm-3738-vm1010-r."},{"volume-title":"Voice-enabled devices steer the growth of the voice assistant application market as per the Business Research Company's voice assistant application global market report","year":"2022","key":"e_1_2_1_7_1","unstructured":"2022. Voice-enabled devices steer the growth of the voice assistant application market as per the Business Research Company's voice assistant application global market report 2022. https:\/\/www.globenewswire.com\/news-release\/2022\/02\/02\/2377858\/0\/en\/Voice-Enabled-Devices-Steer-The-Growth-Of-The-Voice-Assistant-Application-Market-As-Per-The-Business-Research-Company-s-Voice-Assistant-Application-Global-Market-Report-2022.html"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12008-014-0249-9"},{"key":"e_1_2_1_9_1","volume-title":"Proceedings of Graphics Interface 2014 (Montreal","author":"Annett Michelle","year":"2014","unstructured":"Michelle Annett, Fraser Anderson, Walter F. Bischof, and Anoop Gupta. 2014. The Pen is Mightier: Understanding Stylus Behaviour While Inking on Tablets. In Proceedings of Graphics Interface 2014 (Montreal, Quebec, Canada) (GI '14). Canadian Information Processing Society, CAN, 193--200."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3498361.3539775"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3498361.3538765"},{"key":"e_1_2_1_12_1","volume-title":"Poster: Inaudible High-throughput Communication Through Acoustic Signals. In The 25th Annual International Conference on Mobile Computing and Networking. 1--3.","author":"Bai Yang","year":"2019","unstructured":"Yang Bai, Jian Liu, Yingying Chen, Li Lu, and Jiadi Yu. 2019. Poster: Inaudible High-throughput Communication Through Acoustic Signals. In The 25th Annual International Conference on Mobile Computing and Networking. 1--3."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430773"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2020.107447"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cageo.2021.104844"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430730"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458864.3467885"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161191"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3563766.3564091"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3117811.3117840"},{"key":"e_1_2_1_21_1","unstructured":"Murata Manufacturing Co. 2020. Ultrasound speaker. https:\/\/www.murata.com\/-\/media\/webrenewal\/products\/sensor\/ultrasonic\/open\/datasheet_maopn.ashx."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijhcs.2011.11.002"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1093\/gerona\/56.9.M584"},{"key":"e_1_2_1_24_1","unstructured":"Analog Devices. 2013. Admp401: Omnidirectional microphone with bottom port and analog output."},{"key":"e_1_2_1_25_1","volume-title":"Signet: Convolutional siamese network for writer independent offline signature verification. arXiv preprint arXiv:1707.02131","author":"Dey Sounak","year":"2017","unstructured":"Sounak Dey, Anjan Dutta, J Ignacio Toledo, Suman K Ghosh, Josep Llad\u00f3s, and Umapada Pal. 2017. Signet: Convolutional siamese network for writer independent offline signature verification. arXiv preprint arXiv:1707.02131 (2017)."},{"key":"e_1_2_1_26_1","unstructured":"Victor Dibia. 2022. Signver - A deep learning library for signature verification. https:\/\/devpost.com\/software\/signver-a-deep-learning-library-for-signature-verification."},{"key":"e_1_2_1_27_1","volume-title":"An efficient approach for trilateration in 3D positioning. Computer communications 31, 17","author":"Doukhnitch Evgueni","year":"2008","unstructured":"Evgueni Doukhnitch, Muhammed Salamah, and Emre Ozen. 2008. An efficient approach for trilateration in 3D positioning. Computer communications 31, 17 (2008), 4124--4129."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448085"},{"key":"e_1_2_1_29_1","volume-title":"Surface parameterization: a tutorial and survey. Advances in multiresolution for geometric modelling","author":"Floater Michael S","year":"2005","unstructured":"Michael S Floater and Kai Hormann. 2005. Surface parameterization: a tutorial and survey. Advances in multiresolution for geometric modelling (2005), 157--186."},{"key":"e_1_2_1_30_1","doi-asserted-by":"crossref","unstructured":"Daniel Garcia-Romero and Carol Y Espy-Wilson. 2011. Analysis of i-vector length normalization in speaker recognition systems. In Twelfth annual conference of the international speech communication association.","DOI":"10.21437\/Interspeech.2011-53"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458864.3466906"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3458864.3467880"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208331"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ITI.2008.4588449"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2014.6847959"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3302506.3310386"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267305.3267567"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3210240.3210328"},{"key":"e_1_2_1_39_1","unstructured":"Keysight. 2020. https:\/\/www.keysight.com\/us\/en\/products\/waveform-and-function-generators.html."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3143361.3143379"},{"key":"e_1_2_1_41_1","volume-title":"2009 IEEE International Conference on RFID. IEEE, 147--154","author":"Li Xin","year":"2009","unstructured":"Xin Li, Yimin Zhang, and Moeness G Amin. 2009. Multifrequency-based range estimation of RFID tags. In 2009 IEEE International Conference on RFID. IEEE, 147--154."},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3300061.3300139"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3386901.3389030"},{"key":"e_1_2_1_44_1","volume-title":"16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19)","author":"Luo Zhihong","year":"2019","unstructured":"Zhihong Luo, Qiping Zhang, Yunfei Ma, Manish Singh, and Fadel Adib. 2019. 3D Backscatter Localization for {Fine-Grained} Robotics. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). 765--782."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/2973750.2973754"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3117811.3117833"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/108844.108868"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/2973750.2973755"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432195"},{"volume-title":"Evolutionary algorithms and neural networks","author":"Mirjalili Seyedali","key":"e_1_2_1_50_1","unstructured":"Seyedali Mirjalili. 2019. Genetic algorithm. In Evolutionary algorithms and neural networks. Springer, 43--55."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742674"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/3274783.3274851"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/2858036.2858580"},{"key":"e_1_2_1_54_1","volume-title":"Report: Voice assistants in use to triple to 8 billion by","author":"Perez Sarah","year":"2019","unstructured":"Sarah Perez. 2019. Report: Voice assistants in use to triple to 8 billion by 2023. https:\/\/techcrunch.com\/2019\/02\/12\/report-voice-assistants-in-use-to-triple-to-8-billion-by-2023\/"},{"key":"e_1_2_1_55_1","volume-title":"Digital handwriting with a finger or a stylus: a biomechanical comparison","author":"Prattichizzo Domenico","year":"2015","unstructured":"Domenico Prattichizzo, Leonardo Meli, and Monica Malvezzi. 2015. Digital handwriting with a finger or a stylus: a biomechanical comparison. IEEE transactions on haptics 8, 4 (2015), 356--370."},{"key":"e_1_2_1_56_1","unstructured":"Signal Processing and Speech Communication Laboratory. 2019. Pitch Tracking Database. https:\/\/www.spsc.tugraz.at\/databases-and-tools\/ptdb-tug-pitch-tracking-database-from-graz-university-of-technology.html."},{"key":"e_1_2_1_57_1","volume-title":"30th USENIX Security Symposium (USENIX Security 21)","author":"Ramesh Soundarya","year":"2021","unstructured":"Soundarya Ramesh, Rui Xiao, Anindya Maiti, Jong Taek Lee, Harini Ramprasad, Ananda Kumar, Murtuza Jadliwala, and Jun Han. 2021. Acoustics to the Rescue: Physical Key Inference Attack Revisited. In 30th USENIX Security Symposium (USENIX Security 21). 3255--3272."},{"key":"e_1_2_1_58_1","unstructured":"Nirupam Roy Haitham Hassanieh and Romit Roy Choudhury. 2017. BackDoor: Making Microphones Hear Inaudible Sounds. In ACM MobiSys."},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3081333.3081366"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/2971648.2971736"},{"key":"e_1_2_1_61_1","doi-asserted-by":"crossref","unstructured":"Alla Sheffer Emil Praun Kenneth Rose et al. 2007. Mesh parameterization methods and their applications. Foundations and Trends\u00ae in Computer Graphics and Vision 2 2 (2007) 105--171.","DOI":"10.1561\/0600000011"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1097\/00005768-200009001-00003"},{"key":"e_1_2_1_63_1","unstructured":"Synertial. 2021. Synertial Motion Capture. https:\/\/www.synertial.com\/."},{"key":"e_1_2_1_64_1","unstructured":"Xsens Technologies. 2021. Xsens Motion Capture. https:\/\/www.xsens.com\/."},{"key":"e_1_2_1_65_1","volume-title":"A global geometric framework for nonlinear dimensionality reduction. science 290, 5500","author":"Tenenbaum Joshua B","year":"2000","unstructured":"Joshua B Tenenbaum, Vin de Silva, and John C Langford. 2000. A global geometric framework for nonlinear dimensionality reduction. science 290, 5500 (2000), 2319--2323."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2207676.2208584"},{"key":"e_1_2_1_67_1","unstructured":"Vicon Motion Systems Ltd UK. 2022. Vicon Motion Systems. https:\/\/www.vicon.com\/."},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3230543.3230565"},{"key":"e_1_2_1_69_1","doi-asserted-by":"publisher","DOI":"10.1145\/1240624.1240727"},{"key":"e_1_2_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300248"},{"key":"e_1_2_1_71_1","volume-title":"Using smart speakers to contactlessly monitor heart rhythms. Communications biology 4, 1","author":"Wang Anran","year":"2021","unstructured":"Anran Wang, Dan Nguyen, Arun R Sridhar, and Shyamnath Gollakota. 2021. Using smart speakers to contactlessly monitor heart rhythms. Communications biology 4, 1 (2021), 1--12."},{"key":"e_1_2_1_72_1","doi-asserted-by":"publisher","DOI":"10.1145\/3300061.3345453"},{"key":"e_1_2_1_73_1","doi-asserted-by":"publisher","DOI":"10.1145\/2486001.2486029"},{"key":"e_1_2_1_74_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3369812","article-title":"Rfid tattoo: A wireless platform for speech recognition","volume":"3","author":"Wang Jingxian","year":"2019","unstructured":"Jingxian Wang, Chengfeng Pan, Haojian Jin, Vaibhav Singh, Yash Jain, Jason I Hong, Carmel Majidi, and Swarun Kumar. 2019. Rfid tattoo: A wireless platform for speech recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3, 4 (2019), 1--24.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_75_1","doi-asserted-by":"publisher","DOI":"10.1145\/3412382.3458254"},{"key":"e_1_2_1_76_1","volume-title":"18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21)","author":"Wang Mei","year":"2021","unstructured":"Mei Wang, Wei Sun, and Lili Qiu. 2021. {MAVL}: Multiresolution Analysis of Voice Localization. In 18th USENIX Symposium on Networked Systems Design and Implementation (NSDI 21). 845--858."},{"key":"e_1_2_1_77_1","volume-title":"Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94","author":"Wang Wei","year":"2016","unstructured":"Wei Wang, Alex X Liu, and Ke Sun. 2016. Device-free gesture tracking using acoustic signals. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. 82--94."},{"key":"e_1_2_1_78_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1906764"},{"key":"e_1_2_1_79_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1908830"},{"key":"e_1_2_1_80_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNET.2017.2766526"},{"key":"e_1_2_1_81_1","doi-asserted-by":"publisher","DOI":"10.1145\/3307334.3326074"},{"key":"e_1_2_1_82_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.pmcj.2020.101183"},{"key":"e_1_2_1_83_1","doi-asserted-by":"publisher","DOI":"10.1145\/2639108.2639111"},{"key":"e_1_2_1_84_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISMAR.2015.37"},{"key":"e_1_2_1_85_1","doi-asserted-by":"publisher","DOI":"10.1007\/s42979-020-00179-y"},{"key":"e_1_2_1_86_1","doi-asserted-by":"publisher","DOI":"10.1145\/2742647.2742662"},{"key":"e_1_2_1_87_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3090095","article-title":"Soundtrak: Continuous 3d tracking of a finger using active acoustics","volume":"1","author":"Zhang Cheng","year":"2017","unstructured":"Cheng Zhang, Qiuyue Xue, Anandghan Waghmare, Sumeet Jain, Yiming Pu, Sinan Hersek, Kent Lyons, Kenneth A Cunefare, Omer T Inan, and Gregory D Abowd. 2017. Soundtrak: Continuous 3d tracking of a finger using active acoustics. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 2 (2017), 1--25.","journal-title":"Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"},{"key":"e_1_2_1_88_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432192"},{"key":"e_1_2_1_89_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134052"},{"key":"e_1_2_1_90_1","doi-asserted-by":"publisher","DOI":"10.1145\/1057432.1057439"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631411","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3631411","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,27]],"date-time":"2025-08-27T16:56:55Z","timestamp":1756313815000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3631411"}},"subtitle":["Simultaneous Voice and Handwriting Interface"],"short-title":[],"issued":{"date-parts":[[2023,12,19]]},"references-count":90,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,19]]}},"alternative-id":["10.1145\/3631411"],"URL":"https:\/\/doi.org\/10.1145\/3631411","relation":{},"ISSN":["2474-9567"],"issn-type":[{"type":"electronic","value":"2474-9567"}],"subject":[],"published":{"date-parts":[[2023,12,19]]},"assertion":[{"value":"2024-01-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}