{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,16]],"date-time":"2026-06-16T07:16:23Z","timestamp":1781594183741,"version":"3.54.5"},"reference-count":65,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2024,5,13]],"date-time":"2024-05-13T00:00:00Z","timestamp":1715558400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62132010"],"award-info":[{"award-number":["62132010"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Young Elite Scientists Sponsorship Program by CAST","award":["2021QNRC001"],"award-info":[{"award-number":["2021QNRC001"]}]},{"name":"Tsinghua University Initiative Scientific Research Program, Beijing Key Lab of Networked Multimedia, Institute for Artificial Intelligence, Tsinghua University"},{"name":"Undergraduate \/ Graduate Education Innovation Grants, Tsinghua University, and Beijing National Research Center for Information Science and Technology"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2024,5,13]]},"abstract":"<jats:p>Subject-aware vocal activity sensing on wearables, which specifically recognizes and monitors the wearer's distinct vocal activities, is essential in advancing personal health monitoring and enabling context-aware applications. While recent advancements in earables present new opportunities, the absence of relevant datasets and effective methods remains a significant challenge. In this paper, we introduce EarSAVAS, the first publicly available dataset constructed specifically for subject-aware human vocal activity sensing on earables. 
EarSAVAS encompasses eight distinct vocal activities from both the earphone wearer and bystanders, including synchronous two-channel audio and motion data collected from 42 participants totaling 44.5 hours. Further, we propose EarVAS, a lightweight multi-modal deep learning architecture that enables efficient subject-aware vocal activity recognition on earables. To validate the reliability of EarSAVAS and the efficiency of EarVAS, we implemented two advanced benchmark models. Evaluation results on EarSAVAS reveal EarVAS's effectiveness with an accuracy of 90.84% and a Macro-AUC of 89.03%. Comprehensive ablation experiments were conducted on benchmark models and demonstrated the effectiveness of feedback microphone audio and highlighted the potential value of sensor fusion in subject-aware vocal activity sensing on earables. We hope that the proposed EarSAVAS and benchmark models can inspire other researchers to further explore efficient subject-aware human vocal activity sensing on earables.<\/jats:p>","DOI":"10.1145\/3659616","type":"journal-article","created":{"date-parts":[[2024,5,15]],"date-time":"2024-05-15T12:20:41Z","timestamp":1715775641000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":10,"title":["The EarSAVAS Dataset"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-9337-2278","authenticated-orcid":false,"given":"Xiyuxing","family":"Zhang","sequence":"first","affiliation":[{"name":"Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4249-8893","authenticated-orcid":false,"given":"Yuntao","family":"Wang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua 
University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-1056-2441","authenticated-orcid":false,"given":"Yuxuan","family":"Han","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0579-2716","authenticated-orcid":false,"given":"Chen","family":"Liang","sequence":"additional","affiliation":[{"name":"Key Laboratory of Pervasive Computing, Ministry of Education, Department of Computer Science and Technology, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2123-6392","authenticated-orcid":false,"given":"Ishan","family":"Chatterjee","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-5388-4552","authenticated-orcid":false,"given":"Jiankai","family":"Tang","sequence":"additional","affiliation":[{"name":"Xinya College, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8041-7962","authenticated-orcid":false,"given":"Xin","family":"Yi","sequence":"additional","affiliation":[{"name":"Institute for Network Sciences and Cyberspace, Tsinghua University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6300-4389","authenticated-orcid":false,"given":"Shwetak","family":"Patel","sequence":"additional","affiliation":[{"name":"Paul G. 
Allen School of Computer Science and Engineering, University of Washington, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2273-6927","authenticated-orcid":false,"given":"Yuanchun","family":"Shi","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Technology, Tsinghua University, China and Intelligent Computing and Application Laboratory of Qinghai Province, Qinghai University, China"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"320","published-online":{"date-parts":[[2024,5,15]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581265"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2020.107447"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3390\/s140406474"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130902"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2021.3063479"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2021.3098861"},{"key":"e_1_2_1_7_1","volume-title":"Global smart device shipment forecasts 2020 to","year":"2023","unstructured":"Canalys. 2020. Global smart device shipment forecasts 2020 to 2023. https:\/\/www.canalys.com\/newsroom\/canalys-worldwide-smart-device-shipments-2023 Last accessed September 2023."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM42981.2021.9488852"},{"key":"e_1_2_1_9_1","unstructured":"Sanyuan Chen Yu Wu Chengyi Wang Shujie Liu Daniel Tompkins Zhuo Chen and Furu Wei. 2022. BEATs: Audio Pre-Training with Acoustic Tokenizers. arXiv:2212.09058 [eess.AS]"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/3446382.3450216"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1109\/PerComWorkshops53856.2022.9767394"},{"key":"e_1_2_1_12_1","volume-title":"Voxceleb2: Deep speaker recognition. 
arXiv preprint arXiv:1806.05622","author":"Chung Joon Son","year":"2018","unstructured":"Joon Son Chung, Arsha Nagrani, and Andrew Zisserman. 2018. Voxceleb2: Deep speaker recognition. arXiv preprint arXiv:1806.05622 (2018)."},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/3594739.3610673"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2015.06.026"},{"key":"e_1_2_1_15_1","doi-asserted-by":"crossref","unstructured":"Eduardo Fonseca Xavier Favory Jordi Pons Frederic Font and Xavier Serra. 2022. FSD50K: An Open Dataset of Human-Labeled Sound Events. arXiv:2010.00475 [cs.SD]","DOI":"10.1109\/TASLP.2021.3133208"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/CHASE.2016.14"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3610872"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.medengphy.2013.07.011"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/taslp.2021.3120633"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/icassp43922.2022.9746828"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBCAS.2015.2504959"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2016.12.035"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/EMBC46164.2021.9629886"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","unstructured":"Feiyu Han Panlong Yang Shaojie Yan Haohua Du and Yuanhao Feng. 2023. BreathSign: Transparent and Continuous In-ear Authentication Using Bone-conducted Breathing Biometrics. 1--10. 
https:\/\/doi.org\/10.1109\/INFOCOM53939.2023.10229037","DOI":"10.1109\/INFOCOM53939.2023.10229037"},{"key":"e_1_2_1_26_1","volume-title":"Speaker recognition by machines and humans: A tutorial review","author":"Hansen John HL","year":"2015","unstructured":"John HL Hansen and Taufiq Hasan. 2015. Speaker recognition by machines and humans: A tutorial review. IEEE Signal processing magazine 32, 6 (2015), 74--99."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3241539.3241548"},{"key":"e_1_2_1_28_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2017","unstructured":"Diederik P. Kingma and Jimmy Ba. 2017. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 [cs.LG]"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2022.10.015"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544548.3581008"},{"key":"e_1_2_1_31_1","unstructured":"Tsung-Yi Lin Priya Goyal Ross Girshick Kaiming He and Piotr Doll\u00e1r. 2018. Focal Loss for Dense Object Detection. 
arXiv:1708.02002 [cs.CV]"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/BSN.2016.7516246"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3211960.3211970"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550284"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/bsn51625.2021.9507017"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData52589.2021.9671796"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2003.817122"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.21437\/interspeech.2019-2680"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP39728.2021.9414356"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.7910\/DVN\/YDEPUT"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3419197"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM.2018.8485978"},{"key":"e_1_2_1_43_1","unstructured":"Md Juber Rahman Ebrahim Nemati Mahbubur Rahman Korosh Vatanparvar Viswam Nathan and Jilong Kuang. 2019. Efficient Online Cough Detection with a Minimal Feature Set Using Smartphones for Automated Assessment of Pulmonary Patients."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1049\/sil2.12233"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","unstructured":"Yanzhi Ren Chen Wang Jie Yang and Yingying Chen. 2015. Fine-grained sleep monitoring: Hearing your breathing with smartphones. 1194--1202. 
https:\/\/doi.org\/10.1109\/INFOCOM.2015.7218494","DOI":"10.1109\/INFOCOM.2015.7218494"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1145\/3550314"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3345615.3361130"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448121"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3345615.3361134"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1145\/3569472"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","unstructured":"Akira Tamamori Tomoki Hayashi Tomoki Toda and Kazuya Takeda. 2017. An investigation of recurrent neural network for daily activity recognition using multi-modal signals. 1334--1340. https:\/\/doi.org\/10.1109\/APSIPA.2017.8282239","DOI":"10.1109\/APSIPA.2017.8282239"},{"key":"e_1_2_1_52_1","volume-title":"Le","author":"Tan Mingxing","year":"2020","unstructured":"Mingxing Tan and Quoc V. Le. 2020. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946 [cs.LG]"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3234974"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","unstructured":"Timo Tigges Thomas B\u00fcchler Alexandru-Gabriel Pielmus Michael Klum Aarne Feldheiser Oliver Hunsicker and Reinhold Orglmeister. 2018. Assessment of In-ear Photoplethysmography as a Surrogate for Electrocardiography in Heart Rate Variability Analysis. 293--297. 
https:\/\/doi.org\/10.1007\/978-981-10-9038-7_54","DOI":"10.1007\/978-981-10-9038-7_54"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747661"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/embc44109.2020.9176835"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1145\/3389189.3389196"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3491102.3517698"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ymeth.2022.05.002"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376836"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3332165.3347950"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/UEMCON.2017.8249110"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/IoTDI54339.2022.00014"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/3544549.3585903"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2010.02.005"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3659616","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3659616","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T17:03:39Z","timestamp":1755882219000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3659616"}},"subtitle":["Enabling Subject-Aware Vocal Activity Sensing on 
Earables"],"short-title":[],"issued":{"date-parts":[[2024,5,13]]},"references-count":65,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,5,13]]}},"alternative-id":["10.1145\/3659616"],"URL":"https:\/\/doi.org\/10.1145\/3659616","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,13]]},"assertion":[{"value":"2024-05-15","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}