{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:25:12Z","timestamp":1772907912689,"version":"3.50.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T00:00:00Z","timestamp":1662422400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2022,9,6]]},"abstract":"<jats:p>Despite advances in audio- and motion-based human activity recognition (HAR) systems, a practical, power-efficient, and privacy-sensitive activity recognition system has remained elusive. State-of-the-art activity recognition systems often require power-hungry and privacy-invasive audio data. This is especially challenging for resource-constrained wearables, such as smartwatches. To counter the need for an always-on audio-based activity classification system, we first make use of power and compute-optimized IMUs sampled at 50 Hz to act as a trigger for detecting activity events. Once detected, we use a multimodal deep learning model that augments the motion data with audio data captured on a smartwatch. We subsample this audio to rates \u2264 1 kHz, rendering spoken content unintelligible, while also reducing power consumption on mobile devices. Our multimodal deep learning model achieves a recognition accuracy of 92.2% across 26 daily activities in four indoor environments. Our findings show that subsampling audio from 16 kHz down to 1 kHz, in concert with motion data, does not result in a significant drop in inference accuracy. 
We also analyze the speech content intelligibility and power requirements of audio sampled at less than 1 kHz and demonstrate that our proposed approach can improve the practicality of human activity recognition systems.<\/jats:p>","DOI":"10.1145\/3550284","type":"journal-article","created":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T14:54:27Z","timestamp":1662562467000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":47,"title":["SAMoSA"],"prefix":"10.1145","volume":"6","author":[{"given":"Vimal","family":"Mollyn","sequence":"first","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Karan","family":"Ahuja","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Dhruv","family":"Verma","sequence":"additional","affiliation":[{"name":"University of Toronto, Toronto, ON, Canada"}]},{"given":"Chris","family":"Harrison","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]},{"given":"Mayank","family":"Goel","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA, USA"}]}],"member":"320","published-online":{"date-parts":[[2022,9,7]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448083"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445582"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.279"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2020.2985374"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1498853"},{"key":"e_1_2_1_6_1","volume-title":"Intille","author":"Bao Ling","year":"2004","unstructured":"Ling Bao and Stephen S. Intille. 2004. Activity Recognition from User-Annotated Acceleration Data. In Pervasive Computing, Alois Ferscha and Friedemann Mattern (Eds.). 
Springer Berlin Heidelberg, Berlin, Heidelberg, 1--17."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3341163.3347735"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.3670000"},{"key":"e_1_2_1_9_1","volume-title":"Davide Del Testa","author":"Bojarski Mariusz","year":"2016","unstructured":"Mariusz Bojarski, Davide Del Testa, Daniel Dworakowski, Bernhard Firner, Beat Flepp, Prasoon Goyal, Lawrence D Jackel, Mathew Monfort, Urs Muller, Jiakai Zhang, et al. 2016. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316 (2016)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.01164"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1459359.1459472"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3447744"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298878"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854641"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1908992"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/1620545.1620581"},{"key":"e_1_2_1_17_1","volume-title":"A survey of quantization methods for efficient neural network inference. arXiv preprint arXiv:2103.13630","author":"Gholami Amir","year":"2021","unstructured":"Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. 2021. A survey of quantization methods for efficient neural network inference. 
arXiv preprint arXiv:2103.13630 (2021)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1804628"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2016.09.005"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCSP48568.2020.9182416"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/1864349.1864375"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952132"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 32nd International Conference on International Conference on Machine Learning -","volume":"37","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on International Conference on Machine Learning - Volume 37 (Lille, France) (ICML'15). JMLR.org, 448--456."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411764.3445169"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298932"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1862575"},{"key":"e_1_2_1_27_1","volume-title":"Kingma and Jimmy Ba","author":"Diederik","year":"2015","unstructured":"Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Yoshua Bengio and Yann LeCun (Eds.). http:\/\/arxiv.org\/abs\/1412.6980"},{"key":"e_1_2_1_28_1","unstructured":"Knowles. 2022. SPH0645 Digital MEMS Microphone. https:\/\/www.digikey.com\/en\/products\/detail\/knowles\/SPH0645LM4H-B\/5332440"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","unstructured":"Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. 
https:\/\/doi.org\/10.48550\/ARXIV.1806.08342","DOI":"10.48550\/ARXIV.1806.08342"},{"key":"e_1_2_1_30_1","volume-title":"Robust and Deployable Gesture Recognition for Smartwatches. In 27th International Conference on Intelligent User Interfaces. 277--291","author":"Kunwar Utkarsh","year":"2022","unstructured":"Utkarsh Kunwar, Sheetal Borar, Moritz Berghofer, Julia Kylm\u00e4l\u00e4, Ilhan Aslan, Luis A Leiva, and Antti Oulasvirta. 2022. Robust and Deployable Gesture Recognition for Smartwatches. In 27th International Conference on Intelligent User Interfaces. 277--291."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/1964897.1964918"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3411841"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2750858.2804262"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242609"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290605.3300568"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2984511.2984582"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025773"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/2030112.2030163"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/BIGCOMP.2017.7881728"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3397318"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379503.3403551"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314404"},{"key":"e_1_2_1_43_1","volume-title":"Recognizing workshop activity using body worn microphones and accelerometers. In International conference on pervasive computing. Springer, 18--32","author":"Lukowicz Paul","year":"2004","unstructured":"Paul Lukowicz, Jamie A Ward, Holger Junker, Mathias St\u00e4ger, Gerhard Tr\u00f6ster, Amin Atrash, and Thad Starner. 2004. 
Recognizing workshop activity using body worn microphones and accelerometers. In International conference on pervasive computing. Springer, 18--32."},{"key":"e_1_2_1_44_1","volume-title":"Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090","author":"Mao Junhua","year":"2014","unstructured":"Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan L Yuille. 2014. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090 (2014)."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/3494998"},{"key":"e_1_2_1_46_1","unstructured":"Vinod Nair and Geoffrey E Hinton. 2010. Rectified linear units improve restricted boltzmann machines. In Icml."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161174"},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence -","volume":"3","author":"Ravi Nishkam","unstructured":"Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. 2005. Activity Recognition from Accelerometer Data. In Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence - Volume 3 (Pittsburgh, Pennsylvania) (IAAI'05). AAAI Press, 1541--1546."},{"key":"e_1_2_1_51_1","volume-title":"European Conference on Computer Vision. Springer, 356--371","author":"Rogez Gr\u00e9gory","year":"2014","unstructured":"Gr\u00e9gory Rogez, Maryam Khademi, JS Supan\u010di\u010d III, Jose Maria Martinez Montiel, and Deva Ramanan. 2014. 3d hand pose detection in egocentric rgb-d images. In European Conference on Computer Vision. 
Springer, 356--371."},{"key":"e_1_2_1_52_1","doi-asserted-by":"crossref","unstructured":"George Saon Gakuto Kurata Tom Sercu Kartik Audhkhasi Samuel Thomas Dimitrios Dimitriadis Xiaodong Cui Bhuvana Ramabhadran Michael Picheny Lynn-Li Lim Bergul Roomi and Phil Hall. 2017. English Conversational Telephone Speech Recognition by Humans and Machines. In INTERSPEECH.","DOI":"10.21437\/Interspeech.2017-405"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/PERCOMW.2015.7134104"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.3390\/s19143213"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2012.6343802"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3432701"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICICES.2016.7518920"},{"key":"e_1_2_1_58_1","unstructured":"TDK-InvenSense. 2022. INMP441 Digital MEMS Microphone. https:\/\/invensense.tdk.com\/products\/digital\/inmp441\/"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3161192"},{"key":"e_1_2_1_60_1","volume-title":"Proceedings, The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference","author":"Manuela","year":"2005","unstructured":"Manuela M. Veloso and Subbarao Kambhampati (Eds.). 2005. Proceedings, The Twentieth National Conference on Artificial Intelligence and the Seventeenth Innovative Applications of Artificial Intelligence Conference, July 9-13, 2005, Pittsburgh, Pennsylvania, USA. AAAI Press \/ The MIT Press."},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3478085"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.1998.710744"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2006.197"},{"key":"e_1_2_1_64_1","volume-title":"Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 
1--14","author":"Wu Jason","year":"2020","unstructured":"Jason Wu, Chris Harrison, Jeffrey P Bigham, and Gierad Laput. 2020. Automated Class Discovery and One-Shot Interactions for Acoustic Activity Recognition. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1--14."},{"key":"e_1_2_1_65_1","doi-asserted-by":"crossref","unstructured":"W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and G. Zweig. 2017. Achieving Human Parity in Conversational Speech Recognition. arXiv:1610.05256 [cs.CL]","DOI":"10.1109\/TASLP.2017.2756440"},{"key":"e_1_2_1_66_1","volume-title":"Phyo Phyo San, Xiao Li Li, and Shonali Krishnaswamy.","author":"Yang Jianbo","year":"2015","unstructured":"Jianbo Yang, Minh Nhut Nguyen, Phyo Phyo San, Xiao Li Li, and Shonali Krishnaswamy. 2015. Deep convolutional neural networks on multichannel time series for human activity recognition. In Twenty-fourth international joint conference on artificial intelligence."},{"key":"e_1_2_1_67_1","doi-asserted-by":"publisher","DOI":"10.4108\/icst.mobicase.2014.257786"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3130985"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3550284","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3550284","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,14]],"date-time":"2025-07-14T04:42:17Z","timestamp":1752468137000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3550284"}},"subtitle":["Sensing Activities with Motion and Subsampled
Audio"],"short-title":[],"issued":{"date-parts":[[2022,9,6]]},"references-count":68,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,9,6]]}},"alternative-id":["10.1145\/3550284"],"URL":"https:\/\/doi.org\/10.1145\/3550284","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,9,6]]},"assertion":[{"value":"2022-09-07","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}