{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T21:47:24Z","timestamp":1769636844075,"version":"3.49.0"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2021,3,19]],"date-time":"2021-03-19T00:00:00Z","timestamp":1616112000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2021,3,19]]},"abstract":"<jats:p>Conversational assistants in the form of stand-alone devices such as Amazon Echo and Google Home have become popular and embraced by millions of people. By serving as a natural interface to services ranging from home automation to media players, conversational assistants help people perform many tasks with ease, such as setting timers, playing music and managing to-do lists. While these systems offer useful capabilities, they are largely passive and unaware of the human behavioral context in which they are used. In this work, we explore how off-the-shelf conversational assistants can be enhanced with acoustic-based human activity recognition by leveraging the short interval after a voice command is given to the device. Since always-on audio recording can pose privacy concerns, our method is unique in that it does not require capturing and analyzing any audio other than the speech-based interactions between people and their conversational assistants. In particular, we leverage background environmental sounds present in these short duration voice-based interactions to recognize activities of daily living. We conducted a study with 14 participants in 3 different locations in their own homes. 
We showed that our method can recognize 19 different activities of daily living with average precision of 84.85% and average recall of 85.67% in a leave-one-participant-out performance evaluation with 30-second audio clips bound by the voice interactions.<\/jats:p>","DOI":"10.1145\/3448090","type":"journal-article","created":{"date-parts":[[2021,3,30]],"date-time":"2021-03-30T18:56:41Z","timestamp":1617130601000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":26,"title":["Ok Google, What Am I Doing?"],"prefix":"10.1145","volume":"5","author":[{"given":"Rebecca","family":"Adaimi","sequence":"first","affiliation":[{"name":"University of Texas at Austin, Speedway, Austin, Texas, USA"}]},{"given":"Howard","family":"Yong","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Speedway, Austin, Texas, USA"}]},{"given":"Edison","family":"Thomaz","sequence":"additional","affiliation":[{"name":"University of Texas at Austin, Speedway, Austin, Texas, USA"}]}],"member":"320","published-online":{"date-parts":[[2021,3,30]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Smart Home Personal Assistants: A Security and Privacy Review. Comput. Surveys (22","year":"2020","unstructured":"2020. Smart Home Personal Assistants: A Security and Privacy Review. Comput. Surveys (22 July 2020)."},{"key":"e_1_2_1_2_1","volume-title":"Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019","author":"Abdi Noura","year":"2019","unstructured":"Noura Abdi, Kopo M. 
Ramokapane, and Jose M. Such. 2019. More than Smart Speakers: Security and Privacy Perceptions of Smart Home Personal Assistants. In Fifteenth Symposium on Usable Privacy and Security (SOUPS 2019). USENIX Association, Santa Clara, CA. https:\/\/www.usenix.org\/conference\/soups2019\/presentation\/abdi"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3390\/info6030505"},{"key":"e_1_2_1_4_1","volume-title":"2010 IEEE International Workshop on Medical Measurements and Applications. 32--37","author":"Arcelus A.","unstructured":"A. Arcelus, R. Goubran, H. Sveistrup, M. Bilodeau, and F. Knoefel. 2010. Context-aware smart home monitoring through pressure measurement sequences. In 2010 IEEE International Workshop on Medical Measurements and Applications. 32--37."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3264901"},{"key":"e_1_2_1_6_1","unstructured":"John Callaham. 2018. Speaking to Google Home will now be more natural with Continued Conversation. https:\/\/www.androidauthority.com\/google-home-continued-conversation-878770\/"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2016.01.020"},{"key":"e_1_2_1_8_1","volume-title":"Eating Episode Detection with Jawbone-Mounted Inertial Sensing. In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC). 4361--4364","author":"Chun K. S.","year":"2020","unstructured":"K. S. Chun, H. Jeong, R. Adaimi, and E. Thomaz. 2020. Eating Episode Detection with Jawbone-Mounted Inertial Sensing. 
In 2020 42nd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC). 4361--4364. https:\/\/doi.org\/10.1109\/EMBC44109.2020.9175949"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3301275.3302315"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2015.2503881"},{"key":"e_1_2_1_11_1","volume-title":"Ambient Intelligence, Emile Aarts, Boris de Ruyter, Panos Markopoulos, Evert van Loenen, Reiner Wichert, Ben Schouten, Jacques Terken, Rob Van Kranenburg, Elke Den Ouden, and Gregory O'Hare (Eds.)","author":"Dimitrov Svilen","unstructured":"Svilen Dimitrov, Jochen Britz, Boris Brandherm, and Jochen Frey. 2014. Analyzing Sounds of Home Environment for Device Recognition. In Ambient Intelligence, Emile Aarts, Boris de Ruyter, Panos Markopoulos, Evert van Loenen, Reiner Wichert, Ben Schouten, Jacques Terken, Rob Van Kranenburg, Elke Den Ouden, and Gregory O'Hare (Eds.). 
Springer International Publishing, Cham, 1--16."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3078072.3084330"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.2478\/popets-2020-0072"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TBME.2016.2566619"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3381002"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2017.7952261"},{"key":"e_1_2_1_17_1","unstructured":"Google. [n.d.]. Introduction to the Google Assistant Service | Google Assistant SDK. https:\/\/developers.google.com\/assistant\/sdk\/guides\/service\/python"},{"key":"e_1_2_1_18_1","volume-title":"2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 131--135","author":"Hershey S.","unstructured":"S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney, R. J. Weiss, and K. Wilson. 2017. CNN architectures for large-scale audio classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 131--135."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/2632048.2636084"},{"key":"e_1_2_1_20_1","volume-title":"Loup Ventures Says 75% of U.S. Households Will Have Smart Speakers by","author":"Kinsella Bret","year":"2025","unstructured":"Bret Kinsella. 2019. Loup Ventures Says 75% of U.S. 
Households Will Have Smart Speakers by 2025, Google to Surpass Amazon in Market Share. https:\/\/voicebot.ai\/2019\/06\/18\/loup-ventures-says-75-of-u-s-households-will-have-smart-speakers-by-2025-google-to-surpass-amazon-in-market-share\/"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/2854946.2854961"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2020.3030497"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3290607.3299053"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/3242587.3242609"},{"key":"e_1_2_1_25_1","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1951--1960","author":"Li Y.","unstructured":"Y. Li, W. Li, V. Mahadevan, and N. Vasconcelos. 2016. VLAD3: Encoding Dynamics of Deep Features for Action Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 1951--1960."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1145\/3314404"},{"key":"e_1_2_1_27_1","doi-asserted-by":"crossref","unstructured":"Martino Lombardi, Roberto Vezzani, and Rita Cucchiara. 2015. Detection of Human Movements with Pressure Floor Sensors. In ICIAP.","DOI":"10.1007\/978-3-319-23234-8_57"},{"key":"e_1_2_1_28_1","unstructured":"Raspberry Pi Camera Module. [n.d.]. 
Raspberry Pi Camera Module. https:\/\/www.raspberrypi.org\/documentation\/usage\/camera\/"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSEN.2014.2370945"},{"key":"e_1_2_1_30_1","volume-title":"Privacy As Contextual Integrity. Washington Law Review 79 (05","author":"Nissenbaum Helen","year":"2004","unstructured":"Helen Nissenbaum. 2004. Privacy As Contextual Integrity. Washington Law Review 79 (05 2004)."},{"key":"e_1_2_1_31_1","unstructured":"NPR. 2020. NPR and Edison Research Report: 60M U.S. Adults 18 Own a Smart Speaker. https:\/\/www.npr.org\/about-npr\/794588984\/npr-and-edison-research-report-60m-u-s-adults-18-own-a-smart-speaker"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/2968219.2971400"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174214"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174214"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3359316"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174033"},{"key":"e_1_2_1_37_1","unstructured":"Zafar Rafii and Bryan Pardo. 2012. Music\/Voice Separation Using the Similarity Matrix. In ISMIR. 583--588."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3173574.3174072"},{"key":"e_1_2_1_39_1","unstructured":"Bradley Spicer. 2020. How Much Internet Speed Does Your Smart Home Need? 
https:\/\/www.smarthomebit.com\/how-much-internet-speed-does-your-smart-home-need\/"},{"key":"e_1_2_1_40_1","unstructured":"Seeed Studio. [n.d.]. ReSpeaker 4-Mic Linear Array Kit for Raspberry Pi. http:\/\/wiki.seeedstudio.com\/ReSpeaker_4-Mic_Linear_Array_Kit_for_Raspberry_Pi\/"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369807"},{"key":"e_1_2_1_42_1","volume-title":"Lane","author":"Tong Catherine","year":"2020","unstructured":"Catherine Tong, Shyam A. Tailor, and Nicholas D. Lane. 2020. Are Accelerometers for Activity Recognition a Dead-end? arXiv:2001.08111 [cs.CV]"},{"key":"e_1_2_1_43_1","volume-title":"Exploiting Environmental Sounds for Activity Recognition in Smart Homes. In AAAI Workshop: Artificial Intelligence Applied to Assistive Technologies and Smart Environments.","author":"Tremblay S\u00e9bastien","year":"2015","unstructured":"S\u00e9bastien Tremblay, Dany Fortin-Simard, Erika Blackburn-Verreault, S\u00e9bastien Gaboury, Bruno Bouchard, and Abdenour Bouzouane. 2015. Exploiting Environmental Sounds for Activity Recognition in Smart Homes. 
In AAAI Workshop: Artificial Intelligence Applied to Assistive Technologies and Smart Environments."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2015.08.338"},{"key":"e_1_2_1_45_1","volume-title":"Urs Peter Mosimann, and Tobias Nef.","author":"Urwyler Prabitha","year":"2015","unstructured":"Prabitha Urwyler, Luca Rampa, Reto Stucki, Marcel B\u00fcchler, Ren\u00e9 Martin M\u00fcri, Urs Peter Mosimann, and Tobias Nef. 2015. Recognition of activities of daily living in healthy subjects using two ad-hoc classifiers. In Biomedical engineering online."},{"key":"e_1_2_1_46_1","volume-title":"2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 5291--5294","author":"Vacher M.","unstructured":"M. Vacher, D. Istrate, F. Portet, T. Joubert, T. Chevalier, S. Smidtas, B. Meillon, B. Lecouteux, M. Sehili, P. Chahuara, and S. M\u00e9niard. 2011. The sweet-home project: Audio technology in smart homes to improve well-being and reliance. In 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 5291--5294."},{"key":"e_1_2_1_47_1","volume-title":"Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments","author":"Wang Wei","unstructured":"Wei Wang, Fatjon Seraj, Nirvana Meratnia, and Paul J. M. 
Havinga. 2019. Privacy-Aware Environmental Sound Classification for Indoor Human Activity Recognition. In Proceedings of the 12th ACM International Conference on PErvasive Technologies Related to Assistive Environments (Rhodes, Greece) (PETRA '19). Association for Computing Machinery, New York, NY, USA, 36--44. https:\/\/doi.org\/10.1145\/3316782.3321521"},{"key":"e_1_2_1_48_1","unstructured":"Jiaxuan Wu, Yunfei Feng, and Peng Sun. 2018. Sensor Fusion for Recognition of Activities of Daily Living. 
In Sensors."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jcgg.2012.06.002"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3448090","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3448090","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:24:59Z","timestamp":1750195499000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3448090"}},"subtitle":["Acoustic Activity Recognition Bounded by Conversational Assistant Interactions"],"short-title":[],"issued":{"date-parts":[[2021,3,19]]},"references-count":49,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,3,19]]}},"alternative-id":["10.1145\/3448090"],"URL":"https:\/\/doi.org\/10.1145\/3448090","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,3,19]]},"assertion":[{"value":"2021-03-30","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}