{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,28]],"date-time":"2026-04-28T17:01:41Z","timestamp":1777395701003,"version":"3.51.4"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,13]],"date-time":"2023-12-13T00:00:00Z","timestamp":1702425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Science Foundation","award":["1813651, 2106690, 1928448, 1955653"],"award-info":[{"award-number":["1813651, 2106690, 1928448, 1955653"]}]},{"DOI":"10.13039\/100000006","name":"Office of Naval Research","doi-asserted-by":"crossref","award":["#N00014-18-1-2776"],"award-info":[{"award-number":["#N00014-18-1-2776"]}],"id":[{"id":"10.13039\/100000006","id-type":"DOI","asserted-by":"crossref"}]},{"name":"NSF Graduate Research Fellowship"},{"name":"NASEM Ford Predoctoral Fellowship"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. Hum.-Robot Interact."],"published-print":{"date-parts":[[2023,12,31]]},"abstract":"<jats:p>\n            Social robots in the home will need to solve audio identification problems to better interact with their users. This article focuses on the classification between (a)\n            <jats:italic>natural<\/jats:italic>\n            conversation that includes at least one co-located user and (b)\n            <jats:italic>media<\/jats:italic>\n            that is playing from electronic sources and does not require a social response, such as television shows. This classification can help social robots detect a user\u2019s social presence using sound. Social robots that are able to solve this problem can apply this information to assist them in making decisions, such as determining when and how to appropriately engage human users. We compiled a dataset from a variety of acoustic environments that contained either\n            <jats:italic>natural<\/jats:italic>\n            or\n            <jats:italic>media<\/jats:italic>\n            audio, including audio that we recorded in our own homes. Using this dataset, we performed an experimental evaluation on a range of traditional machine learning classifiers and assessed the classifiers\u2019 abilities to generalize to new recordings, acoustic conditions, and environments. We conclude that a C-Support Vector Classification (SVC) algorithm outperformed other classifiers. Finally, we present a classification pipeline that in-home robots can utilize, and we discuss the timing and size of the trained classifiers as well as privacy and ethics considerations.\n          <\/jats:p>","DOI":"10.1145\/3611658","type":"journal-article","created":{"date-parts":[[2023,8,18]],"date-time":"2023-08-18T12:42:26Z","timestamp":1692362546000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Is Someone There or Is That the TV? Detecting Social Presence Using Sound"],"prefix":"10.1145","volume":"12","author":[{"given":"Nicholas C.","family":"Georgiou","sequence":"first","affiliation":[{"name":"Social Robotics Lab, Yale University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rebecca","family":"Ramnauth","sequence":"additional","affiliation":[{"name":"Social Robotics Lab, Yale University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emmanuel","family":"Adeniran","sequence":"additional","affiliation":[{"name":"Social Robotics Lab, Yale University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Lee","sequence":"additional","affiliation":[{"name":"Social Robotics Lab, Yale University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lila","family":"Selin","sequence":"additional","affiliation":[{"name":"Social Robotics Lab, Yale University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Brian","family":"Scassellati","sequence":"additional","affiliation":[{"name":"Social Robotics Lab, Yale University, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2023,12,13]]},"reference":[{"key":"e_1_3_4_2_2","doi-asserted-by":"publisher","DOI":"10.1016\/S1018-3639(18)30850-X"},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.3390\/s17040854"},{"key":"e_1_3_4_4_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics8050483"},{"key":"e_1_3_4_5_2","article-title":"The fifth \u201cCHiME\u201d speech separation and recognition challenge: Dataset, task and baselines","author":"Barker Jon Philip","year":"2018","unstructured":"Jon Philip Barker, Shinji Watanabe, Emmanuel Vincent, and Jan Trmal. 2018. The fifth \u201cCHiME\u201d speech separation and recognition challenge: Dataset, task and baselines. In Proceedings of the Interspeech Conference.","journal-title":"Proceedings of the Interspeech Conference"},{"key":"e_1_3_4_6_2","article-title":"Classification of musical genre: A machine learning approach","author":"Basili Roberto","year":"2004","unstructured":"Roberto Basili, Alfredo Serafini, and Armando Stellato. 2004. Classification of musical genre: A machine learning approach. In Proceedings of the International Society for Music Information Retrieval (ISMIR\u201904).","journal-title":"Proceedings of the International Society for Music Information Retrieval (ISMIR\u201904)"},{"key":"e_1_3_4_7_2","doi-asserted-by":"publisher","DOI":"10.1145\/3212480.3212505"},{"key":"e_1_3_4_8_2","doi-asserted-by":"publisher","DOI":"10.1201\/9781315139470"},{"key":"e_1_3_4_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/1961189.1961199"},{"key":"e_1_3_4_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/11428572_4"},{"key":"e_1_3_4_11_2","first-page":"885","article-title":"\u201cWhere am I?\u201d Scene recognition for mobile robots using audio features","author":"Chu Selina M.","year":"2006","unstructured":"Selina M. Chu, Shrikanth S. Narayanan, C.-C. Jay Kuo, and Maja J. Matari\u0107. 2006. \u201cWhere am I?\u201d Scene recognition for mobile robots using audio features. In Proceedings of the IEEE International Conference on Multimedia and Expo. 885\u2013888.","journal-title":"Proceedings of the IEEE International Conference on Multimedia and Expo"},{"key":"e_1_3_4_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2022.106027"},{"key":"e_1_3_4_13_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40638-016-0042-2"},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2017.12.008"},{"key":"e_1_3_4_15_2","doi-asserted-by":"publisher","DOI":"10.1145\/3372278.3390730"},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-011-0923-x"},{"key":"e_1_3_4_17_2","doi-asserted-by":"publisher","DOI":"10.1023\/B:STCO.0000035301.49549.88"},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2016.7471670"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.2307\/1403797"},{"key":"e_1_3_4_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISETC.2018.8583897"},{"key":"e_1_3_4_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/TENCON.1993.327987"},{"key":"e_1_3_4_22_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF02985802"},{"key":"e_1_3_4_23_2","doi-asserted-by":"publisher","DOI":"10.31209\/2019.100000136"},{"key":"e_1_3_4_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2002.1035731"},{"key":"e_1_3_4_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2008.4650977"},{"key":"e_1_3_4_26_2","first-page":"252","article-title":"Direct modeling of raw audio with DNNS for wake word detection","author":"Kumatani Ken\u2019ichi","year":"2017","unstructured":"Ken\u2019ichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, and Arindam Mandal. 2017. Direct modeling of raw audio with DNNS for wake word detection. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU\u201917). 252\u2013257.","journal-title":"Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU\u201917)"},{"key":"e_1_3_4_27_2","article-title":"Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home)","author":"K\u00ebpuska Veton","year":"2018","unstructured":"Veton K\u00ebpuska and Gamal Bohouta. 2018. Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home). In Proceedings of the IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC\u201918).","journal-title":"Proceedings of the IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC\u201918)"},{"key":"e_1_3_4_28_2","article-title":"Mel frequency cepstral coefficients for music modeling","author":"Logan Beth","year":"2000","unstructured":"Beth Logan. 2000. Mel frequency cepstral coefficients for music modeling. In Proceedings of the 1st International Symposium on Music Information Retrieval.","journal-title":"Proceedings of the 1st International Symposium on Music Information Retrieval"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2018.11.022"},{"key":"e_1_3_4_30_2","doi-asserted-by":"publisher","DOI":"10.3390\/fi11110231"},{"key":"e_1_3_4_31_2","first-page":"6285","article-title":"Sound representation and classification benchmark for domestic robots","author":"Maxime J.","year":"2014","unstructured":"J. Maxime, X. Alameda-Pineda, L. Girin, and R. Horaud. 2014. Sound representation and classification benchmark for domestic robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201914). 6285\u20136292.","journal-title":"Proceedings of the IEEE International Conference on Robotics and Automation (ICRA\u201914)"},{"key":"e_1_3_4_32_2","doi-asserted-by":"publisher","DOI":"10.25080\/Majora-7b98e3ed-003"},{"key":"e_1_3_4_33_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2020-1602"},{"key":"e_1_3_4_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/SSP.2007.4301365"},{"key":"e_1_3_4_35_2","article-title":"Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features","author":"M\u00fcller Meinard","year":"2011","unstructured":"Meinard M\u00fcller and Sebastian Ewert. 2011. Chroma toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In Proceedings of the International Conference on Music Information Retrieval (ISMIR\u201911).","journal-title":"In Proceedings of the International Conference on Music Information Retrieval (ISMIR\u201911)"},{"key":"e_1_3_4_36_2","first-page":"36","article-title":"Context aware virtual assistant with case-based conflict resolution in multi-user smart home environment","author":"Ospan Bauyrzhan","year":"2018","unstructured":"Bauyrzhan Ospan, Nawaz Khan, Juan Augusto Wrede, Mario Quinde, and Kenzhegali Nurgaliyev. 2018. Context aware virtual assistant with case-based conflict resolution in multi-user smart home environment. In Proceedings of the International Conference on Computing and Network Communications (CoCoNet\u201918). 36\u201344.","journal-title":"Proceedings of the International Conference on Computing and Network Communications (CoCoNet\u201918)"},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2020.101238"},{"key":"e_1_3_4_38_2","doi-asserted-by":"publisher","DOI":"10.5555\/1953048.2078195"},{"key":"e_1_3_4_39_2","doi-asserted-by":"crossref","unstructured":"H\u00e9ctor Lozano Inmaculada Hern\u00e1ez Artzai Pic\u00f3n Javier Camarena and Eva Navas. 2010. Audio classification techniques in home environments for elderly\/dependant people. In Proceedings of the 12th International Conference on Computers Helping People with Special Needs: Part I (ICCHP\u201910) . Springer-Verlag Berlin Heidelberg 320\u2013323.","DOI":"10.1007\/978-3-642-14097-6_51"},{"key":"e_1_3_4_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.proeng.2012.06.120"},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.1007\/s42486-019-00014-1"},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.1080\/02533839.2012.751330"},{"key":"e_1_3_4_43_2","doi-asserted-by":"publisher","DOI":"10.5555\/1759639.1759676"},{"key":"e_1_3_4_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2002.800560"},{"key":"e_1_3_4_45_2","article-title":"American Time Use Survey Summary","author":"Statistics United States Bureau of Labor","year":"2019","unstructured":"United States Bureau of Labor Statistics. 2019. American Time Use Survey Summary. Retrieved from https:\/\/www.bls.gov\/news.release\/pdf\/atus.pdf","journal-title":"R"},{"key":"e_1_3_4_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2019.08.020"},{"key":"e_1_3_4_47_2","article-title":"Genre Breakdown of the Top 250 TV Programs in the United States in 2017","author":"Watson A.","year":"2019","unstructured":"A. Watson. 2019. Genre Breakdown of the Top 250 TV Programs in the United States in 2017. Retrieved from https:\/\/www.statista.com\/statistics\/201565\/most-popular-genres-in-us-primetime-tv\/","journal-title":"R"},{"key":"e_1_3_4_48_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2019.01.085"},{"key":"e_1_3_4_49_2","unstructured":"Harry Zhang. 2004. The optimality of naive bayes. Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference (FLAIRS\u201904) 2."},{"key":"e_1_3_4_50_2","doi-asserted-by":"publisher","DOI":"10.1145\/279232.279236"}],"container-title":["ACM Transactions on Human-Robot Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3611658","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3611658","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:36:12Z","timestamp":1750178172000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3611658"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,13]]},"references-count":49,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,31]]}},"alternative-id":["10.1145\/3611658"],"URL":"https:\/\/doi.org\/10.1145\/3611658","relation":{},"ISSN":["2573-9522"],"issn-type":[{"value":"2573-9522","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,13]]},"assertion":[{"value":"2022-05-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-06-22","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-12-13","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}