{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,22]],"date-time":"2026-04-22T19:46:50Z","timestamp":1776887210061,"version":"3.51.2"},"reference-count":49,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2021,9,9]],"date-time":"2021-09-09T00:00:00Z","timestamp":1631145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2021,9,9]]},"abstract":"<jats:p>Voice interaction is friendly and convenient for users. Smart devices such as Amazon Echo allow users to interact with them through voice commands and are becoming increasingly popular in daily life. In recent years, research has focused on using the microphone arrays built into smart devices to localize the user's position, which adds context information to voice commands. In contrast, few works explore the user's head orientation, which also carries useful context information. For example, when a user says, \"turn on the light\", the head orientation could indicate which light the user is referring to. Existing model-based works require a large number of microphone arrays to form an array network, while machine learning-based approaches need laborious data collection and training. The high deployment\/usage cost of these methods makes them unfriendly to users. In this paper, we propose HOE, a model-based system that enables Head Orientation Estimation for smart devices with only two microphone arrays and requires less training overhead than previous approaches. HOE first estimates candidate head orientations by measuring the voice energy radiation pattern. 
Then, HOE leverages the voice frequency radiation pattern to obtain the final estimate. Real-world experiments show that HOE achieves a median estimation error of 23 degrees. To the best of our knowledge, HOE is the first model-based attempt to estimate head orientation with only two microphone arrays and without the arduous overhead of collecting training data.<\/jats:p>","DOI":"10.1145\/3478089","type":"journal-article","created":{"date-parts":[[2021,9,14]],"date-time":"2021-09-14T22:48:23Z","timestamp":1631659703000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Model-based Head Orientation Estimation for Smart Devices"],"prefix":"10.1145","volume":"5","author":[{"given":"Qiang","family":"Yang","sequence":"first","affiliation":[{"name":"The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuanqing","family":"Zheng","sequence":"additional","affiliation":[{"name":"The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,9,14]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2006-649"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2007-257"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379337.3415588"},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/WACV.2016.7477553"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-745"},{"key":"e_1_2_2_6_1","volume-title":"2011 19th European Signal Processing Conference. IEEE, 151--155","author":"Brutti Alessio","year":"2011","unstructured":"Alessio Brutti, Maurizio Omologo, and Piergiorgio Svaizer. 2011. Inference of acoustic source directivity using environment awareness. In 2011 19th European Signal Processing Conference. IEEE, 151--155."},
{"key":"e_1_2_2_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2007.366957"},{"key":"e_1_2_2_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/INFOCOM41043.2020.9155430"},{"key":"e_1_2_2_9_1","unstructured":"Wing Tin Chu and A C C Warnock. 2002. Detailed directivity of sound fields around human talkers. (2002)."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.3425729"},{"key":"e_1_2_2_11_1","unstructured":"Zoey Collier. 2016. Beco Focuses on Developing a Spatially-Aware Alexa Skill. https:\/\/developer.amazon.com\/blogs\/alexa\/post\/Tx1BPHXBLZV5ZVN\/beco-focuses-on-developing-a-spatially-aware-alexa-skill"},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/2636242.2636245"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2011.2104953"},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2013.6696919"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1976.1162830"},{"key":"e_1_2_2_16_1","volume-title":"Proc. of International Workshop on Acoustic Echo and Noise Control (IWAENC). Citeseer, 211--223","author":"Korhonen Teemu","year":"2008","unstructured":"Teemu Korhonen. 2008. Acoustic localization using reverberation with virtual microphones. In Proc. of International Workshop on Acoustic Echo and Noise Control (IWAENC). Citeseer, 211--223."},
{"key":"e_1_2_2_17_1","volume-title":"A robust method to extract talker azimuth orientation using a large-aperture microphone array","author":"Levi Avram","year":"2009","unstructured":"Avram Levi and Harvey Silverman. 2009. A robust method to extract talker azimuth orientation using a large-aperture microphone array. IEEE transactions on audio, speech, and language processing 18, 2 (2009), 277--285."},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2008.4607497"},{"key":"e_1_2_2_19_1","volume-title":"Audio Report","author":"National Public Media. 2019.","year":"2019","unstructured":"National Public Media. 2019. The Smart Audio Report 2019. https:\/\/www.nationalpublicmedia.com\/uploads\/2020\/01\/The-Smart-Audio-Report-Winter-2019.pdf Accessed October 19, 2020."},{"key":"e_1_2_2_20_1","volume-title":"ITG Symposium. VDE, 1--5.","author":"M\u00fcller Menno","unstructured":"Menno M\u00fcller, Steven van de Par, and Joerg Bitzer. 2016. Head-Orientation-Based Device Selection: Are You Talking to Me? In Speech Communication; 12. ITG Symposium. VDE, 1--5."},
{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2004.826398"},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2005.1544981"},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/1733343.1733483"},{"key":"e_1_2_2_24_1","volume-title":"International workshop on telecommunications (IWT2013)","author":"Nakano Alberto Yoshihiro","year":"2013","unstructured":"Alberto Yoshihiro Nakano and Phillip Mark Seymour Burt. 2013. Estimation of user orientation using GMMs for multiple voice-command devices environments. In International workshop on telecommunications (IWT2013)."},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.3257548"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSP.2009.4785995"},{"key":"e_1_2_2_27_1","volume-title":"American National Standard Specification for Octave-band and Fractional-octave-band Analog and Digital Filters. Standards Secretariat","author":"Acoustical Society of America. 2004.","unstructured":"Acoustical Society of America. 2004. American National Standard Specification for Octave-band and Fractional-octave-band Analog and Digital Filters. Standards Secretariat, Acoustical Society of America."},
{"key":"e_1_2_2_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICME.2010.5583886"},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2004.1326764"},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2011.04.003"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.4108\/ICST.ROBOCOMM2009.5815"},{"key":"e_1_2_2_32_1","volume-title":"ReSpeaker Mic Array v2.0. https:\/\/wiki.seeedstudio.com\/ReSpeaker_Mic_Array_v2.0\/\/ Accessed","year":"2020","unstructured":"Seeed. 2020. ReSpeaker Mic Array v2.0. https:\/\/wiki.seeedstudio.com\/ReSpeaker_Mic_Array_v2.0\/\/ Accessed October 29, 2020."},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2008-387"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2007.366327"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.3390\/s140202259"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2012-475"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3372224.3380884"},{"key":"e_1_2_2_39_1","volume-title":"ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Sundar Harshavardhan","unstructured":"Harshavardhan Sundar, Weiran Wang, Ming Sun, and Chao Wang. 2020. Raw Waveform Based End-to-end Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4642--4646."},
{"key":"e_1_2_2_40_1","volume-title":"2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO). IEEE, 1024--1028","author":"Svaizer Piergiorgio","year":"2012","unstructured":"Piergiorgio Svaizer, Alessio Brutti, and Maurizio Omologo. 2012. Environment aware estimation of the orientation of acoustic sources using a line array. In 2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO). IEEE, 1024--1028."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2011-147"},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2012-403"},{"key":"e_1_2_2_43_1","volume-title":"Sound capture and processing: practical approaches","author":"Tashev Ivan Jelev","unstructured":"Ivan Jelev Tashev. 2009. Sound capture and processing: practical approaches. John Wiley & Sons."},{"key":"e_1_2_2_44_1","unstructured":"TCL. 2021. P717 Series. 4K UHD ANDROID TV. https:\/\/www.tcl.com\/hk\/en\/products\/p717\/p717-50.html."},{"key":"e_1_2_2_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2010.5496123"},{"key":"e_1_2_2_46_1","volume-title":"CHIL: Computers in the human interaction loop.","author":"Alex","year":"2005","unstructured":"Alex Waibel, Hartwig Steusloff, Rainer Stiefelhagen, et al. 2005. CHIL: Computers in the human interaction loop. (2005)."},
{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3384419.3430724"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2016.2556422"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376427"},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.1907816"}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478089","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3478089","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T19:31:32Z","timestamp":1750188692000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3478089"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,9]]},"references-count":49,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,9,9]]}},"alternative-id":["10.1145\/3478089"],"URL":"https:\/\/doi.org\/10.1145\/3478089","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,9]]},"assertion":[{"value":"2021-09-14","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}