{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T19:52:21Z","timestamp":1764705141032,"version":"3.46.0"},"reference-count":62,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"DOI":"10.13039\/100000062","name":"National Institute of Diabetes and Digestive and Kidney Diseases","doi-asserted-by":"publisher","award":["R01DK129843"],"award-info":[{"award-number":["R01DK129843"]}],"id":[{"id":"10.13039\/100000062","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000070","name":"National Institute of Biomedical Imaging and Bioengineering","doi-asserted-by":"publisher","award":["R21EB030305"],"award-info":[{"award-number":["R21EB030305"]}],"id":[{"id":"10.13039\/100000070","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Interact. Mob. Wearable Ubiquitous Technol."],"published-print":{"date-parts":[[2025,12,2]]},"abstract":"<jats:p>\n                    Wearable cameras are increasingly used as an observational and interventional tool for human behaviors by providing detailed visual data of hand-related activities. This data can be leveraged to facilitate memory recall for logging of behavior or timely interventions aimed at improving health. However, continuous processing of RGB images from these cameras consumes significant power impacting battery lifetime, generates a large volume of unnecessary video data for post-processing, raises privacy concerns, and requires substantial computational resources for real-time analysis. We introduce THOR, a real-time adaptive spatio-temporal RGB frame sampling method that leverages thermal sensing to capture hand-object patches and classify them in real time. 
We use low-resolution thermal camera data to identify moments when a person switches from one hand-related activity to another and adjust the RGB frame sampling rate by increasing it during activity transitions and reducing it during periods of sustained activity (when the system has enough information to identify the activity). Additionally, we use the thermal cues from the hand to localize the region of interest <jats:italic toggle=\"yes\">(i.e.<\/jats:italic>, the hand-object interaction) in each RGB frame, allowing the system to crop and process only the necessary part of the image for activity recognition. We develop a wearable device to validate our method through an in-the-wild study with 14 participants and over 30 activities, and further evaluate it on Ego4D (923 participants across 9 countries, totaling 3,670 hours of video). Our results show that using only 3% of the original RGB video data, our method captures all the activity segments, and achieves a hand-related activity recognition F1-score (95%) comparable to using the entire RGB video (94%). 
Our work provides a more practical path for the longitudinal use of wearable cameras to monitor hand-related activities and health-risk behaviors in real time.\n                  <\/jats:p>","DOI":"10.1145\/3770695","type":"journal-article","created":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T19:42:32Z","timestamp":1764704552000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["THOR: Thermal-Guided Hand-Object Reasoning via Adaptive Vision Sampling"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7898-3957","authenticated-orcid":false,"given":"Soroush","family":"Shahi","sequence":"first","affiliation":[{"name":"Computer Science, Northwestern University, Evanston, Illinois, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0160-4972","authenticated-orcid":false,"given":"Farzad","family":"Shahabi","sequence":"additional","affiliation":[{"name":"Computer Science, Northwestern University, Evanston, Illinois, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-8926-2458","authenticated-orcid":false,"given":"Rama","family":"Naboulsi","sequence":"additional","affiliation":[{"name":"Computer Science, Northwestern University, Evanston, Illinois, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9070-4594","authenticated-orcid":false,"given":"Glenn","family":"Fernandes","sequence":"additional","affiliation":[{"name":"Computer Science, Northwestern University, Evanston, Illinois, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4554-0070","authenticated-orcid":false,"given":"Aggelos K","family":"Katsaggelos","sequence":"additional","affiliation":[{"name":"Computer Science, Northwestern University, Evanston, Illinois, 
USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6681-7564","authenticated-orcid":false,"given":"Nabil","family":"Alshurafa","sequence":"additional","affiliation":[{"name":"Preventive Medicine and Computer Science, Northwestern University, Chicago, Illinois, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,12,2]]},"reference":[{"key":"e_1_2_2_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2022.10.037"},{"key":"e_1_2_2_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/PerCom53586.2022.9762385"},{"key":"e_1_2_2_3_1","first-page":"4","volume-title":"Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 6","author":"Alharbi Rawan","year":"2023","unstructured":"Rawan Alharbi, Soroush Shahi, Stefany Cruz, Lingfeng Li, Sougata Sen, Mahdi Pedram, Christopher Romano, Josiah Hester, Aggelos K Katsaggelos, and Nabil Alshurafa. 2023. Smokemon: unobtrusive extraction of smoking topography using wearable energy-efficient thermal. Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 6, 4 (2023), 1\u201325."},{"key":"e_1_2_2_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3264900"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.226"},{"key":"e_1_2_2_6_1","volume-title":"Hoi-ref: Hand-object interaction referral in egocentric vision. arXiv preprint arXiv:2404.09933","author":"Bansal Siddhant","year":"2024","unstructured":"Siddhant Bansal, Michael Wray, and Dima Damen. 2024. Hoi-ref: Hand-object interaction referral in egocentric vision. arXiv preprint arXiv:2404.09933 (2024)."},{"key":"e_1_2_2_7_1","first-page":"4","article-title":"Is space-time attention all you need for video understanding?","volume":"2","author":"Bertasius Gedas","year":"2021","unstructured":"Gedas Bertasius, Heng Wang, and Lorenzo Torresani. 2021. 
Is space-time attention all you need for video understanding?. In ICML, Vol. 2. 4.","journal-title":"ICML"},{"key":"e_1_2_2_8_1","unstructured":"Rebekah Carter. 2024. Snap Spectacles 5 Review: The Latest Snap AR Glasses. https:\/\/www.xrtoday.com\/augmented-reality\/snap-spectacles-5-review-the-latest-snap-ar-glasses\/. Accessed: 2025-04-28."},{"key":"e_1_2_2_9_1","volume-title":"International conference on machine learning. PmLR, 1597\u20131607","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International conference on machine learning. PmLR, 1597\u20131607."},{"key":"e_1_2_2_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_2_2_11_1","volume-title":"ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images. arXiv preprint arXiv:2403.09871","author":"Ding Fangqiang","year":"2024","unstructured":"Fangqiang Ding, Yunzhou Zhu, Xiangyu Wen, Gaowen Liu, and Chris Xiaoxuan Lu. 2024. ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images. arXiv preprint arXiv:2403.09871 (2024)."},{"key":"e_1_2_2_12_1","volume-title":"Si\u00e2n Lindley, Paul Kelly, and Charlie Foster.","author":"Doherty Aiden R","year":"2013","unstructured":"Aiden R Doherty, Steve E Hodges, Abby C King, Alan F Smeaton, Emma Berry, Chris JA Moulin, Si\u00e2n Lindley, Paul Kelly, and Charlie Foster. 2013. Wearable cameras in health: the state of the art and future possibilities. American journal of preventive medicine 44, 3 (2013), 320\u2013323."},{"key":"e_1_2_2_13_1","unstructured":"Jakob Engel Kiran Somasundaram Michael Goesele Albert Sun Alexander Gamino Andrew Turner Arjang Talattof Arnie Yuan Bilal Souti Brighid Meredith et al. 2023. Project aria: A new tool for egocentric multi-modal ai research. 
arXiv preprint arXiv:2308.13561 (2023)."},{"key":"e_1_2_2_14_1","volume-title":"2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS). IEEE, 41\u201348","author":"Fang Shiwei","year":"2019","unstructured":"Shiwei Fang, Ketan Mayer-Patel, and Shahriar Nirjon. 2019. ZenCam: Context-driven control of autonomous body cameras. In 2019 15th International Conference on Distributed Computing in Sensor Systems (DCOSS). IEEE, 41\u201348."},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3678591"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jand.2014.09.015"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01535"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01842"},{"key":"e_1_2_2_19_1","volume-title":"Lifelogging: Personal big data. Foundations and Trends\u00ae in information retrieval 8, 1","author":"Gurrin Cathal","year":"2014","unstructured":"Cathal Gurrin, Alan F Smeaton, Aiden R Doherty, et al. 2014. Lifelogging: Personal big data. Foundations and Trends\u00ae in information retrieval 8, 1 (2014), 1\u2013125."},{"key":"e_1_2_2_20_1","unstructured":"Grace Harmon. 2024. Ray-Ban Meta Smart Glasses get an on-device assistant with battery life limitations. https:\/\/www.emarketer.com\/content\/ray-ban-meta-smart-glasses-on-device-assistant\u2013with-battery-life-limitations. Accessed: 2025-04-28."},{"key":"e_1_2_2_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2020.3030723"},{"key":"e_1_2_2_22_1","volume-title":"2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 1\u20136.","author":"He Mingzhou","year":"2024","unstructured":"Mingzhou He, Haojie Wang, Shuchang Zhou, Qingbo Wu, King Ngi Ngan, Fanman Meng, and Hongliang Li. 2024. Inertial Strengthened CLIP model for Zero-shot Multimodal Egocentric Activity Recognition. 
In 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 1\u20136."},{"key":"e_1_2_2_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/3025453.3025821"},{"key":"e_1_2_2_24_1","volume-title":"USA","author":"Hodges Steve","year":"2006","unstructured":"Steve Hodges, Lyndsay Williams, Emma Berry, Shahram Izadi, James Srinivasan, Alex Butler, Gavin Smyth, Narinder Kapur, and Ken Wood. 2006. SenseCam: A retrospective memory aid. In UbiComp 2006: Ubiquitous Computing: 8th International Conference, UbiComp 2006 Orange County, CA, USA, September 17-21, 2006 Proceedings 8. Springer, 177\u2013193."},{"key":"e_1_2_2_25_1","volume-title":"SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3. Springer, 84\u201392","author":"Hoffer Elad","year":"2015","unstructured":"Elad Hoffer and Nir Ailon. 2015. Deep metric learning using triplet network. In Similarity-based pattern recognition: third international workshop, SIMBAD 2015, Copenhagen, Denmark, October 12-14, 2015. Proceedings 3. Springer, 84\u201392."},{"key":"e_1_2_2_26_1","first-page":"3","article-title":"Lora: Low-rank adaptation of large language models","volume":"1","author":"Hu Edward J","year":"2022","unstructured":"Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3.","journal-title":"ICLR"},{"key":"e_1_2_2_27_1","first-page":"2","volume-title":"Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 4","author":"Hu Fang","year":"2020","unstructured":"Fang Hu, Peng He, Songlin Xu, Yin Li, and Cheng Zhang. 2020. FingerTrak: Continuous 3D hand pose tracking by deep learning hand silhouettes captured by miniature thermal cameras on wrist. 
Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies 4, 2 (2020), 1\u201324."},{"key":"e_1_2_2_28_1","volume-title":"2015 IEEE International Conference on Image Processing (ICIP). IEEE, 1349\u20131353","author":"Ishihara Tatsuya","year":"2015","unstructured":"Tatsuya Ishihara, Kris M Kitani, Wei-Chiu Ma, Hironobu Takagi, and Chieko Asakawa. 2015. Recognizing hand-object interactions in wearable camera videos. In 2015 IEEE International Conference on Image Processing (ICIP). IEEE, 1349\u20131353."},{"key":"e_1_2_2_29_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 6232\u20136242","author":"Korbar Bruno","year":"2019","unstructured":"Bruno Korbar, Du Tran, and Lorenzo Torresani. 2019. Scsampler: Sampling salient clips from video for efficient action recognition. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 6232\u20136242."},{"key":"e_1_2_2_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2021.3073469"},{"key":"e_1_2_2_31_1","doi-asserted-by":"publisher","DOI":"10.1142\/S219688882250035X"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCVW54120.2021.00210"},{"key":"e_1_2_2_33_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 13894\u201313903","author":"Lin Jintao","year":"2022","unstructured":"Jintao Lin, Haodong Duan, Kai Chen, Dahua Lin, and Limin Wang. 2022. Ocsampler: Compressing videos to one clip with single-step sampling. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 13894\u201313903."},{"key":"e_1_2_2_34_1","volume-title":"Tarveen Ragbir Singh, Christopher Neil, Dinh Phung, and Kylie Ball.","author":"Maddison Ralph","year":"2019","unstructured":"Ralph Maddison, Susie Cartledge, Michelle Rogerson, Nicole Sylvia Goedhart, Tarveen Ragbir Singh, Christopher Neil, Dinh Phung, and Kylie Ball. 2019. 
Usefulness of wearable cameras as a tool to enhance chronic disease self-management: scoping review. JMIR mHealth and uHealth 7, 1 (2019), e10371."},{"key":"e_1_2_2_35_1","volume-title":"Ar-net: Adaptive frame resolution for efficient action recognition. In Computer Vision-ECCV 2020: 16th European Conference","author":"Meng Yue","year":"2020","unstructured":"Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, and Rogerio Feris. 2020. Ar-net: Adaptive frame resolution for efficient action recognition. In Computer Vision-ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part VII 16. Springer, 86\u2013104."},{"key":"e_1_2_2_36_1","unstructured":"Meta Platforms Inc. 2025. Battery life on Ray-Ban Meta AI Glasses. https:\/\/www.meta.com\/help\/ai-glasses\/303057485648146\/. Accessed: 2025-04-28."},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/3379175.3391712"},{"key":"e_1_2_2_38_1","volume-title":"Proceedings of the IEEE international conference on computer vision. 1154\u20131163","author":"Mueller Franziska","year":"2017","unstructured":"Franziska Mueller, Dushyant Mehta, Oleksandr Sotnychenko, Srinath Sridhar, Dan Casas, and Christian Theobalt. 2017. Real-time hand tracking under occlusion from an egocentric rgb-d sensor. In Proceedings of the IEEE international conference on computer vision. 1154\u20131163."},{"key":"e_1_2_2_39_1","volume-title":"Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. 292\u2013305","author":"Naderiparizi Saman","year":"2017","unstructured":"Saman Naderiparizi, Pengyu Zhang, Matthai Philipose, Bodhi Priyantha, Jie Liu, and Deepak Ganesan. 2017. Glimpse: A programmable early-discard camera architecture for continuous mobile vision. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. 
292\u2013305."},{"key":"e_1_2_2_40_1","volume-title":"2018 14th International Conference on Intelligent Environments (IE). IEEE, 91\u201394","author":"Cuc Nguyen Thi Hoa","year":"2018","unstructured":"Thi Hoa Cuc Nguyen, Jean-Christophe Nebel, Gordon Hunter, and Francisco Florez-Revuelta. 2018. Automated detection of hands and objects in egocentric videos, for ambient assisted living applications. In 2018 14th International Conference on Intelligent Environments (IE). IEEE, 91\u201394."},{"key":"e_1_2_2_41_1","doi-asserted-by":"publisher","DOI":"10.1145\/2786567.2794320"},{"key":"e_1_2_2_42_1","volume-title":"Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 12999\u201313008","author":"Ohkawa Takehiko","year":"2023","unstructured":"Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, Luan Tran, and Cem Keskin. 2023. Assemblyhands: Towards egocentric activity understanding via 3d hand pose estimation. In Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition. 12999\u201313008."},{"key":"e_1_2_2_43_1","volume-title":"Egovideo: Exploring egocentric foundation model and downstream adaptation. arXiv preprint arXiv:2406.18070","author":"Pei Baoqi","year":"2024","unstructured":"Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, et al. 2024. Egovideo: Exploring egocentric foundation model and downstream adaptation. arXiv preprint arXiv:2406.18070 (2024)."},{"key":"e_1_2_2_44_1","volume-title":"2012 IEEE conference on computer vision and pattern recognition. IEEE, 2847\u20132854","author":"Pirsiavash Hamed","year":"2012","unstructured":"Hamed Pirsiavash and Deva Ramanan. 2012. Detecting activities of daily living in first-person camera views. In 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2847\u20132854."},{"key":"e_1_2_2_45_1","volume-title":"Proceedings of the IEEE conference on computer vision and pattern recognition. 
5967\u20135976","author":"Possas Rafael","year":"2018","unstructured":"Rafael Possas, Sheila Pinto Caceres, and Fabio Ramos. 2018. Egocentric activity recognition on a budget. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5967\u20135976."},{"key":"e_1_2_2_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.01072"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_2_2_48_1","volume-title":"2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 872\u2013877","author":"Schiboni Giovanni","year":"2018","unstructured":"Giovanni Schiboni, Fabio Wasner, and Oliver Amft. 2018. A privacy-preserving wearable camera setup for dietary event spotting in free-living. In 2018 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops). IEEE, 872\u2013877."},{"key":"e_1_2_2_49_1","volume-title":"2022 IEEE-EMBS International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 1\u20134.","author":"Shahi Soroush","year":"2022","unstructured":"Soroush Shahi, Mahdi Pedram, Glenn Fernandes, and Nabil Alshurafa. 2022. Smartact: energy efficient and real-time hand-to-mouth gesture detection using wearable rgb-t. In 2022 IEEE-EMBS International Conference on Wearable and Implantable Body Sensor Networks (BSN). IEEE, 1\u20134."},{"key":"e_1_2_2_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00989"},{"key":"e_1_2_2_51_1","first-page":"33485","article-title":"Egodistill: Egocentric head motion distillation for efficient video understanding","volume":"36","author":"Tan Shuhan","year":"2023","unstructured":"Shuhan Tan, Tushar Nagarajan, and Kristen Grauman. 2023. Egodistill: Egocentric head motion distillation for efficient video understanding. 
Advances in Neural Information Processing Systems 36 (2023), 33485\u201333498.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_2_52_1","unstructured":"Peng Wang Shuai Bai Sinan Tan Shijie Wang Zhihao Fan Jinze Bai Keqin Chen Xuejing Liu Jialin Wang Wenbin Ge et al. 2024. Qwen2-vl: Enhancing vision-language model's perception of the world at any resolution. arXiv preprint arXiv:2409.12191 (2024)."},{"key":"e_1_2_2_53_1","volume-title":"Physical activity assessments for health-related research. Human Kinetics","author":"Welk GJ","year":"2002","unstructured":"GJ Welk. 2002. Physical activity assessments for health-related research. Human Kinetics (2002)."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00137"},{"key":"e_1_2_2_55_1","volume-title":"Do egocentric video-language models really understand hand-object interactions? arXiv preprint arXiv:2405.17719","author":"Xu Boshen","year":"2024","unstructured":"Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, and Qin Jin. 2024. Egonce++: Do egocentric video-language models really understand hand-object interactions? arXiv preprint arXiv:2405.17719 (2024)."},{"key":"e_1_2_2_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52733.2024.01284"},{"key":"e_1_2_2_57_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 22479\u201322489","author":"Ye Yufei","year":"2023","unstructured":"Yufei Ye, Xueting Li, Abhinav Gupta, Shalini De Mello, Stan Birchfield, Jiaming Song, Shubham Tulsiani, and Sifei Liu. 2023. Affordance diffusion: Synthesizing hand-object interactions. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 22479\u201322489."},{"key":"e_1_2_2_58_1","volume-title":"Mediapipe hands: On-device real-time hand tracking. 
arXiv preprint arXiv:2006.10214","author":"Zhang Fan","year":"2020","unstructured":"Fan Zhang, Valentin Bazarevsky, Andrey Vakunov, Andrei Tkachenka, George Sung, Chuo-Ling Chang, and Matthias Grundmann. 2020. Mediapipe hands: On-device real-time hand tracking. arXiv preprint arXiv:2006.10214 (2020)."},{"key":"e_1_2_2_59_1","volume-title":"European Conference on Computer Vision. Springer, 127\u2013145","author":"Zhang Lingzhi","year":"2022","unstructured":"Lingzhi Zhang, Shenghao Zhou, Simon Stent, and Jianbo Shi. 2022. Fine-grained egocentric hand-object segmentation: Dataset, model, and applications. In European Conference on Computer Vision. Springer, 127\u2013145."},{"key":"e_1_2_2_60_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14625\u201314635","author":"Zhang Yanyi","year":"2021","unstructured":"Yanyi Zhang, Xinyu Li, and Ivan Marsic. 2021. Multi-label activity recognition using activity-specific features and activity correlations. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 14625\u201314635."},{"key":"e_1_2_2_61_1","volume-title":"Proceedings of the IEEE\/CVF International conference on Computer Vision. 1513\u20131522","author":"Zhi Yuan","year":"2021","unstructured":"Yuan Zhi, Zhan Tong, Limin Wang, and Gangshan Wu. 2021. Mgsampler: An explainable sampling strategy for video action recognition. In Proceedings of the IEEE\/CVF International conference on Computer Vision. 1513\u20131522."},{"key":"e_1_2_2_62_1","volume-title":"2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). IEEE, 1\u20136.","author":"Zhong Chengzhang","year":"2019","unstructured":"Chengzhang Zhong, Amy R Reibman, Hansel Mina Cordoba, and Amanda J Deering. 2019. Hand-hygiene activity recognition in egocentric video. In 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP). 
IEEE, 1\u20136."}],"container-title":["Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3770695","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T19:49:13Z","timestamp":1764704953000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3770695"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12,2]]},"references-count":62,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,12,2]]}},"alternative-id":["10.1145\/3770695"],"URL":"https:\/\/doi.org\/10.1145\/3770695","relation":{},"ISSN":["2474-9567"],"issn-type":[{"value":"2474-9567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12,2]]},"assertion":[{"value":"2025-12-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}