{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,2]],"date-time":"2025-08-02T16:42:09Z","timestamp":1754152929310,"version":"3.41.2"},"reference-count":69,"publisher":"Association for Computing Machinery (ACM)","issue":"4","funder":[{"name":"Meta Faculty Award, and used the Dutch national e-infrastructure with the support of the SURF Cooperative","award":["EINF-2391, no. EINF-8964, and no. EINF-9272"],"award-info":[{"award-number":["EINF-2391, no. EINF-8964, and no. EINF-9272"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Sen. Netw."],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>Gaze estimation is of great importance to many scientific fields and daily applications, ranging from fundamental research in cognitive psychology to attention-aware systems. While recent advancements in deep learning have led to highly accurate gaze estimation systems, these solutions often come with high computational costs and depend on large-scale labeled gaze data for supervised learning, posing significant practical challenges. To move beyond these limitations, we present EfficientGaze, a resource-efficient framework for gaze representation learning. We introduce the frequency-domain gaze estimation, which exploits the feature extraction capability and the spectral compaction property of discrete cosine transform to substantially reduce the computational cost of gaze estimation systems for both calibration and inference. Moreover, to overcome the data labeling hurdle, we design a novel multi-task gaze-aware contrastive learning framework to learn gaze representations that are generic across subjects in an unsupervised manner. 
Our evaluation on two gaze estimation datasets demonstrates that EfficientGaze achieves comparable gaze estimation performance to existing supervised learning-based approaches, while enabling up to 6.80 times and 1.67 times speedup in system calibration and gaze estimation, respectively.<\/jats:p>","DOI":"10.1145\/3736767","type":"journal-article","created":{"date-parts":[[2025,5,23]],"date-time":"2025-05-23T07:25:42Z","timestamp":1747985142000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Resource-efficient Gaze Estimation via Frequency-domain Multi-task Contrastive Learning"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5479-7866","authenticated-orcid":false,"given":"Lingyu","family":"Du","sequence":"first","affiliation":[{"name":"Software Technology, Delft University of Technology","place":["Delft, Netherlands"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8368-3542","authenticated-orcid":false,"given":"Xucong","family":"Zhang","sequence":"additional","affiliation":[{"name":"Intelligent Systems, Delft University of Technology","place":["Delft, Netherlands"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2190-9937","authenticated-orcid":false,"given":"Guohao","family":"Lan","sequence":"additional","affiliation":[{"name":"Software Technology, Delft University of Technology","place":["Delft, Netherlands"]}]}],"member":"320","published-online":{"date-parts":[[2025,7,22]]},"reference":[{"doi-asserted-by":"publisher","key":"e_1_3_3_2_2","DOI":"10.1109\/T-C.1974.223784"},{"doi-asserted-by":"publisher","key":"e_1_3_3_3_2","DOI":"10.1145\/3127589"},{"doi-asserted-by":"publisher","key":"e_1_3_3_4_2","DOI":"10.1109\/CVPR52688.2022.00417"},{"unstructured":"Irwan Bello William Fedus Xianzhi Du Ekin Dogus Cubuk Aravind Srinivas Tsung-Yi Lin Jonathon Shlens and Barret Zoph. 2021. Revisiting ResNets: Improved training and scaling strategies. 
Advances in Neural Information Processing Systems 34 (2021) 22617\u201322627.","key":"e_1_3_3_5_2"},{"key":"e_1_3_3_6_2","first-page":"1597","volume-title":"Proceedings of International Conference on Machine Learning (ICML)","author":"Chen Ting","year":"2020","unstructured":"Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In Proceedings of International Conference on Machine Learning (ICML). 1597\u20131607."},{"doi-asserted-by":"publisher","key":"e_1_3_3_7_2","DOI":"10.1109\/TCOM.1984.1096066"},{"doi-asserted-by":"crossref","unstructured":"Yihua Cheng Haofei Wang Yiwei Bao and Feng Lu. 2024. Appearance-Based Gaze Estimation with deep learning: A review and benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 46 12 (2024) 7509\u20137528.","key":"e_1_3_3_8_2","DOI":"10.1109\/TPAMI.2024.3393571"},{"doi-asserted-by":"publisher","key":"e_1_3_3_9_2","DOI":"10.1016\/j.imavis.2008.10.007"},{"unstructured":"Alexey Dosovitskiy Jost Tobias Springenberg Martin Riedmiller and Thomas Brox. 2014. Discriminative unsupervised feature learning with convolutional neural networks. Advances in Neural Information Processing Systems 27 (2014) 766\u2013774.","key":"e_1_3_3_10_2"},{"key":"e_1_3_3_11_2","first-page":"60","volume-title":"Proceedings of the International Conference on Embedded Wireless Systems and Networks (EWSN)","author":"Du Lingyu","year":"2023","unstructured":"Lingyu Du and Guohao Lan. 2023. FreeGaze: Resource-efficient Gaze Estimation via frequency-domain contrastive learning. In Proceedings of the International Conference on Embedded Wireless Systems and Networks (EWSN). 
60\u201371."},{"doi-asserted-by":"publisher","key":"e_1_3_3_12_2","DOI":"10.1109\/ICCV.2019.00358"},{"doi-asserted-by":"publisher","key":"e_1_3_3_13_2","DOI":"10.1109\/ICPR.2016.7900182"},{"doi-asserted-by":"publisher","key":"e_1_3_3_14_2","DOI":"10.1109\/WACV51458.2022.00123"},{"key":"e_1_3_3_15_2","volume-title":"Proceedings of International Conference on Learning Representations (ICLR)","author":"Gidaris Spyros","year":"2018","unstructured":"Spyros Gidaris, Praveer Singh, and Nikos Komodakis. 2018. Unsupervised representation learning by predicting image rotations. In Proceedings of International Conference on Learning Representations (ICLR)."},{"unstructured":"Jean-Bastien Grill Florian Strub Florent Altch\u00e9 Corentin Tallec Pierre Richemond Elena Buchatskaya Carl Doersch Bernardo Avila Pires Zhaohan Guo Mohammad Gheshlaghi Azar Bilal Piot Koray Kavukcuoglu Remi Munos and Michal Valko. 2020. Bootstrap your own latent-a new approach to self-supervised learning. Advances in Neural Information Processing Systems 33 (2020) 21271\u201321284.","key":"e_1_3_3_16_2"},{"unstructured":"Lionel Gueguen Alex Sergeev Ben Kadlec Rosanne Liu and Jason Yosinski. 2018. Faster neural networks straight from JPEG. Advances in Neural Information Processing Systems (2018) 3937\u20133948.","key":"e_1_3_3_17_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_18_2","DOI":"10.1109\/TBME.2005.863952"},{"doi-asserted-by":"publisher","key":"e_1_3_3_19_2","DOI":"10.1016\/j.neubiorev.2014.03.013"},{"doi-asserted-by":"publisher","key":"e_1_3_3_20_2","DOI":"10.1109\/CVPR.2006.100"},{"doi-asserted-by":"publisher","key":"e_1_3_3_21_2","DOI":"10.1109\/TPAMI.2009.30"},{"unstructured":"Yujiao Hao Rong Zheng and Boyu Wang. 2022. Invariant feature learning for sensor-based human activity recognition. 
IEEE Transactions on Mobile Computing 21 11 (2022) 4013\u20134024.","key":"e_1_3_3_22_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_23_2","DOI":"10.1109\/CVPR42600.2020.00975"},{"doi-asserted-by":"publisher","key":"e_1_3_3_24_2","DOI":"10.1109\/CVPR.2015.7299173"},{"doi-asserted-by":"publisher","key":"e_1_3_3_25_2","DOI":"10.1109\/CVPR.2016.90"},{"doi-asserted-by":"publisher","key":"e_1_3_3_26_2","DOI":"10.1145\/3494999"},{"doi-asserted-by":"publisher","key":"e_1_3_3_27_2","DOI":"10.1145\/2638728.2641695"},{"doi-asserted-by":"publisher","key":"e_1_3_3_28_2","DOI":"10.1109\/ICCV.2019.00701"},{"issue":"4","key":"e_1_3_3_29_2","first-page":"1","article-title":"Foveated AR: Dynamically-foveated augmented reality display","volume":"38","author":"Kim Jonghyun","year":"2019","unstructured":"Jonghyun Kim, Youngmo Jeong, Michael Stengel, Kaan Ak\u015fit, Rachel Albert, Ben Boudaoud, Trey Greer, Joohwan Kim, Ward Lopes, Zander Majercik, et\u00a0al. 2019. Foveated AR: Dynamically-foveated augmented reality display. ACM Transactions on Graphics 38, 4 (2019), 1\u201315.","journal-title":"ACM Transactions on Graphics"},{"unstructured":"Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of International Conference on Learning Representations (ICLR).","key":"e_1_3_3_30_2"},{"unstructured":"Diederik P. Kingma and Max Welling. 2014. Auto-encoding variational bayes. 
In Proceedings of International Conference on Learning Representations (ICLR).","key":"e_1_3_3_31_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_32_2","DOI":"10.1109\/CVPR.2016.239"},{"doi-asserted-by":"publisher","key":"e_1_3_3_33_2","DOI":"10.1016\/j.patcog.2021.107848"},{"doi-asserted-by":"publisher","key":"e_1_3_3_34_2","DOI":"10.1145\/3394171.3413548"},{"doi-asserted-by":"publisher","key":"e_1_3_3_35_2","DOI":"10.1007\/978-1-4471-6392-3_3"},{"doi-asserted-by":"publisher","key":"e_1_3_3_36_2","DOI":"10.1007\/s10339-012-0499-z"},{"doi-asserted-by":"publisher","key":"e_1_3_3_37_2","DOI":"10.1109\/CVPR.2016.465"},{"doi-asserted-by":"publisher","key":"e_1_3_3_38_2","DOI":"10.1109\/FG52635.2021.9666792"},{"doi-asserted-by":"publisher","key":"e_1_3_3_39_2","DOI":"10.1145\/3313831.3376799"},{"doi-asserted-by":"publisher","key":"e_1_3_3_40_2","DOI":"10.1007\/978-3-319-46466-4_5"},{"doi-asserted-by":"publisher","key":"e_1_3_3_41_2","DOI":"10.1145\/2980179.2980246"},{"doi-asserted-by":"crossref","unstructured":"Liangying Peng Ling Chen Zhenan Ye and Yi Zhang. 2018. AROMA: A deep multi-task learning based simple and complex human activity recognition method using wearable sensors. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2 2 (2018) 1\u201316.","key":"e_1_3_3_42_2","DOI":"10.1145\/3214277"},{"doi-asserted-by":"publisher","key":"e_1_3_3_43_2","DOI":"10.5555\/573326"},{"doi-asserted-by":"publisher","key":"e_1_3_3_44_2","DOI":"10.1145\/2858036.2858117"},{"unstructured":"Alec Radford Luke Metz and Soumith Chintala. 2016. Unsupervised representation learning with deep convolutional generative adversarial networks. In Proceedings of International Conference on Learning Representations (ICLR).","key":"e_1_3_3_45_2"},{"key":"e_1_3_3_46_2","volume-title":"Discrete Cosine Transform: Algorithms, Advantages, Applications","author":"Rao K Ramamohan","year":"2014","unstructured":"K Ramamohan Rao and Ping Yip. 2014. 
Discrete Cosine Transform: Algorithms, Advantages, Applications. Academic Press."},{"doi-asserted-by":"crossref","unstructured":"Aaqib Saeed Tanir Ozcelebi and Johan Lukkien. 2019. Multi-task self-supervised learning for human activity detection. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 3 2 (2019) 1\u201330.","key":"e_1_3_3_47_2","DOI":"10.1145\/3328932"},{"doi-asserted-by":"publisher","key":"e_1_3_3_48_2","DOI":"10.1002\/npr2.12046"},{"doi-asserted-by":"publisher","key":"e_1_3_3_49_2","DOI":"10.1109\/ICCV48922.2021.00368"},{"doi-asserted-by":"crossref","unstructured":"Nachiappan Valliappan Na Dai Ethan Steinberg Junfeng He Kantwon Rogers Venky Ramachandran Pingmei Xu Mina Shojaeizadeh Li Guo Kai Kohlhoff and Vidhya Navalpakkam. 2020. Accelerating eye movement research via accurate and affordable smartphone eye tracking. Nature Communications 11 1 (2020) 1\u201312.","key":"e_1_3_3_50_2","DOI":"10.1038\/s41467-020-18360-5"},{"doi-asserted-by":"publisher","key":"e_1_3_3_51_2","DOI":"10.1145\/103085.103089"},{"unstructured":"Xiaofang Wang Kris M. Kitani and Martial Hebert. 2016. Contextual visual similarity. arXiv:1612.02534. 
Retrieved from https:\/\/arxiv.org\/abs\/1612.02534","key":"e_1_3_3_52_2"},{"doi-asserted-by":"publisher","key":"e_1_3_3_53_2","DOI":"10.1109\/CVPR52688.2022.01877"},{"doi-asserted-by":"publisher","key":"e_1_3_3_54_2","DOI":"10.1109\/ICCV.2015.428"},{"doi-asserted-by":"publisher","key":"e_1_3_3_55_2","DOI":"10.1145\/2578153.2578185"},{"doi-asserted-by":"publisher","key":"e_1_3_3_56_2","DOI":"10.1109\/CVPR.2018.00393"},{"doi-asserted-by":"publisher","key":"e_1_3_3_57_2","DOI":"10.1109\/ICCV48922.2021.00828"},{"doi-asserted-by":"publisher","key":"e_1_3_3_58_2","DOI":"10.1109\/CVPR42600.2020.00181"},{"doi-asserted-by":"publisher","key":"e_1_3_3_59_2","DOI":"10.1016\/j.artmed.2018.06.005"},{"doi-asserted-by":"publisher","key":"e_1_3_3_60_2","DOI":"10.1145\/3274783.3274840"},{"doi-asserted-by":"publisher","key":"e_1_3_3_61_2","DOI":"10.1109\/CVPR.2019.00637"},{"doi-asserted-by":"publisher","key":"e_1_3_3_62_2","DOI":"10.1109\/CVPR.2019.01221"},{"doi-asserted-by":"publisher","key":"e_1_3_3_63_2","DOI":"10.1109\/CVPR42600.2020.00734"},{"key":"e_1_3_3_64_2","volume-title":"Intelligent Image and Video Compression: Communicating Pictures","author":"Zhang Fan","year":"2021","unstructured":"Fan Zhang and David R Bull. 2021. Intelligent Image and Video Compression: Communicating Pictures. Academic Press."},{"doi-asserted-by":"publisher","key":"e_1_3_3_65_2","DOI":"10.1145\/3173574.3174198"},{"doi-asserted-by":"publisher","key":"e_1_3_3_66_2","DOI":"10.1145\/3173574.3174198"},{"doi-asserted-by":"publisher","key":"e_1_3_3_67_2","DOI":"10.1007\/978-3-030-58558-7_22"},{"doi-asserted-by":"publisher","key":"e_1_3_3_68_2","DOI":"10.1109\/CVPR.2015.7299081"},{"key":"e_1_3_3_69_2","first-page":"51","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR)","author":"Zhang Xucong","year":"2017","unstructured":"Xucong Zhang, Yusuke Sugano, Mario Fritz, and Andreas Bulling. 2017. 
It\u2019s written all over your face: Full-face appearance-based gaze estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR). 51\u201360."},{"doi-asserted-by":"publisher","key":"e_1_3_3_70_2","DOI":"10.1109\/TPAMI.2017.2778103"}],"container-title":["ACM Transactions on Sensor Networks"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3736767","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,22]],"date-time":"2025-07-22T13:35:55Z","timestamp":1753191355000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3736767"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,22]]},"references-count":69,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3736767"],"URL":"https:\/\/doi.org\/10.1145\/3736767","relation":{},"ISSN":["1550-4859","1550-4867"],"issn-type":[{"type":"print","value":"1550-4859"},{"type":"electronic","value":"1550-4867"}],"subject":[],"published":{"date-parts":[[2025,7,22]]},"assertion":[{"value":"2024-07-08","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-06","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}