{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,31]],"date-time":"2026-01-31T10:31:06Z","timestamp":1769855466211,"version":"3.49.0"},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T00:00:00Z","timestamp":1748217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Bundesministerium f\u00fcr Digitales und Verkehr, Germany"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Comput. Graph. Interact. Tech."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:p>Recent advancements in deep learning models for object segmentation have significantly enhanced the analysis of mobile eye-tracking data, enabling automatic dataset annotation. However, many of these models rely solely on raw gaze and fixation points to indicate annotation targets on individual video frames, thereby neglecting the valuable temporal contexts provided by gaze features. In this paper, we introduce a pipeline that integrates temporal gaze features into the SAM 2 model to improve the segmentation accuracy of automatic annotations for mobile eye-tracking data. Specifically, we investigate different types of encoding temporal gaze information, used as prompts, to help the model learn gaze patterns at various levels of object granularity and to account for inaccuracies in gaze location introduced by mobile scenarios. Our experiments show that adding temporal context largely benefits the segmentation performance. In addition, our proposed fusion strategy allows us to combine different types of prompts and shows promising results.<\/jats:p>","DOI":"10.1145\/3729412","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T05:51:27Z","timestamp":1748325087000},"page":"1-17","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Leveraging Temporal Gaze Patterns for Intelligent Segmentation in Mobile Eye-Tracking"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-6635-3411","authenticated-orcid":false,"given":"Shupeng","family":"Wang","sequence":"first","affiliation":[{"name":"Institute of Cartography and Geoinformation, ETH Zurich, Zurich, ZH, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1637-0092","authenticated-orcid":false,"given":"Yizi","family":"Chen","sequence":"additional","affiliation":[{"name":"Institute of Cartography and Geoinformation, ETH Zurich, Zurich, ZH, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1669-6690","authenticated-orcid":false,"given":"Sidi","family":"Wu","sequence":"additional","affiliation":[{"name":"Institute of Cartography and Geoinformation, ETH Zurich, Zurich, ZH, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4457-0438","authenticated-orcid":false,"given":"Peter","family":"Kiefer","sequence":"additional","affiliation":[{"name":"Institute of Cartography and Geoinformation, ETH Zurich, Zurich, ZH, Switzerland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5951-6835","authenticated-orcid":false,"given":"Martin","family":"Raubal","sequence":"additional","affiliation":[{"name":"Institute of Cartography and Geoinformation, ETH Zurich, Zurich, ZH, 
Switzerland"}]}],"member":"320","published-online":{"date-parts":[[2025,5,26]]},"reference":[{"key":"e_1_3_2_2_1","doi-asserted-by":"publisher","DOI":"10.3390\/s24092666"},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450341.3458766"},{"key":"e_1_3_2_4_1","first-page":"21","volume-title":"Proceedings of The 2nd Gaze Meets ML workshop","author":"Beckmann Daniel","year":"2024","unstructured":"Daniel Beckmann, Jacqueline Kockwelp, Joerg Gromoll, Friedemann Kiefer, and Benjamin Risse. 2024. SAM meets Gaze: Passive Eye Tracking for Prompt-based Instance Segmentation. In Proceedings of The 2nd Gaze Meets ML workshop, Vol.\u00a0226. 21\u201339. https:\/\/proceedings.mlr.press\/v226\/beckmann24a.html"},{"key":"e_1_3_2_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2019.2929257"},{"key":"e_1_3_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.00135"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13428-022-01833-4"},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cag.2018.04.002"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13428-024-02473-6"},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0169-8141(98)00068-7"},{"key":"e_1_3_2_11_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compmedimag.2017.04.006"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.322"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13428-022-02010-3"},{"key":"e_1_3_2_14_1","doi-asserted-by":"publisher","DOI":"10.16910\/jemr.10.1.3"},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1080\/13875868.2016.1254634"},{"key":"e_1_3_2_16_1","doi-asserted-by":"publisher","unstructured":"Alexander Kirillov Eric Mintun Nikhila Ravi Hanzi Mao Chloe Rolland Laura Gustafson Tete Xiao Spencer Whitehead Alexander\u00a0C. Berg Wan-Yen Lo Piotr Doll\u00e1r and Ross Girshick. 2023. Segment Anything. (2023). doi: 10.48550\/arXiv.2304.02643 arXiv:https:\/\/arXiv.org\/abs\/2304.02643","DOI":"10.48550\/arXiv.2304.02643"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1093\/acprof:oso\/9780198570943.001.0001"},{"key":"e_1_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4471-6392-33"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1145\/3654704"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3450341.3458886"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/3655600"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2857491.2857541"},{"key":"e_1_3_2_23_1","doi-asserted-by":"publisher","unstructured":"Nikhila Ravi Valentin Gabeur Yuan-Ting Hu Ronghang Hu Chaitanya Ryali Tengyu Ma Haitham Khedr Roman R\u00e4dle Chloe Rolland Laura Gustafson Eric Mintun Junting Pan Kalyan\u00a0Vasudev Alwala Nicolas Carion Chao-Yuan Wu Ross Girshick Piotr Doll\u00e1r and Christoph Feichtenhofer. 2024. SAM 2: Segment Anything in Images and Videos. (2024). 
doi: 10.48550\/arXiv.2304.02643 arXiv:https:\/\/arXiv.org\/abs\/2408.00714","DOI":"10.48550\/arXiv.2304.02643"},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.91"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00075"},{"key":"e_1_3_2_26_1","first-page":"29441","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Ryali Chaitanya","year":"2023","unstructured":"Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, and Christoph Feichtenhofer. 2023. Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles. In Proceedings of the 40th International Conference on Machine Learning, Vol.\u00a0202. 29441\u201329454. https:\/\/proceedings.mlr.press\/v202\/ryali23a.html"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1145\/3649902.3653335"},{"key":"e_1_3_2_28_1","article-title":"Controller app","year":"2024","unstructured":"Tobii. 2024a. Controller app. Online. https:\/\/www.tobii.com\/products\/eye-trackers\/wearables\/tobii-pro-glasses-3\/controller-app Last accessed 30 October 2024.","journal-title":"Online"},{"key":"e_1_3_2_29_1","article-title":"Technical specifications","year":"2024","unstructured":"Tobii. 2024b. Technical specifications. Online. https:\/\/www.tobii.com\/products\/eye-trackers\/wearables\/tobii-pro-glasses-3#specifications Last accessed 30 October 2024.","journal-title":"Online"},{"key":"e_1_3_2_30_1","article-title":"Tobii Pro Lab","year":"2024","unstructured":"Tobii. 2024c. Tobii Pro Lab. Online. https:\/\/www.tobii.com\/products\/software\/behavior-research-software\/tobii-pro-lab Last accessed 30 October 2024.","journal-title":"Online"},{"key":"e_1_3_2_31_1","doi-asserted-by":"publisher","unstructured":"Bin Wang Armstrong Aboah Zheyuan Zhang and Ulas Bagci. 2023a. GazeSAM: What You See is What You Segment. (2023). doi: 10.48550\/arXiv.2304.13844 arXiv:https:\/\/arXiv.org\/abs\/2304.13844","DOI":"10.48550\/arXiv.2304.13844"},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588015.3589840"},{"key":"e_1_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589132.3625572"}],"container-title":["Proceedings of the ACM on Computer Graphics and Interactive Techniques"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729412","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:56:57Z","timestamp":1750298217000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729412"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,26]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.1145\/3729412"],"URL":"https:\/\/doi.org\/10.1145\/3729412","relation":{},"ISSN":["2577-6193"],"issn-type":[{"value":"2577-6193","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,26]]},"assertion":[{"value":"2025-05-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
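The abstract above reports encoding temporal gaze information as prompts for SAM 2. The snippet below is a minimal sketch of that general idea, not the authors' pipeline: it aggregates a short temporal window of mobile gaze samples into point prompts in the (point_coords, point_labels) format that SAM-style point-promptable predictors accept. The function name, the window length, and the dispersion threshold are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only: temporal gaze window -> point prompts.
# All names and thresholds here are assumptions for illustration.
import numpy as np

def gaze_window_to_prompts(gaze_xy, timestamps, t_frame, window_s=0.5,
                           dispersion_px=40.0):
    """Aggregate gaze samples in a window ending at t_frame into prompt points.

    gaze_xy    : (N, 2) array of gaze positions in image pixels.
    timestamps : (N,) array of sample times in seconds.
    Returns (point_coords, point_labels) as (K, 2) and (K,) arrays, the
    format used by point-promptable segmenters such as SAM / SAM 2.
    """
    mask = (timestamps >= t_frame - window_s) & (timestamps <= t_frame)
    pts = gaze_xy[mask]
    if len(pts) == 0:
        return np.empty((0, 2)), np.empty((0,), dtype=int)

    centroid = pts.mean(axis=0)
    # Low dispersion suggests a fixation on the target object: collapse the
    # window to its centroid as a single confident positive prompt. High
    # dispersion (smooth pursuit, head motion, or tracker noise in mobile
    # settings) keeps several spread-out samples instead, so no single
    # noisy point dominates.
    dispersion = np.linalg.norm(pts - centroid, axis=1).mean()
    if dispersion < dispersion_px:
        coords = centroid[None, :]
    else:
        coords = pts[:: max(1, len(pts) // 5)]  # a few spread-out samples
    labels = np.ones(len(coords), dtype=int)    # 1 = foreground point
    return coords, labels

# Example with synthetic data: 60 Hz gaze, frame at t = 1.0 s.
rng = np.random.default_rng(0)
ts = np.arange(0, 1.0, 1 / 60)
xy = np.array([640.0, 360.0]) + rng.normal(0, 8, size=(len(ts), 2))
coords, labels = gaze_window_to_prompts(xy, ts, t_frame=1.0)
print(coords, labels)  # one averaged fixation point, label 1
```

Feeding a whole temporal window rather than a single frame's raw gaze point is the gist of the temporal-context idea the abstract describes; the dispersion branch is one plausible way to hedge against the gaze-location inaccuracies the abstract attributes to mobile scenarios.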