{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:06:34Z","timestamp":1750309594599,"version":"3.41.0"},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,5,26]],"date-time":"2025-05-26T00:00:00Z","timestamp":1748217600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Comput. Graph. Interact. Tech."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:p>In everyday visual search tasks, humans rely on prior knowledge of object placements in scenes to efficiently locate target objects. This ability is evidenced by eye movement patterns, where individuals focus on areas that are more likely to contain the target, such as searching for a cup on a table or shoes on the floor. Building on this, we propose a new annotation pipeline that leverages these priors by extracting a knowledge graph from images based on automatically annotated objects. This knowledge graph is then used with large language models (LLMs) to predict the most likely locations of a specific target object in an image. 
Our approach is the first instance of using LLMs to identify relevant prior knowledge in images and to bridge the gap between human scene understanding and computational models.<\/jats:p>","DOI":"10.1145\/3729414","type":"journal-article","created":{"date-parts":[[2025,5,27]],"date-time":"2025-05-27T05:51:27Z","timestamp":1748325087000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Framework for Leveraging LLMs for Scene Analysis and Cognitive Processing"],"prefix":"10.1145","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8826-5163","authenticated-orcid":false,"given":"Catarina","family":"Moreira","sequence":"first","affiliation":[{"name":"Data Science Institute, University of Technology Sydney, Sydney, Australia, and INESC-ID, Lisbon, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2004-8653","authenticated-orcid":false,"given":"Jeffrey","family":"Cockburn","sequence":"additional","affiliation":[{"name":"University of Iowa, Iowa City, Iowa, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1097-9721","authenticated-orcid":false,"given":"Monica S.","family":"Castelhano","sequence":"additional","affiliation":[{"name":"Psychology, Queen's University, Kingston, Ontario, Canada"}]}],"member":"320","published-online":{"date-parts":[[2025,5,26]]},"reference":[{"key":"e_1_3_3_2_1","doi-asserted-by":"publisher","DOI":"10.1126\/science.177.4043.77"},{"key":"e_1_3_3_3_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-011-0107-8"},{"key":"e_1_3_3_4_1","doi-asserted-by":"publisher","DOI":"10.1146\/annurev-vision-121219-081745"},{"key":"e_1_3_3_5_1","doi-asserted-by":"publisher","DOI":"10.1177\/0956797616629130"},{"key":"e_1_3_3_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIP.2018.2851672"},{"key":"e_1_3_3_7_1","first-page":"545","volume-title":"Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural 
Information Processing Systems","author":"Harel Jonathan","year":"2006","unstructured":"Jonathan Harel, Christof Koch, and Pietro Perona. 2006. Graph-Based Visual Saliency. In Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Bernhard Schoelkopf, John\u00a0C. Platt, and Thomas Hofmann (Eds.). Vancouver, British Columbia, Canada, 545\u2013552."},{"key":"e_1_3_3_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2003.09.006"},{"key":"e_1_3_3_9_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-023-41463-0"},{"key":"e_1_3_3_10_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0042-6989(99)00163-7"},{"key":"e_1_3_3_11_1","doi-asserted-by":"publisher","DOI":"10.1038\/35058500"},{"key":"e_1_3_3_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2009.5459462"},{"key":"e_1_3_3_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"e_1_3_3_14_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-024-02515-2"},{"key":"e_1_3_3_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588015.3588405"},{"key":"e_1_3_3_16_1","article-title":"DeepGaze II: Reading fixations from deep features trained on object recognition","author":"K\u00fcmmerer Matthias","year":"2016","unstructured":"Matthias K\u00fcmmerer, Thomas\u00a0SA Wallis, and Matthias Bethge. 2016. DeepGaze II: Reading fixations from deep features trained on object recognition. 
arXiv preprint arXiv:1610.01563 (2016).","journal-title":"arXiv preprint arXiv:1610.01563"},{"key":"e_1_3_3_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3588015.3588403"},{"key":"e_1_3_3_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.visres.2006.01.006"},{"key":"e_1_3_3_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejrad.2024.111341"},{"key":"e_1_3_3_20_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2007.09.009"},{"key":"e_1_3_3_21_1","doi-asserted-by":"publisher","DOI":"10.1037\/a0037524"},{"key":"e_1_3_3_22_1","doi-asserted-by":"publisher","DOI":"10.3758\/s13423-019-01610-z"},{"key":"e_1_3_3_23_1","doi-asserted-by":"publisher","DOI":"10.1080\/17470210902816461"},{"key":"e_1_3_3_24_1","doi-asserted-by":"publisher","DOI":"10.1037\/0033-295X.113.4.766"},{"key":"e_1_3_3_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.cognition.2012.09.017"},{"key":"e_1_3_3_26_1","doi-asserted-by":"publisher","DOI":"10.3390\/vision3030033"},{"key":"e_1_3_3_27_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tics.2010.12.001"},{"key":"e_1_3_3_28_1","unstructured":"Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. 2019. Detectron2. 
https:\/\/github.com\/facebookresearch\/detectron2."},{"key":"e_1_3_3_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW.2019.00111"}],"container-title":["Proceedings of the ACM on Computer Graphics and Interactive Techniques"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729414","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:56:57Z","timestamp":1750298217000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729414"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,26]]},"references-count":28,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["10.1145\/3729414"],"URL":"https:\/\/doi.org\/10.1145\/3729414","relation":{},"ISSN":["2577-6193"],"issn-type":[{"type":"electronic","value":"2577-6193"}],"subject":[],"published":{"date-parts":[[2025,5,26]]},"assertion":[{"value":"2025-05-26","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}