{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:37:20Z","timestamp":1775230640958,"version":"3.50.1"},"reference-count":119,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,3,21]],"date-time":"2023-03-21T00:00:00Z","timestamp":1679356800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:p>Situational context is crucial for linguistic reference to visible objects, since the same description can refer unambiguously to an object in one context but be ambiguous or misleading in others. This also applies to Referring Expression Generation (<jats:italic>REG<\/jats:italic>), where the production of identifying descriptions is always dependent on a given context. Research in REG has long represented visual domains through <jats:italic>symbolic<\/jats:italic> information about objects and their properties, to determine identifying sets of target features during content determination. In recent years, research in <jats:italic>visual REG<\/jats:italic> has turned to neural modeling and recasted the REG task as an inherently multimodal problem, looking at more natural settings such as generating descriptions for objects in photographs. Characterizing the precise ways in which context influences generation is challenging in both paradigms, as context is notoriously lacking precise definitions and categorization. In multimodal settings, however, these problems are further exacerbated by the increased complexity and low-level representation of perceptual inputs. 
The main goal of this article is to provide a systematic review of the types and functions of visual context across various approaches to REG so far and to argue for integrating and extending different perspectives on visual context that currently co-exist in research on REG. By analyzing the ways in which symbolic REG integrates context in rule-based approaches, we derive a set of categories of contextual integration, including the distinction between <jats:italic>positive<\/jats:italic> and <jats:italic>negative semantic forces<\/jats:italic> exerted by context during reference generation. Using this as a framework, we show that so far existing work in visual REG has considered only some of the ways in which visual context can facilitate end-to-end reference generation. Connecting with preceding research in related areas, as possible directions for future research, we highlight some additional ways in which contextual integration can be incorporated into REG and other multimodal generation tasks.<\/jats:p>","DOI":"10.3389\/frai.2023.1067125","type":"journal-article","created":{"date-parts":[[2023,3,21]],"date-time":"2023-03-21T05:55:41Z","timestamp":1679378141000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Rethinking symbolic and visual context in Referring Expression 
Generation"],"prefix":"10.3389","volume":"6","author":[{"given":"Simeon","family":"Sch\u00fcz","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Albert","family":"Gatt","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sina","family":"Zarrie\u00df","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2023,3,21]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1146\/annurev.neuro.25.112701.142900","article-title":"Contextual influences on visual processing","volume":"25","author":"Albright","year":"2002","journal-title":"Annu. Rev. Neurosci"},{"key":"B2","first-page":"640","article-title":"\u201cA computational model of referring,\u201d","volume-title":"Proceedings of the 10th International Joint Conference on Artificial Intelligence - Volume 2","author":"Appelt","year":"1987"},{"key":"B3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/0004-3702(85)90011-6","article-title":"Planning english referring expressions","volume":"26","author":"Appelt","year":"1985","journal-title":"Artif. Intell"},{"key":"B4","first-page":"42","article-title":"\u201cReferring expressions as formulas of description logic,\u201d","volume-title":"Proceedings of the Fifth International Natural Language Generation Conference","author":"Areces","year":"2008"},{"key":"B5","article-title":"\u201cNeural machine translation by jointly learning to align and translate,\u201d","volume-title":"3rd International Conference on Learning Representations, ICLR 2015","author":"Bahdanau","year":"2015"},{"key":"B6","doi-asserted-by":"publisher","first-page":"103","DOI":"10.3389\/fpsyg.2016.00103","article-title":"Talking about relations: factors influencing the production of relational descriptions","volume":"7","author":"Baltaretu","year":"2016","journal-title":"Front. 
Psychol"},{"key":"B7","doi-asserted-by":"publisher","first-page":"617","DOI":"10.1038\/nrn1476","article-title":"Visual objects in context","volume":"5","author":"Bar","year":"2004","journal-title":"Nat. Rev. Neurosci"},{"key":"B8","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-642-15573-4_15","article-title":"\u201cGenerating referring expressions in context: the grec task evaluation challenges,\u201d","volume-title":"Empirical Methods in Natural Language Generation. EACL ENLG 2009 2009. Lecture Notes in Computer Science, Vol. 5790","author":"Belz","year":"2010"},{"key":"B9","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1126\/science.177.4043.77","article-title":"Perceiving real-world scenes","volume":"177","author":"Biederman","year":"1972","journal-title":"Science"},{"key":"B10","doi-asserted-by":"publisher","first-page":"14","DOI":"10.1037\/h0041727","article-title":"How shall a thing be called?","volume":"65","author":"Brown","year":"1958","journal-title":"Psychol. Rev"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.48550\/ARXIV.2109.07301","article-title":"What vision-language models \u2018see' when they see scenes","author":"Cafagna","year":"2021","journal-title":"[Pre-Print]."},{"key":"B12","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/FUZZ45933.2021.9494544","article-title":"\u201cReferring expression generation from images via deep learning object extraction and fuzzy graphs,\u201d","volume-title":"2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)","author":"Chamorro-Mart\u00ednez","year":"2021"},{"key":"B13","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1006\/cogp.1998.0681","article-title":"Contextual cueing: implicit learning and memory of visual context guides spatial attention","volume":"36","author":"Chun","year":"1998","journal-title":"Cogn. 
Psychol"},{"key":"B14","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511620539","volume-title":"Using Language","author":"Clark","year":"1996"},{"key":"B15","doi-asserted-by":"publisher","first-page":"329","DOI":"10.3389\/fpsyg.2013.00329","article-title":"Where's wally: the influence of visual salience on referring expression generation","volume":"4","author":"Clarke","year":"","journal-title":"Front. Psychol"},{"key":"B16","doi-asserted-by":"publisher","first-page":"927","DOI":"10.3389\/fpsyg.2013.00927","article-title":"The impact of attentional, linguistic, and visual features during object naming","volume":"4","author":"Clarke","year":"","journal-title":"Front. Psychol"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1793","DOI":"10.3389\/fpsyg.2015.01793","article-title":"Giving good directions: Order of mention reflects visual salience","volume":"6","author":"Clarke","year":"2015","journal-title":"Front. Psychol"},{"key":"B18","doi-asserted-by":"crossref","first-page":"68","DOI":"10.3115\/981623.981632","article-title":"\u201cCooking up referring expressions,\u201d","volume-title":"27th Annual Meeting of the Association for Computational Linguistics","author":"Dale","year":"1989"},{"key":"B19","volume-title":"Generating Referring Expressions: Constructing Descriptions in a Domain of Objects and Processes","author":"Dale","year":"1992"},{"key":"B20","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1111\/j.1467-8640.1991.tb00399.x","article-title":"Content determination in the generation of referring expressions","volume":"7","author":"Dale","year":"","journal-title":"Comput. 
Intell"},{"key":"B21","doi-asserted-by":"crossref","DOI":"10.3115\/977180.977208","article-title":"\u201cGenerating referring expressions involving relations,\u201d","volume-title":"Fifth Conference of the European Chapter of the Association for Computational Linguistics","author":"Dale","year":""},{"key":"B22","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1207\/s15516709cog1902_3","article-title":"Computational interpretations of the gricean maxims in the generation of referring expressions","volume":"19","author":"Dale","year":"1995","journal-title":"Cogn. Sci"},{"key":"B23","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2009.5206532","article-title":"\u201cAn empirical study of context in object detection,\u201d","volume-title":"2009 IEEE Conference on Computer Vision and Pattern Recognition","author":"Divvala","year":"2009"},{"key":"B24","article-title":"\u201cToward human-like object naming in artificial neural systems,\u201d","author":"Eisape","year":"2020","journal-title":"International Conference on Learning Representations (ICLR 2020), Bridging AI and Cognitive Science Workshop"},{"key":"B25","first-page":"1544","article-title":"\u201cCollaborative models for referring expression generation in situated dialogue,\u201d","volume-title":"Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence","author":"Fang","year":"2014"},{"key":"B26","doi-asserted-by":"crossref","DOI":"10.1145\/2696454.2696467","article-title":"\u201cEmbodied collaborative referring expression generation in situated human-robot interaction,\u201d","volume-title":"Proceedings of the Tenth Annual ACM\/IEEE International Conference on Human-Robot Interaction","author":"Fang","year":"2015"},{"key":"B27","first-page":"392","article-title":"\u201cTowards situated dialogue: revisiting referring expression generation,\u201d","volume-title":"Proceedings of the 2013 Conference on Empirical Methods in Natural Language 
Processing","author":"Fang","year":"2013"},{"key":"B28","first-page":"55","article-title":"\u201cScenes-and-frames semantics,\u201d","volume-title":"Linguistic Structures Processing","author":"Fillmore","year":"1977"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1700","DOI":"10.1080\/17470210903490969","article-title":"The use of visual context during the production of referring expressions","volume":"63","author":"Fukumura","year":"2010","journal-title":"Q. J. Exp. Psychol"},{"key":"B30","doi-asserted-by":"publisher","first-page":"712","DOI":"10.1016\/j.cviu.2010.02.004","article-title":"Context based object categorization: a critical survey","volume":"114","author":"Galleguillos","year":"2010","journal-title":"Comput. Vis. Image Understand"},{"key":"B31","first-page":"96","article-title":"\u201cGenerating minimal definite descriptions,\u201d","volume-title":"Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics","author":"Gardent","year":"2002"},{"key":"B32","doi-asserted-by":"crossref","first-page":"255","DOI":"10.3115\/1273073.1273106","article-title":"\u201cConceptual coherence in the generation of referring expressions,\u201d","volume-title":"Proceedings of the COLING\/ACL 2006 Main Conference Poster Sessions","author":"Gatt","year":"2006"},{"key":"B33","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1007\/s10849-007-9047-0","article-title":"Lexical choice and conceptual perspective in the generation of plural referring expressions","volume":"16","author":"Gatt","year":"2007","journal-title":"J. Logic Lang. Inf"},{"key":"B34","first-page":"153","article-title":"\u201cKnowing when to look for what and where: evaluating generation of spatial descriptions with adaptive attention,\u201d","volume-title":"Lecture Notes in Computer Science","author":"Ghanimifard","year":"2019"},{"key":"B35","first-page":"2261","article-title":"\u201cAnimal, dog, or dalmatian? 
level of abstraction in nominal referring expressions,\u201d","volume-title":"Proceedings of the 38th Annual Conference of the Cognitive Science Society","author":"Graf","year":"2016"},{"key":"B36","doi-asserted-by":"publisher","first-page":"777","DOI":"10.3389\/fpsyg.2013.00777","article-title":"Statistics of high-level scene context","volume":"4","author":"Greene","year":"2013","journal-title":"Front. Psychol"},{"key":"B37","first-page":"41","article-title":"\u201cLogic and conversation,\u201d","volume-title":"Syntax and Semantics: Vol. 3: Speech Acts","author":"Grice","year":"1975"},{"key":"B38","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1016\/j.cognition.2018.02.011","article-title":"Encoding of event roles from visual scenes is rapid, spontaneous, and interacts with higher-level visual processing","volume":"175","author":"Hafri","year":"2018","journal-title":"Cognition"},{"key":"B39","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1016\/0167-2789(90)90087-6","article-title":"The symbol grounding problem","volume":"42","author":"Harnad","year":"1990","journal-title":"Physica D"},{"key":"B40","doi-asserted-by":"publisher","first-page":"641","DOI":"10.3758\/s13423-020-01823-7","article-title":"Perspective determines the production and interpretation of pointing gestures","volume":"28","author":"Herbort","year":"2021","journal-title":"Psychonomic Bull. 
Rev"},{"key":"B41","doi-asserted-by":"crossref","first-page":"70","DOI":"10.1007\/978-3-540-27823-8_8","article-title":"\u201cOn referring to sets of objects naturally,\u201d","volume-title":"Natural Language Generation","author":"Horacek","year":"2004"},{"key":"B42","article-title":"\u201cGenerating referential descriptions under conditions of uncertainty,\u201d","volume-title":"Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)","author":"Horacek","year":"2005"},{"key":"B43","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3765\/sp.11.10","article-title":"A formal semantics for situated conversation","volume":"11","author":"Hunter","year":"2018","journal-title":"Semant Pragmat"},{"key":"B44","first-page":"250","article-title":"\u201cInfluences on attribute selection in redescriptions: a corpus study,\u201d","volume-title":"Proceedings of the Twenty-Second Annual Conference of the Cognitive Science Society","author":"Jordan","year":"2000"},{"key":"B45","doi-asserted-by":"crossref","first-page":"787","DOI":"10.3115\/v1\/D14-1086","article-title":"\u201cReferItGame: Referring to objects in photographs of natural scenes,\u201d","volume-title":"Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Kazemzadeh","year":"2014"},{"key":"B46","first-page":"1041","article-title":"\u201cIncremental generation of spatial referring expressions in situated dialog,\u201d","volume-title":"Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics","author":"Kelleher","year":"2006"},{"key":"B47","doi-asserted-by":"crossref","first-page":"1952","DOI":"10.18653\/v1\/2020.coling-main.177","article-title":"\u201cCoNAN: a complementary neighboring-based attention network for referring expression generation,\u201d","volume-title":"Proceedings of the 28th International Conference on Computational 
Linguistics","author":"Kim","year":"2020"},{"key":"B48","doi-asserted-by":"publisher","first-page":"2247","DOI":"10.3389\/fpsyg.2019.02247","article-title":"On visually-grounded reference production: testing the effects of perceptual grouping and 2d\/3d presentation mode","volume":"10","author":"Koolen","year":"2019","journal-title":"Front. Psychol"},{"key":"B49","doi-asserted-by":"publisher","first-page":"3231","DOI":"10.1016\/j.pragma.2011.06.008","article-title":"Factors causing overspecification in definite descriptions","volume":"43","author":"Koolen","year":"2011","journal-title":"J. Pragmat"},{"key":"B50","doi-asserted-by":"publisher","first-page":"1617","DOI":"10.1111\/cogs.12297","article-title":"How distractor objects trigger referential overspecification: testing the effects of visual clutter and distractor distance","volume":"40","author":"Koolen","year":"2015","journal-title":"Cogn. Sci"},{"key":"B51","first-page":"223","article-title":"\u201cEfficient context-sensitive generation of referring expressions,\u201d","volume-title":"Number 143 in Lecture Notes","author":"Krahmer","year":"2002"},{"key":"B52","doi-asserted-by":"publisher","first-page":"173","DOI":"10.1162\/COLI_a_00088","article-title":"Computational generation of referring expressions: a survey","volume":"38","author":"Krahmer","year":"2012","journal-title":"Comput. 
Linguist"},{"key":"B53","article-title":"\u201cComputational generation of referring expressions: an updated survey,\u201d","author":"Krahmer","year":"2019","journal-title":"The Oxford Handbook of Reference"},{"key":"B54","article-title":"\u201cA new model for generating multimodal referring expressions,\u201d","volume-title":"Proceedings of the 9th European Workshop on Natural Language Generation (ENLG-2003) at EACL 2003","author":"Krahmer","year":"2003"},{"key":"B55","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1162\/089120103321337430","article-title":"Graph-based generation of referring expressions","volume":"29","author":"Krahmer","year":"2003","journal-title":"Computat. Linguist"},{"key":"B56","first-page":"155","article-title":"\u201cDeictic object reference in task-oriented dialogue,\u201d","volume-title":"Situated Communication, number 166 in Trends in Linguistics. Studies and Monographs [TiLSM","author":"Kranstedt","year":"2006"},{"key":"B57","article-title":"\u201cIncremental generation of multimodal deixis referring to objects,\u201d","volume-title":"Proceedings of the Tenth European Workshop on Natural Language Generation (ENLG-05)","author":"Kranstedt","year":"2005"},{"key":"B58","doi-asserted-by":"crossref","first-page":"6867","DOI":"10.1109\/CVPR.2018.00718","article-title":"\u201cReferring relationships,\u201d","volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Krishna","year":"2018"},{"key":"B59","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1007\/s11263-016-0981-7","article-title":"Visual genome: connecting language and vision using crowdsourced dense image annotations","volume":"123","author":"Krishna","year":"2017","journal-title":"Int. J. Comput. 
Vis"},{"key":"B60","doi-asserted-by":"crossref","first-page":"60","DOI":"10.3115\/981623.981631","article-title":"\u201cConversationally relevant descriptions,\u201d","volume-title":"27th Annual Meeting of the Association for Computational Linguistics","author":"Kronfeld","year":"1989"},{"key":"B61","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1007\/978-3-030-60457-8_3","article-title":"\u201cReferring expression generation via visual dialogue,\u201d","volume-title":"Natural Language Processing and Chinese Computing","author":"Li","year":"2020"},{"key":"B62","doi-asserted-by":"publisher","first-page":"2749","DOI":"10.1109\/TMM.2018.2811621","article-title":"Bundled object context for referring expressions","volume":"20","author":"Li","year":"2018","journal-title":"IEEE Trans. Multimedia"},{"key":"B63","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.2017.520","article-title":"\u201cReferring expression generation and comprehension via attributes,\u201d","volume-title":"2017 IEEE International Conference on Computer Vision (ICCV)","author":"Liu","year":"2017"},{"key":"B64","doi-asserted-by":"publisher","first-page":"5244","DOI":"10.1109\/TIP.2020.2979010","article-title":"Attribute-guided attention for referring expression generation and comprehension","volume":"29","author":"Liu","year":"2020","journal-title":"IEEE Trans. Image Process"},{"key":"B65","doi-asserted-by":"publisher","first-page":"261","DOI":"10.1007\/s11263-019-01247-4","article-title":"Deep learning for generic object detection: a survey","volume":"128","author":"Liu","year":"2019","journal-title":"Int. J. Comput. 
Vis"},{"key":"B66","doi-asserted-by":"crossref","first-page":"3125","DOI":"10.1109\/CVPR.2017.333","article-title":"\u201cComprehension-guided referring expressions,\u201d","volume-title":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Luo","year":"2017"},{"key":"B67","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1109\/CVPR.2016.9","article-title":"\u201cGeneration and comprehension of unambiguous object descriptions,\u201d","volume-title":"2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Mao","year":"2016"},{"key":"B68","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1075\/la.196.04mei","article-title":"\u201cWhat is a context? theoretical and empirical evidence,\u201d","volume-title":"What is a Context? Linguistic Approaches and Challenges","author":"Meibauer","year":"2012"},{"key":"B69","first-page":"1174","article-title":"\u201cGenerating expressions that refer to visible objects,\u201d","volume-title":"Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Mitchell","year":"2013"},{"key":"B70","doi-asserted-by":"publisher","first-page":"1183","DOI":"10.1613\/jair.1.11688","article-title":"Trends in integration of vision and language research: a survey of tasks, datasets, and methods","volume":"71","author":"Mogadala","year":"2021","journal-title":"J. Artif. Intell. 
Res"},{"key":"B71","doi-asserted-by":"crossref","first-page":"792","DOI":"10.1007\/978-3-319-46493-0_48","article-title":"\u201cModeling context between objects for referring expression understanding,\u201d","volume-title":"Computer Vision\u2013ECCV 2016","author":"Nagaraja","year":"2016"},{"key":"B72","first-page":"23","article-title":"\u201cChapter 2 building the gist of a scene: the role of global image features in recognition,\u201d","volume-title":"Progress in Brain Research","author":"Oliva","year":"2006"},{"key":"B73","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1016\/j.tics.2007.09.009","article-title":"The role of context in object recognition","volume":"11","author":"Oliva","year":"2007","journal-title":"Trends Cogn. Sci"},{"key":"B74","doi-asserted-by":"crossref","DOI":"10.1109\/ICIP.2003.1246946","article-title":"\u201cTop-down control of visual attention in object detection,\u201d","volume-title":"Proceedings 2003 International Conference on Image Processing","author":"Oliva","year":"2003"},{"key":"B75","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1145\/2885252","article-title":"Learning to name objects","volume":"59","author":"Ordonez","year":"2016","journal-title":"Commun. ACM"},{"key":"B76","doi-asserted-by":"publisher","first-page":"519","DOI":"10.3758\/BF03197524","article-title":"The effects of contextual scenes on the identification of objects","volume":"3","author":"Palmer","year":"1975","journal-title":"Mem. 
Cogn"},{"key":"B77","doi-asserted-by":"crossref","first-page":"41","DOI":"10.18653\/v1\/2020.inlg-1.7","article-title":"\u201cImproving the naturalness and diversity of referring expression generation models using minimum risk training,\u201d","volume-title":"Proceedings of the 13th International Conference on Natural Language Generation","author":"Panagiaris","year":"2020"},{"key":"B78","doi-asserted-by":"publisher","first-page":"101184","DOI":"10.1016\/j.csl.2020.101184","article-title":"Generating unambiguous and diverse referring expressions","volume":"68","author":"Panagiaris","year":"2021","journal-title":"Comput. Speech Lang"},{"key":"B79","first-page":"55","article-title":"\u201cOverspecified reference in hierarchical domains: measuring the benefits for readers,\u201d","volume-title":"Proceedings of the Fourth International Natural Language Generation Conference","author":"Paraboni","year":"2006"},{"key":"B80","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1515\/ling.1989.27.1.89","article-title":"Incremental speech production and referential overspecification","volume":"27","author":"Pechmann","year":"1989","journal-title":"Linguistics"},{"key":"B81","doi-asserted-by":"publisher","first-page":"2056","DOI":"10.1037\/a0037524","article-title":"Peripheral guidance in scenes: the interaction of scene context and object content","volume":"40","author":"Pereira","year":"2014","journal-title":"J. Exp. Psychol. Hum. Percept. 
Perform"},{"key":"B82","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1145\/1647314.1647351","article-title":"\u201cSalience in the generation of multimodal referring acts,\u201d","volume-title":"Proceedings of the 2009 International Conference on Multimodal Interfaces, ICMI-MLMI '09","author":"Piwek","year":"2009"},{"key":"B83","volume-title":"Object Naming in Visual Search Tasks","author":"Pontillo","year":"2017"},{"key":"B84","first-page":"1","article-title":"\u201cEvery object tells a story,\u201d","volume-title":"Proceedings of the Workshop Events and Stories in the News 2018","author":"Pustejovsky","year":"2018"},{"key":"B85","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.2007.4408986","article-title":"\u201cObjects in context,\u201d","volume-title":"2007 IEEE 11th International Conference on Computer Vision","author":"Rabinovich","year":"2007"},{"key":"B86","doi-asserted-by":"crossref","first-page":"97","DOI":"10.3115\/981823.981836","article-title":"\u201cThe computational complexity of avoiding conversational implicatures,\u201d","volume-title":"28th Annual Meeting of the Association for Computational Linguistics","author":"Reiter","year":"1990"},{"key":"B87","doi-asserted-by":"crossref","unstructured":"\u201cA fast algorithm for the generation of referring expressions,\u201d Reiter E., Dale R. COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics, 1992","DOI":"10.3115\/992066.992105"},{"key":"B88","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511519857","volume-title":"Building Natural Language Generation Systems","author":"Reiter","year":"2000"},{"key":"B89","doi-asserted-by":"publisher","first-page":"495","DOI":"10.1006\/cogp.1998.0712","article-title":"Food for thought: cross-classification and category organization in a complex real-world domain","volume":"38","author":"Ross","year":"1999","journal-title":"Cogn. 
Psychol"},{"key":"B90","first-page":"47","article-title":"\u201cDecoupling pragmatics: discriminative decoding for referring expression generation,\u201d","volume-title":"Proceedings of the Reasoning and Interaction Conference (ReInAct 2021)","author":"Sch\u00fcz","year":"2021"},{"key":"B91","first-page":"5792","article-title":"\u201cObject naming in language and vision: a survey and a new dataset,\u201d","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Silberer","year":""},{"key":"B92","first-page":"1893","article-title":"\u201cHumans meet models on object naming: a new dataset and analysis,\u201d","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics","author":"Silberer","year":""},{"key":"B93","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1075\/aicr.93.01spi","article-title":"\u201cTowards a situated view of language,\u201d","volume-title":"Visually Situated Language Comprehension","author":"Spivey","year":"2016"},{"key":"B94","first-page":"217","article-title":"\u201cEmploying contextual information in computer vision,\u201d","author":"Strat","year":"1993","journal-title":"Proceedings of ARPA Image Understanding Workshop"},{"key":"B95","doi-asserted-by":"publisher","first-page":"3147385","DOI":"10.1109\/TMM.2022.3147385","article-title":"A proposal-free one-stage framework for referring expression comprehension and generation via dense cross-attention","volume":"2022","author":"Sun","year":"2022","journal-title":"IEEE Trans. 
Multimedia"},{"key":"B96","doi-asserted-by":"crossref","first-page":"5793","DOI":"10.1109\/ICCV.2019.00589","article-title":"\u201cGenerating easy-to-understand referring expressions for target identifications,\u201d","volume-title":"2019 IEEE\/CVF International Conference on Computer Vision (ICCV)","author":"Tanaka","year":"2019"},{"key":"B97","doi-asserted-by":"publisher","first-page":"766","DOI":"10.1037\/0033-295X.113.4.766","article-title":"Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search","volume":"113","author":"Torralba","year":"2006","journal-title":"Psychol. Rev"},{"key":"B98","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1016\/j.visres.2020.11.003","article-title":"The meaning and structure of scenes","volume":"181","author":"V\u00f5","year":"2021","journal-title":"Vision Res"},{"key":"B99","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1162\/089120102317341765","article-title":"Generating referring expressions: boolean extensions of the incremental algorithm","volume":"28","author":"van Deemter","year":"2002","journal-title":"Comput. 
Linguist"},{"key":"B100","doi-asserted-by":"crossref","DOI":"10.7551\/mitpress\/9082.001.0001","volume-title":"Computational Models of Referring: A Study in Cognitive Science","author":"van Deemter","year":"2016"},{"key":"B101","first-page":"130","article-title":"\u201cBuilding a semantically transparent corpus for the generation of referring expressions,\u201d","volume-title":"Proceedings of the Fourth International Natural Language Generation Conference","author":"van Deemter","year":"2006"},{"key":"B102","doi-asserted-by":"crossref","first-page":"158","DOI":"10.1163\/9789004333901_012","article-title":"\u201cGenerating referring expressions in a multimodal context: an empirically oriented approach,\u201d","volume-title":"Computational Linguistics in the Netherlands 2000","author":"van der Sluis","year":"2001"},{"key":"B103","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18653\/v1\/E17-4001","article-title":"\u201cPragmatic descriptions of perceptual stimuli,\u201d","volume-title":"Proceedings of the Student Research Workshop at the 15th Conference of the European Chapter of the Association for Computational Linguistics","author":"van Miltenburg","year":"2017"},{"key":"B104","first-page":"59","article-title":"\u201cThe use of spatial relations in referring expression generation,\u201d","volume-title":"Proceedings of the Fifth International Natural Language Generation Conference","author":"Viethen","year":"2008"},{"key":"B105","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2015.7298935","article-title":"\u201cShow and tell: a neural image caption generator,\u201d","volume-title":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Vinyals","year":"2015"},{"key":"B106","doi-asserted-by":"publisher","first-page":"1323","DOI":"10.1080\/01690965.2012.682072","article-title":"Who is where referred to how, and why? 
the influence of visual saliency on referent accessibility in spoken language production","volume":"28","author":"Vogels","year":"2013","journal-title":"Lang. Cogn. Processes"},{"key":"B107","first-page":"5333","article-title":"\u201cOCID-ref: A 3D robotic dataset with embodied language for clutter scene grounding,\u201d","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Wang","year":"2021"},{"key":"B108","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2019.00206","article-title":"\u201cNeighbourhood watch: Referring expression comprehension via language-guided graph attention networks,\u201d","volume-title":"2019 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","author":"Wang","year":"2019"},{"key":"B109","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/0010-0285(72)90002-3","article-title":"Understanding natural language","volume":"3","author":"Winograd","year":"1972","journal-title":"Cognitive Psychology"},{"key":"B110","first-page":"2048","article-title":"\u201cShow, attend and tell: Neural image caption generation with visual attention,\u201d","volume-title":"Proceedings of the 32nd International Conference on Machine Learning","author":"Xu","year":"2015"},{"key":"B111","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2010.5540235","article-title":"\u201cModeling mutual context of object and human pose in human-object interaction activities,\u201d","volume-title":"2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition","author":"Yao","year":"2010"},{"key":"B112","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1007\/978-3-319-46475-6_5","article-title":"\u201cModeling context in referring expressions,\u201d","volume-title":"Computer Vision-ECCV 
2016","author":"Yu","year":"2016"},{"key":"B113","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2017.375","article-title":"\u201cA joint speaker-listener-reinforcer model for referring expressions,\u201d","volume-title":"Computer Vision and Pattern Recognition (CVPR)","author":"Yu","year":"2017"},{"key":"B114","doi-asserted-by":"publisher","first-page":"103514","DOI":"10.1016\/j.dsp.2022.103514","article-title":"A survey of modern deep learning based object detection models","volume":"126","author":"Zaidi","year":"2022","journal-title":"Digit. Signal Process"},{"key":"B115","doi-asserted-by":"crossref","first-page":"610","DOI":"10.18653\/v1\/P16-1058","article-title":"\u201cEasy things first: installments improve referring expression generation for objects in photographs,\u201d","volume-title":"Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zarrie\u00df","year":"2016"},{"key":"B116","doi-asserted-by":"crossref","first-page":"243","DOI":"10.18653\/v1\/P17-1023","article-title":"\u201cObtaining referential word meanings from visual and distributional information: experiments on object naming,\u201d","volume-title":"Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zarrie\u00df","year":"2017"},{"key":"B117","doi-asserted-by":"crossref","first-page":"503","DOI":"10.18653\/v1\/W18-6563","article-title":"\u201cDecoding strategies for neural referring expression generation,\u201d","volume-title":"Proceedings of the 11th International Conference on Natural Language Generation","author":"Zarrie\u00df","year":"2018"},{"key":"B118","doi-asserted-by":"crossref","first-page":"654","DOI":"10.18653\/v1\/P19-1063","article-title":"\u201cKnow what you don't know: Modeling a pragmatic speaker that refers to objects of unknown categories,\u201d","volume-title":"Proceedings of the 57th Annual Meeting of the Association for 
Computational Linguistics","author":"Zarrie\u00df","year":"2019"},{"key":"B119","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2018.00437","article-title":"\u201cGrounding referring expressions in images by variational context,\u201d","volume-title":"2018 IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Zhang","year":"2018"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1067125\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,16]],"date-time":"2024-10-16T18:43:35Z","timestamp":1729104215000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2023.1067125\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3,21]]},"references-count":119,"alternative-id":["10.3389\/frai.2023.1067125"],"URL":"https:\/\/doi.org\/10.3389\/frai.2023.1067125","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,21]]},"article-number":"1067125"}}