{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T05:55:54Z","timestamp":1775109354173,"version":"3.50.1"},"reference-count":56,"publisher":"SAGE Publications","issue":"2-3","license":[{"start":{"date-parts":[[2020,1,2]],"date-time":"2020-01-02T00:00:00Z","timestamp":1577923200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:p> This article presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The key question here is to ground referring expressions: understand expressions about objects and their relationships from image and natural language inputs. INGRESS allows unconstrained object categories and rich language expressions. Further, it asks questions to clarify ambiguous referring expressions interactively. To achieve these, we take the approach of grounding by generation and propose a two-stage neural-network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expressions, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred objects. The same neural networks are used for both grounding and question generation for disambiguation. Experiments show that INGRESS outperformed a state-of-the-art method on the RefCOCO dataset and in robot experiments with humans. The INGRESS source code is available at https:\/\/github.com\/MohitShridhar\/ingress . <\/jats:p>","DOI":"10.1177\/0278364919897133","type":"journal-article","created":{"date-parts":[[2020,1,2]],"date-time":"2020-01-02T12:35:35Z","timestamp":1577968535000},"page":"217-232","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":76,"title":["INGRESS: Interactive visual grounding of referring expressions"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7382-763X","authenticated-orcid":false,"given":"Mohit","family":"Shridhar","sequence":"first","affiliation":[{"name":"Paul G Allen School of Computer Science and Engineering,University of Washington, Seattle, WA, USA"}]},{"given":"Dixant","family":"Mittal","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Singapore, Singapore"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2309-4535","authenticated-orcid":false,"given":"David","family":"Hsu","sequence":"additional","affiliation":[{"name":"School of Computing, National University of Singapore, Singapore"}]}],"member":"179","published-online":{"date-parts":[[2020,1,2]]},"reference":[{"key":"bibr1-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.12"},{"key":"bibr2-0278364919897133","author":"Arbel\u00e1ez P","year":"2014","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr3-0278364919897133","author":"Arkin J","year":"2018","journal-title":"SIGDIAL Special Session on Physically Situated Dialogue (RoboDIAL-18)"},{"key":"bibr4-0278364919897133","first-page":"65","volume-title":"Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization","author":"Banerjee S","year":"2005"},{"key":"bibr5-0278364919897133","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1089"},{"key":"bibr6-0278364919897133","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2005.I.020"},{"key":"bibr7-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1037\/10096-006"},{"key":"bibr8-0278364919897133","author":"Das A","year":"2017","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr9-0278364919897133","author":"De Vries H","year":"2017","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr10-0278364919897133","first-page":"6594","author":"De Vries H","year":"2017","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"bibr11-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1080\/09540090802413145"},{"key":"bibr12-0278364919897133","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2016.XII.036"},{"key":"bibr13-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487504"},{"key":"bibr14-0278364919897133","first-page":"1914","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"FitzGerald N","year":"2013"},{"key":"bibr15-0278364919897133","author":"Fried D","year":"2018","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"bibr16-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1002\/9781118611463.wbielsi066"},{"key":"bibr17-0278364919897133","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Golland D","year":"2010"},{"key":"bibr18-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2013.6696569"},{"key":"bibr19-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460699"},{"key":"bibr20-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2015.7354097"},{"key":"bibr21-0278364919897133","author":"Hu R","year":"2017","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr22-0278364919897133","author":"Hu R","year":"2016","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr23-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/SMARTCOMP.2016.7501708"},{"key":"bibr24-0278364919897133","author":"Johnson J","year":"2017","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr25-0278364919897133","author":"Johnson J","year":"2017","journal-title":"ICCV"},{"key":"bibr26-0278364919897133","author":"Johnson J","year":"2016","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr27-0278364919897133","first-page":"3128","author":"Karpathy A","year":"2015","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr28-0278364919897133","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1086"},{"key":"bibr29-0278364919897133","volume-title":"Proceedings of the ACM\/IEEE International Conference on Human-Robot Interaction (HRI)","author":"Kollar T","year":"2010"},{"key":"bibr30-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1007\/s11263-016-0981-7"},{"key":"bibr31-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00220"},{"key":"bibr32-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759839"},{"key":"bibr33-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/ROMAN.2016.7745089"},{"key":"bibr34-0278364919897133","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2017.XIII.058"},{"key":"bibr35-0278364919897133","author":"Mao J","year":"2016","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr36-0278364919897133","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Matuszek C","year":"2012"},{"key":"bibr37-0278364919897133","volume-title":"Thirtieth AAAI Conference on Artificial Intelligence","author":"Mei H","year":"2016"},{"key":"bibr38-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_48"},{"key":"bibr39-0278364919897133","doi-asserted-by":"publisher","DOI":"10.4324\/9781315736174"},{"key":"bibr40-0278364919897133","first-page":"714","volume-title":"Proceedings of The 2nd Conference on Robot Learning (CoRL) (Proceedings of Machine Learning Research, Vol. 87)","author":"Nyga D","year":"2018"},{"key":"bibr41-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2016.7759741"},{"key":"bibr42-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6385603"},{"key":"bibr43-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1995.479398"},{"key":"bibr44-0278364919897133","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2016.XII.037"},{"key":"bibr45-0278364919897133","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/629"},{"key":"bibr46-0278364919897133","author":"Santoro A","year":"2017","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"bibr47-0278364919897133","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2018.XIV.028"},{"key":"bibr48-0278364919897133","first-page":"2164","volume-title":"Advances in Neural Information Processing Systems (NIPS)","author":"Silver D","year":"2010"},{"key":"bibr49-0278364919897133","first-page":"1772","author":"Somani A","year":"2013","journal-title":"Advances in Neural Information Processing Systems (NIPS)"},{"key":"bibr50-0278364919897133","author":"Tellex S","year":"2014","journal-title":"Proceedings of Robotics: Science and Systems (RSS)"},{"key":"bibr51-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1145\/1891903.1891944"},{"key":"bibr52-0278364919897133","first-page":"3156","author":"Vinyals O","year":"2015","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"},{"key":"bibr53-0278364919897133","volume-title":"The Oxford Handbook of Compositionality","author":"Werning M","year":"2012"},{"key":"bibr54-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2017.7989121"},{"key":"bibr55-0278364919897133","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2006.06.008"},{"key":"bibr56-0278364919897133","author":"Yu L","year":"2017","journal-title":"Proceedings of Computer Vision and Pattern Recognition (CVPR)"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364919897133","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364919897133","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364919897133","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T11:01:22Z","timestamp":1740826882000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364919897133"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,1,2]]},"references-count":56,"journal-issue":{"issue":"2-3","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["10.1177\/0278364919897133"],"URL":"https:\/\/doi.org\/10.1177\/0278364919897133","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,1,2]]}}}