{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T02:38:19Z","timestamp":1773283099125,"version":"3.50.1"},"reference-count":100,"publisher":"SAGE Publications","issue":"10-11","license":[{"start":{"date-parts":[[2020,6,5]],"date-time":"2020-06-05T00:00:00Z","timestamp":1591315200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/100002186","name":"Lockheed Martin","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100002186","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100015599","name":"Toyota Research Institute","doi-asserted-by":"crossref","award":["LP- C000765-SR"],"award-info":[{"award-number":["LP- C000765-SR"]}],"id":[{"id":"10.13039\/100015599","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Robotics Consortium of the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program","award":["W911NF-15-1-0402"],"award-info":[{"award-number":["W911NF-15-1-0402"]}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2020,9]]},"abstract":"<jats:p> The goal of this article is to enable robots to perform robust task execution following human instructions in partially observable environments. A robot\u2019s ability to interpret and execute commands is fundamentally tied to its semantic world knowledge. Commonly, robots use exteroceptive sensors, such as cameras or LiDAR, to detect entities in the workspace and infer their visual properties and spatial relationships. However, semantic world properties are often visually imperceptible. We posit the use of non-exteroceptive modalities including physical proprioception, factual descriptions, and domain knowledge as mechanisms for inferring semantic properties of objects. We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes to infer the meaning of instructions and execute the instructed tasks in a manner robust to erroneous, noisy, or contradictory evidence. In addition, we provide a method that allows the robot to communicate knowledge dissonance back to the human as a means of correcting errors in the operator\u2019s world model. Finally, we propose an efficient framework that anticipates possible linguistic interactions and infers the associated groundings for the current world state, thereby bootstrapping both language understanding and generation. We present experiments on manipulators for tasks that require inference over partially observed semantic properties, and evaluate our framework\u2019s ability to exploit expressed information and knowledge bases to facilitate convergence, and generate statements to correct declared facts that were observed to be inconsistent with the robot\u2019s estimate of object properties. <\/jats:p>","DOI":"10.1177\/0278364920917755","type":"journal-article","created":{"date-parts":[[2020,6,5]],"date-time":"2020-06-05T07:29:36Z","timestamp":1591342176000},"page":"1279-1304","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":32,"title":["Multimodal estimation and communication of latent semantic knowledge for robust execution of robot instructions"],"prefix":"10.1177","volume":"39","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1074-9248","authenticated-orcid":false,"given":"Jacob","family":"Arkin","sequence":"first","affiliation":[{"name":"Robotics and Artificial Intelligence Laboratory, University of Rochester, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1287-9433","authenticated-orcid":false,"given":"Daehyung","family":"Park","sequence":"additional","affiliation":[{"name":"Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA"}]},{"given":"Subhro","family":"Roy","sequence":"additional","affiliation":[{"name":"Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA"}]},{"given":"Matthew R","family":"Walter","sequence":"additional","affiliation":[{"name":"Robot Intelligence through Perception Laboratory, Toyota Technological Institute at Chicago, USA"}]},{"given":"Nicholas","family":"Roy","sequence":"additional","affiliation":[{"name":"Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA"}]},{"given":"Thomas M","family":"Howard","sequence":"additional","affiliation":[{"name":"Robotics and Artificial Intelligence Laboratory, University of Rochester, USA"}]},{"given":"Rohan","family":"Paul","sequence":"additional","affiliation":[{"name":"Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology, USA"},{"name":"Department of Computer Science and Engineering, Indian Institute of Technology Delhi, India"}]}],"member":"179","published-online":{"date-parts":[[2020,6,5]]},"reference":[{"key":"bibr1-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00387"},{"key":"bibr2-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2013.6483608"},{"key":"bibr3-0278364920917755","first-page":"502","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Angeli G","year":"2010"},{"key":"bibr4-0278364920917755","author":"Arkin J","year":"2018","journal-title":"Late-breaking Track at the SIGDIAL Special Session on Physically Situated Dialogue (RoboDIAL)"},{"key":"bibr5-0278364920917755","volume-title":"Proceedings of the International Symposium on Experimental Robotics (ISER)","author":"Arkin J","year":"2018"},{"key":"bibr6-0278364920917755","author":"Barber DJ","year":"2016","journal-title":"SPIE Defense+ Security"},{"key":"bibr7-0278364920917755","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220617"},{"key":"bibr8-0278364920917755","author":"Barzilay R","year":"2004","journal-title":"arXiv preprint arXiv:0405039"},{"key":"bibr9-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/HUMANOIDS.2013.7029979"},{"key":"bibr10-0278364920917755","volume-title":"Pattern Recognition and Machine Learning","author":"Bishop CM","year":"2006"},{"key":"bibr11-0278364920917755","first-page":"147","volume":"18","author":"Blei D","year":"2006","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr12-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1214\/06-BA104"},{"key":"bibr13-0278364920917755","first-page":"993","volume":"3","author":"Blei DM","year":"2003","journal-title":"Journal of Machine Learning Research"},{"key":"bibr14-0278364920917755","first-page":"2787","author":"Bordes A","year":"2013","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"bibr15-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1145\/1390156.1390173"},{"key":"bibr16-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2011.2134130"},{"key":"bibr17-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2014.09.021"},{"key":"bibr18-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-14749-4_27"},{"key":"bibr19-0278364920917755","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W15-4715"},{"key":"bibr20-0278364920917755","volume-title":"International Symposium on Robotics Research","author":"Daniele A","year":"2017"},{"key":"bibr21-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1145\/2909824.3020241"},{"key":"bibr22-0278364920917755","doi-asserted-by":"publisher","DOI":"10.5898\/JHRI.2.2.Deits"},{"key":"bibr23-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2013.6630702"},{"key":"bibr24-0278364920917755","volume-title":"Proceedings of the International Symposium on Experimental Robotics (ISER)","author":"Duvallet F","year":"2014"},{"key":"bibr25-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1016\/S1071-5819(03)00038-7"},{"key":"bibr26-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1145\/2696454.2696467"},{"key":"bibr27-0278364920917755","author":"Forbes M","year":"2017","journal-title":"Proceedings of the Association for Computational Linguistics (ACL)"},{"key":"bibr28-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6385471"},{"key":"bibr29-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460754"},{"key":"bibr30-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915587924"},{"key":"bibr31-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1016\/0167-2789(90)90087-6"},{"key":"bibr32-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2015.7139984"},{"key":"bibr33-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2015.7354097"},{"key":"bibr34-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2013.6696752"},{"key":"bibr35-0278364920917755","volume-title":"Proceedings IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS) Workshop on Rehabilitation and Assistive Robotics","author":"Howard TM","year":"2014"},{"key":"bibr36-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2014.6907841"},{"key":"bibr37-0278364920917755","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1086"},{"key":"bibr38-0278364920917755","first-page":"1041","author":"Kelleher JD","year":"2006","journal-title":"Proceedings of the Association for Computational Linguistics (ACL)"},{"key":"bibr39-0278364920917755","first-page":"543","volume-title":"Proceedings of the International Conference on Computational Linguistics","author":"Kim J","year":"2010"},{"key":"bibr40-0278364920917755","first-page":"721","author":"Kollar T","year":"2013","journal-title":"Proceedings Robotics: Science and Systems (RSS)"},{"key":"bibr41-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2013.6631186"},{"key":"bibr42-0278364920917755","first-page":"259","volume-title":"Proceedings ACM\/IEEE International Conference on Human\u2013Robot Interaction (HRI)","author":"Kollar T","year":"2010"},{"key":"bibr43-0278364920917755","first-page":"752","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)","author":"Konstas I","year":"2012"},{"key":"bibr44-0278364920917755","first-page":"91","author":"Liang P","year":"2009","journal-title":"Proceedings of the Association for Computational Linguistics (ACL)"},{"key":"bibr45-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00127"},{"key":"bibr46-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2017.333"},{"key":"bibr47-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.9"},{"key":"bibr48-0278364920917755","unstructured":"Massa F, Girshick R (2018) Mask R-CNN-benchmark: Fast, modular reference implementation of instance segmentation and object detection algorithms in PyTorch. Available at: https:\/\/github.com\/facebookresearch\/maskrcnn-benchmark (accessed 16 July 2019)."},{"key":"bibr49-0278364920917755","volume-title":"Proceedings of the National Conference on Artificial Intelligence (AAAI)","author":"Matuszek C","year":"2014"},{"key":"bibr50-0278364920917755","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Matuszek C","year":"2012"},{"key":"bibr51-0278364920917755","first-page":"251","volume-title":"Proceedings ACM\/IEEE International Conference on Human\u2013Robot Interaction (HRI)","author":"Matuszek C","year":"2010"},{"key":"bibr52-0278364920917755","unstructured":"Matuszek C, Herbst E, Zettlemoyer L, Fox D (2012b) Learning to parse natural language to a robot execution system. Technical Report UW-CSE-12-01-01, University of Washington."},{"key":"bibr53-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10364"},{"key":"bibr54-0278364920917755","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1086"},{"key":"bibr55-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1177\/0278364915602060"},{"key":"bibr56-0278364920917755","volume-title":"Proceedings of the International Symposium on Experimental Robotics (ISER)","author":"Oh J","year":"2016"},{"key":"bibr57-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2014.6907334"},{"key":"bibr58-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1007\/s10514-018-9733-6"},{"key":"bibr59-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-33712-3_26"},{"key":"bibr60-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1177\/0278364918777627"},{"key":"bibr61-0278364920917755","doi-asserted-by":"publisher","DOI":"10.24963\/ijcai.2017\/629"},{"key":"bibr62-0278364920917755","first-page":"63","author":"Paul R","year":"2013","journal-title":"RLDM 2013"},{"key":"bibr63-0278364920917755","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"bibr64-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v27i1.8475"},{"key":"bibr65-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"bibr66-0278364920917755","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2013.IX.023"},{"key":"bibr67-0278364920917755","doi-asserted-by":"crossref","unstructured":"Rashkin H, Sap M, Allaway E, Smith NA, Choi Y (2018) Event2mind: Commonsense inference on events, intents, and reactions. arXiv preprint arXiv:1805.06939.","DOI":"10.18653\/v1\/P18-1043"},{"key":"bibr68-0278364920917755","volume-title":"Artificial Intelligence: A Modern Approach","author":"Russell SJ","year":"2016"},{"key":"bibr69-0278364920917755","unstructured":"Schliep A, Rungsarityotin W, Georgi B (2004) General hidden Markov model library. Available at http:\/\/www.ghmm.org\/ (accessed March 2020)."},{"key":"bibr70-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.74"},{"key":"bibr71-0278364920917755","first-page":"1634","author":"She L","year":"2017","journal-title":"Proceedings of the Association for Computational Linguistics (ACL)"},{"key":"bibr72-0278364920917755","doi-asserted-by":"crossref","unstructured":"Shridhar M, Hsu D (2018) Interactive visual grounding of referring expressions for human\u2013robot interaction. arXiv preprint arXiv:1806.03831.","DOI":"10.15607\/RSS.2018.XIV.028"},{"key":"bibr73-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1016\/j.robot.2012.10.007"},{"key":"bibr74-0278364920917755","volume-title":"Proceedings of the RSS 2009 Workshop on Mobile Manipulation","author":"Sinapov J","year":"2009"},{"key":"bibr75-0278364920917755","first-page":"926","author":"Socher R","year":"2013","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)"},{"key":"bibr76-0278364920917755","author":"Tellex S","year":"2014","journal-title":"Proceedings Robotics: Science and Systems (RSS)"},{"key":"bibr77-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v32i4.2384"},{"key":"bibr78-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v25i1.7979"},{"key":"bibr79-0278364920917755","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2012.VIII.052"},{"key":"bibr80-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11966"},{"key":"bibr81-0278364920917755","first-page":"3477","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)","author":"Thomason J","year":"2016"},{"key":"bibr82-0278364920917755","first-page":"1923","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)","author":"Thomason J","year":"2015"},{"key":"bibr83-0278364920917755","volume-title":"Probabilistic Robotics","author":"Thrun S","year":"2005"},{"key":"bibr84-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-28872-7_26"},{"key":"bibr85-0278364920917755","volume-title":"International Symposium on Robotics Research","author":"Tucker M","year":"2017"},{"key":"bibr86-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2015.292"},{"key":"bibr87-0278364920917755","doi-asserted-by":"publisher","DOI":"10.3115\/1073336.1073339"},{"key":"bibr88-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1002\/rob.21539"},{"key":"bibr89-0278364920917755","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2013.IX.004"},{"key":"bibr90-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1177\/0278364914537359"},{"key":"bibr91-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/HRI.2016.7451741"},{"key":"bibr92-0278364920917755","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)","author":"Wang Q","year":"2015"},{"key":"bibr93-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2016.7487507"},{"key":"bibr94-0278364920917755","first-page":"172","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)","author":"Wong YW","year":"2007"},{"key":"bibr95-0278364920917755","unstructured":"Yang B, Yih Wt, He X, Gao J, Deng L (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575."},{"key":"bibr96-0278364920917755","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1023"},{"key":"bibr97-0278364920917755","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46475-6_5"},{"key":"bibr98-0278364920917755","volume-title":"Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)","author":"Zender H","year":"2009"},{"key":"bibr99-0278364920917755","volume-title":"Dagstuhl Seminar Proceedings","author":"Zettlemoyer LS","year":"2008"},{"key":"bibr100-0278364920917755","first-page":"5165","author":"Zhang M","year":"2018","journal-title":"Advances in Neural Information Processing Systems (NeurIPS)"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364920917755","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/0278364920917755","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/0278364920917755","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,28]],"date-time":"2025-02-28T14:20:41Z","timestamp":1740752441000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/0278364920917755"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,5]]},"references-count":100,"journal-issue":{"issue":"10-11","published-print":{"date-parts":[[2020,9]]}},"alternative-id":["10.1177\/0278364920917755"],"URL":"https:\/\/doi.org\/10.1177\/0278364920917755","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,5]]}}}