{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T18:06:10Z","timestamp":1778609170886,"version":"3.51.4"},"reference-count":56,"publisher":"MIT Press","license":[{"start":{"date-parts":[[2021,12,9]],"date-time":"2021-12-09T00:00:00Z","timestamp":1639008000000},"content-version":"vor","delay-in-days":342,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["direct.mit.edu"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2021,12,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We study continual learning for natural language instruction generation, by observing human users\u2019 instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication to the system\u2019s success communicating its intent. We show how to use this signal to improve the system\u2019s ability to generate instructions via contextual bandit learning. In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time.<\/jats:p>","DOI":"10.1162\/tacl_a_00428","type":"journal-article","created":{"date-parts":[[2021,12,9]],"date-time":"2021-12-09T18:16:48Z","timestamp":1639073808000},"page":"1303-1319","update-policy":"https:\/\/doi.org\/10.1162\/mitpressjournals.corrections.policy","source":"Crossref","is-referenced-by-count":4,"title":["Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior"],"prefix":"10.1162","volume":"9","author":[{"given":"Noriyuki","family":"Kojima","sequence":"first","affiliation":[{"name":"Department of Computer Science and Cornell Tech, Cornell University, USA. 
nk654@cornell.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alane","family":"Suhr","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Cornell Tech, Cornell University, USA. suhr@cs.cornell.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yoav","family":"Artzi","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Cornell Tech, Cornell University, USA. yoav@cs.cornell.edu"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"281","published-online":{"date-parts":[[2021,12,6]]},"reference":[{"key":"2021120918160969800_bib1","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1023\/A:1022821128753","article-title":"Queries and concept learning","volume":"2","author":"Angluin","year":"1988","journal-title":"Machine Learning"},{"key":"2021120918160969800_bib2","article-title":"Deep reinforcement learning from policy-dependent human feedback","author":"Arumugam","year":"2019","journal-title":"CoRR"},{"key":"2021120918160969800_bib3","first-page":"1415","article-title":"Learning to map natural language instructions to physical quadcopter control using simulated flight","volume-title":"Proceedings of the Conference on Robot Learning","author":"Blukis","year":"2019"},{"key":"2021120918160969800_bib4","article-title":"Deep reinforcement learning from human preferences","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Christiano","year":"2017"},{"key":"2021120918160969800_bib5","doi-asserted-by":"publisher","first-page":"170","DOI":"10.1080\/15475441.2017.1340843","article-title":"Conversation and language acquisition: A pragmatic approach","volume":"14","author":"Clark","year":"2018","journal-title":"Language Learning and Development"},{"key":"2021120918160969800_bib6","doi-asserted-by":"publisher","first-page":"441","DOI":"10.1080\/0163853X.2020.1719795","article-title":"Conversational repair and the 
acquisition of language","volume":"57","author":"Clark","year":"2020","journal-title":"Discourse Processes"},{"issue":"1","key":"2021120918160969800_bib7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/0010-0277(86)90010-7","article-title":"Referring as a collaborative process","volume":"22","author":"Clark","year":"1986","journal-title":"Cognition"},{"key":"2021120918160969800_bib8","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1207\/s15516709cog1902_3","article-title":"Computational interpretations of the gricean maxims in the generation of referring expressions","volume":"19","author":"Dale","year":"1995","journal-title":"Cognitive Science"},{"key":"2021120918160969800_bib9","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1007\/s10514-015-9454-z","article-title":"Active reward learning with a novel acquisition function","volume":"39","author":"Daniel","year":"2015","journal-title":"Autonomous Robots"},{"key":"2021120918160969800_bib10","article-title":"Natural language generation in the context of providing indoor route instructions","volume-title":"Proceedings of the Robotics: Science and Systems Workshop on Model Learning for Human-Robot Communication","author":"Daniele","year":"2016"},{"issue":"78","key":"2021120918160969800_bib11","first-page":"1","article-title":"POT: Python optimal transport","volume":"22","author":"Flamary","year":"2021","journal-title":"Journal of Machine Learning Research"},{"key":"2021120918160969800_bib12","doi-asserted-by":"publisher","first-page":"1951","DOI":"10.18653\/v1\/N18-1177","article-title":"Unified pragmatic models for generating and following instructions","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Fried","year":"2018"},{"key":"2021120918160969800_bib13","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1613\/jair.5477","article-title":"Survey of the 
state of the art in natural language generation: Core tasks, applications and evaluation","volume":"61","author":"Gatt","year":"2017","journal-title":"Journal of Artificial Intelligence Research"},{"key":"2021120918160969800_bib14","doi-asserted-by":"publisher","first-page":"408","DOI":"10.18653\/v1\/2020.conll-1.33","article-title":"Continual adaptation for efficient machine communication","volume-title":"Proceedings of the Conference on Computational Natural Language Learning","author":"Hawkins","year":"2020"},{"issue":"6","key":"2021120918160969800_bib15","doi-asserted-by":"publisher","first-page":"e12845","DOI":"10.1111\/cogs.12845","article-title":"Characterizing the dynamics of learning in repeated reference games","volume":"44","author":"Hawkins","year":"2020","journal-title":"Cognitive Science"},{"key":"2021120918160969800_bib16","article-title":"Hexaconv","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Hoogeboom","year":"2018"},{"issue":"260","key":"2021120918160969800_bib17","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1080\/01621459.1952.10483446","article-title":"A generalization of sampling without replacement from a finite universe","volume":"47","author":"Horvitz","year":"1952","journal-title":"Journal of the American Statistical Association"},{"key":"2021120918160969800_bib18","first-page":"208","article-title":"The GRUVE challenge: Generating routes under uncertainty in virtual environments","volume-title":"Proceedings of the European Workshop on Natural Language Generation","author":"Janarthanam","year":"2011"},{"key":"2021120918160969800_bib19","doi-asserted-by":"publisher","first-page":"3985","DOI":"10.18653\/v1\/2020.emnlp-main.327","article-title":"Human-centric dialog training via offline reinforcement learning","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language 
Processing","author":"Jaques","year":"2020"},{"issue":"1","key":"2021120918160969800_bib20","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1145\/357417.357420","article-title":"An iterative design methodology for user-friendly natural language office information applications","volume":"2","author":"Kelley","year":"1984","journal-title":"ACM Transactions on Information Systems"},{"key":"2021120918160969800_bib21","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1145\/1597735.1597738","article-title":"Interactively shaping agents via human reinforcement: the TAMER framework","volume-title":"Proceedings of the fifth international conference on Knowledge capture","author":"Bradley Knox","year":"2009"},{"key":"2021120918160969800_bib22","article-title":"Report on the second NLG challenge on generating instructions in virtual environments (GIVE-2)","volume-title":"Proceedings of International Natural Language Generation Conference","author":"Koller","year":"2010"},{"key":"2021120918160969800_bib23","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1037\/h0023705","article-title":"Concurrent feedback, confirmation, and the encoding of referents in verbal communication.","volume":"43","author":"Krauss","year":"1966","journal-title":"Journal of Personality and Social Psychology"},{"key":"2021120918160969800_bib24","doi-asserted-by":"publisher","first-page":"92","DOI":"10.18653\/v1\/N18-3012","article-title":"Can neural machine translation be improved with user feedback?","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Kreutzer","year":"2018"},{"key":"2021120918160969800_bib25","doi-asserted-by":"publisher","first-page":"1503","DOI":"10.18653\/v1\/P17-1138","article-title":"Bandit structured prediction for neural sequence-to-sequence learning","volume-title":"Proceedings of the Annual Meeting of the Association for Computational 
Linguistics","author":"Kreutzer","year":"2017"},{"key":"2021120918160969800_bib26","doi-asserted-by":"publisher","first-page":"1777","DOI":"10.18653\/v1\/P18-1165","article-title":"Reliability and learnability of human bandit feedback for sequence-to-sequence reinforcement learning","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Kreutzer","year":"2018"},{"key":"2021120918160969800_bib27","doi-asserted-by":"publisher","first-page":"1820","DOI":"10.18653\/v1\/P18-1169","article-title":"Improving a neural semantic parser by counterfactual learning from human bandit feedback","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Lawrence","year":"2018"},{"key":"2021120918160969800_bib28","doi-asserted-by":"publisher","first-page":"2566","DOI":"10.18653\/v1\/D17-1272","article-title":"Counterfactual learning from bandit feedback under deterministic logging : A case study in statistical machine translation","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Lawrence","year":"2017"},{"key":"2021120918160969800_bib29","doi-asserted-by":"publisher","first-page":"2443","DOI":"10.18653\/v1\/D17-1259","article-title":"Deal or no deal? 
End-to-end learning of negotiation dialogues","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Lewis","year":"2017"},{"key":"2021120918160969800_bib30","doi-asserted-by":"publisher","first-page":"2060","DOI":"10.18653\/v1\/N18-1187","article-title":"Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Liu","year":"2018"},{"key":"2021120918160969800_bib31","article-title":"Decoupled weight decay regularization","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Loshchilov","year":"2018"},{"key":"2021120918160969800_bib32","article-title":"Interactive learning from policy-dependent human feedback","volume-title":"Proceedings of the International Conference on Machine Learning","author":"MacGlashan","year":"2017"},{"key":"2021120918160969800_bib33","article-title":"Simultaneous control and human feedback in the training of a robotic agent with actor-critic reinforcement learning","volume":"abs\/1606.06979","author":"Mathewson","year":"2016","journal-title":"arXiv"},{"key":"2021120918160969800_bib34","doi-asserted-by":"publisher","first-page":"5405","DOI":"10.18653\/v1\/P19-1537","article-title":"Collaborative dialogue in Minecraft","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Narayan-Chen","year":"2019"},{"key":"2021120918160969800_bib35","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.18653\/v1\/D17-1153","article-title":"Reinforcement learning for bandit neural machine translation with simulated human feedback","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language 
Processing","author":"Nguyen","year":"2017"},{"key":"2021120918160969800_bib36","doi-asserted-by":"publisher","first-page":"311","DOI":"10.3115\/1073083.1073135","article-title":"BLEU: A method for automatic evaluation of machine translation","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Papineni","year":"2002"},{"key":"2021120918160969800_bib37","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ICORR.2011.5975338","article-title":"Online human training of a myoelectric prosthesis controller via actor-critic reinforcement learning","volume-title":"Proceedings of the International Conference on Rehabilitation Robotics","author":"Pilarski","year":"2011"},{"key":"2021120918160969800_bib38","article-title":"Language models are unsupervised multitask learners","author":"Radford","year":"2019"},{"key":"2021120918160969800_bib39","first-page":"2001","article-title":"icarl: Incremental classifier and representation learning","volume-title":"Proceedings of the Conference on Computer Vision and Pattern Recognition","author":"Rebuffi","year":"2017"},{"issue":"2","key":"2021120918160969800_bib40","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1080\/09540099550039318","article-title":"Catastrophic forgetting, rehearsal and pseudorehearsal","volume":"7","author":"Robins","year":"1995","journal-title":"Connection Science"},{"key":"2021120918160969800_bib41","doi-asserted-by":"crossref","DOI":"10.1109\/ICCV.1998.710701","article-title":"A metric for distributions with applications to image databases","volume-title":"Proceedings of the International Conference on Computer Vision","author":"Rubner","year":"1998"},{"key":"2021120918160969800_bib42","article-title":"Proximal policy optimization algorithms","author":"Schulman","year":"2017","journal-title":"arXiv"},{"key":"2021120918160969800_bib43","article-title":"Active learning literature 
survey","author":"Settles","year":"2009"},{"key":"2021120918160969800_bib44","first-page":"3008","article-title":"Learning to summarize with human feedback","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Stiennon","year":"2020"},{"key":"2021120918160969800_bib45","doi-asserted-by":"publisher","first-page":"2119","DOI":"10.18653\/v1\/D19-1218","article-title":"Executing instructions in situated collaborative interactions","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Suhr","year":"2019"},{"key":"2021120918160969800_bib46","first-page":"3008","article-title":"Sequence to sequence learning with neural networks","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Sutskever","year":"2014"},{"key":"2021120918160969800_bib47","doi-asserted-by":"publisher","first-page":"2610","DOI":"10.18653\/v1\/N19-1268","article-title":"Learning to navigate unseen environments: Back translation with environmental dropout","volume-title":"Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Tan","year":"2019"},{"key":"2021120918160969800_bib48","doi-asserted-by":"publisher","first-page":"2368","DOI":"10.18653\/v1\/P16-1224","article-title":"Learning language games through interaction","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Wang","year":"2016"},{"key":"2021120918160969800_bib49","first-page":"3589","article-title":"Optimal and adaptive off-policy evaluation in contextual bandits","volume-title":"Proceedings of International Conference on Machine Learning","author":"Wang","year":"2017"},{"key":"2021120918160969800_bib50","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v32i1.11485","article-title":"Deep TAMER: Interactive agent shaping in high-dimensional state 
spaces","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Warnell","year":"2018"},{"key":"2021120918160969800_bib51","article-title":"A Bayesian approach for policy learning from trajectory preference queries","volume-title":"Proceedings of the Advances in Neural Information Processing Systems","author":"Wilson","year":"2012"},{"key":"2021120918160969800_bib52","first-page":"443","article-title":"Convergence of syntactic complexity in conversation","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Yang","year":"2016"},{"key":"2021120918160969800_bib53","first-page":"7893","article-title":"Kl-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition","volume-title":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","author":"Dong","year":"2013"},{"key":"2021120918160969800_bib54","article-title":"BERTScore: Evaluating text generation with BERT","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhang","year":"2020"},{"key":"2021120918160969800_bib55","first-page":"1302","article-title":"On the evaluation of vision-and-language navigation instructions","volume-title":"Proceedings of the European Chapter of the Association for Computational Linguistics","author":"Zhao","year":"2021"},{"key":"2021120918160969800_bib56","article-title":"Encoder-agnostic adaptation for conditional language generation","author":"Ziegler","year":"2019","journal-title":"arXiv"}],"container-title":["Transactions of the Association for Computational 
Linguistics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00428\/1976207\/tacl_a_00428.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/direct.mit.edu\/tacl\/article-pdf\/doi\/10.1162\/tacl_a_00428\/1976207\/tacl_a_00428.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,17]],"date-time":"2023-01-17T20:26:53Z","timestamp":1673987213000},"score":1,"resource":{"primary":{"URL":"https:\/\/direct.mit.edu\/tacl\/article\/doi\/10.1162\/tacl_a_00428\/108610\/Continual-Learning-for-Grounded-Instruction"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021]]},"references-count":56,"URL":"https:\/\/doi.org\/10.1162\/tacl_a_00428","relation":{},"ISSN":["2307-387X"],"issn-type":[{"value":"2307-387X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021]]},"published":{"date-parts":[[2021]]}}}