{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T12:02:01Z","timestamp":1780401721915,"version":"3.54.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"8","license":[{"start":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T00:00:00Z","timestamp":1693180800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T00:00:00Z","timestamp":1693180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100006034","name":"University of Southern California","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100006034","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Robot"],"published-print":{"date-parts":[[2023,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Task planning can require defining myriad domain knowledge about the world in which a robot needs to act. To ameliorate that effort, large language models (LLMs) can be used to score potential next actions during task planning, and even generate action sequences directly, given an instruction in natural language with no additional domain information. However, such methods either require enumerating all possible next steps for scoring, or generate free-form text that may contain actions not possible on a given robot in its current context. We present a programmatic LLM prompt structure that enables plan generation functional across situated environments, robot capabilities, and tasks. Our key insight is to prompt the LLM with program-like specifications of the available actions and objects in an environment, as well as with example programs that can be executed. 
We make concrete recommendations about prompt structure and generation constraints through ablation experiments, demonstrate state of the art success rates in VirtualHome household tasks, and deploy our method on a physical robot arm for tabletop tasks. Website and code at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/progprompt.github.io\/\">progprompt.github.io<\/jats:ext-link><\/jats:p>","DOI":"10.1007\/s10514-023-10135-3","type":"journal-article","created":{"date-parts":[[2023,8,28]],"date-time":"2023-08-28T17:06:01Z","timestamp":1693242361000},"page":"999-1012","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":62,"title":["ProgPrompt: program generation for situated robot task planning using large language models"],"prefix":"10.1007","volume":"47","author":[{"given":"Ishika","family":"Singh","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Valts","family":"Blukis","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Arsalan","family":"Mousavian","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ankit","family":"Goyal","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Danfei","family":"Xu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jonathan","family":"Tremblay","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Dieter","family":"Fox","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jesse","family":"Thomason","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Animesh","family":"Garg","sequence":"additional","affiliation"
:[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2023,8,28]]},"reference":[{"key":"10135_CR1","unstructured":"Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., & Yan, M. (2022). Do as i can, not as i say: Grounding language in robotic affordances. arXiv."},{"key":"10135_CR2","unstructured":"Akakzia, A., Colas, C., Oudeyer, P. Y., Chetouani, M., & Sigaud, O. (2021). Grounding language to autonomously-acquired skills via goal generation. In International conference on learning representations."},{"key":"10135_CR3","unstructured":"Baier, J. A., Bacchus, F., & McIlraith, S. A. (2007). A heuristic search approach to planning with temporally extended preferences. In Proceedings of the 20th international joint conference on artificial intelligence (pp. 1808\u20131815). Morgan Kaufmann Publishers Inc."},{"key":"10135_CR4","unstructured":"Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., & Amodei, D. (2020). Language models are few-shot learners. arXiv."},{"issue":"1","key":"10135_CR5","first-page":"47","volume":"28","author":"D Bryce","year":"2007","unstructured":"Bryce, D., & Kambhampati, S. (2007). A tutorial on planning graph based reachability heuristics. AI Magazine, 28(1), 47.","journal-title":"AI Magazine"},{"key":"10135_CR6","unstructured":"Cao, Y., & Lee, C. (2023). Robot behavior-tree-based task generation with large language models. arXiv preprint arXiv:2302.12927"},{"key":"10135_CR7","unstructured":"Capitanelli, A., & Mastrogiovanni, F. (2023). A framework to generate neurosymbolic pddl-compliant planners. arXiv preprint arXiv:2303.00438"},{"key":"10135_CR8","unstructured":"Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv."},{"key":"10135_CR9","doi-asserted-by":"crossref","unstructured":"Danielczuk, M., Mousavian, A., Eppner, C., & Fox, D. (2021). 
Object rearrangement using learned implicit collision functions. In IEEE international conference on robotics and automation (ICRA).","DOI":"10.1109\/ICRA48506.2021.9561516"},{"key":"10135_CR10","unstructured":"Eysenbach, B., Salakhutdinov, R. R., & Levine, S. (2019). Search on the replay buffer: Bridging planning and reinforcement learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d\u2019 Alch\u00e9-Buc, E. Fox, & R. Garnett (Eds.), Advances in neural information processing systems (vol. 32). Curran Associates, Inc."},{"key":"10135_CR11","doi-asserted-by":"crossref","unstructured":"Fikes, R. E., & Nilsson, N. J. (1971). Strips: A new approach to the application of theorem proving to problem solving. In Proceedings of the 2nd international joint conference on artificial intelligence (pp. 608\u2013620). Morgan Kaufmann Publishers Inc.","DOI":"10.1016\/0004-3702(71)90010-5"},{"issue":"1","key":"10135_CR12","doi-asserted-by":"publisher","first-page":"440","DOI":"10.1609\/icaps.v30i1.6739","volume":"30","author":"CR Garrett","year":"2020","unstructured":"Garrett, C. R., Lozano-P\u00e9rez, T., & Kaelbling, L. P. (2020). Pddlstream: Integrating symbolic planners and blackbox samplers via optimistic adaptive planning. Proceedings of the International Conference on Automated Planning and Scheduling, 30(1), 440\u2013448.","journal-title":"Proceedings of the International Conference on Automated Planning and Scheduling"},{"key":"10135_CR13","unstructured":"Gu, X., Lin, T. Y., Kuo, W., & Cui, Y. (2022). Open-vocabulary object detection via vision and language knowledge distillation. In International conference on learning representations."},{"key":"10135_CR14","doi-asserted-by":"crossref","unstructured":"Gupta, T., & Kembhavi, A. (2022). Visual programming: Compositional visual reasoning without training. 
arXiv preprint arXiv:2211.11559","DOI":"10.1109\/CVPR52729.2023.01436"},{"issue":"1","key":"10135_CR15","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1613\/jair.1705","volume":"26","author":"M Helmert","year":"2006","unstructured":"Helmert, M. (2006). The fast downward planning system. Journal of Artificial Intelligence Research, 26(1), 191\u2013246.","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"3","key":"10135_CR16","first-page":"57","volume":"22","author":"J Hoffmann","year":"2001","unstructured":"Hoffmann, J. (2001). Ff: The fast-forward planning system. AI Magazine, 22(3), 57.","journal-title":"AI Magazine"},{"key":"10135_CR17","unstructured":"Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2020). The curious case of neural text degeneration. In International conference on learning representations."},{"key":"10135_CR18","unstructured":"Huang, W., Abbeel, P., Pathak, D., & Mordatch, I. (2022). Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. arXiv preprint arXiv:2201.07207"},{"key":"10135_CR19","unstructured":"Huang, W., Xia, F., Shah, D., Driess, D., Zeng, A., Lu, Y., others (2023). Grounded decoding: Guiding text generation with grounded models for robot control. arXiv preprint arXiv:2303.00855"},{"key":"10135_CR20","unstructured":"Huang, W., Xia, F., Xiao, T., Chan, H., Liang, J., Florence, P., & Ichter, B. (2022). Inner monologue: Embodied reasoning through planning with language models. arxiv preprint arxiv:2207.05608."},{"key":"10135_CR21","doi-asserted-by":"crossref","unstructured":"Jansen, P. (2020). Visually-grounded planning without vision: Language models infer detailed plans from high-level instructions. In Findings of the association for computational linguistics: Emnlp 2020 (pp. 4412\u20134417). Online: Association for Computational Linguistics.","DOI":"10.18653\/v1\/2020.findings-emnlp.395"},{"key":"10135_CR22","unstructured":"Jiang, Y., Gu, S. 
S., Murphy, K. P., & Finn, C. (2019). Language as an abstraction for hierarchical deep reinforcement learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d\u2019 Alch\u00e9-Buc, E. Fox, & R. Garnett (Eds.), Advances in neural information processing systems.  (vol. 32). Curran Associates, Inc."},{"key":"10135_CR23","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Zhang, S., Khandelwal, P., & Stone, P. (2018). Task planning in robotics: An empirical comparison of pddl-based and asp-based systems. arXiv.","DOI":"10.1631\/FITEE.1800514"},{"key":"10135_CR24","unstructured":"Kurutach, T., Tamar, A., Yang, G., Russell, S. J., & Abbeel, P. (2018). Learning plannable representations with causal infogan. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems  (vol. 31). Curran Associates, Inc."},{"key":"10135_CR25","unstructured":"Li, S., Puig, X., Paxton, C., Du, Y., Wang, C., Fan, L., & Zhu, Y. (2022). Pre-trained language models for interactive decision-making. arXiv."},{"key":"10135_CR26","doi-asserted-by":"crossref","unstructured":"Liang, J., Huang, W., Xia, F., Xu, P., Hausman, K., Ichter, B., & Zeng, A. (2023). Code as policies: Language model programs for embodied control.","DOI":"10.1109\/ICRA48891.2023.10160591"},{"key":"10135_CR27","unstructured":"Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv."},{"key":"10135_CR28","doi-asserted-by":"crossref","unstructured":"Luong, T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 1412\u20131421). 
Association for Computational Linguistics.","DOI":"10.18653\/v1\/D15-1166"},{"key":"10135_CR29","unstructured":"Mai, J., Chen, J., Li, B., Qian, G., Elhoseiny, M., & Ghanem, B. (2023). Llm as a robotic brain: Unifying egocentric memory and control. arXiv preprint arXiv:2304.09349"},{"key":"10135_CR30","unstructured":"Mirchandani, S., Karamcheti, S., & Sadigh, D. (2021). Ella: Exploration through learned language abstraction. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in neural information processing systems (vol.\u00a034, pp. 29529\u201329540). Curran Associates, Inc."},{"key":"10135_CR31","unstructured":"Nair, S., & Finn, C. (2020). Hierarchical foresight: Self-supervised learning of long-horizon tasks via visual subgoal generation. In International conference on learning representations."},{"key":"10135_CR32","unstructured":"OpenAI (2023). Gpt-4 technical report. arXiv."},{"key":"10135_CR33","unstructured":"Patel, R., & Pavlick, E. (2022). Mapping language models to grounded conceptual spaces. In International conference on learning representations."},{"key":"10135_CR34","doi-asserted-by":"crossref","unstructured":"Puig, X., Ra, K., Boben, M., Li, J., Wang, T., Fidler, S., & Torralba, A. (2018). Virtualhome: Simulating household activities via programs. In 2018 IEEE\/cvf conference on computer vision and pattern recognition (pp. 8494\u20138502).","DOI":"10.1109\/CVPR.2018.00886"},{"key":"10135_CR35","unstructured":"Shah, D., Toshev, A. T., Levine, S., & brian ichter. (2022). Value function spaces: Skill-centric state abstractions for long-horizon reasoning. In International conference on learning representations."},{"key":"10135_CR36","doi-asserted-by":"crossref","unstructured":"Sharma, P., Torralba, A., & Andreas, J. (2022). Skill induction and planning with latent language. In Proceedings of the 60th annual meeting of the association for computational linguistics (volume 1: Long papers) (pp. 1713\u20131726). 
Association for Computational Linguistics.","DOI":"10.18653\/v1\/2022.acl-long.120"},{"key":"10135_CR37","doi-asserted-by":"crossref","unstructured":"Shridhar, M., Thomason, J., Gordon, D., Bisk, Y., Han, W., Mottaghi, R., & Fox, D. (2020). ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks. In The IEEE conference on computer vision and pattern recognition (cvpr).","DOI":"10.1109\/CVPR42600.2020.01075"},{"key":"10135_CR38","unstructured":"Silver, T., Chitnis, R., Kumar, N., McClinton, W., Lozano-Perez, T., Kaelbling, L. P., & Tenenbaum, J. (2022). Inventing relational state and action abstractions for effective and efficient bilevel planning. In The multi-disciplinary conference on reinforcement learning and decision making (rldm)."},{"key":"10135_CR39","unstructured":"Skreta, M., Yoshikawa, N., Arellano-Rubach, S., Ji, Z., Kristensen, L. B., Darvish, K., & Garg, A. (2023). Errors are useful prompts: Instruction guided task programming with verifier-assisted iterative prompting. arXiv preprint arXiv:2303.14100"},{"key":"10135_CR40","unstructured":"Srinivas, A., Jabri, A., Abbeel, P., Levine, S., & Finn, C. (2018). Universal planning networks: Learning generalizable representations for visuomotor control. In J. Dy, & A. Krause (Eds.), Proceedings of the 35th international conference on machine learning (vol.\u00a080, pp. 4732\u20134741). PMLR."},{"key":"10135_CR41","doi-asserted-by":"crossref","unstructured":"Sundermeyer, M., Mousavian, A., Triebel, R., & Fox, D. (2021). Contact-graspnet: Efficient 6-dof grasp generation in cluttered scenes. In 2021 IEEE international conference on robotics and automation (icra) (pp. 13438\u201313444).","DOI":"10.1109\/ICRA48506.2021.9561877"},{"key":"10135_CR42","unstructured":"Vemprala, S., Bonatti, R., Bucker, A., & Kapoor, A. (2023). Chatgpt for robotics: Design principles and model abilities. 
2023"},{"key":"10135_CR43","unstructured":"Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., & Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. arXiv."},{"key":"10135_CR44","doi-asserted-by":"crossref","unstructured":"Wiseman, S., Shieber, S., & Rush, A. (2017). Challenges in data-to-document generation. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 2253\u20132263). Association for Computational Linguistics.","DOI":"10.18653\/v1\/D17-1239"},{"key":"10135_CR45","unstructured":"Xie, Y., Yu, C., Zhu, T., Bai, J., Gong, Z., & Soh, H. (2023a). Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128"},{"key":"10135_CR46","unstructured":"Xie, Y., Yu, C., Zhu, T., Bai, J., Gong, Z., & Soh, H. (2023b). Translating natural language to planning goals with large-language models."},{"key":"10135_CR47","unstructured":"Xu, D., Mart\u00edn-Mart\u00edn, R., Huang, D. A., Zhu, Y., Savarese, S., & Fei-Fei, L. F. (2019). Regression planning networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d\u2019 Alch\u00e9-Buc, E. Fox, & R. Garnett (Eds.), Advances in neural information processing systems  (vol. 32). Curran Associates, Inc."},{"key":"10135_CR48","doi-asserted-by":"crossref","unstructured":"Xu, D., Nair, S., Zhu, Y., Gao, J., Garg, A., Fei-Fei, L., & Savarese, S. (2018). Neural task programming: Learning to generalize across hierarchical tasks. In 2018 IEEE international conference on robotics and automation (icra) (pp. 3795\u20133802).","DOI":"10.1109\/ICRA.2018.8460689"},{"key":"10135_CR49","unstructured":"Zeng, A., Attarian, M., Ichter, B., Choromanski, K., Wong, A., Welker, S., & Florence, P. (2022). Socratic models: Composing zero-shot multimodal reasoning with language. arXiv"},{"key":"10135_CR50","doi-asserted-by":"crossref","unstructured":"Zhu, Y., Tremblay, J., Birchfield, S., & Zhu, Y. (2020). 
Hierarchical planning for long-horizon manipulation with geometric and symbolic scene graphs. arXiv.","DOI":"10.1109\/ICRA48506.2021.9561548"}],"container-title":["Autonomous Robots"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-023-10135-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10514-023-10135-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10514-023-10135-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,28]],"date-time":"2023-11-28T18:09:24Z","timestamp":1701194964000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10514-023-10135-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,28]]},"references-count":50,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2023,12]]}},"alternative-id":["10135"],"URL":"https:\/\/doi.org\/10.1007\/s10514-023-10135-3","relation":{},"ISSN":["0929-5593","1573-7527"],"issn-type":[{"value":"0929-5593","type":"print"},{"value":"1573-7527","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,8,28]]},"assertion":[{"value":"1 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 August 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 August 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}