{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T16:05:54Z","timestamp":1753891554105,"version":"3.41.2"},"reference-count":52,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T00:00:00Z","timestamp":1692835200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Neurorobot."],"abstract":"<jats:p>Traditional AI-planning methods for task planning in robotics require a symbolically encoded domain description. While powerful in well-defined scenarios, as well as human-interpretable, setting this up requires a substantial effort. Different from this, most everyday planning tasks are solved by humans intuitively, using mental imagery of the different planning steps. Here, we suggest that the same approach can be used for robots too, in cases which require only limited execution accuracy. In the current study, we propose a novel sub-symbolic method called Simulated Mental Imagery for Planning (SiMIP), which consists of perception, simulated action, success checking, and re-planning performed on 'imagined' images. We show that it is possible to implement mental imagery-based planning in an algorithmically sound way by combining regular convolutional neural networks and generative adversarial networks. With this method, the robot acquires the capability to use the initially existing scene to generate action plans without symbolic domain descriptions, while at the same time, plans remain human-interpretable, different from deep reinforcement learning, which is an alternative sub-symbolic approach. We create a data set from real scenes for a packing problem of having to correctly place different objects into different target slots. This way efficiency and success rate of this algorithm could be quantified.<\/jats:p>","DOI":"10.3389\/fnbot.2023.1218977","type":"journal-article","created":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T13:55:35Z","timestamp":1692885335000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Simulated mental imagery for robotic task planning"],"prefix":"10.3389","volume":"17","author":[{"given":"Shijia","family":"Li","sequence":"first","affiliation":[]},{"given":"Tomas","family":"Kulvicius","sequence":"additional","affiliation":[]},{"given":"Minija","family":"Tamosiunaite","sequence":"additional","affiliation":[]},{"given":"Florentin","family":"W\u00f6rg\u00f6tter","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,8,24]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1613\/jair.1.13754","article-title":"Deepsym: Deep symbol generation and rule learning for planning from unsupervised robot interaction","volume":"75","author":"Ahmetoglu","year":"2022","journal-title":"J. Artificial Intellig. Res"},{"key":"B2","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v32i1.12077","article-title":"\u201cClassical planning in deep latent space: Bridging the subsymbolic-symbolic boundary,\u201d","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Asai","year":"2018"},{"key":"B3","first-page":"2653","article-title":"\u201cA cloud service for robotic mental simulations,\u201d","author":"Bozcuoglu","year":"2017"},{"key":"B4","first-page":"35","article-title":"\u201cObject representations as fixed points: Training iterative refinement algorithms with implicit differentiation,\u201d","volume-title":"Advances in Neural Information Processing Systems","author":"Chang","year":"2022"},{"key":"B5","first-page":"6935","article-title":"\u201cCross-domain image captioning with discriminative finetuning,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Dess\u00ec","year":"2023"},{"key":"B6","first-page":"5882","article-title":"\u201cAffordancenet: An end-to-end deep learning approach for object affordance detection,\u201d","author":"Do","year":"2018","journal-title":"2018 IEEE International Conference on Robotics and Automation (ICRA)"},{"key":"B7","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2020.XVI.003","article-title":"Deep visual reasoning: Learning to predict action sequences for task and motion planning from an initial scene image","author":"Driess","year":"2020","journal-title":"arXiv"},{"key":"B8","article-title":"Visual foresight: Model-based deep reinforcement learning for vision-based robotic control","author":"Ebert","year":"2018","journal-title":"arXiv"},{"key":"B9","doi-asserted-by":"publisher","first-page":"189","DOI":"10.1016\/0004-3702(71)90010-5","article-title":"STRIPS: A new approach to the application of theorem proving to problem solving","volume":"2","author":"Fikes","year":"1971","journal-title":"Artif. Intell"},{"key":"B10","doi-asserted-by":"crossref","DOI":"10.4324\/9781315740218","volume-title":"The Ecological Approach to Visual Perception: Classic Edition","author":"Gibson","year":"2014"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1803.10122","article-title":"World models","author":"Ha","year":"2018","journal-title":"arXiv"},{"key":"B12","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2010.02193","article-title":"Mastering atari with discrete world models","author":"Hafner","year":"2020","journal-title":"arXiv"},{"key":"B13","first-page":"2555","article-title":"\u201cLearning latent dynamics for planning from pixels,\u201d","volume-title":"International Conference on Machine Learning","author":"Hafner","year":"2019"},{"key":"B14","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1613\/jair.1705","article-title":"The fast downward planning system","volume":"26","author":"Helmert","year":"2006","journal-title":"J. Artif. Intell. Res"},{"key":"B15","doi-asserted-by":"publisher","first-page":"71","DOI":"10.1016\/j.brainres.2011.06.026","article-title":"The current status of the simulation theory of cognition","volume":"1428","author":"Hesslow","year":"2012","journal-title":"Brain Res"},{"key":"B16","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1609\/aimag.v22i3.1572","article-title":"FF: The fast-forward planning system","volume":"22","author":"Hoffmann","year":"2001","journal-title":"AI Magazine"},{"key":"B17","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1016\/j.artint.2014.11.003","article-title":"Deliberation for autonomous robots: a survey","volume":"247","author":"Ingrand","year":"2017","journal-title":"Artif. Intell"},{"key":"B18","article-title":"\u201cAutonomous learning of object-centric abstractions for high-level planning,\u201d","volume-title":"International Conference on Learning Representations","author":"James","year":"2022"},{"key":"B19","first-page":"4401","article-title":"\u201cA style-based generator architecture for generative adversarial networks,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Karras","year":"2019"},{"key":"B20","first-page":"1231","article-title":"\u201cLearning to simulate dynamic environments with gamegan,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Kim","year":"2020"},{"key":"B21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.2200\/S00426ED1V01Y201206AIM017","article-title":"Planning with markov decision processes: An AI perspective","volume":"6","author":"Kolobov","year":"2012","journal-title":"Synth. Lect"},{"key":"B22","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1613\/jair.5575","article-title":"From skills to symbols: learning symbolic representations for abstract high-level planning","volume":"61","author":"Konidaris","year":"2018","journal-title":"J. Artif. Intell. Res"},{"key":"B23","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1049\/ccs.2018.0002","article-title":"Advantage of prediction and mental imagery for goal-directed behaviour in agents and robots","volume":"1","author":"Krichmar","year":"2019","journal-title":"Cogn. Comput"},{"key":"B24","first-page":"107","article-title":"\u201cSimulation-based temporal projection of everyday robot object manipulation,\u201d","volume-title":"The 10th International Conference on Autonomous Agents and Multiagent Systems","author":"Kunze","year":"2011"},{"key":"B25","doi-asserted-by":"publisher","DOI":"10.1126\/scirobotics.aau9354","article-title":"Task-agnostic self-modeling machines","author":"Kwiatkowski","year":"2019","journal-title":"Sci. Robot"},{"key":"B26","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1613\/jair.3093","article-title":"Planning with noisy probabilistic relational rules","volume":"39","author":"Lang","year":"2010","journal-title":"J. Artif. Intell. Res"},{"key":"B27","first-page":"1916","article-title":"\u201cMira: mental imagery for robotic affordances,\u201d","author":"Lin","year":"2023","journal-title":"Conference on Robot Learning"},{"key":"B28","first-page":"85","article-title":"\u201cImage inpainting for irregular holes using partial convolutions,\u201d","volume-title":"Proceedings of the European Conference on Computer Vision (ECCV)","author":"Liu","year":"2018"},{"key":"B29","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1016\/j.robot.2019.05.005","article-title":"Context-based affordance segmentation from 2D images for robot actions","volume":"119","author":"L\u00fcddecke","year":"2019","journal-title":"Rob. Auton. Syst"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1904.12584","article-title":"The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision","author":"Mao","year":"2019","journal-title":"arXiv"},{"key":"B31","article-title":"Hierarchical foresight: self-supervised learning of long-horizon tasks via visual subgoal generation","author":"Nair","year":"2019","journal-title":"arXiv"},{"key":"B32","article-title":"\u201cImagination-augmented agents for deep reinforcement learning,\u201d","volume-title":"Advances in Neural Information Processing Systems","author":"Racani\u00e8re","year":"2017"},{"key":"B33","first-page":"10684","article-title":"\u201cHigh-resolution image synthesis with latent diffusion models,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Rombach","year":"2022"},{"key":"B34","first-page":"234","article-title":"\u201cU-net: Convolutional networks for biomedical image segmentation,\u201d","volume-title":"International Conference on Medical Image Computing and Computer-Assisted Intervention","author":"Ronneberger","year":"2015"},{"key":"B35","doi-asserted-by":"crossref","first-page":"186","DOI":"10.1007\/978-3-642-33515-0_19","article-title":"\u201cControl by 3d simulation-a new robotics approach to control design in automation,\u201d","volume-title":"International Conference on Intelligent Robotics and Applications","author":"Rossmann","year":"2012"},{"key":"B36","doi-asserted-by":"publisher","first-page":"604","DOI":"10.1038\/s41586-020-03051-4","article-title":"Mastering atari, go, chess and shogi by planning with a learned model","volume":"588","author":"Schrittwieser","year":"2020","journal-title":"Nature"},{"key":"B37","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1007\/978-3-540-79547-6_42","article-title":"\u201cFunctional object class detection based on learned affordance cues,\u201d","volume-title":"International Conference on Computer Vision Systems","author":"Stark","year":"2008"},{"key":"B38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1803.11361","article-title":"Ddrprog: a clever differentiable dynamic reasoning programmer","author":"Suarez","year":"2018","journal-title":"arXiv"},{"key":"B39","first-page":"10781","article-title":"\u201cEfficientdet: Scalable and efficient object detection,\u201d","volume-title":"Proceedings of the IEEE\/CVF Conference On Computer Vision and Pattern Recognition","author":"Tan","year":"2020"},{"key":"B40","doi-asserted-by":"crossref","first-page":"2627","DOI":"10.1109\/ICRA.2015.7139553","article-title":"\u201cBottom-up learning of object categories, action effects and logical rules: from continuous manipulative exploration to symbolic planning,\u201d","volume-title":"2015 IEEE International Conference on Robotics and Automation (ICRA)","author":"Ugur","year":"2015"},{"key":"B41","first-page":"1439","article-title":"\u201cEntity abstraction in visual model-based reinforcement learning,\u201d","author":"Veerapaneni","year":"2020","journal-title":"Conference on Robot Learning"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.15607\/RSS.2019.XV.074","article-title":"Learning robotic manipulation through visual planning and acting","author":"Wang","year":"2019","journal-title":"arXiv"},{"key":"B43","doi-asserted-by":"publisher","first-page":"271","DOI":"10.1109\/LRA.2020.3039943","article-title":"Can I pour into it? Robot imagining open containability affordance of previously unseen objects via physical simulations","volume":"6","author":"Wu","year":"2020","journal-title":"IEEE Robot. Autom. Lett"},{"key":"B44","first-page":"699","article-title":"\u201cNeural scene de-rendering,\u201d","volume-title":"Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition","author":"Wu","year":"2017"},{"key":"B45","doi-asserted-by":"crossref","first-page":"6206","DOI":"10.1109\/ICRA48506.2021.9560841","article-title":"\u201cDeep affordance foresight: Planning through what can be done in the future,\u201d","author":"Xu","year":"2021","journal-title":"2021 IEEE International Conference on Robotics and Automation (ICRA)"},{"key":"B46","article-title":"\u201cRegression planning networks,\u201d","author":"Xu","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B47","article-title":"\u201cNeural-symbolic VQA: Disentangling reasoning from vision and language understanding,\u201d","author":"Yi","year":"2018","journal-title":"Advances in Neural Information Processing Systems"},{"key":"B48","first-page":"4471","article-title":"\u201cFree-form image inpainting with gated convolution,\u201d","author":"Yu","year":"2019","journal-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision"},{"key":"B49","first-page":"3784","article-title":"\u201cSelf-supervised scene de-occlusion,\u201d","author":"Zhan","year":"2020","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"B50","article-title":"\u201cDynamically constructed (PO) MDPs for adaptive robot planning,\u201d","author":"Zhang","year":"2017","journal-title":"Thirty-First AAAI Conference on Artificial Intelligence"},{"key":"B51","first-page":"408","article-title":"\u201cReasoning about object affordances in a knowledge base representation,\u201d","volume-title":"European Conference on Computer Vision","author":"Zhu","year":"2014"},{"key":"B52","doi-asserted-by":"crossref","first-page":"6541","DOI":"10.1109\/ICRA48506.2021.9561548","article-title":"\u201cHierarchical planning for long-horizon manipulation with geometric and symbolic scene graphs,\u201d","volume-title":"2021 IEEE International Conference on Robotics and Automation (ICRA)","author":"Zhu","year":"2021"}],"container-title":["Frontiers in Neurorobotics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1218977\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,24]],"date-time":"2023-08-24T13:56:00Z","timestamp":1692885360000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fnbot.2023.1218977\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,8,24]]},"references-count":52,"alternative-id":["10.3389\/fnbot.2023.1218977"],"URL":"https:\/\/doi.org\/10.3389\/fnbot.2023.1218977","relation":{},"ISSN":["1662-5218"],"issn-type":[{"type":"electronic","value":"1662-5218"}],"subject":[],"published":{"date-parts":[[2023,8,24]]},"article-number":"1218977"}}