{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T12:19:32Z","timestamp":1768047572977,"version":"3.49.0"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T00:00:00Z","timestamp":1761609600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T00:00:00Z","timestamp":1761609600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100020963","name":"Moonshot Research and Development Program","doi-asserted-by":"publisher","award":["JPMJMS2032"],"award-info":[{"award-number":["JPMJMS2032"]}],"id":[{"id":"10.13039\/501100020963","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Intell Robot Syst"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Imitation learning (IL) is a promising approach for acquiring policies to automate robotic tasks from human demonstrations, leveraging the high cognitive capabilities of humans. However, applying IL to nonprehensile manipulation tasks involving invisible objects under partial observability, such as excavating buried rocks, remains challenging. In such settings, the demonstrator must make complex action decisions, including exploratory actions to locate the object and task-oriented actions to accomplish the task, while inferring the object\u2019s hidden state. This often leads to inconsistent demonstrations and imposes a high cognitive load. 
For these problems, insights from cognitive science suggest that encouraging demonstrators to follow simple, pre-designed exploration rules can help mitigate the problems of action inconsistency and high cognitive load. Accordingly, when performing IL from demonstrations guided by such exploration rules, it is crucial to imitate not only the demonstrator\u2019s task-oriented behavior but also his\/her mode-switching behavior (between exploration and task-oriented behavior) under partial observability. Based on the above considerations, this paper proposes a novel IL framework, called Belief Exploration-Action Cloning (BEAC), which employs a switching policy structure that integrates a pre-designed exploration policy with a task-oriented action policy trained on belief states estimated from past history. Through simulation and real-robot experiments, we demonstrate that BEAC achieves superior task performance, higher accuracy in mode and action prediction, and reduced demonstrator cognitive load, as confirmed by a user study.<\/jats:p>","DOI":"10.1007\/s10846-025-02326-0","type":"journal-article","created":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T07:22:33Z","timestamp":1761636153000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["BEAC: Imitating Complex Exploration and Task-oriented Behaviors for Invisible Object Nonprehensile 
Manipulation"],"prefix":"10.1007","volume":"111","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5883-9181","authenticated-orcid":false,"given":"Hirotaka","family":"Tahara","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3545-4814","authenticated-orcid":false,"given":"Takamitsu","family":"Matsubara","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2025,10,28]]},"reference":[{"key":"2326_CR1","unstructured":"Zhou, W., Jiang, B., Yang, F., Paxton, C., Held, D.: Hacman: Learning hybrid actor-critic maps for 6d non-prehensile manipulation. In: Conference on Robot Learning, pp. 1\u201325 (2023)"},{"issue":"3","key":"2326_CR2","doi-asserted-by":"publisher","first-page":"172988142110073","DOI":"10.1177\/17298814211007305","volume":"18","author":"T Zhang","year":"2021","unstructured":"Zhang, T., Mo, H.: Reinforcement learning for robot research: A comprehensive review and open issues. Int. J. Adv. Rob. Syst. 18(3), 17298814211007304 (2021)","journal-title":"Int. J. Adv. Rob. Syst."},{"key":"2326_CR3","doi-asserted-by":"crossref","unstructured":"Elguea-Aguinaco, \u00cd., Serrano-Mu\u00f1oz, A., Chrysostomou, D., Inziarte-Hidalgo, I., B\u00f8gh, S., Arana-Arexolaleiba, N.: A review on reinforcement learning for contact-rich robotic manipulation tasks. Robot. Comput.-Integr. Manuf. 81, 102517 (2023)","DOI":"10.1016\/j.rcim.2022.102517"},{"key":"2326_CR4","doi-asserted-by":"crossref","unstructured":"Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., Peters, J., et al.: An Algorithmic Perspective on Imitation Learning. Found. Trends\u00ae Robot. 
7(1-2), 1\u2013179 (2018)","DOI":"10.1561\/2300000053"},{"key":"2326_CR5","doi-asserted-by":"crossref","unstructured":"Zhang, T., McCarthy, Z., Jow, O., Lee, D., Chen, X., Goldberg, K., Abbeel, P.: Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. In: IEEE International Conference on Robotics and Automation, pp. 5628\u20135635 (2018)","DOI":"10.1109\/ICRA.2018.8461249"},{"key":"2326_CR6","doi-asserted-by":"crossref","unstructured":"Zare, M., Kebria, P.M., Khosravi, A., Nahavandi, S.: A survey of imitation learning: Algorithms, recent developments, and challenges. IEEE Trans. Cybern. (2024)","DOI":"10.1109\/TCYB.2024.3395626"},{"key":"2326_CR7","doi-asserted-by":"crossref","unstructured":"Zhao, T.Z., Kumar, V., Levine, S., Finn, C.: Learning fine-grained bimanual manipulation with low-cost hardware (2023). arXiv preprint arXiv:2304.13705","DOI":"10.15607\/RSS.2023.XIX.016"},{"key":"2326_CR8","doi-asserted-by":"crossref","unstructured":"Buamanee, T., Kobayashi, M., Uranishi, Y., Takemura, H.: Bi-act: Bilateral control-based imitation learning via action chunking with transformer. In: 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), pp. 410\u2013415 (2024). IEEE","DOI":"10.1109\/AIM55361.2024.10637173"},{"key":"2326_CR9","unstructured":"Subramanian, K., Isbell\u00a0Jr, C.L., Thomaz, A.L.: Exploration from Demonstration for Interactive Reinforcement Learning. In: International Conference on Autonomous Agents & Multiagent Systems, pp. 447\u2013456 (2016)"},{"key":"2326_CR10","doi-asserted-by":"crossref","unstructured":"Oh, H., Sasaki, H., Michael, B., Matsubara, T.: Bayesian Disturbance Injection: Robust Imitation Learning of Flexible Policies. In: IEEE International Conference on Robotics and Automation, pp. 
8629\u20138635 (2021)","DOI":"10.1109\/ICRA48506.2021.9561573"},{"key":"2326_CR11","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1016\/j.chb.2014.03.047","volume":"36","author":"JA Alvarado-Valencia","year":"2014","unstructured":"Alvarado-Valencia, J.A., Barrero, L.H.: Reliance, trust and heuristics in judgmental forecasting. Comput. Hum. Behav. 36, 102\u2013113 (2014)","journal-title":"Comput. Hum. Behav."},{"issue":"1\u20132","key":"2326_CR12","first-page":"99","volume":"101","author":"LP Kaelbling","year":"1998","unstructured":"Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artif. Intell. 101(1\u20132), 99\u2013134 (1998)","journal-title":"Artif. Intell."},{"key":"2326_CR13","unstructured":"Gangwani, T., Lehman, J., Liu, Q., Peng, J.: Learning Belief Representations for Imitation Learning in POMDPs. In: Uncertainty in Artificial Intelligence, pp. 1061\u20131071 (2020)"},{"key":"2326_CR14","doi-asserted-by":"crossref","unstructured":"Katayama, T.: Subspace Methods for System Identification. Springer, London (2005)","DOI":"10.1007\/1-84628-158-X"},{"issue":"1","key":"2326_CR15","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1016\/j.ifacol.2023.02.018","volume":"56","author":"K Yamada","year":"2023","unstructured":"Yamada, K., Maruta, I., Fujimoto, K.: Subspace State-Space Identification of Nonlinear Dynamical System Using Deep Neural Network with a Bottleneck. IFAC-PapersOnLine. 56(1), 102\u2013107 (2023)","journal-title":"IFAC-PapersOnLine."},{"key":"2326_CR16","doi-asserted-by":"crossref","unstructured":"Hart, S.G.: Nasa-Task Load Index (NASA-TLX); 20 Years Later. In: Human Factors and Ergonomics Society Annual Meeting, vol. 50, pp. 
904\u2013908 (2006)","DOI":"10.1177\/154193120605000909"},{"key":"2326_CR17","doi-asserted-by":"crossref","unstructured":"Tahara, H., Sasaki, H., Oh, H., Michael, B., Matsubara, T.: Disturbance\u2013injected Robust Imitation Learning with Task Achievement. In: IEEE International Conference on Robotics and Automation, pp. 2466\u20132472 (2022)","DOI":"10.1109\/ICRA46639.2022.9812376"},{"issue":"5","key":"2326_CR18","doi-asserted-by":"publisher","first-page":"2724","DOI":"10.1109\/LRA.2023.3260586","volume":"8","author":"H Tahara","year":"2023","unstructured":"Tahara, H., Sasaki, H., Oh, H., Anarossi, E., Matsubara, T.: Disturbance injection under partial automation: Robust imitation learning for long-horizon tasks. IEEE Robot. Autom. Lett. 8(5), 2724\u20132731 (2023)","journal-title":"IEEE Robot. Autom. Lett."},{"issue":"1","key":"2326_CR19","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/TRO.2022.3200138","volume":"39","author":"M Lauri","year":"2022","unstructured":"Lauri, M., Hsu, D., Pajarinen, J.: Partially observable markov decision processes in robotics: A survey. IEEE Trans. Rob. 39(1), 21\u201340 (2022)","journal-title":"IEEE Trans. Rob."},{"key":"2326_CR20","unstructured":"Igl, M., Zintgraf, L., Le, T.A., Wood, F., Whiteson, S.: Deep Variational Reinforcement Learning for POMDPs. In: International Conference on Machine Learning, pp. 2117\u20132126 (2018)"},{"key":"2326_CR21","unstructured":"Zintgraf, L., Shiarlis, K., Igl, M., Schulze, S., Gal, Y., Hofmann, K., Whiteson, S.: Varibad: a very good method for bayes-adaptive deep rl via meta-learning. In: International Conference on Learning Representations (2020)"},{"key":"2326_CR22","doi-asserted-by":"crossref","unstructured":"Dang, H., Weisz, J., Allen, P.K.: Blind Grasping: Stable Robotic Grasping Using Tactile Feedback and Hand Kinematics. In: IEEE International Conference on Robotics and Automation, pp. 
5917\u20135922 (2011)","DOI":"10.1109\/ICRA.2011.5979679"},{"key":"2326_CR23","doi-asserted-by":"crossref","unstructured":"Felip, J., Bernab\u00e9, J., Morales, A.: Contact-based blind grasping of unknown objects. In: International Conference on Humanoid Robots, pp. 396\u2013401 (2012)","DOI":"10.1109\/HUMANOIDS.2012.6651550"},{"key":"2326_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.conengprac.2019.104136","volume":"92","author":"W Shaw-Cortez","year":"2019","unstructured":"Shaw-Cortez, W., Oetomo, D., Manzie, C., Choong, P.: Robust Object Manipulation for Tactile-based Blind Grasping. Control. Eng. Pract. 92, 104136 (2019)","journal-title":"Control. Eng. Pract."},{"issue":"2","key":"2326_CR25","doi-asserted-by":"publisher","first-page":"2232","DOI":"10.1109\/LRA.2020.2970622","volume":"5","author":"Y Yang","year":"2020","unstructured":"Yang, Y., Liang, H., Choi, C.: A deep learning approach to grasping the invisible. IEEE Robot. Autom. Lett. 5(2), 2232\u20132239 (2020)","journal-title":"IEEE Robot. Autom. Lett."},{"key":"2326_CR26","doi-asserted-by":"crossref","unstructured":"Murali, A., Li, Y., Gandhi, D., Gupta, A.: Learning to Grasp without Seeing. In: International Symposium on Experimental Robotics, pp. 375\u2013386 (2020)","DOI":"10.1007\/978-3-030-33950-0_33"},{"issue":"2","key":"2326_CR27","doi-asserted-by":"publisher","first-page":"3507","DOI":"10.1109\/LRA.2022.3146915","volume":"7","author":"S Zhong","year":"2022","unstructured":"Zhong, S., Fazeli, N., Berenson, D.: Soft Tracking Using Contacts for Cluttered Objects to Perform Blind Object Retrieval. IEEE Robot. Autom. Lett. 7(2), 3507\u20133514 (2022)","journal-title":"IEEE Robot. Autom. 
Lett."},{"issue":"4","key":"2326_CR28","doi-asserted-by":"publisher","first-page":"6670","DOI":"10.1109\/LRA.2020.3013848","volume":"5","author":"A Kadian","year":"2020","unstructured":"Kadian, A., Truong, J., Gokaslan, A., Clegg, A., Wijmans, E., Lee, S., Savva, M., Chernova, S., Batra, D.: Sim2real predictivity: Does evaluation in simulation predict real-world performance? IEEE Robot. Autom. Lett. 5(4), 6670\u20136677 (2020)","journal-title":"IEEE Robot. Autom. Lett."},{"key":"2326_CR29","doi-asserted-by":"crossref","unstructured":"Kadokawa, Y., Tahara, H., Matsubara, T.: Progressive-resolution policy distillation: Leveraging coarse-resolution simulations for time-efficient fine-resolution policy learning. IEEE Trans. Autom. Sci. Eng. (2025)","DOI":"10.1109\/TASE.2025.3590068"},{"key":"2326_CR30","unstructured":"Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A.C., Bengio, Y.: A Recurrent Latent Variable Model for Sequential Data. Adv. Neural. Inf. Process. Syst. 28, 2980\u20132988 (2015)"},{"key":"2326_CR31","unstructured":"Nguyen, K.: Imitation learning with recurrent neural networks pp. 1\u20135 (2016). arXiv preprint arXiv:1607.05241"},{"key":"2326_CR32","unstructured":"Poole, B., Ozair, S., Van Den\u00a0Oord, A., Alemi, A., Tucker, G.: On Variational Bounds of Mutual Information. In: International Conference on Machine Learning, pp. 5171\u20135180 (2019)"},{"key":"2326_CR33","unstructured":"Esslinger, K., Platt, R., Amato, C.: Deep Transformer Q-Networks for Partially Observable Reinforcement Learning (2022). arXiv preprint arXiv:2206.01078"},{"key":"2326_CR34","unstructured":"Lu, C., Shi, R., Liu, Y., Hu, K., Du, S.S., Xu, H.: Rethinking transformers in solving pomdps (2024). arXiv preprint arXiv:2405.17358"},{"key":"2326_CR35","unstructured":"PyTorch Foundation: API LSTM. 
https:\/\/docs.pytorch.org\/docs\/stable\/generated\/torch.nn.LSTM.html"},{"key":"2326_CR36","first-page":"75012","volume":"37","author":"A Shai","year":"2024","unstructured":"Shai, A., Teixeira, L., Oldenziel, A., Marzen, S., Riechers, P.: Transformers represent belief state geometry in their residual stream. Adv. Neural. Inf. Process. Syst. 37, 75012\u201375034 (2024)","journal-title":"Adv. Neural. Inf. Process. Syst."},{"key":"2326_CR37","unstructured":"PyTorch Foundation: API Transformer. https:\/\/docs.pytorch.org\/docs\/stable\/generated\/torch.nn.Transformer.html"},{"issue":"11","key":"2326_CR38","first-page":"2579","volume":"9","author":"L Maaten","year":"2008","unstructured":"Maaten, L., Hinton, G.: Visualizing Data using t-SNE. J. Mach. Learn. Res. 9(11), 2579\u20132605 (2008)","journal-title":"J. Mach. Learn. Res."},{"issue":"2","key":"2326_CR39","doi-asserted-by":"publisher","first-page":"2491","DOI":"10.1109\/LRA.2020.2972891","volume":"5","author":"FE Sotiropoulos","year":"2020","unstructured":"Sotiropoulos, F.E., Asada, H.H.: Autonomous Excavation of Rocks Using a Gaussian Process Model and Unscented Kalman Filter. IEEE Robot. Autom. Lett. 5(2), 2491\u20132497 (2020)","journal-title":"IEEE Robot. Autom. Lett."},{"key":"2326_CR40","doi-asserted-by":"crossref","unstructured":"Tsai, Y., Guo, Y., Yang, G.: Unsupervised Task Segmentation Approach for Bimanual Surgical Tasks using Spatiotemporal and Variance Properties. In: International Conference on Intelligent Robots and Systems, pp. 1\u20137 (2019)","DOI":"10.1109\/IROS40897.2019.8968016"},{"key":"2326_CR41","unstructured":"Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 
627\u2013635 (2011)"},{"key":"2326_CR42","unstructured":"Hoque, R., Balakrishna, A., Novoseller, E., Wilcox, A., Brown, D.S., Goldberg, K.: Thriftydagger: Budget-aware novelty and risk gating for interactive imitation learning. In: Conference on Robot Learning, pp. 598\u2013608 (2022)"}],"container-title":["Journal of Intelligent & Robotic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10846-025-02326-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10846-025-02326-0","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10846-025-02326-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T07:49:29Z","timestamp":1768031369000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10846-025-02326-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,28]]},"references-count":42,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["2326"],"URL":"https:\/\/doi.org\/10.1007\/s10846-025-02326-0","relation":{},"ISSN":["1573-0409"],"issn-type":[{"value":"1573-0409","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,28]]},"assertion":[{"value":"4 April 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 October 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 October 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics Approval"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Participate"}},{"value":"Not applicable","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to Publish"}},{"value":"The authors have no relevant financial or non-financial interests to disclose.","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflicts of Interest"}}],"article-number":"118"}}