{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:29:54Z","timestamp":1772119794297,"version":"3.50.1"},"reference-count":46,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2024,10,4]],"date-time":"2024-10-04T00:00:00Z","timestamp":1728000000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,10,4]],"date-time":"2024-10-04T00:00:00Z","timestamp":1728000000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Process Lett"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Deep reinforcement learning has significantly advanced robot manipulations by providing an alternative solution for designing control strategies using raw images as direct inputs. While images offer additional environmental information, the end-to-end policy training manner (from image to action) requires simultaneous representation and task learning by the agent. This often necessitates a substantial number of interaction samples to achieve satisfactory policy performance. Previous works have attempted to address this challenge by learning a visual representation model that encodes the entire image into a low-dimensional vector before the policy training. However, since this vector contains both robot and object information, it inevitably introduces coupling within the state, which can mislead the policy training process. In this study, a novel method called Reinforcement Learning with Decoupled State Representation is proposed to effectively decouple robot and object information within the state representation. 
Experimental results demonstrate that the proposed method exhibits faster learning speed and achieves superior performance compared to previous methods across various robot manipulation tasks. Moreover, with only 3096 offline images, the proposed method successfully applies to real-world robot pushing tasks, which demonstrates its high practicability.<\/jats:p>","DOI":"10.1007\/s11063-024-11650-9","type":"journal-article","created":{"date-parts":[[2024,10,4]],"date-time":"2024-10-04T06:02:24Z","timestamp":1728021744000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Reinforcement Learning with Decoupled State Representation for Robot Manipulations"],"prefix":"10.1007","volume":"56","author":[{"given":"Kun","family":"Dong","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yu","family":"Zeng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kun","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongle","family":"Luo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuxin","family":"Wang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Erkang","family":"Cheng","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiyong","family":"Sun","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Qiang","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Bo","family":"Song","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,10,4]]},"reference":[{"issue":"7540","
key":"11650_CR1","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529\u2013533","journal-title":"Nature"},{"key":"11650_CR2","unstructured":"Hu S, Zhu F, Chang X, Liang X (2021) UPDeT: universal multi-agent reinforcement learning via policy decoupling with transformers. arXiv preprint arXiv:2101.08001"},{"issue":"1","key":"11650_CR3","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/TETCI.2018.2823329","volume":"3","author":"K Shao","year":"2018","unstructured":"Shao K, Zhu Y, Zhao D (2018) StarCraft micromanagement with reinforcement learning and curriculum transfer learning. IEEE Trans Emerg Top Comput Intell 3(1):73\u201384","journal-title":"IEEE Trans Emerg Top Comput Intell"},{"issue":"7676","key":"11650_CR4","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/nature24270","volume":"550","author":"D Silver","year":"2017","unstructured":"Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of Go without human knowledge. Nature 550(7676):354\u2013359","journal-title":"Nature"},{"issue":"6419","key":"11650_CR5","doi-asserted-by":"publisher","first-page":"1140","DOI":"10.1126\/science.aar6404","volume":"362","author":"D Silver","year":"2018","unstructured":"Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, Lanctot M, Sifre L, Kumaran D, Graepel T et al (2018) A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. 
Science 362(6419):1140\u20131144","journal-title":"Science"},{"issue":"6","key":"11650_CR6","doi-asserted-by":"publisher","first-page":"1335","DOI":"10.1109\/TETCI.2022.3166555","volume":"6","author":"T Bonjour","year":"2022","unstructured":"Bonjour T, Haliem M, Alsalem A, Thomas S, Li H, Aggarwal V, Kejriwal M, Bhargava B (2022) Decision making in Monopoly using a hybrid deep reinforcement learning approach. IEEE Trans Emerg Top Comput Intell 6(6):1335\u20131344","journal-title":"IEEE Trans Emerg Top Comput Intell"},{"issue":"6337","key":"11650_CR7","doi-asserted-by":"publisher","first-page":"508","DOI":"10.1126\/science.aam6960","volume":"356","author":"M Morav\u010d\u00edk","year":"2017","unstructured":"Morav\u010d\u00edk M, Schmid M, Burch N, Lis\u1ef3 V, Morrill D, Bard N, Davis T, Waugh K, Johanson M, Bowling M (2017) DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356(6337):508\u2013513","journal-title":"Science"},{"issue":"6456","key":"11650_CR8","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1126\/science.aay2400","volume":"365","author":"N Brown","year":"2019","unstructured":"Brown N, Sandholm T (2019) Superhuman AI for multiplayer poker. Science 365(6456):885\u2013890","journal-title":"Science"},{"issue":"7","key":"11650_CR9","doi-asserted-by":"publisher","first-page":"9467","DOI":"10.1007\/s11063-023-11209-0","volume":"55","author":"Y Kong","year":"2023","unstructured":"Kong Y, Shi H, Wu X, Rui Y (2023) Application of DQN-IRL framework in Doudizhu\u2019s sparse reward. Neural Process Lett 55(7):9467\u20139482","journal-title":"Neural Process Lett"},{"key":"11650_CR10","doi-asserted-by":"publisher","first-page":"57","DOI":"10.1016\/j.image.2014.10.009","volume":"30","author":"N Ponomarenko","year":"2015","unstructured":"Ponomarenko N, Jin L, Ieremeiev O, Lukin V, Egiazarian K, Astola J, Vozel B, Chehdi K, Carli M, Battisti F et al (2015) Image database TID2013: peculiarities, results and perspectives. 
Signal Process Image Commun 30:57\u201377","journal-title":"Signal Process Image Commun"},{"key":"11650_CR11","doi-asserted-by":"crossref","unstructured":"Zhang R, Isola P, Efros AA, Shechtman E, Wang O (2018) The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 586\u2013595","DOI":"10.1109\/CVPR.2018.00068"},{"key":"11650_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2021.107952","volume":"116","author":"J Zhang","year":"2021","unstructured":"Zhang J, Cao Y, Wu Q (2021) Vector of locally and adaptively aggregated descriptors for image feature representation. Pattern Recognit 116:107952","journal-title":"Pattern Recognit"},{"key":"11650_CR13","doi-asserted-by":"crossref","unstructured":"Dong K, Luo Y, Cheng E, Sun Z, Zhao L, Zhang Q, Zhou C, Song B (2022) Balance between efficient and effective learning: Dense2Sparse reward shaping for robot manipulation with environment uncertainty. In: International conference on advanced intelligent mechatronics, pp 1192\u20131198","DOI":"10.1109\/AIM52237.2022.9863259"},{"key":"11650_CR14","unstructured":"Kingma DP, Welling M (2013) Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114"},{"issue":"5","key":"11650_CR15","doi-asserted-by":"publisher","first-page":"3117","DOI":"10.1002\/int.22814","volume":"37","author":"J Zhang","year":"2022","unstructured":"Zhang J, Yang J, Yu J, Fan J (2022) Semisupervised image classification by mutual learning of multiple self-supervised models. Int J Intell Syst 37(5):3117\u20133141","journal-title":"Int J Intell Syst"},{"issue":"3","key":"11650_CR16","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1109\/MSP.2021.3134634","volume":"39","author":"L Ericsson","year":"2022","unstructured":"Ericsson L, Gouk H, Loy CC, Hospedales TM (2022) Self-supervised representation learning: introduction, advances, and challenges. 
IEEE Signal Process Mag 39(3):42\u201362","journal-title":"IEEE Signal Process Mag"},{"key":"11650_CR17","doi-asserted-by":"crossref","unstructured":"Lee MA, Zhu Y, Srinivasan K, Shah P, Savarese S, Fei-Fei L, Garg A, Bohg J (2019) Making sense of vision and touch: self-supervised learning of multimodal representations for contact-rich tasks. In: International conference on robotics and automation, pp 8943\u20138950","DOI":"10.1109\/ICRA.2019.8793485"},{"key":"11650_CR18","doi-asserted-by":"crossref","unstructured":"Kim T, Park Y, Park Y, Lee SH, Suh IH (2021) Acceleration of actor-critic deep reinforcement learning for visual grasping by state representation learning based on a preprocessed input image. In: International conference on intelligent robots and systems. IEEE, pp 198\u2013205","DOI":"10.1109\/IROS51168.2021.9635931"},{"key":"11650_CR19","doi-asserted-by":"publisher","DOI":"10.3389\/fnbot.2022.829437","volume":"16","author":"L Cong","year":"2022","unstructured":"Cong L, Liang H, Ruppel P, Shi Y, G\u00f6rner M, Hendrich N, Zhang J (2022) Reinforcement learning with vision-proprioception model for robot planar pushing. Front Neurorobot 16:829437","journal-title":"Front Neurorobot"},{"key":"11650_CR20","unstructured":"Nair AV, Pong V, Dalal M, Bahl S, Lin S, Levine S (2018) Visual reinforcement learning with imagined goals. Adv Neural Inf Process Syst 31. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2018\/file\/7ec69dd44416c46745f6edd947b470cd-Paper.pdf"},{"key":"11650_CR21","unstructured":"Pong V, Dalal M, Lin S, Nair A, Bahl S, Levine S (2020) Skew-fit: state-covering self-supervised reinforcement learning. 
In: International conference on machine learning, pp 7783\u20137792"},{"issue":"2","key":"11650_CR22","doi-asserted-by":"publisher","first-page":"3537","DOI":"10.1109\/LRA.2021.3064509","volume":"6","author":"J Huang","year":"2021","unstructured":"Huang J, Rojas J, Zimmer M, Wu H, Guan Y, Weng P (2021) Hyperparameter auto-tuning in self-supervised robotic learning. IEEE Robot Autom Lett 6(2):3537\u20133544","journal-title":"IEEE Robot Autom Lett"},{"key":"11650_CR23","first-page":"3483","volume":"28","author":"K Sohn","year":"2015","unstructured":"Sohn K, Lee H, Yan X (2015) Learning structured output representation using deep conditional generative models. Adv Neural Inf Process Syst 28:3483\u20133491","journal-title":"Adv Neural Inf Process Syst"},{"key":"11650_CR24","doi-asserted-by":"crossref","unstructured":"Joshi S, Kumra S, Sahin F (2020) Robotic grasping using deep reinforcement learning. In: International conference on automation science and engineering, pp 1461\u20131466","DOI":"10.1109\/CASE48305.2020.9216986"},{"key":"11650_CR25","doi-asserted-by":"crossref","unstructured":"Pinto L, Andrychowicz M, Welinder P, Zaremba W, Abbeel P (2018) Asymmetric actor critic for image-based robot learning. In: 14th Robotics: science and systems, RSS 2018. MIT Press Journals","DOI":"10.15607\/RSS.2018.XIV.008"},{"key":"11650_CR26","unstructured":"Agrawal P, Nair AV, Abbeel P, Malik J, Levine S (2016) Learning to poke by poking: experiential learning of intuitive physics. Adv Neural Inf Process Syst 29"},{"key":"11650_CR27","doi-asserted-by":"crossref","unstructured":"Finn C, Levine S (2017) Deep visual foresight for planning robot motion. 
In: IEEE international conference on robotics and automation, pp 2786\u20132793","DOI":"10.1109\/ICRA.2017.7989324"},{"issue":"5","key":"11650_CR28","doi-asserted-by":"publisher","first-page":"3979","DOI":"10.1007\/s11063-022-10796-8","volume":"54","author":"Y Lyu","year":"2022","unstructured":"Lyu Y, Shi Y, Zhang X (2022) Improving target-driven visual navigation with attention on 3D spatial relationships. Neural Process Lett 54(5):3979\u20133998","journal-title":"Neural Process Lett"},{"key":"11650_CR29","unstructured":"Wang Y, Gautham N, Lin X, Okorn B, Held D (2021) ROLL: visual self-supervised reinforcement learning with object reasoning. In: Conference on robot learning, pp 1030\u20131048"},{"key":"11650_CR30","doi-asserted-by":"crossref","unstructured":"Sermanet P, Lynch C, Chebotar Y, Hsu J, Jang E, Schaal S, Levine S (2018) Time-contrastive networks: self-supervised learning from video. In: International conference on robotics and automation, pp 1134\u20131141","DOI":"10.1109\/ICRA.2018.8462891"},{"key":"11650_CR31","unstructured":"Laskin M, Srinivas A, Abbeel P (2020) CURL: contrastive unsupervised representations for reinforcement learning. In: International conference on machine learning, pp 5639\u20135650"},{"key":"11650_CR32","unstructured":"Hafner D, Lillicrap T, Fischer I, Villegas R, Ha D, Lee H, Davidson J (2019) Learning latent dynamics for planning from pixels. In: International conference on machine learning, pp 2555\u20132565"},{"key":"11650_CR33","first-page":"741","volume":"33","author":"AX Lee","year":"2020","unstructured":"Lee AX, Nagabandi A, Abbeel P, Levine S (2020) Stochastic latent actor-critic: deep reinforcement learning with a latent variable model. Adv Neural Inf Process Syst 33:741\u2013752","journal-title":"Adv Neural Inf Process Syst"},{"key":"11650_CR34","unstructured":"Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. 
arXiv preprint arXiv:1804.02767"},{"key":"11650_CR35","first-page":"91","volume":"28","author":"S Ren","year":"2015","unstructured":"Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91\u201399","journal-title":"Adv Neural Inf Process Syst"},{"issue":"12","key":"11650_CR36","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1109\/TPAMI.2016.2644615","volume":"39","author":"V Badrinarayanan","year":"2017","unstructured":"Badrinarayanan V, Kendall A, Cipolla R (2017) SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 39(12):2481\u20132495","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"11650_CR37","doi-asserted-by":"crossref","unstructured":"He K, Gkioxari G, Doll\u00e1r P, Girshick R (2017) Mask R-CNN. In: International conference on computer vision, pp 2961\u20132969","DOI":"10.1109\/ICCV.2017.322"},{"issue":"5","key":"11650_CR38","doi-asserted-by":"publisher","first-page":"1343","DOI":"10.1109\/TRO.2021.3060341","volume":"37","author":"C Xie","year":"2021","unstructured":"Xie C, Xiang Y, Mousavian A, Fox D (2021) Unseen object instance segmentation for robotic environments. IEEE Trans Robot 37(5):1343\u20131359","journal-title":"IEEE Trans Robot"},{"key":"11650_CR39","unstructured":"Xiang Y, Xie C, Mousavian A, Fox D (2021) Learning RGB-D feature embeddings for unseen object instance segmentation. In: Conference on robot learning, pp 461\u2013470"},{"key":"11650_CR40","doi-asserted-by":"crossref","unstructured":"Zivkovic Z (2004) Improved adaptive Gaussian mixture model for background subtraction. 
In: International conference on pattern recognition, vol 2, pp 28\u201331","DOI":"10.1109\/ICPR.2004.1333992"},{"issue":"7","key":"11650_CR41","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1016\/j.patrec.2005.11.005","volume":"27","author":"Z Zivkovic","year":"2006","unstructured":"Zivkovic Z, van der Heijden F (2006) Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit Lett 27(7):773\u2013780","journal-title":"Pattern Recognit Lett"},{"key":"11650_CR42","unstructured":"Nair A, Bahl S, Khazatsky A, Pong V, Berseth G, Levine S (2020) Contextual imagined goals for self-supervised robotic learning. In: Conference on robot learning, pp 530\u2013539"},{"issue":"11","key":"11650_CR43","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y LeCun","year":"1998","unstructured":"LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324","journal-title":"Proc IEEE"},{"key":"11650_CR44","unstructured":"Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J, Kumar V, Zhu H, Gupta A, Abbeel P et al (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905"},{"key":"11650_CR45","unstructured":"Andrychowicz M, Wolski F, Ray A, Schneider J, Fong R, Welinder P, McGrew B, Tobin J, Abbeel P, Zaremba W (2017) Hindsight experience replay. Adv Neural Inf Process Syst 30. https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/453fadbd8a1a3af50a9df4df899537b5-Paper.pdf"},{"key":"11650_CR46","doi-asserted-by":"crossref","unstructured":"Todorov E, Erez T, Tassa Y (2012) MuJoCo: a physics engine for model-based control. 
In: International conference on intelligent robots and systems, pp 5026\u20135033","DOI":"10.1109\/IROS.2012.6386109"}],"container-title":["Neural Processing Letters"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11650-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11063-024-11650-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11063-024-11650-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,30]],"date-time":"2024-10-30T11:59:25Z","timestamp":1730289565000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11063-024-11650-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,10,4]]},"references-count":46,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2024,10]]}},"alternative-id":["11650"],"URL":"https:\/\/doi.org\/10.1007\/s11063-024-11650-9","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3117846\/v1","asserted-by":"object"}]},"ISSN":["1573-773X"],"issn-type":[{"value":"1573-773X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,10,4]]},"assertion":[{"value":"8 May 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 October 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"239"}}