{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T01:04:30Z","timestamp":1780707870568,"version":"3.54.1"},"reference-count":32,"publisher":"Cambridge University Press (CUP)","issue":"11","license":[{"start":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T00:00:00Z","timestamp":1651795200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Robotica"],"published-print":{"date-parts":[[2022,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Robotic systems are usually controlled to repetitively perform specific actions for manufacturing tasks. The traditional control methods are domain-dependent and model-dependent with cost of much human efforts. They cannot meet the new requirements of generality and flexibility in many areas such as intelligent manufacturing and customized production. This paper develops a general model-free approach to enable robots to perform multi-step object sorting tasks through deep reinforcement learning. Taking projected heightmap images from different time steps as input without extra high-level image analysis and understanding, critic models are designed to produce a pixel-wise Q value map for each type of action. It is a new trial to apply pixel-wise Q value-based critic networks to solve multi-step sorting tasks that involve many types of actions and complex action constraints. The experimental validations on simulated and realistic object sorting tasks demonstrate the effectiveness of the proposed approach. 
Qualitative results (videos), code for simulated and realistic experiments, and pre-trained models are available at <jats:uri xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/JiatongBao\/DRLSorting\">https:\/\/github.com\/JiatongBao\/DRLSorting<\/jats:uri><\/jats:p>","DOI":"10.1017\/s0263574722000650","type":"journal-article","created":{"date-parts":[[2022,5,6]],"date-time":"2022-05-06T05:59:14Z","timestamp":1651816754000},"page":"3878-3894","source":"Crossref","is-referenced-by-count":10,"title":["Learn multi-step object sorting tasks through deep reinforcement learning"],"prefix":"10.1017","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9476-5316","authenticated-orcid":false,"given":"Jiatong","family":"Bao","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Guoqing","family":"Zhang","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Yi","family":"Peng","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Zhiyu","family":"Shao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aiguo","family":"Song","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"56","published-online":{"date-parts":[[2022,5,6]]},"reference":[{"key":"S0263574722000650_ref15","doi-asserted-by":"crossref","unstructured":"[15] Chen, Y. , Ju, Z. and Yang, C. , \u201cCombining Reinforcement Learning and Rule-Based Method to Manipulate Objects in Clutter,\u201d In: International Joint Conference on Neural Networks, Glasgow, UK (2020) pp. 
1\u20136.","DOI":"10.1109\/IJCNN48605.2020.9207153"},{"key":"S0263574722000650_ref4","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574717000613"},{"key":"S0263574722000650_ref23","first-page":"2021","article-title":"A visual grasping strategy for improving assembly efficiency based on deep reinforcement learning","volume":"1","author":"Wang","year":"2021","journal-title":"J. Sens"},{"key":"S0263574722000650_ref12","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2021.01.077"},{"key":"S0263574722000650_ref28","doi-asserted-by":"crossref","unstructured":"[28] Deng, J. , Dong, W. , Socher, R. , Li, L.-J. , Li, K. and Fei-Fei, L. , \u201cImagenet: A Large-Scale Hierarchical Image Database,\u201d In: IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA (2009)\u00a0pp. 248\u2013255.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"S0263574722000650_ref13","doi-asserted-by":"crossref","unstructured":"[13] Tang, B. , Corsaro, M. , Konidaris, G. , Nikolaidis, S. and Tellex, S. , \u201cLearning Collaborative Pushing and Grasping Policies in Dense Clutter,\u201d In: International Conference on Robotics and Automation, Xi\u2019an, China (2021) pp. 6177\u20136184.","DOI":"10.1109\/ICRA48506.2021.9561828"},{"key":"S0263574722000650_ref14","doi-asserted-by":"crossref","unstructured":"[14] Joshi, S. , Kumra, S. and Sahin, F. , \u201cRobotic Grasping Using Deep Reinforcement Learning,\u201d In: IEEE International Conference on Automation Science and Engineering, Hong Kong, China (2020)\u00a0pp. 1461\u20131466.","DOI":"10.1109\/CASE48305.2020.9216986"},{"key":"S0263574722000650_ref16","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2020.3015448"},{"key":"S0263574722000650_ref24","doi-asserted-by":"crossref","unstructured":"[24] Ceola, F. , Tosello, E. , Tagliapietra, L. , Nicola, G. and Ghidoni, S. 
, \u201cRobot Task Planning Via Deep Reinforcement Learning: A Tabletop Object Sorting Application,\u201d In: IEEE International Conference on Systems, Man and Cybernetics, Bari, Italy (2019)\u00a0pp. 486\u2013492.","DOI":"10.1109\/SMC.2019.8914278"},{"key":"S0263574722000650_ref7","doi-asserted-by":"crossref","unstructured":"[7] Zeng, A. , Song, S. , Welker, S. , Lee, J. , Rodriguez, A. and Funkhouser, T. , \u201cLearning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning,\u201d In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, Madrid, Spain (2018)\u00a0pp. 633\u2013638.","DOI":"10.1109\/IROS.2018.8593986"},{"key":"S0263574722000650_ref27","doi-asserted-by":"crossref","unstructured":"[27] Huang, G. , Liu, Z. and Weinberger, K. Q. , \u201cDensely Connected Convolutional Networks,\u201d In: IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA (2017) pp. 2261\u20132269.","DOI":"10.1109\/CVPR.2017.243"},{"key":"S0263574722000650_ref20","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2019.01.087"},{"key":"S0263574722000650_ref22","doi-asserted-by":"publisher","DOI":"10.3390\/app10196923"},{"key":"S0263574722000650_ref1","doi-asserted-by":"crossref","unstructured":"[1] Jia, Y. , She, L. , Cheng, Y. , Bao, J. , Chai, J. Y. and Xi, N. , \u201cProgram Robots Manufacturing Tasks By Natural Language Instructions,\u201d In: IEEE International Conference on Automation Science and Engineering, Fort Worth, TX, USA (2016) \u00a0\u00a0pp. 633\u2013638.","DOI":"10.1109\/COASE.2016.7743461"},{"key":"S0263574722000650_ref18","first-page":"1","article-title":"Robotic manipulation skill acquisition via demonstration policy learning","author":"Liu","year":"2021","journal-title":"IEEE Trans. Cognit. Develop. Syst"},{"key":"S0263574722000650_ref2","doi-asserted-by":"publisher","DOI":"10.1177\/1729881419851619"},{"key":"S0263574722000650_ref29","unstructured":"[29] Nair, V. and Hinton, G. E. 
, \u201cRectified Linear Units Improve Restricted Boltzmann Machines,\u201d In: Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel (2010)\u00a0pp. 807\u2013814."},{"key":"S0263574722000650_ref19","doi-asserted-by":"crossref","unstructured":"[19] Schoettler, G. , Nair, A. , Luo, J. , Bahl, S. , Ojea, J. , Solowjow, E. and Levine, S. , \u201cDeep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards,\u201d In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, NV, USA (2020)\u00a0pp. 5548\u20135555.","DOI":"10.1109\/IROS45743.2020.9341714"},{"key":"S0263574722000650_ref6","doi-asserted-by":"crossref","unstructured":"[6] Haarnoja, T. , Pong, V. H. , Zhou, A. , Dalal, M. , Abbeel, P. and Levine, S. , \u201cComposable Deep Reinforcement Learning for Robotic Manipulation,\u201d In: IEEE International Conference on Robotics and Automation, Brisbane, QLD, Australia (2018)\u00a0pp. 6244\u20136251.","DOI":"10.1109\/ICRA.2018.8460756"},{"key":"S0263574722000650_ref32","doi-asserted-by":"crossref","unstructured":"[32] Rohmer, E. , Singh, S. P. N. and Freese, M. , \u201cV-rep: A Versatile and Scalable Robot Simulation Framework,\u201d In: IEEE\/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan (2013)\u00a0pp. 1321\u20131326.","DOI":"10.1109\/IROS.2013.6696520"},{"key":"S0263574722000650_ref8","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574720000703"},{"key":"S0263574722000650_ref9","doi-asserted-by":"publisher","DOI":"10.1109\/JAS.2021.1004255"},{"key":"S0263574722000650_ref26","unstructured":"[26] Hellaby, W. C. J. C. 
, Learning From Delayed Rewards, PhD Thesis (University of Cambridge, 1989)."},{"key":"S0263574722000650_ref25","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"S0263574722000650_ref17","first-page":"1","article-title":"Hierarchical reinforcement learning with universal policies for multistep robotic manipulation","author":"Yang","year":"2021","journal-title":"IEEE Trans. Neur. Netw. Learn. Syst."},{"key":"S0263574722000650_ref31","unstructured":"[31] Schaul, T. , Quan, J. , Antonoglou, I. and Silver, D. , \u201cPrioritized Experience Replay,\u201d In: International Conference on Learning Representations, Caribe Hilton, San Juan, Puerto Rico (2016)."},{"key":"S0263574722000650_ref5","doi-asserted-by":"crossref","unstructured":"[5] Nicola, G. , Tagliapietra, L. , Tosello, E. , Navarin, N. , Ghidoni, S. and Menegatti, E. , \u201cRobotic Object Sorting Via Deep Reinforcement Learning: A Generalized Approach,\u201d In: IEEE International Conference on Robot and Human Interactive Communication, Naples, Italy (2020)\u00a0pp. 1266\u20131273.","DOI":"10.1109\/RO-MAN47096.2020.9223484"},{"key":"S0263574722000650_ref10","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-020-01870-6"},{"key":"S0263574722000650_ref30","unstructured":"[30] Ioffe, S. and Szegedy, C. , \u201cBatch Normalization: Accelerating Deep Network Training By Reducing Internal Covariate Shift,\u201d In: International Conference on Machine Learning, Lille, France (2015)\u00a0pp. 448\u2013456."},{"key":"S0263574722000650_ref3","doi-asserted-by":"publisher","DOI":"10.1017\/S0263574719000614"},{"key":"S0263574722000650_ref21","doi-asserted-by":"crossref","unstructured":"[21] Luo, J. , Solowjow, E. , Wen, C. , Ojea, J. A. , Agogino, A. , Tamar, A. and Abbeel, P. , \u201cReinforcement Learning on Variable Impedance Controller for High-Precision Robotic Assembly,\u201d In: International Conference on Robotics and Automation, Montreal, QC, Canada (2019) pp. 
3080\u20133087.","DOI":"10.1109\/ICRA.2019.8793506"},{"key":"S0263574722000650_ref11","doi-asserted-by":"publisher","DOI":"10.7746\/jkros.2020.15.2.197"}],"container-title":["Robotica"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0263574722000650","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T12:57:03Z","timestamp":1665061023000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0263574722000650\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,6]]},"references-count":32,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2022,11]]}},"alternative-id":["S0263574722000650"],"URL":"https:\/\/doi.org\/10.1017\/s0263574722000650","relation":{},"ISSN":["0263-5747","1469-8668"],"issn-type":[{"value":"0263-5747","type":"print"},{"value":"1469-8668","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,6]]}}}