{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,15]],"date-time":"2025-08-15T02:33:36Z","timestamp":1755225216424,"version":"3.43.0"},"reference-count":40,"publisher":"SAGE Publications","issue":"12","license":[{"start":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T00:00:00Z","timestamp":1660694400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"name":"Guizhou Human Resources and Social Security Department","award":["(2020)04"],"award-info":[{"award-number":["(2020)04"]}]},{"DOI":"10.13039\/501100004001","name":"Guizhou Science and Technology Department","doi-asserted-by":"publisher","award":["[2020]1Y233"],"award-info":[{"award-number":["[2020]1Y233"]}],"id":[{"id":"10.13039\/501100004001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003459","name":"Guizhou University","doi-asserted-by":"publisher","award":["(2019)67"],"award-info":[{"award-number":["(2019)67"]}],"id":[{"id":"10.13039\/501100003459","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003459","name":"Guizhou University","doi-asserted-by":"publisher","award":["GZUAMT2021KF[03]"],"award-info":[{"award-number":["GZUAMT2021KF[03]"]}],"id":[{"id":"10.13039\/501100003459","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003459","name":"Guizhou University","doi-asserted-by":"publisher","award":["[2020]51"],"award-info":[{"award-number":["[2020]51"]}],"id":[{"id":"10.13039\/501100003459","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Transactions of the Institute of Measurement and Control"],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>\n            Tetris has been an important field for research in deep reinforcement learning (DRL). 
However, most studies of Tetris focus on simulation validation, and few attempts have been conducted in real-world environments. In this paper, DRL algorithms are trained in a constructed Tetris simulation environment and subsequently deployed in real-world Tetris experiments. A dynamic timesteps method is integrated into the proximal policy optimization (PPO) method to accelerate training, reaching the goal of the game within 1483 episodes. With the help of multiple recognition and segmented moving techniques, the robotic arm provides accurate and robust performance when playing real-world Tetris. The effectiveness of the developed system is experimentally verified; the experimental results show that the proposed algorithm achieved superior performance compared with a conventional method and the Deep\n            <jats:italic>Q<\/jats:italic>\n            -Network (DQN) in real-world Tetris environments.\n          <\/jats:p>","DOI":"10.1177\/01423312221114694","type":"journal-article","created":{"date-parts":[[2022,8,17]],"date-time":"2022-08-17T03:28:22Z","timestamp":1660706902000},"page":"2333-2342","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":2,"title":["Deep reinforcement learning in playing Tetris with robotic arm experiment"],"prefix":"10.1177","volume":"47","author":[{"given":"Yu","family":"Yan","sequence":"first","affiliation":[{"name":"School of Mechanical Engineering, Guizhou University, China"}]},{"given":"Peng","family":"Liu","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Guizhou University, China"}]},{"given":"Jin","family":"Zhao","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Guizhou University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3130-6497","authenticated-orcid":false,"given":"Chengxi","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of 
Internet of Things Engineering, Jiangnan University, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1794-0619","authenticated-orcid":false,"given":"Guangwei","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Mechanical Engineering, Guizhou University, China"},{"name":"Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, China"}]}],"member":"179","published-online":{"date-parts":[[2022,8,17]]},"reference":[{"key":"e_1_3_2_2_1","first-page":"01652","article-title":"The game of Tetris in machine learning","volume":"1905","author":"Algorta S","year":"2019","unstructured":"Algorta S, \u015eim\u015fek \u00d6 (2019) The game of Tetris in machine learning. arXiv preprint arXiv: 1905.01652.","journal-title":"arXiv preprint arXiv:"},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1016\/0196-6774(85)90018-5"},{"key":"e_1_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.1995.478953"},{"key":"e_1_3_2_5_1","unstructured":"B\u00f6hm N K\u00f3kai G Mandl S (2005) An evolutionary approach to tetris. In: The sixth metaheuristics international conference (MIC2005) p. 5. Citeseer. Available at: https:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.68.9918&rep=rep1&type=pdf#:\u223c:text=To%20determine%20the%20best%20tetris finding%20a%20good%20rating%20function."},{"key":"e_1_3_2_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.21239"},{"key":"e_1_3_2_7_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jco.2022.101646"},{"key":"e_1_3_2_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0190(97)00120-8"},{"key":"e_1_3_2_9_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-662-43948-7_36"},{"key":"e_1_3_2_10_1","first-page":"486","volume-title":"Learning for dynamics and control","author":"Fan J","year":"2020","unstructured":"Fan J, Wang Z, Xie Y, et al. (2020) A theoretical analysis of deep Q-learning. 
In: Learning for dynamics and control, Online, 11\u201312 June, pp. 486\u2013489. New York, NY: PMLR."},{"key":"e_1_3_2_11_1","volume-title":"International conference on learning representations","author":"Hafner D","year":"2019","unstructured":"Hafner D, Lillicrap T, Ba J, et al. (2019) Dream to control: Learning behaviors by latent imagination. In: International conference on learning representations, New Orleans, 6\u20139 May."},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3092685"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0360-8352(99)00097-2"},{"key":"e_1_3_2_14_1","unstructured":"Hu H Zhang X Yan X et al. (2017) Solving a new 3D bin packing problem with deep reinforcement learning method. arXiv preprint arXiv:1708.05930."},{"key":"e_1_3_2_15_1","unstructured":"Huang S Onta\u00f1\u00f3n S (2020) A closer look at invalid action masking in policy gradient algorithms. arXiv preprint arXiv:2006. 14171."},{"key":"e_1_3_2_16_1","doi-asserted-by":"crossref","unstructured":"Jentzen A Riekert A (2021) A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLu activation for constant target functions. arXiv preprint arXiv:2104.00277.","DOI":"10.1007\/s00033-022-01716-w"},{"key":"e_1_3_2_17_1","first-page":"1531","volume-title":"Advances in Neural Information Processing Systems","author":"Kakade SM","year":"2001","unstructured":"Kakade SM (2001) A natural policy gradient. In: Dietterich T, Becker S, Ghahramani Z (eds) Advances in Neural Information Processing Systems 14. 
Cambridge, MA: MIT Press, pp.1531\u20131538."},{"key":"e_1_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TG.2021.3124340"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1177\/0142331221995336"},{"key":"e_1_3_2_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/RO-MAN46459.2019.8956393"},{"key":"e_1_3_2_21_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-46014-4_23"},{"key":"e_1_3_2_22_1","doi-asserted-by":"publisher","DOI":"10.5815\/ijieeb.2012.02.02"},{"key":"e_1_3_2_23_1","doi-asserted-by":"publisher","DOI":"10.1057\/palgrave.jors.2601771"},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12555-020-0069-6"},{"key":"e_1_3_2_25_1","unstructured":"Liu H Liu L (2020) Learn to play tetris with deep reinforcement learning. Unpublished. Available at: https:\/\/openreview.net\/forum?id=8TLyqLGQ7Tg"},{"issue":"4","key":"e_1_3_2_26_1","article-title":"A collaborative control method of dual-arm robots based on deep reinforcement learning","volume":"11","author":"Liu L","year":"2021","unstructured":"Liu L, Liu Q, Song Y, et al. (2021) A collaborative control method of dual-arm robots based on deep reinforcement learning. Applied Sciences 11(4): 1816.","journal-title":"Applied Sciences"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_28_1","first-page":"5","volume-title":"ICRA workshop on open source software","volume":"3","author":"Quigley M","year":"2009","unstructured":"Quigley M, Conley K, Gerkey B, et al. (2009) ROS: An open-source robot operating system. In: ICRA workshop on open source software, vol. 3, Kobe, Japan, Online, 17 May, p. 5. New York, NY: IEEE."},{"key":"e_1_3_2_29_1","first-page":"10","volume-title":"ICML-2004 Workshop on Relational Reinforcement Learning","author":"Ramon J","year":"2004","unstructured":"Ramon J, Driessens K (2004) On the numeric stability of Gaussian processes regression for relational reinforcement learning. 
In: ICML-2004 Workshop on Relational Reinforcement Learning, Banff, AB, Canada, 9 July, pp. 10\u201314. New York, NY: ACM."},{"key":"e_1_3_2_30_1","first-page":"1889","volume-title":"International conference on machine learning","author":"Schulman J","year":"2015","unstructured":"Schulman J, Levine S, Abbeel P, et al. (2015) Trust region policy optimization. In: International conference on machine learning, Lille, France, 6\u201311 July, pp. 1889\u20131897. New York, NY: PMLR."},{"key":"e_1_3_2_31_1","unstructured":"Schulman J Wolski F Dhariwal P et al. (2017) Proximal policy optimization algorithms. Arxiv Preprint Arxiv:1707.06347."},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.3390\/app11177917"},{"key":"e_1_3_2_33_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature24270"},{"key":"e_1_3_2_34_1","unstructured":"Stevens M Pradhan S (2016) Playing tetris with deep reinforcement learning. Available at: http:\/\/cs231n.stanford.edu\/reports\/2016\/pdfs\/121_Report.pdf"},{"key":"e_1_3_2_35_1","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton RS","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press."},{"key":"e_1_3_2_36_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.2368"},{"key":"e_1_3_2_37_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018008221616"},{"key":"e_1_3_2_38_1","first-page":"6653586","article-title":"A flexible reinforced bin packing framework with automatic slack selection","volume":"2021","author":"Yang T","year":"2021","unstructured":"Yang T, Luo F, Fuentes J, et al. (2021) A flexible reinforced bin packing framework with automatic slack selection. 
Mathematical Problems in Engineering 2021: 6653586.","journal-title":"Mathematical Problems in Engineering"},{"key":"e_1_3_2_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3144515"},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.1177\/01423312211037847"},{"key":"e_1_3_2_41_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001411008919"}],"container-title":["Transactions of the Institute of Measurement and Control"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01423312221114694","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/01423312221114694","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/01423312221114694","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,13]],"date-time":"2025-08-13T08:22:56Z","timestamp":1755073376000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/01423312221114694"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,17]]},"references-count":40,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.1177\/01423312221114694"],"URL":"https:\/\/doi.org\/10.1177\/01423312221114694","relation":{},"ISSN":["0142-3312","1477-0369"],"issn-type":[{"type":"print","value":"0142-3312"},{"type":"electronic","value":"1477-0369"}],"subject":[],"published":{"date-parts":[[2022,8,17]]}}}