{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T15:07:09Z","timestamp":1779030429772,"version":"3.51.4"},"reference-count":36,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T00:00:00Z","timestamp":1778544000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/legalcode"}],"funder":[{"name":"National Research Foundation of Korea (NRF) through a grant funded by the Korean government","award":["RS-2022-NR068758"],"award-info":[{"award-number":["RS-2022-NR068758"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2026,7,31]]},"abstract":"<jats:p>The rapid increase in high-rise building construction has intensified the need for efficient elevator system operations. This article addresses the elevator dispatching problem in elevator group control systems. We formulate the problem as a Semi-Markov Decision Process (SMDP), defining the state representation, action space, and reward function. A two-phase model is then introduced, integrating imitation learning and deep reinforcement learning techniques to derive the optimal elevator dispatching policy from the formulated SMDP. In the first phase, a policy network is pre-trained by estimating the time required for elevator cars to pick up assigned hall requests. In the second phase, the pre-trained policy network is further optimized using Proximal Policy Optimization (PPO), a well-known policy-based deep reinforcement learning method. Additionally, we propose a novel update interval, termed the \u201cdirect-effect\u201d interval, which improves policy training during the reinforcement learning phase. Notably, this direct-effect interval concept has potential applicability to other multi-resource scheduling problems. Empirical experiments demonstrate the advantages of incorporating imitation learning before reinforcement learning, as well as the effectiveness of employing the direct-effect update interval during the reinforcement learning phase. Furthermore, the proposed model outperforms various benchmark rules in terms of average waiting time and the distribution of long waiting times, as validated across four traffic patterns.<\/jats:p>","DOI":"10.1145\/3790252","type":"journal-article","created":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T14:47:24Z","timestamp":1769179644000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["A Hybrid Approach of Imitation Learning and Deep Reinforcement Learning with Direct-Effect Update Interval for Elevator Dispatching"],"prefix":"10.1145","volume":"10","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3748-8404","authenticated-orcid":false,"given":"Jiansong","family":"Wan","sequence":"first","affiliation":[{"name":"Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2195-798X","authenticated-orcid":false,"given":"Kanghoon","family":"Lee","sequence":"additional","affiliation":[{"name":"Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5161-661X","authenticated-orcid":false,"given":"Hayong","family":"Shin","sequence":"additional","affiliation":[{"name":"Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2026,5,12]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-30164-8_417"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2021.103500"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.4324\/9781315723600"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1145\/3447623"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2003.11.002"},{"key":"e_1_3_1_7_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2021.107190"},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007518724497"},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.conengprac.2006.06.005"},{"key":"e_1_3_1_10_2","unstructured":"Tapio Hautam\u00e4ki. 2021. Multiobjective Optimization Model for Elevator Call Allocation. Retrieved from https:\/\/sal.aalto.fi\/publications\/pdf-files\/theses\/mas\/thau21_public.pdf"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.5555\/3157382.3157608"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cie.2020.106749"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1145\/3054912"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/SMC.2013.423"},{"key":"e_1_3_1_15_2","unstructured":"Anton Jansson and Kristoffer Uggla Lingvall. 2015. Elevator Control Using Reinforcement Learning to Select Strategy. Retrieved from https:\/\/www.kth.se\/social\/files\/588617c2f276547fe1dbf8d2\/AJanssonKUgglaLingvall_dkand15.pdf"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45422-5_33"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.5555\/765580.765587"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3060187"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.2023.3237027"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1609\/icaps.v27i1.13799"},{"key":"e_1_3_1_21_2","volume-title":"Estimated Time of Arrival (ETA) Based Elevator Group Control Algorithm with More Accurate Estimation","author":"Rong Aiying","year":"2003","unstructured":"Aiying Rong, Henri Hakonen, and Risto Lahdelma. 2003. Estimated Time of Arrival (ETA) Based Elevator Group Control Algorithm with More Accurate Estimation. Turku Centre for Computer Science."},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICOMITEE53461.2021.9650221"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2016.01.019"},{"key":"e_1_3_1_24_2","first-page":"1889","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Schulman John","year":"2015","unstructured":"John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. 2015. Trust region policy optimization. In Proceedings of the International Conference on Machine Learning. PMLR, 1889\u20131897."},{"key":"e_1_3_1_25_2","unstructured":"John Schulman Philipp Moritz Sergey Levine Michael Jordan and Pieter Abbeel. 2015. High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438. Retrieved from https:\/\/arxiv.org\/abs\/1506.02438"},{"key":"e_1_3_1_26_2","unstructured":"John Schulman Filip Wolski Prafulla Dhariwal Alec Radford and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv:1707.06347. Retrieved from https:\/\/arxiv.org\/abs\/1707.06347"},{"key":"e_1_3_1_27_2","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton Richard S.","year":"2018","unstructured":"Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. MIT Press."},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ifacol.2016.07.071"},{"key":"e_1_3_1_29_2","doi-asserted-by":"crossref","unstructured":"Faraz Torabi Garrett Warnell and Peter Stone. 2018. Behavioral cloning from observation. arXiv:1805.01954. Retrieved from https:\/\/arxiv.org\/abs\/1805.01954","DOI":"10.24963\/ijcai.2018\/687"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2010.2064766"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3582576"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1080\/00207543.2021.1910870"},{"issue":"2","key":"e_1_3_1_33_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3582577","article-title":"Green data center cooling control via physics-guided safe reinforcement learning","volume":"8","author":"Wang Ruihang","year":"2022","unstructured":"Ruihang Wang, Zhiwei Cao, Xin Zhou, Yonggang Wen, and Rui Tan. 2022. Green data center cooling control via physics-guided safe reinforcement learning. ACM Transactions on Cyber-Physical Systems 8, 2 (2022), 1\u201326.","journal-title":"ACM Transactions on Cyber-Physical Systems"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2021.101286"},{"key":"e_1_3_1_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.2965208"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.23919\/ECC55457.2022.9838059"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3213246"}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3790252","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T14:36:42Z","timestamp":1779028602000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3790252"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,5,12]]},"references-count":36,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,7,31]]}},"alternative-id":["10.1145\/3790252"],"URL":"https:\/\/doi.org\/10.1145\/3790252","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,5,12]]},"assertion":[{"value":"2024-01-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-01-08","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2026-05-12","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}