{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T04:05:56Z","timestamp":1774065956932,"version":"3.50.1"},"reference-count":41,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2019,10,23]],"date-time":"2019-10-23T00:00:00Z","timestamp":1571788800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>The scheduling problems in mass production, manufacturing, assembly, synthesis, and transportation, as well as internet services, can partly be formulated as a hybrid flow-shop scheduling problem (HFSP). To solve this problem, this paper studies, for the first time, a reinforcement learning (RL) method for HFSP. HFSP is described and formulated as a Markov Decision Process (MDP), for which specific states, actions, and a reward function are designed. On this basis, the MDP framework is established. The Boltzmann exploration policy is adopted to trade off exploration and exploitation during action selection in RL. Whereas the first-come-first-serve strategy is frequently adopted in the encoding of most traditional intelligent algorithms, the RL method uses a first-come-first-choice rule, which is more conducive to reaching the globally optimal solution. For validation, the RL method is first used for scheduling in a metal processing workshop of an automobile engine factory. Then, the method is applied to the sortie scheduling of carrier aircraft in continuous dispatch. 
The results demonstrate that the machining and support schedules obtained by this RL method are reasonable in terms of solution quality, real-time performance, and complexity, indicating that the RL method is practical for HFSP.<\/jats:p>","DOI":"10.3390\/a12110222","type":"journal-article","created":{"date-parts":[[2019,10,25]],"date-time":"2019-10-25T03:20:36Z","timestamp":1571973636000},"page":"222","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":49,"title":["A Reinforcement Learning Method for a Hybrid Flow-Shop Scheduling Problem"],"prefix":"10.3390","volume":"12","author":[{"given":"Wei","family":"Han","sequence":"first","affiliation":[{"name":"Department of Airborne Vehicle Engineering, Naval Aviation University, Yantai 264001, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9893-1994","authenticated-orcid":false,"given":"Fang","family":"Guo","sequence":"additional","affiliation":[{"name":"Department of Airborne Vehicle Engineering, Naval Aviation University, Yantai 264001, China"}]},{"given":"Xichao","family":"Su","sequence":"additional","affiliation":[{"name":"Department of Airborne Vehicle Engineering, Naval Aviation University, Yantai 264001, China"}]}],"member":"1968","published-online":{"date-parts":[[2019,10,23]]},"reference":[{"key":"ref_1","first-page":"1087","article-title":"A Hybrid Particle Swarm Optimization Method for Flow Shop Scheduling Problem","volume":"39","author":"Tian","year":"2011","journal-title":"Acta Electron. Sinica"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"4993","DOI":"10.1080\/00207543.2016.1157276","article-title":"Speeding up a Rollout algorithm for complex parallel machine scheduling","volume":"54","author":"Ciavotta","year":"2016","journal-title":"Int. J. Prod. 
Res."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"1161","DOI":"10.1002\/aic.11725","article-title":"Scheduling dispensing and counting in secondary pharmaceutical manufacturing","volume":"55","author":"Ciavotta","year":"2009","journal-title":"AIChE J."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"172","DOI":"10.1016\/0377-2217(94)00235-5","article-title":"Preemptive scheduling in a two-stage multiprocessor flow shop is NP-hard","volume":"89","author":"Hoogeveen","year":"1996","journal-title":"Eur. J. Oper. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"437","DOI":"10.3724\/SP.J.1004.2012.00437","article-title":"An Estimation of Distribution Algorithm for Solving Hybrid Flow-shop Scheduling Problem","volume":"38","author":"Wang","year":"2012","journal-title":"Acta Autom. Sin."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1016\/S0895-7177(02)00176-0","article-title":"An Exact Approach for Batch Scheduling in Flexible Flow Lines with Limited Intermediate Buffers","volume":"36","author":"Sawik","year":"2002","journal-title":"Math. Comput. Model."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1142\/S096031310200028X","article-title":"Scheduling of Printed Wiring Board Assembly in Surface Mount Technology Lines","volume":"11","author":"Sawik","year":"2002","journal-title":"J. Electron. Manuf."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.ejor.2009.09.024","article-title":"The Hybrid Flow Shop Scheduling Problem","volume":"205","author":"Ruiz","year":"2010","journal-title":"Eur. J. Oper. Res."},{"key":"ref_9","unstructured":"Xiao, W., Hao, P., Zhang, S., and Xu, X. (July, January 26). Hybrid Flow Shop Scheduling Using Genetic Algorithms. 
Proceedings of the 3rd World Congress on Intelligent Control and Automation, Hefei, China."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"2013","DOI":"10.1016\/j.cor.2004.01.003","article-title":"Simulated Annealing Heuristic for Flow Shop Scheduling Problems with Unrelated Parallel Machines","volume":"32","author":"Low","year":"2005","journal-title":"Comput. Oper. Res."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"115","DOI":"10.1016\/S0377-2217(02)00644-6","article-title":"Hybrid Flow-shop Scheduling Problems with Multiprocessor Task Systems","volume":"152","author":"Zinder","year":"2004","journal-title":"Eur. J. Oper. Res."},{"key":"ref_12","first-page":"1372","article-title":"Study on an average reward reinforcement learning algorithm","volume":"30","author":"Gao","year":"2007","journal-title":"Chin. J. Comput."},{"key":"ref_13","first-page":"677","article-title":"A novel off policy Q(\u03bb) algorithm based on linear function approximation","volume":"37","author":"Quan","year":"2014","journal-title":"Chin. J. Comput."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1238","DOI":"10.1177\/0278364913495721","article-title":"Reinforcement learning in robotics: A survey","volume":"32","author":"Kober","year":"2013","journal-title":"Int. J. Robot. Res."},{"key":"ref_15","first-page":"765","article-title":"A reinforcement learning-based approach to dynamic job-shop scheduling","volume":"31","author":"Wei","year":"2005","journal-title":"Acta Autom. Sin."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1145\/1394608.1382172","article-title":"Self-optimizing memory controllers: A reinforcement learning approach","volume":"36","author":"Ipek","year":"2008","journal-title":"Comput. 
Archit."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"215","DOI":"10.1162\/neco.1994.6.2.215","article-title":"TD-Gammon, a self-teaching backgammon program, achieves master-level play","volume":"6","author":"Tesauro","year":"1994","journal-title":"Neural Comput."},{"key":"ref_18","unstructured":"Kocsis, L., and Szepesv\u00e1ri, C. (2006). Bandit based Monte-Carlo planning. Machine Learning: ECML 2006, Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 18\u201322 September 2006, Springer."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1007\/BF00992698","article-title":"Q-learning","volume":"8","author":"Watkins","year":"1992","journal-title":"Mach. Learn."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1007\/s00186-006-0066-4","article-title":"Optimal Scheduling of a Two-stage Hybrid Flow Shop","volume":"64","author":"Haouari","year":"2006","journal-title":"Math. Methods Oper. Res."},{"key":"ref_21","first-page":"376","article-title":"Lagrangian Relaxation Algorithm for Real-time Hybrid Flow-shop Scheduling with No-wait in Process","volume":"21","author":"Hua","year":"2006","journal-title":"Control Decis."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"4353","DOI":"10.1080\/00207540210159536","article-title":"Sequencing a Hybrid Two-stage Flow Shop with Dedicated Machines","volume":"40","author":"Riane","year":"2002","journal-title":"Int. J. Prod. Res."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1016\/j.cor.2006.04.004","article-title":"A Two-stage Hybrid Flow-shop Scheduling Problem with a Function Constraint and Unrelated Alternative Machines","volume":"35","author":"Low","year":"2008","journal-title":"Comput. Oper. 
Res."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"426","DOI":"10.1007\/s005000100109","article-title":"A Palmer-based Continuous Fuzzy Flexible Flow-shop Scheduling Algorithm","volume":"5","author":"Hong","year":"2001","journal-title":"Soft Comput."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"3161","DOI":"10.1080\/00207540500536939","article-title":"Multiprocessor Task Scheduling in Multistage Hybrid Flow-shops: An Ant Colony System Approach","volume":"44","author":"Ying","year":"2006","journal-title":"Int. J. Prod. Res."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"907","DOI":"10.1016\/j.cor.2007.11.004","article-title":"A Tabu Search Heuristic for the Hybrid Flow-shop Scheduling with Finite Intermediate Buffers","volume":"36","author":"Wang","year":"2008","journal-title":"Comput. Oper. Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"541","DOI":"10.1007\/s00170-007-1048-2","article-title":"Using Ant Colony Optimization to Solve Hybrid Flow Shop Scheduling Problems","volume":"35","author":"Alaykyran","year":"2007","journal-title":"Int. J. Adv. Manuf. Technol."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"4655","DOI":"10.1080\/00207540701294627","article-title":"A Particle Swarm Optimization Algorithm for Hybrid Flow-shop scheduling with Multiprocessor Tasks","volume":"46","author":"Tseng","year":"2008","journal-title":"Int. J. Prod. Res."},{"key":"ref_29","first-page":"107","article-title":"Improved grey wolf optimization algorithm for flexible shop scheduling problem","volume":"41","author":"Wu","year":"2019","journal-title":"Manuf. Autom."},{"key":"ref_30","first-page":"18","article-title":"Study on multi-objective flexible Job-Shop scheduling problem based on hybrid artificial bee colony algorithm","volume":"36","author":"Meng","year":"2019","journal-title":"Appl. Res. Comput."},{"key":"ref_31","unstructured":"Liu, F., Zhang, X.P., Zou, F.X., and Zeng, L.L. (2009, June 17\u201319). 
Immune clonal selection algorithm for hybrid flow-shop scheduling problem. Proceedings of the Chinese Control and Decision Conference, Guilin, China."},{"key":"ref_32","first-page":"2991","article-title":"Research on Agent-based Hybrid Flow Shop Dynamic Scheduling Problem","volume":"37","author":"Wang","year":"2017","journal-title":"J. Comput. Appl."},{"key":"ref_33","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Haarnoja, T., Pong, V., Zhou, A., Dalal, M., Abbeel, P., and Levine, S. (2018, May 21\u201325). Composable deep reinforcement learning for robotic manipulation. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8460756"},{"key":"ref_35","unstructured":"Gao, Y., Lin, J., Yu, F., Levine, S., and Darrell, T. (2018). Reinforcement learning from imperfect demonstrations. arXiv."},{"key":"ref_36","unstructured":"Gu, S., Lillicrap, T., Sutskever, I., and Levine, S. (2016, June 19\u201324). Continuous deep Q-learning with model-based acceleration. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_37","unstructured":"O\u2019Donoghue, B., Munos, R., Kavukcuoglu, K., and Mnih, V. (2016). Combining policy gradient and Q-learning. arXiv."},{"key":"ref_38","unstructured":"Bertsekas, D.P. (2005). 
Dynamic Programming and Optimal Control, Athena Scientific."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_40","first-page":"863","article-title":"Hybrid flow-shop scheduling approach based on genetic algorithm","volume":"14","author":"Wang","year":"2002","journal-title":"J. Syst. Simul."},{"key":"ref_41","first-page":"74","article-title":"Hybrid differential evolution algorithm for sortie scheduling of carrier aircraft","volume":"32","author":"Su","year":"2015","journal-title":"Comput. Simul."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/12\/11\/222\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T13:28:36Z","timestamp":1760189316000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/12\/11\/222"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,10,23]]},"references-count":41,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2019,11]]}},"alternative-id":["a12110222"],"URL":"https:\/\/doi.org\/10.3390\/a12110222","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2019,10,23]]}}}