{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T06:15:14Z","timestamp":1773382514517,"version":"3.50.1"},"reference-count":47,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T00:00:00Z","timestamp":1736899200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T00:00:00Z","timestamp":1736899200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Technology Innovation Fund Project of Dalian Neusoft University of Information","award":["TIFP202307"],"award-info":[{"award-number":["TIFP202307"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2025,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The flexible job shop scheduling problem (FJSP) holds significant importance in both theoretical research and practical applications. Given the complexity and diversity of FJSP, improving the generalization and quality of scheduling methods has become a hot topic of interest in both industry and academia. To address this, this paper proposes a Preference-Based Mask-PPO (PBMP) algorithm, which leverages the strengths of preference learning and invalid action masking to optimize FJSP solutions. First, a reward predictor based on preference learning is designed to model reward prediction by comparing random fragments, eliminating the need for complex reward function design. 
Second, a novel intelligent switching mechanism is introduced, where proximal policy optimization (PPO) is employed to enhance exploration during sampling, and masked proximal policy optimization (Mask-PPO) refines the action space during training, significantly improving efficiency and solution quality. Furthermore, the Pearson correlation coefficient (PCC) is used to evaluate the performance of the preference model. Finally, comparative experiments on FJSP benchmark instances of varying sizes demonstrate that PBMP outperforms traditional scheduling strategies such as dispatching rules, OR-Tools, and other deep reinforcement learning (DRL) algorithms, achieving superior scheduling policies and faster convergence. Even with increasing instance sizes, preference learning proves to be an effective reward mechanism in reinforcement learning for FJSP. The ablation study further highlights the advantages of each key component in the PBMP algorithm across performance metrics.<\/jats:p>","DOI":"10.1007\/s40747-024-01772-x","type":"journal-article","created":{"date-parts":[[2025,1,15]],"date-time":"2025-01-15T08:38:59Z","timestamp":1736930339000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Preference learning based deep reinforcement learning for flexible job shop scheduling 
problem"],"prefix":"10.1007","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-4032-1976","authenticated-orcid":false,"given":"Xinning","family":"Liu","sequence":"first","affiliation":[]},{"given":"Li","family":"Han","sequence":"additional","affiliation":[]},{"given":"Ling","family":"Kang","sequence":"additional","affiliation":[]},{"given":"Jiannan","family":"Liu","sequence":"additional","affiliation":[]},{"given":"Huadong","family":"Miao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,15]]},"reference":[{"key":"1772_CR1","doi-asserted-by":"publisher","first-page":"3159805","DOI":"10.1155\/2016\/3159805","volume":"12","author":"S Wang","year":"2016","unstructured":"Wang S, Wan J, Li D, Zhang C (2016) Implementing smart factory of Industry 4.0: an outlook. Int J Distrib Sens Netw 12:3159805","journal-title":"Int J Distrib Sens Netw"},{"issue":"1","key":"1772_CR2","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1002\/nav.3800010110","volume":"1","author":"SM Johnson","year":"1954","unstructured":"Johnson SM (1954) Optimal two-and three-stage production schedules with setup times included. Naval Res Logist Q 1(1):61\u201368","journal-title":"Naval Res Logist Q"},{"issue":"2","key":"1772_CR3","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1287\/moor.1.2.117","volume":"1","author":"MR Garey","year":"1976","unstructured":"Garey MR, Sethi JR (1976) The complexity of flow-shop and job-shop scheduling. Math Oper Res 1(2):117\u2013129","journal-title":"Math Oper Res"},{"key":"1772_CR4","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1007\/BF02238804","volume":"45","author":"P Brucker","year":"1990","unstructured":"Brucker P, Schlie R (1990) Job-shop scheduling with multi-purpose machines. 
Computing 45:369\u2013375","journal-title":"Computing"},{"key":"1772_CR5","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1287\/moor.1.2.117","volume":"1","author":"MR Garey","year":"1976","unstructured":"Garey MR, Johnson DS, Sethi R (1976) The complexity of flowshop and jobshop scheduling. Math Oper Res 1:117\u2013129","journal-title":"Math Oper Res"},{"issue":"1","key":"1772_CR6","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1287\/opre.25.1.45","volume":"25","author":"SS Panwalkar","year":"1977","unstructured":"Panwalkar SS, Iskander W (1977) A survey of scheduling rules. Oper Res 25(1):45\u201361","journal-title":"Oper Res"},{"issue":"10","key":"1772_CR7","doi-asserted-by":"publisher","first-page":"3202","DOI":"10.1016\/j.cor.2007.02.014","volume":"35","author":"F Pezzella","year":"2008","unstructured":"Pezzella F, Morganti G, Ciaschetti G (2008) A genetic algorithm for the flexible job-shop scheduling problem. Comput Oper Res 35(10):3202\u20133212","journal-title":"Comput Oper Res"},{"key":"1772_CR8","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press, Cambridge"},{"issue":"2","key":"1772_CR9","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1080\/09537287.2013.782846","volume":"25","author":"G Calleja","year":"2014","unstructured":"Calleja G, Pastor R (2014) A dispatching algorithm for flexible job-shop scheduling with transfer batches: an industrial application. Prod Plan Control 25(2):93\u2013109","journal-title":"Prod Plan Control"},{"key":"1772_CR10","doi-asserted-by":"publisher","first-page":"409","DOI":"10.1007\/s10479-017-2678-x","volume":"264","author":"MA Ort\u00edz","year":"2018","unstructured":"Ort\u00edz MA, Betancourt LE, Negrete KP et al (2018) Dispatching algorithm for production programming of flexible job-shop systems in the smart factory industry. 
Ann Oper Res 264:409\u2013433","journal-title":"Ann Oper Res"},{"key":"1772_CR11","doi-asserted-by":"publisher","first-page":"563","DOI":"10.1007\/s00170-005-0375-4","volume":"32","author":"M Saidi-Mehrabad","year":"2007","unstructured":"Saidi-Mehrabad M, Fattahi P (2007) Flexible job shop scheduling with tabu search algorithms. Int J Adv Manuf Technol 32:563\u2013570","journal-title":"Int J Adv Manuf Technol"},{"key":"1772_CR12","doi-asserted-by":"publisher","first-page":"603","DOI":"10.1007\/s10845-015-1039-3","volume":"29","author":"M Nouiri","year":"2018","unstructured":"Nouiri M, Bekrar A, Jemai A et al (2018) An effective and distributed particle swarm optimization algorithm for flexible job-shop scheduling problem. J Intell Manuf 29:603\u2013615","journal-title":"J Intell Manuf"},{"key":"1772_CR13","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijpe.2021.108342","volume":"243","author":"R Braune","year":"2022","unstructured":"Braune R, Benda F, Doerner KF et al (2022) A genetic programming learning approach to generate dispatching rules for flexible shop scheduling problems. Int J Prod Econ 243:108342","journal-title":"Int J Prod Econ"},{"key":"1772_CR14","unstructured":"Zhang W, Dietterich TG (1995) A reinforcement learning approach to job-shop scheduling. In: IJCAI, vol 95, pp 1114\u20131120"},{"key":"1772_CR15","doi-asserted-by":"crossref","unstructured":"Xue T, Zeng P, Yu H (2018) A reinforcement learning method for multi-AGV scheduling in manufacturing. In: 2018 IEEE international conference on industrial technology (ICIT). IEEE, Lyon, pp 1557\u20131561","DOI":"10.1109\/ICIT.2018.8352413"},{"issue":"11","key":"1772_CR16","doi-asserted-by":"publisher","first-page":"3362","DOI":"10.1080\/00207543.2020.1717008","volume":"58","author":"D Shi","year":"2020","unstructured":"Shi D, Fan W, Xiao Y et al (2020) Intelligent scheduling of discrete automated production line via deep reinforcement learning. 
Int J Prod Res 58(11):3362\u20133380","journal-title":"Int J Prod Res"},{"issue":"2","key":"1772_CR17","doi-asserted-by":"publisher","first-page":"375","DOI":"10.2507\/IJSIMM20-2-CO7","volume":"20","author":"BA Han","year":"2021","unstructured":"Han BA, Yang J (2021) A deep reinforcement learning based solution for flexible job shop scheduling problem. Int J Simul Model 20(2):375\u2013386","journal-title":"Int J Simul Model"},{"issue":"4","key":"1772_CR18","doi-asserted-by":"publisher","first-page":"1157","DOI":"10.1093\/jcde\/qwac044","volume":"9","author":"SH Oh","year":"2022","unstructured":"Oh SH, Cho YI, Woo JH (2022) Distributional reinforcement learning with the independent learners for flexible job shop scheduling problem with high variability. J Comput Design Eng 9(4):1157\u20131174","journal-title":"J Comput Design Eng"},{"key":"1772_CR19","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.110083","volume":"259","author":"JD Zhang","year":"2023","unstructured":"Zhang JD, He Z, Chan WH et al (2023) DeepMAG: deep reinforcement learning with multi-agent graphs for flexible job shop scheduling. Knowl-Based Syst 259:110083","journal-title":"Knowl-Based Syst"},{"issue":"4","key":"1772_CR20","doi-asserted-by":"publisher","first-page":"1036","DOI":"10.1109\/TETCI.2022.3145706","volume":"7","author":"Y Du","year":"2022","unstructured":"Du Y, Li J, Chen X et al (2022) Knowledge-based reinforcement learning and estimation of distribution algorithm for flexible job shop scheduling problem. IEEE Trans Emerg Top Comput Intell 7(4):1036\u20131050","journal-title":"IEEE Trans Emerg Top Comput Intell"},{"key":"1772_CR21","doi-asserted-by":"crossref","unstructured":"Bonetta G, Zago D, Cancelliere R et al (2023) Job shop scheduling via deep reinforcement learning: a sequence to sequence approach. In: International conference on learning and intelligent optimization. 
Springer International Publishing, Cham, pp 475\u2013490","DOI":"10.1007\/978-3-031-44505-7_32"},{"issue":"1","key":"1772_CR22","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1109\/TSMC.2023.3305541","volume":"54","author":"R Li","year":"2024","unstructured":"Li R, Gong WY, Wang L, Lu C, Dong CX (2024) Co-evolution with deep reinforcement learning for energy-aware distributed heterogeneous flexible job shop scheduling. IEEE Trans Syst Man Cybern Syst 54(1):201\u2013211","journal-title":"IEEE Trans Syst Man Cybern Syst"},{"key":"1772_CR23","doi-asserted-by":"crossref","unstructured":"Ren T, Dong Z, Qi F et al (2024) Solving the flow-shop scheduling problem with human factors and two competing agents with deep reinforcement learning. Eng Optim 1\u201317","DOI":"10.1080\/0305215X.2024.2329998"},{"key":"1772_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2023.123019","volume":"245","author":"E Yuan","year":"2024","unstructured":"Yuan E, Wang L, Cheng S, Song S, Fan W, Li Y (2024) Solving flexible job shop scheduling problems via deep reinforcement learning. Expert Syst Appl 245:123019","journal-title":"Expert Syst Appl"},{"key":"1772_CR25","doi-asserted-by":"publisher","first-page":"3696","DOI":"10.3390\/electronics13183696","volume":"13","author":"S Xu","year":"2024","unstructured":"Xu S, Li Y, Li Q (2024) A deep reinforcement learning method based on a transformer model for the flexible job shop scheduling problem. Electronics 13:3696. https:\/\/doi.org\/10.3390\/electronics13183696","journal-title":"Electronics"},{"issue":"4","key":"1772_CR26","doi-asserted-by":"publisher","first-page":"3020","DOI":"10.1109\/TASE.2021.3104716","volume":"19","author":"S Luo","year":"2021","unstructured":"Luo S, Zhang L, Fan Y (2021) Real-time scheduling for dynamic partial-no-wait multiobjective flexible job shop by deep reinforcement learning. 
IEEE Trans Autom Sci Eng 19(4):3020\u20133038","journal-title":"IEEE Trans Autom Sci Eng"},{"key":"1772_CR27","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2022.117796","volume":"205","author":"K Lei","year":"2022","unstructured":"Lei K, Guo P, Zhao W et al (2022) A multi-action deep reinforcement learning framework for flexible job-shop scheduling problem. Expert Syst Appl 205:117796","journal-title":"Expert Syst Appl"},{"key":"1772_CR28","first-page":"1","volume":"2023","author":"L Zhao","year":"2023","unstructured":"Zhao L, Fan J, Zhang C et al (2023) A DRL-based reactive scheduling policy for flexible job shops with random job arrivals. IEEE Trans Autom Sci Eng 2023:1\u201312","journal-title":"IEEE Trans Autom Sci Eng"},{"issue":"2","key":"1772_CR29","doi-asserted-by":"publisher","first-page":"82","DOI":"10.3390\/info15020082","volume":"15","author":"YH Chang","year":"2024","unstructured":"Chang YH, Liu CH, You SD (2024) Scheduling for the flexible job-shop problem with a dynamic number of machines using deep reinforcement learning. Information 15(2):82","journal-title":"Information"},{"key":"1772_CR30","doi-asserted-by":"crossref","unstructured":"Hammami NEH, Lardeux B, Hadj-Alouane AB et al (2024) Design and calibration of a DRL algorithm for solving the job shop scheduling problem under unexpected job arrivals. Flex Serv Manuf J 1\u201332","DOI":"10.1007\/s10696-024-09540-2"},{"issue":"1","key":"1772_CR31","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1080\/00207543.2021.1943762","volume":"61","author":"J Heger","year":"2023","unstructured":"Heger J, Voss T (2023) Dynamically adjusting the k-values of the ATCS rule in a flexible flow shop scenario with reinforcement learning. 
Int J Prod Res 61(1):147\u2013161","journal-title":"Int J Prod Res"},{"issue":"4","key":"1772_CR32","doi-asserted-by":"publisher","DOI":"10.1002\/cpe.6658","volume":"34","author":"X Long","year":"2022","unstructured":"Long X, Zhang J, Qi X et al (2022) A self-learning artificial bee colony algorithm based on reinforcement learning for a flexible job-shop scheduling problem. Concurr Comput Pract Exp 34(4):e6658","journal-title":"Concurr Comput Pract Exp"},{"issue":"2","key":"1772_CR33","doi-asserted-by":"publisher","first-page":"1600","DOI":"10.1109\/TII.2022.3189725","volume":"19","author":"W Song","year":"2022","unstructured":"Song W, Chen X, Li Q et al (2022) Flexible job-shop scheduling via graph neural network and deep reinforcement learning. IEEE Trans Ind Inf 19(2):1600\u20131610","journal-title":"IEEE Trans Ind Inf"},{"issue":"4","key":"1772_CR34","doi-asserted-by":"publisher","first-page":"5695","DOI":"10.1109\/TNNLS.2022.3208942","volume":"35","author":"Y Du","year":"2022","unstructured":"Du Y, Li J, Li C et al (2022) A reinforcement learning approach for flexible job shop scheduling problem with crane transportation and setup times. IEEE Trans Neural Netw Learn Syst 35(4):5695\u20135709","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"1772_CR35","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2023.102230","volume":"58","author":"M Yuan","year":"2023","unstructured":"Yuan M, Huang H, Li Z et al (2023) A multi-agent double deep-Q-network based on state machine and event stream for flexible job shop scheduling problem. Adv Eng Inform 58:102230","journal-title":"Adv Eng Inform"},{"key":"1772_CR36","doi-asserted-by":"publisher","DOI":"10.1016\/j.aei.2023.102307","volume":"59","author":"S Yang","year":"2024","unstructured":"Yang S, Wang J, Xu Z (2024) Learning to schedule dynamic distributed reconfigurable workshops using expected deep Q-network. 
Adv Eng Inform 59:102307","journal-title":"Adv Eng Inform"},{"issue":"1","key":"1772_CR37","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1007\/s10845-022-02037-5","volume":"35","author":"X Jing","year":"2024","unstructured":"Jing X, Yao X, Liu M et al (2024) Multi-agent reinforcement learning based on graph convolutional network for flexible job shop scheduling. J Intell Manuf 35(1):75\u201393","journal-title":"J Intell Manuf"},{"key":"1772_CR38","doi-asserted-by":"publisher","first-page":"50935","DOI":"10.1109\/ACCESS.2024.3384923","volume":"12","author":"D Huang","year":"2024","unstructured":"Huang D, Zhao H, Zhang L, Chen K (2024) Learning to dispatch for flexible job shop scheduling based on deep reinforcement learning via graph gated channel transformation. IEEE Access 12:50935\u201350948","journal-title":"IEEE Access"},{"key":"1772_CR39","doi-asserted-by":"publisher","first-page":"584","DOI":"10.3390\/machines12080584","volume":"12","author":"H Tang","year":"2024","unstructured":"Tang H, Dong J (2024) Solving flexible job-shop scheduling problem with heterogeneous graph neural network based on relation and deep reinforcement learning. Machines 12:584. https:\/\/doi.org\/10.3390\/machines12080584","journal-title":"Machines"},{"issue":"3","key":"1772_CR40","doi-asserted-by":"publisher","first-page":"3573","DOI":"10.32604\/cmc.2024.055244","volume":"80","author":"Q Zhu","year":"2024","unstructured":"Zhu Q, Gao K, Huang W et al (2024) Q-learning-assisted meta-heuristics for scheduling distributed hybrid flow shop problems. Comput Mater Continua 80(3):3573\u20133589","journal-title":"Comput Mater Continua"},{"key":"1772_CR41","unstructured":"Huang S, Onta\u00f1\u00f3n S (2020) A closer look at invalid action masking in policy gradient algorithms. arXiv preprint http:\/\/arxiv.org\/abs\/2006.14171"},{"key":"1772_CR42","unstructured":"Lawrence S (1984) An experimental investigation of heuristic scheduling techniques. 
Supplement to resource constrained project scheduling, 1984"},{"key":"1772_CR43","doi-asserted-by":"crossref","unstructured":"Akiba T, Sano S, Yanase T et al (2019) Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 2623\u20132631","DOI":"10.1145\/3292500.3330701"},{"key":"1772_CR44","unstructured":"Fisher H, Thompson G Probabilistic learning combinations of local job-shop scheduling rules. In: Muth JF, Thompson GL (eds) Industrial scheduling, vol 1963. Prentice-Hall, Upper Saddle River, pp 225\u2013251"},{"issue":"1","key":"1772_CR45","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1016\/s0377-2217(97)00019-2","volume":"109","author":"E Demirkol","year":"1998","unstructured":"Demirkol E, Mehta S, Uzsoy R (1998) Benchmarks for shop scheduling problems. Eur J Oper Res 109(1):137\u2013141. https:\/\/doi.org\/10.1016\/s0377-2217(97)00019-2","journal-title":"Eur J Oper Res"},{"issue":"10","key":"1772_CR46","doi-asserted-by":"publisher","first-page":"1495","DOI":"10.1287\/mnsc.38.10.1495","volume":"38","author":"RH Storer","year":"1992","unstructured":"Storer RH, Wu SD, Vaccari R (1992) New search spaces for sequencing problems with application to job shop scheduling. Manag Sci 38(10):1495\u20131509. https:\/\/doi.org\/10.1287\/mnsc.38.10.1495","journal-title":"Manag Sci"},{"issue":"2","key":"1772_CR47","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/0377-2217(93)90182-M","volume":"64","author":"E Taillard","year":"1993","unstructured":"Taillard E (1993) Benchmarks for basic scheduling problems. 
Eur J Oper Res 64(2):278\u2013285","journal-title":"Eur J Oper Res"}],"container-title":["Complex & Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01772-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01772-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01772-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T16:29:52Z","timestamp":1738945792000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01772-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,15]]},"references-count":47,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,2]]}},"alternative-id":["1772"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01772-x","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,15]]},"assertion":[{"value":"6 June 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 December 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 January 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have 
appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"144"}}