{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,20]],"date-time":"2026-04-20T15:41:01Z","timestamp":1776699661875,"version":"3.51.2"},"reference-count":66,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2021,4,20]],"date-time":"2021-04-20T00:00:00Z","timestamp":1618876800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Applied Sciences"],"abstract":"<jats:p>In this paper, we present and discuss an innovative approach to solve Job Shop scheduling problems based on machine learning techniques. Traditionally, when choosing how to solve Job Shop scheduling problems, there are two main options: either use an efficient heuristic that provides a solution quickly, or use classic optimization approaches (e.g., metaheuristics) that take more time but will output better solutions, closer to their optimal value. In this work, we aim to create a novel architecture that incorporates reinforcement learning into scheduling systems in order to improve their overall performance and overcome the limitations that current approaches present. It is also intended to investigate the development of a learning environment for reinforcement learning agents to be able to solve the Job Shop scheduling problem. The reported experimental results and the conducted statistical analysis conclude about the benefits of using an intelligent agent created with reinforcement learning techniques. 
The main contribution of this work is proving that reinforcement learning has the potential to become the standard method whenever a solution is necessary quickly, since it solves any problem in very few seconds with high quality, approximate to the optimal methods.<\/jats:p>","DOI":"10.3390\/app11083710","type":"journal-article","created":{"date-parts":[[2021,4,20]],"date-time":"2021-04-20T13:58:04Z","timestamp":1618927084000},"page":"3710","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":27,"title":["Intelligent Scheduling with Reinforcement Learning"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8661-3080","authenticated-orcid":false,"given":"Bruno","family":"Cunha","sequence":"first","affiliation":[{"name":"ISRC\u2014Interdisciplinary Studies Research Center, 4200-072 Porto, Portugal"},{"name":"Institute of Engineering\u2014Polytechnic of Porto (ISEP\/P.PORTO), 4200-072 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0264-4710","authenticated-orcid":false,"given":"Ana","family":"Madureira","sequence":"additional","affiliation":[{"name":"ISRC\u2014Interdisciplinary Studies Research Center, 4200-072 Porto, Portugal"},{"name":"Institute of Engineering\u2014Polytechnic of Porto (ISEP\/P.PORTO), 4200-072 Porto, Portugal"},{"name":"INOV\u2014Instituto de Engenharia de Sistemas e Computadores Inova\u00e7\u00e3o, 1000-029 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0850-9755","authenticated-orcid":false,"given":"Benjamim","family":"Fonseca","sequence":"additional","affiliation":[{"name":"INESC TEC and University of Tr\u00e1s-os-Montes and Alto Douro (UTAD), 5000-801 Vila Real, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5126-188X","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Matos","sequence":"additional","affiliation":[{"name":"Institute of Engineering\u2014Polytechnic of Porto (ISEP\/P.PORTO), 4200-072 Porto, 
Portugal"},{"name":"LEMA\u2014Laboratory for Mathematical Engineering, 4200-072 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,4,20]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1257\/jep.14.4.23","article-title":"Beyond computation: Information technology, organizational transformation and business performance","volume":"14","author":"Brynjolfsson","year":"2000","journal-title":"J. Econ. Perspect."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Cunha, B., Madureira, A.M., Fonseca, B., and Coelho, D. (2020). Deep Reinforcement Learning as a Job Shop Scheduling Solver: A Literature Review. Hybrid Intelligent Systems, Springer.","DOI":"10.1007\/978-3-030-14347-3_34"},{"key":"ref_3","unstructured":"Pinedo, M.L. (2016). Scheduling: Theory, Algorithms, and Systems, Springer. [5th ed.]."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1809","DOI":"10.1007\/s10845-017-1350-2","article-title":"Review of job shop scheduling research and its new perspectives under Industry 4.0","volume":"30","author":"Zhang","year":"2019","journal-title":"J. Intell. Manuf."},{"key":"ref_5","unstructured":"Madureira, A., Pereira, I., and Falc\u00e3o, D. (2013, January 9\u201310). Dynamic Adaptation for Scheduling Under Rush Manufacturing Orders With Case-Based Reasoning. Proceedings of the International Conference on Algebraic and Symbolic Computation (SYMCOMP), Lisbon, Portugal."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1080\/09537287.2017.1401143","article-title":"Event-driven production scheduling in SME","volume":"29","author":"Villa","year":"2018","journal-title":"Prod. Plan. Control"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"609","DOI":"10.2507\/IJSIMM17(4)447","article-title":"Determination of optimal production process using scheduling and simulation software","volume":"17","author":"Duplakova","year":"2018","journal-title":"Int. J. Simul. 
Model."},{"key":"ref_8","first-page":"319","article-title":"Optimization of time structures in manufacturing management by using scheduling software Lekin","volume":"5","author":"Balog","year":"2016","journal-title":"TEM J."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Sun, X., Wang, Y., Kang, H., Shen, Y., Chen, Q., and Wang, D. (2021). Modified Multi-Crossover Operator NSGA-III for Solving Low Carbon Flexible Job Shop Scheduling Problem. Processes, 9.","DOI":"10.3390\/pr9010062"},{"key":"ref_10","first-page":"42","article-title":"Application of simulation software in the production process of milled parts","volume":"1","year":"2018","journal-title":"SAR J."},{"key":"ref_11","unstructured":"Madureira, A. (2003). Aplica\u00e7\u00e3o de Meta-Heur\u00edsticas ao Problema de Escalonamento em Ambiente Din\u00e2mico de Produ\u00e7\u00e3o Discreta. [Ph.D. Thesis, Tese de Doutoramento, Universidade do Minho]."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1287\/moor.7.1.57","article-title":"Unit execution time shop problems","volume":"7","author":"Gonzalez","year":"1982","journal-title":"Math. Oper. Res."},{"key":"ref_13","first-page":"94","article-title":"Sequencing and Scheduling: An Introduction to the Mathematics of the Job-Shop","volume":"13","author":"Rand","year":"1982","journal-title":"J. Oper. Res. Soc."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Floudas, C.A., and Pardalos, P.M. (2009). Job-shop Scheduling Problem. Encyclopedia of Optimization, Springer.","DOI":"10.1007\/978-0-387-74759-0"},{"key":"ref_15","unstructured":"Beir\u00e3o, N. (1997). Sistema de Apoio \u00e0 Decis\u00e3o para Sequenciamento de Opera\u00e7\u00f5es em Ambientes Job Shop. [Master\u2019s Thesis, Faculdade de Engenharia da Universidade do Porto]."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Cook, S.A. (1971, January 3\u20135). The complexity of theorem-proving procedures. 
Proceedings of the Third Annual ACM Symposium on Theory of Computing, Shaker Heights, OH, USA.","DOI":"10.1145\/800157.805047"},{"key":"ref_17","unstructured":"Cormen, T.H., Leiserson, C.E., Rivest, R.L., and Stein, C. (2009). Introduction to Algorithms, The MIT Press."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Yamada, T., and Nakano, R. (1997, January 18\u201319). Genetic Algorithms for Job-Shop Scheduling Problems. Proceedings of the Modern Heuristic for Decision Support, London, UK.","DOI":"10.1049\/PBCE055E_ch7"},{"key":"ref_19","unstructured":"Madureira, A., Cunha, B., Pereira, J.P., Pereira, I., and Gomes, S. (August, January 30). An Architecture for User Modeling on Intelligent and Adaptive Scheduling Systems. Proceedings of the Sixth World Congress on Nature and Biologically Inspired Computing (NaBIC), Porto, Portugal."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Wang, H., Sarker, B.R., Li, J., and Li, J. (2020). Adaptive scheduling for assembly job shop with uncertain assembly times based on dual Q-learning. Int. J. Prod. Res., 1\u201317.","DOI":"10.1080\/00207543.2020.1794075"},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Ojstersek, R., Tang, M., and Buchmeister, B. (2020). Due date optimization in multi-objective scheduling of flexible job shop production. Adv. Prod. Eng. Manag., 15.","DOI":"10.14743\/apem2020.4.380"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1147\/rd.33.0210","article-title":"Some Studies in Machine Learning Using the Game of Checkers","volume":"3","author":"Samuel","year":"1959","journal-title":"IBM J. Res. Dev."},{"key":"ref_23","unstructured":"Mitchell, T.M. (1997). Machine Learning, McGraw-Hill, Inc."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Cascio, D., Taormina, V., and Raso, G. (2019). Deep CNN for IIF Images Classification in Autoimmune Diagnostics. Appl. 
Sci., 9.","DOI":"10.3390\/app9081618"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Cascio, D., Taormina, V., and Raso, G. (2019). Deep Convolutional Neural Network for HEp-2 Fluorescence Intensity Classification. Appl. Sci., 9.","DOI":"10.3390\/app9030408"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Joshi, A.V. (2020). Machine Learning and Artificial Intelligence, Springer.","DOI":"10.1007\/978-3-030-26622-6"},{"key":"ref_27","unstructured":"Burkov, A. (2019). The Hundred-Page Machine Learning Book, CHaleyBooks."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1007\/BF00154794","article-title":"Cluster analysis","volume":"14","author":"Everitt","year":"1980","journal-title":"Qual. Quant."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zimek, A., and Schubert, E. (2017). Outlier Detection. Encyclopedia of Database Systems, Springer.","DOI":"10.1007\/978-1-4899-7993-3_80719-1"},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"2323","DOI":"10.1126\/science.290.5500.2323","article-title":"Nonlinear dimensionality reduction by locally linear embedding","volume":"290","author":"Roweis","year":"2000","journal-title":"Science"},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1109\/TIT.1965.1053799","article-title":"Probability of error of some adaptive pattern-recognition machines","volume":"11","author":"Scudder","year":"1965","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"McClosky, D., Charniak, E., and Johnson, M. (2006, January 4\u20139). Effective self-training for parsing. Proceedings of the HLT-NAACL 2006\u2014Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, New York, NY, USA.","DOI":"10.3115\/1220835.1220855"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Yarowsky, D. (1995, January 26\u201330). 
Unsupervised word sense disambiguation rivaling supervised methods. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, USA.","DOI":"10.3115\/981658.981684"},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Blum, A., and Mitchell, T. (1998, January 24\u201326). Combining labeled and unlabeled data with co-training. Proceedings of the Annual ACM Conference on Computational Learning Theory, Madison, WI, USA.","DOI":"10.1145\/279943.279962"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1529","DOI":"10.1109\/TKDE.2005.186","article-title":"Tri-training: Exploiting unlabeled data using three classifiers","volume":"17","author":"Zhou","year":"2005","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_36","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, The MIT Press. [2nd ed.]."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"212","DOI":"10.2307\/1415413","article-title":"The Law of Effect","volume":"39","author":"Thorndike","year":"1927","journal-title":"Am. J. Psychol."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_40","unstructured":"Akkaya, I., Andrychowicz, M., Chociej, M., Litwin, M., McGrew, B., Petron, A., Paino, A., Plappert, M., Powell, G., and Ribas, R. (2019). Solving Rubik\u2019s Cube with a Robot Hand. arXiv."},{"key":"ref_41","unstructured":"Nagabandi, A., Konoglie, K., Levine, S., and Kumar, V. (2019). Deep Dynamics Models for Learning Dexterous Manipulation. 
arXiv."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"12786","DOI":"10.1109\/TVT.2020.3025627","article-title":"Battery-Involved Energy Management for Hybrid Electric Bus Based on Expert-Assistance Deep Deterministic Policy Gradient Algorithm","volume":"69","author":"Wu","year":"2020","journal-title":"IEEE Trans. Veh. Technol."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"3751","DOI":"10.1109\/TII.2020.3014599","article-title":"Battery Thermal- and Health-Constrained Energy Management for Hybrid Electric Bus Based on Soft Actor-Critic DRL Algorithm","volume":"17","author":"Wu","year":"2021","journal-title":"IEEE Trans. Ind. Inform."},{"key":"ref_44","unstructured":"Kaplan, R., Sauer, C., and Sosa, A. (2017). Beating Atari with Natural Language Guided Reinforcement Learning. arXiv."},{"key":"ref_45","unstructured":"Salimans, T., and Chen, R. (2018). Learning Montezuma\u2019s Revenge from a Single Demonstration. arXiv."},{"key":"ref_46","unstructured":"Schulman, J., Levine, S., Abbeel, P., Jordan, M., and Moritz, P. (2015, January 6\u201311). Trust region policy optimization. Proceedings of the 32nd International Conference on Machine Learning (ICML-15), Lille, France."},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"McKay, B., Yao, X., Newton, C.S., Kim, J.H., and Furuhashi, T. (1999). Reinforcement Learning: Past, Present and Future. Simulated Evolution and Learning, Springer.","DOI":"10.1007\/3-540-48873-1"},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Zhang, T., Xie, S., and Rose, O. (2017, January 3\u20136). Real-time job shop scheduling based on simulation and Markov decision processes. 
Proceedings of the Winter Simulation Conference, Las Vegas, NV, USA.","DOI":"10.1109\/WSC.2017.8248100"},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"127","DOI":"10.1007\/s10951-017-0547-8","article-title":"The Current state of bounds on benchmark instances of the job-shop scheduling problem","volume":"21","year":"2018","journal-title":"J. Sched."},{"key":"ref_50","first-page":"231","article-title":"Reinforcement Learning Environment for Job Shop Scheduling Problems","volume":"12","author":"Cunha","year":"2020","journal-title":"Int. J. Comput. Inf. Syst. Ind. Mana. Appl."},{"key":"ref_51","unstructured":"Sommerville, I. (2011). Software Engineering, Addison Wesley. [9th ed.]."},{"key":"ref_52","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"278","DOI":"10.1016\/0377-2217(93)90182-M","article-title":"Benchmarks for basic scheduling problems","volume":"64","author":"Taillard","year":"1993","journal-title":"Eur. J. Oper. Res."},{"key":"ref_54","unstructured":"Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1996). Design Patterns: Elements of Reusable Software, Pearson Education."},{"key":"ref_55","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., and Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning. arXiv."},{"key":"ref_56","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_57","unstructured":"Raschka, S., and Mirjalili, V. (2019). 
Python Machine Learning: Machine Learning and Deep Learning with Python, Scikit-learn, and TensorFlow 2, Packt Publishing Ltd."},{"key":"ref_58","first-page":"1","article-title":"Google supercharges machine learning tasks with TPU custom chip","volume":"18","author":"Jouppi","year":"2016","journal-title":"Google Blog May"},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1016\/j.cor.2006.02.024","article-title":"A Very Fast TS\/SA Algorithm for the Job Shop Scheduling Problem","volume":"35","author":"Zhang","year":"2008","journal-title":"Comput. Oper. Res."},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"154","DOI":"10.1016\/j.cor.2014.08.006","article-title":"A tabu search\/path relinking algorithm to solve the job shop scheduling problem","volume":"53","author":"Peng","year":"2015","journal-title":"Comput. Oper. Res."},{"key":"ref_61","first-page":"2623","article-title":"Deconstructing Nowicki and Smutnicki\u2019s i-TSAB Tabu Search Algorithm for the Job-Shop Scheduling Problem","volume":"33","author":"Howe","year":"2005","journal-title":"Comput. Oper. Res."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"331","DOI":"10.1007\/s10287-006-0023-y","article-title":"An Algorithm for the Job Shop Scheduling Problem based on Global Equilibrium Search Techniques","volume":"3","author":"Pardalos","year":"2006","journal-title":"Comput. Manag. Sci."},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1214\/aoms\/1177731944","article-title":"A comparison of alternative tests of significance for the problem of m rankings","volume":"11","author":"Friedman","year":"1940","journal-title":"Ann. Math. Stat."},{"key":"ref_64","unstructured":"Nemenyi, P. (1963). Distribution-Free Multiple Comparisons. [Ph.D. Thesis, Princeton University]."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Vil\u00edm, P., Laborie, P., and Shaw, P. (2015). Failure-Directed Search for Constraint-Based Scheduling. 
International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Springer.","DOI":"10.1007\/978-3-319-18008-3_30"},{"key":"ref_66","doi-asserted-by":"crossref","unstructured":"Pesant, G. (2015). Two Clause Learning Approaches for Disjunctive Scheduling. Principles and Practice of Constraint Programming, Springer.","DOI":"10.1007\/978-3-319-23219-5"}],"container-title":["Applied Sciences"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2076-3417\/11\/8\/3710\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:50:12Z","timestamp":1760161812000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2076-3417\/11\/8\/3710"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,4,20]]},"references-count":66,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2021,4]]}},"alternative-id":["app11083710"],"URL":"https:\/\/doi.org\/10.3390\/app11083710","relation":{},"ISSN":["2076-3417"],"issn-type":[{"value":"2076-3417","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,4,20]]}}}