{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T06:56:12Z","timestamp":1780469772924,"version":"3.54.1"},"reference-count":124,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2024,1,30]],"date-time":"2024-01-30T00:00:00Z","timestamp":1706572800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2024,1,30]],"date-time":"2024-01-30T00:00:00Z","timestamp":1706572800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Mining Technology: Transactions of the Institutions of Mining and Metallurgy"],"published-print":{"date-parts":[[2024,3]]},"abstract":"<jats:p>The mathematical methods developed so far for addressing truck dispatching problems in fleet management systems (FMSs) of open-pit mines fail to capture the autonomy and dynamicity demanded by Mining 4.0, having led to the popularity of reinforcement learning (RL) methods capable of capturing real-time operational changes. Nonetheless, this nascent field feels the absence of a comprehensive study to elicit the shortfalls of previous studies in favour of more mature future works. To fill the gap, the present study attempts to critically review previously published articles in RL-based mine FMSs through both developing a five-feature-class scale embedded with 29 widely used dispatching features and an insightful review of basics and trends in RL. Results show that 60% of those features were neglected in previous works and that the underlying algorithms have many potentials for improvement. This study also laid out future research directions, pertinent challenges and possible solutions.<\/jats:p>","DOI":"10.1177\/25726668231222998","type":"journal-article","created":{"date-parts":[[2024,1,31]],"date-time":"2024-01-31T00:47:36Z","timestamp":1706662056000},"page":"50-73","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":19,"title":["Transition to intelligent fleet management systems in open pit mines: A critical review on application of reinforcement-learning-based systems"],"prefix":"10.1177","volume":"133","author":[{"given":"Arman","family":"Hazrathosseini","sequence":"first","affiliation":[{"name":"IntelMine Lab, Department of Mining, Metallurgical and Materials Engineering, Laval University, Quebec City, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Ali","family":"Moradi Afrapoli","sequence":"additional","affiliation":[{"name":"IntelMine Lab, Department of Mining, Metallurgical and Materials Engineering, Laval University, Quebec City, Canada"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"179","published-online":{"date-parts":[[2024,1,30]]},"reference":[{"key":"e_1_3_2_2_1","unstructured":"Achiam J (2018) Spinning up in deep reinforcement learning GitHub repository."},{"key":"e_1_3_2_3_1","doi-asserted-by":"publisher","DOI":"10.1076\/ijsm.16.1.59.3408"},{"key":"e_1_3_2_4_1","doi-asserted-by":"publisher","DOI":"10.3390\/electronics8050543"},{"key":"e_1_3_2_5_1","first-page":"299","volume-title":"Orebody modelling and strategic mine planning symposium","author":"Askari-Nasab H","year":"2014","unstructured":"Askari-Nasab H, Upadhyay S, Torkamani E, et al. (2014) Simulation optimisation of mine operational plans. In: Orebody modelling and strategic mine planning symposium, Perth, WA, Australia, 24\u201326 November 2014, pp. 299\u2013311."},{"key":"e_1_3_2_6_1","doi-asserted-by":"crossref","unstructured":"Baird LC (1994) Reinforcement learning in continuous time: Advantage updating. In: Proceedings of 1994 IEEE international conference on neural networks (ICNN'94) Orlando FL USA 1994 vol. 4 pp. 2448\u20132453. https:\/\/doi.org\/10.1109\/ICNN.1994.374604.","DOI":"10.1109\/ICNN.1994.374604"},{"key":"e_1_3_2_7_1","doi-asserted-by":"crossref","unstructured":"Bastos GS Souza LE Ramos FT et al. (2011) A single-dependent agent approach for stochastic time-dependent truck dispatching in open-pit mining. In: 2011 14th International IEEE conference on intelligent transportation systems (ITSC) Washington DC USA pp. 1057\u20131062. https:\/\/doi.org\/10.1109\/ITSC.2011.6082902.","DOI":"10.1109\/ITSC.2011.6082902"},{"key":"e_1_3_2_8_1","volume-title":"Advances in neural information processing systems 20","author":"Bhatnagar S","year":"2007","unstructured":"Bhatnagar S, Ghavamzadeh M, Lee M, et al. (2007) Incremental natural actor-critic algorithms. In: Advances in neural information processing systems 20, Vancouver, B.C., Canada, December 3, pp. 100\u2013103."},{"key":"e_1_3_2_9_1","volume-title":"Advances in neural information processing systems, 13","author":"Boyan J","year":"2000","unstructured":"Boyan J, Littman M (2000) Exact solutions to time-dependent MDPs. In: Advances in neural information processing systems, 13, MIT Press, Cambridge, Massachusetts, United States, pp. 70\u201377."},{"key":"e_1_3_2_10_1","doi-asserted-by":"publisher","DOI":"10.1017\/9781009089517"},{"key":"e_1_3_2_11_1","doi-asserted-by":"crossref","unstructured":"Busoniu L Babuska R De Schutter B (2006) Multi-agent reinforcement learning: A survey. In: 2006 9th international conference on control automation robotics and vision. Singapore pp. 1\u20136. https:\/\/doi.org\/10.1109\/ICARCV.2006.345353.","DOI":"10.1109\/ICARCV.2006.345353"},{"key":"e_1_3_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-14435-6_7"},{"key":"e_1_3_2_13_1","doi-asserted-by":"publisher","DOI":"10.3390\/app11114948"},{"key":"e_1_3_2_14_1","volume-title":"Caterpillar Performance Handbook","author":"Caterpillar Inc.","year":"2010","unstructured":"Caterpillar Inc. (2010) Caterpillar Performance Handbook. Caterpillar Inc."},{"key":"e_1_3_2_15_1","doi-asserted-by":"publisher","DOI":"10.1155\/2015\/745378"},{"key":"e_1_3_2_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmst.2017.01.007"},{"key":"e_1_3_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2017.02.039"},{"key":"e_1_3_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11053-020-09766-5"},{"key":"e_1_3_2_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.resourpol.2021.102522"},{"key":"e_1_3_2_20_1","doi-asserted-by":"crossref","unstructured":"Choudhury S Naik H (2022) Use of machine learning algorithm models to optimize the fleet management system in opencast mines. In: 2022 IEEE 7th international conference for convergence in technology (I2CT) Mumbai India pp. 1\u20138. https:\/\/doi.org\/10.1109\/I2CT54291.2022.9825450.","DOI":"10.1109\/I2CT54291.2022.9825450"},{"key":"e_1_3_2_21_1","volume-title":"Advances in neural information processing systems 31","author":"Chua K","year":"2018","unstructured":"Chua K, Calandra R, McAllister R, et al. (2018) Deep reinforcement learning in a handful of trials using probabilistic dynamics models. In: Advances in neural information processing systems 31 (NeurIPS 2018), Montr\u00e9al, Canada, December 2, pp. 40\u201352."},{"issue":"746","key":"e_1_3_2_22_1","first-page":"2","article-title":"The dynamics of reinforcement learning in cooperative multiagent systems","volume":"1998","author":"Claus C","year":"1998","unstructured":"Claus C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems. AAAI\/IAAI 1998(746\u2013752): 2.","journal-title":"AAAI\/IAAI"},{"key":"e_1_3_2_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2021.08.172"},{"key":"e_1_3_2_24_1","doi-asserted-by":"publisher","DOI":"10.1201\/9780203881248"},{"key":"e_1_3_2_25_1","doi-asserted-by":"publisher","DOI":"10.3390\/min11060587"},{"key":"e_1_3_2_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2016.2522401"},{"key":"e_1_3_2_27_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4095-0_2"},{"key":"e_1_3_2_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-021-05961-4"},{"key":"e_1_3_2_29_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18305-3_1"},{"issue":"7","key":"e_1_3_2_30_1","first-page":"433","article-title":"Optimization of shovel\u2013truck system for surface mining","volume":"109","author":"Ercelebi SG","year":"2009","unstructured":"Ercelebi SG, Bascetin A (2009) Optimization of shovel\u2013truck system for surface mining. Journal of the Southern African Institute of Mining and Metallurgy 109(7): 433\u2013439.","journal-title":"Journal of the Southern African Institute of Mining and Metallurgy"},{"key":"e_1_3_2_31_1","volume-title":"Advances in neural information processing systems, Curran Associates, Red Hook, New York, United States, 29","author":"Foerster J","year":"2016","unstructured":"Foerster J, Assael IA, De Freitas N, et al. (2016) Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems, Curran Associates, Red Hook, New York, United States, 29, pp. 1\u20139."},{"key":"e_1_3_2_32_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11794"},{"key":"e_1_3_2_33_1","unstructured":"Gartner Inc. (2017) Gartner hype cycle for emerging technologies in 2017. Available at: https:\/\/www.gartner.com\/ (accessed 1 April 2023)."},{"issue":"9","key":"e_1_3_2_34_1","article-title":"Variance reduction techniques for gradient estimates in reinforcement learning","volume":"5","author":"Greensmith E","year":"2004","unstructured":"Greensmith E, Bartlett PL, Baxter J (2004) Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research 5(9).","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_2_35_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJADS.2011.038091"},{"key":"e_1_3_2_36_1","unstructured":"Haarnoja T Zhou A Hartikainen K et al. (2018) Soft actor\u2013critic algorithms and applications. arXiv preprint arXiv:1812.05905."},{"key":"e_1_3_2_37_1","unstructured":"Hansen N Wang X Su H (2022) Temporal difference learning for model predictive control. arXiv preprint arXiv:2203.04955."},{"key":"e_1_3_2_38_1","volume-title":"2015 AAAI fall symposium series","author":"Hausknecht M","year":"2015","unstructured":"Hausknecht M, Stone P (2015) Deep recurrent q-learning for partially observable MDPS. In: 2015 AAAI fall symposium series. ArXiv, abs\/1507.06527."},{"key":"e_1_3_2_39_1","article-title":"Intelligent fleet management systems in surface mining: Status, threats, and opportunities","author":"Hazrathosseini A","unstructured":"Hazrathosseini A, Moradi Afrapoli A (2023a) Intelligent fleet management systems in surface mining: Status, threats, and opportunities. Mining, Metallurgy & Exploration, 40: 2087\u20132106. https:\/\/doi.org\/10.1007\/s42461-023-00875-2.","journal-title":"Mining, Metallurgy & Exploration"},{"key":"e_1_3_2_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.resourpol.2022.103155"},{"key":"e_1_3_2_41_1","doi-asserted-by":"crossref","unstructured":"Hazrathosseini A Moradi Afrapoli A (2024) Maximizing mining operations: Unlocking the crucial role of intelligent fleet management systems in surface mining's value chain. Mining 4: 7-20. https:\/\/doi.org\/10.3390\/mining4010002.","DOI":"10.3390\/mining4010002"},{"key":"e_1_3_2_42_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4095-0_4"},{"key":"e_1_3_2_43_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.resconrec.2022.106664"},{"key":"e_1_3_2_44_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364920987859"},{"key":"e_1_3_2_45_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-44783-0_35"},{"key":"e_1_3_2_46_1","volume-title":"Second AISB symposium on adaptive agents and multi-agent systems","author":"Kapetanakis S","year":"2002","unstructured":"Kapetanakis S, Kudenko D (2002) Improving on the reinforcement learning of coordination in cooperative multi-agent systems. In: Second AISB symposium on adaptive agents and multi-agent systems."},{"key":"e_1_3_2_47_1","unstructured":"Khorasgani H Wang H Gupta C (2020) Challenges of applying deep reinforcement learning in dynamic dispatching. arXiv preprint arXiv:2011.05570."},{"key":"e_1_3_2_48_1","doi-asserted-by":"crossref","unstructured":"Khorasgani H Wang H Tang H-K et al. (2021) K-nearest multi-agent deep reinforcement learning for collaborative tasks with a variable number of agents. In: 2021 IEEE international conference on Big Data (Big Data). Orlando FL USA pp. 3883\u20133889 https:\/\/doi.org\/10.1109\/BigData52589.2021.9671691.","DOI":"10.1109\/BigData52589.2021.9671691"},{"key":"e_1_3_2_49_1","volume-title":"Advances in neural information processing systems, 12","author":"Konda V","year":"1999","unstructured":"Konda V, Tsitsiklis J (1999) Actor\u2013critic algorithms. In: Advances in neural information processing systems, 12, MIT Press, Cambridge, Massachusetts, United States, pp. 1008\u20131014."},{"key":"e_1_3_2_50_1","doi-asserted-by":"publisher","DOI":"10.3233\/KES-2010-0206"},{"key":"e_1_3_2_51_1","doi-asserted-by":"crossref","unstructured":"Lee D (2020) Birth of Intelligence: From RNA to Artificial Intelligence pp. 30\u201350. New York NY USA: Oxford University Press.","DOI":"10.1093\/oso\/9780190908324.001.0001"},{"key":"e_1_3_2_52_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-18305-3_3"},{"key":"e_1_3_2_53_1","doi-asserted-by":"publisher","DOI":"10.1016\/0167-9031(90)90543-2"},{"key":"e_1_3_2_54_1","unstructured":"Lillicrap TP Hunt JJ Pritzel A et al. (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971."},{"key":"e_1_3_2_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219993"},{"key":"e_1_3_2_56_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-9031(87)90910-8"},{"key":"e_1_3_2_57_1","volume-title":"Advances in neural information processing systems, Long Beach, California, USA, p. 30","author":"Lowe R","year":"2017","unstructured":"Lowe R, Wu YI, Tamar A, et al. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems, Long Beach, California, USA, p. 30."},{"key":"e_1_3_2_58_1","doi-asserted-by":"publisher","DOI":"10.1049\/stg2.12068"},{"key":"e_1_3_2_59_1","unstructured":"MathWorks. Available at: https:\/\/www.mathworks.com\/products\/simevents.html."},{"key":"e_1_3_2_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1961.287775"},{"key":"e_1_3_2_61_1","volume-title":"International conference on machine learning","author":"Mnih V","year":"2016","unstructured":"Mnih V, Badia AP, Mirza M, et al. (2016) Asynchronous methods for deep reinforcement learning. In: International conference on machine learning. PMLR 48: 1928\u20131937. https:\/\/proceedings.mlr.press\/v48\/mniha16.html."},{"key":"e_1_3_2_62_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_3_2_63_1","doi-asserted-by":"publisher","DOI":"10.1080\/17480930.2021.1949861"},{"key":"e_1_3_2_64_1","doi-asserted-by":"publisher","DOI":"10.1080\/17480930.2022.2067709"},{"key":"e_1_3_2_65_1","doi-asserted-by":"publisher","DOI":"10.1080\/17480930.2017.1336607"},{"key":"e_1_3_2_66_1","doi-asserted-by":"publisher","DOI":"10.1504\/IJMME.2020.111929"},{"key":"e_1_3_2_67_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ejor.2019.01.008"},{"key":"e_1_3_2_68_1","doi-asserted-by":"publisher","DOI":"10.1080\/25726668.2018.1473314"},{"key":"e_1_3_2_69_1","doi-asserted-by":"publisher","DOI":"10.17159\/2411-9717\/522\/2021"},{"key":"e_1_3_2_70_1","doi-asserted-by":"publisher","DOI":"10.1080\/0305215X.2022.2153840"},{"key":"e_1_3_2_71_1","unstructured":"Nagabandi A Clavera I Liu S et al. (2018) Learning to adapt in dynamic real-world environments through meta-reinforcement learning. arXiv preprint arXiv:1803.11347."},{"key":"e_1_3_2_72_1","doi-asserted-by":"publisher","DOI":"10.1109\/WSC.2013.6721714"},{"key":"e_1_3_2_73_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2020.2977374"},{"key":"e_1_3_2_74_1","doi-asserted-by":"publisher","DOI":"10.3390\/mining2030028"},{"key":"e_1_3_2_75_1","unstructured":"OpenAi. Available at: https:\/\/openai.com\/."},{"key":"e_1_3_2_76_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.simpat.2019.04.006"},{"key":"e_1_3_2_77_1","doi-asserted-by":"publisher","DOI":"10.1080\/17480930701482961"},{"key":"e_1_3_2_78_1","volume-title":"Competitive Strategy: Creating and Sustaining Superior Performance","author":"Porter ME","year":"1985","unstructured":"Porter ME (1985) Competitive Strategy: Creating and Sustaining Superior Performance. New York: The Free."},{"key":"e_1_3_2_79_1","doi-asserted-by":"publisher","DOI":"10.1002\/9780470316887"},{"key":"e_1_3_2_80_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-019-09433-x"},{"key":"e_1_3_2_81_1","volume-title":"35th international conference on machine learning, PMLR 80: 4295\u20134304","author":"Rashid T","year":"2018","unstructured":"Rashid T, De Witt C, Farquhar G, et al. (2018) QMIX: Monotonic value function factorisation for deep multi-agent reinforcement Learning. In: 35th international conference on machine learning, PMLR 80: 4295\u20134304."},{"key":"e_1_3_2_82_1","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3987"},{"key":"e_1_3_2_83_1","volume-title":"On-line Q-learning Using Connectionist Systems","author":"Rummery GA","year":"1994","unstructured":"Rummery GA, Niranjan M (1994) On-line Q-learning Using Connectionist Systems, Vol. 37. UK: University of Cambridge, Department of Engineering Cambridge."},{"key":"e_1_3_2_84_1","volume-title":"Artificial Intelligence \u2013 A Modern Approach","author":"Russell SJ","year":"2010","unstructured":"Russell SJ, Norvig P (2010) Artificial Intelligence \u2013 A Modern Approach, 3rd Int. ed. Upper Saddle River, NJ, USA: Pearson Education.","edition":"3"},{"key":"e_1_3_2_85_1","doi-asserted-by":"publisher","DOI":"10.1147\/rd.33.0210"},{"key":"e_1_3_2_86_1","unstructured":"Schaul T Quan J Antonoglou I et al. (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952."},{"key":"e_1_3_2_87_1","volume-title":"International conference on machine learning","author":"Schulman J","year":"2015","unstructured":"Schulman J, Levine S, Abbeel P, et al. (2015) Trust region policy optimization. In: International conference on machine learning. PMLR 37: 1889\u20131897."},{"key":"e_1_3_2_88_1","unstructured":"Schulman J Wolski F Dhariwal P et al. (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347."},{"key":"e_1_3_2_89_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.39.10.1095"},{"key":"e_1_3_2_90_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.apenergy.2015.04.064"},{"key":"e_1_3_2_91_1","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_3_2_92_1","volume-title":"international conference on machine learning","author":"Silver D","year":"2014","unstructured":"Silver D, Lever G, Heess N, et al. (2014) Deterministic policy gradient algorithms. In: international conference on machine learning. PMLR 32(1): 387\u2013395."},{"key":"e_1_3_2_93_1","unstructured":"SimPy. Available at: https:\/\/simpy.readthedocs.io\/en\/latest\/."},{"key":"e_1_3_2_94_1","doi-asserted-by":"publisher","DOI":"10.1109\/TC.1980.1675516"},{"key":"e_1_3_2_95_1","first-page":"674","article-title":"Evaluation of the new truck dispatching in the Mount Wright mine","author":"Soumis F","year":"1989","unstructured":"Soumis F, Ethier J, Elbrond J (1989) Evaluation of the new truck dispatching in the Mount Wright mine. In: Application of Computers and Operations Research in the Mineral Industry, pp. 674\u2013682.","journal-title":"Application of Computers and Operations Research in the Mineral Industry"},{"key":"e_1_3_2_96_1","first-page":"4368045","article-title":"The use of a machine learning method to predict the real-time link travel time of open-pit trucks","volume":"2018","author":"Sun X","year":"2018","unstructured":"Sun X, Zhang H, Tian F, et al. (2018) The use of a machine learning method to predict the real-time link travel time of open-pit trucks. Mathematical Problems in Engineering 2018: 4368045.","journal-title":"Mathematical Problems in Engineering"},{"key":"e_1_3_2_97_1","unstructured":"Sunehag P Lever G Gruslys A et al. (2017) Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296."},{"key":"e_1_3_2_98_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00115009"},{"key":"e_1_3_2_99_1","volume-title":"Advances in Neural Information Processing Systems. MIT Press, Cambridge, Massachusetts, United States, 8","author":"Sutton RS","year":"1995","unstructured":"Sutton RS (1995) Generalization in reinforcement learning: Successful examples using sparse coarse coding. In: Advances in Neural Information Processing Systems. MIT Press, Cambridge, Massachusetts, United States, 8."},{"key":"e_1_3_2_100_1","unstructured":"Sutton RS Barto AG (2018) Reinforcement Learning: An Introduction pp. 50\u2013200. Cambridge Massachusetts USA: MIT press."},{"key":"e_1_3_2_101_1","article-title":"Policy gradient methods for reinforcement learning with function approximation","author":"Sutton RS","year":"1999","unstructured":"Sutton RS, McAllester D, Singh S, et al. (1999) Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems 12.","journal-title":"Advances in Neural Information Processing Systems 12"},{"key":"e_1_3_2_102_1","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"e_1_3_2_103_1","doi-asserted-by":"publisher","DOI":"10.1142\/S0950609898000092"},{"key":"e_1_3_2_104_1","doi-asserted-by":"publisher","DOI":"10.1179\/1743286315Y.0000000024"},{"key":"e_1_3_2_105_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48891.2023.10160983"},{"key":"e_1_3_2_106_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89378-3_37"},{"key":"e_1_3_2_107_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-009-0146-3"},{"key":"e_1_3_2_108_1","volume-title":"Advances in Neural Information Processing Systems, 23","author":"Van Hasselt H","year":"2010","unstructured":"Van Hasselt H (2010) Double Q-learning. In: Advances in Neural Information Processing Systems, 23."},{"key":"e_1_3_2_109_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10295"},{"issue":"1","key":"e_1_3_2_110_1","first-page":"3483","article-title":"Multi-objective reinforcement learning using sets of Pareto dominating policies","volume":"15","author":"Van Moffaert K","year":"2014","unstructured":"Van Moffaert K, Now\u00e9 A (2014) Multi-objective reinforcement learning using sets of Pareto dominating policies. The Journal of Machine Learning Research 15(1): 3483\u20133512.","journal-title":"The Journal of Machine Learning Research"},{"key":"e_1_3_2_111_1","unstructured":"Vignaux T Muller K Helmbold B (2007) SimPy Manual. Available at: http:\/\/simpy.readthedocs.org."},{"key":"e_1_3_2_112_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmst.2014.01.019"},{"key":"e_1_3_2_113_1","volume-title":"International conference on machine learning","author":"Wang Z","year":"2016","unstructured":"Wang Z, Schaul T, Hessel M, et al. (2016) Dueling network architectures for deep reinforcement learning. In: International conference on machine learning. PMLR 48: 1995\u20132003."},{"key":"e_1_3_2_114_1","unstructured":"Watkins C (1989) Learning from delayed rewards. PhD Thesis Cambridge University Cambridge England pp. 10\u201350."},{"key":"e_1_3_2_115_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992698"},{"issue":"973","key":"e_1_3_2_116_1","first-page":"43","article-title":"On improving truck\/shovel productivity in open pit mines","volume":"86","author":"White JW","year":"1993","unstructured":"White JW, Olson J, Vohnout S (1993) On improving truck\/shovel productivity in open pit mines. CIM Bulletin 86(973): 43\u201349.","journal-title":"CIM Bulletin"},{"key":"e_1_3_2_117_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4615-3618-5_2"},{"key":"e_1_3_2_118_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-022-10299-x"},{"key":"e_1_3_2_119_1","volume-title":"Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, New York, United States, 30","author":"Wu Y","unstructured":"Wu Y, Mansimov E, Grosse RB, et al. (2017a) Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In: Advances in Neural Information Processing Systems, Curran Associates Inc., Red Hook, New York, United States, 30."},{"key":"e_1_3_2_120_1","unstructured":"Wu Y Mansimov E Liao S et al. (2017b) Openai baselines: ACKTR & A2C. Available at: https:\/\/openai.com\/blog\/baselines-acktra2c."},{"key":"e_1_3_2_121_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData50022.2020.9378191"},{"key":"e_1_3_2_122_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-4095-0_11"},{"key":"e_1_3_2_123_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.egypro.2015.07.469"},{"key":"e_1_3_2_124_1","doi-asserted-by":"publisher","DOI":"10.1109\/TITS.2020.3003163"},{"key":"e_1_3_2_125_1","unstructured":"Zhu Z Lin K Jain AK et al. (2020) Transfer learning in deep reinforcement learning: A survey. arXiv preprint arXiv:2009.07888."}],"container-title":["Mining Technology: Transactions of the Institutions of Mining and Metallurgy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/25726668231222998","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/25726668231222998","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/25726668231222998","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T00:44:38Z","timestamp":1777596278000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/25726668231222998"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,30]]},"references-count":124,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2024,3]]}},"alternative-id":["10.1177\/25726668231222998"],"URL":"https:\/\/doi.org\/10.1177\/25726668231222998","relation":{},"ISSN":["2572-6668","2572-6676"],"issn-type":[{"value":"2572-6668","type":"print"},{"value":"2572-6676","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,1,30]]}}}