{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,23]],"date-time":"2025-11-23T13:31:21Z","timestamp":1763904681072,"version":"3.37.3"},"reference-count":41,"publisher":"World Scientific Pub Co Pte Ltd","issue":"03","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Int. J. Info. Tech. Dec. Mak."],"published-print":{"date-parts":[[2019,5]]},"abstract":"<jats:p> Usually, real-world problems involve the optimization of multiple, possibly conflicting, objectives. These problems may be addressed by Multi-objective Reinforcement learning (MORL) techniques. MORL is a generalization of standard Reinforcement Learning (RL) where the single reward signal is extended to multiple signals, in particular, one for each objective. MORL is the process of learning policies that optimize multiple objectives simultaneously. In these problems, the use of directional\/gradient information can be useful to guide the exploration to better and better behaviors. However, traditional policy-gradient approaches have two main drawbacks: they require the use of a batch of episodes to properly estimate the gradient information (reducing in this way the learning speed), and they use stochastic policies which could have a disastrous impact on the safety of the learning system. In this paper, we present a novel population-based MORL algorithm for problems in which the underlying objectives are reasonably smooth. It presents two main characteristics: fast computation of the gradient information for each objective through the use of neighboring solutions, and the use of this information to carry out a geometric partition of the search space and thus direct the exploration to promising areas. Finally, the algorithm is evaluated and compared to policy gradient MORL algorithms on different multi-objective problems: the water reservoir and the biped walking problem (the latter both on simulation and on a real robot). <\/jats:p>","DOI":"10.1142\/s0219622019500093","type":"journal-article","created":{"date-parts":[[2018,11,28]],"date-time":"2018-11-28T03:48:34Z","timestamp":1543376914000},"page":"1045-1082","source":"Crossref","is-referenced-by-count":5,"title":["Directed Exploration in Black-Box Optimization for Multi-Objective Reinforcement Learning"],"prefix":"10.1142","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5638-5240","authenticated-orcid":false,"given":"Javier","family":"Garc\u00eda","sequence":"first","affiliation":[{"name":"CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain"}]},{"given":"Roberto","family":"Iglesias","sequence":"additional","affiliation":[{"name":"CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain"}]},{"given":"Miguel A.","family":"Rodr\u00edguez","sequence":"additional","affiliation":[{"name":"CiTIUS, Universidade de Santiago de Compostela, Santiago de Compostela, Spain"}]},{"given":"Carlos V.","family":"Regueiro","sequence":"additional","affiliation":[{"name":"Department of Electronics and Systems, Universidade de Coru\u00f1a, A Coru\u00f1a, Spain"}]}],"member":"219","published-online":{"date-parts":[[2019,6,10]]},"reference":[{"key":"S0219622019500093BIB001","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.1998.712192"},{"key":"S0219622019500093BIB002","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2014.01.007"},{"first-page":"307","volume-title":"The 12th IEEE Int. Conf. Fuzzy Systems, FUZZ-IEEE 2003","author":"Nojima Y.","key":"S0219622019500093BIB003"},{"key":"S0219622019500093BIB004","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-27645-3_10"},{"key":"S0219622019500093BIB010","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-247-2.50017-6"},{"key":"S0219622019500093BIB013","first-page":"1633","volume":"10","author":"Taylor M. E.","year":"2009","journal-title":"Journal of Machine Learning Research"},{"issue":"1","key":"S0219622019500093BIB014","first-page":"388","volume":"2","author":"Deisenroth M. P.","year":"2013","journal-title":"Foundations and Trends in Robotics"},{"key":"S0219622019500093BIB015","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"S0219622019500093BIB016","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5223-6"},{"first-page":"535","volume-title":"14th IEEE-RAS Int. Conf. Humanoid Robots, Humanoids 2014","author":"Hwangbo J.","key":"S0219622019500093BIB017"},{"key":"S0219622019500093BIB018","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.2004.1307456"},{"first-page":"2323","volume-title":"2014 International Joint Conference on Neural Networks, IJCNN 2014","author":"Parisi S.","key":"S0219622019500093BIB019"},{"key":"S0219622019500093BIB020","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2006.282564"},{"first-page":"778","volume-title":"Genetic and Evolutionary Computation \u2014 GECCO 2003, Genetic and Evolutionary Computation Conference","author":"Brown M.","key":"S0219622019500093BIB021"},{"issue":"1","key":"S0219622019500093BIB022","first-page":"3","volume":"6","author":"Brown M.","year":"2005","journal-title":"International Journal of Computers, Systems, and Signals"},{"volume-title":"Manuel d\u2019\u00e9conomie politique","year":"1969","author":"Pareto V.","key":"S0219622019500093BIB023"},{"key":"S0219622019500093BIB024","doi-asserted-by":"publisher","DOI":"10.1613\/jair.3987"},{"key":"S0219622019500093BIB025","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-010-5232-5"},{"key":"S0219622019500093BIB026","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2003.810758"},{"key":"S0219622019500093BIB027","first-page":"197","volume-title":"International Conference on Machine Learning (ICML-98)","author":"Gabor Z.","year":"1998"},{"key":"S0219622019500093BIB028","first-page":"325","volume":"5","author":"Mannor S.","year":"2004","journal-title":"Journal of Machine Learning Research"},{"key":"S0219622019500093BIB029","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102427"},{"volume-title":"Evolutionary Multi-Criterion Optimization \u2014 7th International Conference, EMO 2013","author":"Van Moffaert K.","key":"S0219622019500093BIB030"},{"key":"S0219622019500093BIB031","first-page":"2928","volume-title":"Proc. Twenty-Ninth AAAI Conf. Artificial Intelligence, AAAI\u201915","author":"Pirotta M.","year":"2015"},{"key":"S0219622019500093BIB032","doi-asserted-by":"publisher","DOI":"10.1007\/BF01197559"},{"key":"S0219622019500093BIB033","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-89378-3_37"},{"key":"S0219622019500093BIB034","doi-asserted-by":"crossref","unstructured":"A. Castelletti,  F. Pianosi and  M. Restelli ,  Tree-Based Fitted q-Iteration for Multi-Objective Markov Decision Problems, in  The 2012 International Joint Conference on Neural Networks (IJCNN),  Brisbane, Australia, June 10\u201315, 2012, pp.  1\u20138.","DOI":"10.1109\/IJCNN.2012.6252759"},{"key":"S0219622019500093BIB035","doi-asserted-by":"publisher","DOI":"10.1162\/evco.2007.15.1.1"},{"key":"S0219622019500093BIB036","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45712-7_29"},{"key":"S0219622019500093BIB037","doi-asserted-by":"crossref","unstructured":"E. Mezura-Montes,  M. Reyes-Sierra and  C. A. Coello Coello ,  Multi-Objective Optimization Using Differential Evolution: A Survey of the State-of-the-Art (Springer,  Berlin, Heidelberg,  2008), pp.  173\u2013196.","DOI":"10.1007\/978-3-540-68830-3_7"},{"key":"S0219622019500093BIB038","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2010.5649089"},{"key":"S0219622019500093BIB039","doi-asserted-by":"publisher","DOI":"10.2478\/s13230-010-0002-4"},{"key":"S0219622019500093BIB040","doi-asserted-by":"crossref","unstructured":"H. van Hasselt ,  Reinforcement Learning in Continuous State and Action Spaces, Adaptation, Learning, and Optimization, Vol.  12, Chapter 7  (Springer,  Berlin, Heidelberg,  2012), pp.  207\u2013251.","DOI":"10.1007\/978-3-642-27645-3_7"},{"key":"S0219622019500093BIB041","doi-asserted-by":"publisher","DOI":"10.1177\/0278364913495721"},{"key":"S0219622019500093BIB043","volume-title":"Schaum\u2019s Outlines, Advanced Calculus","author":"Wrede R. C.","year":"2010","edition":"3"},{"key":"S0219622019500093BIB044","series-title":"Wiley Series in Probability and Statistics","volume-title":"The EM Algorithm and Extensions","author":"McLachlan G.","year":"2007"},{"key":"S0219622019500093BIB045","doi-asserted-by":"publisher","DOI":"10.1162\/106365600568167"},{"key":"S0219622019500093BIB046","doi-asserted-by":"publisher","DOI":"10.1109\/CEC.2006.1688440"},{"volume-title":"Insights in Reinforcement Learning","year":"2011","author":"Van Hasselt H.","key":"S0219622019500093BIB048"},{"volume-title":"Nonparametric Statistics: A Step-by-Step Approach","year":"2014","author":"Corder G. W.","key":"S0219622019500093BIB049"},{"first-page":"583","volume-title":"RoboCup 2014: Robot World Cup XVIII","author":"Shafii N.","key":"S0219622019500093BIB051"}],"container-title":["International Journal of Information Technology &amp; Decision Making"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219622019500093","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,7]],"date-time":"2019-08-07T00:13:39Z","timestamp":1565136819000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219622019500093"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,5]]},"references-count":41,"journal-issue":{"issue":"03","published-online":{"date-parts":[[2019,6,10]]},"published-print":{"date-parts":[[2019,5]]}},"alternative-id":["10.1142\/S0219622019500093"],"URL":"https:\/\/doi.org\/10.1142\/s0219622019500093","relation":{},"ISSN":["0219-6220","1793-6845"],"issn-type":[{"type":"print","value":"0219-6220"},{"type":"electronic","value":"1793-6845"}],"subject":[],"published":{"date-parts":[[2019,5]]}}}