{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T10:04:00Z","timestamp":1777716240094,"version":"3.51.4"},"reference-count":51,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2025,8,19]],"date-time":"2025-08-19T00:00:00Z","timestamp":1755561600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"funder":[{"DOI":"10.13039\/501100001691","name":"Japan Society for the Promotion of Science","doi-asserted-by":"publisher","award":["JP20H04265"],"award-info":[{"award-number":["JP20H04265"]}],"id":[{"id":"10.13039\/501100001691","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2026,3]]},"abstract":"<jats:p>Uncertainty is inherent in real-world robotics problems, and any control framework must address it to succeed in practical applications. Reinforcement Learning is no different, and epistemic uncertainty arising from model uncertainty or misspecification is a challenge well captured by the sim-to-real gap. A simple solution to this issue is domain randomization (DR), which unfortunately can result in conservative agents. As a remedy to this conservativeness, the use of universal policies that take additional information about the randomized domain has risen as an alternative solution, along with recurrent neural network-based controllers. Uncertainty-aware universal policies present a particularly compelling solution able to account for system identification uncertainties during deployment. In this paper, we reveal that the challenge of efficiently optimizing uncertainty-aware policies can be fundamentally reframed as solving the convex coverage set (CCS) problem within a multi-objective reinforcement learning (MORL) context. By introducing a novel Markov decision process (MDP) framework where each domain\u2019s performance is treated as an independent objective, we unify the training of uncertainty-aware policies with MORL approaches. This connection enables the application of MORL algorithms for domain randomization (DR), allowing for more efficient policy optimization. To illustrate this, we focus on the linear utility function, which aligns with the expectation in DR formulations, and propose a series of algorithms adapted from the MORL literature to solve the CCS, demonstrating their ability to enhance the performance of uncertainty-aware policies.<\/jats:p>","DOI":"10.1177\/02783649251358844","type":"journal-article","created":{"date-parts":[[2025,8,20]],"date-time":"2025-08-20T06:07:23Z","timestamp":1755670043000},"page":"397-451","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":0,"title":["Domains as objectives: Domain-uncertainty-aware policy optimization through explicit multi-domain convex coverage set learning"],"prefix":"10.1177","volume":"45","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1518-2519","authenticated-orcid":false,"given":"Wendyam Eric Lionel","family":"Ilboudo","sequence":"first","affiliation":[{"name":"Division of Information Science, Graduate School of Science and Technology","place":["Japan"]},{"name":",","place":["Japan"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3760-249X","authenticated-orcid":false,"given":"Taisuke","family":"Kobayashi","sequence":"additional","affiliation":[{"name":"National Institute of Informatics (NII) and The Graduate University for Advanced Studies (SOKENDAI), Chiyoda, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3545-4814","authenticated-orcid":false,"given":"Takamitsu","family":"Matsubara","sequence":"additional","affiliation":[{"name":"Division of Information Science, Graduate School of Science and Technology","place":["Japan"]},{"name":",","place":["Japan"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2025,8,19]]},"reference":[{"key":"e_1_3_4_2_1","first-page":"11","volume-title":"International Conference on Machine Learning","author":"Abels A","year":"2019","unstructured":"Abels A, Roijers D, Lenaerts T, et al. (2019) Dynamic weights in multi-objective deep reinforcement learning. In: International Conference on Machine Learning. PMLR, 11\u201320."},{"key":"e_1_3_4_3_1","first-page":"29304","article-title":"Deep reinforcement learning at the edge of the statistical precipice","volume":"34","author":"Agarwal R","year":"2021","unstructured":"Agarwal R, Schwarzer M, Castro PS, et al. (2021) Deep reinforcement learning at the edge of the statistical precipice. Advances in Neural Information Processing Systems 34: 29304\u201329320.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_4_1","first-page":"1300","volume-title":"Conference on Robot Learning","author":"Ahn M","year":"2020","unstructured":"Ahn M, Zhu H, Hartikainen K, et al. (2020) Robel: robotics benchmarks for learning with low-cost robots. In: Conference on Robot Learning. PMLR, 1300\u20131313."},{"key":"e_1_3_4_5_1","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919887447"},{"key":"e_1_3_4_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9196931"},{"key":"e_1_3_4_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2019.8793789"},{"key":"e_1_3_4_8_1","unstructured":"Chen X Hu J Jin C et al. (2021) Understanding domain randomization for sim-to-real transfer. arXiv preprint arXiv:2110.03239."},{"key":"e_1_3_4_9_1","first-page":"9355","article-title":"Hardware conditioned policies for multi-robot transfer learning","volume":"31","author":"Chen T","year":"2018","unstructured":"Chen T, Murali A, Gupta A (2018) Hardware conditioned policies for multi-robot transfer learning. Advances in Neural Information Processing Systems 31: 9355\u20139366.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS40897.2019.8968139"},{"key":"e_1_3_4_11_1","unstructured":"Derman E Mankowitz D Mann T et al. (2019) A bayesian approach to robust reinforcement learning. arXiv preprint arXiv:1905.08188."},{"key":"e_1_3_4_12_1","unstructured":"Ding Z (2019) Popular-rl-algorithms. https:\/\/github.com\/quantumiracle\/Popular-RL-Algorithms"},{"key":"e_1_3_4_13_1","unstructured":"Ding Z (2021) Not only domain randomization: universal policy with embedding system identification. arXiv preprint arXiv:2109.13438."},{"key":"e_1_3_4_14_1","unstructured":"Haarnoja T Zhou A Hartikainen K et al. (2018) Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905."},{"key":"e_1_3_4_15_1","unstructured":"Heess N Hunt JJ Lillicrap TP et al. (2015) Memory-based control with recurrent neural networks. arXiv preprint arXiv:1512.04455."},{"key":"e_1_3_4_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2020.3041755"},{"key":"e_1_3_4_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS55552.2023.10342236"},{"key":"e_1_3_4_18_1","unstructured":"Lin Z Thomas G Yang G et al. (2020) Model-based adversarial meta-reinforcement learning. arXiv preprint arXiv:2006.08875."},{"key":"e_1_3_4_19_1","unstructured":"Mankowitz DJ Levine N Jeong R et al. (2019) Robust reinforcement learning for continuous control with model misspecification. arXiv preprint arXiv:1906.07516."},{"key":"e_1_3_4_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197063"},{"key":"e_1_3_4_21_1","volume-title":"IEEE International Symposium on Circuits and Systems, ISCAS 2017","author":"Medeiros JEG","year":"2018","unstructured":"Medeiros JEG (2018) Unscented transform framework for quantization modeling in data conversion systems. In: IEEE International Symposium on Circuits and Systems, ISCAS 2017. IEEE."},{"key":"e_1_3_4_22_1","first-page":"1162","volume-title":"Conference on Robot Learning","author":"Mehta B","year":"2020","unstructured":"Mehta B, Diaz M, Golemo F, et al. (2020) Active domain randomization. In: Conference on Robot Learning. PMLR, 1162\u20131176."},{"key":"e_1_3_4_23_1","doi-asserted-by":"publisher","DOI":"10.1162\/0899766053011528"},{"key":"e_1_3_4_24_1","unstructured":"Mossalam H Assael YM Roijers DM et al. (2016) Multi-objective deep reinforcement learning. arXiv preprint arXiv:1610.02707."},{"key":"e_1_3_4_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341019"},{"key":"e_1_3_4_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/LRA.2021.3052391"},{"key":"e_1_3_4_27_1","first-page":"1532","volume-title":"Conference on Robot Learning","author":"Muratore F","year":"2022","unstructured":"Muratore F, Gruner T, Wiese F, et al. (2022a) Neural posterior domain randomization. In: Conference on Robot Learning. PMLR, 1532\u20131542."},{"key":"e_1_3_4_28_1","first-page":"1532","volume-title":"Conference on Robot Learning","author":"Muratore F","year":"2022","unstructured":"Muratore F, Gruner T, Wiese F, et al. (2022b) Neural posterior domain randomization. In: Conference on Robot Learning. PMLR, 1532\u20131542."},{"key":"e_1_3_4_29_1","doi-asserted-by":"publisher","DOI":"10.1145\/1102351.1102427"},{"key":"e_1_3_4_30_1","unstructured":"Oord A Li Y Vinyals O (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748."},{"key":"e_1_3_4_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8460528"},{"key":"e_1_3_4_32_1","unstructured":"Petrik M Russel RH (2019) Beyond confidence regions: tight bayesian ambiguity sets for robust mdps. arXiv preprint arXiv:1902.07605. https:\/\/arxiv.org\/abs\/1902.07605"},{"key":"e_1_3_4_33_1","unstructured":"Pinto L Davidson J Sukthankar R et al. (2017) Robust adversarial reinforcement learning. arXiv preprint arXiv:1703.02702."},{"key":"e_1_3_4_34_1","unstructured":"Rajeswaran A Ghotra S Ravindran B et al. (2016) Epopt: learning robust neural network policies using model ensembles. arXiv preprint arXiv:1610.01283."},{"key":"e_1_3_4_35_1","first-page":"5331","volume-title":"International Conference on Machine Learning","author":"Rakelly K","year":"2019","unstructured":"Rakelly K, Zhou A, Finn C, et al. (2019) Efficient off-policy meta-reinforcement learning via probabilistic context variables. In: International Conference on Machine Learning. PMLR, 5331\u20135340."},{"key":"e_1_3_4_36_1","doi-asserted-by":"crossref","unstructured":"Ramos F Possas RC Fox D (2019) Bayessim: adaptive domain randomization via probabilistic inference for robotics simulators. arXiv preprint arXiv:1906.01728.","DOI":"10.15607\/RSS.2019.XV.029"},{"key":"e_1_3_4_37_1","unstructured":"Ruiz N Schulter S Chandraker M (2018) Learning to simulate."},{"key":"e_1_3_4_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR56361.2022.9956103"},{"key":"e_1_3_4_39_1","first-page":"1497","article-title":"Managing power consumption and performance of computing systems using reinforcement learning","volume":"20","author":"Tesauro G","year":"2007","unstructured":"Tesauro G, Das R, Chan H, et al. (2007) Managing power consumption and performance of computing systems using reinforcement learning. Advances in Neural Information Processing Systems 20: 1497\u20131504.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_4_40_1","unstructured":"Tessler C Efroni Y Mannor S (2019) Action robust reinforcement learning and applications in continuous control. arXiv preprint arXiv:1901.09184."},{"key":"e_1_3_4_41_1","doi-asserted-by":"crossref","unstructured":"Tobin J Fong R Ray A et al. (2017) Domain randomization for transferring deep neural networks from simulation to the real world. arXiv preprint arXiv:1703.06907.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"e_1_3_4_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2012.6386109"},{"key":"e_1_3_4_43_1","volume-title":"Dynamic Map Building and Localization: New Theoretical Foundations","author":"Uhlmann J","year":"1995","unstructured":"Uhlmann J (1995) Dynamic Map Building and Localization: New Theoretical Foundations. PhD Thesis. University of Oxford."},{"issue":"1","key":"e_1_3_4_44_1","first-page":"3483","article-title":"Multi-objective reinforcement learning using sets of pareto dominating policies","volume":"15","author":"Van Moffaert K","year":"2014","unstructured":"Van Moffaert K, Now\u00e9 A (2014) Multi-objective reinforcement learning using sets of pareto dominating policies. Journal of Machine Learning Research 15(1): 3483\u20133512.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_4_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ADPRL.2013.6615007"},{"key":"e_1_3_4_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00442"},{"key":"e_1_3_4_47_1","article-title":"Domain randomization for sim2real transfer","author":"Weng L","year":"2019","unstructured":"Weng L (2019) Domain randomization for sim2real transfer. lilianweng.github.io. https:\/\/lilianweng.github.io\/posts\/2019-05-05-domain-randomization\/","journal-title":"lilianweng.github.io"},{"key":"e_1_3_4_48_1","first-page":"24414","volume-title":"International Conference on Machine Learning","author":"Xie A","year":"2022","unstructured":"Xie A, Sodhani S, Finn C, et al. (2022) Robust policy learning over multiple uncertainty sets. In: International Conference on Machine Learning. PMLR, 24414\u201324429."},{"key":"e_1_3_4_49_1","unstructured":"Yang Z Nguyen H (2021) Recurrent off-policy baselines for memory-based continuous control. arXiv preprint arXiv:2110.12628."},{"key":"e_1_3_4_50_1","unstructured":"Yang R Sun X Narasimhan K (2019) A generalized algorithm for multi-objective reinforcement learning and policy adaptation. arXiv preprint arXiv:1908.08342."},{"key":"e_1_3_4_51_1","doi-asserted-by":"crossref","unstructured":"Yu W Tan J Liu CK et al. (2017) Preparing for the unknown: learning a universal policy with online system identification. arXiv preprint arXiv:1702.02453.","DOI":"10.15607\/RSS.2017.XIII.048"},{"key":"e_1_3_4_52_1","volume-title":"International Conference on Learning Representations","author":"Yu W","year":"2019","unstructured":"Yu W, Liu CK, Turk G (2019) Policy transfer with strategy optimization. In: International Conference on Learning Representations. https:\/\/openreview.net\/forum?id=H1g6osRcFQ"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649251358844","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649251358844","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649251358844","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:17:45Z","timestamp":1777457865000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649251358844"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,19]]},"references-count":51,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2026,3]]}},"alternative-id":["10.1177\/02783649251358844"],"URL":"https:\/\/doi.org\/10.1177\/02783649251358844","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,19]]}}}