{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T08:55:47Z","timestamp":1775120147273,"version":"3.50.1"},"reference-count":64,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2023,3,1]],"date-time":"2023-03-01T00:00:00Z","timestamp":1677628800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Queensland University of Technology (QUT) Centre for Robotics"},{"DOI":"10.13039\/501100022594","name":"Australian Research Council Centre of Excellence for Robotic Vision","doi-asserted-by":"crossref","award":["CE140100016"],"award-info":[{"award-number":["CE140100016"]}],"id":[{"id":"10.13039\/501100022594","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["The International Journal of Robotics Research"],"published-print":{"date-parts":[[2023,3]]},"abstract":"<jats:p> We present Bayesian Controller Fusion (BCF): a hybrid control strategy that combines the strengths of traditional hand-crafted controllers and model-free deep reinforcement learning (RL). BCF thrives in the robotics domain, where reliable but suboptimal control priors exist for many tasks, but RL from scratch remains unsafe and data-inefficient. By fusing uncertainty-aware distributional outputs from each system, BCF arbitrates control between them, exploiting their respective strengths. We study BCF on two real-world robotics tasks involving navigation in a vast and long-horizon environment, and a complex reaching task that involves manipulability maximisation. 
For both these domains, simple hand-crafted controllers exist that can solve the task at hand in a risk-averse manner but do not necessarily achieve the optimal solution given limitations in analytical modelling, controller miscalibration, and task variation. As exploration is naturally guided by the prior in the early stages of training, BCF accelerates learning, while substantially improving beyond the performance of the control prior as the policy gains more experience. More importantly, given the risk aversion of the control prior, BCF ensures safe exploration and deployment, where the control prior naturally dominates the action distribution in states unknown to the policy. We additionally show BCF\u2019s applicability to the zero-shot sim-to-real setting and its ability to deal with out-of-distribution states in the real world. BCF is a promising approach towards combining the complementary strengths of deep RL and traditional robotic control, surpassing what either can achieve independently. The code and supplementary video material are made publicly available at https:\/\/krishanrana.github.io\/bcf. 
<\/jats:p>","DOI":"10.1177\/02783649231167210","type":"journal-article","created":{"date-parts":[[2023,4,7]],"date-time":"2023-04-07T12:31:05Z","timestamp":1680870665000},"page":"123-146","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":18,"title":["Bayesian controller fusion: Leveraging control priors in deep reinforcement learning for robotics"],"prefix":"10.1177","volume":"42","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9028-9295","authenticated-orcid":false,"given":"Krishan","family":"Rana","sequence":"first","affiliation":[{"name":"Queensland University of Technology (QUT) Centre for Robotics, Brisbane, Australia"}]},{"given":"Vibhavari","family":"Dasagi","sequence":"additional","affiliation":[{"name":"Queensland University of Technology (QUT) Centre for Robotics, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1227-7459","authenticated-orcid":false,"given":"Jesse","family":"Haviland","sequence":"additional","affiliation":[{"name":"Queensland University of Technology (QUT) Centre for Robotics, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5670-1928","authenticated-orcid":false,"given":"Ben","family":"Talbot","sequence":"additional","affiliation":[{"name":"Queensland University of Technology (QUT) Centre for Robotics, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5162-1793","authenticated-orcid":false,"given":"Michael","family":"Milford","sequence":"additional","affiliation":[{"name":"Queensland University of Technology (QUT) Centre for Robotics, Brisbane, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5286-3789","authenticated-orcid":false,"given":"Niko","family":"S\u00fcnderhauf","sequence":"additional","affiliation":[{"name":"Queensland University of Technology (QUT) Centre for Robotics, Brisbane, 
Australia"}]}],"member":"179","published-online":{"date-parts":[[2023,4,7]]},"reference":[{"key":"bibr1-02783649231167210","volume-title":"Spinning Up in Deep Reinforcement Learning","author":"Achiam J","year":"2018"},{"key":"bibr2-02783649231167210","unstructured":"Anderson P, Chang A, Chaplot DS, et al. (2018) On evaluation of embodied navigation agents. arXiv preprint arXiv:1807.06757."},{"key":"bibr3-02783649231167210","volume-title":"Hindsight Experience Replay","author":"Andrychowicz M","year":"2018"},{"key":"bibr4-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1177\/0278364919887447"},{"key":"bibr5-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-97085-1_9"},{"key":"bibr6-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553380"},{"key":"bibr7-02783649231167210","volume-title":"Model Predictive Control","author":"Camacho EF","year":"2013"},{"key":"bibr8-02783649231167210","volume-title":"Control Regularization for Reduced Variance Reinforcement Learning","author":"Cheng R","year":"2019"},{"key":"bibr9-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/TRO.2016.2633567"},{"key":"bibr10-02783649231167210","doi-asserted-by":"crossref","unstructured":"Colledanchise M, \u00d6gren P (2017) Behavior trees in robotics and ai: an introduction. 
ArXiv abs\/1709.00084.","DOI":"10.1201\/9780429489105"},{"key":"bibr11-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA48506.2021.9561366"},{"key":"bibr12-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1038\/nn1560"},{"key":"bibr13-02783649231167210","doi-asserted-by":"publisher","DOI":"10.3758\/CABN.8.4.429"},{"key":"bibr14-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1145\/1160633.1160762"},{"key":"bibr15-02783649231167210","first-page":"1587","volume-title":"International conference on machine learning","author":"Fujimoto S","year":"2018"},{"key":"bibr16-02783649231167210","unstructured":"Galashov A, Jayakumar SM, Hasenclever L, et al. (2019) Information asymmetry in kl-regularized rl. arXiv preprint arXiv:1905.01240."},{"key":"bibr17-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/IROS.2017.8206046"},{"key":"bibr18-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/ICMLA.2009.33"},{"key":"bibr19-02783649231167210","doi-asserted-by":"crossref","unstructured":"Haarnoja T, Ha S, Zhou A, et al. (2018) Learning to walk via deep reinforcement learning. arXiv preprint arXiv:1812.11103.","DOI":"10.15607\/RSS.2019.XV.011"},{"key":"bibr20-02783649231167210","unstructured":"Haarnoja T, Zhou A, Hartikainen K, et al. (2019) Soft actor-critic algorithms and applications."},{"key":"bibr21-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2017.06.011"},{"key":"bibr22-02783649231167210","unstructured":"Hausman K, Springenberg JT, Wang Z, et al. (2018) Learning an embedding space for transferable robot skills. In: International conference on learning representations, Vancouver, Canada, April 2018."},{"key":"bibr23-02783649231167210","unstructured":"Haviland J, Corke P (2021) A purely-reactive manipulability-maximising motion controller. 
arXiv preprint arXiv:2002.11901."},{"key":"bibr24-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11757"},{"key":"bibr25-02783649231167210","first-page":"2911","volume-title":"International conference on machine learning","author":"Hunt J","year":"2019"},{"key":"bibr26-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1177\/0278364920987859"},{"key":"bibr27-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2008.03.014"},{"key":"bibr28-02783649231167210","first-page":"916","volume-title":"Conference on robot learning","author":"Iscen A","year":"2018"},{"key":"bibr29-02783649231167210","unstructured":"James S, Freese M, Davison AJ (2019) Pyrep: bringing v-rep to deep robot learning. arXiv preprint arXiv:1906.11176."},{"key":"bibr30-02783649231167210","unstructured":"Jeong R, Springenberg JT, Kay J, et al. (2020) Learning dexterous manipulation from suboptimal experts. arXiv preprint arXiv:2010.08587."},{"key":"bibr31-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i04.5953"},{"key":"bibr32-02783649231167210","doi-asserted-by":"crossref","unstructured":"Johannink T, Bahl S, Nair A, et al. (2018) Residual reinforcement learning for robot control. 
arXiv preprint arXiv:1812.03201.","DOI":"10.1109\/ICRA.2019.8794127"},{"key":"bibr33-02783649231167210","first-page":"2469","volume-title":"Proceedings of the 35th international conference on machine learning, proceedings of machine learning research","author":"Kang B","year":"2018"},{"key":"bibr34-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4613-8997-2_29"},{"key":"bibr35-02783649231167210","volume-title":"Optimal Control Theory (An Introduction)","author":"Kirk DE","year":"2004"},{"key":"bibr36-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1991.131810"},{"key":"bibr37-02783649231167210","volume":"32","author":"Kumar A","year":"2019","journal-title":"Advances in Neural Information Processing Systems"},{"key":"bibr38-02783649231167210","first-page":"6402","volume-title":"Advances in Neural Information Processing Systems","author":"Lakshminarayanan B","year":"2017"},{"key":"bibr39-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA40945.2020.9197125"},{"key":"bibr40-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuron.2013.11.028"},{"key":"bibr41-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"bibr42-02783649231167210","unstructured":"Nise N (2017) Control Systems Engineering. 6th edition. Pomona: John Wiley & Sons, p. 34."},{"key":"bibr43-02783649231167210","unstructured":"Osband I, Russo D, Wen Z, et al. (2017) Deep exploration via randomized value functions. ArXiv abs\/1703.07608."},{"key":"bibr44-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1007\/s10846-014-0024-y"},{"key":"bibr45-02783649231167210","unstructured":"Pertsch K, Lee Y, Lim JJ (2020) Accelerating reinforcement learning with learned skill priors. 
arXiv preprint arXiv:2010.11944."},{"key":"bibr46-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341372"},{"key":"bibr47-02783649231167210","doi-asserted-by":"crossref","unstructured":"Rana K, Talbot B, Milford M, et al. (2019) Residual reactive navigation: combining classical and learned navigation strategies for deployment in unknown environments. arXiv preprint arXiv:1909.10972.","DOI":"10.1109\/ICRA40945.2020.9197386"},{"key":"bibr48-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/IROS45743.2020.9341714"},{"key":"bibr49-02783649231167210","unstructured":"Schulman J, Levine S, Moritz P, et al. (2017) Trust region policy optimization."},{"key":"bibr50-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-32552-1"},{"key":"bibr51-02783649231167210","unstructured":"Silver T, Allen K, Tenenbaum J, et al. (2018) Residual policy learning. arXiv preprint arXiv:1812.06298."},{"key":"bibr52-02783649231167210","first-page":"4742","volume-title":"Proceedings of the 35th international conference on machine learning, proceedings of machine learning research","volume":"80","author":"Srouji M","year":"2018"},{"key":"bibr53-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.1998.712192"},{"key":"bibr54-02783649231167210","unstructured":"Teh YW, Bapst V, Czarnecki WM, et al. (2017) Distral: robust multitask reinforcement learning."},{"key":"bibr55-02783649231167210","unstructured":"Tirumala D, Noh H, Galashov A, et al. (2019) Exploiting hierarchy for learning and transfer in kl-regularized rl. arXiv preprint arXiv:1903.07438."},{"key":"bibr56-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v30i1.10295"},{"key":"bibr57-02783649231167210","unstructured":"Vecerik M, Hester T, Scholz J, et al. (2017) Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards. 
arXiv preprint arXiv:1707.08817."},{"key":"bibr58-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/ROBOT.1989.100007"},{"key":"bibr59-02783649231167210","unstructured":"Watkins CJCH (1989) Learning from delayed rewards."},{"key":"bibr60-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/TMMS.1969.299896"},{"key":"bibr61-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1007\/BF00992696"},{"key":"bibr62-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1109\/ICRA.2018.8461203"},{"key":"bibr63-02783649231167210","doi-asserted-by":"publisher","DOI":"10.1177\/027836498500400201"},{"key":"bibr64-02783649231167210","volume-title":"Essentials of Robust Control","volume":"104","author":"Zhou K","year":"1998"}],"container-title":["The International Journal of Robotics Research"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649231167210","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/02783649231167210","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/02783649231167210","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T01:14:36Z","timestamp":1741050876000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/02783649231167210"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,3]]},"references-count":64,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2023,3]]}},"alternative-id":["10.1177\/02783649231167210"],"URL":"https:\/\/doi.org\/10.1177\/02783649231167210","relation":{},"ISSN":["0278-3649","1741-3176"],"issn-type":[{"value":"0278-3649","type":"print"},{"value":"1741-3176","type"
:"electronic"}],"subject":[],"published":{"date-parts":[[2023,3]]}}}