{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T09:43:12Z","timestamp":1756460592538,"version":"3.41.2"},"reference-count":50,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,9,24]],"date-time":"2021-09-24T00:00:00Z","timestamp":1632441600000},"content-version":"vor","delay-in-days":266,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61703418","61825305"],"award-info":[{"award-number":["61703418","61825305"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Computational Intelligence and Neuroscience"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>Reinforcement learning from demonstration (RLfD) is considered to be a promising approach to improve reinforcement learning (RL) by leveraging expert demonstrations as the additional decision\u2010making guidance. However, most existing RLfD methods only regard demonstrations as low\u2010level knowledge instances under a certain task. Demonstrations are generally used to either provide additional rewards or pretrain the neural network\u2010based RL policy in a supervised manner, usually resulting in poor generalization capability and weak robustness performance. Considering that human knowledge is not only interpretable but also suitable for generalization, we propose to exploit the potential of demonstrations by extracting knowledge from them via Bayesian networks and develop a novel RLfD method called Reinforcement Learning from demonstration via Bayesian Network\u2010based Knowledge (RLBNK). The proposed RLBNK method takes advantage of node influence with the Wasserstein distance metric (NIW) algorithm to obtain abstract concepts from demonstrations and then a Bayesian network conducts knowledge learning and inference based on the abstract data set, which will yield the coarse policy with corresponding confidence. Once the coarse policy\u2019s confidence is low, another RL\u2010based refine module will further optimize and fine\u2010tune the policy to form a (near) optimal hybrid policy. Experimental results show that the proposed RLBNK method improves the learning efficiency of corresponding baseline RL algorithms under both normal and sparse reward settings. Furthermore, we demonstrate that our RLBNK method delivers better generalization capability and robustness than baseline methods.<\/jats:p>","DOI":"10.1155\/2021\/7588221","type":"journal-article","created":{"date-parts":[[2021,9,25]],"date-time":"2021-09-25T03:35:20Z","timestamp":1632540920000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Efficient Reinforcement Learning from Demonstration via Bayesian Network\u2010Based Knowledge Extraction"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3647-0245","authenticated-orcid":false,"given":"Yichuan","family":"Zhang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4503-643X","authenticated-orcid":false,"given":"Yixing","family":"Lan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5063-6889","authenticated-orcid":false,"given":"Qiang","family":"Fang","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3238-745X","authenticated-orcid":false,"given":"Xin","family":"Xu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0978-3649","authenticated-orcid":false,"given":"Junxiang","family":"Li","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5765-684X","authenticated-orcid":false,"given":"Yujun","family":"Zeng","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,9,24]]},"reference":[{"key":"e_1_2_12_1_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature14236"},{"key":"e_1_2_12_2_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10489-020-01839-5"},{"key":"e_1_2_12_3_2","doi-asserted-by":"crossref","unstructured":"ZhaoX. ZhangL. DingZ. XiaL. TangJ. andYinD. GuoY.andFarooqF. Recommendations with negative feedback via pairwise deep reinforcement learning Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining KDD 2018 August 2018 London UK ACM 1040\u20131048 https:\/\/doi.org\/10.1145\/3219819.3219886 2-s2.0-85051481127.","DOI":"10.1145\/3219819.3219886"},{"key":"e_1_2_12_4_2","unstructured":"SchaalS. MozerM. JordanM. I. andPetscheT. Learning from demonstration Proceedings of the Advances in Neural Information Processing Systems 9 NIPS December 1996 Denver CO USA MIT Press 1040\u20131046."},{"key":"e_1_2_12_5_2","unstructured":"Vecer\u00edkM. HesterT. ScholzJ. WangF. PietquinO. PiotB. HeessN. Roth\u00f6rlT. LampeT. andRiedmillerM. A. Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards 2017 http:\/\/arxiv.org\/abs\/1707.08817."},{"key":"e_1_2_12_6_2","doi-asserted-by":"crossref","unstructured":"LiangX. WangT. YangL. andXingE. FerrariV. HebertM. SminchisescuC. andWeissY. CIRL: controllable imitative reinforcement learning for vision-based self-driving Proceedings of the 15th European Conference Computer Vision-ECCV 2018 September 2018 Munich Germany Springer 604\u2013620 https:\/\/doi.org\/10.1007\/978-3-030-01234-2_36 2-s2.0-85055130727.","DOI":"10.1007\/978-3-030-01234-2_36"},{"key":"e_1_2_12_7_2","doi-asserted-by":"publisher","DOI":"10.1561\/2300000053"},{"key":"e_1_2_12_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/tkde.2021.3079836"},{"volume-title":"Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference","year":"2014","author":"Pearl J.","key":"e_1_2_12_9_2"},{"key":"e_1_2_12_10_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2014.07.003"},{"key":"e_1_2_12_11_2","first-page":"103","volume-title":"Machine Intelligence 15, Intelligent Agents","author":"Bain M.","year":"1995"},{"key":"e_1_2_12_12_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature16961"},{"key":"e_1_2_12_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2020.11.050"},{"key":"e_1_2_12_14_2","doi-asserted-by":"crossref","unstructured":"HusseinA. ElyanE. GaberM. M. andJayneC. Deep reward shaping from demonstrations Proceedings of the 2017 International Joint Conference on Neural Networks IJCNN 2017 May 2017 Anchorage AK USA IEEE 510\u2013517 https:\/\/doi.org\/10.1109\/ijcnn.2017.7965896 2-s2.0-85030980364.","DOI":"10.1109\/IJCNN.2017.7965896"},{"key":"e_1_2_12_15_2","unstructured":"ReddyS. DraganA. D. andLevineS. SQIL: imitation learning via reinforcement learning with sparse rewards In Proceedings of the 8th International Conference on Learning Representations ICLR 2020 April 2020 Addis Ababa Ethiopia."},{"key":"e_1_2_12_16_2","unstructured":"HesterT. Vecer\u00edkM. PietquinO. LanctotM. SchaulT. PiotB. HorganD. QuanJ. SendonarisA. OsbandI. Dulac-ArnoldG. AgapiouJ. P. LeiboJ. Z. andGruslysA. McIlraithS. A.andWeinbergerK. Q. Deep q-learning from demonstrations Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) the 30th Innovative Applications of Artificial Intelligence (IAAI-18) and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18) February 2018 New Orleans LA USA 3223\u20133230."},{"key":"e_1_2_12_17_2","unstructured":"LillicrapT. P. HuntJ. J. PritzelA. HeessN. ErezT. TassaY. SilverD. andWierstraD. BengioY.andLeCunY. Continuous control with deep reinforcement learning Proceedings of the 4th International Conference on Learning Representations ICLR 2016 May 2016 San Juan Puerto Rico."},{"key":"e_1_2_12_18_2","doi-asserted-by":"crossref","unstructured":"CodevillaF. M\u00fcllerM. L\u00f3pezA. M. KoltunV. andDosovitskiyA. End-to-end driving via conditional imitation learning Proceedings of the 2018 IEEE International Conference on Robotics and Automation ICRA 2018 May 2018 Brisbane Australia IEEE 1\u20139 https:\/\/doi.org\/10.1109\/icra.2018.8460487 2-s2.0-85063146266.","DOI":"10.1109\/ICRA.2018.8460487"},{"key":"e_1_2_12_19_2","unstructured":"RossS.andBagnellD. TehY. W.andTitteringtonD. M. Efficient reductions for imitation learning Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics AISTATS 2010 May 2010 Sardinia Italy 661\u2013668."},{"key":"e_1_2_12_20_2","unstructured":"GoodfellowI. J. Pouget-AbadieJ. MirzaM. XuB. Warde-FarleyD. OzairS. CourvilleA. C. andBengioY. GhahramaniZ. WellingM. CortesC. LawrenceN. D. andWeinbergerK. Q. Generative adversarial nets Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014 Montreal Canada December 2014 2672\u20132680."},{"key":"e_1_2_12_21_2","unstructured":"HoJ.andErmonS. LeeD. D. SugiyamaM. von LuxburgU. GuyonI. andGarnettR. Generative adversarial imitation learning Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016 December 2016 Barcelona Spain 4565\u20134573."},{"key":"e_1_2_12_22_2","doi-asserted-by":"publisher","DOI":"10.1002\/acs.2967"},{"key":"e_1_2_12_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s40815-018-0559-3"},{"key":"e_1_2_12_24_2","doi-asserted-by":"crossref","unstructured":"GaoZ. LinF. ZhouY. ZhangH. WuK. andZhangH. Embedding high-level knowledge into dqns to learn faster and more safely 34 Proceedings of the AAAI Conference on Artificial Intelligence 2020 Vancouver Canada 13608\u201313609 https:\/\/doi.org\/10.1609\/aaai.v34i09.7091.","DOI":"10.1609\/aaai.v34i09.7091"},{"key":"e_1_2_12_25_2","doi-asserted-by":"crossref","unstructured":"TaylorM. E.andStoneP. GhahramaniZ. Cross-domain transfer for reinforcement learning Proceedings of the Twenty-Fourth International Conference (ICML 2007) June 2007 Corvallis OR USA ACM 879\u2013886 https:\/\/doi.org\/10.1145\/1273496.1273607 2-s2.0-34547997175.","DOI":"10.1145\/1273496.1273607"},{"key":"e_1_2_12_26_2","doi-asserted-by":"crossref","unstructured":"ZhangP. HaoJ. WangW. TangH. MaY. DuanY. andZhengY. BessiereC. Kogun: accelerating deep reinforcement learning via integrating human suboptimal knowledge Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence 2020 Yokohama Japan IJCAI 2020 2291\u20132297 https:\/\/doi.org\/10.24963\/ijcai.2020\/317.","DOI":"10.24963\/ijcai.2020\/317"},{"key":"e_1_2_12_27_2","unstructured":"BastaniO. PuY. andSolar-LezamaA. BengioS. WallachH. M. LarochelleH. GraumanK. Cesa-BianchiN. andGarnettR. Verifiable reinforcement learning via policy extraction Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018 Montr\u00e9al Canada December 2018 2499\u20132509."},{"key":"e_1_2_12_28_2","unstructured":"KontschiederP. FiterauM. CriminisiA. andBul\u00f2S. R. KambhampatiS. Deep neural decision forests Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence IJCAI 2016 New York NY USA July 2016 4190\u20134194."},{"key":"e_1_2_12_29_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.904"},{"key":"e_1_2_12_30_2","doi-asserted-by":"crossref","unstructured":"MadumalP. MillerT. SonenbergL. andVetereF. Explainable reinforcement learning through a causal lens 34 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence AAAI 2020 the Thirty-Second Innovative Applications of Artificial Intelligence Conference IAAI 2020 The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence EAAI 2020 February 2020 New York NY USA 2493\u20132500 https:\/\/doi.org\/10.1609\/aaai.v34i03.5631.","DOI":"10.1609\/aaai.v34i03.5631"},{"key":"e_1_2_12_31_2","doi-asserted-by":"crossref","unstructured":"da SilvaF. L. Hernandez-LealP. KartalB. andTaylorM. E. Uncertainty-aware action advising for deep reinforcement learning agents 34 Proceedings of the AAAI Conference on Artificial Intelligence February 2020 New York NY USA no. 4 5792\u20135799 https:\/\/doi.org\/10.1609\/aaai.v34i04.6036.","DOI":"10.1609\/aaai.v34i04.6036"},{"volume-title":"Reinforcement Learning: An Introduction","year":"2018","author":"Sutton R. S.","key":"e_1_2_12_32_2"},{"key":"e_1_2_12_33_2","doi-asserted-by":"publisher","DOI":"10.1287\/mnsc.28.1.1"},{"key":"e_1_2_12_34_2","doi-asserted-by":"crossref","unstructured":"YildirimP.andBirantD. Naive bayes classifier for continuous variables using novel method (NBC4D) and distributions Proceedings of the 2014 IEEE International Symposium on Innovations in Intelligent Systems and Applications INISTA 2014 June 2014 Alberobello Italy IEEE 110\u2013115 https:\/\/doi.org\/10.1109\/inista.2014.6873605 2-s2.0-84906653565.","DOI":"10.1109\/INISTA.2014.6873605"},{"volume-title":"Probabilistic Graphical Models-Principles and Techniques","year":"2009","author":"Koller D.","key":"e_1_2_12_35_2"},{"key":"e_1_2_12_36_2","doi-asserted-by":"crossref","unstructured":"YangJ. WangY. PeiS. andHuQ. Monotonicity induced parameter learning for bayesian networks with limited data Proceedings of the 2018 International Joint Conference on Neural Networks IJCNN 2018 July 2018 Rio de Janeiro Brazil IEEE 1\u20138 https:\/\/doi.org\/10.1109\/ijcnn.2018.8489435 2-s2.0-85056495511.","DOI":"10.1109\/IJCNN.2018.8489435"},{"key":"e_1_2_12_37_2","doi-asserted-by":"crossref","unstructured":"WasserkrugS. MarinescuR. ZeltynS. ShindinE. andFeldmanY. A. Learning the parameters of bayesian networks from uncertain data 35 Proceedings of the AAAI Conference on Artificial Intelligence 2021 Vancouver Canada 12190\u201312197.","DOI":"10.1609\/aaai.v35i13.17447"},{"key":"e_1_2_12_38_2","doi-asserted-by":"publisher","DOI":"10.4324\/9780203765418"},{"key":"e_1_2_12_39_2","article-title":"See, feel, act: hierarchical learning for complex manipulation skills with multisensory fusion","volume":"4","author":"Oller M.","year":"2019","journal-title":"Science Robotics"},{"key":"e_1_2_12_40_2","unstructured":"KoiterJ. R. Visualizing inference in Bayesian networks 2006 Delft University of Technology Delft Netherlands Master thesis."},{"key":"e_1_2_12_41_2","doi-asserted-by":"publisher","DOI":"10.3390\/electronics8010040"},{"key":"e_1_2_12_42_2","doi-asserted-by":"publisher","DOI":"10.1023\/A:1026543900054"},{"key":"e_1_2_12_43_2","unstructured":"SchulmanJ. WolskiF. DhariwalP. RadfordA. andKlimovO. Proximal policy optimization algorithms 2017 http:\/\/arxiv.org\/abs\/1707.06347."},{"key":"e_1_2_12_44_2","doi-asserted-by":"crossref","unstructured":"DonadelloI. SerafiniL. andd\u2019Avila GarcezA. S. SierraC. Logic tensor networks for semantic image interpretation Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence IJCAI 2017 Melbourne Australia August 2017 1596\u20131602 https:\/\/doi.org\/10.24963\/ijcai.2017\/221.","DOI":"10.24963\/ijcai.2017\/221"},{"key":"e_1_2_12_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/tkde.2012.35"},{"key":"e_1_2_12_46_2","doi-asserted-by":"publisher","DOI":"10.1023\/a:1016304305535"},{"key":"e_1_2_12_47_2","unstructured":"BarhateN. Minimal pytorch implementation of proximal policy optimization 2021 https:\/\/github.com\/nikhilbarhate99\/PPO-PyTorch."},{"key":"e_1_2_12_48_2","doi-asserted-by":"crossref","unstructured":"AnkanA.andPandaA. PGMPY: probabilistic graphical models using python Proceedings of the 14th Python in Science Conference (SCIPY 2015) 2015 Austin TX USA https:\/\/doi.org\/10.25080\/majora-7b98e3ed-001.","DOI":"10.25080\/Majora-7b98e3ed-001"},{"key":"e_1_2_12_49_2","unstructured":"BrockmanG. CheungV. PetterssonL. SchneiderJ. SchulmanJ. TangJ. andZarembaW. OpenAI gym CoRR 2016."},{"key":"e_1_2_12_50_2","unstructured":"TasfiN. Pygame learning environment 2016 https:\/\/github.com\/ntasfi\/PyGame-Learning-Environment."}],"container-title":["Computational Intelligence and Neuroscience"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/7588221.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/cin\/2021\/7588221.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/7588221","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,6]],"date-time":"2024-08-06T12:34:05Z","timestamp":1722947645000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/7588221"}},"subtitle":[],"editor":[{"given":"Ahmed Mostafa","family":"Khalil","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":50,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/7588221"],"URL":"https:\/\/doi.org\/10.1155\/2021\/7588221","archive":["Portico"],"relation":{},"ISSN":["1687-5265","1687-5273"],"issn-type":[{"type":"print","value":"1687-5265"},{"type":"electronic","value":"1687-5273"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2021-07-29","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-08-19","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-24","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"7588221"}}