{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T21:40:04Z","timestamp":1766439604864,"version":"3.48.0"},"reference-count":79,"publisher":"Springer Science and Business Media LLC","issue":"12","license":[{"start":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T00:00:00Z","timestamp":1764028800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T00:00:00Z","timestamp":1764028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"NTNU Norwegian University of Science and Technology"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>\n                    Deep reinforcement learning policies perform exceptionally well in applications like Atari games, chess, Go, and poker. However, they are incomprehensible, making the process of extracting new knowledge and understanding policy behavior difficult. For the same reason, deploying these policies in high-stakes applications like healthcare, finance, and criminal justice is infeasible. To rectify the incomprehensibility issue, we propose a new concept-based policy distillation method for convolutional neural network-based policies. Our method transforms raw image states into human-interpretable concepts using non-negative matrix factorization on the policy\u2019s activations. The concepts express features in an interpretable way and detail how the policy represents the world internally. We use the concepts to train a distilled policy represented using sparse linear models. The distilled policy chooses one linear model from a set of linear models to make action predictions. Employing a single sparse linear model reduces the complexity, making it easier for humans to understand policy behavior. Experimentally, we show the effectiveness of our distilled policy in four environments: Car Racing, Pong, Breakout, and Ms Pacman. We illustrate that inspecting these linear models gives local and global insight into how the black box policy works. Furthermore, we demonstrate that these linear models perform well by faithfully using the same features as the black box policy and capturing the black box policy\u2019s behavior in critical states. 
The code, data, trained models, and TensorBoard logs with hyperparameters used are provided (\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" xlink:href=\"https:\/\/github.com\/observer4599\/interpretable-concept-based-policy-distillation\" ext-link-type=\"uri\">https:\/\/github.com\/observer4599\/interpretable-concept-based-policy-distillation<\/jats:ext-link>\n                    ).\n                  <\/jats:p>","DOI":"10.1007\/s10994-025-06928-5","type":"journal-article","created":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T21:25:57Z","timestamp":1764105957000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Interpretable Deep Reinforcement Learning Via Concept-Based Policy Distillation"],"prefix":"10.1007","volume":"114","author":[{"given":"Yanzhe","family":"Bekkemoen","sequence":"first","affiliation":[]},{"given":"Helge","family":"Langseth","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,11,25]]},"reference":[{"key":"6928_CR1","doi-asserted-by":"publisher","unstructured":"Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I. J., Harp, A., Irving, G., Isard, M., Jia, Y., J\u00f3zefowicz, R., Kaiser, L., Kudlur, M., \u2026 Zheng, X. (2016). TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR. https:\/\/doi.org\/10.48550\/arXiv.1603.04467","DOI":"10.48550\/arXiv.1603.04467"},{"key":"6928_CR2","unstructured":"Achiam, J. (2018). Spinning up in deep reinforcement learning. https:\/\/github.com\/openai\/spinningup\/. Commit 038665d62d569055401d91856abb287263096178"},{"key":"6928_CR3","unstructured":"Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I. J., Hardt, M., & Kim, B. (2018). Sanity checks for saliency maps. In Proc. of NeurIPS. https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/294a8ed24b1ad22ec2e7efea049b8737-Abstract.html"},{"key":"6928_CR4","unstructured":"Albrecht, S. V., Christianos, F., & Sch\u00e4fer, L. (2024). Multi-agent reinforcement learning: Foundations and modern approaches. MIT. https:\/\/www.marl-book.com\/"},{"key":"6928_CR5","unstructured":"Amir, D., & Amir, O. (2018). HIGHLIGHTS: Summarizing agent behavior to people. In Proc. of AAMAS. http:\/\/dl.acm.org\/citation.cfm?id=3237869"},{"key":"6928_CR6","doi-asserted-by":"publisher","unstructured":"Amitai, Y., & Amir, O. (2024). A survey of global explanations in reinforcement learning. In Explainable agency in artificial intelligence. CRC Press. https:\/\/doi.org\/10.1201\/9781003355281-2","DOI":"10.1201\/9781003355281-2"},{"key":"6928_CR7","unstructured":"Bastani, O., Pu, Y., & Solar-Lezama, A. (2018). Verifiable reinforcement learning via policy extraction. In Proc. of NeurIPS. https:\/\/papers.nips.cc\/paper_files\/paper\/2018\/hash\/e6d8545daa42d5ced125a4bf747b3688-Abstract.html"},{"key":"6928_CR8","doi-asserted-by":"publisher","DOI":"10.1007\/S10994-023-06479-7","author":"Y Bekkemoen","year":"2024","unstructured":"Bekkemoen, Y. (2024). Explainable reinforcement learning (XRL): A systematic literature review and taxonomy. Machine Learning https:\/\/doi.org\/10.1007\/S10994-023-06479-7","journal-title":"Machine Learning"},{"key":"6928_CR9","doi-asserted-by":"publisher","DOI":"10.1613\/JAIR.3912","author":"MG Bellemare","year":"2013","unstructured":"Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). 
The arcade learning environment: An evaluation platform for general agents. The Journal of Artificial Intelligence Research. https:\/\/doi.org\/10.1613\/JAIR.3912","journal-title":"The Journal of Artificial Intelligence Research"},{"key":"6928_CR10","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1308.3432","author":"Y Bengio","year":"2013","unstructured":"Bengio, Y., L\u00e9onard, N., & Courville, A. C. (2013). Estimating or propagating gradients through stochastic neurons for conditional computation. CoRR. https:\/\/doi.org\/10.48550\/arXiv.1308.3432","journal-title":"CoRR"},{"key":"6928_CR11","doi-asserted-by":"publisher","unstructured":"Brown, N., & Sandholm, T. (2017). Libratus: The superhuman AI for no-limit poker. In Proc. of IJCAI. https:\/\/doi.org\/10.24963\/IJCAI.2017\/772","DOI":"10.24963\/IJCAI.2017\/772"},{"key":"6928_CR12","unstructured":"Bunse, M. (2023). CRITDD | Critical Difference Diagrams. GitHub. Licensed under the GNU Public License, Version 3.0+. https:\/\/github.com\/mirkobunse\/critdd"},{"key":"6928_CR13","unstructured":"Chen, C., Li, O., Tao, D., Barnett, A., Rudin, C., & Su, J. (2019). This looks like that: Deep learning for interpretable image recognition. In Proc. of NeurIPS. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/adf7ee2dcf142b0e11888e72b43fcb75-Abstract.html"},{"key":"6928_CR14","doi-asserted-by":"publisher","unstructured":"Collins, E., Achanta, R., & S\u00fcsstrunk, S. (2018). Deep feature factorization for concept discovery. In Proc. of ECCV. https:\/\/doi.org\/10.1007\/978-3-030-01264-9_21","DOI":"10.1007\/978-3-030-01264-9_21"},{"key":"6928_CR15","unstructured":"Coppens, Y., Efthymiadis, K., Lenaerts, T., & Now\u00e9, A. (2019). Distilling deep reinforcement learning policies in soft decision trees. In Proc. of IJCAI Workshop on XAI. https:\/\/researchportal.vub.be\/en\/publications\/distilling-deep-reinforcement-learning-policies-in-soft-decision-"},{"key":"6928_CR16","unstructured":"Dem\u0161ar, J. (2006). Statistical comparisons of classifiers over multiple data sets. The Journal of Machine Learning Research, 7, 1\u201330."},{"key":"6928_CR17","doi-asserted-by":"publisher","unstructured":"Falcon, W., & The PyTorch Lightning team. (2019). PyTorch Lightning. GitHub. https:\/\/doi.org\/10.5281\/zenodo.3828935. https:\/\/github.com\/Lightning-AI\/pytorch-lightning","DOI":"10.5281\/zenodo.3828935"},{"key":"6928_CR18","unstructured":"Fel, T., Boutin, V., B\u00e9thune, L., Cad\u00e8ne, R., Moayeri, M., And\u00e9ol, L., Chalvidal, M., & Serre, T. (2023). A holistic approach to unifying automatic concept extraction and concept importance estimation. In Proc. of NeurIPS. http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/abf3682c9cf9245a0294a4bebe4544ff-Abstract-Conference.html"},{"key":"6928_CR19","doi-asserted-by":"publisher","unstructured":"Fel, T., Picard, A., B\u00e9thune, L., Boissin, T., Vigouroux, D., Colin, J., Cad\u00e8ne, R., & Serre, T. (2023). CRAFT: Concept recursive activation factorization for explainability. In Proc. of CVPR. https:\/\/doi.org\/10.1109\/CVPR52729.2023.00266","DOI":"10.1109\/CVPR52729.2023.00266"},{"key":"6928_CR20","doi-asserted-by":"publisher","unstructured":"Fong, R. C., & Vedaldi, A. (2017). Interpretable explanations of black boxes by meaningful perturbation. In Proc. of ICCV. https:\/\/doi.org\/10.1109\/ICCV.2017.371","DOI":"10.1109\/ICCV.2017.371"},{"key":"6928_CR21","unstructured":"Frosst, N., & Hinton, G. E. (2017). Distilling a neural network into a soft decision tree. In Proc. of CEX. 
https:\/\/ceur-ws.org\/Vol-2071\/CExAIIA_2017_paper_3.pdf"},{"key":"6928_CR22","doi-asserted-by":"publisher","DOI":"10.1145\/3648472","author":"J Gajcin","year":"2024","unstructured":"Gajcin, J., & Dusparic, I. (2024). Redefining counterfactual explanations for reinforcement learning: Overview, challenges and opportunities. ACM Computing Surveys https:\/\/doi.org\/10.1145\/3648472","journal-title":"ACM Computing Surveys"},{"key":"6928_CR23","unstructured":"Ghorbani, A., Wexler, J., Zou, J. Y., & Kim, B. (2019). Towards automatic concept-based explanations. In Proc. of NeurIPS. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/77d2afcb31f6493e350fca61764efb9a-Abstract.html"},{"key":"6928_CR24","unstructured":"Gildenblat, J., & contributors. (2021). PyTorch library for CAM methods. MIT License. https:\/\/github.com\/jacobgil\/pytorch-grad-cam"},{"key":"6928_CR25","doi-asserted-by":"publisher","DOI":"10.1007\/s10994-024-06543-w","author":"C Glanois","year":"2024","unstructured":"Glanois, C., Weng, P., Zimmer, M., Li, D., Yang, T., Hao, J., & Liu, W. (2024). A survey on interpretable reinforcement learning. Machine Learning. https:\/\/doi.org\/10.1007\/s10994-024-06543-w","journal-title":"Machine Learning"},{"key":"6928_CR26","unstructured":"Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT. http:\/\/www.deeplearningbook.org"},{"key":"6928_CR27","unstructured":"Greydanus, S., Koul, A., Dodge, J., & Fern, A. (2018). Visualizing and understanding Atari agents. In Proc. of ICML. https:\/\/proceedings.mlr.press\/v80\/greydanus18a.html"},{"key":"6928_CR28","doi-asserted-by":"publisher","unstructured":"Harris, C. R., Millman, K. J., Walt, S., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., Kerkwijk, M. H., Brett, M., Haldane, A., R\u00edo, J. F., Wiebe, M., Peterson, P., \u2026 Oliphant, T. E. (2020). Array programming with NumPy. Nature. https:\/\/doi.org\/10.1038\/S41586-020-2649-2","DOI":"10.1038\/S41586-020-2649-2"},{"key":"6928_CR29","doi-asserted-by":"publisher","unstructured":"Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction. (2nd ed.). Springer series in statistics. Springer. https:\/\/doi.org\/10.1007\/978-0-387-84858-7","DOI":"10.1007\/978-0-387-84858-7"},{"key":"6928_CR30","doi-asserted-by":"publisher","DOI":"10.1016\/J.KNOSYS.2020.106685","author":"A Heuillet","year":"2021","unstructured":"Heuillet, A., Couthouis, F., & Rodr\u00edguez, N. D. (2021). Explainability in deep reinforcement learning. Knowledge-Based Systems. https:\/\/doi.org\/10.1016\/J.KNOSYS.2020.106685","journal-title":"Knowledge-Based Systems"},{"key":"6928_CR31","doi-asserted-by":"publisher","DOI":"10.1145\/3623377","author":"T Hickling","year":"2023","unstructured":"Hickling, T., Zenati, A., Aouf, N., & Spencer, P. (2023). Explainability in deep reinforcement learning: A review into current methods and applications. ACM Computing Surveys https:\/\/doi.org\/10.1145\/3623377","journal-title":"ACM Computing Surveys"},{"key":"6928_CR32","doi-asserted-by":"publisher","unstructured":"Huang, S. H., Bhatia, K., Abbeel, P., & Dragan, A. D. (2018). Establishing appropriate trust via critical states. In Proc. of IROS. https:\/\/doi.org\/10.1109\/IROS.2018.8593649","DOI":"10.1109\/IROS.2018.8593649"},{"key":"6928_CR33","unstructured":"Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty, D., Mehta, K., & Ara\u00fajo, J. G. M. (2022). 
CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. Journal of Machine Learning Research, 23(274), 1\u201318."},{"key":"6928_CR34","doi-asserted-by":"publisher","DOI":"10.1007\/S10514-018-9771-0","author":"SH Huang","year":"2019","unstructured":"Huang, S. H., Held, D., Abbeel, P., & Dragan, A. D. (2019). Enabling robots to communicate their objectives. Autonomous Robots. https:\/\/doi.org\/10.1007\/S10514-018-9771-0","journal-title":"Autonomous Robots"},{"key":"6928_CR35","doi-asserted-by":"publisher","unstructured":"Huber, T., Demmler, M., Mertes, S., Olson, M. L., & Andr\u00e9, E. (2023). GANterfactual-RL: Understanding reinforcement learning agents\u2019 strategies through visual counterfactual explanations. In N. Agmon, B. An, A. Ricci, & W. Yeoh (Eds.), Proc. of AAMAS (pp. 1097\u20131106). ACM. https:\/\/doi.org\/10.5555\/3545946.3598751","DOI":"10.5555\/3545946.3598751"},{"key":"6928_CR36","doi-asserted-by":"publisher","unstructured":"Huber, T., Schiller, D., & Andr\u00e9, E. (2019). Enhancing explainability of deep reinforcement learning through selective layer-wise relevance propagation. In Proc. of KI. https:\/\/doi.org\/10.1007\/978-3-030-30179-8_16","DOI":"10.1007\/978-3-030-30179-8_16"},{"key":"6928_CR37","doi-asserted-by":"publisher","DOI":"10.1109\/MCSE.2007.55","author":"JD Hunter","year":"2007","unstructured":"Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering https:\/\/doi.org\/10.1109\/MCSE.2007.55","journal-title":"Computing in Science & Engineering"},{"key":"6928_CR38","unstructured":"Kenny, E. M., Tucker, M., & Shah, J. (2023). Towards interpretable deep reinforcement learning with human-friendly prototypes. In Proc. of ICLR. https:\/\/openreview.net\/pdf?id=hWwY_Jq0xsN"},{"key":"6928_CR39","unstructured":"Kim, B., Wattenberg, M., Gilmer, J., Cai, C. J., Wexler, J., Vi\u00e9gas, F. B., & Sayres, R. (2018). Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In Proc. of ICML. https:\/\/proceedings.mlr.press\/v80\/kim18d.html"},{"key":"6928_CR40","doi-asserted-by":"publisher","unstructured":"Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proc. of ICLR. https:\/\/doi.org\/10.48550\/arXiv.1412.6980","DOI":"10.48550\/arXiv.1412.6980"},{"issue":"6","key":"6928_CR41","doi-asserted-by":"publisher","first-page":"4909","DOI":"10.1109\/TITS.2021.3054625","volume":"23","author":"BR Kiran","year":"2022","unstructured":"Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Sallab, A. A. A., Yogamani, S. K., & P\u00e9rez, P. (2022). Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems 23(6), 4909\u20134926. https:\/\/doi.org\/10.1109\/TITS.2021.3054625","journal-title":"IEEE Transactions on Intelligent Transportation Systems"},{"key":"6928_CR42","doi-asserted-by":"publisher","unstructured":"Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W., Doll\u00e1r, P., & Girshick, R. B. (2023). Segment anything. In Proc. of ICCV. https:\/\/doi.org\/10.1109\/ICCV51070.2023.00371","DOI":"10.1109\/ICCV51070.2023.00371"},{"key":"6928_CR43","unstructured":"Kochenderfer, M. J., Wheeler, T. A., & Wray, K. H. (2022). Algorithms for decision making. MIT. https:\/\/algorithmsbook.com\/"},{"key":"6928_CR44","doi-asserted-by":"publisher","unstructured":"Lage, I., Lifschitz, D., Doshi-Velez, F., & Amir, O. 
(2019). Exploring computational user models for agent policy summarization. In Proc. of IJCAI. https:\/\/doi.org\/10.24963\/IJCAI.2019\/194","DOI":"10.24963\/IJCAI.2019\/194"},{"issue":"1","key":"6928_CR45","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1038\/s41746-024-01028-5","volume":"7","author":"JC Lauffenburger","year":"2024","unstructured":"Lauffenburger, J. C., Yom-Tov, E., Keller, P. A., McDonnell, M. E., Crum, K. L., Bhatkhande, G., Sears, E. S., Hanken, K., Bessette, L. G., Fontanet, C. P., Haff, N., Vine, S., & Choudhry, N. K. (2024). The impact of using reinforcement learning to personalize communication on medication adherence: Findings from the REINFORCE trial. NPJ Digital Medicine, 7(1), 39. https:\/\/doi.org\/10.1038\/s41746-024-01028-5","journal-title":"NPJ Digital Medicine"},{"issue":"6755","key":"6928_CR46","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1038\/44565","volume":"401","author":"DD Lee","year":"1999","unstructured":"Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788\u2013791. https:\/\/doi.org\/10.1038\/44565","journal-title":"Nature"},{"key":"6928_CR47","doi-asserted-by":"publisher","DOI":"10.1145\/3233231","author":"ZC Lipton","year":"2018","unstructured":"Lipton, Z. C. (2018). The mythos of model interpretability. Communications of the ACM. https:\/\/doi.org\/10.1145\/3233231","journal-title":"Communications of the ACM"},{"key":"6928_CR48","unstructured":"Liu, G., Sun, X., Schulte, O., & Poupart, P. (2021). Learning tree interpretation from object representation for deep reinforcement learning. In Proc. of NeurIPS. https:\/\/proceedings.neurips.cc\/paper\/2021\/hash\/a35fe7f7fe8217b4369a0af4244d1fca-Abstract.html"},{"key":"6928_CR49","doi-asserted-by":"publisher","DOI":"10.1145\/3616864","author":"S Milani","year":"2024","unstructured":"Milani, S., Topin, N., Veloso, M., & Fang, F. (2024). Explainable reinforcement learning: A survey and comparative review. ACM Computing Surveys https:\/\/doi.org\/10.1145\/3616864","journal-title":"ACM Computing Surveys"},{"key":"6928_CR50","doi-asserted-by":"publisher","unstructured":"Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. A. (2013). Playing Atari with deep reinforcement learning. In NIPS deep learning workshop. https:\/\/doi.org\/10.48550\/arXiv.1312.5602","DOI":"10.48550\/arXiv.1312.5602"},{"key":"6928_CR51","doi-asserted-by":"publisher","unstructured":"Nauta, M., Bree, R., & Seifert, C. (2021). Neural prototype trees for interpretable fine-grained image recognition. In Proc. of CVPR. https:\/\/doi.org\/10.1109\/CVPR46437.2021.01469","DOI":"10.1109\/CVPR46437.2021.01469"},{"key":"6928_CR52","doi-asserted-by":"publisher","DOI":"10.1016\/J.ARTINT.2021.103455","volume":"295","author":"ML Olson","year":"2021","unstructured":"Olson, M. L., Khanna, R., Neal, L., Li, F., & Wong, W. (2021). Counterfactual state explanations for reinforcement learning agents via generative deep learning. Artificial Intelligence, 295, Article 103455. https:\/\/doi.org\/10.1016\/J.ARTINT.2021.103455","journal-title":"Artificial Intelligence"},{"key":"6928_CR53","unstructured":"Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., K\u00f6pf, A., Yang, E. Z., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). 
PyTorch: An imperative style, high-performance deep learning library. In Proc. of NeurIPS. https:\/\/proceedings.neurips.cc\/paper\/2019\/hash\/bdbca288fee7f92f2bfa9f7012727740-Abstract.html"},{"key":"6928_CR54","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, 2825\u20132830."},{"key":"6928_CR55","unstructured":"Proeschel, C. (2023). Implementation of the VIPER algorithm introduced in \u201cVerifiable Reinforcement Learning via Policy Extraction\u201d by Bastani et al. https:\/\/github.com\/Safe-RL-Team\/viper-verifiable-rl-impl\/. Commit e4acfe91f42549552fa04805339661039894bea2"},{"key":"6928_CR56","doi-asserted-by":"publisher","unstructured":"Puiutta, E., & Veith, E. M. S. P. (2020). Explainable reinforcement learning: A survey. In Proc. of CD-MAKE. https:\/\/doi.org\/10.1007\/978-3-030-57321-8_5","DOI":"10.1007\/978-3-030-57321-8_5"},{"key":"6928_CR57","unstructured":"Puri, N., Verma, S., Gupta, P., Kayastha, D., Deshmukh, S., Krishnamurthy, B., & Singh, S. (2020). Explain your move: Understanding agent actions using specific and relevant feature attribution. In Proc. of ICLR. https:\/\/openreview.net\/forum?id=SJgzLkBKPB"},{"key":"6928_CR58","unstructured":"Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., & Dormann, N. (2021). Stable-Baselines3: Reliable reinforcement learning implementations. The Journal of Machine Learning Research, 22(1), 12348\u201312355."},{"key":"6928_CR59","unstructured":"Ragodos, R. J., Wang, T., Lin, Q., & Zhou, X. (2022). ProtoX: Explaining a reinforcement learning agent via prototyping. In Proc. of NeurIPS. https:\/\/papers.nips.cc\/paper_files\/paper\/2022\/hash\/ae5bf4f35236240c9460e761c60fa53d-Abstract-Conference.html"},{"key":"6928_CR60","doi-asserted-by":"publisher","unstructured":"Roth, A. M., Manocha, D., Sriram, R. D., & Tabassi, E. (2024). Explainable and interpretable reinforcement learning for robotics. Springer. https:\/\/doi.org\/10.1007\/978-3-031-47518-4","DOI":"10.1007\/978-3-031-47518-4"},{"key":"6928_CR61","doi-asserted-by":"publisher","DOI":"10.1038\/s43586-022-00172-0","author":"C Rudin","year":"2022","unstructured":"Rudin, C. (2022). Why black box machine learning should be avoided for high-stakes decisions, in brief. Nature Reviews Methods Primers. https:\/\/doi.org\/10.1038\/s43586-022-00172-0","journal-title":"Nature Reviews Methods Primers"},{"key":"6928_CR62","doi-asserted-by":"publisher","DOI":"10.1214\/21-SS133","author":"C Rudin","year":"2022","unstructured":"Rudin, C., Chen, C., Chen, Z., Huang, H., Semenova, L., & Zhong, C. (2022). Interpretable machine learning: Fundamental principles and 10 grand challenges. Statistics Surveys. https:\/\/doi.org\/10.1214\/21-SS133","journal-title":"Statistics Surveys"},{"key":"6928_CR63","doi-asserted-by":"publisher","DOI":"10.1038\/S41586-020-03051-4","author":"J Schrittwieser","year":"2020","unstructured":"Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., Graepel, T., Lillicrap, T. P., & Silver, D. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. 
Nature https:\/\/doi.org\/10.1038\/S41586-020-03051-4","journal-title":"Nature"},{"key":"6928_CR64","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1707.06347","author":"J Schulman","year":"2017","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. CoRR. https:\/\/doi.org\/10.48550\/arXiv.1707.06347","journal-title":"CoRR"},{"key":"6928_CR65","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2020.3037898","author":"W Shi","year":"2022","unstructured":"Shi, W., Huang, G., Song, S., Wang, Z., Lin, T., & Wu, C. (2022). Self-supervised discovering of interpretable features for reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence. https:\/\/doi.org\/10.1109\/TPAMI.2020.3037898","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"6928_CR66","doi-asserted-by":"publisher","unstructured":"Sieusahai, A., & Guzdial, M. (2021). Explaining deep reinforcement learning agents in the Atari domain through a surrogate model. In Proc. of AIIDE. https:\/\/doi.org\/10.1609\/aiide.v17i1.18894","DOI":"10.1609\/aiide.v17i1.18894"},{"key":"6928_CR67","doi-asserted-by":"publisher","DOI":"10.1038\/NATURE16961","author":"D Silver","year":"2016","unstructured":"Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T. P., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature. https:\/\/doi.org\/10.1038\/NATURE16961","journal-title":"Nature"},{"key":"6928_CR68","unstructured":"Sun, A., Ma, P., Yuan, Y., & Wang, S. (2023). Explain any concept: Segment anything meets concept-based explanation. In Proc. of NeurIPS. http:\/\/papers.nips.cc\/paper_files\/paper\/2023\/hash\/44cdeb5ab7da31d9b5cd88fd44e3da84-Abstract-Conference.html"},{"key":"6928_CR69","unstructured":"Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT. http:\/\/incompleteideas.net\/book\/the-book-2nd.html"},{"key":"6928_CR70","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1146\/annurev-control-030323-022510","volume":"8","author":"C Tang","year":"2025","unstructured":"Tang, C., Abbatematteo, B., Hu, J., Chandra, R., Mart\u00edn-Mart\u00edn, R., & Stone, P. (2025). Deep reinforcement learning for robotics: A survey of real-world successes. Annual Review of Control, Robotics, and Autonomous Systems, 8, 153\u2013188. https:\/\/doi.org\/10.1146\/annurev-control-030323-022510","journal-title":"Annual Review of Control, Robotics, and Autonomous Systems"},{"key":"6928_CR71","doi-asserted-by":"publisher","unstructured":"Towers, M., Terry, J. K., Kwiatkowski, A., Balis, J. U., Cola, G., Deleu, T., Goul\u00e3o, M., Kallinteris, A., KG, A., Krimmel, M., Perez-Vicente, R., Pierr\u00e9, A., Schulhoff, S., Tai, J. J., Shen, A. T. J., & Younis, O. G. (2023). Gymnasium. Zenodo. Retrieved August 7, 2023, from https:\/\/doi.org\/10.5281\/zenodo.8127025. https:\/\/github.com\/Farama-Foundation\/Gymnasium","DOI":"10.5281\/zenodo.8127025"},{"key":"6928_CR72","doi-asserted-by":"publisher","DOI":"10.1016\/J.NEUNET.2022.03.022","author":"M Vasic","year":"2022","unstructured":"Vasic, M., Petrovic, A., Wang, K., Nikolic, M., Singh, R., & Khurshid, S. (2022). 
Mo\u00cbT: Mixture of Expert Trees and its application to verifiable reinforcement learning. Neural Networks. https:\/\/doi.org\/10.1016\/J.NEUNET.2022.03.022","journal-title":"Neural Networks"},{"key":"6928_CR73","doi-asserted-by":"publisher","unstructured":"Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., Walt, S., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C., Polat, I., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., & Mulbregt, P. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods. https:\/\/doi.org\/10.1038\/s41592-019-0686-2","DOI":"10.1038\/s41592-019-0686-2"},{"issue":"10","key":"6928_CR74","doi-asserted-by":"publisher","first-page":"2633","DOI":"10.1038\/s41591-023-02552-9","volume":"29","author":"G Wang","year":"2023","unstructured":"Wang, G., Liu, X., Ying, Z., Yang, G., Chen, Z., Liu, Z., Zhang, M., Yan, H., Lu, Y., Gao, Y., Xue, K., Li, X., & Chen, Y. (2023). Optimized glycemic control of type 2 diabetes with reinforcement learning: A proof-of-concept trial. Nature Medicine, 29(10), 2633\u20132642. https:\/\/doi.org\/10.1038\/s41591-023-02552-9","journal-title":"Nature Medicine"},{"key":"6928_CR75","unstructured":"Xu, Y. (2018). Soft decision tree. https:\/\/github.com\/xuyxu\/Soft-Decision-Tree\/. Commit 9b02e635a1265b62df2d831f7f15e1742b0d5002. BSD 3-Clause License."},{"key":"6928_CR76","doi-asserted-by":"publisher","unstructured":"Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Proc. of ECCV. https:\/\/doi.org\/10.1007\/978-3-319-10590-1_53","DOI":"10.1007\/978-3-319-10590-1_53"},{"key":"6928_CR77","unstructured":"Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2023). Dive into deep learning. Cambridge University Press. https:\/\/D2L.ai\/"},{"key":"6928_CR78","doi-asserted-by":"publisher","unstructured":"Zhang, R., Madumal, P., Miller, T., Ehinger, K. A., & Rubinstein, B. I. P. (2021). Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proc. of AAAI. https:\/\/doi.org\/10.1609\/aaai.v35i13.17389","DOI":"10.1609\/aaai.v35i13.17389"},{"key":"6928_CR79","doi-asserted-by":"publisher","unstructured":"Zhou, Y., Booth, S., Ribeiro, M. T., & Shah, J. (2022). Do feature attribution methods correctly attribute features? In Proc. of AAAI. 
https:\/\/doi.org\/10.1609\/AAAI.V36I9.21196","DOI":"10.1609\/AAAI.V36I9.21196"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06928-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-025-06928-5","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-025-06928-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,22]],"date-time":"2025-12-22T21:29:54Z","timestamp":1766438994000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-025-06928-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,25]]},"references-count":79,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["6928"],"URL":"https:\/\/doi.org\/10.1007\/s10994-025-06928-5","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"type":"print","value":"0885-6125"},{"type":"electronic","value":"1573-0565"}],"subject":[],"published":{"date-parts":[[2025,11,25]]},"assertion":[{"value":"6 August 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 October 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 October 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"25 November 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors have no conflict of interest to declare.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"288"}}
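
Note: the abstract above describes a two-step method. Concepts are first extracted by applying non-negative matrix factorization (NMF; Lee & Seung, 1999, reference 6928_CR46) to the convolutional policy's activations. Below is a minimal sketch of that step, assuming post-ReLU activations from a single convolutional layer; the shapes, the number of concepts, the pooling choice, and every variable name are illustrative assumptions, not the authors' implementation (that lives in the GitHub repository linked from the abstract).

```python
import numpy as np
from sklearn.decomposition import NMF

# Stand-in for post-ReLU activations of one conv layer of the policy,
# collected over N visited states: shape (N, C, H, W). ReLU outputs are
# non-negative, which is what NMF requires.
rng = np.random.default_rng(0)
N, C, H, W = 1024, 64, 7, 7
acts = rng.random((N, C, H, W), dtype=np.float32)

# Flatten spatial positions into rows: each row is the C-dimensional
# activation vector at one (state, h, w) location.
A = acts.transpose(0, 2, 3, 1).reshape(-1, C)      # (N*H*W, C)

k = 8  # number of concepts; a hyperparameter, not a value from the paper
nmf = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
U = nmf.fit_transform(A)                           # concept presence, (N*H*W, k)
V = nmf.components_                                # concept directions, (k, C)

# Presence scores reshaped into k spatial concept maps per state; upsampled
# onto the input frame, these visualize where each concept fires.
concept_maps = U.reshape(N, H, W, k)

# One per-state concept feature vector for the distilled policy, e.g.
# max-pooled over space (the pooling choice here is an assumption).
concept_features = concept_maps.max(axis=(1, 2))   # (N, k)
```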
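The second step distills the black-box policy into a set of sparse linear models over these concept features, with exactly one linear model consulted per action prediction. The abstract does not say how that model is selected, so the k-means gate below is a purely hypothetical stand-in, as are `fit_distilled_policy`, `act`, and all parameter values; only "sparse linear models, one chosen per prediction" comes from the source.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_distilled_policy(X, actions, n_models=4, seed=0):
    """X: (N, k) concept features; actions: (N,) black-box policy's choices."""
    # Hypothetical gate: partition concept space with k-means and train one
    # sparse multinomial linear model per partition.
    gate = KMeans(n_clusters=n_models, n_init=10, random_state=seed).fit(X)
    experts = []
    for m in range(n_models):
        mask = gate.labels_ == m
        # The L1 penalty keeps each linear model sparse (few active concepts).
        # Assumes every partition contains at least two distinct actions.
        clf = LogisticRegression(penalty="l1", solver="saga", C=0.1,
                                 max_iter=5000)
        experts.append(clf.fit(X[mask], actions[mask]))
    return gate, experts

def act(gate, experts, x):
    # Exactly one linear model is consulted per state, so a local explanation
    # is just that model's sparse coefficients over the concepts.
    m = int(gate.predict(x.reshape(1, -1))[0])
    return int(experts[m].predict(x.reshape(1, -1))[0])
```

Given `concept_features` from the previous sketch and an array `actions` of the black-box policy's decisions on the same states, `gate, experts = fit_distilled_policy(concept_features, actions)` trains the distilled policy, and inspecting `experts[m].coef_` shows which concepts drive any single decision.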