{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T18:27:03Z","timestamp":1780511223271,"version":"3.54.1"},"publisher-location":"Cham","reference-count":75,"publisher":"Springer International Publishing","isbn-type":[{"value":"9783031040825","type":"print"},{"value":"9783031040832","type":"electronic"}],"license":[{"start":{"date-parts":[[2022,1,1]],"date-time":"2022-01-01T00:00:00Z","timestamp":1640995200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,4,17]],"date-time":"2022-04-17T00:00:00Z","timestamp":1650153600000},"content-version":"vor","delay-in-days":106,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In reinforcement learning, an agent interacts with an environment from which it receives rewards, that are then used to learn a task. However, it is often unclear what strategies or concepts the agent has learned to solve the task. Thus, interpretability of the agent\u2019s behavior is an important aspect in practical applications, next to the agent\u2019s performance at the task itself. However, with the increasing complexity of both tasks and agents, interpreting the agent\u2019s behavior becomes much more difficult. Therefore, developing new interpretable RL agents is of high importance. To this end, we propose to use Align-RUDDER as an interpretability method for reinforcement learning. Align-RUDDER is a method based on the recently introduced RUDDER framework, which relies on contribution analysis of an LSTM model, to redistribute rewards to key events. From these key events a strategy can be derived, guiding the agent\u2019s decisions in order to solve a certain task. 
More importantly, the key events are in general interpretable by humans, and are often sub-tasks; where solving these sub-tasks is crucial for solving the main task. Align-RUDDER enhances the RUDDER framework with methods from multiple sequence alignment (MSA) to identify key events from demonstration trajectories. MSA needs only a few trajectories in order to perform well, and is much better understood than deep learning models such as LSTMs. Consequently, strategies and concepts can be learned from a few expert demonstrations, where the expert can be a human or an agent trained by reinforcement learning. By substituting RUDDER\u2019s LSTM with a profile model that is obtained from MSA of demonstration trajectories, we are able to interpret an agent at three stages: First, by extracting common strategies from demonstration trajectories with MSA. Second, by encoding the most prevalent strategy via the MSA profile model and therefore explaining the expert\u2019s behavior. And third, by allowing the interpretation of an arbitrary agent\u2019s behavior based on its demonstration trajectories.<\/jats:p>","DOI":"10.1007\/978-3-031-04083-2_10","type":"book-chapter","created":{"date-parts":[[2022,4,16]],"date-time":"2022-04-16T17:03:23Z","timestamp":1650128603000},"page":"177-205","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["XAI and Strategy Extraction via Reward Redistribution"],"prefix":"10.1007","author":[{"given":"Marius-Constantin","family":"Dinu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Markus","family":"Hofmarcher","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Vihang 
P.","family":"Patil","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Matthias","family":"Dorfer","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Patrick M.","family":"Blies","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Johannes","family":"Brandstetter","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jose A.","family":"Arjona-Medina","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Sepp","family":"Hochreiter","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2022,4,17]]},"reference":[{"key":"10_CR1","doi-asserted-by":"publisher","unstructured":"Ancona, M., Ceolini, E., \u00d6ztireli, C., Gross, M.: Gradient-based attribution methods. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., M\u00fcller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 169\u2013191. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-28954-6_9. ISBN 978-3-030-28954-6","DOI":"10.1007\/978-3-030-28954-6_9"},{"key":"10_CR2","unstructured":"Arjona-Medina, J.A., Gillhofer, M., Widrich, M., Unterthiner, T., Brandstetter, J., Hochreiter, S.: RUDDER: return decomposition for delayed rewards. In: Advances in Neural Information Processing Systems, vol. 32, pp. 13566\u201313577 (2019)"},{"key":"10_CR3","doi-asserted-by":"crossref","unstructured":"Arras, L., Montavon, G., M\u00fcller, K.-R., Samek, W.: Explaining recurrent neural network predictions in sentiment analysis. arXiv, abs\/1706.07206 (2017)","DOI":"10.18653\/v1\/W17-5221"},{"key":"10_CR4","doi-asserted-by":"publisher","unstructured":"Arras, L., et al.: Explaining and interpreting LSTMs. 
In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., M\u00fcller, K.-R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 211\u2013238. Springer, Cham (2019). https:\/\/doi.org\/10.1007\/978-3-030-28954-6_11. ISBN 978-3-030-28954-6","DOI":"10.1007\/978-3-030-28954-6_11"},{"key":"10_CR5","doi-asserted-by":"publisher","unstructured":"Bach, S., Binder, A., Montavon, G., Klauschen, F., M\u00fcller, K.-R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One 10(7), e0130140 (2015). https:\/\/doi.org\/10.1371\/journal.pone.0130140","DOI":"10.1371\/journal.pone.0130140"},{"key":"10_CR6","first-page":"1803","volume":"11","author":"D Baehrens","year":"2010","unstructured":"Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., M\u00fcller, K.-R.: How to explain individual classification decisions. J. Mach. Learn. Res. 11, 1803\u20131831 (2010). ISSN 1532-4435","journal-title":"J. Mach. Learn. Res."},{"key":"10_CR7","unstructured":"Bakker, B.: Reinforcement learning with long short-term memory. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems, vol. 14, pp. 1475\u20131482. MIT Press (2002)"},{"key":"10_CR8","doi-asserted-by":"publisher","unstructured":"Bakker, B.: Reinforcement learning by backpropagation through an LSTM model\/critic. In: IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, pp. 127\u2013134 (2007). https:\/\/doi.org\/10.1109\/ADPRL.2007.368179","DOI":"10.1109\/ADPRL.2007.368179"},{"key":"10_CR9","unstructured":"Barreto, A., et al.: Successor features for transfer in reinforcement learning. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems, vol. 30. Curran Associates Inc. 
(2017)"},{"key":"10_CR10","doi-asserted-by":"publisher","DOI":"10.1515\/9781400874668","volume-title":"Adaptive Control Processes","author":"RE Bellman","year":"1961","unstructured":"Bellman, R.E.: Adaptive Control Processes. Princeton University Press, New Jersey (1961)"},{"key":"10_CR11","doi-asserted-by":"publisher","unstructured":"Binder, A., Bach, S., Montavon, G., M\u00fcller, K.-R., Samek, W.: Layer-wise relevance propagation for deep neural network architectures. In: Information Science and Applications (ICISA) 2016. LNEE, vol. 376, pp. 913\u2013922. Springer, Singapore (2016). https:\/\/doi.org\/10.1007\/978-981-10-0557-2_87. ISBN 978-981-10-0557-2","DOI":"10.1007\/978-981-10-0557-2_87"},{"issue":"4","key":"10_CR12","doi-asserted-by":"publisher","first-page":"613","DOI":"10.1162\/neco.1993.5.4.613","volume":"5","author":"P Dayan","year":"1993","unstructured":"Dayan, P.: Improving generalization for temporal difference learning: the successor representation. Neural Comput. 5(4), 613\u2013624 (1993)","journal-title":"Neural Comput."},{"key":"10_CR13","unstructured":"Correia, A.D.S., Colombini, E.L.: Attention, please! a survey of neural attention models in deep learning. arXiv, abs\/2103.16775 (2021)"},{"key":"10_CR14","unstructured":"Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv, abs\/1810.04805 (2019)"},{"issue":"1","key":"10_CR15","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1145\/3359786","volume":"63","author":"M Du","year":"2019","unstructured":"Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Commun. ACM 63(1), 68\u201377 (2019). https:\/\/doi.org\/10.1145\/3359786. ISSN 0001-0782","journal-title":"Commun. 
ACM"},{"issue":"4","key":"10_CR16","doi-asserted-by":"publisher","first-page":"401","DOI":"10.2307\/2412923","volume":"27","author":"J Felsenstein","year":"1978","unstructured":"Felsenstein, J.: Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27(4), 401\u2013410 (1978). https:\/\/doi.org\/10.2307\/2412923","journal-title":"Syst. Zool."},{"key":"10_CR17","unstructured":"Frans, K., Ho, J., Chen, X., Abbeel, P., Schulman, J.: Meta learning shared hierarchies. In: International Conference on Learning Representations (2018). arXiv abs\/1710.09767"},{"issue":"5814","key":"10_CR18","doi-asserted-by":"publisher","first-page":"972","DOI":"10.1126\/science.1136800","volume":"315","author":"BJ Frey","year":"2007","unstructured":"Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972\u2013976 (2007). https:\/\/doi.org\/10.1126\/science.1136800","journal-title":"Science"},{"key":"10_CR19","doi-asserted-by":"crossref","unstructured":"Guss, W.H., et al.: MineRL: a large-scale dataset of minecraft demonstrations. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI 2019) (2019)","DOI":"10.24963\/ijcai.2019\/339"},{"key":"10_CR20","unstructured":"Haarnoja, T., Zhou, A., Abbeel, P., Levine, S.: Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Dy, J., Krause, A. (eds.) Proceedings of Machine Learning Research, vol. 80, pp. 1861\u20131870. PMLR (2018). arXiv abs\/1801.01290"},{"key":"10_CR21","unstructured":"Harutyunyan, A., et al.: Hindsight credit assignment. In: Advances in Neural Information Processing Systems, vol. 32, pp. 12467\u201312476 (2019)"},{"issue":"3","key":"10_CR22","doi-asserted-by":"publisher","first-page":"297","DOI":"10.1214\/ss\/1177013604","volume":"1","author":"T Hastie","year":"1986","unstructured":"Hastie, T., Tibshirani, R.: Generalized additive models. Stat. Sci. 
1(3), 297\u2013310 (1986). https:\/\/doi.org\/10.1214\/ss\/1177013604","journal-title":"Stat. Sci."},{"key":"10_CR23","unstructured":"Hausknecht, M.J., Stone, P.: Deep recurrent Q-learning for partially observable MDPs. arXiv, abs\/1507.06527 (2015)"},{"key":"10_CR24","unstructured":"Heess, N., Wayne, G., Tassa, Y., Lillicrap, T.P., Riedmiller, M.A., Silver, D.: Learning and transfer of modulated locomotor controllers. arXiv, abs\/1610.05182 (2016)"},{"key":"10_CR25","unstructured":"Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. arXiv, abs\/1710.02298 (2017)"},{"key":"10_CR26","unstructured":"Hester, T., et al.: Deep Q-learning from demonstrations. In: The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). Association for the Advancement of Artificial Intelligence (2018)"},{"key":"10_CR27","unstructured":"Hinton, G.E., Sejnowski, T.E.: Learning and relearning in Boltzmann machines. In: Parallel Distributed Processing, vol. 1, pp. 282\u2013317. MIT Press, Cambridge (1986)"},{"key":"10_CR28","unstructured":"Hochreiter, S.: Implementierung und Anwendung eines \u2018neuronalen\u2019 Echtzeit-Lernalgorithmus f\u00fcr reaktive Umgebungen. Practical work, Supervisor: J. Schmidhuber, Institut f\u00fcr Informatik, Technische Universit\u00e4t M\u00fcnchen (1990)"},{"key":"10_CR29","unstructured":"Hochreiter, S.: Untersuchungen zu dynamischen neuronalen Netzen. Master\u2019s thesis, Technische Universit\u00e4t M\u00fcnchen (1991)"},{"key":"10_CR30","unstructured":"Hochreiter, S., Schmidhuber, J.: Long short-term memory. Technical report FKI-207-95, Fakult\u00e4t f\u00fcr Informatik, Technische Universit\u00e4t M\u00fcnchen (1995)"},{"issue":"8","key":"10_CR31","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 
9(8), 1735\u20131780 (1997)","journal-title":"Neural Comput."},{"key":"10_CR32","first-page":"473","volume-title":"Advances in Neural Information Processing Systems","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter, S., Schmidhuber, J.: LSTM can solve hard long time lag problems. In: Mozer, M.C., Jordan, M.I., Petsche, T. (eds.) Advances in Neural Information Processing Systems, vol. 9, pp. 473\u2013479. MIT Press, Cambridge (1997)"},{"key":"10_CR33","unstructured":"Kanervisto, A., Karttunen, J., Hautam\u00e4ki, V.: Playing Minecraft with behavioural cloning. In: Escalante, H.J., Hadsell, R. (eds.) Proceedings of Machine Learning Research (PMLR), vol. 123, pp. 56\u201366. PMLR (2020)"},{"key":"10_CR34","doi-asserted-by":"publisher","unstructured":"Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740\u2013755. Springer, Cham (2014). https:\/\/doi.org\/10.1007\/978-3-319-10602-1_48. ISBN 978-3-319-10602-1","DOI":"10.1007\/978-3-319-10602-1_48"},{"issue":"1","key":"10_CR35","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1038\/s42256-019-0138-9","volume":"2","author":"SM Lundberg","year":"2020","unstructured":"Lundberg, S.M., et al.: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2(1), 56\u201367 (2020). https:\/\/doi.org\/10.1038\/s42256-019-0138-9. ISSN 2522-5839","journal-title":"Nat. Mach. Intell."},{"issue":"3","key":"10_CR36","doi-asserted-by":"publisher","first-page":"506","DOI":"10.1002\/smj.2512","volume":"38","author":"J Luoma","year":"2017","unstructured":"Luoma, J., Ruutu, S., King, A.W., Tikkanen, H.: Time delays, competitive interdependence, and firm performance. Strateg. Manag. J. 38(3), 506\u2013525 (2017). https:\/\/doi.org\/10.1002\/smj.2512","journal-title":"Strateg. Manag. 
J."},{"key":"10_CR37","unstructured":"Milani, S., et al.: Retrospective analysis of the 2019 MineRL competition on sample efficient reinforcement learning. arXiv, abs\/2003.05012 (2020)"},{"issue":"1","key":"10_CR38","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1109\/JRPROC.1961.287775","volume":"49","author":"M Minsky","year":"1961","unstructured":"Minsky, M.: Steps towards artificial intelligence. Proc. IRE 49(1), 8\u201330 (1961). https:\/\/doi.org\/10.1109\/JRPROC.1961.287775","journal-title":"Proc. IRE"},{"issue":"7540","key":"10_CR39","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1038\/nature14236","volume":"518","author":"V Mnih","year":"2015","unstructured":"Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529\u2013533 (2015). https:\/\/doi.org\/10.1038\/nature14236","journal-title":"Nature"},{"key":"10_CR40","unstructured":"Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd International Conference on Machine Learning (ICML), Volume 48 of Proceedings of Machine Learning Research, pp. 1928\u20131937. PMLR.org (2016)"},{"key":"10_CR41","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1016\/j.patcog.2016.11.008","volume":"65","author":"G Montavon","year":"2017","unstructured":"Montavon, G., Lapuschkin, S., Binder, A., Samek, W., M\u00fcller, K.-R.: Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recogn. 65, 211\u2013222 (2017). https:\/\/doi.org\/10.1016\/j.patcog.2016.11.008","journal-title":"Pattern Recogn."},{"key":"10_CR42","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.dsp.2017.10.011","volume":"73","author":"G Montavon","year":"2017","unstructured":"Montavon, G., Samek, W., M\u00fcller, K.-R.: Methods for interpreting and understanding deep neural networks. Digit. Signal Process. 73, 1\u201315 (2017). https:\/\/doi.org\/10.1016\/j.dsp.2017.10.011","journal-title":"Digit. 
Signal Process."},{"key":"10_CR43","unstructured":"Munro, P.W.: A dual back-propagation scheme for scalar reinforcement learning. In: Proceedings of the Ninth Annual Conference of the Cognitive Science Society, Seattle, WA, pp. 165\u2013176 (1987)"},{"issue":"3","key":"10_CR44","doi-asserted-by":"publisher","first-page":"443","DOI":"10.1016\/0022-2836(70)90057-4","volume":"48","author":"SB Needleman","year":"1970","unstructured":"Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443\u2013453 (1970)","journal-title":"J. Mol. Biol."},{"key":"10_CR45","unstructured":"Patil, V.P., et al.: Align-rudder: learning from few demonstrations by reward redistribution. arXiv, abs\/2009.14108 (2020). CoRR"},{"key":"10_CR46","unstructured":"Petsiuk, V., Das, A., Saenko, K.: RISE: randomized input sampling for explanation of black-box models. arXiv, abs\/1806.07421 (2018)"},{"key":"10_CR47","unstructured":"Puterman, M.L.: Markov Decision Processes, 2nd edn. Wiley (2005). ISBN 978-0-471-72782-8"},{"issue":"4","key":"10_CR48","doi-asserted-by":"publisher","first-page":"309","DOI":"10.1002\/sdr.427","volume":"25","author":"H Rahmandad","year":"2009","unstructured":"Rahmandad, H., Repenning, N., Sterman, J.: Effects of feedback delay on learning. Syst. Dyn. Rev. 25(4), 309\u2013338 (2009). https:\/\/doi.org\/10.1002\/sdr.427","journal-title":"Syst. Dyn. Rev."},{"key":"10_CR49","unstructured":"Reddy, S., Dragan, A.D., Levine, S.: SQIL: imitation learning via regularized behavioral cloning. In: Eighth International Conference on Learning Representations (ICLR) (2020). arXiv abs\/1905.11108"},{"key":"10_CR50","unstructured":"Robinson, A.J.: Dynamic error propagation networks. 
PhD thesis, Trinity Hall and Cambridge University Engineering Department (1989)"},{"key":"10_CR51","unstructured":"Robinson, T., Fallside, F.: Dynamic reinforcement driven error propagation networks with application to game playing. In: Proceedings of the 11th Conference of the Cognitive Science Society, Ann Arbor, pp. 836\u2013843 (1989)"},{"issue":"3","key":"10_CR52","doi-asserted-by":"publisher","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","volume":"115","author":"O Russakovsky","year":"2015","unstructured":"Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211\u2013252 (2015). https:\/\/doi.org\/10.1007\/s11263-015-0816-y","journal-title":"Int. J. Comput. Vis."},{"key":"10_CR53","unstructured":"Scheller, C., Schraner, Y., Vogel, M.: Sample efficient reinforcement learning through learning from demonstrations in Minecraft. In: Escalante, H.J., Hadsell, R. (eds.) Proceedings of Machine Learning Research (PMLR), vol. 123, pp. 67\u201376. PMLR (2020)"},{"key":"10_CR54","doi-asserted-by":"crossref","unstructured":"Schmidhuber, J.: Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical report FKI-126-90 (revised), Institut f\u00fcr Informatik, Technische Universit\u00e4t M\u00fcnchen (1990). Experiments by Sepp Hochreiter","DOI":"10.1109\/IJCNN.1990.137723"},{"key":"10_CR55","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1016\/j.neunet.2014.09.003","volume":"61","author":"J Schmidhuber","year":"2015","unstructured":"Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85\u2013117 (2015). https:\/\/doi.org\/10.1016\/j.neunet.2014.09.003","journal-title":"Neural Netw."},{"key":"10_CR56","unstructured":"Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. 
In: 32nd International Conference on Machine Learning (ICML), Volume 37 of Proceedings of Machine Learning Research, pp. 1889\u20131897. PMLR (2015)"},{"key":"10_CR57","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv, abs\/1707.06347 (2018)"},{"issue":"7587","key":"10_CR58","doi-asserted-by":"publisher","first-page":"484","DOI":"10.1038\/nature16961","volume":"529","author":"D Silver","year":"2016","unstructured":"Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484\u2013489 (2016). https:\/\/doi.org\/10.1038\/nature16961","journal-title":"Nature"},{"key":"10_CR59","unstructured":"Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv, abs\/1312.6034 (2014)"},{"key":"10_CR60","first-page":"123","volume":"22","author":"SP Singh","year":"1996","unstructured":"Singh, S.P., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Mach. Learn. 22, 123\u2013158 (1996)","journal-title":"Mach. Learn."},{"key":"10_CR61","unstructured":"Skrynnik, A., Staroverov, A., Aitygulov, E., Aksenov, K., Davydov, V., Panov, A.I.: Hierarchical deep Q-network with forgetting from imperfect demonstrations in Minecraft. arXiv, abs\/1912.08664 (2019)"},{"issue":"1","key":"10_CR62","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","volume":"147","author":"TF Smith","year":"1981","unstructured":"Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195\u2013197 (1981)","journal-title":"J. Mol. 
Biol."},{"issue":"9","key":"10_CR63","doi-asserted-by":"publisher","first-page":"2997","DOI":"10.1093\/nar\/10.9.2997","volume":"10","author":"GD Stormo","year":"1982","unstructured":"Stormo, G.D., Schneider, T.D., Gold, L., Ehrenfeucht, A.: Use of the \u2018Perceptron\u2019 algorithm to distinguish translational initiation sites in E. coli. Nucleic Acids Res. 10(9), 2997\u20133011 (1982)","journal-title":"Nucleic Acids Res."},{"key":"10_CR64","unstructured":"Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning, ICML 2017, vol. 70, pp. 3319\u20133328 (2017)"},{"key":"10_CR65","unstructured":"Sutton, R., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S., Leen, T., M\u00fcller, K. (eds.) Advances in Neural Information Processing Systems, vol. 12. MIT Press (2000)"},{"key":"10_CR66","unstructured":"Sutton, R.S.: Temporal credit assignment in reinforcement learning. PhD thesis, University of Massachusetts Amherst (1984)"},{"key":"10_CR67","volume-title":"Reinforcement Learning: An Introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. MIT Press, Cambridge (2018)","edition":"2"},{"issue":"1\u20132","key":"10_CR68","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1016\/S0004-3702(99)00052-1","volume":"112","author":"RS Sutton","year":"1999","unstructured":"Sutton, R.S., Precup, D., Singh, S.P.: Between MDPs and Semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artif. Intell. 112(1\u20132), 181\u2013211 (1999)","journal-title":"Artif. 
Intell."},{"issue":"22","key":"10_CR69","doi-asserted-by":"publisher","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","volume":"22","author":"JD Thompson","year":"1994","unstructured":"Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22), 4673\u20134680 (1994)","journal-title":"Nucleic Acids Res."},{"key":"10_CR70","doi-asserted-by":"crossref","unstructured":"Torabi, F., Warnell, G., Stone, P.: Behavioral cloning from observation (2018)","DOI":"10.24963\/ijcai.2018\/687"},{"key":"10_CR71","unstructured":"Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30, pp. 5998\u20136008. Curran Associates Inc. (2017)"},{"key":"10_CR72","unstructured":"Vilone, G., Longo, L.: Explainable artificial intelligence: a systematic review. arXiv, abs\/2006.00093 (2020)"},{"issue":"7782","key":"10_CR73","doi-asserted-by":"publisher","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","volume":"575","author":"O Vinyals","year":"2019","unstructured":"Vinyals, O., et al.: Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575(7782), 350\u2013354 (2019)","journal-title":"Nature"},{"key":"10_CR74","unstructured":"Watkins, C.J.C.H.: Learning from delayed rewards. Ph.D. thesis, King\u2019s College (1989)"},{"key":"10_CR75","unstructured":"Wei, D., Dash, S., Gao, T., Gunluk, O.: Generalized linear rule models. In: Chaudhuri, K., Salakhutdinov, R. (eds.) Proceedings of the 36th International Conference on Machine Learning, Volume 97 of Proceedings of Machine Learning Research, pp. 6687\u20136696. 
PMLR, 09\u201315 June 2019"}],"container-title":["Lecture Notes in Computer Science","xxAI - Beyond Explainable AI"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-031-04083-2_10","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,16]],"date-time":"2022-04-16T17:10:47Z","timestamp":1650129047000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-031-04083-2_10"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022]]},"ISBN":["9783031040825","9783031040832"],"references-count":75,"URL":"https:\/\/doi.org\/10.1007\/978-3-031-04083-2_10","relation":{},"ISSN":["0302-9743","1611-3349"],"issn-type":[{"value":"0302-9743","type":"print"},{"value":"1611-3349","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022]]},"assertion":[{"value":"17 April 2022","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"xxAI","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Vienna","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Austria","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"2020","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"18 July 2020","order":7,"name":"conference_start_date","label":"Conference Start 
Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"18 July 2020","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"xxai2020","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/human-centered.ai\/xxai-icml-2020\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}