{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T15:20:39Z","timestamp":1774538439987,"version":"3.50.1"},"reference-count":197,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,19]],"date-time":"2023-01-19T00:00:00Z","timestamp":1674086400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Theory of mind (ToM) is the psychological construct by which we model another\u2019s internal mental states. Through ToM, we adjust our own behaviour to best suit a social context, and therefore it is essential to our everyday interactions with others. In adopting an algorithmic (rather than a psychological or neurological) approach to ToM, we gain insights into cognition that will aid us in building more accurate models for the cognitive and behavioural sciences, as well as enable artificial agents to be more proficient in social interactions as they become more embedded in our everyday lives. Inverse reinforcement learning (IRL) is a class of machine learning methods by which to infer the preferences (rewards as a function of state) of a decision maker from its behaviour (trajectories in a Markov decision process). IRL can provide a computational approach for ToM, as recently outlined by Jara-Ettinger, but this will require a better understanding of the relationship between ToM concepts and existing IRL methods at the algorthmic level. 
Here, we provide a review of prominent IRL algorithms and their formal descriptions, and discuss the applicability of IRL concepts as the algorithmic basis of a ToM in AI.<\/jats:p>","DOI":"10.3390\/a16020068","type":"journal-article","created":{"date-parts":[[2023,1,20]],"date-time":"2023-01-20T02:52:21Z","timestamp":1674183141000},"page":"68","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Inverse Reinforcement Learning as the Algorithmic Basis for Theory of Mind: Current Methods and Open Problems"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0220-3253","authenticated-orcid":false,"given":"Jaime","family":"Ruiz-Serra","sequence":"first","affiliation":[{"name":"Modelling and Simulation Research Group, School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2199-2515","authenticated-orcid":false,"given":"Michael S.","family":"Harr\u00e9","sequence":"additional","affiliation":[{"name":"Modelling and Simulation Research Group, School of Computer Science, Faculty of Engineering, The University of Sydney, Sydney, NSW 2006, Australia"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,19]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"R644","DOI":"10.1016\/j.cub.2005.08.041","article-title":"Theory of Mind","volume":"15","author":"Frith","year":"2005","journal-title":"Curr. Biol."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1017\/S0140525X00058611","article-title":"Pr\u00e9cis of The Intentional Stance","volume":"11","author":"Dennett","year":"1988","journal-title":"Behav. 
Brain Sci."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1038\/s42256-019-0039-y","article-title":"Apply Rich Psychological Terms in AI with Care","volume":"1","author":"Shevlin","year":"2019","journal-title":"Nat. Mach. Intell."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.brainres.2005.12.113","article-title":"Mentalizing and Marr: An Information Processing Approach to the Study of Social Cognition","volume":"1079","author":"Mitchell","year":"2006","journal-title":"Brain Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"802","DOI":"10.1016\/j.tics.2020.06.011","article-title":"Is There a \u2018Social\u2019 Brain? Implementations and Algorithms","volume":"24","author":"Lockwood","year":"2020","journal-title":"Trends Cogn. Sci."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"107488","DOI":"10.1016\/j.neuropsychologia.2020.107488","article-title":"Theory of Mind and Decision Science: Towards a Typology of Tasks and Computational Models","volume":"146","author":"Rusch","year":"2020","journal-title":"Neuropsychologia"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1126\/science.ade9097","article-title":"Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning","volume":"378","author":"Bakhtin","year":"2022","journal-title":"Science"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1080\/09515089.2019.1688778","article-title":"Adopting the Intentional Stance toward Natural and Artificial Agents","volume":"33","author":"Wykowska","year":"2020","journal-title":"Philos. Psychol."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Harr\u00e9, M.S. (2021). Information Theory for Agents in Artificial Intelligence, Psychology, and Economics. 
Entropy, 23.","DOI":"10.3390\/e23030310"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"750763","DOI":"10.3389\/frai.2022.750763","article-title":"Supporting Artificial Social Intelligence With Theory of Mind","volume":"5","author":"Williams","year":"2022","journal-title":"Front. Artif. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"959","DOI":"10.1016\/j.tics.2022.08.003","article-title":"Planning with Theory of Mind","volume":"26","author":"Ho","year":"2022","journal-title":"Trends Cogn. Sci."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1016\/0004-3702(90)90055-5","article-title":"Intention Is Choice with Commitment","volume":"42","author":"Cohen","year":"1990","journal-title":"Artif. Intell."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"515","DOI":"10.1017\/S0140525X00076512","article-title":"Does the Chimpanzee Have a Theory of Mind?","volume":"1","author":"Premack","year":"1978","journal-title":"Behav. Brain Sci."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/0004-3702(78)90012-7","article-title":"The Plan Recognition Problem: An Intersection of Psychology and Artificial Intelligence","volume":"11","author":"Schmidt","year":"1978","journal-title":"Artif. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Pollack, M.E. (1986, January 24\u201327). A Model of Plan Inference That Distinguishes between the Beliefs of Actors and Observers. 
Proceedings of the 24th Annual Meeting on Association for Computational Linguistics (ACL \u201986), New York, NY, USA.","DOI":"10.3115\/981131.981160"},{"key":"ref_16","first-page":"390","article-title":"A Representationalist Theory of Intention","volume":"Volume 1","author":"Konolige","year":"1993","journal-title":"Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI \u201993)"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Yoshida, W., Dolan, R.J., and Friston, K.J. (2008). Game Theory of Mind. PLoS Comput. Biol., 4.","DOI":"10.1371\/journal.pcbi.1000254"},{"key":"ref_18","unstructured":"Baker, C., Saxe, R., and Tenenbaum, J. (2011, January 20\u201323). Bayesian Theory of Mind: Modeling Joint Belief-Desire Attribution. Proceedings of the Annual Meeting of the Cognitive Science Society, Boston, MA, USA."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1038\/s41562-017-0064","article-title":"Rational Quantitative Attribution of Beliefs, Desires and Percepts in Human Mentalizing","volume":"1","author":"Baker","year":"2017","journal-title":"Nat. Hum. Behav."},{"key":"ref_20","unstructured":"Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S.M.A., and Botvinick, M. (2018, January 10\u201315). Machine Theory of Mind. Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"62","DOI":"10.3389\/frai.2022.778852","article-title":"Theory of Mind and Preference Learning at the Interface of Cognitive Science, Neuroscience, and AI: A Review","volume":"5","author":"Langley","year":"2022","journal-title":"Front. Artif. Intell."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.cobeha.2019.04.010","article-title":"Theory of Mind as Inverse Reinforcement Learning","volume":"29","year":"2019","journal-title":"Curr. Opin. Behav. 
Sci."},{"key":"ref_23","first-page":"1","article-title":"An Algorithmic Perspective on Imitation Learning","volume":"7","author":"Osa","year":"2018","journal-title":"ROB"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.arcontrol.2020.06.001","article-title":"From Inverse Optimal Control to Inverse Reinforcement Learning: A Historical Review","volume":"50","author":"Shahmansoorian","year":"2020","journal-title":"Annu. Rev. Control"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"103500","DOI":"10.1016\/j.artint.2021.103500","article-title":"A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress","volume":"297","author":"Arora","year":"2021","journal-title":"Artif. Intell."},{"key":"ref_26","first-page":"202","article-title":"An Overview of Inverse Reinforcement Learning Techniques","volume":"29","author":"Shah","year":"2021","journal-title":"Intell. Environ."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"4307","DOI":"10.1007\/s10462-021-10108-x","article-title":"A Survey of Inverse Reinforcement Learning","volume":"55","author":"Adams","year":"2022","journal-title":"Artif. Intell. Rev."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1016\/j.artint.2018.01.002","article-title":"Autonomous Agents Modelling Other Agents: A Comprehensive Survey and Open Problems","volume":"258","author":"Albrecht","year":"2018","journal-title":"Artif. Intell."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Gilead, M., and Ochsner, K.N. (2021). Computational Models of Mentalizing. The Neural Basis of Mentalizing, Springer International Publishing.","DOI":"10.1007\/978-3-030-51890-5"},{"key":"ref_30","unstructured":"Kennington, C. (September, January 29). Understanding Intention for Machine Theory of Mind: A Position Paper. 
Proceedings of the 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Bossel, H., Klaczko, S., and M\u00fcller, N. (1976). Multiattribute Utility Analysis\u2014A Brief Survey. Systems Theory in the Social Sciences: Stochastic and Control Systems Pattern Recognition Fuzzy Analysis Simulation Behavioral Models, Interdisciplinary Systems Research\/Interdisziplin\u00e4re Systemforschung, Birkh\u00e4user.","DOI":"10.1007\/978-3-0348-5495-5"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Russell, S. (1998, January 24\u201326). Learning Agents for Uncertain Environments (Extended Abstract). Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT \u201998), Madison, WI, USA.","DOI":"10.1145\/279943.279964"},{"key":"ref_33","unstructured":"Baker, C.L., Tenenbaum, J.B., and Saxe, R.R. (2005, January 5\u20138). Bayesian Models of Human Action Understanding. Proceedings of the 18th International Conference on Neural Information Processing Systems (NIPS \u201905), Vancouver, BC, Canada."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Syed, U., Bowling, M., and Schapire, R.E. (2008, January 5\u20139). Apprenticeship Learning Using Linear Programming. Proceedings of the 25th International Conference on Machine Learning (ICML \u201908), Helsinki, Finland.","DOI":"10.1145\/1390156.1390286"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.neucom.2012.11.002","article-title":"Apprenticeship Learning with Few Examples","volume":"104","author":"Boularias","year":"2013","journal-title":"Neurocomputing"},{"key":"ref_36","unstructured":"Carmel, D., and Markovitch, S. (1993, January 22\u201324). Learning Models of the Opponent\u2019s Strategy in Game Playing. 
Proceedings of the AAAI Fall Symposium on Games: Planning and Learning, Raleigh, NC, USA."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"61","DOI":"10.2307\/2548836","article-title":"A Note on the Pure Theory of Consumer\u2019s Behaviour","volume":"5","author":"Samuelson","year":"1938","journal-title":"Economica"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"620","DOI":"10.1103\/PhysRev.106.620","article-title":"Information Theory and Statistical Mechanics","volume":"106","author":"Jaynes","year":"1957","journal-title":"Phys. Rev."},{"key":"ref_39","unstructured":"Ziebart, B.D., Bagnell, J.A., and Dey, A.K. (2010, January 21\u201324). Modeling Interaction via the Principle of Maximum Causal Entropy. Proceedings of the 27th International Conference on International Conference on Machine Learning (ICML \u201910), Haifa, Israel."},{"key":"ref_40","unstructured":"Ng, A.Y., and Russell, S.J. (July, January 29). Algorithms for Inverse Reinforcement Learning. Proceedings of the Seventeenth International Conference on Machine Learning (ICML \u201900), Stanford, CA, USA."},{"key":"ref_41","unstructured":"Chajewska, U., and Koller, D. (July, January 30). Utilities as Random Variables: Density Estimation and Structure Discovery. Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI \u201900), Stanford, CA, USA."},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Abbeel, P., and Ng, A.Y. (2004, January 4\u20138). Apprenticeship Learning via Inverse Reinforcement Learning. Proceedings of the Twenty-First International Conference on Machine Learning (ICML \u201904), Banff, AB, Canada.","DOI":"10.1145\/1015330.1015430"},{"key":"ref_43","unstructured":"Platt, J., Koller, D., Singer, Y., and Roweis, S. (2007, January 3\u20136). A Game-Theoretic Approach to Apprenticeship Learning. 
Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_44","first-page":"295","article-title":"On the Theory of Parlor Games","volume":"100","year":"1928","journal-title":"Math. Ann."},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1006\/game.1999.0738","article-title":"Adaptive Game Playing Using Multiplicative Weights","volume":"29","author":"Freund","year":"1999","journal-title":"Games Econ. Behav."},{"key":"ref_46","unstructured":"Chajewska, U., Koller, D., and Ormoneit, D. (July, January 28). Learning an Agent\u2019s Utility Function by Observing Behavior. Proceedings of the Eighteenth International Conference on Machine Learning (ICML \u201901), Williamstown, MA, USA."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1016\/S1364-6613(98)01262-5","article-title":"Mirror Neurons and the Simulation Theory of Mind-Reading","volume":"2","author":"Gallese","year":"1998","journal-title":"Trends Cogn. Sci."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1002\/wcs.33","article-title":"Simulation Theory","volume":"1","author":"Shanton","year":"2010","journal-title":"WIREs Cogn. Sci."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Ratliff, N.D., Bagnell, J.A., and Zinkevich, M.A. (2006, January 25\u201329). Maximum Margin Planning. Proceedings of the 23rd International Conference on Machine Learning (ICML \u201906), Pittsburgh, PA, USA.","DOI":"10.1145\/1143844.1143936"},{"key":"ref_50","unstructured":"Reddy, S., Dragan, A., Levine, S., Legg, S., and Leike, J. (2020, January 13\u201318). Learning Human Objectives by Evaluating Hypothetical Behavior. 
Proceedings of the 37th International Conference on Machine Learning, Virtual Event."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1007\/s10994-009-5110-1","article-title":"Training Parsers by Inverse Reinforcement Learning","volume":"77","author":"Neu","year":"2009","journal-title":"Mach. Learn."},{"key":"ref_52","unstructured":"Ziebart, B.D., Maas, A., Bagnell, J.A., and Dey, A.K. (2008, January 13\u201317). Maximum Entropy Inverse Reinforcement Learning. Proceedings of the 23rd National Conference on Artificial Intelligence-Volume 3 (AAAI \u201908), Chicago, IL, USA."},{"key":"ref_53","unstructured":"Neu, G., and Szepesv\u00e1ri, C. (2007, January 19\u201322). Apprenticeship Learning Using Inverse Reinforcement Learning and Gradient Methods. Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI \u201907), Vancouver, BC, Canada."},{"key":"ref_54","unstructured":"Ni, T., Sikchi, H., Wang, Y., Gupta, T., Lee, L., and Eysenbach, B. (2020, January 16\u201318). F-IRL: Inverse Reinforcement Learning via State Marginal Matching. Proceedings of the 2020 Conference on Robot Learning, Virtual Event."},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Lopes, M., Melo, F., and Montesano, L. (2009, January 7\u201311). Active Learning for Reward Estimation in Inverse Reinforcement Learning. Proceedings of the 2009 European Conference on Machine Learning and Knowledge Discovery in Databases-Volume Part II (ECMLPKDD \u201909), Bled, Slovenia.","DOI":"10.1007\/978-3-642-04174-7_3"},{"key":"ref_56","unstructured":"Jin, M., Damianou, A., Abbeel, P., and Spanos, C. (2017, January 11\u201315). Inverse Reinforcement Learning via Deep Gaussian Process. Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), Sydney, Australia."},{"key":"ref_57","unstructured":"Roa-Vicens, J., Chtourou, C., Filos, A., Rullan, F., Gal, Y., and Silva, R. (2019, January 9\u201315). 
Towards Inverse Reinforcement Learning for Limit Order Book Dynamics. Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA."},{"key":"ref_58","unstructured":"Chan, A.J., and Schaar, M. (2021, January 3\u20137). Scalable Bayesian Inverse Reinforcement Learning. Proceedings of the 2021 International Conference on Learning Representations (ICLR), Virtual Event, Austria."},{"key":"ref_59","unstructured":"Ramachandran, D., and Amir, E. (2007, January 6\u201312). Bayesian Inverse Reinforcement Learning. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI \u201907), Hyderabad, India."},{"key":"ref_60","unstructured":"Choi, J., and Kim, K.e. (2011, January 12\u201315). MAP Inference for Bayesian Inverse Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain."},{"key":"ref_61","unstructured":"Melo, F.S., Lopes, M., and Ferreira, R. (2010, January 16\u201320). Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations. Proceedings of the 19th European Conference on Artificial Intelligence, Lisbon, Portugal."},{"key":"ref_62","doi-asserted-by":"crossref","unstructured":"Gunopulos, D., Hofmann, T., Malerba, D., and Vazirgiannis, M. (2011, January 5\u20139). Preference Elicitation and Inverse Reinforcement Learning. Proceedings of the Machine Learning and Knowledge Discovery in Databases (ECMLPKDD \u201911), Athens, Greece.","DOI":"10.1007\/978-3-642-23783-6"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"1966","DOI":"10.1109\/TIT.2012.2234824","article-title":"The Principle of Maximum Causal Entropy for Estimating Interacting Processes","volume":"59","author":"Ziebart","year":"2013","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_64","unstructured":"Kramer, G. (1998). Directed Information for Channels with Feedback. [Ph.D. 
Thesis, Hartung-Gorre Germany, Swiss Federal Institute of Technology]."},{"key":"ref_65","doi-asserted-by":"crossref","unstructured":"Bloem, M., and Bambos, N. (2014, January 15\u201317). Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning. Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA.","DOI":"10.1109\/CDC.2014.7040156"},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"2787","DOI":"10.1109\/TAC.2017.2775960","article-title":"Infinite Time Horizon Maximum Causal Entropy Inverse Reinforcement Learning","volume":"63","author":"Zhou","year":"2018","journal-title":"IEEE Trans. Autom. Control"},{"key":"ref_67","unstructured":"Ziebart, B.D. (2010). Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy. [Ph.D. Thesis, Carnegie Mellon University]."},{"key":"ref_68","unstructured":"Boularias, A., Kober, J., and Peters, J. (2011, January 11\u201313). Relative Entropy Inverse Reinforcement Learning. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, JMLR Workshop and Conference Proceedings, Ft. Lauderdale, FL, USA."},{"key":"ref_69","doi-asserted-by":"crossref","unstructured":"Snoswell, A.J., Singh, S.P.N., and Ye, N. (2020, January 1\u20134). Revisiting Maximum Entropy Inverse Reinforcement Learning: New Perspectives and Algorithms. Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI \u201920), Canberra, ACT, Australia.","DOI":"10.1109\/SSCI47803.2020.9308391"},{"key":"ref_70","doi-asserted-by":"crossref","unstructured":"Aghasadeghi, N., and Bretl, T. (2011, January 25\u201330). Maximum Entropy Inverse Reinforcement Learning in Continuous State Spaces with Path Integrals. 
Proceedings of the 2011 IEEE\/RSJ International Conference on Intelligent Robots and Systems, San Francisco, CA, USA.","DOI":"10.1109\/IROS.2011.6048804"},{"key":"ref_71","unstructured":"Audiffren, J., Valko, M., Lazaric, A., and Ghavamzadeh, M. (2015, January 25\u201331). Maximum Entropy Semi-Supervised Inverse Reinforcement Learning. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina."},{"key":"ref_72","unstructured":"Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016). A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models. arXiv."},{"key":"ref_73","unstructured":"Shiarlis, K., Messias, J., and Whiteson, S. (2016, January 9\u201313). Inverse Reinforcement Learning from Failure. Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems (AAMAS \u201916), Singapore."},{"key":"ref_74","first-page":"25917","article-title":"Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch","volume":"34","author":"Viano","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_75","doi-asserted-by":"crossref","unstructured":"Sanghvi, N., Usami, S., Sharma, M., Groeger, J., and Kitani, K. (2021, January 2\u20139). Inverse Reinforcement Learning with Explicit Policy Estimates. Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event.","DOI":"10.1609\/aaai.v35i11.17141"},{"key":"ref_76","unstructured":"Dvijotham, K., and Todorov, E. (2010, January 21\u201324). Inverse Optimal Control with Linearly-Solvable MDPs. Proceedings of the 27th International Conference on International Conference on Machine Learning (ICML \u201910), Haifa, Israel."},{"key":"ref_77","unstructured":"Sch\u00f6lkopf, B., Platt, J.C., and Hofmann, T. (2006). Linearly-Solvable Markov Decision Problems. 
Advances in Neural Information Processing Systems 19, Proceedings of the Twentieth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 4\u20137 December 2006, MIT Press."},{"key":"ref_78","unstructured":"Klein, E., Geist, M., Piot, B., and Pietquin, O. (2012, January 3\u20138). Inverse Reinforcement Learning through Structured Classification. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201912), Lake Tahoe, NV, USA."},{"key":"ref_79","doi-asserted-by":"crossref","unstructured":"Blockeel, H., Kersting, K., Nijssen, S., and \u017delezn\u00fd, F. (2013). A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning. Proceedings of the Machine Learning and Knowledge Discovery in Databases, Prague, Czech Republic, 23\u201327 September 2013, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-642-40994-3"},{"key":"ref_80","doi-asserted-by":"crossref","unstructured":"Doerr, A., Ratliff, N., Bohg, J., Toussaint, M., and Schaal, S. (2015, January 13\u201317). Direct Loss Minimization Inverse Optimal Control. Proceedings of the Robotics: Science and Systems Conference, Rome, Italy.","DOI":"10.15607\/RSS.2015.XI.013"},{"key":"ref_81","doi-asserted-by":"crossref","unstructured":"Pirotta, M., and Restelli, M. (2016, January 12\u201317). Inverse Reinforcement Learning through Policy Gradient Minimization. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA.","DOI":"10.1609\/aaai.v30i1.10313"},{"key":"ref_82","unstructured":"Metelli, A.M., Pirotta, M., and Restelli, M. (2017, January 4\u20139). Compatible Reward Inverse Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_83","unstructured":"Ho, J., and Ermon, S. (2016, January 5\u201310). Generative Adversarial Imitation Learning. 
Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_84","unstructured":"Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014, January 8\u201313). Generative Adversarial Nets. Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada."},{"key":"ref_85","unstructured":"Yu, L., Yu, T., Finn, C., and Ermon, S. (2019, January 8\u201314). Meta-Inverse Reinforcement Learning with Probabilistic Context Variables. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_86","unstructured":"Fu, J., Luo, K., and Levine, S. (May, January 30). Learning Robust Rewards with Adversarial Inverse Reinforcement Learning. Proceedings of the 6th International Conference on Learning Representations (ICLR \u201918), Vancouver, BC, Canada."},{"key":"ref_87","doi-asserted-by":"crossref","unstructured":"Wang, P., Li, H., and Chan, C.Y. (June, January 30). Meta-Adversarial Inverse Reinforcement Learning for Decision-making Tasks. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561330"},{"key":"ref_88","unstructured":"Peng, X.B., Kanazawa, A., Toyer, S., Abbeel, P., and Levine, S. (2019, January 6\u20139). Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow. Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA."},{"key":"ref_89","doi-asserted-by":"crossref","unstructured":"Wang, P., Wang, P., Liu, D., Chen, J., Li, H., Chan, C.Y., and Chan, C.Y. (June, January 30). Decision Making for Autonomous Driving via Augmented Adversarial Inverse Reinforcement Learning. 
Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9560907"},{"key":"ref_90","doi-asserted-by":"crossref","first-page":"1880","DOI":"10.1109\/LRA.2021.3061397","article-title":"Adversarial Inverse Reinforcement Learning With Self-Attention Dynamics Model","volume":"6","author":"Sun","year":"2021","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_91","doi-asserted-by":"crossref","unstructured":"Zhou, L., and Small, K. (2020, January 7\u201312). Inverse Reinforcement Learning with Natural Language Goals. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v35i12.17326"},{"key":"ref_92","unstructured":"Ratliff, N., Bradley, D., Bagnell, J., and Chestnutt, J. (2006, January 4\u20139). Boosting Structured Prediction for Imitation Learning. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s10514-009-9121-3","article-title":"Learning to Search: Functional Gradient Techniques for Imitation Learning","volume":"27","author":"Ratliff","year":"2009","journal-title":"Auton. Robot"},{"key":"ref_94","unstructured":"Levine, S., Popovic, Z., and Koltun, V. (2010, January 6\u201311). Feature Construction for Inverse Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201910), Vancouver, BC, Canada."},{"key":"ref_95","doi-asserted-by":"crossref","unstructured":"Jin, Z.J., Qian, H., and Zhu, M.L. (2010, January 11\u201314). Gaussian Processes in Inverse Reinforcement Learning. Proceedings of the 2010 International Conference on Machine Learning and Cybernetics (ICMLC \u201910), Qingdao, China.","DOI":"10.1109\/ICMLC.2010.5581063"},{"key":"ref_96","unstructured":"Levine, S., Popovic, Z., and Koltun, V. (2011, January 12\u201317). 
Nonlinear Inverse Reinforcement Learning with Gaussian Processes. Proceedings of the Advances in Neural Information Processing Systems, Granada, Spain."},{"key":"ref_97","unstructured":"Wulfmeier, M., Ondruska, P., and Posner, I. (2015). Maximum Entropy Deep Inverse Reinforcement Learning. arXiv."},{"key":"ref_98","unstructured":"Levine, S., and Koltun, V. (July2012, January 26). Continuous Inverse Optimal Control with Locally Optimal Examples. Proceedings of the 29th International Conference on Machine Learning (ICML \u201912), Edinburgh, Scotland."},{"key":"ref_99","doi-asserted-by":"crossref","unstructured":"Kim, K.E., and Park, H.S. (2018, January 2\u20137). Imitation Learning via Kernel Mean Embedding. Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11720"},{"key":"ref_100","unstructured":"Choi, J., and Kim, K.E. (2013, January 3\u20139). Bayesian Nonparametric Feature Construction for Inverse Reinforcement Learning. Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI \u201913), Beijing, China."},{"key":"ref_101","doi-asserted-by":"crossref","unstructured":"Flach, P.A., De Bie, T., and Cristianini, N. (2012). Bayesian Nonparametric Inverse Reinforcement Learning. Proceedings of the Machine Learning and Knowledge Discovery in Databases, Bristol, UK, 24\u201328 September 2012, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-642-33460-3"},{"key":"ref_102","doi-asserted-by":"crossref","unstructured":"Wulfmeier, M., Wang, D.Z., and Posner, I. (2016, January 9\u201314). Watch This: Scalable Cost-Function Learning for Path Planning in Urban Environments. Proceedings of the 2016 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea.","DOI":"10.1109\/IROS.2016.7759328"},{"key":"ref_103","unstructured":"Bogdanovic, M., Markovikj, D., Denil, M., and de Freitas, N. (2015). 
Deep Apprenticeship Learning for Playing Video Games. Papers from the 2015 AAAI Workshop, The AAAI Press. AAAI Technical Report WS-15-10."},{"key":"ref_104","unstructured":"Markovikj, D. (2014). Deep Apprenticeship Learning for Playing Games. [Master\u2019s Thesis, University of Oxford]."},{"key":"ref_105","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.robot.2016.06.003","article-title":"Neural Inverse Reinforcement Learning in Autonomous Navigation","volume":"84","author":"Xia","year":"2016","journal-title":"Robot. Auton. Syst."},{"key":"ref_106","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1007\/s11063-017-9702-7","article-title":"Model-Free Deep Inverse Reinforcement Learning by Logistic Regression","volume":"47","author":"Uchibe","year":"2018","journal-title":"Neural. Process Lett."},{"key":"ref_107","unstructured":"Finn, C., Levine, S., and Abbeel, P. (2016, January 19\u201324). Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization. Proceedings of the 33rd International Conference on International Conference on Machine Learning (ICML \u201916), New York, NY, USA."},{"key":"ref_108","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1037\/a0029137","article-title":"On What Ground Do We Mentalize? Characteristics of Current Tasks and Sources of Information That Contribute to Mentalizing Judgments","volume":"25","author":"Achim","year":"2013","journal-title":"Psychol. Assess."},{"key":"ref_109","unstructured":"Kim, K., Garg, S., Shiragur, K., and Ermon, S. (2021, January 18\u201324). Reward Identification in Inverse Reinforcement Learning. 
Proceedings of the 38th International Conference on Machine Learning, Virtual Event."},{"key":"ref_110","first-page":"12362","article-title":"Identifiability in Inverse Reinforcement Learning","volume":"Volume 34","author":"Cao","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_111","unstructured":"Tauber, S., and Steyvers, M. (2011, January 20\u201323). Using Inverse Planning and Theory of Mind for Social Goal Inference. Proceedings of the 33rd Annual Meeting of the Cognitive Science Society, Boston, MA, USA."},{"key":"ref_112","doi-asserted-by":"crossref","first-page":"3081","DOI":"10.1016\/S1573-4412(05)80020-0","article-title":"Structural Estimation of Markov Decision Processes","volume":"Volume 4","author":"Rust","year":"1994","journal-title":"Handbook of Econometrics"},{"key":"ref_113","unstructured":"Damiani, A., Manganini, G., Metelli, A.M., and Restelli, M. (2022, January 17\u201323). Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning. Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA."},{"key":"ref_114","unstructured":"Jarboui, F., and Perchet, V. (2021). A Generalised Inverse Reinforcement Learning Framework. arXiv."},{"key":"ref_115","unstructured":"Bogert, K., and Doshi, P. (2015, January 25\u201331). Toward Estimating Others\u2019 Transition Models under Occlusion for Multi-Robot IRL. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina."},{"key":"ref_116","unstructured":"Ramponi, G., Likmeta, A., Metelli, A.M., Tirinzoni, A., and Restelli, M. (2020, January 26\u201328). Truly Batch Model-Free Inverse Reinforcement Learning about Multiple Intentions. 
Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, Virtual Event."},{"key":"ref_117","unstructured":"Xue, W., Lian, B., Fan, J., Kolaric, P., Chai, T., and Lewis, F.L. (2021). Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems. IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_118","doi-asserted-by":"crossref","unstructured":"Donge, V.S., Lian, B., Lewis, F.L., and Davoudi, A. (2022). Multi-Agent Graphical Games with Inverse Reinforcement Learning. IEEE Trans. Control. Netw. Syst.","DOI":"10.1109\/TCNS.2022.3210856"},{"key":"ref_119","unstructured":"Herman, M., Gindele, T., Wagner, J., Schmitt, F., and Burgard, W. (2016, January 9\u201311). Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Cadiz, Spain."},{"key":"ref_120","unstructured":"Reddy, S., Dragan, A., and Levine, S. (2018, January 3\u20138). Where Do You Think You\u2019re Going? Inferring Beliefs about Dynamics from Behavior. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_121","first-page":"2485","article-title":"What Is It You Really Want of Me? Generalized Reward Learning with Biased Beliefs about Domain Dynamics","volume":"34","author":"Gong","year":"2020","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_122","unstructured":"Munzer, T., Piot, B., Geist, M., Pietquin, O., and Lopes, M. (2015, January 25\u201331). Inverse Reinforcement Learning in Relational Domains. Proceedings of the 24th International Conference on Artificial Intelligence (IJCAI \u201915), Buenos Aires, Argentina."},{"key":"ref_123","unstructured":"Chae, J., Han, S., Jung, W., Cho, M., Choi, S., and Sung, Y. (2022, January 17\u201323). Robust Imitation Learning against Variations in Environment Dynamics. 
Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA."},{"key":"ref_124","unstructured":"Golub, M., Chase, S., and Yu, B. (2013, January 16\u201321). Learning an Internal Dynamics Model from Control Demonstration. Proceedings of the 30th International Conference on Machine Learning, Atlanta, GA, USA."},{"key":"ref_125","doi-asserted-by":"crossref","first-page":"584","DOI":"10.1111\/cogs.12157","article-title":"Inferring Learners\u2019 Knowledge From Their Actions","volume":"39","author":"Rafferty","year":"2015","journal-title":"Cogn. Sci."},{"key":"ref_126","unstructured":"Rafferty, A.N., Jansen, R.A., and Griffiths, T.L. (July, January 29). Using Inverse Planning for Personalized Feedback. Proceedings of the 9th International Conference on Educational Data Mining, Raleigh, NC, USA."},{"key":"ref_127","first-page":"691","article-title":"Inverse Reinforcement Learning in Partially Observable Environments","volume":"12","author":"Choi","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_128","doi-asserted-by":"crossref","first-page":"329","DOI":"10.1016\/j.cognition.2009.07.005","article-title":"Action Understanding as Inverse Planning","volume":"113","author":"Baker","year":"2009","journal-title":"Cognition"},{"key":"ref_129","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.artint.2004.08.003","article-title":"Learning a Decision Maker\u2019s Utility Function from (Possibly) Inconsistent Behavior","volume":"160","author":"Nielsen","year":"2004","journal-title":"Artif. Intell."},{"key":"ref_130","unstructured":"Zheng, J., Liu, S., and Ni, L.M. (July, January Canada). Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI \u201914), Qu\u00e9bec City, QC."},{"key":"ref_131","doi-asserted-by":"crossref","unstructured":"Lian, B., Xue, W., Lewis, F.L., and Chai, T. (2021). 
Inverse Reinforcement Learning for Adversarial Apprentice Games. IEEE Trans. Neural Netw.","DOI":"10.1109\/CDC45484.2021.9682909"},{"key":"ref_132","first-page":"9197","article-title":"Inverse Reinforcement Learning From Like-Minded Teachers","volume":"35","author":"Noothigattu","year":"2021","journal-title":"Proc. AAAI Conf. Artif. Intell."},{"key":"ref_133","unstructured":"Brown, D., Goo, W., Nagarajan, P., and Niekum, S. (2019, January 9\u201315). Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations. Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA."},{"key":"ref_134","unstructured":"Armstrong, S., and Mindermann, S. (2018, January 3\u20138). Occam\u2019s Razor Is Insufficient to Infer the Preferences of Irrational Agents. Proceedings of the Advances in Neural Information Processing Systems, Montr\u00e9al, QC, Canada."},{"key":"ref_135","doi-asserted-by":"crossref","unstructured":"Ranchod, P., Rosman, B., and Konidaris, G. (October, September 28). Nonparametric Bayesian Reward Segmentation for Skill Discovery Using Inverse Reinforcement Learning. Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7353414"},{"key":"ref_136","doi-asserted-by":"crossref","unstructured":"Henderson, P., Chang, W.D., Bacon, P.L., Meger, D., Pineau, J., and Precup, D. (2018, January 2\u20137). OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA.","DOI":"10.1609\/aaai.v32i1.11775"},{"key":"ref_137","unstructured":"Babe\u015f-Vroman, M., Marivate, V., Subramanian, K., and Littman, M. (July, June 28). Apprenticeship Learning about Multiple Intentions. 
Proceedings of the 28th International Conference on International Conference on Machine Learning (ICML \u201911), Bellevue, WA, USA."},{"key":"ref_138","doi-asserted-by":"crossref","first-page":"2541","DOI":"10.1007\/s10994-020-05939-8","article-title":"Dealing with Multiple Experts and Non-Stationarity in Inverse Reinforcement Learning: An Application to Real-Life Problems","volume":"110","author":"Likmeta","year":"2021","journal-title":"Mach. Learn."},{"key":"ref_139","unstructured":"Gleave, A., and Habryka, O. (2018). Multi-Task Maximum Entropy Inverse Reinforcement Learning. arXiv."},{"key":"ref_140","doi-asserted-by":"crossref","unstructured":"Sanner, S., and Hutter, M. (2012). Bayesian Multitask Inverse Reinforcement Learning. Proceedings of the Recent Advances in Reinforcement Learning\u20149th European Workshop (EWRL), Athens, Greece, 9\u201311 September 2011, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-642-29946-9"},{"key":"ref_141","unstructured":"Choi, J., and Kim, K.E. (2012, January 3\u20138). Nonparametric Bayesian Inverse Reinforcement Learning for Multiple Reward Functions. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS \u201912), Lake Tahoe, NV, USA."},{"key":"ref_142","doi-asserted-by":"crossref","unstructured":"Arora, S., Doshi, P., and Banerjee, B. (June, January 30). Min-Max Entropy Inverse RL of Multiple Tasks. Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi\u2019an, China.","DOI":"10.1109\/ICRA48506.2021.9561771"},{"key":"ref_143","first-page":"206","article-title":"Deep Adaptive Multi-Intention Inverse Reinforcement Learning","volume":"2021","author":"Bighashdel","year":"2021","journal-title":"ECML\/PKDD"},{"key":"ref_144","doi-asserted-by":"crossref","unstructured":"Almingol, J., and Montesano, L. (October, September 28). Learning Multiple Behaviours Using Hierarchical Clustering of Rewards. 
Proceedings of the 2015 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany.","DOI":"10.1109\/IROS.2015.7354033"},{"key":"ref_145","doi-asserted-by":"crossref","first-page":"2295","DOI":"10.1007\/s10994-021-05984-x","article-title":"Inverse Reinforcement Learning in Contextual MDPs","volume":"110","author":"Belogolovsky","year":"2021","journal-title":"Mach. Learn."},{"key":"ref_146","unstructured":"Sharifzadeh, S., Chiotellis, I., Triebel, R., and Cremers, D. (2017). Learning to Drive Using Inverse Reinforcement Learning and Deep Q-Networks. In Proceedings of the NIPS Workshop on Deep Learning for Action and Interaction. arXiv."},{"key":"ref_147","unstructured":"Brown, D., Coleman, R., Srinivasan, R., and Niekum, S. (2020, January 12\u201318). Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences. Proceedings of the 37th International Conference on Machine Learning, Virtual Event."},{"key":"ref_148","doi-asserted-by":"crossref","first-page":"4125","DOI":"10.1109\/TNNLS.2021.3051012","article-title":"Scalable Inverse Reinforcement Learning Through Multifidelity Bayesian Optimization","volume":"33","author":"Imani","year":"2022","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_149","first-page":"4028","article-title":"IQ-Learn: Inverse Soft-Q Learning for Imitation","volume":"Volume 34","author":"Garg","year":"2021","journal-title":"Proceedings of the Advances in Neural Information Processing Systems"},{"key":"ref_150","doi-asserted-by":"crossref","first-page":"102070","DOI":"10.1016\/j.tre.2020.102070","article-title":"Integrating Dijkstra\u2019s Algorithm into Deep Inverse Reinforcement Learning for Food Delivery Route Planning","volume":"142","author":"Liu","year":"2020","journal-title":"Transp. Res. Part E Logist. Transp. Rev."},{"key":"ref_151","unstructured":"Xu, K., Ratner, E., Dragan, A., Levine, S., and Finn, C. (2019, January 9\u201315). 
Learning a Prior over Intent via Meta-Inverse Reinforcement Learning. Proceedings of the 36th International Conference on Machine Learning, PMLR, Long Beach, CA, USA."},{"key":"ref_152","unstructured":"Seyed Ghasemipour, S.K., Gu, S.S., and Zemel, R. (2019, January 8\u201314). SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_153","doi-asserted-by":"crossref","unstructured":"Flach, P.A., De Bie, T., and Cristianini, N. (2012). Structured Apprenticeship Learning. Proceedings of the Machine Learning and Knowledge Discovery in Databases, Bristol, UK, 24\u201328 September 2012, Springer. Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-642-33486-3"},{"key":"ref_154","doi-asserted-by":"crossref","first-page":"46","DOI":"10.1016\/j.artint.2018.07.002","article-title":"Multi-Robot Inverse Reinforcement Learning under Occlusion with Estimation of State Transitions","volume":"263","author":"Bogert","year":"2018","journal-title":"Artif. Intell."},{"key":"ref_155","doi-asserted-by":"crossref","first-page":"848","DOI":"10.1177\/0278364921996384","article-title":"Inverse Optimal Control from Incomplete Trajectory Observations","volume":"40","author":"Jin","year":"2021","journal-title":"Int. J. Robot. Res."},{"key":"ref_156","unstructured":"Suresh, P.S., and Doshi, P. (2022, January 1\u20135). Marginal MAP Estimation for Inverse RL under Occlusion with Observer Noise. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, Eindhoven, The Netherlands."},{"key":"ref_157","doi-asserted-by":"crossref","unstructured":"Torabi, F., Warnell, G., and Stone, P. (2019, January 10\u201316). Recent Advances in Imitation Learning from Observation. 
Proceedings of the Electronic Proceedings of IJCAI (IJCAI \u201919), Macao, China.","DOI":"10.24963\/ijcai.2019\/882"},{"key":"ref_158","doi-asserted-by":"crossref","unstructured":"Das, N., Bechtle, S., Davchev, T., Jayaraman, D., Rai, A., and Meier, F. (2021, January 8\u201311). Model-Based Inverse Reinforcement Learning from Visual Demonstrations. Proceedings of the 2020 Conference on Robot Learning, London, UK.","DOI":"10.1109\/ICRA48506.2021.9561396"},{"key":"ref_159","unstructured":"Zakka, K., Zeng, A., Florence, P., Tompson, J., Bohg, J., and Dwibedi, D. (2022, January 14\u201318). XIRL: Cross-embodiment Inverse Reinforcement Learning. Proceedings of the 5th Conference on Robot Learning, Auckland, New Zealand."},{"key":"ref_160","doi-asserted-by":"crossref","unstructured":"Liu, Y., Gupta, A., Abbeel, P., and Levine, S. (2018, January 21\u201325). Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation. Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia.","DOI":"10.1109\/ICRA.2018.8462901"},{"key":"ref_161","unstructured":"Hadfield-Menell, D., Russell, S.J., Abbeel, P., and Dragan, A. (2016, January 5\u201310). Cooperative Inverse Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain."},{"key":"ref_162","unstructured":"Amin, K., Jiang, N., and Singh, S. (2017, January 4\u20139). Repeated Inverse Reinforcement Learning. Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_163","unstructured":"Christiano, P.F., Leike, J., Brown, T., Martic, M., Legg, S., and Amodei, D. (2017, January 4\u20139). Deep Reinforcement Learning from Human Preferences. 
Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA."},{"key":"ref_164","doi-asserted-by":"crossref","first-page":"497","DOI":"10.1177\/02783649221078031","article-title":"Inducing Structure in Reward Learning by Learning Features","volume":"41","author":"Bobu","year":"2022","journal-title":"Int. J. Robot. Res."},{"key":"ref_165","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1016\/j.cobeha.2015.09.010","article-title":"Social Emotions and Psychological Games","volume":"5","author":"Chang","year":"2015","journal-title":"Curr. Opin. Behav. Sci."},{"key":"ref_166","first-page":"1281","article-title":"Incorporating Fairness into Game Theory and Economics","volume":"83","author":"Rabin","year":"1993","journal-title":"Am. Econ. Rev."},{"key":"ref_167","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1093\/ei\/41.1.20","article-title":"On the Nature of Fair Behavior","volume":"41","author":"Falk","year":"2003","journal-title":"Econ. Inq."},{"key":"ref_168","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.cobeha.2017.07.010","article-title":"On the Interaction of Social Affect and Cognition: Empathy, Compassion and Theory of Mind","volume":"19","author":"Preckel","year":"2018","journal-title":"Curr. Opin. Behav. Sci."},{"key":"ref_169","doi-asserted-by":"crossref","first-page":"338","DOI":"10.1111\/tops.12371","article-title":"Computational Models of Emotion Inference in Theory of Mind: A Review and Roadmap","volume":"11","author":"Ong","year":"2019","journal-title":"Top. Cogn. Sci."},{"key":"ref_170","doi-asserted-by":"crossref","first-page":"141","DOI":"10.1023\/A:1021086215235","article-title":"Estimating a Game Theoretic Model","volume":"18","author":"Lise","year":"2001","journal-title":"Comput. 
Econ."},{"key":"ref_171","doi-asserted-by":"crossref","first-page":"1529","DOI":"10.3982\/ECTA5434","article-title":"Identification and Estimation of a Discrete Game of Complete Information","volume":"78","author":"Bajari","year":"2010","journal-title":"Econometrica"},{"key":"ref_172","unstructured":"Waugh, K., Ziebart, B.D., and Bagnell, J.A. (July, January 28). Computational Rationalization: The Inverse Equilibrium Problem. Proceedings of the 28th International Conference on International Conference on Machine Learning (ICML \u201911), Bellevue, WA, USA."},{"key":"ref_173","unstructured":"Markakis, E., and Sch\u00e4fer, G. (2015). Inverse Game Theory: Learning Utilities in Succinct Games. Proceedings of the Web and Internet Economics, Amsterdam, The Netherlands, 9\u201312 December 2015, Springer. Lecture Notes in Computer Science."},{"key":"ref_174","doi-asserted-by":"crossref","unstructured":"Cao, K., and Xie, L. (2022). Game-Theoretic Inverse Reinforcement Learning: A Differential Pontryagin\u2019s Maximum Principle Approach. IEEE Trans. Neural Netw. Learn. Syst.","DOI":"10.1109\/TNNLS.2022.3148376"},{"key":"ref_175","doi-asserted-by":"crossref","unstructured":"Natarajan, S., Kunapuli, G., Judah, K., Tadepalli, P., Kersting, K., and Shavlik, J. (2010, January 12\u201314). Multi-Agent Inverse Reinforcement Learning. Proceedings of the 2010 Ninth International Conference on Machine Learning and Applications (ICMLA \u201910), Washington, DC, USA.","DOI":"10.1109\/ICMLA.2010.65"},{"key":"ref_176","doi-asserted-by":"crossref","unstructured":"Reddy, T.S., Gopikrishna, V., Zaruba, G., and Huber, M. (2012, January 14\u201317). Inverse Reinforcement Learning for Decentralized Non-Cooperative Multiagent Systems. Proceedings of the 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE SMC \u201912), Seoul, Republic of Korea.","DOI":"10.1109\/ICSMC.2012.6378020"},{"key":"ref_177","unstructured":"Chen, Y., Zhang, L., Liu, J., and Hu, S. (2022). 
Individual-Level Inverse Reinforcement Learning for Mean Field Games. arXiv."},{"key":"ref_178","doi-asserted-by":"crossref","unstructured":"Harr\u00e9, M.S. (2022). What Can Game Theory Tell Us about an AI \u2018Theory of Mind\u2019?. Games, 13.","DOI":"10.3390\/g13030046"},{"key":"ref_179","first-page":"105","article-title":"Including Deontic Reasoning as Fundamental to Theory of Mind","volume":"51","author":"Wellman","year":"2008","journal-title":"HDE"},{"key":"ref_180","doi-asserted-by":"crossref","first-page":"598","DOI":"10.1126\/science.1142996","article-title":"Social Decision-Making: Insights from Game Theory and Neuroscience","volume":"318","author":"Sanfey","year":"2007","journal-title":"Science"},{"key":"ref_181","doi-asserted-by":"crossref","first-page":"693","DOI":"10.1146\/annurev.psych.60.110707.163514","article-title":"The Social Brain: Neural Basis of Social Knowledge","volume":"60","author":"Adolphs","year":"2009","journal-title":"Annu. Rev. Psychol."},{"key":"ref_182","doi-asserted-by":"crossref","first-page":"1209","DOI":"10.1126\/science.abe2629","article-title":"Using Large-Scale Experiments and Machine Learning to Discover Theories of Human Decision-Making","volume":"372","author":"Peterson","year":"2021","journal-title":"Science"},{"key":"ref_183","doi-asserted-by":"crossref","unstructured":"Gershman, S.J., Gerstenberg, T., Baker, C.L., and Cushman, F.A. (2016). Plans, Habits, and Theory of Mind. PLoS ONE, 11.","DOI":"10.1371\/journal.pone.0162246"},{"key":"ref_184","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1287\/mnsc.14.7.486","article-title":"Games with Incomplete Information Played by \u201cBayesian\u201d Players, I\u2013III. Part III. The Basic Probability Distribution of the Game","volume":"14","author":"Harsanyi","year":"1968","journal-title":"Manag. 
Sci."},{"key":"ref_185","doi-asserted-by":"crossref","first-page":"798","DOI":"10.3758\/s13423-018-1559-x","article-title":"Understanding Individual Differences in Theory of Mind via Representation of Minds, Not Mental States","volume":"26","author":"Conway","year":"2019","journal-title":"Psychon. Bull. Rev."},{"key":"ref_186","unstructured":"Velez-Ginorio, J., Siegel, M.H., Tenenbaum, J., and Jara-Ettinger, J. (2017, January 16\u201329). Interpreting Actions by Attributing Compositional Desires. Proceedings of the 39th Annual Meeting of the Cognitive Science Society, London, UK."},{"key":"ref_187","doi-asserted-by":"crossref","unstructured":"Sun, L., Zhan, W., and Tomizuka, M. (2018, January 4\u20137). Probabilistic Prediction of Interactive Driving Behavior via Hierarchical Inverse Reinforcement Learning. Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA.","DOI":"10.1109\/ITSC.2018.8569453"},{"key":"ref_188","unstructured":"Kolter, J., Abbeel, P., and Ng, A. (2007, January 3\u20136). Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion. Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada."},{"key":"ref_189","unstructured":"Natarajan, S., Joshi, S., Tadepalli, P., Kersting, K., and Shavlik, J. (2011, January 16\u201322). Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach. Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain."},{"key":"ref_190","unstructured":"Okal, B., Gilbert, H., and Arras, K.O. (2015, January 13\u201317). Efficient Inverse Reinforcement Learning Using Adaptive State-Graphs. Proceedings of the Robotics: Science and Systems XI Conference (RSS \u201915), Rome, Italy."},{"key":"ref_191","doi-asserted-by":"crossref","unstructured":"Gao, X., Gong, R., Zhao, Y., Wang, S., Shu, T., and Zhu, S.C. (September, January 31). 
Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks. Proceedings of the 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Naples, Italy.","DOI":"10.1109\/RO-MAN47096.2020.9223595"},{"key":"ref_192","doi-asserted-by":"crossref","first-page":"103216","DOI":"10.1016\/j.artint.2019.103216","article-title":"The Hanabi Challenge: A New Frontier for AI Research","volume":"280","author":"Bard","year":"2020","journal-title":"Artif. Intell."},{"key":"ref_193","unstructured":"Heidecke, J. (2019). Evaluating the Robustness of GAN-Based Inverse Reinforcement Learning Algorithms. [Master\u2019s Thesis, Universitat Polit\u00e8cnica de Catalunya]."},{"key":"ref_194","unstructured":"Snoswell, A.J., Singh, S.P.N., and Ye, N. (2021). LiMIIRL: Lightweight Multiple-Intent Inverse Reinforcement Learning. arXiv."},{"key":"ref_195","unstructured":"Toyer, S., Shah, R., Critch, A., and Russell, S. (2020). The MAGICAL Benchmark for Robust Imitation. arXiv."},{"key":"ref_196","doi-asserted-by":"crossref","unstructured":"Waade, P.T., Enevoldsen, K.C., Vermillet, A.Q., Simonsen, A., and Fusaroli, R. (2022). Introducing Tomsup: Theory of Mind Simulations Using Python. Behav. Res. Methods.","DOI":"10.3758\/s13428-022-01827-2"},{"key":"ref_197","doi-asserted-by":"crossref","first-page":"1408","DOI":"10.1073\/pnas.1722396115","article-title":"Conceptualizing Degrees of Theory of Mind","volume":"115","author":"Conway","year":"2018","journal-title":"Proc. Natl. Acad. Sci. 
USA"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/68\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:11:28Z","timestamp":1760119888000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/2\/68"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,19]]},"references-count":197,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["a16020068"],"URL":"https:\/\/doi.org\/10.3390\/a16020068","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,19]]}}}