{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T15:23:09Z","timestamp":1763997789117,"version":"3.45.0"},"reference-count":34,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T00:00:00Z","timestamp":1763942400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>People need to internalize the skills of AI agents to improve their own capabilities. Our paper focuses on Mahjong, a multiplayer game involving imperfect information and requiring effective long-term decision-making amidst randomness and hidden information. Through the efforts of AI researchers, several impressive Mahjong AI agents have already achieved performance levels comparable to those of professional human players; however, these agents are often treated as black boxes from which few insights can be gleaned. This paper introduces Mxplainer, a parameterized search algorithm that can be converted into an equivalent neural network to learn the parameters of black-box agents. Experiments on both human and AI agents demonstrate that Mxplainer achieves a top-three action prediction accuracy of over 92% and 90%, respectively, while providing faithful and interpretable approximations that outperform decision-tree methods (34.8% top-three accuracy). This enables Mxplainer to deliver both strategy-level insights into agent characteristics and actionable, step-by-step explanations for individual decisions.<\/jats:p>","DOI":"10.3390\/a18120738","type":"journal-article","created":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T14:59:04Z","timestamp":1763996344000},"page":"738","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Mxplainer: Explain and Learn Insights by Imitating Mahjong Agents"],"prefix":"10.3390","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9872-002X","authenticated-orcid":false,"given":"Lingfeng","family":"Li","sequence":"first","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]},{"given":"Yunlong","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]},{"given":"Yongyi","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]},{"given":"Qifan","family":"Zheng","sequence":"additional","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]},{"given":"Wenxin","family":"Li","sequence":"additional","affiliation":[{"name":"School of Computer Science, Peking University, Beijing 100871, China"}]}],"member":"1968","published-online":{"date-parts":[[2025,11,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of Go without human knowledge","volume":"550","author":"Silver","year":"2017","journal-title":"Nature"},{"key":"ref_2","unstructured":"Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., and Graepel, T. (2017). Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm. 
arXiv."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1126\/science.aao1733","article-title":"Superhuman AI for heads-up no-limit poker: Libratus beats top professionals","volume":"359","author":"Brown","year":"2018","journal-title":"Science"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"885","DOI":"10.1126\/science.aay2400","article-title":"Superhuman AI for multiplayer poker","volume":"365","author":"Brown","year":"2019","journal-title":"Science"},{"key":"ref_5","unstructured":"Yang, G., Liu, M., Hong, W., Zhang, W., Fang, F., Zeng, G., and Lin, Y. (2024). PerfectDou: Dominating DouDizhu with Perfect Information Distillation. arXiv."},{"key":"ref_6","unstructured":"Zha, D., Xie, J., Ma, W., Zhang, S., Lian, X., Hu, X., and Liu, J. (2021). DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning. arXiv."},{"key":"ref_7","unstructured":"Li, J., Koyamada, S., Ye, Q., Liu, G., Wang, C., Yang, R., Zhao, L., Qin, T., Liu, T.Y., and Hon, H.W. (2020). Suphx: Mastering Mahjong with Deep Reinforcement Learning. arXiv."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"350","DOI":"10.1038\/s41586-019-1724-z","article-title":"Grandmaster level in StarCraft II using multi-agent reinforcement learning","volume":"575","author":"Vinyals","year":"2019","journal-title":"Nature"},{"key":"ref_9","unstructured":"Fan, L., Wang, G., Jiang, Y., Mandlekar, A., Yang, Y., Zhu, H., Tang, A., Huang, D.A., Zhu, Y., and Anandkumar, A. (2022). MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge. arXiv."},{"key":"ref_10","unstructured":"Wei, H., Chen, J., Ji, X., Qin, H., Deng, M., Li, S., Wang, L., Zhang, W., Yu, Y., and Liu, L. (2022). Honor of Kings Arena: An Environment for Generalization in Competitive Reinforcement Learning. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_12","unstructured":"Willingham, E. (2024, April 12). AI\u2019s Victories in Go Inspire Better Human Game Playing. Available online: https:\/\/www.scientificamerican.com\/article\/ais-victories-in-go-inspire-better-human-game-playing\/."},{"key":"ref_13","unstructured":"Huang, A., Hui, F., and Baker, L. (2024, April 12). AlphaGo Teach. Available online: https:\/\/alphagoteach.deepmind.com\/."},{"key":"ref_14","unstructured":"Saedol, L. (2024, April 12). 8 Years Later: A World Go Champion\u2019s Reflections on AlphaGo. Available online: https:\/\/blog.google\/around-the-globe\/google-asia\/8-years-later-a-world-go-champions-reflections-on-alphago\/."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, J., Wu, S., Fu, H., Fu, Q., Zhao, E., and Xing, J. (2022, January 21\u201324). Speedup Training Artificial Intelligence for Mahjong via Reward Variance Reduction. Proceedings of the 2022 IEEE Conference on Games (CoG), Beijing, China.","DOI":"10.1109\/CoG51982.2022.9893584"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Zhao, X., and Holden, S.B. (2022). Building a 3-Player Mahjong AI using Deep Reinforcement Learning. arXiv.","DOI":"10.1109\/CoG51982.2022.9893576"},{"key":"ref_17","unstructured":"Arrieta, A.B., D\u00edaz-Rodr\u00edguez, N., Ser, J.D., Bennetot, A., Tabik, S., Barbado, A., Garc\u00eda, S., Gil-L\u00f3pez, S., Molina, D., and Benjamins, R. (2019). 
{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Linardatos, P., Papastefanopoulos, V., and Kotsiantis, S. (2021). Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy, 23.","DOI":"10.3390\/e23010018"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Ribeiro, M.T., Singh, S., and Guestrin, C. (2016). \u201cWhy Should I Trust You?\u201d: Explaining the Predictions of Any Classifier. arXiv.","DOI":"10.1145\/2939672.2939778"},{"key":"ref_20","unstructured":"Berlingerio, M., Bonchi, F., G\u00e4rtner, T., Hurley, N., and Ifrim, G. Toward Interpretable Deep Reinforcement Learning with Linear Model U-Trees. Proceedings of the Machine Learning and Knowledge Discovery in Databases."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1145\/3616864","article-title":"Explainable Reinforcement Learning: A Survey and Comparative Review","volume":"56","author":"Milani","year":"2024","journal-title":"ACM Comput. Surv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Abbeel, P., and Ng, A.Y. (2004, January 4\u20138). Apprenticeship learning via inverse reinforcement learning. Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada. ICML \u201904.","DOI":"10.1145\/1015330.1015430"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1016\/S1364-6613(99)01327-3","article-title":"Is imitation learning the route to humanoid robots?","volume":"3","author":"Schaal","year":"1999","journal-title":"Trends Cogn. Sci."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Jhunjhunwala, A., Lee, J., Sedwards, S., Abdelzad, V., and Czarnecki, K. (2020, January 19\u201324). Improved Policy Extraction via Online Q-Value Distillation. Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK.","DOI":"10.1109\/IJCNN48605.2020.9207648"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1007\/s40747-020-00175-y","article-title":"Interpretable policy derivation for reinforcement learning based on evolutionary feature synthesis","volume":"6","author":"Zhang","year":"2020","journal-title":"Complex Intell. Syst."},{"key":"ref_26","unstructured":"Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W. (2016). OpenAI Gym. arXiv."},{"key":"ref_27","unstructured":"Dy, J., and Krause, A. (2018, January 10\u201315). Programmatically Interpretable Reinforcement Learning. Proceedings of the 35th International Conference on Machine Learning, PMLR, Stockholm, Sweden. Proceedings of Machine Learning Research."},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Lu, Y., Li, W., and Li, W. (2023). Official International Mahjong: A New Playground for AI Research. Algorithms, 16.","DOI":"10.3390\/a16050235"},{"key":"ref_29","unstructured":"Novikov, V. (2016). Handbook on Mahjong Competition Rules, European Mahjong Association."},{"key":"ref_30","unstructured":"Lee, J., and Leyffer, S. Using Piecewise Linear Functions for Solving MINLPs. Proceedings of the Mixed Integer Nonlinear Programming."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1139","DOI":"10.1080\/10556788.2013.796683","article-title":"On stable piecewise linearization and generalized algorithmic differentiation","volume":"28","author":"Griewank","year":"2013","journal-title":"Optim. Methods Softw."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Morteza, A., and Chou, R.A. (2025, January 22\u201327). Constrained Optimization of Access Functions in Uniform Secret Sharing. Proceedings of the 2025 IEEE International Symposium on Information Theory (ISIT), Ann Arbor, MI, USA.","DOI":"10.1109\/ISIT63088.2025.11195691"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Zhou, H., Zhang, H., Zhou, Y., Wang, X., and Li, W. (2018, January 2\u20134). Botzone: An Online Multi-Agent Competitive Platform for AI Education. Proceedings of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education, Larnaca, Cyprus. ITiCSE 2018.","DOI":"10.1145\/3197091.3197099"},{"key":"ref_34","unstructured":"Sutton, R.S., and Barto, A.G. (2018). Reinforcement Learning: An Introduction, MIT Press."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/738\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,24]],"date-time":"2025-11-24T15:17:56Z","timestamp":1763997476000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/18\/12\/738"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,24]]},"references-count":34,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["a18120738"],"URL":"https:\/\/doi.org\/10.3390\/a18120738","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,24]]}}}
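
The record above is a standard Crossref REST API work response (GET https://api.crossref.org/works/{DOI}, with the payload under "message"). The following is a minimal Python sketch, using only the standard library, of how such a record can be fetched and a few of the fields shown above read back; the endpoint and field names come from the record itself, while the script name and the mailto contact in the User-Agent are placeholders (Crossref's usage guidelines ask polite clients to identify themselves).

import json
import re
import urllib.request

# Fetch the work record for this article from the public Crossref REST API.
# The mailto address is a placeholder, per Crossref's "polite" usage guidelines.
DOI = "10.3390/a18120738"
req = urllib.request.Request(
    f"https://api.crossref.org/works/{DOI}",
    headers={"User-Agent": "example-fetcher/0.1 (mailto:you@example.org)"},
)
with urllib.request.urlopen(req) as resp:
    work = json.load(resp)["message"]  # the payload sits under "message"

# Title and authors: "title" is an array; author entries carry given/family names.
print(work["title"][0])
for a in work.get("author", []):
    print("  " + " ".join(filter(None, [a.get("given"), a.get("family")])))

# The abstract is deposited as JATS XML; strip the <jats:...> tags for plain text.
abstract = re.sub(r"</?jats:[^>]*>", "", work.get("abstract", ""))
print(abstract[:160] + "...")

# Counts mirrored from the record: "references-count" and "is-referenced-by-count".
print("references:", work["references-count"], "| cited by:", work["is-referenced-by-count"])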