{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T01:47:08Z","timestamp":1760060828055,"version":"build-2065373602"},"reference-count":57,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T00:00:00Z","timestamp":1758672000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"Scientific Research Projects Department of Istanbul Technical University under Project","award":["MDK-2022-43798","4023492025"],"award-info":[{"award-number":["MDK-2022-43798","4023492025"]}]},{"name":"National Center for High Performance Computing of T\u00fcrkiye (UHeM)","award":["MDK-2022-43798","4023492025"],"award-info":[{"award-number":["MDK-2022-43798","4023492025"]}]},{"name":"ITU Artificial Intelligence and Data Science Application and Research Center (ITU AI)","award":["MDK-2022-43798","4023492025"],"award-info":[{"award-number":["MDK-2022-43798","4023492025"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Reinforcement learning agents are highly susceptible to adversarial attacks that can severely compromise their performance. Although adversarial training is a common countermeasure, most existing research focuses on defending against single-type attacks targeting either observations or actions. This narrow focus overlooks the complexity of real-world mixed attacks, where an agent\u2019s perceptions and resulting actions are perturbed simultaneously. To systematically study these threats, we introduce the Action and State-Adversarial Markov Decision Process (ASA-MDP), which models the interaction as a zero-sum game between the agent and an adversary attacking both states and actions. Using this framework, we show that agents trained conventionally or against single-type attacks remain highly vulnerable to mixed perturbations. Moreover, we identify a key challenge in this setting: a naive mixed-type adversary often fails to effectively balance its perturbations across modalities during training, limiting the agent\u2019s robustness. To address this, we propose the Action and State-Adversarial Proximal Policy Optimization (ASA-PPO) algorithm, which enables the adversary to learn a balanced strategy, distributing its attack budget across both state and action spaces. This, in turn, enhances the robustness of the trained agent against a wide range of adversarial scenarios. Comprehensive experiments across diverse environments demonstrate that policies trained with ASA-PPO substantially outperform baselines\u2014including standard PPO and single-type adversarial methods\u2014under action-only, observation-only, and, most notably, mixed-attack conditions.<\/jats:p>","DOI":"10.3390\/make7040108","type":"journal-article","created":{"date-parts":[[2025,9,24]],"date-time":"2025-09-24T13:16:11Z","timestamp":1758719771000},"page":"108","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Learning to Balance Mixed Adversarial Attacks for Robust Reinforcement Learning"],"prefix":"10.3390","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5228-1743","authenticated-orcid":false,"given":"Mustafa","family":"Erdem","sequence":"first","affiliation":[{"name":"Department of Mechatronics Engineering, Istanbul Technical University, Maslak, 34467 Istanbul, T\u00fcrkiye"},{"name":"Department of Mechatronics Engineering, Turkish-German University, Beykoz, 34820 Istanbul, T\u00fcrkiye"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2660-2141","authenticated-orcid":false,"given":"Naz\u0131m Kemal","family":"\u00dcre","sequence":"additional","affiliation":[{"name":"Department of Artificial Intelligence and Data Engineering, Istanbul Technical University, Maslak, 34467 Istanbul, T\u00fcrkiye"}]}],"member":"1968","published-online":{"date-parts":[[2025,9,24]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Sutton, R.S., and Barto, A.G. (1998). Reinforcement Learning: An Introduction, MIT Press.","DOI":"10.1109\/TNN.1998.712192"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1038\/nature14236","article-title":"Human-level control through deep reinforcement learning","volume":"518","author":"Mnih","year":"2015","journal-title":"Nature"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"484","DOI":"10.1038\/nature16961","article-title":"Mastering the game of Go with deep neural networks and tree search","volume":"529","author":"Silver","year":"2016","journal-title":"Nature"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1172","DOI":"10.3390\/ai5030057","article-title":"Optimization Strategies for Atari Game Environments: Integrating Snake Optimization Algorithm and Energy Valley Optimization in Reinforcement Learning Models","volume":"5","author":"Sarkhi","year":"2024","journal-title":"AI"},{"key":"ref_5","first-page":"1","article-title":"End-to-end training of deep visuomotor policies","volume":"17","author":"Levine","year":"2016","journal-title":"J. Mach. Learn. Res."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1109\/LRA.2017.2720851","article-title":"Control of a quadrotor with reinforcement learning","volume":"2","author":"Hwangbo","year":"2017","journal-title":"IEEE Robot. Autom. Lett."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"555","DOI":"10.3390\/ai5020029","article-title":"Development of an Attention Mechanism for Task-Adaptive Heterogeneous Robot Teaming","volume":"5","author":"Guo","year":"2024","journal-title":"AI"},{"key":"ref_8","unstructured":"Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. (2013). Intriguing properties of neural networks. arXiv."},{"key":"ref_9","unstructured":"Goodfellow, I.J., Shlens, J., and Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv."},{"key":"ref_10","unstructured":"Huang, S., Papernot, N., Goodfellow, I., Duan, Y., and Abbeel, P. (2017). Adversarial attacks on neural network policies. arXiv."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Sun, J., Zhang, T., Xie, X., Ma, L., Zheng, Y., Chen, K., and Liu, Y. (2020, January 7\u201312). Stealthy and efficient adversarial attacks against deep reinforcement learning. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i04.6047"},{"key":"ref_12","unstructured":"Chow, Y., Nachum, O., Duenez-Guzman, E., and Ghavamzadeh, M. (2018, January 3\u20138). A lyapunov-based approach to safe reinforcement learning. Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS 2018), Montreal, QC, Canada."},{"key":"ref_13","unstructured":"Tian, H., Hamedmoghadam, H., Shorten, R., and Ferraro, P. (2024). Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., and Abbeel, P. (2017, January 24\u201328). Domain randomization for transferring deep neural networks from simulation to the real world. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8202133"},{"key":"ref_15","unstructured":"Kos, J., and Song, D. (2017). Delving into adversarial attacks on deep policies. arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"107728","DOI":"10.1016\/j.engappai.2023.107728","article-title":"Adversarial deep reinforcement learning based robust depth tracking control for underactuated autonomous underwater vehicle","volume":"130","author":"Wang","year":"2024","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"127191","DOI":"10.1016\/j.neucom.2023.127191","article-title":"Enhancing the robustness of QMIX against state-adversarial attacks","volume":"572","author":"Guo","year":"2024","journal-title":"Neurocomputing"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"109402","DOI":"10.1016\/j.ast.2024.109402","article-title":"UAV air combat autonomous trajectory planning method based on robust adversarial reinforcement learning","volume":"153","author":"Wang","year":"2024","journal-title":"Aerosp. Sci. Technol."},{"key":"ref_19","unstructured":"Pinto, L., Davidson, J., Sukthankar, R., and Gupta, A. (2017, January 6\u201311). Robust adversarial reinforcement learning. Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3708320","article-title":"Adversarial Machine Learning Attacks and Defences in Multi-Agent Reinforcement Learning","volume":"57","author":"Standen","year":"2025","journal-title":"ACM Comput. Surv."},{"key":"ref_21","first-page":"21024","article-title":"Robust deep reinforcement learning against adversarial perturbations on state observations","volume":"33","author":"Zhang","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_22","first-page":"26156","article-title":"Robust deep reinforcement learning through adversarial loss","volume":"34","author":"Oikarinen","year":"2021","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_23","unstructured":"Tessler, C., Efroni, Y., and Mannor, S. (2019, January 9\u201315). Action robust reinforcement learning and applications in continuous control. Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"126578","DOI":"10.1016\/j.neucom.2023.126578","article-title":"Reward poisoning attacks in deep reinforcement learning based on exploration strategies","volume":"553","author":"Cai","year":"2023","journal-title":"Neurocomputing"},{"key":"ref_25","unstructured":"Rakhsha, A., Radanovic, G., Devidze, R., Zhu, X., and Singla, A. (2020, January 13\u201318). Policy teaching via environment poisoning: Training-time adversarial attacks against reinforcement learning. Proceedings of the International Conference on Machine Learning, PMLR, Virtual."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"276","DOI":"10.3390\/make4010013","article-title":"Robust reinforcement learning: A review of foundations and recent advances","volume":"4","author":"Moos","year":"2022","journal-title":"Mach. Learn. Knowl. Extr."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Liu, Q., Kuang, Y., and Wang, J. (July, January 30). Robust deep reinforcement learning with adaptive adversarial perturbations in action space. Proceedings of the 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan.","DOI":"10.1109\/IJCNN60899.2024.10651543"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3263","DOI":"10.1017\/S026357472400119X","article-title":"Curriculum reinforcement learning-based drifting along a general path for autonomous vehicles","volume":"42","author":"Yu","year":"2024","journal-title":"Robotica"},{"key":"ref_29","unstructured":"Zakka, K., Tabanpour, B., Liao, Q., Haiderbhai, M., Holt, S., Luo, J.Y., Allshire, A., Frey, E., Sreenath, K., and Kahrs, L.A. (2025). MuJoCo Playground. arXiv."},{"key":"ref_30","unstructured":"Tan, K., Wang, J., and Kantaros, Y. (2023, January 15\u201316). Targeted adversarial attacks against neural network trajectory predictors. Proceedings of the Learning for Dynamics and Control Conference. PMLR, Philadelphia, PA, USA."},{"key":"ref_31","first-page":"79980","article-title":"Discovering general reinforcement learning algorithms with adversarial environment design","volume":"36","author":"Jackson","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Lee, X.Y., Ghadai, S., Tan, K.L., Hegde, C., and Sarkar, S. (2020, January 7\u201312). Spatiotemporally constrained action space attacks on deep reinforcement learning agents. Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA.","DOI":"10.1609\/aaai.v34i04.5887"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chen, K., Guo, S., Zhang, T., Xie, X., and Liu, Y. (2021, January 7\u201311). Stealing deep reinforcement learning models for fun and profit. Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, Hong Kong, China.","DOI":"10.1145\/3433210.3453090"},{"key":"ref_34","unstructured":"Gleave, A., Dennis, M., Wild, C., Kant, N., Levine, S., and Russell, S. (2020, January 26\u201330). Adversarial Policies: Attacking Deep Reinforcement Learning. Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia."},{"key":"ref_35","unstructured":"Pattanaik, A., Tang, Z., Liu, S., Bommannan, G., and Chowdhary, G. (2018, January 10\u201315). Robust Deep Reinforcement Learning with Adversarial Attacks. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, Stockholm, Sweden."},{"key":"ref_36","unstructured":"Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (May, January 30). Towards Deep Learning Models Resistant to Adversarial Attacks. Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"108965","DOI":"10.1016\/j.knosys.2022.108965","article-title":"Deep-attack over the deep reinforcement learning","volume":"250","author":"Li","year":"2022","journal-title":"Knowl.-Based Syst."},{"key":"ref_38","unstructured":"Behzadan, V., and Munir, A. (2017, January 15\u201320). Vulnerability of deep reinforcement learning to policy induction attacks. Proceedings of the Machine Learning and Data Mining in Pattern Recognition: 13th International Conference, MLDM 2017, New York, NY, USA. Proceedings 13."},{"key":"ref_39","first-page":"1633","article-title":"On adaptive attacks to adversarial example defenses","volume":"33","author":"Tramer","year":"2020","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Pan, X., Seita, D., Gao, Y., and Canny, J. (2019, January 20\u201324). Risk averse robust adversarial reinforcement learning. Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada.","DOI":"10.1109\/ICRA.2019.8794293"},{"key":"ref_41","unstructured":"Zhang, H., Chen, H., Boning, D., and Hsieh, C.J. (2021, January 3\u20137). Robust Reinforcement Learning on State Observations with Learned Optimal Adversary. Proceedings of the International Conference on Learning Representation (ICLR), Virtual."},{"key":"ref_42","unstructured":"Albrecht, S.V., Christianos, F., and Sch\u00e4fer, L. (2024). Multi-Agent Reinforcement Learning: Foundations and Modern Approaches, MIT Press."},{"key":"ref_43","unstructured":"Vinitsky, E., Du, Y., Parvate, K., Jang, K., Abbeel, P., and Bayen, A. (2020). Robust reinforcement learning using adversarial populations. arXiv."},{"key":"ref_44","unstructured":"He, S., Han, S., Su, S., Han, S., Zou, S., and Miao, F. (2023). Robust Multi-Agent Reinforcement Learning with State Uncertainty. Trans. Mach. Learn. Res."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Mandlekar, A., Zhu, Y., Garg, A., Fei-Fei, L., and Savarese, S. (2017, January 24\u201328). Adversarially robust policy learning: Active construction of physically-plausible perturbations. Proceedings of the 2017 IEEE\/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada.","DOI":"10.1109\/IROS.2017.8206245"},{"key":"ref_46","first-page":"24401","article-title":"Efficient adversarial attacks on online multi-agent reinforcement learning","volume":"36","author":"Liu","year":"2023","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"1095","DOI":"10.1073\/pnas.39.10.1095","article-title":"Stochastic games","volume":"39","author":"Shapley","year":"1953","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"780","DOI":"10.1287\/opre.1050.0216","article-title":"Robust control of Markov decision processes with uncertain transition matrices","volume":"53","author":"Nilim","year":"2005","journal-title":"Oper. Res."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Suilen, M., Badings, T., Bovy, E.M., Parker, D., and Jansen, N. (2024). Robust markov decision processes: A place where AI and formal methods meet. Principles of Verification: Cycling the Probabilistic Landscape: Essays Dedicated to Joost-Pieter Katoen on the Occasion of His 60th Birthday, Part III, Springer.","DOI":"10.1007\/978-3-031-75778-5_7"},{"key":"ref_50","unstructured":"Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. (2017). Proximal policy optimization algorithms. arXiv."},{"key":"ref_51","unstructured":"Perolat, J., Scherrer, B., Piot, B., and Pietquin, O. (2015, January 7\u20139). Approximate dynamic programming for two-player zero-sum Markov games. Proceedings of the International Conference on Machine Learning. PMLR, Lille, France."},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"100022","DOI":"10.1016\/j.simpa.2020.100022","article-title":"dm_control: Software and tasks for continuous control","volume":"6","author":"Tunyasuvunakool","year":"2020","journal-title":"Softw. Impacts"},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Todorov, E., Erez, T., and Tassa, Y. (2012, January 7\u201312). Mujoco: A physics engine for model-based control. Proceedings of the 2012 IEEE\/RSJ International Conference on Intelligent Robots and Systems, Algarve, Portugal.","DOI":"10.1109\/IROS.2012.6386109"},{"key":"ref_54","first-page":"16455","article-title":"Discovered policy optimisation","volume":"35","author":"Lu","year":"2022","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_55","unstructured":"Bradbury, J., Frostig, R., Hawkins, P., Johnson, M.J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., and Wanderman-Milne, S. (2025, September 21). JAX: Composable transformations of Python+NumPy programs. 2018. Available online: http:\/\/github.com\/jax-ml\/jax."},{"key":"ref_56","unstructured":"Islam, R., Henderson, P., Gomrokchi, M., and Precup, D. (2017). Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. arXiv."},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"554","DOI":"10.3390\/make3030029","article-title":"Recent advances in deep reinforcement learning applications for solving partially observable markov decision processes (pomdp) problems: Part 1\u2014fundamentals and applications in games, robotics and natural language processing","volume":"3","author":"Xiang","year":"2021","journal-title":"Mach. Learn. Knowl. Extr."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/108\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T18:48:58Z","timestamp":1760035738000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/7\/4\/108"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,9,24]]},"references-count":57,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["make7040108"],"URL":"https:\/\/doi.org\/10.3390\/make7040108","relation":{},"ISSN":["2504-4990"],"issn-type":[{"type":"electronic","value":"2504-4990"}],"subject":[],"published":{"date-parts":[[2025,9,24]]}}}