{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T13:25:38Z","timestamp":1777728338460,"version":"3.51.4"},"reference-count":15,"publisher":"SAGE Publications","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IA"],"published-print":{"date-parts":[[2023,12,20]]},"abstract":"<jats:p>We study the problem of online interaction in general decision making problems, where the objective is not only to find optimal strategies, but also to satisfy certain safety guarantees, expressed in terms of costs accrued. In particular, we focus on the online learning problem in which an agent has to find the optimal solution of a linear objective. Moreover, the agent has to satisfy a linear safety constraint at each round. We propose a theoretical framework to address such problems and present BAN-SOLO, a UCB-like algorithm that, in an online interaction with an unknown environment, attains sublinear regret of order O ( T ) and satisfies a safety constraint with high probability at each iteration. BAN-SOLO\u00a0provides a general framework that can be applied to any setting in which estimators of the objective and the cost function are available. At its core, it relies on tools from convex duality to manage environment exploration while satisfying the safety constraint imposed by the problem. To show the applicability of our framework, we provide two game theoretical applications: normal-form games and sequential decision-making problems.<\/jats:p>","DOI":"10.3233\/ia-230008","type":"journal-article","created":{"date-parts":[[2023,10,31]],"date-time":"2023-10-31T13:04:34Z","timestamp":1698757474000},"page":"195-205","source":"Crossref","is-referenced-by-count":0,"title":["A framework for safe decision making: A convex duality approach"],"prefix":"10.1177","volume":"17","author":[{"given":"Martino","family":"Bernasconi","sequence":"first","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Federico","family":"Cacciamani","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matteo","family":"Castiglioni","sequence":"additional","affiliation":[{"name":"Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"key":"10.3233\/IA-230008_ref1","first-page":"2312","article-title":"Improved algorithms for linear stochastic bandits","volume":"24","author":"Yasin Abbasi-Yadkori","year":"2011","journal-title":"Advances in Neural Information Processing Systems"},{"key":"10.3233\/IA-230008_ref3","doi-asserted-by":"crossref","first-page":"103216","DOI":"10.1016\/j.artint.2019.103216","article-title":"The hanabi challenge: A new frontier for ai research","volume":"280","author":"Nolan Bard","year":"2020","journal-title":"Artificial Intelligence"},{"key":"10.3233\/IA-230008_ref5","unstructured":"Martino Bernasconi , Matteo Castiglioni , Alberto Marchesi , Nicola Gatti , Francesco Trov\u00f2 , Sequential information design: Learning to persuade in the dark,, Advances in Neural Information Processing Systems 35 (2022)."},{"key":"10.3233\/IA-230008_ref6","unstructured":"Martino Bernasconi-de-Luca , Federico Cacciamani , Simone Fioravanti , Nicola Gatti , Alberto Marchesi , Francesco Trov\u00f2 , Exploiting opponents under utility constraints in sequential games,, Advances in Neural Information Processing Systems 34 (2021)."},{"key":"10.3233\/IA-230008_ref7","doi-asserted-by":"crossref","first-page":"418","DOI":"10.1126\/science.aao1733","article-title":"Superhuman ai for heads-up no-limit poker: Libratus beats top professionals","volume":"359","author":"Noam Brown","year":"2018","journal-title":"Science"},{"issue":"2","key":"10.3233\/IA-230008_ref11","doi-asserted-by":"crossref","first-page":"384","DOI":"10.2307\/1428063","article-title":"Rates of convergence for random approximations of convex sets","volume":"28","author":"Lutz Dumbgen","year":"1996","journal-title":"Advances in Applied Probability"},{"issue":"6624","key":"10.3233\/IA-230008_ref12","doi-asserted-by":"crossref","first-page":"1067","DOI":"10.1126\/science.ade9097","article-title":"Human-level play in the game of diplomacy by combining language models with strategic reasoning","volume":"378","author":"Anton Bakhtin","year":"2022","journal-title":"Science"},{"key":"10.3233\/IA-230008_ref13","doi-asserted-by":"crossref","first-page":"5372","DOI":"10.1609\/aaai.v35i6.16677","article-title":"Bandit linear optimization for sequential decision making and extensive-form games","volume":"35","author":"Gabriele Farina","year":"2021","journal-title":"In Proceedings of the AAAI Conference on Artificial Intelligence"},{"issue":"4","key":"10.3233\/IA-230008_ref14","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1287\/inte.33.4.53.16370","article-title":"Sensitivity analysis and uncertainty in linear programming","volume":"33","author":"Julia Higle","year":"2003","journal-title":"Interfaces"},{"issue":"2","key":"10.3233\/IA-230008_ref15","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1287\/moor.1100.0452","article-title":"Smoothing techniques for computing nash equilibria of sequential games","volume":"35","author":"Samid Hoda","year":"2010","journal-title":"Mathematics of Operations Research"},{"issue":"7676","key":"10.3233\/IA-230008_ref18","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1038\/nature24270","article-title":"Mastering the game of go without human knowledge","volume":"550","author":"David Silver","year":"2017","journal-title":"nature"},{"issue":"6419","key":"10.3233\/IA-230008_ref19","doi-asserted-by":"crossref","first-page":"1140","DOI":"10.1126\/science.aar6404","article-title":"A general reinforcement learning algorithm that masters chess, shogi, and go through self-play","volume":"362","author":"David Silver","year":"2018","journal-title":"Science"},{"issue":"2","key":"10.3233\/IA-230008_ref22","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1006\/game.1996.0050","article-title":"Efficient computation of behavior strategies","volume":"14","author":"Bernhard Von Stengel","year":"1996","journal-title":"Games and Economic Behavior"},{"issue":"2","key":"10.3233\/IA-230008_ref23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3392157","article-title":"Online primal-dual mirror descent under stochastic constraints","volume":"4","author":"Xiaohan Wei","year":"2020","journal-title":"Proceedings of the ACM on Measurement and Analysis of Computing Systems"},{"key":"10.3233\/IA-230008_ref24","unstructured":"Hao Yu , Michael Neely , Xiaohan Wei , and Online convex optimization with stochastic constraints, Advances in Neural Information Processing Systems 30 (2017)."}],"container-title":["Intelligenza Artificiale"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IA-230008","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T10:51:49Z","timestamp":1777459909000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/full\/10.3233\/IA-230008"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,20]]},"references-count":15,"journal-issue":{"issue":"2"},"URL":"https:\/\/doi.org\/10.3233\/ia-230008","relation":{},"ISSN":["1724-8035","2211-0097"],"issn-type":[{"value":"1724-8035","type":"print"},{"value":"2211-0097","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,20]]}}}