{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,3]],"date-time":"2026-02-03T18:53:53Z","timestamp":1770144833490,"version":"3.49.0"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T00:00:00Z","timestamp":1764633600000},"content-version":"vor","delay-in-days":1,"URL":"http:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"NSF","award":["2045641,\u00a02325956, 2512128,\u00a02533814"],"award-info":[{"award-number":["2045641,\u00a02325956, 2512128,\u00a02533814"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Meas. Anal. Comput. Syst."],"published-print":{"date-parts":[[2025,12]]},"abstract":"<jats:p>\n                    We study a novel heterogeneous multi-agent multi-armed bandit problem with a cluster structure induced by stochastic block models (SBMs), influencing not only graph topology, but also reward heterogeneity. Specifically, agents are distributed on random graphs based on SBMs, a generalized Erdos-Renyi model with heterogeneous edge probabilities: agents are grouped into clusters (known or unknown); edge probabilities for agents within the same cluster differ from those across clusters. The same cluster structure in SBMs also determines our heterogeneous rewards. Rewards distributions of the same arm vary across agents in different clusters but remain consistent within a cluster, unifying homogeneous and heterogeneous settings and varying degree of heterogeneity. Rewards are independent samples from these distributions. The objective is to minimize system-wide regret across all agents. To address this, we propose a novel algorithm applicable to both known and unknown cluster settings. The algorithm combines an averaging-based consensus approach with a newly introduced information aggregation and weighting technique, resulting in a UCB-type strategy. It accounts for graph randomness, leverages both intra-cluster (homogeneous) and inter-cluster (heterogeneous) information from rewards and graphs, and incorporates cluster detection for unknown cluster settings. We derive optimal instance-dependent regret upper bounds of order log\n                    <jats:italic toggle=\"yes\">T<\/jats:italic>\n                    under sub-Gaussian rewards. Importantly, our regret bounds capture the degree of heterogeneity in the system (an additional layer of complexity), exhibit smaller constants, scale better for large systems, and impose significantly relaxed assumptions on edge probabilities.\n                  <\/jats:p>","DOI":"10.1145\/3771570","type":"journal-article","created":{"date-parts":[[2025,12,2]],"date-time":"2025-12-02T20:07:03Z","timestamp":1764706023000},"page":"1-59","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Heterogeneous Multi-agent Multi-armed Bandit on Stochastic Block Models"],"prefix":"10.1145","volume":"9","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-1814-0372","authenticated-orcid":false,"given":"Mengfan","family":"Xu","sequence":"first","affiliation":[{"name":"Mechanical and Industrial Engineering, University of Massachusetts Amherst, Amherst, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4273-3005","authenticated-orcid":false,"given":"Liren","family":"Shan","sequence":"additional","affiliation":[{"name":"Toyota Technological Institute at Chicago, Chicago, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4049-3922","authenticated-orcid":false,"given":"Fatemeh","family":"Ghaffari","sequence":"additional","affiliation":[{"name":"Manning College of Information &amp; Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8043-8521","authenticated-orcid":false,"given":"Xuchuang","family":"Wang","sequence":"additional","affiliation":[{"name":"Manning College of Information &amp; Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8628-5873","authenticated-orcid":false,"given":"Xutong","family":"Liu","sequence":"additional","affiliation":[{"name":"Computer Science and Systems, University of Washington, Tacoma, WA, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9278-2254","authenticated-orcid":false,"given":"Mohammad","family":"Hajiesmaili","sequence":"additional","affiliation":[{"name":"Manning College of Information &amp; Computer Sciences, University of Massachusetts Amherst, Amherst, MA, USA"}]}],"member":"320","published-online":{"date-parts":[[2025,12,2]]},"reference":[{"issue":"177","key":"e_1_2_1_1_1","first-page":"1","article-title":"Community detection and stochastic block models: recent developments","volume":"18","author":"Abbe E.","year":"2018","unstructured":"E. Abbe. Community detection and stochastic block models: recent developments. Journal of Machine Learning Research, 18(177):1-86, 2018.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.2015.2490670"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1214\/22-AOS2196"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/3586589.3586801"},{"key":"e_1_2_1_5_1","first-page":"1","volume-title":"Proceedings of the international biometrics society annual meeting","volume":"15","author":"Airoldi E. M.","year":"2006","unstructured":"E. M. Airoldi, D. M. Blei, S. E. Fienberg, E. P. Xing, and T. Jaakkola. Mixed membership stochastic block models for relational data with application to protein-protein interactions. In Proceedings of the international biometrics society annual meeting, volume 15, page 1, 2006."},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1013689704352"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1137\/S0097539701398375"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3637528.3671691"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1140\/epjb\/e2004-00127-8"},{"key":"e_1_2_1_10_1","first-page":"31","article-title":"Distributed multi-player bandits-a game of thrones approach","author":"Bistritz I.","year":"2018","unstructured":"I. Bistritz and A. Leshem. Distributed multi-player bandits-a game of thrones approach. Advances in Neural Information Processing Systems, 31, 2018.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_11_1","first-page":"631","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Blaser E.","year":"2024","unstructured":"E. Blaser, C. Li, and H. Wang. Federated linear contextual bandits with heterogeneous clients. In International Conference on Artificial Intelligence and Statistics, pages 631-639. PMLR, 2024."},{"key":"e_1_2_1_12_1","first-page":"2257","volume-title":"International Conference on Machine Learning","author":"Braun G.","year":"2022","unstructured":"G. Braun, H. Tyagi, and C. Biernacki. An iterative clustering algorithm for the contextual stochastic block model with optimality guarantees. In International Conference on Machine Learning, pages 2257-2291. PMLR, 2022."},{"key":"e_1_2_1_13_1","first-page":"3471","volume-title":"International conference on artificial intelligence and statistics","author":"Chawla R.","year":"2020","unstructured":"R. Chawla, A. Sankararaman, A. Ganesh, and S. Shakkottai. The gossiping insert-eliminate algorithm for multi-agent bandits. In International conference on artificial intelligence and statistics, pages 3471-3481. PMLR, 2020."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TWC.2018.2876823"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11192-020-03708-x"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/3664647.3681269"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1051\/proc\/201760001"},{"key":"e_1_2_1_18_1","first-page":"31","article-title":"Contextual stochastic block models","author":"Deshpande Y.","year":"2018","unstructured":"Y. Deshpande, S. Sen, A. Montanari, and E. Mossel. Contextual stochastic block models. Advances in Neural Information Processing Systems, 31, 2018.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_19_1","first-page":"36","article-title":"Exact recovery and bregman hard clustering of node-attributed stochastic block model","author":"Dreveton M.","year":"2024","unstructured":"M. Dreveton, F. Fernandes, and D. Figueiredo. Exact recovery and bregman hard clustering of node-attributed stochastic block model. Advances in Neural Information Processing Systems, 36, 2024.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_20_1","first-page":"2730","volume-title":"International Conference on Machine Learning","author":"Dubey A.","year":"2020","unstructured":"A. Dubey and A. Pentland. Cooperative multi-agent bandits with heavy tails. In International Conference on Machine Learning, 2730-2739, 2020."},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1051\/ps\/2022019"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00607-024-01341-7"},{"key":"e_1_2_1_23_1","volume-title":"On random graphs i. Publ. math. debrecen, 6(290-297):18","author":"P.","year":"1959","unstructured":"P. ERDdS and A. R&wi. On random graphs i. Publ. math. debrecen, 6(290-297):18, 1959."},{"key":"e_1_2_1_24_1","first-page":"757","volume-title":"International conference on machine learning","author":"Gentile C.","year":"2014","unstructured":"C. Gentile, S. Li, and G. Zappella. Online clustering of bandits. In International conference on machine learning, pages 757-765. PMLR, 2014."},{"key":"e_1_2_1_25_1","first-page":"1253","volume-title":"International Conference on machine learning","author":"Gentile C.","year":"2017","unstructured":"C. Gentile, S. Li, P. Kar, A. Karatzoglou, G. Zappella, and E. Etrue. On context-dependent clustering of bandits. In International Conference on machine learning, pages 1253-1262. PMLR, 2017."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/0378-8733(83)90021-7"},{"key":"e_1_2_1_27_1","first-page":"27057","article-title":"Federated linear contextual bandits","volume":"34","author":"Huang R.","year":"2021","unstructured":"R. Huang, W. Wu, J. Yang, and C. Shen. Federated linear contextual bandits. Advances in Neural Information Processing Systems, 34:27057-27068, 2021.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_28_1","first-page":"1","article-title":"Multi-agent bandit with agent-dependent expected rewards","author":"Jiang F.","year":"2023","unstructured":"F. Jiang and H. Cheng. Multi-agent bandit with agent-dependent expected rewards. Swarm Intelligence, 1-33, 2023.","journal-title":"Swarm Intelligence"},{"key":"e_1_2_1_29_1","first-page":"1301","volume-title":"International conference on machine learning","author":"Korda N.","year":"2016","unstructured":"N. Korda, B. Szorenyi, and S. Li. Distributed clustering of linear bandits in peer to peer networks. In International conference on machine learning, pages 1301-1309. PMLR, 2016."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ECC.2016.7810293"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC.2016.7798264"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.automatica.2020.109445"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11257-023-09358-x"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11763"},{"key":"e_1_2_1_35_1","volume-title":"Online context-dependent clustering in recommendations based on exploration-exploitation algorithms. ArXiv, abs\/1608.03544","author":"Li S.","year":"2016","unstructured":"S. Li, C. Gentile, A. Karatzoglou, and G. Zappella. Online context-dependent clustering in recommendations based on exploration-exploitation algorithms. ArXiv, abs\/1608.03544, 2016."},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/2911451.2911548"},{"key":"e_1_2_1_37_1","volume-title":"Improved algorithm on online clustering of bandits. arXiv preprint arXiv:1902.09162","author":"Li S.","year":"2019","unstructured":"S. Li, W. Chen, and K.-S. Leung. Improved algorithm on online clustering of bandits. arXiv preprint arXiv:1902.09162, 2019."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/JSAC.2022.3142374"},{"key":"e_1_2_1_39_1","volume-title":"Demystifying online clustering of bandits: Enhanced exploration under stochastic and smoothed adversarial contexts. arXiv preprint arXiv:2501.00891","author":"Li Z.","year":"2025","unstructured":"Z. Li, M. Liu, X. Dai, and J. Lui. Demystifying online clustering of bandits: Enhanced exploration under stochastic and smoothed adversarial contexts. arXiv preprint arXiv:2501.00891, 2025."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.physa.2008.01.120"},{"key":"e_1_2_1_41_1","first-page":"1221","volume-title":"Uncertainty in Artificial Intelligence","author":"Liu X.","year":"2022","unstructured":"X. Liu, H. Zhao, T. Yu, S. Li, and J. C. Lui. Federated online clustering of bandits. In Uncertainty in Artificial Intelligence, pages 1221-1231. PMLR, 2022."},{"key":"e_1_2_1_42_1","first-page":"32","article-title":"Decentralized cooperative stochastic bandits","author":"Mart\u00ednez-Rubio D.","year":"2019","unstructured":"D. Mart\u00ednez-Rubio, V. Kanade, and P. Rebeschini. Decentralized cooperative stochastic bandits. Advances in Neural Information Processing Systems, 32, 2019.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_43_1","volume-title":"Exploiting heterogeneity in robust federated best-arm identification. arXiv preprint arXiv:2109.05700","author":"Mitra A.","year":"2021","unstructured":"A. Mitra, H. Hassani, and G. Pappas. Exploiting heterogeneity in robust federated best-arm identification. arXiv preprint arXiv:2109.05700, 2021."},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1145\/2661829.2662063"},{"key":"e_1_2_1_45_1","first-page":"36","article-title":"Blocked collaborative bandits: online collaborative filtering with per-item budget constraints","author":"Pal S.","year":"2024","unstructured":"S. Pal, A. Suggala, K. Shanmugam, and P. Jain. Blocked collaborative bandits: online collaborative filtering with per-item budget constraints. Advances in Neural Information Processing Systems, 36, 2024.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_2_1_46_1","volume-title":"Near-optimal collaborative learning in bandits. In 2022-36th Conference on Neural Information Processing System","author":"R\u00e9da C.","year":"2022","unstructured":"C. R\u00e9da, S. Vakili, and E. Kaufmann. Near-optimal collaborative learning in bandits. In 2022-36th Conference on Neural Information Processing System, 2022."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/3366701"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41109-019-0170-z"},{"key":"e_1_2_1_49_1","first-page":"4120","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Wang P.-A.","year":"2020","unstructured":"P.-A. Wang, A. Proutiere, K. Ariu, Y. Jedra, and A. Russo. Optimal algorithms for multiplayer multi-armed bandits. In International Conference on Artificial Intelligence and Statistics, pages 4120-4129. PMLR, 2020."},{"key":"e_1_2_1_50_1","first-page":"4120","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Wang P.-A.","year":"2020","unstructured":"P.-A. Wang, A. Proutiere, K. Ariu, Y. Jedra, and A. Russo. Optimal algorithms for multiplayer multi-armed bandits. In International Conference on Artificial Intelligence and Statistics, pages 4120-4129. PMLR, 2020."},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2866041"},{"key":"e_1_2_1_52_1","volume-title":"The Eleventh International Conference on Learning Representations","author":"Wang X.","year":"2022","unstructured":"X.Wang, L. Yang, Y.-Z. J. Chen, X. Liu, M. Hajiesmaili, D. Towsley, and J. C. Lui. Achieving near-optimal individual regret & low communications in multi-agent bandits. In The Eleventh International Conference on Learning Representations, 2022."},{"key":"e_1_2_1_53_1","volume-title":"International Conference on Learning Representations","author":"Wang X.","year":"2023","unstructured":"X.Wang, L. Yang, Y.-Z. J. Chen, X. Liu, M. Hajiesmaili, D. Towsley, and J. C. Lui. Achieve near-optimal individual regret & low communications in multi-agent bandits. In International Conference on Learning Representations, 2023."},{"key":"e_1_2_1_54_1","first-page":"1531","volume-title":"International Conference on Artificial Intelligence and Statistics","author":"Wang Z.","year":"2021","unstructured":"Z. Wang, C. Zhang, M. K. Singh, L. Riek, and K. Chaudhuri. Multitask bandit learning through heterogeneous feedback aggregation. In International Conference on Artificial Intelligence and Statistics, 1531-1539, 2021."},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3459637.3482328"},{"key":"e_1_2_1_56_1","volume-title":"Regret lower bounds in multi-agent multi-armed bandit. arXiv preprint arXiv:2308.08046","author":"Xu M.","year":"2023","unstructured":"M. Xu and D. Klabjan. Regret lower bounds in multi-agent multi-armed bandit. arXiv preprint arXiv:2308.08046, 2023."},{"key":"e_1_2_1_57_1","volume-title":"Advances on Neural Information Processing Systems","author":"Xu M.","year":"2023","unstructured":"M. Xu and D. Klabjan. Decentralized randomly distributed multi-agent multi-armed bandit with heterogeneous rewards. Advances on Neural Information Processing Systems, 2023."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP43922.2022.9747833"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i18.30045"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2018.2870521"},{"key":"e_1_2_1_61_1","volume-title":"An information flow model for conflict and fission in small groups. Journal of anthropological research, 33(4):452-473","author":"Zachary W. W.","year":"1977","unstructured":"W. W. Zachary. An information flow model for conflict and fission in small groups. Journal of anthropological research, 33(4):452-473, 1977."},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/JIOT.2020.3017377"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAC.2023.3247982"},{"key":"e_1_2_1_64_1","doi-asserted-by":"publisher","DOI":"10.1109\/CDC42340.2020.9303836"},{"key":"e_1_2_1_65_1","volume-title":"Decentralized multi-armed bandit can outperform classic upper confidence bound. arXiv preprint arXiv:2111.10933","author":"Zhu J.","year":"2021","unstructured":"J. Zhu, E. Mulle, C. S. Smith, and J. Liu. Decentralized multi-armed bandit can outperform classic upper confidence bound. arXiv preprint arXiv:2111.10933, 2021."},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3410220.3453919"}],"container-title":["Proceedings of the ACM on Measurement and Analysis of Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3771570","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3771570","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,3]],"date-time":"2025-12-03T17:26:17Z","timestamp":1764782777000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3771570"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,12]]},"references-count":66,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,12]]}},"alternative-id":["10.1145\/3771570"],"URL":"https:\/\/doi.org\/10.1145\/3771570","relation":{},"ISSN":["2476-1249"],"issn-type":[{"value":"2476-1249","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,12]]},"assertion":[{"value":"2025-12-02","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}