{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:55:16Z","timestamp":1760241316271,"version":"build-2065373602"},"reference-count":47,"publisher":"Association for Computing Machinery (ACM)","issue":"5","funder":[{"name":"NSF","award":["CCF-1907661, DMS-2014279, IIS-2218713 and IIS-2218773"],"award-info":[{"award-number":["CCF-1907661, DMS-2014279, IIS-2218713 and IIS-2218773"]}]},{"name":"ARO under MURI","award":["W911NF-11-1-0304"],"award-info":[{"award-number":["W911NF-11-1-0304"]}]},{"name":"Sloan Research Fellowship","award":["NSF CCF 2002272, NSF IIS 2107304, NSF CIF 2212262"],"award-info":[{"award-number":["NSF CCF 2002272, NSF IIS 2107304, NSF CIF 2212262"]}]},{"name":"ONR Young Investigator Award, and NSF CAREER Award","award":["2144994"],"award-info":[{"award-number":["2144994"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["J. ACM"],"published-print":{"date-parts":[[2025,10,31]]},"abstract":"<jats:p>\n            Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across\n            <jats:italic toggle=\"yes\">k<\/jats:italic>\n            distinct data distributions, has emerged as a unified framework in response to the evolving demand for robustness, fairness, multi-group collaboration, and so on. Achieving data-efficient MDL necessitates adaptive sampling, also called on-demand sampling, throughout the learning process. However, there exist substantial gaps between the state-of-the-art upper and lower bounds on the optimal sample complexity. Focusing on a hypothesis class of Vapnik\u2013Chervonenkis (VC) dimension\n            <jats:italic toggle=\"yes\">d<\/jats:italic>\n            , we propose a novel algorithm that yields an \u025b-optimal randomized hypothesis with a sample complexity on the order of\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\(\\frac{d+k}{\\varepsilon ^2}\\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            (modulo some logarithmic factor), matching the best-known lower bound. Our algorithmic ideas and theory are further extended to accommodate Rademacher classes. The proposed algorithms are oracle-efficient, which access the hypothesis class solely through an empirical risk minimization oracle. Additionally, we establish the necessity of improper learning, revealing a large sample size barrier when only deterministic, proper hypotheses are permitted. These findings resolve three open problems presented in COLT 2023 (i.e., Awasthi et\u00a0al. [\n            <jats:xref ref-type=\"bibr\">4<\/jats:xref>\n            , Problems 1, 3, and 4]).\n          <\/jats:p>","DOI":"10.1145\/3760256","type":"journal-article","created":{"date-parts":[[2025,8,25]],"date-time":"2025-08-25T11:25:44Z","timestamp":1756121144000},"page":"1-71","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Optimal Multi-Distribution Learning"],"prefix":"10.1145","volume":"72","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-2061-6801","authenticated-orcid":false,"given":"Zihan","family":"Zhang","sequence":"first","affiliation":[{"name":"electrical and computer engineering, Princeton University","place":["Princeton, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-7640-0136","authenticated-orcid":false,"given":"Wenhao","family":"Zhan","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Princeton University","place":["Princeton, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9256-5815","authenticated-orcid":false,"given":"Yuxin","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Statistics and Data Science, University of Pennsylvania","place":["Philadelphia, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0056-8299","authenticated-orcid":false,"given":"Simon S.","family":"Du","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington","place":["Seattle, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0064-7800","authenticated-orcid":false,"given":"Jason","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Electrical and Computer Engineering, Princeton University","place":["Princeton, United States"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,10,11]]},"reference":[{"key":"e_1_3_4_2_2","first-page":"53","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Abernethy Jacob","year":"2022","unstructured":"Jacob Abernethy, Pranjal Awasthi, Matth\u00e4us Kleindessner, Jamie Morgenstern, Chris Russell, and Jie Zhang. 2022. Active sampling for min-max fairness. In Proceedings of the 39th International Conference on Machine Learning. 53\u201365."},{"key":"e_1_3_4_3_2","doi-asserted-by":"publisher","DOI":"10.4086\/toc.2012.v008a006"},{"key":"e_1_3_4_4_2","first-page":"10810","volume-title":"Proceedings of the 34th International Conference on Neural Information Processing Systems","author":"Asi Hilal","year":"2021","unstructured":"Hilal Asi, Yair Carmon, Arun Jambulapati, Yujia Jin, and Aaron Sidford. 2021. Stochastic bias-reduced gradient methods. In Proceedings of the 34th International Conference on Neural Information Processing Systems. 10810\u201310822."},{"key":"e_1_3_4_5_2","first-page":"5943","volume-title":"Proceedings of the 36th Annual Conference on Learning Theory","author":"Awasthi Pranjal","year":"2023","unstructured":"Pranjal Awasthi, Nika Haghtalab, and Eric Zhao. 2023. Open problem: The sample complexity of multi-distribution learning for VC classes. In Proceedings of the 36th Annual Conference on Learning Theory. 5943\u20135949."},{"key":"e_1_3_4_6_2","first-page":"1005","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Blum Avrim","year":"2021","unstructured":"Avrim Blum, Nika Haghtalab, Richard Lanas Phillips, and Han Shao. 2021. One for one, or all for all: Equilibria and optimality of collaboration in federated learning. In Proceedings of the 38th International Conference on Machine Learning. 1005\u20131014."},{"key":"e_1_3_4_7_2","volume-title":"Proceedings of the 30th International Conference on Neural Information Processing Systems","author":"Blum Avrim","year":"2017","unstructured":"Avrim Blum, Nika Haghtalab, Ariel D. Procaccia, and Mingda Qiao. 2017. Collaborative PAC learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_8_2","first-page":"6786","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence","author":"Blum Avrim","year":"2021","unstructured":"Avrim Blum, Shelby Heinecke, and Lev Reyzin. 2021. Communication-aware collaborative learning. In Proceedings of the AAAI Conference on Artificial Intelligence. 6786\u20136793."},{"key":"e_1_3_4_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/76359.76371"},{"issue":"1","key":"e_1_3_4_10_2","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1109\/JPROC.2015.2494161","article-title":"Magging: Maximin aggregation for inhomogeneous large-scale data","volume":"104","author":"B\u00fchlmann Peter","year":"2015","unstructured":"Peter B\u00fchlmann and Nicolai Meinshausen. 2015. Magging: Maximin aggregation for inhomogeneous large-scale data. Proc. IEEE 104, 1 (2015), 126\u2013135.","journal-title":"Proc. IEEE"},{"key":"e_1_3_4_11_2","first-page":"35866","volume-title":"Proceedings of the 35th International Conference on Neural Information Processing Systems","author":"Carmon Yair","year":"2022","unstructured":"Yair Carmon and Danielle Hausler. 2022. Distributionally robust optimization via ball oracle acceleration. In Proceedings of the 35th International Conference on Neural Information Processing Systems. 35866\u201335879."},{"key":"e_1_3_4_12_2","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Chen Jiecao","year":"2018","unstructured":"Jiecao Chen, Qin Zhang, and Yuan Zhou. 2018. Tight bounds for collaborative PAC learning via multiplicative weights. In Proceedings of the 31st International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_13_2","first-page":"15111","volume-title":"Proceedings of the 33rd International Conference on Neural Information Processing Systems","author":"Deng Yuyang","year":"2020","unstructured":"Yuyang Deng, Mohammad Mahdi Kamani, and Mehrdad Mahdavi. 2020. Distributionally robust federated averaging. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 15111\u201315122."},{"key":"e_1_3_4_14_2","doi-asserted-by":"publisher","DOI":"10.5555\/3692070.3692488"},{"key":"e_1_3_4_15_2","first-page":"181","volume-title":"Proceedings of the SIAM International Conference on Data Mining","author":"Du Wei","year":"2021","unstructured":"Wei Du, Depeng Xu, Xintao Wu, and Hanghang Tong. 2021. Fairness-aware agnostic federated learning. In Proceedings of the SIAM International Conference on Data Mining. SIAM, 181\u2013189."},{"key":"e_1_3_4_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3402203"},{"key":"e_1_3_4_17_2","first-page":"1095","volume-title":"Proceedings of the ACM SIGACT Symposium on Theory of Computing","author":"Dwork Cynthia","year":"2021","unstructured":"Cynthia Dwork, Michael P. Kim, Omer Reingold, Guy N. Rothblum, and Gal Yona. 2021. Outcome indistinguishability. In Proceedings of the ACM SIGACT Symposium on Theory of Computing. 1095\u20131108."},{"key":"e_1_3_4_18_2","doi-asserted-by":"publisher","DOI":"10.1214\/aop\/1176996452"},{"key":"e_1_3_4_19_2","doi-asserted-by":"publisher","DOI":"10.1006\/jcss.1997.1504"},{"key":"e_1_3_4_20_2","first-page":"1","article-title":"Statistical inference for maximin effects: Identifying stable associations across multiple studies","author":"Guo Zijian","year":"2023","unstructured":"Zijian Guo. 2023. Statistical inference for maximin effects: Identifying stable associations across multiple studies. Journal of the American Statistical Association (2023), 1\u201317.","journal-title":"Journal of the American Statistical Association"},{"key":"e_1_3_4_21_2","first-page":"406","volume-title":"Proceedings of the 35th International Conference on Neural Information Processing Systems","author":"Haghtalab Nika","year":"2022","unstructured":"Nika Haghtalab, Michael Jordan, and Eric Zhao. 2022. On-demand sampling: Learning optimally from multiple distributions. In Proceedings of the 35th International Conference on Neural Information Processing Systems. 406\u2013419."},{"key":"e_1_3_4_22_2","first-page":"72464","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems","author":"Haghtalab Nika","year":"2023","unstructured":"Nika Haghtalab, Michael Jordan, and Eric Zhao. 2023. A unifying perspective on multi-calibration: Game dynamics for multi-objective learning. In Proceedings of the 36th International Conference on Neural Information Processing Systems. 72464\u201372506."},{"key":"e_1_3_4_23_2","first-page":"1929","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Hashimoto Tatsunori","year":"2018","unstructured":"Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. 2018. Fairness without demographics in repeated loss minimization. In Proceedings of the 35th International Conference on Machine Learning. 1929\u20131938."},{"key":"e_1_3_4_24_2","volume-title":"Introduction to Online Convex Optimization","author":"Hazan Elad","year":"2022","unstructured":"Elad Hazan. 2022. Introduction to Online Convex Optimization. MIT Press."},{"key":"e_1_3_4_25_2","first-page":"1939","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"H\u00e9bert-Johnson Ursula","year":"2018","unstructured":"Ursula H\u00e9bert-Johnson, Michael Kim, Omer Reingold, and Guy Rothblum. 2018. Multicalibration: Calibration for the (computationally-identifiable) masses. In Proceedings of the 35th International Conference on Machine Learning. 1939\u20131948."},{"key":"e_1_3_4_26_2","first-page":"2029","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Hu Weihua","year":"2018","unstructured":"Weihua Hu, Gang Niu, Issei Sato, and Masashi Sugiyama. 2018. Does distributionally robust supervised learning give robust classifiers?. In Proceedings of the 35th International Conference on Machine Learning. 2029\u20132037."},{"key":"e_1_3_4_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00465"},{"key":"e_1_3_4_28_2","doi-asserted-by":"publisher","DOI":"10.1017\/9781108571401"},{"key":"e_1_3_4_29_2","doi-asserted-by":"publisher","DOI":"10.5555\/2981780.2981910"},{"key":"e_1_3_4_30_2","doi-asserted-by":"publisher","DOI":"10.5555\/3360093"},{"key":"e_1_3_4_31_2","first-page":"4615","volume-title":"Proceedings of the 36th International Conference on Machine Learning","author":"Mohri Mehryar","year":"2019","unstructured":"Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. 2019. Agnostic federated learning. In Proceedings of the 36th International Conference on Machine Learning. 4615\u20134625."},{"key":"e_1_3_4_32_2","volume-title":"Proceedings of the 31st International Conference on Neural Information Processing Systems","author":"Nguyen Huy","year":"2018","unstructured":"Huy Nguyen and Lydia Zakynthinou. 2018. Improved algorithms for collaborative PAC learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems."},{"key":"e_1_3_4_33_2","first-page":"4185","volume-title":"Proceedings of the 37th Annual Conference on Learning Theory","author":"Peng Binghui","year":"2024","unstructured":"Binghui Peng. 2024. The sample complexity of multi-distribution learning. In Proceedings of the 37th Annual Conference on Learning Theory. PMLR, 4185\u20134204."},{"key":"e_1_3_4_34_2","first-page":"9107","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Rothblum Guy N.","year":"2021","unstructured":"Guy N. Rothblum and Gal Yona. 2021. Multi-group agnostic PAC learnability. In Proceedings of the 38th International Conference on Machine Learning. 9107\u20139115."},{"key":"e_1_3_4_35_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781316779309"},{"key":"e_1_3_4_36_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Sagawa Shiori","year":"2020","unstructured":"Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, and Percy Liang. 2020. Distributionally robust neural networks. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_4_37_2","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525711"},{"key":"e_1_3_4_38_2","doi-asserted-by":"publisher","DOI":"10.1561\/2200000018"},{"key":"e_1_3_4_39_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781107298019"},{"key":"e_1_3_4_40_2","doi-asserted-by":"publisher","DOI":"10.1007\/BF01448847"},{"key":"e_1_3_4_41_2","doi-asserted-by":"publisher","DOI":"10.1145\/1968.1972"},{"key":"e_1_3_4_42_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1994.6.5.851"},{"key":"e_1_3_4_43_2","doi-asserted-by":"publisher","DOI":"10.1017\/9781108627771"},{"key":"e_1_3_4_44_2","unstructured":"Zhenyu Wang Peter B\u00fchlmann and Zijian Guo. 2023. Distributionally robust machine learning with multi-source data. arXiv:2309.02211. Retrieved from https:\/\/arxiv.org\/abs\/2309.02211"},{"key":"e_1_3_4_45_2","unstructured":"Xin Xiong Zijian Guo and Tianxi Cai. 2023. Distributionally robust transfer learning. arXiv:2309.06534. Retrieved from https:\/\/arxiv.org\/abs\/2309.06534"},{"key":"e_1_3_4_46_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Zhang Jingzhao","year":"2020","unstructured":"Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, and Suvrit Sra. 2020. Coping with label shift via distributionally robust optimisation. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_4_47_2","first-page":"3858","volume-title":"Proceedings of the 35th Annual Conference on Learning Theory","author":"Zhang Zihan","year":"2022","unstructured":"Zihan Zhang, Xiangyang Ji, and Simon Du. 2022. Horizon-free reinforcement learning in polynomial time: The power of stationary policies. In Proceedings of the 35th Annual Conference on Learning Theory. 3858\u20133904."},{"key":"e_1_3_4_48_2","doi-asserted-by":"crossref","unstructured":"Sicheng Zhao Bo Li Pengfei Xu and Kurt Keutzer. 2020. Multi-source domain adaptation in the deep learning era: A systematic survey. arXiv:2002.12169. Retrieved from https:\/\/arxiv.org\/abs\/2002.12169","DOI":"10.1007\/978-3-030-45529-3_12"}],"container-title":["Journal of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3760256","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T14:48:13Z","timestamp":1760194093000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3760256"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,11]]},"references-count":47,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,10,31]]}},"alternative-id":["10.1145\/3760256"],"URL":"https:\/\/doi.org\/10.1145\/3760256","relation":{},"ISSN":["0004-5411","1557-735X"],"issn-type":[{"type":"print","value":"0004-5411"},{"type":"electronic","value":"1557-735X"}],"subject":[],"published":{"date-parts":[[2025,10,11]]},"assertion":[{"value":"2024-07-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-27","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-10-11","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}