{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:00:39Z","timestamp":1772906439017,"version":"3.50.1"},"reference-count":55,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2023,12,8]],"date-time":"2023-12-08T00:00:00Z","timestamp":1701993600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"Key R&D in Hubei Province","award":["2023BAB081"],"award-info":[{"award-number":["2023BAB081"]}]},{"name":"Thousand Talents Plan of Jiangxi Province","award":["jxsq2019201124"],"award-info":[{"award-number":["jxsq2019201124"]}]},{"DOI":"10.13039\/501100006374","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62202338,62376110,61977038"],"award-info":[{"award-number":["62202338,62376110,61977038"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2023,12,8]]},"abstract":"<jats:p>This paper proposes a federated, fair, and fast k-means algorithm (F3KM) to solve the fair clustering problem efficiently in scenarios where data cannot be shared among different parties. The proposed algorithm decomposes the fair k-means problem into multiple subproblems and assigns each subproblem to a client for local computation. Our algorithm allows each client to possess multiple sensitive attributes (or have no sensitive attributes). We propose an in-processing method that employs the alternating direction method of multipliers (ADMM) to solve each subproblem. During the procedure of solving subproblems, only the computation results are exchanged between the server and the clients, without exchanging the raw data. Our theoretical analysis shows that F3KM is efficient in terms of both communication and computation complexities. Specifically, it achieves a better trade-off between utility and communication complexity, and reduces the computation complexity to linear with respect to the dataset size. Our experiments show that F3KM achieves a better trade-off between utility and fairness than other methods. Moreover, F3KM is able to cluster five million points in one hour, highlighting its impressive efficiency.<\/jats:p>","DOI":"10.1145\/3626728","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T14:01:21Z","timestamp":1702389681000},"page":"1-25","source":"Crossref","is-referenced-by-count":6,"title":["F3KM: Federated, Fair, and Fast k-means"],"prefix":"10.1145","volume":"1","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-2850-1505","authenticated-orcid":false,"given":"Shengkun","family":"Zhu","sequence":"first","affiliation":[{"name":"Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8989-9662","authenticated-orcid":false,"given":"Quanqing","family":"Xu","sequence":"additional","affiliation":[{"name":"OceanBase, Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1719-3358","authenticated-orcid":false,"given":"Jinshan","family":"Zeng","sequence":"additional","affiliation":[{"name":"Jiangxi Normal University, Nanchang, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5461-4281","authenticated-orcid":false,"given":"Sheng","family":"Wang","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2911-0070","authenticated-orcid":false,"given":"Yuan","family":"Sun","sequence":"additional","affiliation":[{"name":"La Trobe University, Melbourne, Australia"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-9747-965X","authenticated-orcid":false,"given":"Zhifeng","family":"Yang","sequence":"additional","affiliation":[{"name":"OceanBase, Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-3530-6476","authenticated-orcid":false,"given":"Chuanhui","family":"Yang","sequence":"additional","affiliation":[{"name":"OceanBase, Ant Group, Hangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1062-0970","authenticated-orcid":false,"given":"Zhiyong","family":"Peng","sequence":"additional","affiliation":[{"name":"Wuhan University, Wuhan, China"}]}],"member":"320","published-online":{"date-parts":[[2023,12,12]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"2015. Open University Learning Analytics dataset. https:\/\/analyse.kmi.open.ac.uk\/open_dataset."},{"key":"e_1_2_2_2_1","unstructured":"2017. The Home Mortgage Disclosure Act. https:\/\/ffiec.cfpb.gov\/data-browser\/."},{"key":"e_1_2_2_3_1","unstructured":"2022. Utrecht Fairness Recruitment dataset. https:\/\/www.kaggle.com\/datasets\/ictinstitute\/utrecht-fairness-recruitment-dataset."},{"key":"e_1_2_2_4_1","doi-asserted-by":"crossref","unstructured":"Sara Ahmadian Alessandro Epasto Ravi Kumar and Mohammad Mahdian. 2019. Clustering without over-representation. In KDD. 267--275.","DOI":"10.1145\/3292500.3330987"},{"key":"e_1_2_2_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/304181.304187"},{"key":"e_1_2_2_6_1","doi-asserted-by":"crossref","unstructured":"Abolfazl Asudeh HV Jagadish Julia Stoyanovich and Gautam Das. 2019. Designing fair ranking schemes. In SIGMOD. 1259--1276.","DOI":"10.1145\/3299869.3300079"},{"key":"e_1_2_2_7_1","unstructured":"Tsanas Athanasios and Little Max. 2009. UCI Machine Learning Repository: Gender Gap in Spanish WP Data Set. https:\/\/archive.ics.uci.edu\/ml\/datasets\/ParkinsonsTelemonitoring\/"},{"key":"e_1_2_2_8_1","doi-asserted-by":"crossref","unstructured":"Olivier Bachem Mario Lucic and Andreas Krause. 2018. Scalable k-means clustering via lightweight coresets. In KDD. 1119--1127.","DOI":"10.1145\/3219819.3219973"},{"key":"e_1_2_2_9_1","unstructured":"Arturs Backurs Piotr Indyk Krzysztof Onak Baruch Schieber Ali Vakilian and Tal Wagner. 2019. Scalable fair clustering. In ICML. 405--413."},{"key":"e_1_2_2_10_1","unstructured":"Maria-Florina F Balcan Steven Ehrlich and Yingyu Liang. 2013. Distributed k-means and k-median clustering on general topologies. In NeurIPS. 1995--2003."},{"key":"e_1_2_2_11_1","unstructured":"Suman Bera Deeparnab Chakrabarty Nicolas Flores and Maryam Negahbani. 2019. Fair algorithms for clustering. In NeurIPS. 4955--4966."},{"key":"e_1_2_2_12_1","doi-asserted-by":"crossref","unstructured":"Suman Bera Syamantak Das Sainyam Galhotra and Sagar Sudhir Kale. 2022. Fair k-Center Clustering in MapReduce and Streaming Settings. In WWW. 1414--1422.","DOI":"10.1145\/3485447.3512188"},{"key":"e_1_2_2_13_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000016"},{"key":"e_1_2_2_14_1","doi-asserted-by":"crossref","unstructured":"Badrish Chandramouli Junyi Xie and Jun Yang. 2006. On the database\/network interface in large-scale publish\/subscribe systems. In SIGMOD. 587--598.","DOI":"10.1145\/1142473.1142539"},{"key":"e_1_2_2_15_1","unstructured":"Flavio Chierichetti Ravi Kumar Silvio Lattanzi and Sergei Vassilvitskii. 2017. Fair clustering through fairlets. In NeurIPS. 5029--5037."},{"key":"e_1_2_2_16_1","volume-title":"Yin Tat Lee, and Zhao Song","author":"Cohen Michael B","year":"2019","unstructured":"Michael B Cohen, Yin Tat Lee, and Zhao Song. 2019. Solving linear programs in the current matrix multiplication time. In STOC. 938--942."},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jalgor.2003.12.001"},{"key":"e_1_2_2_18_1","unstructured":"Don Kurian Dennis Tian Li and Virginia Smith. 2021. Heterogeneity for the win: One-shot federated clustering. In ICML. 2611--2620."},{"key":"e_1_2_2_19_1","unstructured":"Hu Ding Yu Liu Lingxiao Huang and Jian Li. 2016. k-means clustering with distributed dimensions. In ICML. 1339--1348."},{"key":"e_1_2_2_20_1","volume-title":"Vertica-ml: Distributed machine learning in vertica database. In SIGMOD. 755--768.","author":"Fard Arash","year":"2020","unstructured":"Arash Fard, Anh Le, George Larionov, Waqas Dhillon, and Chuck Bear. 2020. Vertica-ml: Distributed machine learning in vertica database. In SIGMOD. 755--768."},{"key":"e_1_2_2_21_1","doi-asserted-by":"crossref","unstructured":"Tom Farrand Fatemehsadat Mireshghallah Sahib Singh and Andrew Trask. 2020. Neither private nor fair: Impact of data imbalance on utility and fairness in differential privacy. In CCS. 15--19.","DOI":"10.1145\/3411501.3419419"},{"key":"e_1_2_2_22_1","doi-asserted-by":"crossref","unstructured":"Michael Feldman Sorelle A Friedler John Moeller Carlos Scheidegger and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In KDD. 259--268.","DOI":"10.1145\/2783258.2783311"},{"key":"e_1_2_2_23_1","volume-title":"Blindfl: Vertical federated machine learning without peeking into your data. In SIGMOD. 1316--1330.","author":"Fu Fangcheng","year":"2022","unstructured":"Fangcheng Fu, Huanran Xue, Yong Cheng, Yangyu Tao, and Bin Cui. 2022. Blindfl: Vertical federated machine learning without peeking into your data. In SIGMOD. 1316--1330."},{"key":"e_1_2_2_24_1","doi-asserted-by":"crossref","unstructured":"Mehrdad Ghadiri Samira Samadi and Santosh Vempala. 2021. Socially fair k-means clustering. In FAccT. 438--448.","DOI":"10.1145\/3442188.3445906"},{"key":"e_1_2_2_25_1","unstructured":"Avishek Ghosh Jichan Chung Dong Yin and Kannan Ramchandran. 2020. An Efficient Framework for Clustered Federated Learning. In NeurIPS. 19586--19597."},{"key":"e_1_2_2_26_1","unstructured":"Mehmet G\u00f6nen and Adam A Margolin. 2014. Localized data fusion for kernel k-means clustering with application to cancer biology. In NeurIPS. 1305--1313."},{"key":"e_1_2_2_27_1","volume-title":"KFC: A Scalable Approximation Algorithm for k-center Fair Clustering. In NeurIPS. 14509--14519.","author":"Harb Elfarouk","year":"2020","unstructured":"Elfarouk Harb and Ho Shan Lam. 2020. KFC: A Scalable Approximation Algorithm for k-center Fair Clustering. In NeurIPS. 14509--14519."},{"key":"e_1_2_2_28_1","doi-asserted-by":"crossref","unstructured":"Xi He Ashwin Machanavajjhala and Bolin Ding. 2014. Blowfish privacy: Tuning privacy-utility trade-offs using policies. In SIGMOD. 1447--1458.","DOI":"10.1145\/2588555.2588581"},{"key":"e_1_2_2_29_1","doi-asserted-by":"crossref","unstructured":"Zhicheng He Wei Xia Kai Dong Huifeng Guo Ruiming Tang Dingyin Xia and Rui Zhang. 2022. Unsupervised Learning Style Classification for Learning Path Generation in Online Education Platforms. In KDD. 2997--3006.","DOI":"10.1145\/3534678.3539107"},{"key":"e_1_2_2_30_1","unstructured":"Lingxiao Huang Shaofeng Jiang and Nisheeth Vishnoi. 2019. Coresets for clustering with fairness constraints. In NeurIPS. 7587--7598."},{"key":"e_1_2_2_31_1","first-page":"29566","article-title":"Coresets for Vertical Federated Learning: Regularized Linear Regression and k-Means Clustering","volume":"35","author":"Huang Lingxiao","year":"2022","unstructured":"Lingxiao Huang, Zhize Li, Jialin Sun, and Haoyu Zhao. 2022. Coresets for Vertical Federated Learning: Regularized Linear Regression and k-Means Clustering. In NeurIPS, Vol. 35. 29566--29581.","journal-title":"NeurIPS"},{"key":"e_1_2_2_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3477495.3531724"},{"key":"e_1_2_2_33_1","doi-asserted-by":"crossref","unstructured":"Maliha Tashfia Islam Anna Fariha Alexandra Meliou and Babak Salimi. 2022. Through the data management lens: Experimental analysis and evaluation of fair classification. In SIGMOD. 232--246.","DOI":"10.1145\/3514221.3517841"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1561\/2200000083"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2020.2975749"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.14778\/3583140.3583146"},{"key":"e_1_2_2_37_1","doi-asserted-by":"crossref","unstructured":"Edo Liberty Zohar Karnin Bing Xiang Laurence Rouesnel Baris Coskun Ramesh Nallapati Julio Delgado Amir Sadoughi Yury Astashonok Piali Das et al. 2020. Elastic machine learning algorithms in amazon sagemaker. In SIGMOD. 731--737.","DOI":"10.1145\/3318464.3386126"},{"key":"e_1_2_2_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1982.1056489"},{"key":"e_1_2_2_39_1","unstructured":"Brendan McMahan Eider Moore Daniel Ramage Seth Hampson and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In AISTATS. PMLR 1273--1282."},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3457607"},{"key":"e_1_2_2_41_1","first-page":"2371","article-title":"Coordinate Descent Method for k-means","volume":"44","author":"Nie Feiping","year":"2021","unstructured":"Feiping Nie, Jingjing Xue, Danyang Wu, Rong Wang, Hui Li, and Xuelong Li. 2021. Coordinate Descent Method for k-means. IEEE Trans. Pattern Anal. Mach. Intell. 44, 5 (2021), 2371--2385.","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"e_1_2_2_42_1","doi-asserted-by":"crossref","unstructured":"John Paparrizos and Luis Gravano. 2015. k-shape: Efficient and accurate clustering of time series. In SIGMOD. 1855--1870.","DOI":"10.1145\/2723372.2737793"},{"key":"e_1_2_2_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824115"},{"key":"e_1_2_2_44_1","first-page":"4424","article-title":"Federated multi-task learning","volume":"30","author":"Smith Virginia","year":"2017","unstructured":"Virginia Smith, Chao-Kai Chiang, Maziar Sanjabi, and Ameet S Talwalkar. 2017. Federated multi-task learning. In NeurIPS, Vol. 30. 4424--4434.","journal-title":"NeurIPS"},{"key":"e_1_2_2_45_1","volume-title":"Metacare: Meta-learning with hierarchical subtyping for cold-start diagnosis prediction in healthcare data. In SIGIR. 449--459.","author":"Tan Yanchao","year":"2022","unstructured":"Yanchao Tan, Carl Yang, Xiangyu Wei, Chaochao Chen, Weiming Liu, Longfei Li, Jun Zhou, and Xiaolin Zheng. 2022. Metacare: Meta-learning with hierarchical subtyping for cold-start diagnosis prediction in healthcare data. In SIGIR. 449--459."},{"key":"e_1_2_2_46_1","doi-asserted-by":"crossref","unstructured":"Suhas Thejaswi Ameet Gadekar Bruno Ordozgoiti and Michal Osadnik. 2022. Clustering with Fair-Center Representation: Parameterized Approximation Algorithms and Heuristics. In KDD. 1749--1759.","DOI":"10.1145\/3534678.3539487"},{"key":"e_1_2_2_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425887"},{"key":"e_1_2_2_48_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10915-018-0757-z"},{"key":"e_1_2_2_49_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-007-0114-2"},{"key":"e_1_2_2_50_1","unstructured":"Yi Xu Mingrui Liu Qihang Lin and Tianbao Yang. 2017. ADMM without a fixed penalty parameter: Faster convergence with new adaptive penalization. In NeurIPS. 1267--1277."},{"key":"e_1_2_2_51_1","doi-asserted-by":"publisher","DOI":"10.1145\/3339474"},{"key":"e_1_2_2_52_1","doi-asserted-by":"publisher","DOI":"10.14778\/3554821.3554830"},{"key":"e_1_2_2_53_1","volume-title":"Shaobo Lin, and Yuan Yao.","author":"Zeng Jinshan","year":"2019","unstructured":"Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, and Yuan Yao. 2019. Global convergence of block coordinate descent in deep learning. In ICML. 7313--7323."},{"key":"e_1_2_2_54_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589267"},{"key":"e_1_2_2_55_1","doi-asserted-by":"publisher","DOI":"10.14778\/1687627.1687709"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626728","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3626728","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,22]],"date-time":"2025-08-22T13:01:20Z","timestamp":1755867680000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3626728"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,12,8]]},"references-count":55,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2023,12,8]]}},"alternative-id":["10.1145\/3626728"],"URL":"https:\/\/doi.org\/10.1145\/3626728","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,12,8]]}}}