{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T14:01:01Z","timestamp":1777644061831,"version":"3.51.4"},"reference-count":112,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T00:00:00Z","timestamp":1750636800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T00:00:00Z","timestamp":1750636800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007637","name":"German University in Cairo","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007637","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The accuracy, generalization, and reliability of machine learning models are greatly affected by the quality of the training data. High-quality data, characterized by completeness, consistency, accuracy, representativeness, and homogeneity, enables meaningful pattern learning and robust prediction. In federated learning (FL), the learning process is collaborative and conducted across decentralized and locally private data nodes. The heterogeneity of data across these nodes degrades model performance and may lead to overfitting, underfitting, and erroneous decision-making. Heterogeneity is caused by inconsistent labeling, missing values, and class imbalances across these nodes. Proper data preparation, including cleaning, normalization, and augmentation, is essential to mitigate these issues and ensure that these distributed datasets reflect the problem domain accurately. 
The raw data, which is generated from diverse sources under the fundamental constraint that it cannot be shared among learning nodes, exacerbates these challenges. Although data preparation has received great interest in recent years, little attention has been given to the data challenges that arise when FL is used. Although some surveys mention FL challenges, they discuss them only superficially. These papers predominantly focus on one aspect of data challenges, such as quality, homogeneity, or balance, and discuss FL only within the context of that specific challenge. No recent survey examines all data-related challenges in FL, including their interdependencies and interactions. To address these limitations, the main contribution of this paper is a comprehensive overview of data challenges in FL, encompassing data heterogeneity, skewness, representation, quality, bias, and fairness. The paper begins by identifying the data challenges highlighted in the existing literature, with a particular focus on the interrelationships among these challenges, which are categorized into two main groups: non-independent and identically distributed (Non-IID) data issues and data quality issues. Subsequently, the paper reviews and compares recognized solution approaches to these data challenges and explores additional data preparation techniques that could serve as candidate solutions. 
The paper aims to define the necessary work to optimize the effectiveness of these techniques with respect to distributed and isolated data in FL.<\/jats:p>","DOI":"10.1186\/s40537-025-01195-6","type":"journal-article","created":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T17:06:36Z","timestamp":1750698396000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["Comprehensive review of federated learning challenges: a data preparation viewpoint"],"prefix":"10.1186","volume":"12","author":[{"given":"Nawraz","family":"Saeed","sequence":"first","affiliation":[]},{"given":"Mohamed","family":"Ashour","sequence":"additional","affiliation":[]},{"given":"Maggie","family":"Mashaly","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"key":"1195_CR1","volume-title":"Data preparation for data mining","author":"D Pyle","year":"1999","unstructured":"Pyle D. Data preparation for data mining. San Francisco, CA: Morgan Kaufmann; 1999."},{"issue":"5","key":"1195_CR2","first-page":"4646","volume":"35","author":"C Chai","year":"2022","unstructured":"Chai C, Wang J, Luo Y, Niu Z, Li G. Data management for machine learning: a survey. IEEE Trans Knowl Data Eng. 2022;35(5):4646\u201367.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"1195_CR3","doi-asserted-by":"crossref","unstructured":"Li P, Rao X, Blase J, Zhang Y, Chu X, Zhang C. Cleanml: A study for evaluating the impact of data cleaning on ml classification tasks. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE; 2021, p. 13\u201324.","DOI":"10.1109\/ICDE51399.2021.00009"},{"issue":"12","key":"1195_CR4","doi-asserted-by":"publisher","first-page":"3429","DOI":"10.14778\/3415478.3415562","volume":"13","author":"SE Whang","year":"2020","unstructured":"Whang SE, Lee J-G. Data collection and quality challenges for deep learning. Proc VLDB Endow. 
2020;13(12):3429\u201332.","journal-title":"Proc VLDB Endow"},{"key":"1195_CR5","doi-asserted-by":"crossref","unstructured":"Fredriksson T, Mattos DI, Bosch J, Olsson HH. Data labeling: An empirical investigation into industrial challenges and mitigation strategies. In: International Conference on Product-Focused Software Process Improvement. Cham: Springer; 2020. p. 202\u2013216.","DOI":"10.1007\/978-3-030-64148-1_13"},{"key":"1195_CR6","doi-asserted-by":"crossref","unstructured":"Murray DG, Simsa J, Klimovic A, Indyk I. tf. data: a machine learning data processing framework; 2021. arXiv preprint arXiv:2101.12127.","DOI":"10.14778\/3476311.3476374"},{"issue":"2","key":"1195_CR7","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3377454","volume":"53","author":"J Verbraeken","year":"2020","unstructured":"Verbraeken J, Wolting M, Katzy J, Kloppenburg J, Verbelen T, Rellermeyer JS. A survey on distributed machine learning. ACM Comput Surv 2020;53(2):1\u201333.","journal-title":"ACM Comput Surv"},{"issue":"2","key":"1195_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3377454","volume":"53","author":"J Verbraeken","year":"2020","unstructured":"Verbraeken J, Wolting M, Katzy J, Kloppenburg J, Verbelen T, Rellermeyer JS. A survey on distributed machine learning. ACM Comput Surv 2020;53(2):1\u201333.","journal-title":"ACM Comput Surv"},{"issue":"5","key":"1195_CR9","doi-asserted-by":"publisher","first-page":"637","DOI":"10.1109\/JIOT.2016.2579198","volume":"3","author":"W Shi","year":"2016","unstructured":"Shi W, Cao J, Zhang Q, Li Y, Xu L. Edge computing: vision and challenges. IEEE Internet Things J. 2016;3(5):637\u201346.","journal-title":"IEEE Internet Things J"},{"issue":"4","key":"1195_CR10","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1007\/s10115-022-01664-x","volume":"64","author":"J Liu","year":"2022","unstructured":"Liu J, Huang J, Zhou Y, Li X, Ji S, Xiong H, Dou D. 
From distributed machine learning to federated learning: a survey. Knowl Inf Syst. 2022;64(4):885\u2013917.","journal-title":"Knowl Inf Syst"},{"key":"1195_CR11","doi-asserted-by":"crossref","unstructured":"Shvachko K, Kuang H, Radia S, Chansler R. The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST); 2010, p. 1\u201310. IEEE.","DOI":"10.1109\/MSST.2010.5496972"},{"key":"1195_CR12","unstructured":"McMahan B, Moore E, Ramage D, Hampson S, Arcas BA. Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR; 2017, p. 1273\u20131282."},{"key":"1195_CR13","unstructured":"Liu Y, Zhang L, Ge N, Li G. A systematic literature review on federated learning: From a model quality perspective; 2020. arXiv preprint arXiv:2012.01973."},{"key":"1195_CR14","unstructured":"Zhao Y, Li M, Lai L, Suda N, Civin D, Chandra V. Federated learning with non-iid data; 2018. arXiv preprint arXiv:1806.00582."},{"key":"1195_CR15","unstructured":"Nguyen J, Wang J, Malik K, Sanjabi M, Rabbat M. Where to begin? on the impact of pre-training and initialization in federated learning; 2022. arXiv preprint arXiv:2206.15387."},{"key":"1195_CR16","doi-asserted-by":"crossref","unstructured":"Ruan Y, Joe-Wong C. Fedsoft: Soft clustered federated learning with proximal local updating. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2022;36:8124\u20138131.","DOI":"10.1609\/aaai.v36i7.20785"},{"issue":"8","key":"1195_CR17","doi-asserted-by":"publisher","first-page":"2818","DOI":"10.1109\/TMC.2020.3045266","volume":"21","author":"Q Wu","year":"2020","unstructured":"Wu Q, Chen X, Zhou Z, Zhang J. Fedhome: Cloud-edge based personalized federated learning for in-home health monitoring. IEEE Trans Mob Comput. 
2020;21(8):2818\u201332.","journal-title":"IEEE Trans Mob Comput"},{"key":"1195_CR18","doi-asserted-by":"publisher","DOI":"10.1016\/j.comnet.2022.108820","volume":"207","author":"O Nassef","year":"2022","unstructured":"Nassef O, Sun W, Purmehdi H, Tatipamula M, Mahmoodi T. A survey: distributed machine learning for 5g and beyond. Comput Netw. 2022;207: 108820.","journal-title":"Comput Netw"},{"issue":"1\u20132","key":"1195_CR19","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1561\/2200000083","volume":"14","author":"P Kairouz","year":"2021","unstructured":"Kairouz P, McMahan HB, Avent B, Bellet A, Bennis M, Bhagoji AN, Bonawitz K, Charles Z, Cormode G, Cummings R, et al. Advances and open problems in federated learning. Foundations and trends\u00ae in machine learning. 2021;14(1\u20132):1\u2013210.","journal-title":"Foundations and trends\u00ae in machine learning"},{"issue":"2","key":"1195_CR20","doi-asserted-by":"publisher","first-page":"513","DOI":"10.1007\/s13042-022-01647-y","volume":"14","author":"J Wen","year":"2023","unstructured":"Wen J, Zhang Z, Lan Y, Cui Z, Cai J, Zhang W. A survey on federated learning: challenges and applications. Int J Mach Learn Cybern. 2023;14(2):513\u201335.","journal-title":"Int J Mach Learn Cybern"},{"issue":"3","key":"1195_CR21","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3625558","volume":"56","author":"M Ye","year":"2023","unstructured":"Ye M, Fang X, Du B, Yuen PC, Tao D. Heterogeneous federated learning: state-of-the-art and research challenges. ACM Comput Surv. 2023;56(3):1\u201344.","journal-title":"ACM Comput Surv"},{"issue":"8","key":"1195_CR22","doi-asserted-by":"publisher","first-page":"3710","DOI":"10.1109\/TNNLS.2020.3015958","volume":"32","author":"F Sattler","year":"2020","unstructured":"Sattler F, M\u00fcller K-R, Samek W. Clustered federated learning: model-agnostic distributed multitask optimization under privacy constraints. 
IEEE Trans Neural Netw Learn Syst 2020;32(8):3710\u201322.","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"3","key":"1195_CR23","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1109\/MSP.2020.2975749","volume":"37","author":"T Li","year":"2020","unstructured":"Li T, Sahu AK, Talwalkar A, Smith V. Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag. 2020;37(3):50\u201360.","journal-title":"IEEE Signal Process Mag"},{"key":"1195_CR24","doi-asserted-by":"publisher","first-page":"371","DOI":"10.1016\/j.neucom.2021.07.098","volume":"465","author":"H Zhu","year":"2021","unstructured":"Zhu H, Xu J, Liu S, Jin Y. Federated learning on non-iid data: a survey. Neurocomputing. 2021;465:371\u201390.","journal-title":"Neurocomputing"},{"issue":"Suppl 2","key":"1195_CR25","doi-asserted-by":"publisher","first-page":"1773","DOI":"10.1007\/s10462-023-10563-8","volume":"56","author":"Y Shanmugarasa","year":"2023","unstructured":"Shanmugarasa Y, Paik H-Y, Kanhere SS, Zhu L. A systematic review of federated learning from clients\u2019 perspective: challenges and solutions. Artif Intell Rev. 2023;56(Suppl 2):1773\u2013827.","journal-title":"Artif Intell Rev"},{"key":"1195_CR26","unstructured":"Rafi TH, Noor FA, Hussain T, Chae D-K, Yang Z. A generalized look at federated learning: Survey and perspectives; 2023. arXiv preprint arXiv:2303.14787."},{"key":"1195_CR27","doi-asserted-by":"publisher","first-page":"244","DOI":"10.1016\/j.future.2022.05.003","volume":"135","author":"X Ma","year":"2022","unstructured":"Ma X, Zhu J, Lin Z, Chen S, Qin Y. A state-of-the-art survey on solving non-iid data in federated learning. Futur Gener Comput Syst. 2022;135:244\u201358.","journal-title":"Futur Gener Comput Syst"},{"key":"1195_CR28","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.102198","volume":"105","author":"TH Rafi","year":"2024","unstructured":"Rafi TH, Noor FA, Hussain T, Chae D-K. 
Fairness and privacy preserving in federated learning: a survey. Inf Fusion. 2024;105: 102198.","journal-title":"Inf Fusion"},{"key":"1195_CR29","doi-asserted-by":"crossref","unstructured":"Shi Y, Yu H, Leung C. Towards fairness-aware federated learning. IEEE Trans Neural Netw Learning Syst. 2023; 35(9):11922\u201338.","DOI":"10.1109\/TNNLS.2023.3263594"},{"key":"1195_CR30","doi-asserted-by":"crossref","unstructured":"Liu B, Lv N, Guo Y, Li Y. Recent advances on federated learning: A systematic survey; 2023. arXiv preprint arXiv:2301.01299.","DOI":"10.2139\/ssrn.4410417"},{"key":"1195_CR31","doi-asserted-by":"crossref","unstructured":"Abdelmoniem AM, Ho C-Y, Papageorgiou P, Canini M. A comprehensive empirical study of heterogeneity in federated learning. IEEE Internet of Things J. 2023;10(16):14071\u201383.","DOI":"10.1109\/JIOT.2023.3250275"},{"key":"1195_CR32","doi-asserted-by":"crossref","unstructured":"Vucinich S, Zhu Q. The current state and challenges of fairness in federated learning. IEEE Access. 2023; 11:80903\u201314.","DOI":"10.1109\/ACCESS.2023.3295412"},{"issue":"3","key":"1195_CR33","doi-asserted-by":"publisher","first-page":"711","DOI":"10.1093\/comjnl\/bxab192","volume":"66","author":"Y Tian","year":"2021","unstructured":"Tian Y, Zhang W, Simpson A, Liu Y, Jiang ZL. Defending against data poisoning attacks: from distributed learning to federated learning. Comput J. 2021;66(3):711\u201326.","journal-title":"Comput J"},{"issue":"1","key":"1195_CR34","doi-asserted-by":"publisher","first-page":"437","DOI":"10.1109\/TDSC.2021.3135422","volume":"20","author":"X Li","year":"2021","unstructured":"Li X, Qu Z, Zhao S, Tang B, Lu Z, Liu Y. Lomar: A local defense against poisoning attack on federated learning. IEEE Trans Dependable Secure Comput. 2021;20(1):437\u201350.","journal-title":"IEEE Trans Dependable Secure Comput"},{"key":"1195_CR35","unstructured":"Corinzia L, Beuret A, Buhmann JM. Variational federated multi-task learning; 2019. 
arXiv preprint arXiv:1906.06268."},{"key":"1195_CR36","unstructured":"Khodak M, Balcan M-FF, Talwalkar AS. Adaptive gradient-based meta-learning methods. Adv Neural Inf Process Syst. 2019;32:434."},{"key":"1195_CR37","unstructured":"Eichner H, Koren T, McMahan B, Srebro N, Talwar K. Semi-cyclic stochastic gradient descent. In: International Conference on Machine Learning. PMLR; 2019. p. 1764\u20131773."},{"key":"1195_CR38","unstructured":"Mohri M, Sivek G, Suresh AT. Agnostic federated learning. In: International Conference on Machine Learning. PMLR; 2019. pp. 4615\u20134625."},{"key":"1195_CR39","unstructured":"Li T, Sanjabi M, Beirami A, Smith V. Fair resource allocation in federated learning; 2019. arXiv preprint arXiv:1905.10497"},{"issue":"2","key":"1195_CR40","doi-asserted-by":"publisher","first-page":"394","DOI":"10.1109\/TPDS.2020.3023905","volume":"32","author":"C Wang","year":"2020","unstructured":"Wang C, Yang Y, Zhou P. Towards efficient scheduling of federated mobile devices under computational and statistical heterogeneity. IEEE Trans Parallel Distrib Syst. 2020;32(2):394\u2013410.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"issue":"1","key":"1195_CR41","doi-asserted-by":"publisher","first-page":"408","DOI":"10.1109\/TCCN.2021.3100574","volume":"8","author":"A Ta\u00efk","year":"2021","unstructured":"Ta\u00efk A, Mlika Z, Cherkaoui S. Data-aware device scheduling for federated edge learning. IEEE Trans Cogn Commun Netw. 2021;8(1):408\u201321.","journal-title":"IEEE Trans Cogn Commun Netw"},{"issue":"13","key":"1195_CR42","doi-asserted-by":"publisher","first-page":"10355","DOI":"10.1007\/s00521-021-06861-3","volume":"34","author":"W Li","year":"2022","unstructured":"Li W, Wang S. Federated meta-learning for spatial-temporal prediction. Neural Comput Appl. 
2022;34(13):10355\u201374.","journal-title":"Neural Comput Appl"},{"key":"1195_CR43","doi-asserted-by":"publisher","first-page":"3607","DOI":"10.1007\/s13042-021-01410-9","volume":"12","author":"K Hu","year":"2021","unstructured":"Hu K, Wu J, Weng L, Zhang Y, Zheng F, Pang Z, Xia M. A novel federated learning approach based on the confidence of federated kalman filters. Int J Mach Learn Cybern. 2021;12:3607\u201327.","journal-title":"Int J Mach Learn Cybern"},{"key":"1195_CR44","doi-asserted-by":"crossref","unstructured":"Luo B, Xiao W, Wang S, Huang J, Tassiulas L. Tackling system and statistical heterogeneity for federated learning with adaptive client sampling. In: IEEE INFOCOM 2022-IEEE Conference on Computer Communications. IEEE; 2022. p. 1739\u20131748.","DOI":"10.1109\/INFOCOM48880.2022.9796935"},{"key":"1195_CR45","doi-asserted-by":"crossref","unstructured":"Lyu L, Xu X, Wang Q, Yu H. Collaborative fairness in federated learning. Federated Learning: Privacy and Incentive; 2020. p. 189\u2013204.","DOI":"10.1007\/978-3-030-63076-8_14"},{"key":"1195_CR46","doi-asserted-by":"crossref","unstructured":"Zhang DY, Kou Z, Wang D. Fairfl: A fair federated learning approach to reducing demographic bias in privacy-sensitive classification models. In: 2020 IEEE International Conference on Big Data (Big Data). IEEE; 2020. p. 1051\u20131060.","DOI":"10.1109\/BigData50022.2020.9378043"},{"key":"1195_CR47","doi-asserted-by":"crossref","unstructured":"Lyu L, Xu X, Wang Q, Yu H. Collaborative fairness in federated learning. Federated Learning: Privacy and Incentive, 2020; p. 189\u2013204.","DOI":"10.1007\/978-3-030-63076-8_14"},{"key":"1195_CR48","first-page":"15434","volume":"34","author":"O Marfoq","year":"2021","unstructured":"Marfoq O, Neglia G, Bellet A, Kameni L, Vidal R. Federated multi-task learning under a mixture of distributions. Adv Neural Inf Process Syst. 
2021;34:15434\u201347.","journal-title":"Adv Neural Inf Process Syst"},{"key":"1195_CR49","unstructured":"Acar DAE, Zhao Y, Zhu R, Matas R, Mattina M, Whatmough P, Saligrama V. Debiasing model updates for improving personalized federated training. In: International Conference on Machine Learning. PMLR; 2021. pp. 21\u201331."},{"key":"1195_CR50","unstructured":"Jiang Y, Kone\u010dn\u1ef3 J, Rush K, Kannan S. Improving federated learning personalization via model agnostic meta learning; 2019. arXiv preprint arXiv:1909.12488."},{"key":"1195_CR51","unstructured":"Fallah A, Mokhtari A, Ozdaglar A. Personalized federated learning: A meta-learning approach; 2020. arXiv preprint arXiv:2002.07948"},{"key":"1195_CR52","unstructured":"Wang K, Mathews R, Kiddon C, Eichner H, Beaufays F, Ramage D. Federated evaluation of on-device personalization; 2019. arXiv preprint arXiv:1910.10252"},{"key":"1195_CR53","unstructured":"Yu T, Bagdasaryan E, Shmatikov V. Salvaging federated learning by local adaptation; 2020. arXiv preprint arXiv:2002.04758"},{"key":"1195_CR54","unstructured":"Peng X, Huang Z, Zhu Y, Saenko K. Federated adversarial domain adaptation; 2019. arXiv preprint arXiv:1911.02054."},{"key":"1195_CR55","first-page":"3622","volume":"34","author":"K Ozkara","year":"2021","unstructured":"Ozkara K, Singh N, Data D, Diggavi S. Quped: Quantized personalization via distillation with applications to federated learning. Adv Neural Inf Process Syst. 2021;34:3622\u201334.","journal-title":"Adv Neural Inf Process Syst"},{"key":"1195_CR56","first-page":"19586","volume":"33","author":"A Ghosh","year":"2020","unstructured":"Ghosh A, Chung J, Yin D, Ramchandran K. An efficient framework for clustered federated learning. Adv Neural Inf Process Syst. 2020;33:19586\u201397.","journal-title":"Adv Neural Inf Process Syst"},{"key":"1195_CR57","doi-asserted-by":"crossref","unstructured":"Liu B, Guo Y, Chen X. Pfa: Privacy-preserving federated adaptation for effective model personalization. 
In: Proceedings of the Web Conference 2021; 2021. p. 923\u2013934.","DOI":"10.1145\/3442381.3449847"},{"issue":"4","key":"1195_CR58","first-page":"1","volume":"5","author":"B Liu","year":"2021","unstructured":"Liu B, Cai Y, Zhang Z, Li Y, Wang L, Li D, Guo Y, Chen X. Distfl: Distribution-aware federated learning for mobile scenarios. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2021;5(4):1\u201326.","journal-title":"Proc ACM Interact Mob Wearable Ubiquitous Technol"},{"key":"1195_CR59","doi-asserted-by":"crossref","unstructured":"Yang M, Wang X, Zhu H, Wang H, Qian H. Federated learning with class imbalance reduction. In: 2021 29th European Signal Processing Conference (EUSIPCO). IEEE; 2021. p. 2174\u20132178.","DOI":"10.23919\/EUSIPCO54536.2021.9616052"},{"key":"1195_CR60","unstructured":"Jeong E, Oh S, Kim H, Park J, Bennis M, Kim S-L. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data; 2018. arXiv preprint arXiv:1811.11479."},{"key":"1195_CR61","unstructured":"Shin M, Hwang C, Kim J, Park J, Bennis M, Kim S-L. Xor mixup: Privacy-preserving data augmentation for one-shot federated learning; 2020. arXiv preprint arXiv:2006.05148"},{"issue":"1","key":"1195_CR62","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1109\/TPDS.2020.3009406","volume":"32","author":"M Duan","year":"2020","unstructured":"Duan M, Liu D, Chen X, Liu R, Tan Y, Liang L. Self-balancing federated learning with global imbalanced data in mobile systems. IEEE Trans Parallel Distrib Syst. 2020;32(1):59\u201371.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"1195_CR63","doi-asserted-by":"crossref","unstructured":"Zhang DY, Kou Z, Wang D. Fedsens: A federated learning approach for smart health sensing with class imbalance in resource constrained edge computing. In: IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE; 2021. p. 
1\u201310.","DOI":"10.1109\/INFOCOM42981.2021.9488776"},{"key":"1195_CR64","doi-asserted-by":"crossref","unstructured":"Martinez I, Francis S, Hafid AS. Record and reward federated learning contributions with blockchain. In: 2019 International Conference on Cyber-enabled Distributed Computing and Knowledge Discovery (CyberC); IEEE; 2019. p. 50\u201357.","DOI":"10.1109\/CyberC.2019.00018"},{"key":"1195_CR65","doi-asserted-by":"crossref","unstructured":"Moon J, Kum S, Kim Y, Stankovski V, Pa\u0161\u010dinski U, Kochovski P. A decentralized ai data management system in federated learning. In: 2020 International Conference on Intelligent Systems and Computer Vision (ISCV). IEEE; 2020. p. 1\u20134.","DOI":"10.1109\/ISCV49265.2020.9204271"},{"key":"1195_CR66","doi-asserted-by":"crossref","unstructured":"Ezzeldin YH, Yan S, He C, Ferrara E, Avestimehr AS. Fairfed: Enabling group fairness in federated learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2023;37:7494\u20137502.","DOI":"10.1609\/aaai.v37i6.25911"},{"key":"1195_CR67","unstructured":"Pentyala S, Neophytou N, Nascimento A, De\u00a0Cock M, Farnadi G. Privfairfl: Privacy-preserving group fairness in federated learning; 2022. arXiv preprint arXiv:2205.11584."},{"key":"1195_CR68","unstructured":"Pentyala S, Neophytou N, Nascimento A, De\u00a0Cock M, Farnadi G. Privfairfl: Privacy-preserving group fairness in federated learning; 2022. arXiv preprint arXiv:2205.11584."},{"key":"1195_CR69","unstructured":"McMahan B, Moore E, Ramage D, Hampson S, Arcas BA. Communication-efficient learning of deep networks from decentralized data. In: Artificial Intelligence and Statistics. PMLR; 2017. p. 1273\u20131282."},{"issue":"7","key":"1195_CR70","doi-asserted-by":"publisher","first-page":"5476","DOI":"10.1109\/JIOT.2020.3030072","volume":"8","author":"S AbdulRahman","year":"2020","unstructured":"AbdulRahman S, Tout H, Ould-Slimane H, Mourad A, Talhi C, Guizani M. 
A survey on federated learning: The journey from centralized to distributed on-site learning and beyond. IEEE Internet Things J. 2020;8(7):5476\u201397.","journal-title":"IEEE Internet Things J"},{"key":"1195_CR71","doi-asserted-by":"crossref","unstructured":"Baunsgaard S, Boehm M, Innerebner K, Kehayov M, Lackner F, Ovcharenko O, Phani A, Rieger T, Weissteiner D, Wrede SB. Federated data preparation, learning, and debugging in apache systemds. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management; 2022. pp. 4813\u20134817.","DOI":"10.1145\/3511808.3557162"},{"key":"1195_CR72","unstructured":"Radovic B, Pejovic V. Repa: Client clustering without training and data labels for improved federated learning in non-iid settings; 2023. arXiv arXiv:2309.14088."},{"key":"1195_CR73","doi-asserted-by":"crossref","unstructured":"Tan Y, Long G, Liu L, Zhou T, Lu Q, Jiang J, Zhang C. Fedproto: Federated prototype learning across heterogeneous clients. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2022;36:8432\u20138440","DOI":"10.1609\/aaai.v36i8.20819"},{"key":"1195_CR74","doi-asserted-by":"crossref","unstructured":"Gong B, Xing T, Liu Z, Xi W, Chen X. Adaptive client clustering for efficient federated learning over non-iid and imbalanced data. IEEE Trans Big Data. 2022; 10(6):1051\u201365.","DOI":"10.1109\/TBDATA.2022.3167994"},{"issue":"1","key":"1195_CR75","doi-asserted-by":"publisher","first-page":"124","DOI":"10.3390\/ai3010008","volume":"3","author":"S Rai","year":"2022","unstructured":"Rai S, Kumari A, Prasad DK. Client selection in federated learning under imperfections in environment. AI. 2022;3(1):124\u201345.","journal-title":"AI"},{"key":"1195_CR76","unstructured":"Diao Y, Li Q, He B. Towards addressing label skews in one-shot federated learning. 
In: The Eleventh International Conference on Learning Representations; 2022."},{"key":"1195_CR77","doi-asserted-by":"crossref","unstructured":"Casella B, Esposito R, Sciarappa A, Cavazzoni C, Aldinucci M. Experimenting with normalization layers in federated learning on non-iid scenarios; 2023. arXiv preprint arXiv:2303.10630.","DOI":"10.1109\/ACCESS.2024.3383783"},{"key":"1195_CR78","unstructured":"Tutorial 1: MNIST, the Hello World of Deep Learning; 2024. Accessed 08 June 2024. https:\/\/medium.com\/fenwicks\/tutorial-1-mnist-the-hello-world-of-deep-learning-abd252c47709"},{"key":"1195_CR79","doi-asserted-by":"crossref","unstructured":"Li Q, Diao Y, Chen Q, He B. Federated learning on non-iid data silos: An experimental study. In: 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE; 2022. p. 965\u2013978.","DOI":"10.1109\/ICDE53745.2022.00077"},{"key":"1195_CR80","doi-asserted-by":"crossref","unstructured":"Vahidian S, Morafah M, Shah M, Lin B. Rethinking data heterogeneity in federated learning: Introducing a new notion and standard benchmarks. IEEE Trans Artif Intell. 2023; 5(3):1386\u201397.","DOI":"10.1109\/TAI.2023.3293068"},{"key":"1195_CR81","unstructured":"Calmon F, Wei D, Vinzamuri B, Natesan\u00a0Ramamurthy K, Varshney KR. Optimized pre-processing for discrimination prevention. Adv Neural Inf Process Syst 2017."},{"key":"1195_CR82","doi-asserted-by":"crossref","unstructured":"Boufares F, Salem AB. Heterogeneous data-integration and data quality: Overview of conflicts. In: 2012 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE; 2012. p. 867\u2013874.","DOI":"10.1109\/SETIT.2012.6482029"},{"key":"1195_CR83","doi-asserted-by":"crossref","unstructured":"Deng Y, Lyu F, Ren J, Chen Y-C, Yang P, Zhou Y, Zhang Y. Fair: Quality-aware federated learning with precise user incentive and model aggregation. In: IEEE INFOCOM 2021-IEEE Conference on Computer Communications. 
IEEE; 2021. p. 1\u201310.","DOI":"10.1109\/INFOCOM42981.2021.9488743"},{"key":"1195_CR84","doi-asserted-by":"crossref","unstructured":"Chen Y, Yang X, Qin X, Yu H, Chan P, Shen Z. Dealing with label quality disparity in federated learning. Federated Learning Privacy Incentive. 2020:108\u2013121.","DOI":"10.1007\/978-3-030-63076-8_8"},{"key":"1195_CR85","doi-asserted-by":"publisher","unstructured":"Mohamed NS, Ashour M, Mashaly M. Quality-aware node selection for efficient federated learning based on a global perspective. In: 2024 International Conference on Microelectronics (ICM); 2024, p. 1\u20136. https:\/\/doi.org\/10.1109\/ICM63406.2024.10815867","DOI":"10.1109\/ICM63406.2024.10815867"},{"issue":"6","key":"1195_CR86","first-page":"1942","volume":"34","author":"W Wu","year":"2023","unstructured":"Wu W, He L, Lin W, Maple C. Fedprof: selective federated learning based on distributional representation profiling. IEEE Trans Parallel Distrib Syst. 2023;34(6):1942\u201353.","journal-title":"IEEE Trans Parallel Distrib Syst"},{"key":"1195_CR87","doi-asserted-by":"crossref","unstructured":"Huang J, Hong C, Liu Y, Chen LY, Roos S. Maverick matters: Client contribution and selection in federated learning. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining. Cham: Springer; 2023. p. 269\u2013282.","DOI":"10.1007\/978-3-031-33377-4_21"},{"key":"1195_CR88","doi-asserted-by":"crossref","unstructured":"Vanschoren J. Meta-learning. Automated machine learning: methods, systems, challenges. 2019. p. 35\u201361.","DOI":"10.1007\/978-3-030-05318-5_2"},{"key":"1195_CR89","first-page":"21394","volume":"33","author":"CT Dinh","year":"2020","unstructured":"Dinh CT, Tran N, Nguyen J. Personalized federated learning with moreau envelopes. Adv Neural Inf Process Syst. 2020;33:21394\u2013405.","journal-title":"Adv Neural Inf Process Syst"},{"key":"1195_CR90","unstructured":"Arivazhagan MG, Aggarwal V, Singh AK, Choudhary S. 
Federated learning with personalization layers; 2019. arXiv preprint arXiv:1912.00818."},{"key":"1195_CR91","doi-asserted-by":"crossref","unstructured":"Sabah F, Chen Y, Yang Z, Azam M, Ahmad N, Sarwar R. Model optimization techniques in personalized federated learning: a survey. Expert Syst Appl. 2023:122874.","DOI":"10.1016\/j.eswa.2023.122874"},{"key":"1195_CR92","unstructured":"Liang PP, Liu T, Ziyin L, Allen NB, Auerbach RP, Brent D, Salakhutdinov R, Morency L-P. Think locally, act globally: Federated learning with local and global representations; 2020. arXiv preprint arXiv:2001.01523."},{"issue":"1","key":"1195_CR93","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1016\/j.gltp.2022.04.020","volume":"3","author":"K Maharana","year":"2022","unstructured":"Maharana K, Mondal S, Nemade B. A review: Data pre-processing and data augmentation techniques. Global Transit Proc 2022;3(1):91\u20139.","journal-title":"Glob Transit Proc"},{"key":"1195_CR94","doi-asserted-by":"crossref","unstructured":"Yoshida N, Nishio T, Morikura M, Yamamoto K, Yonetani R. Hybrid-fl for wireless networks: Cooperative learning mechanism using non-iid data. In: ICC 2020-2020 IEEE International Conference On Communications (ICC). IEEE; 2020. p. 1\u20137.","DOI":"10.1109\/ICC40277.2020.9149323"},{"issue":"1","key":"1195_CR95","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1109\/JSTSP.2022.3231527","volume":"17","author":"YJ Cho","year":"2023","unstructured":"Cho YJ, Wang J, Chirvolu T, Joshi G. Communication-efficient and model-heterogeneous personalized federated learning via clustered knowledge transfer. IEEE J Sel Top Signal Process. 2023;17(1):234\u201347.","journal-title":"IEEE J Sel Top Signal Process"},{"key":"1195_CR96","unstructured":"Li D, Wang J. Fedmd: Heterogenous federated learning via model distillation; 2019. 
arXiv preprint arXiv:1910.03581."},{"key":"1195_CR97","first-page":"2351","volume":"33","author":"T Lin","year":"2020","unstructured":"Lin T, Kong L, Stich SU, Jaggi M. Ensemble distillation for robust model fusion in federated learning. Adv Neural Inf Process Syst. 2020;33:2351\u201363.","journal-title":"Adv Neural Inf Process Syst"},{"key":"1195_CR98","unstructured":"Yoon T, Shin S, Hwang SJ, Yang E. Fedmix: Approximation of mixup under mean augmented federated learning; 2021. arXiv preprint arXiv:2107.00233."},{"key":"1195_CR99","unstructured":"Sattler F, M\u00fcller K-R, Samek W. Clustered federated learning. In: Proceedings of the NeurIPS\u201919 Workshop on Federated Learning for Data Privacy and Confidentiality; 2019. p. 1\u20135."},{"key":"1195_CR100","doi-asserted-by":"crossref","unstructured":"Rana O, Spyridopoulos T, Hudson N, Baughman M, Chard K, Foster I, Khan A. Hierarchical and decentralised federated learning. In: 2022 Cloud Continuum. IEEE; 2022. p. 1\u20139.","DOI":"10.1109\/CloudContinuum57429.2022.00008"},{"issue":"1","key":"1195_CR101","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1016\/j.gltp.2022.04.020","volume":"3","author":"K Maharana","year":"2022","unstructured":"Maharana K, Mondal S, Nemade B. A review: data pre-processing and data augmentation techniques. Global Transitions Proceedings. 2022;3(1):91\u20139.","journal-title":"Global Transitions Proceedings"},{"key":"1195_CR102","doi-asserted-by":"crossref","unstructured":"Wang Y, Shi Q, Chang T-H. Why batch normalization damage federated learning on non-iid data? IEEE Trans Neural Netw Learning Syst. 2023; 36(1):1692\u20131706.","DOI":"10.1109\/TNNLS.2023.3323302"},{"key":"1195_CR103","volume-title":"Feature engineering for machine learning and data analytics","author":"G Dong","year":"2018","unstructured":"Dong G, Liu H. Feature engineering for machine learning and data analytics. Boca Raton, FL: CRC Press; 2018."},{"key":"1195_CR104","unstructured":"Ioffe S, Szegedy C. 
Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning. PMLR; 2015. p. 448\u2013456."},{"issue":"8","key":"1195_CR105","doi-asserted-by":"publisher","first-page":"10173","DOI":"10.1109\/TPAMI.2023.3250241","volume":"45","author":"L Huang","year":"2023","unstructured":"Huang L, Qin J, Zhou Y, Zhu F, Liu L, Shao L. Normalization techniques in training dnns: methodology, analysis and application. IEEE Trans Pattern Anal Mach Intell. 2023;45(8):10173\u201396.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"issue":"2","key":"1195_CR106","first-page":"31","volume":"5","author":"P Ajitha","year":"2015","unstructured":"Ajitha P, Chandra E. A survey on outliers detection in distributed data mining for big data. J Basic Appl Scientific Res. 2015;5(2):31\u20138.","journal-title":"J Basic Appl Scientific Res"},{"key":"1195_CR107","doi-asserted-by":"crossref","unstructured":"Hao W, El-Khamy M, Lee J, Zhang J, Liang KJ, Chen C, Duke LC. Towards fair federated learning with zero-shot data augmentation. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition; 2021. p. 3310\u20133319.","DOI":"10.1109\/CVPRW53098.2021.00369"},{"issue":"1","key":"1195_CR108","doi-asserted-by":"publisher","first-page":"124","DOI":"10.3390\/ai3010008","volume":"3","author":"S Rai","year":"2022","unstructured":"Rai S, Kumari A, Prasad DK. Client selection in federated learning under imperfections in environment. AI. 2022;3(1):124\u201345.","journal-title":"AI"},{"key":"1195_CR109","doi-asserted-by":"crossref","unstructured":"Li A, Zhang L, Tan J, Qin Y, Wang J, Li X-Y. Sample-level data selection for federated learning. In: IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE; 2021. p. 1\u201310.","DOI":"10.1109\/INFOCOM42981.2021.9488723"},{"key":"1195_CR110","unstructured":"Fang P, Cai Z, Chen H, Shi Q. 
Flfe: a communication-efficient and privacy-preserving federated feature engineering framework; 2020. arXiv preprint arXiv:2009.02557."},{"key":"1195_CR111","unstructured":"Zhang G, Beitollahi M, Bie A, Chen X. Understanding the role of layer normalization in label-skewed federated learning. Transactions on Machine Learning Research; 2023."},{"key":"1195_CR112","doi-asserted-by":"crossref","unstructured":"Itokazu K, Wang L, Ozawa S. Outlier detection by privacy-preserving ensemble decision tree using homomorphic encryption. In: 2021 International Joint Conference on Neural Networks (IJCNN). IEEE; 2021. p. 1\u20137.","DOI":"10.1109\/IJCNN52387.2021.9534464"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01195-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-025-01195-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-025-01195-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T18:03:22Z","timestamp":1750701802000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-025-01195-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,23]]},"references-count":112,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1195"],"URL":"https:\/\/doi.org\/10.1186\/s40537-025-01195-6","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,23]]},"assertion":[{"value":"23 April 
2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 May 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"23 June 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"153"}}