{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,12]],"date-time":"2026-05-12T09:10:10Z","timestamp":1778577010789,"version":"3.51.4"},"reference-count":51,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2022,2,9]],"date-time":"2022-02-09T00:00:00Z","timestamp":1644364800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"European Union\u2019s Horizon2020 research and innovation programme","award":["826078"],"award-info":[{"award-number":["826078"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,4,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Limited data access has hindered the field of precision medicine from exploring its full potential, e.g. concerning machine learning and privacy and data protection rules.<\/jats:p>\n                  <jats:p>Our study evaluates the efficacy of federated Random Forests (FRF) models, focusing particularly on the heterogeneity within and between datasets. We addressed three common challenges: (i) number of parties, (ii) sizes of datasets and (iii) imbalanced phenotypes, evaluated on five biomedical datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>The FRF outperformed the average local models and performed comparably to the data-centralized models trained on the entire data. With an increasing number of models and decreasing dataset size, the performance of local models decreases drastically. The FRF, however, do not decrease significantly. When combining datasets of different sizes, the FRF vastly improve compared to the average local models. We demonstrate that the FRF remain more robust and outperform the local models by analyzing different class-imbalances.<\/jats:p>\n                  <jats:p>Our results support that FRF overcome boundaries of clinical research and enables collaborations across institutes without violating privacy or legal regulations. Clinicians benefit from a vast collection of unbiased data aggregated from different geographic locations, demographics and other varying factors. They can build more generalizable models to make better clinical decisions, which will have relevance, especially for patients in rural areas and rare or geographically uncommon diseases, enabling personalized treatment. In combination with secure multi-party computation, federated learning has the power to revolutionize clinical practice by increasing the accuracy and robustness of healthcare AI and thus paving the way for precision medicine.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The implementation of the federated random forests can be found at https:\/\/featurecloud.ai\/.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac065","type":"journal-article","created":{"date-parts":[[2022,2,1]],"date-time":"2022-02-01T12:18:16Z","timestamp":1643717896000},"page":"2278-2286","source":"Crossref","is-referenced-by-count":86,"title":["Federated Random Forests can improve local performance of predictive models for various healthcare applications"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7499-4373","authenticated-orcid":false,"given":"Anne-Christin","family":"Hauschild","sequence":"first","affiliation":[{"name":"Department of Mathematics and Computer Science, University of Marburg , Marburg, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Marta","family":"Lemanczyk","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, University of Marburg , Marburg, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Julian","family":"Matschinske","sequence":"additional","affiliation":[{"name":"TUM School of Life Sciences Weihenstephan, Technical University of Munich , Freising-Weihenstephan, Germany"},{"name":"Computational Systems Biology, University of Hamburg , Hamburg, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tobias","family":"Frisch","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, University of Southern Denmark , Odense, Denmark"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Olga","family":"Zolotareva","sequence":"additional","affiliation":[{"name":"TUM School of Life Sciences Weihenstephan, Technical University of Munich , Freising-Weihenstephan, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6786-5194","authenticated-orcid":false,"given":"Andreas","family":"Holzinger","sequence":"additional","affiliation":[{"name":"Institut f\u00fcr Medizinische Informatik, Statistik und Dokumentation, Medizinische Universit\u00e4t Graz , Graz, Austria"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jan","family":"Baumbach","sequence":"additional","affiliation":[{"name":"Computational Systems Biology, University of Hamburg , Hamburg, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3108-8311","authenticated-orcid":false,"given":"Dominik","family":"Heider","sequence":"additional","affiliation":[{"name":"Department of Mathematics and Computer Science, University of Marburg , Marburg, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2022,2,9]]},"reference":[{"key":"2023020109021810500_btac065-B1","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1038\/s41540-017-0007-2","article-title":"On the performance of de novo pathway enrichment","volume":"3","author":"Batra","year":"2017","journal-title":"NPJ Syst. Biol. Appl"},{"key":"2023020109021810500_btac065-B2","first-page":"1296","article-title":"Der GALAD-Score, ein AFP-, AFP-L3- und DCP-basierter Diagnosealgorithmus verbessert die Detektionsrate des hepatozellul\u00e4ren Karzinoms im BCLC-Fr\u00fchstadium signifikant","volume":"54","author":"Best","year":"2016","journal-title":"Z. Gastroenterol"},{"key":"2023020109021810500_btac065-B3","doi-asserted-by":"crossref","first-page":"e0183458","DOI":"10.1371\/journal.pone.0183458","article-title":"MammaPrint versus EndoPredict: poor correlation in disease recurrence risk classification of hormone receptor positive breast cancer","volume":"12","author":"B\u00f6sl","year":"2017","journal-title":"PLoS One"},{"key":"2023020109021810500_btac065-B4","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1002\/widm.1072","article-title":"Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics","volume":"2","author":"Boulesteix","year":"2012","journal-title":"Wiley Interdisc. Rev. Data Min. Knowl. Discov"},{"key":"2023020109021810500_btac065-B5","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1016\/j.ijmedinf.2018.01.007","article-title":"Federated learning of predictive models from federated Electronic Health Records","volume":"112","author":"Brisimi","year":"2018","journal-title":"Int. J. Med. Inf"},{"key":"2023020109021810500_btac065-B6","doi-asserted-by":"crossref","first-page":"20","DOI":"10.3390\/diagnostics9010020","article-title":"Machine-learning-based laboratory developed test for the diagnosis of sepsis in high-risk patients","volume":"9","author":"Calvert","year":"2019","journal-title":"Diagnostics"},{"key":"2023020109021810500_btac065-B7","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1109\/MIS.2020.2988604","article-title":"FedHealth: a federated transfer learning framework for wearable healthcare","volume":"35","author":"Chen","year":"2020","journal-title":"IEEE Intell. Syst"},{"key":"2023020109021810500_btac065-B8","first-page":"87","author":"Cheng","year":"2021"},{"key":"2023020109021810500_btac065-B9","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/S0933-3657(02)00049-0","article-title":"Uniqueness of medical data mining","volume":"26","author":"Cios","year":"2002","journal-title":"Artif. Intell. Med"},{"key":"2023020109021810500_btac065-B10","doi-asserted-by":"crossref","first-page":"S2","DOI":"10.1186\/1472-6947-15-S5-S2","article-title":"Privacy-preserving GWAS analysis on federated genomic datasets","volume":"15","author":"Constable","year":"2015","journal-title":"BMC Med. Inf. Dec. Mak"},{"key":"2023020109021810500_btac065-B11","doi-asserted-by":"crossref","first-page":"94","DOI":"10.3390\/fi13040094","article-title":"Privacy preserving machine learning with homomorphic encryption and federated learning","volume":"13","author":"Fang","year":"2021","journal-title":"Fut. Internet"},{"key":"2023020109021810500_btac065-B12","first-page":"1","article-title":"Survey of machine learning algorithms for disease diagnostic","volume":"09","author":"Fatima","year":"2017","journal-title":"J. Intell. Learn. Syst. Appl"},{"key":"2023020109021810500_btac065-B14","author":"Gan","year":"2017"},{"key":"2023020109021810500_btac065-B16","doi-asserted-by":"crossref","first-page":"301","DOI":"10.1056\/NEJMp1006304","article-title":"The path to personalized medicine","volume":"363","author":"Hamburg","year":"2010","journal-title":"N. Engl. J. Med"},{"key":"2023020109021810500_btac065-B17","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1515\/icom-2020-0024","article-title":"Explainable AI and multi-modal causability in medicine","volume":"19","author":"Holzinger","year":"2021","journal-title":"i-com"},{"key":"2023020109021810500_btac065-B18","volume-title":"Elements of Causal Inference Foundations and Learning Algorithms","author":"Janzing","year":"2017"},{"key":"2023020109021810500_btac065-B19","author":"Jeanquartier","year":"2016"},{"key":"2023020109021810500_btac065-B20","article-title":"Collective data mining: a new perspective toward distributed data mining","author":"Kargupta","year":"2000","journal-title":"Adv. Distrib. Parallel Knowl. Discov"},{"key":"2023020109021810500_btac065-B21","article-title":"Federated learning: strategies for improving communication efficiency","author":"Kone\u010dn\u00fd","year":"2016","journal-title":"arXiv"},{"key":"2023020109021810500_btac065-B22","article-title":"Federated optimization: distributed machine learning for on-device intelligence","author":"Kone\u010dn\u00fd","year":"2016","journal-title":"arXiv"},{"key":"2023020109021810500_btac065-B23","doi-asserted-by":"crossref","first-page":"735","DOI":"10.1200\/JCO.2017.74.6586","article-title":"JOURNAL OF CLINICAL ONCOLOGY PAM50 risk of recurrence score predicts 10-year distant recurrence in a comprehensive danish cohort of postmenopausal women allocated to 5 years of endocrine therapy for hormone receptor-positive early breast cancer","volume":"36","author":"L\u00e6nkholm","year":"2018","journal-title":"J. Clin. Oncol"},{"key":"2023020109021810500_btac065-B24","first-page":"311","article-title":"The distributed boosting algorithm","author":"Lazarevic","year":"2001"},{"key":"2023020109021810500_btac065-B25","first-page":"e7744","article-title":"Privacy-preserving patient similarity learning in a federated environment: development and analysis","volume":"6","author":"Lee","year":"2018","journal-title":"JMIR Med. Inf"},{"key":"2023020109021810500_btac065-B26","doi-asserted-by":"crossref","first-page":"101814","DOI":"10.1016\/j.artmed.2020.101814","article-title":"A multicenter random forest model for effective prognosis prediction in collaborative clinical research network","volume":"103","author":"Li","year":"2020","journal-title":"Artif. Intell. Med"},{"key":"2023020109021810500_btac065-B27","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1016\/j.cell.2018.02.052","article-title":"An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics","volume":"173","author":"Liu","year":"2018","journal-title":"Cell"},{"key":"2023020109021810500_btac065-B28","article-title":"Federated forest","volume":"1","author":"Liu","year":"2020","journal-title":"IEEE Trans. Big Data"},{"key":"2023020109021810500_btac065-B29","first-page":"1016016","author":"Lorenzi","year":"2017"},{"key":"2023020109021810500_btac065-B30","first-page":"54, 1273","article-title":"Communication-efficient learning of deep networks from decentralized data","author":"McMahan","year":"2016","journal-title":"Artif. Intell. Stat"},{"key":"2023020109021810500_btac065-B32","author":"Nasirigerdeh","year":"2022"},{"key":"2023020109021810500_btac065-B33","doi-asserted-by":"crossref","first-page":"3148","DOI":"10.3390\/cancers13133148","article-title":"Integrative analysis of next-generation sequencing for next-generation cancer research toward artificial intelligence","volume":"13","author":"Park","year":"2021","journal-title":"Cancers"},{"key":"2023020109021810500_btac065-B34","doi-asserted-by":"crossref","first-page":"lqab104","DOI":"10.1093\/nargab\/lqab104","article-title":"Transfer learning compensates limited data, batch effects and technological heterogeneity in single-cell sequencing","volume":"3","author":"Park","year":"2021","journal-title":"NAR Genomics Bioinf"},{"key":"2023020109021810500_btac065-B35","first-page":"506","article-title":"A critical comparative study of liver patients from USA and INDIA: an exploratory analysis","volume":"9","author":"Ramana","year":"2012","journal-title":"Int. J. Comput. Sci. Issues"},{"key":"2023020109021810500_btac065-B36","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41746-020-00323-1","article-title":"The future of digital health with federated learning","volume":"3","author":"Rieke","year":"2020","journal-title":"NPJ Digit. Med"},{"key":"2023020109021810500_btac065-B37","doi-asserted-by":"crossref","first-page":"371","DOI":"10.1186\/s12859-017-1783-9","article-title":"eccCL: parallelized GPU implementation of ensemble classifier chains","volume":"18","author":"Riemenschneider","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023020109021810500_btac065-B38","doi-asserted-by":"crossref","first-page":"186ra66","DOI":"10.1126\/scitranslmed.3005723","article-title":"Ectopic activation of germline and placental genes identifies aggressive metastasis-prone lung cancers","volume":"5","author":"Rousseaux","year":"2013","journal-title":"Sci. Transl. Med"},{"key":"2023020109021810500_btac065-B39","article-title":"Braintorrent: a peer-to-peer environment for decentralized federated learning","author":"Roy","year":"2019","journal-title":"arXiv"},{"key":"2023020109021810500_btac065-B40","doi-asserted-by":"crossref","first-page":"609","DOI":"10.1038\/520609a","article-title":"Personalized medicine: time for one-person trials","volume":"520","author":"Schork","year":"2015","journal-title":"Nature"},{"key":"2023020109021810500_btac065-B41","doi-asserted-by":"crossref","first-page":"2458","DOI":"10.1093\/bioinformatics\/bty984","article-title":"GUESS: projecting machine learning scores to well-calibrated probability estimates for clinical decision-making","volume":"35","author":"Schwarz","year":"2019","journal-title":"Bioinformatics"},{"key":"2023020109021810500_btac065-B42","doi-asserted-by":"crossref","first-page":"417","DOI":"10.1586\/erm.09.32","article-title":"MammaPrint 70-gene signature: another milestone in personalized medical care for breast cancer patients","volume":"9","author":"Slodkowska","year":"2009","journal-title":"Exp. Rev. Mol. Diagn"},{"key":"2023020109021810500_btac065-B43","first-page":"535","article-title":"Merging Decision Trees: a case study in predicting student performance","author":"Strecht","year":"2014"},{"key":"2023020109021810500_btac065-B44","doi-asserted-by":"crossref","first-page":"71","DOI":"10.1080\/10556788.2010.511669","article-title":"A new class of distributed optimization algorithms: application to regression of distributed data","volume":"27","author":"Sundhar Ram","year":"2012","journal-title":"Optim. Methods Softw"},{"key":"2023020109021810500_btac065-B45","author":"Sweeney","year":"2013"},{"key":"2023020109021810500_btac065-B46","year":"2016"},{"key":"2023020109021810500_btac065-B47","doi-asserted-by":"crossref","first-page":"480","DOI":"10.1016\/j.jbi.2013.03.008","article-title":"EXpectation Propagation LOgistic REgRession (EXPLORER): distributed privacy-preserving online model learning","volume":"46","author":"Wang","year":"2013","journal-title":"J. Biomed. Inf"},{"key":"2023020109021810500_btac065-B48","first-page":"1113","author":"Weinstein","year":"2013"},{"key":"2023020109021810500_btac065-B49","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1089\/sysm.2018.0013","article-title":"Time-resolved systems medicine reveals viral infection-modulating host targets","volume":"2","author":"Wiwie","year":"2019","journal-title":"Syst. Med"},{"key":"2023020109021810500_btac065-B50","doi-asserted-by":"crossref","first-page":"9193","DOI":"10.1073\/pnas.87.23.9193","article-title":"Multisurface method of pattern separation for medical diagnosis applied to breast cytology","volume":"87","author":"Wolberg","year":"1990","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020109021810500_btac065-B51","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3339474","article-title":"Federated machine learning","volume":"10","author":"Yang","year":"2019","journal-title":"ACM Trans. Intell. Syst. Technol"},{"key":"2023020109021810500_btac065-B52","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3339474","article-title":"Federated machine learning: concept and applications","volume":"10","author":"Yang","year":"2019","journal-title":"ACM Trans. Intell. Syst. Technol"},{"key":"2023020109021810500_btac065-B53","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1016\/j.procs.2020.02.235","article-title":"Privacy-preserving machine learning as a tool for secure personalized information services","volume":"169","author":"Zapechnikov","year":"2020","journal-title":"Proc. Comput. Sci"},{"key":"2023020109021810500_btac065-B54","article-title":"Learning from longitudinal data in electronic health record and genetic data to improve cardiovascular event prediction","volume":"9","author":"Zhao","year":"2019","journal-title":"Sci. Rep"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac065\/42580378\/btac065.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2278\/49009424\/btac065.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/8\/2278\/49009424\/btac065.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T20:49:47Z","timestamp":1675284587000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/8\/2278\/6525214"}},"subtitle":[],"editor":[{"given":"Jonathan","family":"Wren","sequence":"additional","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2022,2,9]]},"references-count":51,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2022,4,12]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac065","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,4,15]]},"published":{"date-parts":[[2022,2,9]]}}}