{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T07:55:59Z","timestamp":1773820559133,"version":"3.50.1"},"reference-count":46,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:00:00Z","timestamp":1773792000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>Rapid diagnosis of bacterial pneumonia is crucial for clinical diagnosis and treatment, but traditional methods are time-consuming. The wide application of machine learning techniques in medical diagnosis provides an effective way to solve this problem. However, the complexity of medical datasets and the problem of class imbalance pose serious challenges to classical machine learning algorithms.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>To address the multiclass imbalance problem in complete blood count (CBC) datasets, this study proposes a novel ensemble learning algorithm, Forest of Evolutionary Multi-Classifiers Based on Bagging with Error-Correcting Output Coding (Forest-EMCBE). 
The algorithm integrates a Multi-Objective Genetic Algorithm, Error-Correcting Output Codes (ECOC), and a balanced sampling strategy, enhancing the generalization ability of the classifiers through a three-layer integrated structure.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>To validate the effectiveness of the proposed method, we trained the diagnostic model on a CBC dataset, which contains 1,457 samples and 4 different classes of bacterial pneumonia results, and compared it with 11 state-of-the-art algorithms. The experimental results show that the Forest-EMCBE algorithm outperforms all other compared algorithms on the CBC dataset.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>Using Shapley value-based feature importance analysis, this study dissects the contributions of key features to the prediction outcomes and further elucidates the differential impacts of features such as age, gender, and neutrophil percentage on predicting infections by different bacterial species.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fbinf.2026.1792643","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T06:56:52Z","timestamp":1773817012000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Forest-EMCBE: an evolutionary ensemble learning algorithm for multiclass diagnosis of bacterial pneumonia using the CBC dataset"],"prefix":"10.3389","volume":"6","author":[{"given":"Yimin","family":"Shen","sequence":"first","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, 
China"]}]},{"given":"Xiaotian","family":"Xu","sequence":"additional","affiliation":[{"name":"Qixia People\u2019s Hospital of Shandong Province","place":["Yantai, China"]}]},{"given":"Xiaoxi","family":"Hao","sequence":"additional","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]}]},{"given":"Cuimin","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]},{"name":"Guangxi Colleges and Universities Key Laboratory of Multimedia Communications and Information Processing","place":["Nanning, China"]}]},{"given":"Wei","family":"Lan","sequence":"additional","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]},{"name":"Guangxi Colleges and Universities Key Laboratory of Multimedia Communications and Information Processing","place":["Nanning, China"]}]}],"member":"1965","published-online":{"date-parts":[[2026,3,18]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"1385","DOI":"10.1007\/s00726-010-0595-2","article-title":"An approach for classification of highly imbalanced data using weighting and undersampling","volume":"39","author":"Anand","year":"2010","journal-title":"Amino Acids"},{"key":"B2","volume-title":"The behavior of adaptive systems which employ genetic and correlation algorithms","author":"Bagley","year":"1967"},{"key":"B3","doi-asserted-by":"publisher","first-page":"78","DOI":"10.1007\/s10916-016-0436-2","article-title":"Learning ECOC code matrix for multiclass classification with application to glaucoma diagnosis","volume":"40","author":"Bai","year":"2016","journal-title":"J. Medical Systems"},{"key":"B4","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1023\/A:1018054314350","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. 
Learning"},{"key":"B5","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learning"},{"key":"B6","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J. Artificial Intelligence Research"},{"key":"B7","doi-asserted-by":"crossref","DOI":"10.1145\/2939672.2939785","volume-title":"XGBoost: a scalable tree boosting System","author":"Chen","year":"2016"},{"key":"B8","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/j.inffus.2017.09.010","article-title":"Dynamic classifier selection: recent advances and perspectives","volume":"41","author":"Cruz","year":"2018","journal-title":"Inf. Fusion"},{"key":"B9","doi-asserted-by":"publisher","first-page":"124558","DOI":"10.1016\/j.eswa.2024.124558","article-title":"Class-overlap detection based on heterogeneous clustering ensemble for multi-class imbalance problem","volume":"255","author":"Dai","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"B10","doi-asserted-by":"publisher","first-page":"182","DOI":"10.1109\/4235.996017","article-title":"A fast and elitist multiobjective genetic algorithm: NSGA-II","volume":"6","author":"Deb","year":"2002","journal-title":"IEEE Transactions Evolutionary Computation"},{"key":"B11","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1613\/jair.105","article-title":"Solving multiclass learning problems via error-correcting output codes","volume":"2","author":"Dietterich","year":"1994","journal-title":"J. Artificial Intelligence Research"},{"key":"B12","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1007\/s11704-019-8208-z","article-title":"A survey on ensemble learning","volume":"14","author":"Dong","year":"2020","journal-title":"Front. Comput. 
Sci."},{"key":"B13","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1016\/j.mib.2020.09.013","article-title":"The challenges of estimating the human global burden of disease of antimicrobial resistant bacteria","volume":"57","author":"Dunachie","year":"2020","journal-title":"Curr. Opin. Microbiol."},{"key":"B14","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1016\/j.ins.2019.04.052","article-title":"Evolutionary inversion of class distribution in overlapping areas for multi-class imbalanced learning","volume":"494","author":"Fernandes","year":"2019","journal-title":"Inf. Sci."},{"key":"B15","doi-asserted-by":"publisher","first-page":"1104","DOI":"10.1109\/TKDE.2019.2898861","article-title":"Ensemble of classifiers based on multiobjective genetic sampling for imbalanced data","volume":"32","author":"Fernandes","year":"2019","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"B16","doi-asserted-by":"publisher","first-page":"228","DOI":"10.1186\/s12911-024-02630-z","article-title":"Prediction of sepsis mortality in ICU patients using machine learning methods","volume":"24","author":"Gao","year":"2024","journal-title":"BMC Med. Inf. Decis. Mak."},{"key":"B17","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1016\/j.ins.2018.03.002","article-title":"Dynamic ensemble selection for multi-class imbalanced datasets","volume":"445","author":"Garc\u00eda","year":"2018","journal-title":"Inf. 
Sci."},{"key":"B18","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1016\/j.eswa.2016.12.035","article-title":"Learning from class-imbalanced data: review of methods and applications","volume":"73","author":"Haixiang","year":"2017","journal-title":"Expert Systems Applications"},{"key":"B19","first-page":"1322","article-title":"ADASYN: adaptive synthetic sampling approach for imbalanced learning","volume-title":"2008 IEEE international joint conference on neural networks","author":"He","year":"2008"},{"key":"B20","doi-asserted-by":"publisher","first-page":"364","DOI":"10.1007\/s10495-024-02050-4","article-title":"Integrated explainable machine learning and multi-omics analysis for survival prediction in cancer with immunotherapy response","volume":"30","author":"Hounye","year":"2025","journal-title":"Apoptosis"},{"key":"B21","doi-asserted-by":"publisher","first-page":"109817","DOI":"10.1016\/j.knosys.2022.109817","article-title":"Deep active learning models for imbalanced image classification","volume":"257","author":"Jin","year":"2022","journal-title":"Knowledge-Based Syst."},{"key":"B22","doi-asserted-by":"publisher","first-page":"122778","DOI":"10.1016\/j.eswa.2023.122778","article-title":"A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation","volume":"244","author":"Khan","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"B23","doi-asserted-by":"publisher","first-page":"552","DOI":"10.1109\/TSMCA.2010.2084081","article-title":"Comparing boosting and bagging techniques with noisy and imbalanced data","volume":"41","author":"Khoshgoftaar","year":"2010","journal-title":"IEEE Trans. Syst. Man, Cybernetics-Part A Syst. 
Humans"},{"key":"B24","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1016\/j.patcog.2018.05.015","article-title":"Dynamic ensemble selection for multi-class classification with one-class classifiers","volume":"83","author":"Krawczyk","year":"2018","journal-title":"Pattern Recognit."},{"key":"B25","doi-asserted-by":"publisher","first-page":"116962","DOI":"10.1016\/j.eswa.2022.116962","article-title":"What makes multi-class imbalanced problems difficult? An experimental study","volume":"199","author":"Lango","year":"2022","journal-title":"Expert Syst. Appl."},{"key":"B26","doi-asserted-by":"publisher","first-page":"105580","DOI":"10.1016\/j.asoc.2019.105580","article-title":"Maximizing diversity by transformed ensemble learning","volume":"82","author":"Mao","year":"2019","journal-title":"Appl. Soft Comput."},{"key":"B27","doi-asserted-by":"publisher","first-page":"419","DOI":"10.3414\/ME13-01-0122","article-title":"The evolution of boosting algorithms","volume":"53","author":"Mayr","year":"2014","journal-title":"Methods Information Medicine"},{"key":"B28","doi-asserted-by":"publisher","first-page":"2817","DOI":"10.1038\/s41467-024-46663-4","article-title":"Data-driven identification of predictive risk biomarkers for subgroups of osteoarthritis using interpretable machine learning","volume":"15","author":"Nielsen","year":"2024","journal-title":"Nat. Commun."},{"key":"B29","doi-asserted-by":"publisher","first-page":"5258","DOI":"10.1109\/TNNLS.2024.3383672","article-title":"To combat multiclass imbalanced problems by aggregating evolutionary hierarchical classifiers","volume":"36","author":"Ning","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"B30","doi-asserted-by":"publisher","first-page":"395","DOI":"10.1016\/j.ins.2022.12.090","article-title":"Non-parallel bounded support matrix machine and its application in roller bearing fault diagnosis","volume":"624","author":"Pan","year":"2023","journal-title":"Inf. 
Sci."},{"key":"B31","doi-asserted-by":"publisher","first-page":"110886","DOI":"10.1016\/j.patcog.2024.110886","article-title":"Imbalanced ensemble learning leveraging a novel data-level diversity metric","volume":"157","author":"Pang","year":"2025","journal-title":"Pattern Recognit."},{"key":"B32","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1038\/s41591-021-01614-0","article-title":"AI in health and medicine","volume":"28","author":"Rajpurkar","year":"2022","journal-title":"Nat. Medicine"},{"key":"B33","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1007\/s11227-022-04679-x","article-title":"GAAE: a novel genetic algorithm based on autoencoder with ensemble classifiers for imbalanced healthcare data","volume":"79","author":"Ram","year":"2023","journal-title":"J. Supercomput."},{"key":"B34","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1016\/j.jneumeth.2017.12.005","article-title":"Ensemble of random forests one vs. rest classifiers for MCI and AD prediction using ANOVA cortical and subcortical feature selection and partial least squares","volume":"302","author":"Ram\u00edrez","year":"2018","journal-title":"J. Neuroscience Methods"},{"key":"B35","doi-asserted-by":"publisher","first-page":"103319","DOI":"10.1016\/j.engappai.2019.103319","article-title":"Improvement of bagging performance for classification of imbalanced datasets using evolutionary multi-objective optimization","volume":"87","author":"Roshan","year":"2020","journal-title":"Eng. Appl. Artif. Intell."},{"key":"B36","doi-asserted-by":"publisher","first-page":"119522","DOI":"10.1016\/j.eswa.2023.119522","article-title":"Imbalanced ensemble learning in determining Parkinson\u2019s disease using Keystroke dynamics","volume":"217","author":"Roy","year":"2023","journal-title":"Expert Syst. 
Appl."},{"key":"B37","doi-asserted-by":"publisher","first-page":"273","DOI":"10.1007\/s10462-024-10884-2","article-title":"Handling imbalanced medical datasets: review of a decade of research","volume":"57","author":"Salmi","year":"2024","journal-title":"Artif. Intelligence Review"},{"key":"B38","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1038\/s41746-024-01299-y","article-title":"A machine-learned model for predicting weight loss success using weight change features early in treatment","volume":"7","author":"Shahabi","year":"2024","journal-title":"Npj Digit. Med."},{"key":"B39","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1186\/s40537-020-00349-y","article-title":"Boosting methods for multi-class imbalanced data classification: an experimental review","volume":"7","author":"Tanha","year":"2020","journal-title":"J. Big Data"},{"key":"B40","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1613\/jair.120","article-title":"Cost-sensitive classification: empirical evaluation of a hybrid genetic decision tree induction algorithm","volume":"2","author":"Turney","year":"1994","journal-title":"J. Artificial Intelligence Research"},{"key":"B41","doi-asserted-by":"publisher","first-page":"351","DOI":"10.1186\/s12933-024-02439-0","article-title":"Construction of machine learning diagnostic models for cardiovascular pan-disease based on blood routine and biochemical detection data","volume":"23","author":"Wang","year":"2024","journal-title":"Cardiovasc. Diabetol."},{"key":"B42","doi-asserted-by":"publisher","first-page":"372","DOI":"10.1093\/cvr\/cvae018","article-title":"Networks of gut bacteria relate to cardiovascular disease in a multi-ethnic population: the HELIUS study","volume":"120","author":"Warmbrunn","year":"2024","journal-title":"Cardiovasc. 
Res."},{"key":"B43","doi-asserted-by":"publisher","first-page":"121269","DOI":"10.1016\/j.eswa.2023.121269","article-title":"A genetic Algorithm-based sequential instance selection framework for ensemble learning","volume":"236","author":"Xu","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"B44","doi-asserted-by":"publisher","first-page":"100709","DOI":"10.1016\/j.swevo.2020.100709","article-title":"A novel multi-objective genetic algorithm based error correcting output codes","volume":"57","author":"Zhang","year":"2020","journal-title":"Swarm Evol. Comput."},{"key":"B45","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1016\/j.inffus.2016.11.009","article-title":"One versus one multi-class classification fusion using optimizing decision directed acyclic graph for predicting listing status of companies","volume":"36","author":"Zhou","year":"2017","journal-title":"Inf. Fusion"},{"key":"B46","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ins.2021.04.038","article-title":"The design of dynamic ensemble selection strategy for the error-correcting output codes family","volume":"571","author":"Zou","year":"2021","journal-title":"Inf. 
Sci."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2026.1792643\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T06:56:55Z","timestamp":1773817015000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2026.1792643\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,18]]},"references-count":46,"alternative-id":["10.3389\/fbinf.2026.1792643"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2026.1792643","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,3,18]]},"article-number":"1792643"}}