{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T08:11:13Z","timestamp":1771920673117,"version":"3.50.1"},"reference-count":38,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T00:00:00Z","timestamp":1768348800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["No. 62304057"],"award-info":[{"award-number":["No. 62304057"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:sec>\n                    <jats:title>Introduction<\/jats:title>\n                    <jats:p>Bacterial pneumonia remains a major global health challenge, and early pathogen identification is important for timely and targeted treatment. However, conventional microbiological diagnostics such as sputum or blood culture are labor-intensive and time-consuming.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Methods<\/jats:title>\n                    <jats:p>\n                      We propose an interpretable ensemble learning framework (PreBP) for rapid pathogen identification using routinely available complete blood count (CBC) parameters. We analyzed 1,334 CBC samples from patients with culture-confirmed bacterial pneumonia caused by four major pathogens:\n                      <jats:italic>Pseudomonas aeruginosa<\/jats:italic>\n                      ,\n                      <jats:italic>Escherichia coli<\/jats:italic>\n                      ,\n                      <jats:italic>Staphylococcus aureus<\/jats:italic>\n                      , and\n                      <jats:italic>Streptococcus<\/jats:italic>\n                      pneumoniae. Pathogen labels were determined based on clinical culture results. Five machine learning models (extreme gradient boosting (XGBoost), multilayer perceptron neural network (MLPNN), adaptive boosting (AdaBoost), random forest (RF), and extremely randomized trees (ExtraTrees)) were trained as comparators, and PreBP was developed with metaheuristic-optimized hyperparameters. Key CBC biomarkers were refined using a dual-phase feature selection strategy combining Lasso and Boruta. To enhance transparency, SHapley additive explanations (SHAP) were applied to provide both global biomarker importance and local, case-level explanations.\n                    <\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>PreBP achieved the best overall performance, with an AUC of 0.920, precision of 87.1%, and accuracy and sensitivity of 86.7%.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Discussion<\/jats:title>\n                    <jats:p>Because the framework relies on routine CBC measurements, it can generate interpretable predictions once CBC results are available, which may provide supplementary evidence for earlier pathogen-oriented clinical decision-making alongside culture-dependent workflows. Overall, PreBP offers an interpretable and computational approach for pathogen identification in bacterial pneumonia based on routine laboratory data.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.3389\/fbinf.2025.1769816","type":"journal-article","created":{"date-parts":[[2026,1,14]],"date-time":"2026-01-14T06:30:57Z","timestamp":1768372257000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["PreBP: an interpretable, optimized ensemble framework using routine complete blood count for rapid pathogen identification in bacterial pneumonia"],"prefix":"10.3389","volume":"5","author":[{"given":"Xiaoxi","family":"Hao","sequence":"first","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]}]},{"given":"Dingjian","family":"Liang","sequence":"additional","affiliation":[{"name":"Department of Radiation Oncology, Minzu Hospital of Guangxi Zhuang Autonomous Region, Affiliated Minzu Hospital of Guangxi Medical University","place":["Nanning, Guangxi, China"]}]},{"given":"Yimin","family":"Shen","sequence":"additional","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]}]},{"given":"Cuimin","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]},{"name":"Guangxi Colleges and Universities Key Laboratory of Multimedia Communications and Information Processing","place":["Nanning, China"]}]},{"given":"Wei","family":"Lan","sequence":"additional","affiliation":[{"name":"School of Computer, Electronics and Information, Guangxi University","place":["Nanning, China"]},{"name":"Guangxi Colleges and Universities Key Laboratory of Multimedia Communications and Information Processing","place":["Nanning, China"]}]}],"member":"1965","published-online":{"date-parts":[[2026,1,14]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"4165","DOI":"10.1007\/s10994-023-06353-6","article-title":"A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework","volume":"113","author":"Aguiar","year":"2023","journal-title":"Mach. Learn."},{"key":"B2","doi-asserted-by":"publisher","first-page":"e25884","DOI":"10.2196\/25884","article-title":"Machine learning approach to predicting COVID-19 disease severity based on clinical blood test data: statistical analysis and model development","volume":"9","author":"Aktar","year":"2021","journal-title":"JMIR Med. Inf."},{"key":"B3","doi-asserted-by":"publisher","DOI":"10.4081\/monaldi.2021.2050","article-title":"Association between neutrophil to lymphocyte ratio and mortality among community acquired pneumonia patients: a meta-analysis","volume":"92","author":"Alzoubi","year":"2021","journal-title":"Monaldi Arch. Chest Dis."},{"key":"B4","doi-asserted-by":"publisher","first-page":"629","DOI":"10.1016\/S0140-6736(21)02724-0","article-title":"Global burden of bacterial antimicrobial resistance in 2019: a systematic analysis","volume":"399","author":"Antimicrobial Resistance","year":"2022","journal-title":"Lancet"},{"key":"B5","doi-asserted-by":"publisher","first-page":"1610","DOI":"10.1016\/j.spinee.2020.10.006","article-title":"Fostering reproducibility and generalizability in machine learning for clinical prediction modeling in spine surgery","volume":"21","author":"Azad","year":"2021","journal-title":"Spine J."},{"key":"B6","doi-asserted-by":"publisher","first-page":"1032","DOI":"10.1007\/s11336-023-09914-9","article-title":"Comparing bayesian variable selection to lasso approaches for applications in psychology","volume":"88","author":"Bainter","year":"2023","journal-title":"Psychometrika"},{"key":"B7","doi-asserted-by":"publisher","first-page":"123667","DOI":"10.1016\/j.eswa.2024.123667","article-title":"Analysis and comparison of feature selection methods towards performance and stability","volume":"249","author":"Barbieri","year":"2024","journal-title":"Expert Syst. Appl."},{"key":"B8","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/S1473-3099(18)30605-4","article-title":"Attributable deaths and disability-adjusted life-years caused by infections with antibiotic-resistant bacteria in the EU and the European Economic area in 2015: a population-level modelling analysis","volume":"19","author":"Cassini","year":"2019","journal-title":"Lancet Infect. Dis."},{"key":"B9","doi-asserted-by":"publisher","first-page":"23742895211015343","DOI":"10.1177\/23742895211015343","article-title":"Educational case: staphylococcus aureus bacteremia: utilization of rapid diagnostics for bloodstream pathogen identification and prediction of antimicrobial susceptibility","volume":"8","author":"Castrodad-Rodriguez","year":"2021","journal-title":"Acad. Pathol."},{"key":"B10","doi-asserted-by":"publisher","first-page":"e1063","DOI":"10.1097\/CCM.0000000000005337","article-title":"Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021","volume":"49","author":"Evans","year":"2021","journal-title":"Crit. Care Med."},{"key":"B11","doi-asserted-by":"publisher","first-page":"3685","DOI":"10.1007\/s11135-022-01480-z","article-title":"Identification of the most important external features of highly cited scholarly papers through 3 (i.e., Ridge, Lasso, and Boruta) feature selection data mining methods","volume":"57","author":"Fahimifar","year":"2022","journal-title":"Qual. & Quantity"},{"key":"B12","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1007\/s13167-021-00252-3","article-title":"Diagnosing hospital bacteraemia in the framework of predictive, preventive and personalised medicine using electronic health records and machine learning classifiers","volume":"12","author":"Garnica","year":"2021","journal-title":"EPMA J."},{"key":"B13","doi-asserted-by":"publisher","first-page":"128682","DOI":"10.1016\/j.fuel.2023.128682","article-title":"Super learner approach to predict total organic carbon using stacking machine learning models based on well logs","volume":"353","author":"Goliatt","year":"2023","journal-title":"Fuel"},{"key":"B14","doi-asserted-by":"publisher","first-page":"104858","DOI":"10.1016\/j.ebiom.2023.104858","article-title":"A dual-process of targeted and unbiased nanopore sequencing enables accurate and rapid diagnosis of lower respiratory infections","volume":"98","author":"Guo","year":"2023","journal-title":"EBioMedicine"},{"key":"B15","doi-asserted-by":"publisher","first-page":"100254","DOI":"10.1016\/j.caeai.2024.100254","article-title":"XGBoost to enhance learner performance prediction","volume":"7","author":"Hakkal","year":"2024","journal-title":"Comput. Educ. Artif. Intell."},{"key":"B16","doi-asserted-by":"publisher","first-page":"1105","DOI":"10.1007\/s10115-022-01791-5","article-title":"Dynamic ensemble selection classification algorithm based on window over imbalanced drift data stream","volume":"65","author":"Han","year":"2022","journal-title":"Knowl. Inf. Syst."},{"key":"B17","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1080\/16078454.2021.1950898","article-title":"Hematological and biochemical parameters as diagnostic and prognostic markers in SARS-COV-2 infected patients of Pakistan: a retrospective comparative analysis","volume":"26","author":"Khalid","year":"2021","journal-title":"Hematology"},{"key":"B18","doi-asserted-by":"publisher","first-page":"697","DOI":"10.3389\/fmicb.2016.00697","article-title":"How to optimize the use of blood cultures for the diagnosis of bloodstream infections? A state-of-the art","volume":"7","author":"Lamy","year":"2016","journal-title":"Front. Microbiol."},{"key":"B19","doi-asserted-by":"publisher","first-page":"120456","DOI":"10.1016\/j.eswa.2023.120456","article-title":"Nonlocal low-rank plus deep denoising prior for robust image compressed sensing reconstruction","volume":"228","author":"Li","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"B20","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1186\/s12890-024-03252-x","article-title":"Interpretable mortality prediction model for ICU patients with pneumonia: using shapley additive explanation method","volume":"24","author":"Li","year":"2024","journal-title":"BMC Pulm. Med."},{"key":"B21","doi-asserted-by":"publisher","first-page":"288","DOI":"10.1186\/s12879-023-08262-4","article-title":"Comparison of presepsin and mid-regional Pro-adrenomedullin in the diagnosis of sepsis or septic shock: a systematic review and meta-analysis","volume":"23","author":"Liang","year":"2023","journal-title":"BMC Infect. Dis."},{"key":"B22","doi-asserted-by":"publisher","first-page":"18627","DOI":"10.1038\/s41598-024-69209-6","article-title":"Comparative analysis of feature selection techniques for COVID-19 dataset","volume":"14","author":"Mohtasham","year":"2024","journal-title":"Sci. Rep."},{"key":"B23","doi-asserted-by":"publisher","first-page":"821","DOI":"10.1111\/1469-0691.12719","article-title":"The difficult-to-control spread of carbapenemase producers among Enterobacteriaceae worldwide","volume":"20","author":"Nordmann","year":"2014","journal-title":"Clin. Microbiol. Infect."},{"key":"B24","doi-asserted-by":"publisher","first-page":"1152","DOI":"10.3390\/en16031152","article-title":"Review and comparison of genetic algorithm and particle swarm optimization in the optimal power flow problem","volume":"16","author":"Papazoglou","year":"2023","journal-title":"Energies"},{"key":"B25","doi-asserted-by":"publisher","first-page":"e70056","DOI":"10.1111\/cts.70056","article-title":"Practical guide to SHAP analysis: explaining supervised machine learning model predictions in drug development","volume":"17","author":"Ponce-Bobadilla","year":"2024","journal-title":"Clin. Transl. Sci."},{"key":"B26","doi-asserted-by":"publisher","first-page":"102437","DOI":"10.1016\/j.rineng.2024.102437","article-title":"Metaheuristic optimization algorithms for real-world electrical and civil engineering application: a review","volume":"23","author":"Rezk","year":"2024","journal-title":"Results Eng."},{"key":"B27","doi-asserted-by":"publisher","first-page":"108 e101","DOI":"10.1016\/j.cmi.2022.07.013","article-title":"Adults with symptoms of pneumonia: a prospective comparison of patients with and without infiltrates on chest radiography","volume":"29","author":"Rognvaldsson","year":"2023","journal-title":"Clin. Microbiol. Infect."},{"key":"B28","doi-asserted-by":"publisher","first-page":"646","DOI":"10.1016\/j.jiac.2023.03.001","article-title":"Adherence to use of blood cultures according to current national guidelines and their impact in patients with community acquired pneumonia: a retrospective cohort","volume":"29","author":"Ruiz-Gaviria","year":"2023","journal-title":"J. Infect. Chemother."},{"key":"B29","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1016\/j.imj.2023.05.003","article-title":"Pseudomonas aeruginosa: infections and novel approaches to treatment \u201cKnowing the enemy\u201d the threat of Pseudomonas aeruginosa and exploring novel approaches to treatment","volume":"2","author":"Sathe","year":"2023","journal-title":"Infect. Med. (Beijing)"},{"key":"B30","doi-asserted-by":"publisher","first-page":"4219","DOI":"10.1038\/s41467-021-24454-5","article-title":"Improved CRISPR genome editing using small highly active and specific engineered RNA-guided nucleases","volume":"12","author":"Schmidt","year":"2021","journal-title":"Nat. Commun."},{"key":"B31","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1186\/s12871-023-02317-4","article-title":"Metabolomics and machine learning approaches for diagnostic and prognostic biomarkers screening in sepsis","volume":"23","author":"She","year":"2023","journal-title":"BMC Anesthesiol."},{"key":"B32","doi-asserted-by":"publisher","first-page":"e13957","DOI":"10.1111\/eci.13957","article-title":"Multisite validation of a host response signature for predicting likelihood of bacterial and viral infections in patients with suspected influenza","volume":"53","author":"Shojaei","year":"2023","journal-title":"Eur. J. Clin. Invest"},{"key":"B33","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1007\/s42514-021-00080-x","article-title":"Computing infrastructure construction and optimization for high-performance computing and artificial intelligence","volume":"3","author":"Su","year":"2021","journal-title":"CCF Trans. High Perform. Comput."},{"key":"B34","doi-asserted-by":"publisher","first-page":"469","DOI":"10.1093\/cid\/ciac727","article-title":"Association between time to appropriate antimicrobial treatment and 30-day mortality in patients with bloodstream infections: a retrospective cohort study","volume":"76","author":"Van Heuverswyn","year":"2023","journal-title":"Clin. Infect. Dis."},{"key":"B35","doi-asserted-by":"publisher","first-page":"58","DOI":"10.53469\/jtpes.2024.04(01).08","article-title":"Diabetes risk analysis based on machine learning LASSO regression model","volume":"4","author":"Wang","year":"2024","journal-title":"J. Theory Pract. Eng. Sci."},{"key":"B36","doi-asserted-by":"publisher","first-page":"210","DOI":"10.54254\/2755-2721\/22\/20231219","article-title":"A systematic review: deep learning-based methods for pneumonia region detection","volume":"22","author":"Xu","year":"2023","journal-title":"Appl. Comput. Eng."},{"key":"B37","doi-asserted-by":"publisher","first-page":"7305","DOI":"10.1007\/s11227-022-04959-6","article-title":"Dung beetle optimizer: a new meta-heuristic algorithm for global optimization","volume":"79","author":"Xue","year":"2022","journal-title":"J. Supercomput."},{"key":"B38","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1186\/s12859-023-05300-5","article-title":"A diabetes prediction model based on Boruta feature selection and ensemble learning","volume":"24","author":"Zhou","year":"2023","journal-title":"BMC Bioinforma."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1769816\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,2,24]],"date-time":"2026-02-24T05:39:08Z","timestamp":1771911548000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2025.1769816\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,1,14]]},"references-count":38,"alternative-id":["10.3389\/fbinf.2025.1769816"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2025.1769816","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2026,1,14]]},"article-number":"1769816"}}