{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,19]],"date-time":"2026-02-19T23:08:31Z","timestamp":1771542511278,"version":"3.50.1"},"reference-count":84,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,2,3]],"date-time":"2021-02-03T00:00:00Z","timestamp":1612310400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,2,3]],"date-time":"2021-02-03T00:00:00Z","timestamp":1612310400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BioData Mining"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>Sepsis is a life-threatening clinical condition that happens when the patient\u2019s body has an excessive reaction to an infection, and should be treated in one hour. Due to the urgency of sepsis, doctors and physicians often do not have enough time to perform laboratory tests and analyses to help them forecast the consequences of the sepsis episode. In this context, machine learning can provide a fast computational prediction of sepsis severity, patient survival, and sequential organ failure by just analyzing the electronic health records of the patients. Also, machine learning can be employed to understand which features in the medical records are more predictive of sepsis severity, of patient survival, and of sequential organ failure in a fast and non-invasive way.<\/jats:p><\/jats:sec><jats:sec><jats:title>Dataset and methods<\/jats:title><jats:p>In this study, we analyzed a dataset of electronic health records of 364 patients collected between 2014 and 2016. 
The medical record of each patient has 29 clinical features, and includes a binary value for survival, a binary value for septic shock, and a numerical value for the sequential organ failure assessment (SOFA) score. We disjointly utilized each of these three factors as an independent target, and employed several machine learning methods to predict it (binary classifiers for survival and septic shock, and regression analysis for the SOFA score). Afterwards, we used a data mining approach to identify the most important dataset features in relation to each of the three targets separately, and compared these results with the results achieved through a standard biostatistics approach.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results and conclusions<\/jats:title><jats:p>Our results showed that machine learning can be employed efficiently to predict septic shock, SOFA score, and survival of patients diagnosed with sepsis, from their electronic health records data. And regarding clinical feature ranking, our results showed that Random Forests feature selection identified several unexpected symptoms and clinical components as relevant for septic shock, SOFA score, and survival. These discoveries can help doctors and physicians in understanding and predicting septic shock. 
We made the analyzed dataset and our developed software code publicly available online.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s13040-021-00235-0","type":"journal-article","created":{"date-parts":[[2021,2,3]],"date-time":"2021-02-03T18:04:14Z","timestamp":1612375454000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":18,"title":["Data analytics and clinical feature ranking of medical records of patients with sepsis"],"prefix":"10.1186","volume":"14","author":[{"given":"Davide","family":"Chicco","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8445-395X","authenticated-orcid":false,"given":"Luca","family":"Oneto","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,2,3]]},"reference":[{"issue":"8","key":"235_CR1","doi-asserted-by":"publisher","first-page":"801","DOI":"10.1001\/jama.2016.0287","volume":"315","author":"M Singer","year":"2016","unstructured":"Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, Bellomo R, Bernard GR, Chiche J-D, Coopersmith CM, Hotchkiss RS, Levy MM, Marshall JC, Martin GS, Opal SM, Rubenfeld GD, van der Poll T, Vincent J-L, Angus DC. The third international consensus definitions for sepsis and septic shock (Sepsis-3). J Am Med Assoc (JAMA). 2016; 315(8):801\u201310.","journal-title":"J Am Med Assoc (JAMA)"},{"issue":"1","key":"235_CR2","doi-asserted-by":"publisher","first-page":"89","DOI":"10.1016\/S0933-3657(01)00077-X","volume":"23","author":"I Kononenko","year":"2001","unstructured":"Kononenko I. Machine learning for medical diagnosis: history, state of the art and perspective. Artif Intell Med. 2001; 23(1):89\u2013109.","journal-title":"Artif Intell Med"},{"issue":"16","key":"235_CR3","first-page":"1","volume":"20","author":"D Chicco","year":"2020","unstructured":"Chicco D, Jurman G. 
Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Med Informat Decis Mak. 2020; 20(16):1\u201316.","journal-title":"BMC Med Informat Decis Mak"},{"key":"235_CR4","doi-asserted-by":"crossref","unstructured":"Shin S, Austin PC, Ross HJ, Abdel-Qadir H, Freitas C, Tomlinson G, Chicco D, Mahendiran M, Lawler PR, Billia F, Gramolini A, Epelman S, Wang B, Lee DS. Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality. ESC Heart Fail. 2020:1\u201310.","DOI":"10.1002\/ehf2.13073"},{"issue":"1","key":"235_CR5","doi-asserted-by":"publisher","first-page":"0208737","DOI":"10.1371\/journal.pone.0208737","volume":"14","author":"D Chicco","year":"2019","unstructured":"Chicco D, Rovelli C. Computational prediction of diagnosis and feature selection on mesothelioma patient health records. PLoS ONE. 2019; 14(1):0208737.","journal-title":"PLoS ONE"},{"issue":"12","key":"235_CR6","doi-asserted-by":"publisher","first-page":"347","DOI":"10.1186\/s12859-016-1194-3","volume":"17","author":"D Cangelosi","year":"2016","unstructured":"Cangelosi D, Pelassa S, Morini M, Conte M, Bosco MC, Eva A, Sementa AR, Varesio L. Artificial neural network classifier predicts neuroblastoma patients\u2019 outcome. BMC Bioinformatics. 2016; 17(12):347.","journal-title":"BMC Bioinformatics"},{"issue":"12","key":"235_CR7","doi-asserted-by":"publisher","first-page":"0208924","DOI":"10.1371\/journal.pone.0208924","volume":"13","author":"V Maggio","year":"2018","unstructured":"Maggio V, Chierici M, Jurman G, Furlanello C. Distillation of the clinical algorithm improves prognosis by multi-task deep learning in high-risk neuroblastoma. PLoS ONE. 
2018; 13(12):0208924.","journal-title":"PLoS ONE"},{"issue":"5992","key":"235_CR8","first-page":"1","volume":"11","author":"O Melaiu","year":"2020","unstructured":"Melaiu O, Chierici M, Lucarini V, Jurman G, Conti LA, Vito RD, Boldrini R, Cifaldi L, Castellano A, Furlanello C, Barnaba V, Locatelli F, Fruci D. Cellular and gene signatures of tumor-infiltrating dendritic cells and natural-killer cells predict prognosis of neuroblastoma. Nat Communi. 2020; 11(5992):1\u201315.","journal-title":"Nat Communi"},{"issue":"1","key":"235_CR9","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1186\/s12885-017-3877-1","volume":"18","author":"M Patr\u00edcio","year":"2018","unstructured":"Patr\u00edcio M, Pereira J, Cris\u00f3stomo J, Matafome P, Gomes M, Sei\u00e7a R, Caramelo F. Using Resistin, glucose, age and BMI to predict the presence of breast cancer. BMC Cancer. 2018; 18(1):29.","journal-title":"BMC Cancer"},{"issue":"2","key":"235_CR10","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1136\/amiajnl-2013-001815","volume":"21","author":"E Gultepe","year":"2013","unstructured":"Gultepe E, Green JP, Nguyen H, Adams J, Albertson T, Tagkopoulos I. From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. J Am Med Inform Assoc. 2013; 21(2):315\u201325.","journal-title":"J Am Med Inform Assoc"},{"issue":"1","key":"235_CR11","doi-asserted-by":"publisher","first-page":"11","DOI":"10.2196\/medinform.3445","volume":"3","author":"A Tsoukalas","year":"2015","unstructured":"Tsoukalas A, Albertson T, Tagkopoulos I. From data to optimal decision making: a data-driven, probabilistic machine learning approach to decision support for patients with sepsis. JMIR Med Inform. 
2015; 3(1):11.","journal-title":"JMIR Med Inform"},{"issue":"3","key":"235_CR12","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1111\/acem.12876","volume":"23","author":"RA Taylor","year":"2016","unstructured":"Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, Hall MK. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data\u2013driven, machine learning approach. Acad Emerg Med. 2016; 23(3):269\u201378.","journal-title":"Acad Emerg Med"},{"issue":"4","key":"235_CR13","doi-asserted-by":"publisher","first-page":"0174708","DOI":"10.1371\/journal.pone.0174708","volume":"12","author":"S Horng","year":"2017","unstructured":"Horng S, Sontag DA, Halpern Y, Jernite Y, Shapiro NI, Nathanson LA. Creating an automated trigger for sepsis clinical decision support at emergency department triage using machine learning. PLoS ONE. 2017; 12(4):0174708.","journal-title":"PLoS ONE"},{"issue":"1","key":"235_CR14","doi-asserted-by":"publisher","first-page":"000234","DOI":"10.1136\/bmjresp-2017-000234","volume":"4","author":"DW Shimabukuro","year":"2017","unstructured":"Shimabukuro DW, Barton CW, Feldman MD, Mataraso SJ, Das R. Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial. BMJ Open Respir Res. 2017; 4(1):000234.","journal-title":"BMJ Open Respir Res"},{"key":"235_CR15","first-page":"1","volume":"224014","author":"H Burdick","year":"2018","unstructured":"Burdick H, Pino E, Gabel-Comeau D, Gu C, Huang H, Lynn-Palevsky A, Das R. Evaluating a sepsis prediction machine learning algorithm in the emergency department and intensive care unit: a before and after comparative study. bioRxiv. 
2018; 224014:1\u201313.","journal-title":"bioRxiv"},{"issue":"1","key":"235_CR16","doi-asserted-by":"publisher","first-page":"20","DOI":"10.3390\/diagnostics9010020","volume":"9","author":"J Calvert","year":"2019","unstructured":"Calvert J, Saber N, Hoffman J, Das R. Machine-learning-based laboratory developed test for the diagnosis of sepsis in high-risk patients. Diagnostics. 2019; 9(1):20.","journal-title":"Diagnostics"},{"key":"235_CR17","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1016\/j.compbiomed.2019.04.027","volume":"109","author":"C Barton","year":"2019","unstructured":"Barton C, Chettipally U, Zhou Y, Jiang Z, Lynn-Palevsky A, Le S, Calvert J, Das R. Evaluation of a machine learning algorithm for up to 48-hour advance prediction of sepsis using six vital signs. Comput Biol Med. 2019; 109:79\u201384.","journal-title":"Comput Biol Med"},{"issue":"2","key":"235_CR18","doi-asserted-by":"publisher","first-page":"326","DOI":"10.1136\/amiajnl-2013-001854","volume":"21","author":"S Mani","year":"2014","unstructured":"Mani S, Ozdas A, Aliferis C, Varol HA, Chen Q, Carnevale R, Chen Y, Romano-Keeler J, Nian H, Weitkamp J-H. Medical decision support using machine learning for early detection of late-onset neonatal sepsis. J Am Med Inform Assoc. 2014; 21(2):326\u201336.","journal-title":"J Am Med Inform Assoc"},{"key":"235_CR19","volume-title":"American Journal of Respiratory and Critical Care Medicine","author":"C Barton","year":"2018","unstructured":"Barton C, Desautels T, Hoffman J, Mao Q, Jay M, Calvert J, Das R. Predicting pediatric severe sepsis with machine learning techniques. In: American Journal of Respiratory and Critical Care Medicine. New York: American Thoracic Society: 2018. p. 
A4282\u2013A4282."},{"issue":"2","key":"235_CR20","doi-asserted-by":"publisher","first-page":"0212665","DOI":"10.1371\/journal.pone.0212665","volume":"14","author":"AJ Masino","year":"2019","unstructured":"Masino AJ, Harris MC, Forsyth D, Ostapenko S, Srinivasan L, Bonafide CP, Balamuth F, Schmatz M, Grundmeier RW. Machine learning models for early sepsis recognition in the neonatal intensive care unit using readily available electronic health record data. PLoS ONE. 2019; 14(2):0212665.","journal-title":"PLoS ONE"},{"issue":"3","key":"235_CR21","first-page":"28","volume":"4","author":"T Desautels","year":"2016","unstructured":"Desautels T, Calvert J, Hoffman J, Jay M, Kerem Y, Shieh L, Shimabukuro D, Chettipally U, Feldman MD, Barton C, Wales DJ, Das R. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. J Med Internet Res. 2016; 4(3):28.","journal-title":"J Med Internet Res"},{"issue":"1","key":"235_CR22","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-020-73558-3","volume":"10","author":"D Chicco","year":"2020","unstructured":"Chicco D, Jurman G. Survival prediction of patients with sepsis from age, sex, and septic episode number alone. Sci Rep. 2020; 10(1):1\u201312.","journal-title":"Sci Rep"},{"issue":"3","key":"235_CR23","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1109\/51.932728","volume":"20","author":"GB Moody","year":"2001","unstructured":"Moody GB, Mark RG, Goldberger AL. PhysioNet: a web-based resource for the study of physiologic signals. IEEE Eng Med Biol Mag. 2001; 20(3):70\u20135.","journal-title":"IEEE Eng Med Biol Mag"},{"key":"235_CR24","unstructured":"PhysioNet. PhysioNet, the research resource for the physiologic signals. https:\/\/www.physionet.org. URL visited on 19th May 2019."},{"key":"235_CR25","unstructured":"PhysioNet. Early prediction of sepsis from clinical data: the PhysioNet\/Computing in Cardiology Challenge 2019. 
https:\/\/physionet.org\/challenge\/2019\/. URL visited on 19th May 2019."},{"key":"235_CR26","unstructured":"Dascena Inc. InSight by Dascena. https:\/\/www.dascena.com\/insight. URL visited on 19th May 2019."},{"key":"235_CR27","doi-asserted-by":"publisher","first-page":"160035","DOI":"10.1038\/sdata.2016.35","volume":"3","author":"AE Johnson","year":"2016","unstructured":"Johnson AE, Pollard TJ, Shen L, Li-wei HL, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data. 2016; 3:160035.","journal-title":"Sci Data"},{"issue":"7","key":"235_CR28","doi-asserted-by":"publisher","first-page":"0181001","DOI":"10.1371\/journal.pone.0181001","volume":"12","author":"T Ahmad","year":"2017","unstructured":"Ahmad T, Munir A, Bhatti SH, Aftab M, Raza MA. Survival analysis of heart failure patients: a case study. PLoS ONE. 2017; 12(7):0181001.","journal-title":"PLoS ONE"},{"issue":"11","key":"235_CR29","doi-asserted-by":"publisher","first-page":"0206527","DOI":"10.1371\/journal.pone.0206527","volume":"13","author":"I Yunus","year":"2018","unstructured":"Yunus I, Fasih A, Wang Y. The use of procalcitonin in the determination of severity of sepsis, patient outcomes and infection characteristics. PLoS ONE. 2018; 13(11):0206527.","journal-title":"PLoS ONE"},{"key":"235_CR30","doi-asserted-by":"publisher","unstructured":"Yunus I, Fasih A, Wang Y. The use of procalcitonin in the determination of severity of sepsis, patient outcomes and infection characteristics. S2 Table \u2013 Interpretation key. https:\/\/doi.org\/10.1371\/journal.pone.0206527.s002. URL visited on 7th February 2019.","DOI":"10.1371\/journal.pone.0206527.s002"},{"key":"235_CR31","doi-asserted-by":"publisher","unstructured":"Yunus I, Fasih A, Wang Y. The use of procalcitonin in the determination of severity of sepsis, patient outcomes and infection characteristics. S1 Table \u2013 Data collection sheet. 
https:\/\/doi.org\/10.1371\/journal.pone.0206527.s001. URL visited on 7th February 2019.","DOI":"10.1371\/journal.pone.0206527.s001"},{"key":"235_CR32","volume-title":"Deep Learning","author":"I Goodfellow","year":"2016","unstructured":"Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, Massachusetts, USA: MIT Press; 2016."},{"issue":"2","key":"235_CR33","first-page":"27","volume":"20","author":"ZF Lansdowne","year":"1996","unstructured":"Lansdowne ZF, Woodward BS. Applying the Borda ranking method. Air Force J Logist. 1996; 20(2):27\u20139.","journal-title":"Air Force J Logist"},{"issue":"2","key":"235_CR34","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","volume":"405","author":"BW Matthews","year":"1975","unstructured":"Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta (BBA) \u2013 Mol Basis Dis. 1975; 405(2):442\u201351.","journal-title":"Biochim Biophys Acta (BBA) \u2013 Mol Basis Dis"},{"issue":"1","key":"235_CR35","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s12864-019-6413-7","volume":"21","author":"D Chicco","year":"2020","unstructured":"Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020; 21(1):1\u201313.","journal-title":"BMC Genomics"},{"issue":"35","key":"235_CR36","first-page":"1","volume":"10","author":"D Chicco","year":"2017","unstructured":"Chicco D. Ten quick tips for machine learning in computational biology. BioData Min. 2017; 10(35):1\u201317.","journal-title":"BioData Min"},{"key":"235_CR37","volume-title":"Proceedings of IEEE BIBE 2013 \u2013 the 13th IEEE International Conference on BioInformatics and BioEngineering","author":"D Chicco","year":"2013","unstructured":"Chicco D, Masseroli M. A discrete optimization approach for SVD best truncation choice based on ROC curves. 
In: Proceedings of IEEE BIBE 2013 \u2013 the 13th IEEE International Conference on BioInformatics and BioEngineering. Chania: IEEE: 2013. p. 1\u20134."},{"issue":"8","key":"235_CR38","doi-asserted-by":"publisher","first-page":"855","DOI":"10.1016\/j.jclinepi.2015.02.010","volume":"68","author":"B Ozenne","year":"2015","unstructured":"Ozenne B, Subtil F, Maucort-Boulch D. The precision\u2013recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases. J Clin Epidemiol. 2015; 68(8):855\u20139.","journal-title":"J Clin Epidemiol"},{"key":"235_CR39","unstructured":"LaMorte WW. Screening for disease: positive and negative predictive value. 2016. http:\/\/sphweb.bumc.bu.edu\/otlt\/MPH-Modules\/EP\/EP713_Screening\/EP713_Screening5.html. URL visited on 3rd February 2020."},{"key":"235_CR40","first-page":"73","volume":"9","author":"AJ Onwuegbuzie","year":"1999","unstructured":"Onwuegbuzie AJ, Daniel LG. Uses and misuses of the correlation coefficient. Res Sch. 1999; 9:73\u201390.","journal-title":"Res Sch"},{"key":"235_CR41","doi-asserted-by":"crossref","unstructured":"Haynes W. Student\u2019s t-test. Encycl Syst Biol. 2013:2023\u20135.","DOI":"10.1007\/978-1-4419-9863-7_1184"},{"issue":"22","key":"235_CR42","doi-asserted-by":"publisher","first-page":"2369","DOI":"10.1001\/jama.2018.16627","volume":"320","author":"M Legrand","year":"2018","unstructured":"Legrand M, Kellum JA. Serum creatinine in the critically ill patient with sepsis. J Am Med Inform Assoc. 2018; 320(22):2369\u201370.","journal-title":"J Am Med Inform Assoc"},{"issue":"8","key":"235_CR43","doi-asserted-by":"publisher","first-page":"939","DOI":"10.1152\/ajprenal.00025.2013","volume":"307","author":"A Leelahavanichkul","year":"2014","unstructured":"Leelahavanichkul A, Souza ACP, Street JM, Hsu V, Tsuji T, Doi K, Li L, Hu X, Zhou H, Kumar P, et al. 
Comparison of serum creatinine and serum cystatin C as biomarkers to detect sepsis-induced acute kidney injury and to predict mortality in CD-1 mice. Am J Physiol Ren Physiol. 2014; 307(8):939\u201348.","journal-title":"Am J Physiol Ren Physiol"},{"issue":"8","key":"235_CR44","doi-asserted-by":"publisher","first-page":"0183156","DOI":"10.1371\/journal.pone.0183156","volume":"12","author":"HR Kang","year":"2017","unstructured":"Kang HR, Lee SN, Cho YJ, Jeon JS, Noh H, Han DC, Park S, Kwon SH. A decrease in serum creatinine after ICU admission is associated with increased mortality. PLoS ONE. 2017; 12(8):0183156.","journal-title":"PLoS ONE"},{"issue":"4","key":"235_CR45","first-page":"1","volume":"17","author":"AR Santana","year":"2013","unstructured":"Santana AR, de Sousa JL, Amorim FF, Menezes BM, Ara\u00fajo FVB, Soares FB, de Carvalho Santos LC, de Ara\u00fajo MPB, Rocha PHG, J\u00fanior PNF. SaO 2\/FiO 2 ratio as risk stratification for patients with sepsis. Crit Care. 2013; 17(4):1\u201359.","journal-title":"Crit Care"},{"issue":"1","key":"235_CR46","doi-asserted-by":"publisher","first-page":"77","DOI":"10.1186\/1471-2369-14-77","volume":"14","author":"J Maizel","year":"2013","unstructured":"Maizel J, Deransy R, Dehedin B, Secq E, Zogheib E, Lewandowski E, Tribouilloy C, Massy ZA, Choukroun G, Slama M. Impact of non-dialysis chronic kidney disease on survival in patients with septic shock. BMC Nephrology. 2013; 14(1):77.","journal-title":"BMC Nephrology"},{"key":"235_CR47","doi-asserted-by":"publisher","first-page":"13527","DOI":"10.1109\/ACCESS.2020.2966296","volume":"8","author":"B Pes","year":"2020","unstructured":"Pes B. Learning from high-dimensional biomedical datasets: the issue of class imbalance. IEEE Access. 
2020; 8:13527\u201340.","journal-title":"IEEE Access"},{"issue":"4","key":"235_CR48","doi-asserted-by":"publisher","first-page":"837","DOI":"10.1109\/TCBB.2014.2382127","volume":"12","author":"D Chicco","year":"2015","unstructured":"Chicco D, Masseroli M. Software suite for gene and protein annotation prediction and similarity search. IEEE\/ACM Trans Comput Biol Bioinforma. 2015; 12(4):837\u201343.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinforma"},{"key":"235_CR49","volume-title":"Statistical Learning Theory","author":"VN Vapnik","year":"1998","unstructured":"Vapnik VN. Statistical Learning Theory. New York, New York, USA: Wiley; 1998."},{"key":"235_CR50","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-14142-8","volume-title":"Data Mining: the Textbook","author":"CC Aggarwal","year":"2015","unstructured":"Aggarwal CC. Data Mining: the Textbook. Heidelberg, Germany: Springer; 2015."},{"issue":"10","key":"235_CR51","doi-asserted-by":"publisher","first-page":"1087","DOI":"10.1016\/j.jclinepi.2006.01.014","volume":"59","author":"ART Donders","year":"2006","unstructured":"Donders ART, Van Der Heijden GJ, Stijnen T, Moons KG. A gentle introduction to imputation of missing values. J Clin Epidemiol. 2006; 59(10):1087\u201391.","journal-title":"J Clin Epidemiol"},{"key":"235_CR52","doi-asserted-by":"publisher","first-page":"220","DOI":"10.1016\/j.eswa.2016.12.035","volume":"73","author":"G Haixiang","year":"2017","unstructured":"Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G. Learning from class-imbalanced data: review of methods and applications. Expert Syst Appl. 2017; 73:220\u201339.","journal-title":"Expert Syst Appl"},{"issue":"4","key":"235_CR53","doi-asserted-by":"publisher","first-page":"1252","DOI":"10.1002\/widm.1252","volume":"8","author":"L Oneto","year":"2018","unstructured":"Oneto L. Model selection and error estimation without the agonizing pain. Wiley Interdiscip Rev Data Min Knowl Discov. 
2018; 8(4):1252.","journal-title":"Wiley Interdiscip Rev Data Min Knowl Discov"},{"key":"235_CR54","first-page":"1157","volume":"3","author":"I Guyon","year":"2003","unstructured":"Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003; 3:1157\u201382.","journal-title":"J Mach Learn Res"},{"key":"235_CR55","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781107298019","volume-title":"Understanding Machine Learning: From Theory To Algorithms","author":"S Shalev-Shwartz","year":"2014","unstructured":"Shalev-Shwartz S, Ben-David S. Understanding Machine Learning: From Theory To Algorithms. Cambridge, England, United Kingdom: Cambridge University Press; 2014."},{"key":"235_CR56","volume-title":"Data Mining with Decision Trees: Theory and Applications","author":"L Rokach","year":"2008","unstructured":"Rokach L, Maimon OZ, Vol. 69. Data Mining with Decision Trees: Theory and Applications. Singapore: World Scientific; 2008."},{"issue":"1","key":"235_CR57","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001; 45(1):5\u201332.","journal-title":"Mach Learn"},{"key":"235_CR58","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511809682","volume-title":"Kernel Methods for Pattern Analysis","author":"J Shawe-Taylor","year":"2004","unstructured":"Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge, England, United Kingdom: Cambridge University Press; 2004."},{"key":"235_CR59","unstructured":"Scholkopf B. The kernel trick for distances. In: Advances in Neural Information Processing Systems: 2001. p. 301\u2013307."},{"issue":"7","key":"235_CR60","doi-asserted-by":"publisher","first-page":"1667","DOI":"10.1162\/089976603321891855","volume":"15","author":"SS Keerthi","year":"2003","unstructured":"Keerthi SS, Lin C-J. 
Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Comput. 2003; 15(7):1667\u201389.","journal-title":"Neural Comput"},{"key":"235_CR61","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198538493.001.0001","volume-title":"Neural Networks for Pattern Recognition","author":"CM Bishop","year":"1995","unstructured":"Bishop CM. Neural Networks for Pattern Recognition. Oxford, England, United Kingdom: Oxford University Press; 1995."},{"issue":"6","key":"235_CR62","doi-asserted-by":"publisher","first-page":"386","DOI":"10.1037\/h0042519","volume":"65","author":"F Rosenblatt","year":"1958","unstructured":"Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958; 65(6):386.","journal-title":"Psychol Rev"},{"issue":"3","key":"235_CR63","first-page":"1","volume":"5","author":"DE Rumelhart","year":"1988","unstructured":"Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Cogn Model. 1988; 5(3):1.","journal-title":"Cogn Model"},{"issue":"4","key":"235_CR64","doi-asserted-by":"publisher","first-page":"303","DOI":"10.1007\/BF02551274","volume":"2","author":"G Cybenko","year":"1989","unstructured":"Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Syst. 1989; 2(4):303\u201314.","journal-title":"Math Control Signals Syst"},{"key":"235_CR65","unstructured":"Rish I. An empirical study of the naive Bayes classifier. In: Proceedings of IJCAI 2001 \u2013 the 2001 International Joint Conference on Artificial Intelligence, Workshop on Empirical Methods in Artificial Intelligence: 2001. p. 41\u201346."},{"issue":"1","key":"235_CR66","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","volume":"13","author":"T Cover","year":"1967","unstructured":"Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory. 
1967; 13(1):21\u201327.","journal-title":"IEEE Trans Inf Theory"},{"key":"235_CR67","doi-asserted-by":"publisher","DOI":"10.1002\/9781118548387","volume-title":"Applied Logistic Regression","author":"DW Hosmer Jr","year":"2013","unstructured":"Hosmer Jr DW, Lemeshow S, Sturdivant RX, Vol. 398. Applied Logistic Regression. New York: John Wiley & Sons; 2013."},{"issue":"7553","key":"235_CR68","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553):436\u201344.","journal-title":"Nature"},{"issue":"16","key":"235_CR69","doi-asserted-by":"publisher","first-page":"2035","DOI":"10.1093\/bioinformatics\/btp363","volume":"25","author":"KF Kerr","year":"2009","unstructured":"Kerr KF. Comments on the analysis of unbalanced microarray data. Bioinformatics. 2009; 25(16):2035\u201341.","journal-title":"Bioinformatics"},{"issue":"3","key":"235_CR70","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1515\/jib-2011-177","volume":"8","author":"R Laza","year":"2011","unstructured":"Laza R, Pav\u00f3n R, Reboiro-Jato M, Fdez-Riverola F. Evaluating the effect of unbalanced data in biomedical document classification. J Integr Bioinforma. 2011; 8(3):105\u201317.","journal-title":"J Integr Bioinforma"},{"key":"235_CR71","volume-title":"Proceedings of BIBMW 2010 \u2013 the 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops","author":"K Han","year":"2010","unstructured":"Han K, Kim K-Z, Park T. Unbalanced sample size effect on the genome-wide population differentiation studies. In: Proceedings of BIBMW 2010 \u2013 the 2010 IEEE International Conference on Bioinformatics and Biomedicine Workshops. Hong Kong: IEEE: 2010. p. 347\u201352."},{"issue":"9","key":"235_CR72","first-page":"1263","volume":"21","author":"H He","year":"2008","unstructured":"He H, Garcia EA. Learning from imbalanced data. 
IEEE Trans Knowl Data Eng. 2008; 21(9):1263\u201384.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"235_CR73","volume-title":"Proceedings of IJCAI 1995 \u2013 the International Joint Conference on Artificial Intelligence","author":"R Kohavi","year":"1995","unstructured":"Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of IJCAI 1995 \u2013 the International Joint Conference on Artificial Intelligence. Montreal, Quebec, Canada: IJCAI: 1995. p. 1137\u201345."},{"key":"235_CR74","doi-asserted-by":"crossref","unstructured":"Saeys Y, Abeel T, Van de Peer Y. Robust feature selection using ensemble feature selection techniques. In: Proceedings of ECML PKDD 2008 \u2013 the 2008 Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer: 2008. p. 313\u201325.","DOI":"10.1007\/978-3-540-87481-2_21"},{"issue":"14","key":"235_CR75","doi-asserted-by":"publisher","first-page":"2225","DOI":"10.1016\/j.patrec.2010.03.014","volume":"31","author":"R Genuer","year":"2010","unstructured":"Genuer R, Poggi J-M, Tuleau-Malot C. Variable selection using random forests. Pattern Recognit Lett. 2010; 31(14):2225\u201336.","journal-title":"Pattern Recognit Lett"},{"key":"235_CR76","volume-title":"Ensemble Machine Learning","author":"Y Qi","year":"2012","unstructured":"Qi Y. Random forest for bioinformatics. In: Ensemble Machine Learning. Boston, Massachusetts, USA: Springer: 2012. p. 1\u201318."},{"issue":"1","key":"235_CR77","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/1471-2105-7-3","volume":"7","author":"R D\u00edaz-Uriarte","year":"2006","unstructured":"D\u00edaz-Uriarte R, De Andres SA. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 
2006; 7(1):3.","journal-title":"BMC Bioinformatics"},{"key":"235_CR78","volume-title":"Permutation Tests: a Practical Guide to Resampling Methods for Testing Hypotheses","author":"P Good","year":"2013","unstructured":"Good P. Permutation Tests: a Practical Guide to Resampling Methods for Testing Hypotheses. Heidelberg, Germany: Springer; 2013."},{"issue":"1","key":"235_CR79","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1093\/bib\/bbq011","volume":"12","author":"ML Calle","year":"2010","unstructured":"Calle ML, Urrea V. Letter to the editor: stability of random forest importance measures. Brief Bioinform. 2010; 12(1):86\u20139.","journal-title":"Brief Bioinform"},{"issue":"1","key":"235_CR80","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/1471-2105-15-8","volume":"15","author":"MB Kursa","year":"2014","unstructured":"Kursa MB. Robustness of Random Forest-based gene selection methods. BMC Bioinformatics. 2014; 15(1):8.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"235_CR81","doi-asserted-by":"publisher","first-page":"60","DOI":"10.1186\/s12859-016-0900-5","volume":"17","author":"H Wang","year":"2016","unstructured":"Wang H, Yang F, Luo Z. An experimental study of the intrinsic stability of random forest variable importance measures. BMC Bioinformatics. 2016; 17(1):60.","journal-title":"BMC Bioinformatics"},{"key":"235_CR82","volume-title":"Proceedings of the 2007 SIAM International Conference on Data Mining","author":"D Sculley","year":"2007","unstructured":"Sculley D. Rank aggregation for similar items. In: Proceedings of the 2007 SIAM International Conference on Data Mining. Santa Fe, New Mexico: Society for Industrial and Applied Mathematics (SIAM): 2007. p. 587\u201392."},{"issue":"309","key":"235_CR83","first-page":"320","volume":"60","author":"D Owen","year":"1965","unstructured":"Owen D. The power of Student\u2019s t-test. J Am Stat Assoc. 
1965; 60(309):320\u201333.","journal-title":"J Am Stat Assoc"},{"key":"235_CR84","volume-title":"Noise Reduction in Speech Processing","author":"J Benesty","year":"2009","unstructured":"Benesty J, Chen J, Huang Y, Cohen I. Pearson correlation coefficient. In: Noise Reduction in Speech Processing. Heidelberg, Germany: Springer: 2009. p. 1\u20134."}],"container-title":["BioData Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-021-00235-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13040-021-00235-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-021-00235-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,23]],"date-time":"2024-08-23T14:37:46Z","timestamp":1724423866000},"score":1,"resource":{"primary":{"URL":"https:\/\/biodatamining.biomedcentral.com\/articles\/10.1186\/s13040-021-00235-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,2,3]]},"references-count":84,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["235"],"URL":"https:\/\/doi.org\/10.1186\/s13040-021-00235-0","relation":{},"ISSN":["1756-0381"],"issn-type":[{"value":"1756-0381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,3]]},"assertion":[{"value":"18 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 January 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 February 2021","order":3,"name":"first_online","label":"First 
Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The original study containing the dataset analyzed in this manuscript was approved by the Institutional Review Board of the University of Illinois, College of Medicine at Peoria, Peoria, Illinois, USA.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"12"}}