{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T11:28:32Z","timestamp":1769599712169,"version":"3.49.0"},"reference-count":50,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2019,12,30]],"date-time":"2019-12-30T00:00:00Z","timestamp":1577664000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Swedish Institute scholarship granted to Kushan De Silva for studies at Lund University"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population.<\/jats:p><\/jats:sec><jats:sec><jats:title>Materials and Methods<\/jats:title><jats:p>We analyzed n\u2009=\u20096346 men and women enrolled in the National Health and Nutrition Examination Survey 2013\u20132014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned to training (n\u2009=\u20093174) and internal validation (n\u2009=\u20093172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied on 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n\u2009=\u20093172) and external validation data (n\u2009=\u20093000) prepared from National Health and Nutrition Examination Survey 2011\u20132012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). 
Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (\u2265 70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear\/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P\u2009&lt;\u20090.05).<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>Combined use of feature selection and machine learning increased predictive performance outperforming the recommended screening tool. A range of predictors of prediabetes was identified.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocz204","type":"journal-article","created":{"date-parts":[[2019,11,13]],"date-time":"2019-11-13T20:10:44Z","timestamp":1573675844000},"page":"396-406","source":"Crossref","is-referenced-by-count":39,"title":["A combined strategy of feature selection and machine learning to identify predictors of prediabetes"],"prefix":"10.1093","volume":"27","author":[{"given":"Kushan","family":"De Silva","sequence":"first","affiliation":[{"name":"Department of Clinical Sciences, Faculty of Medicine, Lund University, Lund, Sweden"},{"name":"Department of General Practice, School of Primary and Allied Health Care, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Notting Hill, 
Australia"}]},{"given":"Daniel","family":"J\u00f6nsson","sequence":"additional","affiliation":[{"name":"Department of Periodontology, Malm\u00f6 University, Malm\u00f6 and Swedish Dental Service of Skane, Lund, Sweden"}]},{"given":"Ryan T","family":"Demmer","sequence":"additional","affiliation":[{"name":"Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,12,30]]},"reference":[{"key":"2020110613104068500_ocz204-B1","doi-asserted-by":"crossref","first-page":"i5953.","DOI":"10.1136\/bmj.i5953","article-title":"Association between prediabetes and risk of cardiovascular disease and all-cause mortality: systematic review and meta-analysis","volume":"355","author":"Huang","year":"2016","journal-title":"BMJ"},{"issue":"11","key":"2020110613104068500_ocz204-B2","doi-asserted-by":"crossref","first-page":"2261","DOI":"10.1007\/s00125-014-3361-2","article-title":"Prediabetes and the risk of cancer: a meta-analysis","volume":"57","author":"Huang","year":"2014","journal-title":"Diabetologia"},{"issue":"4","key":"2020110613104068500_ocz204-B3","doi-asserted-by":"crossref","first-page":"751","DOI":"10.1016\/j.ecl.2016.06.007","article-title":"Prediabetes: a worldwide epidemic","volume":"45","author":"Edwards","year":"2016","journal-title":"Endocrinol Metab Clin North Am"},{"issue":"2","key":"2020110613104068500_ocz204-B4","doi-asserted-by":"crossref","first-page":"296","DOI":"10.4239\/wjd.v6.i2.296","article-title":"Prediabetes diagnosis and treatment: a review","volume":"6","author":"Bansal","year":"2015","journal-title":"World J Diabetes"},{"issue":"1","key":"2020110613104068500_ocz204-B5","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1186\/1478-7954-12-12","article-title":"Detecting type 2 diabetes and prediabetes among asymptomatic adults in the United States: modeling American Diabetes Association versus US Preventive Services Task 
Force diabetes screening guidelines","volume":"12","author":"Dall","year":"2014","journal-title":"Popul Health Metr"},{"key":"2020110613104068500_ocz204-B6","doi-asserted-by":"crossref","first-page":"g4485.","DOI":"10.1136\/bmj.g4485","article-title":"The epidemic of pre-diabetes: the medicine and the politics","volume":"349","author":"Yudkin","year":"2014","journal-title":"BMJ"},{"issue":"8","key":"2020110613104068500_ocz204-B7","doi-asserted-by":"crossref","first-page":"1468","DOI":"10.2337\/dc15-2113","article-title":"Prediabetes: are there problems with this label? Yes, the label creates further problems!","volume":"39","author":"Yudkin","year":"2016","journal-title":"Diabetes Care"},{"issue":"9833","key":"2020110613104068500_ocz204-B8","doi-asserted-by":"crossref","first-page":"2279","DOI":"10.1016\/S0140-6736(12)60283-9","article-title":"Prediabetes: a high-risk state for developing diabetes","volume":"379","author":"Tab\u00e1","year":"2012","journal-title":"Lancet"},{"issue":"12","key":"2020110613104068500_ocz204-B9","doi-asserted-by":"crossref","first-page":"1207.","DOI":"10.4239\/wjd.v6.i12.1207","article-title":"Treatment of prediabetes","volume":"6","author":"Kanat","year":"2015","journal-title":"World J Diabetes"},{"issue":"5","key":"2020110613104068500_ocz204-B10","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1159\/000492604","article-title":"A 12-month lifestyle intervention program improves body composition and reduces the prevalence of prediabetes in obese patients","volume":"11","author":"K\u00f6nig","year":"2018","journal-title":"Obes Facts"},{"issue":"5","key":"2020110613104068500_ocz204-B11","doi-asserted-by":"crossref","first-page":"393","DOI":"10.1016\/j.pcd.2018.07.003","article-title":"Effects of lifestyle changes on adults with prediabetes: a systematic review and meta-analysis","volume":"12","author":"Glechner","year":"2018","journal-title":"Prim Care 
Diabetes"},{"key":"2020110613104068500_ocz204-B12"},{"issue":"8","key":"2020110613104068500_ocz204-B13","doi-asserted-by":"crossref","first-page":"640","DOI":"10.1111\/pedi.12225","article-title":"Comparing glycemic indicators of prediabetes: a prospective study of obese Latino youth","volume":"16","author":"Kim","year":"2015","journal-title":"Pediatr Diabetes"},{"key":"2020110613104068500_ocz204-B14","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.csbj.2016.12.005","article-title":"Machine learning and data mining methods in diabetes research","volume":"15","author":"Kavakiotis","year":"2017","journal-title":"Comput Struct Biotechnol J"},{"key":"2020110613104068500_ocz204-B15","doi-asserted-by":"crossref","first-page":"345","DOI":"10.2147\/DMSO.S100074","article-title":"Novel biomarkers for prediabetes, diabetes, and associated complications","volume":"10","author":"Dorcely","year":"2017","journal-title":"DMSO"},{"issue":"5","key":"2020110613104068500_ocz204-B16","doi-asserted-by":"crossref","first-page":"1040","DOI":"10.2337\/dc07-1150","article-title":"Diabetes risk calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes","volume":"31","author":"Heikes","year":"2008","journal-title":"Diabetes Care"},{"issue":"9","key":"2020110613104068500_ocz204-B17","doi-asserted-by":"crossref","first-page":"1030","DOI":"10.1016\/j.jclinepi.2009.11.012","article-title":"A simple tool detected diabetes and prediabetes in rural Chinese","volume":"63","author":"Xin","year":"2010","journal-title":"J Clin Epidemiol"},{"issue":"13","key":"2020110613104068500_ocz204-B18","doi-asserted-by":"crossref","first-page":"1351","DOI":"10.1001\/jama.2013.393","article-title":"The inevitable application of big data to health 
care","volume":"309","author":"Murdoch","year":"2013","journal-title":"JAMA"},{"issue":"10","key":"2020110613104068500_ocz204-B19","doi-asserted-by":"crossref","first-page":"e0163942.","DOI":"10.1371\/journal.pone.0163942","article-title":"Prediction of incident diabetes in the Jackson Heart Study using high-dimensional machine learning","volume":"11","author":"Casanova","year":"2016","journal-title":"PLoS One"},{"issue":"1","key":"2020110613104068500_ocz204-B20","doi-asserted-by":"crossref","first-page":"103.","DOI":"10.1186\/1741-7015-9-103","article-title":"Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting","volume":"9","author":"Collins","year":"2011","journal-title":"BMC Med"},{"issue":"1","key":"2020110613104068500_ocz204-B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.diabres.2014.03.007","article-title":"Risk assessment tools for detecting those with pre-diabetes: a systematic review","volume":"105","author":"Barber","year":"2014","journal-title":"Diabetes Res Clin Pract"},{"issue":"1","key":"2020110613104068500_ocz204-B22","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1186\/s13040-017-0142-8","article-title":"EFS: an ensemble feature selection tool implemented as R-package and web-application","volume":"10","author":"Neumann","year":"2017","journal-title":"BioData Min"},{"issue":"2","key":"2020110613104068500_ocz204-B23","doi-asserted-by":"crossref","first-page":"224.","DOI":"10.7763\/IJMLC.2013.V3.307","article-title":"Addressing the class imbalance problem in medical datasets","volume":"3","author":"Rahman","year":"2013","journal-title":"IJMLC"},{"issue":"2","key":"2020110613104068500_ocz204-B24","doi-asserted-by":"crossref","first-page":"728","DOI":"10.1109\/JBHI.2014.2325615","article-title":"Rule extraction from support vector machines using ensemble learning approach: an application for diagnosis of 
diabetes","volume":"19","author":"Han","year":"2015","journal-title":"IEEE J Biomed Health Inform"},{"issue":"2-3","key":"2020110613104068500_ocz204-B25","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.neunet.2007.12.031","article-title":"Training neural network classifiers for medical decision making: the effects of imbalanced datasets on classification performance","volume":"21","author":"Mazurowski","year":"2008","journal-title":"Neural Netw"},{"key":"2020110613104068500_ocz204-B26"},{"key":"2020110613104068500_ocz204-B27"},{"key":"2020110613104068500_ocz204-B28","first-page":"S81","article-title":"Diagnosis and Classification of Diabetes Mellitus","volume":"37 (Suppl 1)","year":"2014","journal-title":"Diabetes Care"},{"key":"2020110613104068500_ocz204-B29","first-page":"1","article-title":"Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R","author":"Buuren","year":"2010","journal-title":"J Stat Softw"},{"issue":"16","key":"2020110613104068500_ocz204-B30","first-page":"95.","article-title":"A prediction model for the peripheral arterial disease using NHANES data","author":"Zhang","year":"2016","journal-title":"Medicine"},{"issue":"11","key":"2020110613104068500_ocz204-B31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v036.i11","article-title":"Feature selection with the Boruta package","volume":"36","author":"Kursa","year":"2010","journal-title":"J Stat Softw"},{"key":"2020110613104068500_ocz204-B32","volume-title":"Fselector: Selecting Attributes","author":"Romanski"},{"issue":"1","key":"2020110613104068500_ocz204-B33","doi-asserted-by":"crossref","first-page":"1.","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J Stat 
Softw"},{"key":"2020110613104068500_ocz204-B34","author":"Kuhn"},{"issue":"7","key":"2020110613104068500_ocz204-B35","doi-asserted-by":"crossref","first-page":"e0179805.","DOI":"10.1371\/journal.pone.0179805","article-title":"Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project","volume":"12","author":"Alghamdi","year":"2017","journal-title":"PLoS One"},{"key":"2020110613104068500_ocz204-B36"},{"key":"2020110613104068500_ocz204-B37","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J Artif Intell Res"},{"issue":"10","key":"2020110613104068500_ocz204-B38","article-title":"Evaluation measures for models assessment over imbalanced datasets","volume":"3","author":"Bekkar","year":"2013","journal-title":"J Inf Eng Appl"},{"key":"2020110613104068500_ocz204-B39","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1007\/978-0-387-09823-4_45","volume-title":"Data Mining and Knowledge Discovery Handbook","author":"Chawla","year":"2009"},{"issue":"1","key":"2020110613104068500_ocz204-B40","doi-asserted-by":"crossref","first-page":"26.","DOI":"10.1186\/s40537-017-0082-7","article-title":"Survey on clinical prediction models for diabetes prediction","volume":"4","author":"Jayanthi","year":"2017","journal-title":"J Big Data"},{"key":"2020110613104068500_ocz204-B41","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1016\/j.diabres.2016.06.022","article-title":"Comparison of screening scores for diabetes and prediabetes","volume":"118","author":"Poltavskiy","year":"2016","journal-title":"Diabetes Res Clin Pract"},{"issue":"1","key":"2020110613104068500_ocz204-B42","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1148\/radiology.143.1.7063747","article-title":"The meaning and use of the area under a receiver operating characteristic 
(ROC) curve","volume":"143","author":"Hanley","year":"1982","journal-title":"Radiology"},{"key":"2020110613104068500_ocz204-B43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.metabol.2017.08.014","article-title":"Lipidome as a predictive tool in progression to type 2 diabetes in Finnish men","volume":"78","author":"Suvitaival","year":"2018","journal-title":"Metab Clin Exp"},{"issue":"5","key":"2020110613104068500_ocz204-B44","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1016\/j.trsl.2012.12.013","article-title":"Inconsistency in albuminuria predictors in type 2 diabetes: a comparison between neural network and conditional logistic regression","volume":"161","author":"Morteza","year":"2013","journal-title":"Transl Res"},{"key":"2020110613104068500_ocz204-B45","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2014\/485353","article-title":"Screening for prediabetes using machine learning models","volume":"2014","author":"Choi","year":"2014","journal-title":"Comput Math Methods Med"},{"key":"2020110613104068500_ocz204-B46","volume-title":"Definition and Diagnosis of Diabetes Mellitus and Intermediate Hyperglycemia: Report of a WHO\/IDF Consultation","year":"2006"},{"issue":"1","key":"2020110613104068500_ocz204-B47","doi-asserted-by":"crossref","first-page":"67","DOI":"10.6339\/JDS.201601_14(1).0005","article-title":"Understanding variable effects from black box prediction: Quantifying effects in tree ensembles using partial dependence","volume":"14","author":"Cafri","year":"2016","journal-title":"J Data Sci"},{"issue":"1","key":"2020110613104068500_ocz204-B48","doi-asserted-by":"crossref","first-page":"e000169.","DOI":"10.1136\/bmjdrc-2015-000169","article-title":"Gender-related affecting factors of prediabetes on its 10-year outcome","volume":"4","author":"Song","year":"2016","journal-title":"BMJ Open Diabetes Res 
Care"},{"issue":"1","key":"2020110613104068500_ocz204-B49","doi-asserted-by":"crossref","first-page":"6","DOI":"10.1177\/1932296815620200","article-title":"Reverse engineering and evaluation of prediction models for progression to type 2 diabetes: an application of machine learning using electronic health records","volume":"10","author":"Anderson","year":"2016","journal-title":"J Diabetes Sci Technol"},{"issue":"10","key":"2020110613104068500_ocz204-B50","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1111\/j.1464-5491.2010.03065.x","article-title":"Prediction models for incident type 2 diabetes mellitus in the older population: KORA S4\/F4 cohort study","volume":"27","author":"Rathmann","year":"2010","journal-title":"Diabet Med"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/3\/396\/34152872\/ocz204.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/27\/3\/396\/34152872\/ocz204.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,5]],"date-time":"2022-10-05T00:09:58Z","timestamp":1664928598000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/27\/3\/396\/5691201"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,12,30]]},"references-count":50,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,12,30]]},"published-print":{"date-parts":[[2020,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocz204","relation":{},"ISSN":["1527-974X"],"issn-type":[{"value":"1527-974X","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2020,3]]},"published":{"date-parts":[[2019,12,30]]}}}