{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,12]],"date-time":"2026-01-12T05:10:43Z","timestamp":1768194643127,"version":"3.49.0"},"reference-count":54,"publisher":"Public Library of Science (PLoS)","issue":"6","license":[{"start":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T00:00:00Z","timestamp":1656028800000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100010661","name":"Horizon 2020 Framework Programme","doi-asserted-by":"publisher","award":["NORTE-08-5369-FSE-000018"],"award-info":[{"award-number":["NORTE-08-5369-FSE-000018"]}],"id":[{"id":"10.13039\/100010661","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","award":["UID\/MAT\/00006\/2019"],"award-info":[{"award-number":["UID\/MAT\/00006\/2019"]}]},{"name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","award":["PTDC\/SAU-SER\/29180\/2017"],"award-info":[{"award-number":["PTDC\/SAU-SER\/29180\/2017"]}]}],"content-domain":{"domain":["www.plosone.org"],"crossmark-restriction":false},"short-container-title":["PLoS ONE"],"abstract":"<jats:p>Familial Hypercholesterolemia (FH) is an inherited disorder of cholesterol metabolism. Current criteria for FH diagnosis, like Simon Broome (SB) criteria, lead to high false positive rates. The aim of this work was to explore alternative classification procedures for FH diagnosis, based on different biological and biochemical indicators. For this purpose, logistic regression (LR), naive Bayes classifier (NB), random forest (RF) and extreme gradient boosting (XGB) algorithms were combined with Synthetic Minority Oversampling Technique (SMOTE), or threshold adjustment by maximizing Youden index (YI), and compared. Data was tested through a 10 \u00d7 10 repeated <jats:italic>k<\/jats:italic>-fold cross validation design. 
The LR model presented an overall better performance, as assessed by the areas under the receiver operating characteristics (AUROC) and precision-recall (AUPRC) curves, and several operating characteristics (OC), regardless of the strategy to cope with class imbalance. When adopting either data processing technique, significantly higher accuracy (<jats:italic>Acc<\/jats:italic>), <jats:italic>G<\/jats:italic>-mean and <jats:italic>F<\/jats:italic><jats:sub>1<\/jats:sub> score values were found for all classification algorithms, compared to SB criteria (<jats:italic>p<\/jats:italic> &lt; 0.01), revealing a more balanced predictive ability for both classes, and higher effectiveness in classifying FH patients. Adjustment of the cut-off values through pre or post-processing methods revealed a considerable gain in sensitivity (<jats:italic>Sens<\/jats:italic>) values (<jats:italic>p<\/jats:italic> &lt; 0.01). Although the performance of pre and post-processing strategies was similar, SMOTE does not cause model\u2019s parameters to lose interpretability. 
These results suggest a LR model combined with SMOTE can be an optimal approach to be used as a widespread screening tool.<\/jats:p>","DOI":"10.1371\/journal.pone.0269713","type":"journal-article","created":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T17:38:00Z","timestamp":1656092280000},"page":"e0269713","update-policy":"https:\/\/doi.org\/10.1371\/journal.pone.corrections_policy","source":"Crossref","is-referenced-by-count":18,"title":["Comparative study on the performance of different classification algorithms, combined with pre- and post-processing techniques to handle imbalanced data, in the diagnosis of adult patients with familial hypercholesterolemia"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9151-2121","authenticated-orcid":true,"given":"Jo\u00e3o","family":"Albuquerque","sequence":"first","affiliation":[]},{"given":"Ana Margarida","family":"Medeiros","sequence":"additional","affiliation":[]},{"given":"Ana Catarina","family":"Alves","sequence":"additional","affiliation":[]},{"given":"Mafalda","family":"Bourbon","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1257-2829","authenticated-orcid":true,"given":"Mar\u00edlia","family":"Antunes","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,6,24]]},"reference":[{"issue":"1","key":"pone.0269713.ref001","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/s40119-015-0037-z","article-title":"Familial hypercholesterolemia: a review of the natural history, diagnosis, and management","volume":"4","author":"O Najam","year":"2015","journal-title":"Cardiol Ther"},{"issue":"2","key":"pone.0269713.ref002","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1016\/j.ccl.2015.01.001","article-title":"Familial hypercholesterolemia","volume":"33","author":"VE Bouhairie","year":"2015","journal-title":"Cardiol 
Clin"},{"issue":"5","key":"pone.0269713.ref003","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1007\/s11886-017-0848-8","article-title":"Genetic architecture of familial hypercholesterolaemia","volume":"19","author":"M Sharifi","year":"2017","journal-title":"Curr Cardiol Rep"},{"issue":"5","key":"pone.0269713.ref004","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1093\/aje\/kwh236","article-title":"Genetic causes of monogenic heterozygous familial hypercholesterolemia: a HuGE prevalence review","volume":"160","author":"MA Austin","year":"2004","journal-title":"Am J Epidemiol"},{"issue":"1","key":"pone.0269713.ref005","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/s12929-016-0256-1","article-title":"The genetics and screening of familial hypercholesterolaemia","volume":"23","author":"R Henderson","year":"2016","journal-title":"J Biomed Sci"},{"issue":"1","key":"pone.0269713.ref006","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/nrdp.2017.93","article-title":"Familial hypercholesterolaemia","volume":"3","author":"JC Defesche","year":"2017","journal-title":"Nat Rev Dis Primers"},{"key":"pone.0269713.ref007","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1016\/j.atherosclerosis.2018.06.855","article-title":"Epidemiology of familial hypercholesterolaemia: community and clinical","volume":"277","author":"AJ Vallejo-Vaz","year":"2018","journal-title":"Atherosclerosis"},{"issue":"20","key":"pone.0269713.ref008","doi-asserted-by":"crossref","first-page":"2553","DOI":"10.1016\/j.jacc.2020.03.057","article-title":"Worldwide prevalence of familial hypercholesterolemia: meta-analyses of 11 million subjects","volume":"75","author":"SO Beheshti","year":"2020","journal-title":"J Am Coll Cardiol"},{"issue":"45","key":"pone.0269713.ref009","doi-asserted-by":"crossref","first-page":"3478","DOI":"10.1093\/eurheartj\/eht273","article-title":"Familial hypercholesterolaemia is underdiagnosed and undertreated in the general 
population: guidance for clinicians to prevent coronary heart disease: consensus statement of the European Atherosclerosis Society","volume":"34","author":"BG Nordestgaard","year":"2013","journal-title":"Eur Heart J"},{"issue":"13","key":"pone.0269713.ref010","doi-asserted-by":"crossref","first-page":"962","DOI":"10.1093\/eurheartj\/eht015","article-title":"Diagnosis and treatment of familial hypercholesterolaemia","volume":"34","author":"GK Hovingh","year":"2013","journal-title":"Eur Heart J"},{"key":"pone.0269713.ref011","doi-asserted-by":"crossref","first-page":"893","DOI":"10.1136\/bmj.303.6807.893","article-title":"Risk of fatal coronary heart disease in familial hypercholesterolaemia","volume":"303","author":"Register Scientific Steering Committee on behalf of the Simon Broome","year":"1991","journal-title":"BMJ"},{"issue":"1","key":"pone.0269713.ref012","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/j.atherosclerosis.2004.12.001","article-title":"The relationship of molecular genetic to clinical diagnosis of familial hypercholesterolemia in a Danish population","volume":"180","author":"D Damgaard","year":"2005","journal-title":"Atherosclerosis"},{"issue":"1","key":"pone.0269713.ref013","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1016\/j.atherosclerosis.2013.04.011","article-title":"Analysis of the frequency and spectrum of mutations recognised to cause familial hypercholesterolaemia in routine clinical practice in a UK specialist hospital lipid clinic","volume":"229","author":"M Futema","year":"2013","journal-title":"Atherosclerosis"},{"issue":"4","key":"pone.0269713.ref014","doi-asserted-by":"crossref","first-page":"1704","DOI":"10.1210\/jc.2017-02622","article-title":"A comparative analysis of phenotypic predictors of mutations in familial hypercholesterolemia","volume":"103","author":"DC Chan","year":"2018","journal-title":"J Clin Endocrinol 
Metab"},{"issue":"1","key":"pone.0269713.ref015","doi-asserted-by":"crossref","first-page":"e81998","DOI":"10.1371\/journal.pone.0081998","article-title":"Availability and quality of coronary heart disease family history in primary care medical records: implications for cardiovascular risk assessment","volume":"9","author":"P Dhiman","year":"2014","journal-title":"PLoS One"},{"key":"pone.0269713.ref016","doi-asserted-by":"crossref","DOI":"10.1002\/9781118548387","volume-title":"Applied logistic regression","author":"DW Hosmer","year":"2013","edition":"3"},{"issue":"4","key":"pone.0269713.ref017","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1080\/08839519308949993","article-title":"Inductive and Bayesian learning in medical diagnosis","volume":"7","author":"I Kononenko","year":"1993","journal-title":"Appl Artif Intell"},{"issue":"1","key":"pone.0269713.ref018","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"L Breiman","year":"2001","journal-title":"Mach Learn"},{"issue":"2","key":"pone.0269713.ref019","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"L Breiman","year":"1996","journal-title":"Mach Learn"},{"issue":"8","key":"pone.0269713.ref020","doi-asserted-by":"crossref","first-page":"832","DOI":"10.1109\/34.709601","article-title":"The random subspace method for constructing decision forests","volume":"20","author":"TK Ho","year":"1998","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"pone.0269713.ref021","doi-asserted-by":"crossref","unstructured":"Chen T, Guestrin C. Xgboost: A scalable tree boosting system. in: Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining. 
2016; 785\u2013794.","DOI":"10.1145\/2939672.2939785"},{"issue":"5","key":"pone.0269713.ref022","doi-asserted-by":"crossref","first-page":"1189","DOI":"10.1214\/aos\/1013203451","article-title":"Greedy function approximation: a gradient boosting machine","volume":"29","author":"JH Friedman","year":"2001","journal-title":"Ann Stat"},{"key":"pone.0269713.ref023","unstructured":"Ruder S. An overview of gradient descent optimization algorithms. arXiv preprint. 2016; arXiv:1609.04747."},{"key":"pone.0269713.ref024","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.ins.2013.07.007","article-title":"An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics","volume":"250","author":"V L\u00f3pez","year":"2013","journal-title":"Inf Sci"},{"issue":"4","key":"pone.0269713.ref025","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1007\/s13748-016-0094-0","article-title":"Learning from imbalanced data: open challenges and future directions","volume":"5","author":"B Krawczyk","year":"2016","journal-title":"Prog Artif Intell"},{"key":"pone.0269713.ref026","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"NV Chawla","year":"2002","journal-title":"J Artif Intell Res"},{"key":"pone.0269713.ref027","unstructured":"Branco P, Torgo L, Ribeiro R. A survey of predictive modelling under imbalanced distributions. arXiv preprint. 2015; arXiv:1505.01658."},{"key":"pone.0269713.ref028","unstructured":"Provost F. Machine learning from imbalanced data sets 101. in: Proceedings of the AAAI\u20192000 workshop on imbalanced data sets. 
2000; 68:1\u20133."},{"issue":"3","key":"pone.0269713.ref029","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1111\/j.1751-5823.2012.00183.x","article-title":"Assessing the performance of classification methods","volume":"80","author":"DJ Hand","year":"2012","journal-title":"Int Stat Rev"},{"issue":"8","key":"pone.0269713.ref030","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"T Fawcett","year":"2006","journal-title":"Pattern Recognit Lett"},{"key":"pone.0269713.ref031","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1155\/2017\/3762651","article-title":"Defining an optimal cut-point value in ROC analysis: an alternative approach","author":"I Unal","year":"2017","journal-title":"Comput Math Methods Med"},{"issue":"3","key":"pone.0269713.ref032","doi-asserted-by":"crossref","first-page":"297","DOI":"10.11613\/BM.2016.034","article-title":"On determining the most appropriate test cut-off value: the case of tests with continuous results","volume":"26","author":"F Habibzadeh","year":"2016","journal-title":"Biochem Med"},{"key":"pone.0269713.ref033","doi-asserted-by":"crossref","unstructured":"Davis J, Goadrich M. The relationship between Precision-Recall and ROC curves. in: Proceedings of the 23rd international conference on Machine learning. 
2006; 233\u2013240.","DOI":"10.1145\/1143844.1143874"},{"issue":"3","key":"pone.0269713.ref034","doi-asserted-by":"crossref","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"T Saito","year":"2015","journal-title":"PloS One"},{"issue":"8","key":"pone.0269713.ref035","first-page":"565","article-title":"Selection of individuals for genetic testing for familial hypercholesterolaemia: development and external validation of a prediction model for the presence of a mutation causing familial hypercholesterolaemia","volume":"38","author":"J Besseling","year":"2017","journal-title":"Eur Heart J"},{"issue":"2","key":"pone.0269713.ref036","first-page":"233","article-title":"Improving identification of familial hypercholesterolaemia in primary care: derivation and validation of the familial hypercholesterolaemia case ascertainment tool (FAMCAT)","volume":"238","author":"SF Weng","year":"2015","journal-title":"Atherosclerosis"},{"issue":"5","key":"pone.0269713.ref037","doi-asserted-by":"crossref","first-page":"e256","DOI":"10.1016\/S2468-2667(19)30061-1","article-title":"Detection of familial hypercholesterolaemia: external validation of the FAMCAT clinical case-finding algorithm to identify patients in primary care","volume":"4","author":"S Weng","year":"2019","journal-title":"Lancet Public Health"},{"issue":"1","key":"pone.0269713.ref038","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41746-019-0101-5","article-title":"Finding missed cases of familial hypercholesterolemia in health systems using machine learning","volume":"2","author":"JM Banda","year":"2019","journal-title":"NPJ Digit Med"},{"issue":"15","key":"pone.0269713.ref039","doi-asserted-by":"crossref","first-page":"1639","DOI":"10.1177\/2047487319898951","article-title":"Virtual genetic diagnosis for familial 
hypercholesterolemia powered by machine learning","volume":"27","author":"A Pina","year":"2020","journal-title":"Eur J Prev Cardiol"},{"issue":"1","key":"pone.0269713.ref040","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41746-020-00349-5","article-title":"Performance and clinical utility of supervised machine-learning approaches in detecting familial hypercholesterolaemia in primary care","volume":"3","author":"RK Akyea","year":"2020","journal-title":"NPJ Digit Med"},{"key":"pone.0269713.ref041","unstructured":"Niehaus KE, Banda JM, Knowles JW, Shah NH. FIND FH\u2014A phenotype model to identify patients with familial hypercholesterolemia. in: Proceedings of Data Mining for Medical Informatics Workshop. 2015."},{"issue":"2","key":"pone.0269713.ref042","doi-asserted-by":"crossref","first-page":"553","DOI":"10.1016\/j.atherosclerosis.2010.07.012","article-title":"Update of the Portuguese Familial Hypercholesterolaemia Study","volume":"212","author":"AM Medeiros","year":"2010","journal-title":"Atherosclerosis"},{"key":"pone.0269713.ref043","doi-asserted-by":"crossref","first-page":"e49","DOI":"10.1016\/j.atherosclerosis.2016.07.391","article-title":"E_LIPID: Characterization of the lipid profile in the Portuguese population","volume":"252","author":"C Mariano","year":"2016","journal-title":"Atherosclerosis"},{"issue":"11","key":"pone.0269713.ref044","doi-asserted-by":"crossref","first-page":"3956","DOI":"10.1210\/jc.2012-1563","article-title":"Familial hypercholesterolemia in the Danish general population: prevalence, coronary artery disease, and cholesterol-lowering medication","volume":"97","author":"M Benn","year":"2012","journal-title":"J Clin Endocrinol Metab"},{"issue":"12","key":"pone.0269713.ref045","doi-asserted-by":"crossref","first-page":"980","DOI":"10.1038\/gim.2015.14","article-title":"The importance of an integrated analysis of clinical, molecular, and functional data for the genetic diagnosis of familial 
hypercholesterolemia","volume":"17","author":"A Benito-Vicente","year":"2015","journal-title":"Genet Med"},{"issue":"2","key":"pone.0269713.ref046","first-page":"1","article-title":"Single versus multiple imputation methods applied to classify dyslipidemic patients concerning statin usage: a comparative performance study","volume":"2","author":"J Albuquerque","year":"2020","journal-title":"J Stat Health Dec"},{"issue":"7","key":"pone.0269713.ref047","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v074.i07","article-title":"Imputation with the R Package VIM","volume":"74","author":"A Kowarik","year":"2016","journal-title":"J Stat Softw"},{"issue":"1","key":"pone.0269713.ref048","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/2193-1801-2-222","article-title":"Principled missing data methods for researchers","volume":"2","author":"Y Dong","year":"2013","journal-title":"SpringerPlus"},{"issue":"5","key":"pone.0269713.ref049","doi-asserted-by":"crossref","first-page":"1763","DOI":"10.1213\/ANE.0000000000002864","article-title":"Correlation coefficients: appropriate use and interpretation","volume":"126","author":"P Schober","year":"2018","journal-title":"Anesth Analg"},{"key":"pone.0269713.ref050","first-page":"6","article-title":"randomForest: Breiman and Cutler\u2019s random forests for classification and regression","author":"A Liaw","year":"2015","journal-title":"R package version 4"},{"issue":"1","key":"pone.0269713.ref051","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.aci.2018.08.003","article-title":"Classification assessment methods","volume":"17","author":"A Tharwat","year":"2018","journal-title":"Appl Comput Inform"},{"issue":"4","key":"pone.0269713.ref052","doi-asserted-by":"crossref","first-page":"e203959","DOI":"10.1001\/jamanetworkopen.2020.3959","article-title":"Association of rare pathogenic DNA variants for familial hypercholesterolemia, hereditary breast and ovarian cancer syndrome, and lynch 
syndrome with disease risk in adults according to family history","volume":"3","author":"AP Patel","year":"2020","journal-title":"JAMA Netw Open"},{"issue":"12","key":"pone.0269713.ref053","doi-asserted-by":"crossref","first-page":"1373","DOI":"10.1016\/S0895-4356(96)00236-3","article-title":"A simulation study of the number of events per variable in logistic regression analysis","volume":"49","author":"P Peduzzi","year":"1996","journal-title":"J Clin Epidemiol"},{"issue":"3","key":"pone.0269713.ref054","doi-asserted-by":"crossref","first-page":"239","DOI":"10.1023\/A:1024068626366","article-title":"Inference for the generalization error","volume":"52","author":"C Nadeau","year":"2003","journal-title":"Mach Learn"}],"container-title":["PLOS ONE"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pone.0269713","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,6,24]],"date-time":"2022-06-24T17:38:57Z","timestamp":1656092337000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pone.0269713"}},"subtitle":[],"editor":[{"given":"Sotirios","family":"Koukoulas","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,6,24]]},"references-count":54,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2022,6,24]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pone.0269713","relation":{},"ISSN":["1932-6203"],"issn-type":[{"value":"1932-6203","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,6,24]]}}}