{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T03:12:48Z","timestamp":1777518768521,"version":"3.51.4"},"reference-count":121,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,9,10]],"date-time":"2024-09-10T00:00:00Z","timestamp":1725926400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>Machine learning (ML) has shown great promise in genetics and genomics where large and complex datasets have the potential to provide insight into many aspects of disease risk, pathogenesis of genetic disorders, and prediction of health and wellbeing. However, with this possibility there is a responsibility to exercise caution against biases and inflation of results that can have harmful unintended impacts. Therefore, researchers must understand the metrics used to evaluate ML models which can influence the critical interpretation of results. In this review we provide an overview of ML metrics for clustering, classification, and regression and highlight the advantages and disadvantages of each. We also detail common pitfalls that occur during model evaluation. 
Finally, we provide examples of how researchers can assess and utilise the results of ML models, specifically from a genomics perspective.<\/jats:p>","DOI":"10.3389\/fbinf.2024.1457619","type":"journal-article","created":{"date-parts":[[2024,9,10]],"date-time":"2024-09-10T09:21:11Z","timestamp":1725960071000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":88,"title":["A review of model evaluation metrics for machine learning in genetics and genomics"],"prefix":"10.3389","volume":"4","author":[{"given":"Catriona","family":"Miller","sequence":"first","affiliation":[]},{"given":"Theo","family":"Portlock","sequence":"additional","affiliation":[]},{"given":"Denis M.","family":"Nyaga","sequence":"additional","affiliation":[]},{"given":"Justin M.","family":"O\u2019Sullivan","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,9,10]]},"reference":[{"key":"B1","unstructured":"Ali, M. (2020). PyCaret: an open source, low-code machine learning library in Python"},{"key":"B2","doi-asserted-by":"publisher","first-page":"7781","DOI":"10.3390\/ijms24097781","article-title":"Machine learning models for the identification of prognostic and predictive cancer biomarkers: a systematic review","volume":"24","author":"Al-Tashi","year":"2023","journal-title":"Int. J. Mol. Sci."},{"key":"B3","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1007\/s41666-018-0029-6","article-title":"Nearest consensus clustering classification to identify subclasses and predict disease","volume":"2","author":"Alyousef","year":"2018","journal-title":"J. Healthc. Inf. 
Res."},{"key":"B4","doi-asserted-by":"publisher","first-page":"1885","DOI":"10.1182\/blood.2020010603","article-title":"Machine learning integrates genomic signatures for subclassification beyond primary and secondary acute myeloid leukemia","volume":"138","author":"Awada","year":"2021","journal-title":"Blood"},{"key":"B5","doi-asserted-by":"crossref","DOI":"10.1109\/ELNANO.2017.7939756","article-title":"Criterial analysis of gene expression sequences to create the objective clustering inductive technology","volume-title":"2017 IEEE 37th international conference on electronics and nanotechnology (ELNANO)","author":"Babichev","year":"2017"},{"key":"B6","doi-asserted-by":"publisher","first-page":"412","DOI":"10.1093\/bioinformatics\/16.5.412","article-title":"Assessing the accuracy of prediction algorithms for classification: an overview","volume":"16","author":"Baldi","year":"2000","journal-title":"Bioinformatics"},{"key":"B7","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1109\/tcbb.2023.3343808","article-title":"Genomic machine learning meta-regression: insights on associations of study features with reported model performance","volume":"21","author":"Barnett","year":"2023","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"B8","doi-asserted-by":"publisher","first-page":"809","DOI":"10.1534\/genetics.118.301298","article-title":"Can deep learning improve genomic prediction of complex human traits?","volume":"210","author":"Bellot","year":"2018","journal-title":"Genetics"},{"key":"B9","doi-asserted-by":"publisher","first-page":"825","DOI":"10.1016\/j.eswa.2006.10.022","article-title":"Comparison of classification accuracy using Cohen\u2019s Weighted Kappa","volume":"34","author":"Ben-David","year":"2008","journal-title":"Expert Syst. 
Appl."},{"key":"B10","doi-asserted-by":"publisher","first-page":"15790","DOI":"10.1038\/s41598-019-52134-4","article-title":"Prediction and analysis of skin cancer progression using genomics profiles of patients","volume":"9","author":"Bhalla","year":"2019","journal-title":"Sci. Rep."},{"key":"B11","doi-asserted-by":"publisher","first-page":"118","DOI":"10.2307\/270820","article-title":"Robustness in regression analysis","volume":"3","author":"Bohrnstedt","year":"1971","journal-title":"Sociol. Methodol."},{"key":"B12","doi-asserted-by":"publisher","first-page":"1121","DOI":"10.1007\/s10803-014-2268-6","article-title":"Applying machine learning to facilitate autism diagnostics: pitfalls and promises","volume":"45","author":"Bone","year":"2015","journal-title":"J. Autism Dev. Disord."},{"key":"B13","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1038\/s41380-020-0825-2","article-title":"Machine learning for genetic prediction of psychiatric disorders: a systematic review","volume":"26","author":"Bracher-Smith","year":"2021","journal-title":"Mol. Psychiatry"},{"key":"B14","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1080\/03610927408827101","article-title":"A dendrite method for cluster analysis","volume":"3","author":"Cali\u00f1ski","year":"1974","journal-title":"Commun. Statistics"},{"key":"B15","doi-asserted-by":"publisher","first-page":"2267","DOI":"10.3390\/ijms19082267","article-title":"ClusterMI: detecting high-order SNP interactions based on clustering and mutual information","volume":"19","author":"Cao","year":"2018","journal-title":"Int. J. Mol. Sci."},{"key":"B16","doi-asserted-by":"publisher","first-page":"5762","DOI":"10.1016\/j.csbj.2021.10.009","article-title":"AI applications in functional genomics","volume":"19","author":"Caudai","year":"2021","journal-title":"Comput. Struct. Biotechnol. 
J."},{"key":"B17","doi-asserted-by":"publisher","first-page":"140","DOI":"10.1080\/10408363.2023.2259466","article-title":"Emerging applications of machine learning in genomic medicine and healthcare","volume":"61","author":"Chafai","year":"2024","journal-title":"Crit. Rev. Clin. Lab. Sci."},{"key":"B18","doi-asserted-by":"publisher","first-page":"1247","DOI":"10.5194\/gmd-7-1247-2014","article-title":"Root mean square error (RMSE) or mean absolute error (MAE)? -Arguments against avoiding RMSE in the literature","volume":"7","author":"Chai","year":"2014","journal-title":"Geosci. Model Dev."},{"key":"B19","doi-asserted-by":"crossref","unstructured":"Chekroud, A. M., Hawrilenko, M., Loho, H., Bondar, J., Gueorguieva, R., Hasan, A. (2024). Illusory generalizability of clinical prediction models. 383, 164-167. doi:10.1126\/science.adg8538","DOI":"10.1126\/science.adg8538"},{"key":"B20","doi-asserted-by":"publisher","first-page":"532","DOI":"10.1007\/s11481-018-9811-8","article-title":"Prediction of schizophrenia diagnosis by integration of genetically correlated conditions and traits","volume":"13","author":"Chen","year":"2018","journal-title":"J. Neuroimmune Pharmacol."},{"key":"B21","doi-asserted-by":"publisher","first-page":"940","DOI":"10.1038\/s41588-022-01102-2","article-title":"A sequence-based global map of regulatory activity for deciphering human genetics","volume":"54","author":"Chen","year":"2022","journal-title":"Nat. 
Genet."},{"key":"B22","doi-asserted-by":"publisher","first-page":"8869","DOI":"10.1109\/ACCESS.2017.2694446","article-title":"Disease prediction by machine learning over big data from healthcare communities","volume":"5","author":"Chen","year":"2017","journal-title":"IEEE Access"},{"key":"B23","doi-asserted-by":"publisher","first-page":"130698","DOI":"10.1109\/ACCESS.2021.3114099","article-title":"An overview of fairness in clustering","volume":"9","author":"Chhabra","year":"2021","journal-title":"IEEE Access"},{"key":"B24","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1186\/s12864-019-6413-7","article-title":"The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation","volume":"21","author":"Chicco","year":"2020","journal-title":"BMC Genomics"},{"key":"B25","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/s13040-023-00322-4","article-title":"The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification","volume":"16","author":"Chicco","year":"2023","journal-title":"BioData Min."},{"key":"B26","doi-asserted-by":"publisher","first-page":"736","DOI":"10.3390\/genes12050736","article-title":"Statistical learning methods applicable to genome-wide association studies on unbalanced case-control disease data","volume":"12","author":"Dai","year":"2021","journal-title":"Genes (Basel)"},{"key":"B27","doi-asserted-by":"publisher","first-page":"224","DOI":"10.1109\/TPAMI.1979.4766909","article-title":"A cluster separation measure","volume":"1","author":"Davies","year":"1979","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell. 
PAMI-"},{"key":"B28","doi-asserted-by":"publisher","first-page":"e0222916","DOI":"10.1371\/journal.pone.0222916","article-title":"Why Cohen\u2019s Kappa should be avoided as performance measure in classification","volume":"14","author":"Delgado","year":"2019","journal-title":"PLoS One"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1545","DOI":"10.1534\/genetics.109.104935","article-title":"Reliability of genomic predictions across multiple populations","volume":"183","author":"De Roos","year":"2009","journal-title":"Genetics"},{"key":"B30","doi-asserted-by":"publisher","first-page":"112866","DOI":"10.1016\/j.eswa.2019.112866","article-title":"Unbalanced breast cancer data classification using novel fitness functions in genetic programming","volume":"140","author":"Devarriya","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"B31","doi-asserted-by":"publisher","first-page":"313","DOI":"10.3390\/genes14020313","article-title":"Using machine learning to explore shared genetic pathways and possible endophenotypes in autism spectrum disorder","volume":"14","author":"Di Giovanni","year":"2023","journal-title":"Genes (Basel)"},{"key":"B32","doi-asserted-by":"publisher","first-page":"19","DOI":"10.1002\/cem.1189","article-title":"Use of cluster separation indices and the influence of outliers: application of two new separation indices, the modified silhouette index and the overlap coefficient to simulated data and mouse urine metabolomic profiles","volume":"23","author":"Dixon","year":"2009","journal-title":"J. Chemom."},{"key":"B33","doi-asserted-by":"publisher","first-page":"1283","DOI":"10.1093\/ije\/dyab046","article-title":"Genetic risk scores for cardiometabolic traits in sub-Saharan African populations","volume":"50","author":"Ekoru","year":"2021","journal-title":"Int. J. 
Epidemiol."},{"key":"B34","doi-asserted-by":"publisher","first-page":"513","DOI":"10.2214\/AJR.18.20490","article-title":"Artificial intelligence for medical image analysis: a guide for authors and reviewers","volume":"212","author":"England","year":"2019","journal-title":"Am. J. Roentgenol."},{"key":"B35","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/s13023-020-01374-z","article-title":"Diagnosis support systems for rare diseases: a scoping review","volume":"15","author":"Faviez","year":"2020","journal-title":"Orphanet J. Rare Dis."},{"key":"B36","doi-asserted-by":"publisher","first-page":"553","DOI":"10.2307\/2288117","article-title":"A method for comparing two hierarchical clusterings","volume":"78","author":"Fowlkes","year":"1983","journal-title":"J. Am. Stat. Assoc."},{"key":"B37","doi-asserted-by":"publisher","first-page":"2021","DOI":"10.1155\/2021\/6663455","article-title":"Improving the accuracy for analyzing heart diseases prediction based on the ensemble method","volume":"2021","author":"Gao","year":"2021","journal-title":"Complexity"},{"key":"B38","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1162\/neco.1992.4.1.1","article-title":"Neural networks and the bias\/variance dilemma","volume":"4","author":"Geman","year":"1992","journal-title":"Neural comput."},{"key":"B39","doi-asserted-by":"publisher","first-page":"917","DOI":"10.1186\/s12864-017-4273-6","article-title":"Higher recall in metagenomic sequence classification exploiting overlapping reads","volume":"18","author":"Girotto","year":"2017","journal-title":"BMC Genomics"},{"key":"B40","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1109\/I2CT51068.2021.9418099","article-title":"Recall-based machine learning approach for early detection of cervical cancer","volume-title":"2021 6th international conference for convergence in technology 
(I2CT)","author":"Gupta","year":"2021"},{"key":"B41","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1186\/s11689-022-09438-w","article-title":"Bringing machine learning to research on intellectual and developmental disabilities: taking inspiration from neurological diseases","volume":"14","author":"Gupta","year":"2022","journal-title":"J. Neurodev. Disord."},{"key":"B42","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1038\/s41576-019-0144-0","article-title":"Genomics of disease risk in globally diverse populations","volume":"20","author":"Gurdasani","year":"2019","journal-title":"Nat. Rev. Genet."},{"key":"B43","doi-asserted-by":"publisher","first-page":"14738","DOI":"10.1038\/s41598-017-15137-7","article-title":"Development of multivariable models to predict change in Body Mass Index within a clinical trial population of psychotic individuals","volume":"7","author":"Harrison","year":"2017","journal-title":"Sci. Rep."},{"key":"B44","doi-asserted-by":"publisher","first-page":"504","DOI":"10.1186\/s12887-022-03554-1","article-title":"Assessing whether genetic scores explain extra variation in birthweight, when added to clinical and anthropometric measures","volume":"22","author":"Haulder","year":"2022","journal-title":"BMC Pediatr."},{"key":"B45","doi-asserted-by":"publisher","first-page":"1132","DOI":"10.1038\/s41592-021-01256-7","article-title":"Reproducibility standards for machine learning in the life sciences","volume":"18","author":"Heil","year":"2021","journal-title":"Nat. Methods"},{"key":"B46","doi-asserted-by":"publisher","first-page":"785436","DOI":"10.3389\/fgene.2021.785436","article-title":"Machine learning identifies six genetic variants and alterations in the heart atrial appendage as key contributors to PD risk predictivity","volume":"12","author":"Ho","year":"2022","journal-title":"Front. 
Genet."},{"key":"B47","doi-asserted-by":"publisher","first-page":"267","DOI":"10.3389\/fgene.2019.00267","article-title":"Machine learning SNP based prediction for precision medicine","volume":"10","author":"Ho","year":"2019","journal-title":"Front. Genet."},{"key":"B48","doi-asserted-by":"publisher","first-page":"5481","DOI":"10.5194\/gmd-15-5481-2022","article-title":"Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not","volume":"15","author":"Hodson","year":"2022","journal-title":"Geosci. Model Dev."},{"key":"B49","doi-asserted-by":"publisher","first-page":"2641","DOI":"10.1093\/jamia\/ocab203","article-title":"Deep significance clustering: a novel approach for identifying risk-stratified and predictive patient subgroups","volume":"28","author":"Huang","year":"2021","journal-title":"J. Am. Med. Inf. Assoc."},{"key":"B50","doi-asserted-by":"publisher","first-page":"193","DOI":"10.1007\/bf01908075","article-title":"Comparing partitions","volume":"2","author":"Hubert","year":"1985","journal-title":"J. 
Classif."},{"key":"B51","doi-asserted-by":"publisher","first-page":"450","DOI":"10.1097\/EDE.0b013e31821b506e","article-title":"The false-positive to false-negative ratio in epidemiologic studies","volume":"22","author":"Ioannidis","year":"2011","journal-title":"Epidemiology"},{"key":"B52","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-15-S2-S2","article-title":"On the selection of appropriate distances for gene expression data clustering","volume":"15","author":"Jaskowiak","year":"2014","journal-title":"BMC Bioinforma."},{"key":"B53","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1109\/ACII.2013.47","article-title":"Facing imbalanced data - recommendations for the use of performance metrics","volume-title":"Proceedings - 2013 humaine association conference on affective computing and intelligent interaction, ACII 2013","author":"Jeni","year":"2013"},{"key":"B54","unstructured":"Kapoor, S., Cantrell, E., Peng, K., Pham, T. H., Bail, C. A., Gundersen, O. E. (2023). REFORMS: reporting standards for machine learning based science"},{"key":"B55","doi-asserted-by":"publisher","first-page":"100804","DOI":"10.1016\/j.patter.2023.100804","article-title":"Leakage and the reproducibility crisis in machine-learning-based science","volume":"4","author":"Kapoor","year":"2023","journal-title":"Patterns"},{"key":"B56","doi-asserted-by":"publisher","first-page":"609","DOI":"10.1016\/j.ins.2021.11.036","article-title":"Root mean square error or mean absolute error? 
Use their ratio as well","volume":"585","author":"Karunasingha","year":"2022","journal-title":"Inf. Sci. (N Y)"},{"key":"B57","doi-asserted-by":"publisher","first-page":"85","DOI":"10.1186\/s13073-021-00902-1","article-title":"Integrative statistical analyses of multiple liquid biopsy analytes in metastatic breast cancer","volume":"13","author":"Keup","year":"2021","journal-title":"Genome Med."},{"key":"B58","doi-asserted-by":"publisher","first-page":"1466","DOI":"10.1016\/j.csbj.2020.06.017","article-title":"Deep learning models in genomics; are we there yet?","volume":"18","author":"Koumakis","year":"2020","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"B59","doi-asserted-by":"publisher","first-page":"104431","DOI":"10.1016\/j.compbiomed.2021.104431","article-title":"Comparison of various approaches to combine logistic regression with genetic algorithms in survival prediction of hepatocellular carcinoma","volume":"134","author":"Ksi\u0105\u017cek","year":"2021","journal-title":"Comput. Biol. Med."},{"key":"B60","doi-asserted-by":"publisher","first-page":"bbac611","DOI":"10.1093\/bib\/bbac611","article-title":"A comparison between similarity matrices for principal component analysis to assess population stratification in sequenced genetic data sets","volume":"24","author":"Lee","year":"2023","journal-title":"Brief. Bioinform"},{"key":"B61","doi-asserted-by":"publisher","first-page":"3485","DOI":"10.1038\/s41598-020-60595-1","article-title":"Prediction of Alzheimer\u2019s disease using blood gene expression data","volume":"10","author":"Lee","year":"2020","journal-title":"Sci. 
Rep."},{"key":"B62","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1534\/genetics.118.301267","article-title":"Accurate genomic prediction of human height","volume":"210","author":"Lello","year":"2018","journal-title":"Genetics"},{"key":"B63","doi-asserted-by":"publisher","first-page":"737","DOI":"10.1080\/01621459.2012.688462","article-title":"DD-classifier: nonparametric classification procedure based on DD-plot","volume":"107","author":"Li","year":"2012","journal-title":"J. Am. Stat. Assoc."},{"key":"B64","doi-asserted-by":"publisher","first-page":"183","DOI":"10.1111\/biom.12962","article-title":"Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data","volume":"75","author":"Li","year":"2019","journal-title":"Biometrics"},{"key":"B65","doi-asserted-by":"publisher","first-page":"5416","DOI":"10.1007\/s10489-022-03657-3","article-title":"Uncertainty measurement for a gene space based on class-consistent technology: an application in gene selection","volume":"53","author":"Li","year":"2023","journal-title":"Appl. Intell."},{"key":"B66","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1038\/nrg3920","article-title":"Machine learning applications in genetics and genomics","volume":"16","author":"Libbrecht","year":"2015","journal-title":"Nat. Rev. Genet."},{"key":"B67","doi-asserted-by":"publisher","first-page":"bbab207","DOI":"10.1093\/bib\/bbab207","article-title":"Deep learning model reveals potential risk genes for ADHD, especially Ephrin receptor gene EPHA5","volume":"22","author":"Liu","year":"2021","journal-title":"Brief. Bioinform"},{"key":"B68","doi-asserted-by":"publisher","first-page":"30","DOI":"10.1016\/j.jbi.2018.07.004","article-title":"An unsupervised machine learning method for discovering patient clusters based on genetic signatures","volume":"85","author":"Lopez","year":"2018","journal-title":"J. Biomed. 
Inf."},{"key":"B69","doi-asserted-by":"publisher","first-page":"2256","DOI":"10.1093\/bioinformatics\/btm322","article-title":"Annotation-based distance measures for patient subgroup discovery in clinical microarray studies","volume":"23","author":"Lottaz","year":"2007","journal-title":"Bioinformatics"},{"key":"B70","doi-asserted-by":"publisher","first-page":"1973","DOI":"10.1007\/s00125-021-05485-5","article-title":"Comparison between data-driven clusters and models based on clinical features to predict outcomes in type 2 diabetes: nationwide observational study","volume":"64","author":"Lugner","year":"2021","journal-title":"Diabetologia"},{"key":"B71","doi-asserted-by":"publisher","first-page":"1529","DOI":"10.1007\/s00439-021-02393-x","article-title":"The promise of automated machine learning for the genetic analysis of complex traits","volume":"141","author":"Manduchi","year":"2022","journal-title":"Hum. Genet."},{"key":"B72","doi-asserted-by":"publisher","first-page":"64","DOI":"10.1016\/j.trsl.2011.08.001","article-title":"Molecular genetic studies of complex phenotypes","volume":"159","author":"Marian","year":"2012","journal-title":"Transl. Res."},{"key":"B73","doi-asserted-by":"publisher","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochimica Biophysica Acta (BBA) - Protein Struct."},{"key":"B74","doi-asserted-by":"publisher","first-page":"1515","DOI":"10.1007\/s00439-021-02402-z","article-title":"What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics","volume":"141","author":"Musolf","year":"2022","journal-title":"Hum. 
Genet."},{"key":"B75","doi-asserted-by":"publisher","first-page":"97025","DOI":"10.18632\/oncotarget.20923","article-title":"Precision and recall oncology: combining multiple gene mutations for improved identification of drug-sensitive tumours","volume":"8","author":"Naulaerts","year":"2017","journal-title":"Oncotarget"},{"key":"B76","doi-asserted-by":"publisher","first-page":"1397","DOI":"10.1111\/risa.13239","article-title":"Machine learning methods as a tool for predicting risk of illness applying next-generation sequencing data","volume":"39","author":"Njage","year":"2019","journal-title":"Risk Anal."},{"key":"B77","doi-asserted-by":"publisher","first-page":"438","DOI":"10.1016\/j.ygeno.2017.06.009","article-title":"Analysis of genetic association using hierarchical clustering and cluster validation indices","volume":"109","author":"Pagnuco","year":"2017","journal-title":"Genomics"},{"key":"B78","doi-asserted-by":"publisher","first-page":"2152","DOI":"10.1016\/j.csbj.2024.05.003","article-title":"3D clustering of gene expression data from systemic autoinflammatory diseases using self-organizing maps (Clust3D)","volume":"23","author":"Papagiannopoulos","year":"2024","journal-title":"Comput. Struct. Biotechnol. 
J."},{"key":"B79","first-page":"53","article-title":"A comparison between the silhouette index and the Davies-Bouldin index in labelling IDS clusters","volume-title":"Proceedings of the 11th Nordic workshop of secure IT systems","author":"Petrovi\u0107","year":"2006"},{"key":"B80","article-title":"Improving reproducibility in machine learning research (A report from the NeurIPS 2019 reproducibility program)","author":"Pineau","year":"2021"},{"key":"B81","doi-asserted-by":"publisher","first-page":"534","DOI":"10.1001\/jamapsychiatry.2019.3671","article-title":"Establishment of best practices for evidence for prediction: a review","volume":"77","author":"Poldrack","year":"2020","journal-title":"JAMA Psychiatry"},{"key":"B82","doi-asserted-by":"publisher","first-page":"927312","DOI":"10.3389\/fbinf.2022.927312","article-title":"A review of feature selection methods for machine learning-based disease risk prediction","volume":"2","author":"Pudjihartono","year":"2022","journal-title":"Front. Bioinforma."},{"key":"B83","article-title":"Imbalanced dataset classification and solutions: a review","volume":"5","author":"Ramyachitra","year":"2014","journal-title":"Int. J. Comput. Bus. Res."},{"key":"B84","article-title":"Adjusting for chance clustering comparison measures","author":"Romano","year":"2016"},{"key":"B85","doi-asserted-by":"publisher","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: a graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. 
Math."},{"key":"B86","doi-asserted-by":"crossref","first-page":"272","DOI":"10.1007\/978-3-319-68765-0_23","article-title":"Estimating sequence similarity from contig sets","volume-title":"Advances in intelligent data analysis XVI","author":"Ry\u0161av\u00fd","year":"2017"},{"key":"B87","doi-asserted-by":"publisher","first-page":"e0175057","DOI":"10.1371\/journal.pone.0175057","article-title":"Associations between body fat variability and later onset of cardiovascular disease risk factors","volume":"12","author":"Saito","year":"2017","journal-title":"PLoS One"},{"key":"B88","doi-asserted-by":"publisher","first-page":"124","DOI":"10.1016\/j.asoc.2016.11.026","article-title":"Classification of human cancer diseases by gene expression profiles","volume":"50","author":"Salem","year":"2017","journal-title":"Appl. Soft Comput."},{"key":"B89","doi-asserted-by":"publisher","first-page":"1059","DOI":"10.1007\/s13258-021-01128-6","article-title":"Enhancing performance of gene expression value prediction with cluster-based regression","volume":"43","author":"Seok","year":"2021","journal-title":"Genes Genomics"},{"key":"B90","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1186\/s12859-022-05047-5","article-title":"Gene regulation network inference using k-nearest neighbor-based mutual information estimation: revisiting an old DREAM","volume":"24","author":"Shachaf","year":"2023","journal-title":"BMC Bioinforma."},{"key":"B91","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1109\/DSAA49011.2020.00096","article-title":"Cluster quality analysis using silhouette score","volume-title":"2020 IEEE 7th international conference on data science and advanced analytics (DSAA)","author":"Shahapure","year":"2020"},{"key":"B92","doi-asserted-by":"publisher","first-page":"227","DOI":"10.1007\/s13534-020-00156-7","article-title":"A deep learning approach for prediction of Parkinson\u2019s disease 
progression","volume":"10","author":"Shahid","year":"2020","journal-title":"Biomed. Eng. Lett."},{"key":"B93","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1186\/s40168-021-01199-3","article-title":"Performance determinants of unsupervised clustering methods for microbiome data","volume":"10","author":"Shi","year":"2022","journal-title":"Microbiome"},{"key":"B94","doi-asserted-by":"publisher","first-page":"E2700","DOI":"10.1210\/clinem\/dgab093","article-title":"Prediction of adult height by machine learning technique","volume":"106","author":"Shmoish","year":"2021","journal-title":"J. Clin. Endocrinol. Metabolism"},{"key":"B95","doi-asserted-by":"publisher","first-page":"92","DOI":"10.1016\/j.compbiomed.2018.06.030","article-title":"Machine learning models to predict the progression from early to late stages of papillary renal cell carcinoma","volume":"100","author":"Singh","year":"2018","journal-title":"Comput. Biol. Med."},{"key":"B96","doi-asserted-by":"publisher","first-page":"386","DOI":"10.1037\/1082-989X.9.3.386","article-title":"Properties of the Hubert-Arabie adjusted Rand index","volume":"9","author":"Steinley","year":"2004","journal-title":"Psychol. Methods"},{"key":"B97","doi-asserted-by":"publisher","first-page":"1155","DOI":"10.1038\/s41598-020-80814-z","article-title":"Prediction of lithium response using genomic data","volume":"11","author":"Stone","year":"2021","journal-title":"Sci. 
Rep."},{"key":"B98","doi-asserted-by":"publisher","first-page":"e106","DOI":"10.1093\/nar\/gkx204","article-title":"Differential expression analysis for RNAseq using Poisson mixed models","volume":"45","author":"Sun","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"B99","doi-asserted-by":"publisher","first-page":"347","DOI":"10.4103\/jfsm.jfsm_36_23","article-title":"Pinpointing the short-tandem repeats alleles for ethnic inferencing in forensic identification by K-medoids approach","volume":"9","author":"Syukriani","year":"2023","journal-title":"J. Forensic Sci. Med."},{"key":"B100","doi-asserted-by":"publisher","first-page":"616","DOI":"10.1038\/s41586-023-06139-9","article-title":"Transfer learning enables predictions in network biology","volume":"618","author":"Theodoris","year":"2023","journal-title":"Nature"},{"key":"B101","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-658-20540-9","volume-title":"Projection-based clustering through self-organization and swarm intelligence","author":"Thrun","year":"2018"},{"key":"B102","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J. R. Stat. Soc. Ser. B Stat. Methodol."},{"key":"B103","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1002\/ajmg.b.32638","article-title":"Machine learning in schizophrenia genomics, a case-control study using 5,090 exomes","volume":"180","author":"Trakadis","year":"2019","journal-title":"Am. J. Med. Genet. Part B Neuropsychiatric Genet."},{"key":"B104","first-page":"2837","article-title":"Information theoretic measures for clusterings comparison: variants, properties, normalization and correction for chance","volume-title":"J. Machi. Lear. 
Resear","author":"Vinh","year":"2010"},{"key":"B105","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/s43856-021-00028-w","article-title":"Mitigating bias in machine learning for medicine","volume":"1","author":"Vokinger","year":"2021","journal-title":"Commun. Med."},{"key":"B106","volume-title":"Comparing clusterings-an overview","author":"Wagner","year":"2007"},{"key":"B107","doi-asserted-by":"publisher","DOI":"10.1142\/S1094406021500141","article-title":"The impact of outliers on regression coefficients: a sensitivity analysis","volume":"56","author":"Wang","year":"2021","journal-title":"Int. J. Account."},{"key":"B108","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1007\/978-3-031-08530-7_12","article-title":"The differential gene detecting method for identifying leukemia patients","author":"Wang","year":"2022"},{"key":"B109","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1007\/s00357-022-09413-z","article-title":"Understanding the adjusted Rand index and other partition comparison indices based on counting object pairs","volume":"39","author":"Warrens","year":"2022","journal-title":"J. Classif."},{"key":"B110","doi-asserted-by":"publisher","first-page":"1479","DOI":"10.1007\/s00542-023-05473-2","article-title":"A novel method for diabetes classification and prediction with Pycaret","volume":"29","author":"Whig","year":"2023","journal-title":"Microsyst. Technol."},{"key":"B111","doi-asserted-by":"publisher","first-page":"749","DOI":"10.1016\/j.atmosenv.2008.10.005","article-title":"Ambiguities inherent in sums-of-squares-based error statistics","volume":"43","author":"Willmott","year":"2009","journal-title":"Atmos. 
Environ."},{"key":"B112","doi-asserted-by":"publisher","first-page":"4482","DOI":"10.1038\/s41598-021-83828-3","article-title":"Machine learning approaches for the prediction of bone mineral density by using genomic and phenotypic data of 5130 older men","volume":"11","author":"Wu","year":"2021","journal-title":"Sci. Rep."},{"key":"B113","doi-asserted-by":"publisher","first-page":"295","DOI":"10.1016\/j.neucom.2020.07.061","article-title":"On hyperparameter optimization of machine learning algorithms: theory and practice","volume":"415","author":"Yang","year":"2020","journal-title":"Neurocomputing"},{"key":"B114","doi-asserted-by":"publisher","first-page":"405","DOI":"10.1186\/s12859-015-0825-4","article-title":"Rare variants analysis using penalization methods for whole genome sequence data","volume":"16","author":"Yazdani","year":"2015","journal-title":"BMC Bioinforma."},{"key":"B115","doi-asserted-by":"publisher","first-page":"106","DOI":"10.1016\/j.jpsychires.2018.09.010","article-title":"Leveraging genome-wide association and clinical data in revealing schizophrenia subgroups","volume":"106","author":"Yin","year":"2018","journal-title":"J. Psychiatr. 
Res."},{"key":"B116","doi-asserted-by":"publisher","first-page":"167","DOI":"10.1186\/s12864-019-5546-z","article-title":"Architectures and accuracy of artificial neural network for disease classification from omics data","volume":"20","author":"Yu","year":"2019","journal-title":"BMC Genomics"},{"key":"B117","doi-asserted-by":"publisher","first-page":"474","DOI":"10.1186\/s12859-020-03758-1","article-title":"NIMBus: a negative binomial regression based Integrative Method for mutation Burden Analysis","volume":"21","author":"Zhang","year":"2020","journal-title":"BMC Bioinforma."},{"key":"B118","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1038\/s41551-021-00745-6","article-title":"Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images","volume":"5","author":"Zhang","year":"2021","journal-title":"Nat. Biomed. Eng."},{"key":"B119","doi-asserted-by":"publisher","first-page":"183","DOI":"10.3390\/jpm13020183","article-title":"Multi-objective genetic algorithm for cluster analysis of single-cell transcriptomes","volume":"13","author":"Zhao","year":"2023","journal-title":"J. Pers. Med."},{"key":"B120","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1186\/s12575-018-0067-8","article-title":"Silhouette scores for arbitrary defined groups in gene expression data and insights into differential expression results","volume":"20","author":"Zhao","year":"2018","journal-title":"Biol. Proced. Online"},{"key":"B121","doi-asserted-by":"publisher","first-page":"bbac385","DOI":"10.1093\/bib\/bbac385","article-title":"A review and performance evaluation of clustering frameworks for single-cell Hi-C data","volume":"23","author":"Zhen","year":"2022","journal-title":"Brief. 
Bioinform"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1457619\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,10]],"date-time":"2024-09-10T09:21:24Z","timestamp":1725960084000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1457619\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,10]]},"references-count":121,"alternative-id":["10.3389\/fbinf.2024.1457619"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2024.1457619","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,9,10]]},"article-number":"1457619"}}