{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:45Z","timestamp":1772138025525,"version":"3.50.1"},"reference-count":48,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T00:00:00Z","timestamp":1752192000000},"content-version":"vor","delay-in-days":10,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"R\u00e9gion Wallonne within the project WalInnov-NACATS","award":["1610125"],"award-info":[{"award-number":["1610125"]}]},{"name":"European Union\u2019s Horizon 2020 research and innovation program"},{"name":"Marie Sk\u0142odowska-Curie","award":["813533"],"award-info":[{"award-number":["813533"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Biomarker signature discovery remains the main path to developing clinical diagnostic tools when the biological knowledge on pathology is weak. Shortest signatures are often preferred to reduce the cost of the diagnostic. The ability to find the best and shortest signature relies on the robustness of the models that can be built on such a set of molecules. The classification algorithm that will be used is often selected based on the average Area Under the Curve (AUC) performance of its models. However, it is not guaranteed that an algorithm with a large AUC distribution will keep a stable performance when facing data. Here, we propose two AUC-derived hyper-stability scores, the Hyper-stability Resampling Sensitive (HRS) and the Hyper-stability Signature Sensitive (HSS), as complementary metrics to the average AUC that should bring confidence in the choice for the best classification algorithm. To emphasize the importance of these scores, we compared 15 different Random Forest implementations. Our findings show that the Random Forest implementation should be chosen according to the data at hand and the classification question being evaluated. No Random Forest implementation can be used universally for any classification and on any dataset. Each of them should be tested for their average AUC performance and AUC-derived stability, prior to analysis.<\/jats:p>","DOI":"10.1093\/bib\/bbaf318","type":"journal-article","created":{"date-parts":[[2025,6,25]],"date-time":"2025-06-25T08:05:15Z","timestamp":1750838715000},"source":"Crossref","is-referenced-by-count":0,"title":["Assessing Random Forest self-reproducibility for optimal short biomarker signature discovery"],"prefix":"10.1093","volume":"26","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2662-0772","authenticated-orcid":false,"given":"Ahmed","family":"Debit","sequence":"first","affiliation":[{"name":"Laboratory of Human Genetics, GIGA Institute, University of Liege (ULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]},{"name":"BIO3, GIGA Institute, University of Liege (ULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]},{"name":"Institut de Biologie de l\u2019ENS (IBENS), Ecole Normale Superieure , 46 rue d'Ulm, 75005 Paris ,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Christophe","family":"Poulet","sequence":"additional","affiliation":[{"name":"Department of Rheumatology, GIGA Institute, University Hospital of Liege (CHULiege), University of Liege (ULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]},{"name":"Fibropole Research Group, University Hospital of Liege (CHULiege) , Avenue de l'Hopital 1, 4000 Liege ,","place":["Belgium"]},{"name":"GIGA-I3 Research Group, GIGA Institute, University of Liege (ULiege) and University Hospital of Liege (CHULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Claire","family":"Josse","sequence":"additional","affiliation":[{"name":"Laboratory of Human Genetics, GIGA Institute, University of Liege (ULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]},{"name":"Oncology Department, University Hospital of Liege (CHULiege), University of Liege (ULiege) , Avenue de l'Hopital 1, 4000 Liege ,","place":["Belgium"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Guy","family":"Jerusalem","sequence":"additional","affiliation":[{"name":"Oncology Department, University Hospital of Liege (CHULiege), University of Liege (ULiege) , Avenue de l'Hopital 1, 4000 Liege ,","place":["Belgium"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chloe-Agathe","family":"Azencott","sequence":"additional","affiliation":[{"name":"Mines Paris, PSL Research University , CBIO-Centre for Computational Biology, 60 boulevard Saint-Michel, F-75006 Paris ,","place":["France"]},{"name":"Institut Curie, PSL Research University , 26 rue d'Ulm, F-75005 Paris ,","place":["France"]},{"name":"Inserm, U900 , 26 rue d'Ulm, F-75005 Paris ,","place":["France"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vincent","family":"Bours","sequence":"additional","affiliation":[{"name":"Laboratory of Human Genetics, GIGA Institute, University of Liege (ULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]},{"name":"Center for Human Genetics, University Hospital of Liege (CHULiege) , Avenue de l'Hopital 1, 4000 Liege ,","place":["Belgium"]}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kristel","family":"Van Steen","sequence":"additional","affiliation":[{"name":"BIO3, GIGA Institute, University of Liege (ULiege) , Avenue Hippocrate 1\/11, 4000 Liege ,","place":["Belgium"]}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2025,7,11]]},"reference":[{"key":"2025071023352945300_ref1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s42003-019-0464-9","article-title":"High throughput proteomics identifies a high-accuracy 11 plasma protein biomarker signature for ovarian cancer","volume":"2","author":"Enroth","year":"2019","journal-title":"Commun Biol"},{"key":"2025071023352945300_ref2","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1016\/j.ctrv.2018.09.005","article-title":"Predictive and on-treatment monitoring biomarkers in advanced melanoma: moving toward personalized medicine","volume":"71","author":"Tarhini","year":"2018","journal-title":"Cancer Treat Rev"},{"key":"2025071023352945300_ref3","article-title":"Clinical trials on Cancer and biomarkers","year":"2019"},{"key":"2025071023352945300_ref4","doi-asserted-by":"publisher","first-page":"1177271917715236","DOI":"10.1177\/1177271917715236","article-title":"Making meaningful clinical use of biomarkers","volume":"12","author":"Selleck","year":"2017","journal-title":"Biomark Insights"},{"key":"2025071023352945300_ref5","doi-asserted-by":"publisher","first-page":"284","DOI":"10.1016\/j.ejca.2017.01.017","article-title":"Clinical use of biomarkers in breast cancer: updated guidelines from the European group on tumor markers (EGTM)","volume":"75","author":"Duffy","year":"2017","journal-title":"Eur J Cancer"},{"key":"2025071023352945300_ref6","doi-asserted-by":"publisher","first-page":"5416","DOI":"10.18632\/oncotarget.6786","article-title":"Circulating microRNA-based screening tool for breast cancer","volume":"7","author":"Fr\u00e8res","year":"2015","journal-title":"Oncotarget"},{"key":"2025071023352945300_ref7","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1186\/s12014-017-9169-6","article-title":"A multiplex platform for the identification of ovarian cancer biomarkers","volume":"14","author":"Boylan","year":"2017","journal-title":"Clin Proteomics"},{"key":"2025071023352945300_ref8","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1186\/s12943-016-0548-9","article-title":"Prediction of chemo-response in serous ovarian cancer","volume":"15","author":"Gonzalez Bosquet","year":"2016","journal-title":"Mol Cancer"},{"key":"2025071023352945300_ref9","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2025071023352945300_ref10","doi-asserted-by":"crossref","DOI":"10.1057\/9780230509993","article-title":"Classification and regression by randomForest","volume":"2","author":"Liaw","year":"2002","journal-title":"R News"},{"key":"2025071023352945300_ref11","doi-asserted-by":"publisher","DOI":"10.1214\/08-AOAS169","volume":"7","author":"Ishwaran","year":"2007","journal-title":"Random Survival Forests for R"},{"key":"2025071023352945300_ref12","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v077.i01","article-title":"Ranger: a fast implementation of random forests for high dimensional data in C++ and R","volume":"77","author":"Wright","year":"2017","journal-title":"J Stat Softw"},{"key":"2025071023352945300_ref13","first-page":"3905","article-title":"Partykit: a modular toolkit for recursive Partytioning in R","volume":"16","author":"Hothorn","year":"2015","journal-title":"J Mach Learn Res"},{"key":"2025071023352945300_ref14","article-title":"Rborist: extensible, parallelizable implementation of the random Forest algorithm 2019.","volume-title":"XVI CongressoBrasileiro De Engenharia Ciencias Dos Materiais","author":"Seligman"},{"key":"2025071023352945300_ref15","doi-asserted-by":"publisher","first-page":"1677","DOI":"10.1587\/transinf.E97.D.1677","article-title":"Tree-based ensemble multi-task learning method for classification and regression","volume":"E97.D","author":"Simm","year":"2014","journal-title":"IEICE Trans Inf Syst"},{"key":"2025071023352945300_ref16","author":"Ciss","journal-title":"Random Uniform Forests"},{"key":"2025071023352945300_ref17","doi-asserted-by":"publisher","volume-title":"The 2012 International Joint Conference on Neural Networks (IJCNN)","DOI":"10.1109\/IJCNN.2012.6252640"},{"key":"2025071023352945300_ref18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v077.i03","article-title":"Wsrf: an R package for classification with scalable weighted subspace random forests","volume":"77","author":"Zhao","year":"2017","journal-title":"J Stat Softw"},{"key":"2025071023352945300_ref19","doi-asserted-by":"publisher","first-page":"1943","DOI":"10.1073\/pnas.1711236115","article-title":"Iterative random forests to discover predictive and stable high-order interactions","volume":"115","author":"Basu","year":"2018","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025071023352945300_ref20","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1507.05444"},{"key":"2025071023352945300_ref21","doi-asserted-by":"publisher","first-page":"1168","DOI":"10.1080\/10618600.2020.1870480","article-title":"A projection pursuit Forest algorithm for supervised classification","volume":"30","author":"Silva","year":"2021","journal-title":"J Comput Graph Stat"},{"key":"2025071023352945300_ref22","first-page":"453","volume-title":"Lecture Notes in Computer Science","author":"Menze","year":"2011"},{"key":"2025071023352945300_ref23"},{"key":"2025071023352945300_ref24","article-title":"Rerf: Randomer Forest","author":"Browne","year":"2019"},{"key":"2025071023352945300_ref25","doi-asserted-by":"publisher","first-page":"952","DOI":"10.1093\/bioinformatics\/btv677","article-title":"TCGA2STAT: simple TCGA data access for integrated statistical analysis in R","volume":"32","author":"Wan","year":"2016","journal-title":"Bioinformatics"},{"key":"2025071023352945300_ref26","doi-asserted-by":"publisher","volume-title":"Proceedings of the 2011 IEEE 23rd International Conference on Tools with Artificial Intelligence (ICTAI '11)","DOI":"10.1109\/ICTAI.2011.167"},{"key":"2025071023352945300_ref27","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v028.i05","article-title":"Building predictive models in R using the caret package","volume":"28","author":"Kuhn","year":"2008","journal-title":"J Stat Softw"},{"key":"2025071023352945300_ref28"},{"key":"2025071023352945300_ref29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/RoboMech.2016.7813171","volume-title":"Proceedings of the 2016 Pattern Recognition Association of South Africa and Robotics and Mechatronics International Conference (PRASA-RobMech)","author":"Pretorius","year":"2016"},{"key":"2025071023352945300_ref30","doi-asserted-by":"publisher","first-page":"559","DOI":"10.1186\/1471-2105-9-559","article-title":"WGCNA: an R package for weighted correlation network analysis","volume":"9","author":"Langfelder","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2025071023352945300_ref31","doi-asserted-by":"publisher","first-page":"e1001057","DOI":"10.1371\/journal.pcbi.1001057","article-title":"Is my network module preserved and reproducible?","volume":"7","author":"Langfelder","year":"2011","journal-title":"PLoS Comput Biol"},{"key":"2025071023352945300_ref32","doi-asserted-by":"publisher","first-page":"145","DOI":"10.1111\/j.1466-8238.2007.00358.x","article-title":"AUC: a misleading measure of the performance of predictive distribution models","volume":"17","author":"Lobo","year":"2008","journal-title":"Glob Ecol Biogeogr"},{"key":"2025071023352945300_ref33","doi-asserted-by":"publisher","first-page":"e91249","DOI":"10.1371\/journal.pone.0091249","article-title":"Alternative performance measures for prediction models","volume":"9","author":"Wu","year":"2014","journal-title":"PLoS One"},{"key":"2025071023352945300_ref34","doi-asserted-by":"publisher","first-page":"822","DOI":"10.1093\/bioinformatics\/btq037","article-title":"Small-sample precision of ROC-related estimates","volume":"26","author":"Hanczar","year":"2010","journal-title":"Bioinformatics"},{"key":"2025071023352945300_ref35","doi-asserted-by":"publisher","DOI":"10.1201\/9781315114590"},{"key":"2025071023352945300_ref36","article-title":"Advances in random forests with application to classification"},{"key":"2025071023352945300_ref37","article-title":"To tune or not to tune the number of trees in random Forest","volume-title":"Journal of Machine Learning Research","author":"Probst"},{"key":"2025071023352945300_ref38","doi-asserted-by":"publisher","volume":"42","journal-title":"Nat Biotechnol","DOI":"10.1038\/s41587-023-02033-x"},{"key":"2025071023352945300_ref39","doi-asserted-by":"publisher","DOI":"10.1214\/13-EJS810","article-title":"PPtree: projection pursuit classification tree","volume":"7","author":"Lee","year":"2013","journal-title":"Electron J Statist"},{"key":"2025071023352945300_ref40","doi-asserted-by":"publisher","first-page":"5923","DOI":"10.1073\/pnas.0601231103","article-title":"Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer","volume":"103","author":"Ein-Dor","year":"2006","journal-title":"Proc Natl Acad Sci"},{"key":"2025071023352945300_ref41"},{"key":"2025071023352945300_ref42","doi-asserted-by":"publisher","first-page":"e0118432","DOI":"10.1371\/journal.pone.0118432","article-title":"The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets","volume":"10","author":"Saito","year":"2015","journal-title":"PLoS One"},{"key":"2025071023352945300_ref43","volume-title":"Proc. Seventh Australasian Data Mining Conference (AusDM 2008)"},{"key":"2025071023352945300_ref44","volume-title":"Int J Oncol Cancer Ther"},{"key":"2025071023352945300_ref45","volume-title":"Stat. Anal. Data Min. ASA Data Sci. J"},{"key":"2025071023352945300_ref46","doi-asserted-by":"publisher","first-page":"e0201904","DOI":"10.1371\/journal.pone.0201904","article-title":"On the overestimation of random Forest\u2019s out-of-bag error","volume":"13","author":"Janitza","year":"2018","journal-title":"PLoS One"},{"key":"2025071023352945300_ref47","doi-asserted-by":"publisher","first-page":"1986","DOI":"10.1093\/bioinformatics\/btr300","article-title":"Classification with correlated features: unreliability of feature ranking and solutions","volume":"27","author":"Tolo\u015fi","year":"2011","journal-title":"Bioinformatics"},{"key":"2025071023352945300_ref48","doi-asserted-by":"publisher","first-page":"e1000790","DOI":"10.1371\/journal.pcbi.1000790","article-title":"Analysis and computational dissection of molecular signature multiplicity","volume":"6","author":"Statnikov","year":"2010","journal-title":"PLoS Comput Biol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf318\/63725421\/bbaf318.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/26\/4\/bbaf318\/63725421\/bbaf318.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,11]],"date-time":"2025-07-11T03:35:38Z","timestamp":1752204938000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaf318\/8196357"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":48,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,7,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaf318","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.03.29.534695","asserted-by":"object"}]},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,7]]},"article-number":"bbaf318"}}