{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,19]],"date-time":"2025-12-19T15:34:25Z","timestamp":1766158465092,"version":"3.37.3"},"reference-count":49,"publisher":"Oxford University Press (OUP)","issue":"11","license":[{"start":{"date-parts":[[2020,4,14]],"date-time":"2020-04-14T00:00:00Z","timestamp":1586822400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Bioinformatics Center"},{"DOI":"10.13039\/501100002341","name":"Academy of Finland","doi-asserted-by":"publisher","award":["275151","292307","322761"],"award-info":[{"award-number":["275151","292307","322761"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"name":"EU H2020 LIFEPATH","award":["633666"],"award-info":[{"award-number":["633666"]}]},{"name":"EU FP7 NANOSOLUTIONS project","award":["FP7-309329"],"award-info":[{"award-number":["FP7-309329"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2020,6,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Omics technologies have the potential to facilitate the discovery of new biomarkers. However, only few omics-derived biomarkers have been successfully translated into clinical applications to date. Feature selection is a crucial step in this process that identifies small sets of features with high predictive power. Models consisting of a limited number of features are not only more robust in analytical terms, but also ensure cost effectiveness and clinical translatability of new biomarker panels. 
Here we introduce GARBO, a novel multi-island adaptive genetic algorithm to simultaneously optimize accuracy and set size in omics-driven biomarker discovery problems.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>Compared to existing methods, GARBO enables the identification of biomarker sets that best optimize the trade-off between classification accuracy and number of biomarkers. We tested GARBO and six alternative selection methods with two highly relevant topics in precision medicine: cancer patient stratification and drug sensitivity prediction. We found multivariate biomarker models from different omics data types such as mRNA, miRNA, copy number variation, mutation and DNA methylation. The top performing models were evaluated by using two different strategies: the Pareto-based selection, and the weighted sum between accuracy and set size (w\u2009=\u20090.5). Pareto-based preferences show the ability of the proposed algorithm to search minimal subsets of relevant features that can be used to model accurate random forest-based classification systems. Moreover, GARBO systematically identified, on larger omics data types, such as gene expression and DNA methylation, biomarker panels exhibiting higher classification accuracy or employing a number\u00a0of\u00a0features much lower than those discovered with other methods. 
These results were confirmed on independent datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>github.com\/Greco-Lab\/GARBO.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Contact<\/jats:title>\n                  <jats:p>dario.greco@tuni.fi<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa144","type":"journal-article","created":{"date-parts":[[2020,2,26]],"date-time":"2020-02-26T20:18:20Z","timestamp":1582748300000},"page":"3393-3400","source":"Crossref","is-referenced-by-count":22,"title":["Feature set optimization in biomarker discovery from genome-scale data"],"prefix":"10.1093","volume":"36","author":[{"given":"V","family":"Fortino","sequence":"first","affiliation":[{"name":"Institute of Biomedicine , University of Eastern Finland, Kuopio 70210, Finland"}]},{"given":"G","family":"Scala","sequence":"additional","affiliation":[{"name":"Faculty of Medicine and Health Technology , Tampere University, Tampere 33100, Finland"},{"name":"Institute of Biotechnology , University of Helsinki, Helsinki 00014, Finland"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9195-9003","authenticated-orcid":false,"given":"D","family":"Greco","sequence":"additional","affiliation":[{"name":"Faculty of Medicine and Health Technology , Tampere University, Tampere 33100, Finland"},{"name":"Institute of Biotechnology , University of Helsinki, Helsinki 00014, 
Finland"}]}],"member":"286","published-online":{"date-parts":[[2020,4,14]]},"reference":[{"key":"2023062300082879600_btaa144-B1","doi-asserted-by":"crossref","first-page":"238","DOI":"10.1038\/onc.2011.224","article-title":"Willin\/FRMD6 expression activates the Hippo signaling pathway kinases in mammals and antagonizes oncogenic YAP","volume":"31","author":"Angus","year":"2012","journal-title":"Oncogene"},{"key":"2023062300082879600_btaa144-B2","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/s12918-014-0135-x","article-title":"Prediction of signaling cross-talks contributing to acquired drug resistance in breast cancer cells by Bayesian statistical modeling","volume":"9","author":"Azad","year":"2015","journal-title":"BMC Syst. Biol"},{"key":"2023062300082879600_btaa144-B3","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1007\/s12293-008-0005-4","article-title":"Improving the scalability of rule-based evolutionary learning","volume":"1","author":"Bacardit","year":"2009","journal-title":"Memetic Comput"},{"key":"2023062300082879600_btaa144-B4","doi-asserted-by":"crossref","first-page":"3101","DOI":"10.1105\/tpc.111.088153","article-title":"Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets","volume":"23","author":"Bassel","year":"2011","journal-title":"Plant Cell"},{"key":"2023062300082879600_btaa144-B5","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.ygeno.2012.04.003","article-title":"Random forests for genomic data analysis","volume":"99","author":"Chen","year":"2012","journal-title":"Genomics"},{"key":"2023062300082879600_btaa144-B6","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1021\/acs.jcim.7b00447","article-title":"Optimal HTS fingerprint definitions by using a desirability function and a genetic algorithm","volume":"58","author":"Cortes Cabrera","year":"2018","journal-title":"J. Chem. Inf. 
Model"},{"key":"2023062300082879600_btaa144-B7","doi-asserted-by":"crossref","first-page":"492","DOI":"10.1093\/bib\/bbx124","article-title":"Evaluation of variable selection methods for random forests and omics data sets","volume":"20","author":"Degenhardt","year":"2019","journal-title":"Brief. Bioinformatics"},{"key":"2023062300082879600_btaa144-B8","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1186\/1741-7015-10-87","article-title":"The failure of protein cancer biomarkers to reach the clinic: why, and what can be done to address the problem?","volume":"10","author":"Diamandis","year":"2012","journal-title":"BMC Med"},{"key":"2023062300082879600_btaa144-B9","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1471-2105-7-3","article-title":"Gene selection and classification of microarray data using random forest","volume":"7","author":"D\u00edaz-Uriarte","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023062300082879600_btaa144-B10","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1186\/s12885-015-1492-6","article-title":"Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection","volume":"15","author":"Dong","year":"2015","journal-title":"BMC Cancer"},{"key":"2023062300082879600_btaa144-B11","first-page":"1595","article-title":"EGFR mutations as a prognostic and predictive marker in non-small-cell lung cancer","volume":"8","author":"Fang","year":"2014","journal-title":"Drug Des. Dev. 
Ther"},{"key":"2023062300082879600_btaa144-B12","doi-asserted-by":"crossref","first-page":"e107801","DOI":"10.1371\/journal.pone.0107801","article-title":"A robust and accurate method for feature selection and prioritization from multi-class OMICs data","volume":"9","author":"Fortino","year":"2014","journal-title":"PLoS One"},{"key":"2023062300082879600_btaa144-B13","doi-asserted-by":"crossref","first-page":"D808","DOI":"10.1093\/nar\/gks1094","article-title":"STRING v9.1: protein-protein interaction networks, with increased coverage and integration","volume":"41","author":"Franceschini","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023062300082879600_btaa144-B14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization paths for generalized linear models via coordinate descent","volume":"33","author":"Friedman","year":"2010","journal-title":"J. Statist. Softw"},{"key":"2023062300082879600_btaa144-B15","doi-asserted-by":"crossref","first-page":"23857","DOI":"10.1038\/srep23857","article-title":"Prioritization of anticancer drugs against a cancer using genomic features of cancer cells: a step towards personalized medicine","volume":"6","author":"Gupta","year":"2016","journal-title":"Sci. Rep"},{"key":"2023062300082879600_btaa144-B16","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-15-S13-S4","article-title":"Feature selection and classifier performance on diverse biological datasets","volume":"15 (Suppl. 13)","author":"Hemphill","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023062300082879600_btaa144-B17","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1016\/S0929-6646(09)60051-6","article-title":"Induction of Akt activity by chemotherapy confers acquired resistance","volume":"108","author":"Huang","year":"2009","journal-title":"J. Formos Med. 
Assoc"},{"key":"2023062300082879600_btaa144-B18","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1373\/clinchem.2016.254649","article-title":"Waste, leaks, and failures in the biomarker pipeline","volume":"63","author":"Ioannidis","year":"2017","journal-title":"Clin. Chem"},{"key":"2023062300082879600_btaa144-B19","doi-asserted-by":"crossref","first-page":"a006593","DOI":"10.1101\/cshperspect.a006593","article-title":"The VEGF pathway in cancer and disease: responses, resistance, and the path forward","volume":"2","author":"Kieran","year":"2012","journal-title":"Cold Spring Harb. Perspect. Med"},{"key":"2023062300082879600_btaa144-B20","doi-asserted-by":"crossref","first-page":"8","DOI":"10.1186\/1471-2105-15-8","article-title":"Robustness of random forest-based gene selection methods","volume":"15","author":"Kursa","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023062300082879600_btaa144-B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v036.i11","article-title":"Feature selection with the Boruta package","volume":"36","author":"Kursa","year":"2010","journal-title":"J. Statist. Softw"},{"key":"2023062300082879600_btaa144-B22","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1186\/1471-2105-12-253","article-title":"Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems","volume":"12","author":"L\u00ea Cao","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023062300082879600_btaa144-B23","doi-asserted-by":"crossref","first-page":"2005","DOI":"10.1002\/sim.4238","article-title":"A min-max combination of biomarkers to improve diagnostic accuracy","volume":"30","author":"Liu","year":"2011","journal-title":"Statist. 
Med"},{"key":"2023062300082879600_btaa144-B24","doi-asserted-by":"crossref","first-page":"e60028","DOI":"10.1371\/journal.pone.0060028","article-title":"Willin, an upstream component of the hippo signaling pathway, orchestrates mammalian peripheral nerve fibroblasts","volume":"8","author":"Moleirinho","year":"2013","journal-title":"PLoS One"},{"key":"2023062300082879600_btaa144-B25","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1109\/TCBB.2012.63","article-title":"Gene selection using iterative feature elimination random forests for survival outcomes","volume":"9","author":"Pang","year":"2012","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform"},{"key":"2023062300082879600_btaa144-B26","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell"},{"key":"2023062300082879600_btaa144-B27","doi-asserted-by":"crossref","first-page":"233","DOI":"10.1109\/CBMS.2014.10","volume-title":"2014 IEEE 27th International Symposium on Computer-Based Medical Systems","author":"Popovic","year":"2014"},{"key":"2023062300082879600_btaa144-B28","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1002\/ar.23777","article-title":"HMGA1 overexpression is associated with the malignant status and progression of breast cancer","volume":"301","author":"Qi","year":"2018","journal-title":"Anat. Rec. (Hoboken)"},{"key":"2023062300082879600_btaa144-B29","doi-asserted-by":"crossref","first-page":"11768","DOI":"10.1038\/s41598-017-11409-4","article-title":"HMGA1 regulates the Plasminogen activation system in the secretome of breast cancer cells","volume":"7","author":"Resmini","year":"2017","journal-title":"Sci. 
Rep"},{"key":"2023062300082879600_btaa144-B30","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1016\/j.ajpath.2013.08.002","article-title":"Molecular and cellular heterogeneity in breast cancer: challenges for personalized medicine","volume":"183","author":"Rivenbark","year":"2013","journal-title":"Am. J. Pathol"},{"key":"2023062300082879600_btaa144-B31","doi-asserted-by":"crossref","first-page":"e1005752","DOI":"10.1371\/journal.pcbi.1005752","article-title":"mixOmics: an R package for \u2018omics feature selection and multiple data integration","volume":"13","author":"Rohart","year":"2017","journal-title":"PLoS Comput. Biol"},{"key":"2023062300082879600_btaa144-B32","doi-asserted-by":"crossref","first-page":"1126","DOI":"10.1038\/s41467-017-01153-8","article-title":"Gene isoforms as expression-based biomarkers predictive of drug response in vitro","volume":"8","author":"Safikhani","year":"2017","journal-title":"Nat. Commun"},{"key":"2023062300082879600_btaa144-B33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v053.i04","article-title":"GA: a package for genetic algorithms in R","volume":"53","author":"Scrucca","year":"2013","journal-title":"J. Statist. Softw"},{"key":"2023062300082879600_btaa144-B34","doi-asserted-by":"crossref","first-page":"e660","DOI":"10.1371\/journal.pone.0000660","article-title":"p53 target gene SMAR1 is dysregulated in breast cancer: its role in cancer cell migration and invasion","volume":"2","author":"Singh","year":"2007","journal-title":"PLoS One"},{"key":"2023062300082879600_btaa144-B35","doi-asserted-by":"crossref","first-page":"1267","DOI":"10.1074\/jbc.M801088200","article-title":"Tumor suppressor SMAR1 represses IkappaBalpha expression and inhibits p65 transactivation through matrix attachment regions","volume":"284","author":"Singh","year":"2009","journal-title":"J. Biol. 
Chem"},{"key":"2023062300082879600_btaa144-B36","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1038\/s41556-018-0142-z","article-title":"YAP\/TAZ upstream signals and downstream responses","volume":"20","author":"Totaro","year":"2018","journal-title":"Nat. Cell Biol"},{"key":"2023062300082879600_btaa144-B37","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1093\/bioinformatics\/btl074","article-title":"GALGO: an R package for multivariate variable selection using genetic algorithms","volume":"22","author":"Trevino","year":"2006","journal-title":"Bioinformatics"},{"key":"2023062300082879600_btaa144-B38","doi-asserted-by":"crossref","first-page":"57","DOI":"10.1016\/j.gpb.2017.04.001","article-title":"Disease biomarkers for precision medicine: challenges and future opportunities","volume":"15","author":"Wang","year":"2017","journal-title":"Genomics Proteomics Bioinformatics"},{"key":"2023062300082879600_btaa144-B39","doi-asserted-by":"crossref","first-page":"S15","DOI":"10.1186\/1752-0509-6-S1-S15","article-title":"Revealing metabolite biomarkers for acupuncture treatment by linear programming based feature selection","volume":"6 (Suppl. 1)","author":"Wang","year":"2012","journal-title":"BMC Syst. Biol"},{"key":"2023062300082879600_btaa144-B40","doi-asserted-by":"crossref","first-page":"788","DOI":"10.3390\/biom9120788","article-title":"The impact of integrin-mediated matrix adhesion on cisplatin resistance of W1 ovarian cancer cells","volume":"9","author":"Wantoch von Rekowski","year":"2019","journal-title":"Biomolecules"},{"key":"2023062300082879600_btaa144-B41","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1007\/s12032-014-0036-2","article-title":"A novel point mutation in exon 20 of EGFR showed sensitivity to erlotinib","volume":"31","author":"Xing","year":"2014","journal-title":"Med. 
Oncol"},{"key":"2023062300082879600_btaa144-B42","doi-asserted-by":"crossref","first-page":"606","DOI":"10.1109\/TEVC.2015.2504420","article-title":"A survey on evolutionary computation approaches to feature selection","volume":"20","author":"Xue","year":"2016","journal-title":"IEEE Trans. Evol. Comput"},{"key":"2023062300082879600_btaa144-B43","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1186\/s12943-019-0954-x","article-title":"Targeting PI3K in cancer: mechanisms and advances in clinical trials","volume":"18","author":"Yang","year":"2019","journal-title":"Mol. Cancer"},{"key":"2023062300082879600_btaa144-B44","doi-asserted-by":"crossref","first-page":"180","DOI":"10.1016\/j.lungcan.2009.11.006","article-title":"Expression of candidate tumor suppressor gene ING2 is lost in non-small cell lung carcinoma","volume":"69","author":"Ythier","year":"2010","journal-title":"Lung Cancer"},{"year":"2008","author":"Yu","key":"2023062300082879600_btaa144-B45"},{"key":"2023062300082879600_btaa144-B46","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1016\/j.swevo.2018.02.021","article-title":"Large-dimensionality small-instance set feature selection: a hybrid bio-inspired heuristic approach","volume":"42","author":"Zawbaa","year":"2018","journal-title":"Swarm Evol. 
Comput"},{"key":"2023062300082879600_btaa144-B47","doi-asserted-by":"crossref","first-page":"7502","DOI":"10.18632\/oncotarget.10649","article-title":"BTG1 might be employed as a biomarker for carcinogenesis and a target for gene therapy in colorectal cancers","volume":"8","author":"Zhao","year":"2017","journal-title":"Oncotarget"},{"key":"2023062300082879600_btaa144-B48","doi-asserted-by":"crossref","first-page":"3236","DOI":"10.1016\/j.patcog.2007.02.007","article-title":"Markov blanket-embedded genetic algorithm for gene selection","volume":"40","author":"Zhu","year":"2007","journal-title":"Pattern Recognit"},{"key":"2023062300082879600_btaa144-B49","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1016\/j.ymeth.2015.05.011","article-title":"A novel mixed integer programming for multi-biomarker panel identification by distinguishing malignant from benign colorectal tumors","volume":"83","author":"Zou","year":"2015","journal-title":"Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa144\/33041568\/btaa144.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/11\/3393\/50670732\/bioinformatics_36_11_3393.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/36\/11\/3393\/50670732\/bioinformatics_36_11_3393.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,24]],"date-time":"2023-06-24T18:21:36Z","timestamp":1687630896000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/36\/11\/3393\/5771332"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","seq
uence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,4,14]]},"references-count":49,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2020,6,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa144","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2020,6]]},"published":{"date-parts":[[2020,4,14]]}}}