{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,3]],"date-time":"2026-06-03T06:32:33Z","timestamp":1780468353391,"version":"3.54.1"},"reference-count":27,"publisher":"Springer Science and Business Media LLC","issue":"S2","license":[{"start":{"date-parts":[[2020,3,1]],"date-time":"2020-03-01T00:00:00Z","timestamp":1583020800000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,3,11]],"date-time":"2020-03-11T00:00:00Z","timestamp":1583884800000},"content-version":"vor","delay-in-days":10,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>In biomarker discovery, applying domain knowledge is an effective approach to eliminating false-positive features, prioritizing functionally impactful markers, and facilitating the interpretation of predictive signatures. Several computational methods have been developed that formulate knowledge-based biomarker discovery as a feature selection problem guided by prior information. These methods often require that prior information be encoded as a single score, and their algorithms are optimized for a specific type of biological knowledge. In practice, however, domain knowledge from diverse resources can provide complementary information, yet no current method can integrate heterogeneous prior information for biomarker discovery. 
To address this problem, we developed the Know-GRRF (knowledge-guided regularized random forest) method, which enables dynamic incorporation of domain knowledge from multiple disciplines to guide feature selection.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Know-GRRF embeds domain knowledge in a regularized random forest framework. It combines prior information from multiple domains in a linear model to derive a composite score, which, together with other tuning parameters, controls the regularization of the random forest model. Know-GRRF concurrently optimizes the weight given to each type of domain knowledge and the other tuning parameters to minimize the AIC of out-of-bag predictions. The objective is to select a compact feature subset that has high discriminative power and strong functional relevance to the biological phenotype.<\/jats:p>\n                <jats:p>Through rigorous simulations, we show that Know-GRRF guided by multiple-domain prior information outperforms feature selection methods guided by single-domain prior information or by no prior information. We then applied Know-GRRF to a real-world study to identify prognostic biomarkers of prostate cancer. We evaluated a combination of cancer-related gene annotations, evolutionary conservation, and pre-computed statistical scores as prior knowledge for assembling a biomarker panel. We discovered a compact set of biomarkers that significantly improved prediction accuracy.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Know-GRRF is a powerful new method for incorporating knowledge from multiple domains into feature selection. It has a broad range of applications in biomarker discovery. 
We implemented this method and released a KnowGRRF package in the R\/CRAN archive.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-020-3344-x","type":"journal-article","created":{"date-parts":[[2020,3,13]],"date-time":"2020-03-13T02:02:49Z","timestamp":1584064969000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery"],"prefix":"10.1186","volume":"21","author":[{"given":"Xin","family":"Guan","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"George","family":"Runger","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Li","family":"Liu","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2020,3,11]]},"reference":[{"key":"3344_CR1","first-page":"2079","volume":"11","author":"GC Cawley","year":"2010","unstructured":"Cawley GC, Talbot NLC. On over-fitting in model selection and subsequent selection Bias in performance evaluation. J Mach Learn Res. 2010;11:2079\u2013107.","journal-title":"J Mach Learn Res"},{"issue":"8","key":"3344_CR2","doi-asserted-by":"publisher","first-page":"e103910","DOI":"10.1371\/journal.pone.0103910","volume":"9","author":"Z Liu","year":"2014","unstructured":"Liu Z, Zhang Y, Niu Y, Li K, Liu X, Chen H, Gao C. A systematic review and meta-analysis of diagnostic and prognostic serum biomarkers of colorectal cancer. PLoS One. 2014;9(8):e103910.","journal-title":"PLoS One"},{"issue":"5","key":"3344_CR3","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1038\/nrg3706","volume":"15","author":"PC Sham","year":"2014","unstructured":"Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet. 
2014;15(5):335\u201346.","journal-title":"Nat Rev Genet"},{"key":"3344_CR4","unstructured":"Li Y, Wu FX, Ngom A. A review on machine learning principles for multi-view biological data integration. Brief Bioinform. 2018;19(2):325\u201340"},{"issue":"2","key":"3344_CR5","doi-asserted-by":"publisher","first-page":"265","DOI":"10.1089\/cmb.2008.12TT","volume":"16","author":"X Chen","year":"2009","unstructured":"Chen X, Wang L. Integrating biological knowledge with gene expression profiles for survival prediction of cancer. J Comput Biol. 2009;16(2):265\u201378.","journal-title":"J Comput Biol"},{"key":"3344_CR6","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1186\/1471-2105-13-94","volume":"13","author":"SM Hill","year":"2012","unstructured":"Hill SM, Neve RM, Bayani N, Kuo WL, Ziyad S, Spellman PT, Gray JW, Mukherjee S. Integrating biological knowledge into variable selection: an empirical Bayes approach with an application in cancer biology. BMC Bioinformatics. 2012;13:94.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"3344_CR7","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1111\/eva.12417","volume":"10","author":"L Liu","year":"2017","unstructured":"Liu L, Chang Y, Yang T, Noren DP, Long B, Kornblau S, Qutub A, Ye J. Evolution-informed modeling improves outcome prediction for cancers. Evol Appl. 2017;10(1):68\u201376.","journal-title":"Evol Appl"},{"issue":"1","key":"3344_CR8","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1517\/17530059.2012.718329","volume":"7","author":"JE McDermott","year":"2013","unstructured":"McDermott JE, Wang J, Mitchell H, Webb-Robertson BJ, Hafen R, Ramey J, Rodland KD. Challenges in biomarker discovery: combining expert insights with statistical analysis of complex Omics data. Expert Opin Med Diagn. 
2013;7(1):37\u201351.","journal-title":"Expert Opin Med Diagn"},{"issue":"5439","key":"3344_CR9","doi-asserted-by":"publisher","first-page":"531","DOI":"10.1126\/science.286.5439.531","volume":"286","author":"TR Golub","year":"1999","unstructured":"Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286(5439):531\u20137.","journal-title":"Science"},{"issue":"18","key":"3344_CR10","doi-asserted-by":"publisher","first-page":"2831","DOI":"10.1093\/bioinformatics\/btw358","volume":"32","author":"H Zhou","year":"2016","unstructured":"Zhou H, Skolnick J. A knowledge-based approach for predicting gene-disease associations. Bioinformatics. 2016;32(18):2831\u20138.","journal-title":"Bioinformatics"},{"issue":"7","key":"3344_CR11","doi-asserted-by":"publisher","first-page":"1017","DOI":"10.1002\/sim.6792","volume":"35","author":"CB Peterson","year":"2016","unstructured":"Peterson CB, Stingo FC, Vannucci M. Joint Bayesian variable and graph selection for regression models with network-structured predictors. Stat Med. 2016;35(7):1017\u201331.","journal-title":"Stat Med"},{"issue":"2","key":"3344_CR12","doi-asserted-by":"publisher","first-page":"138","DOI":"10.1089\/cmb.2016.0140","volume":"24","author":"H Park","year":"2017","unstructured":"Park H, Niida A, Imoto S, Miyano S. Interaction-based feature selection for uncovering Cancer driver genes through copy number-driven expression level. J Comput Biol. 2017;24(2):138\u201352.","journal-title":"J Comput Biol"},{"key":"3344_CR13","doi-asserted-by":"crossref","unstructured":"Guan X, Liu L. Know-GRRF: Domain-Knowledge Informed Biomarker Discovery with Random Forests. In: International Conference on Bioinformatics and Biomedical Engineering. New York, NY: Springer; 2018. p. 
3\u201314.","DOI":"10.1007\/978-3-319-78759-6_1"},{"key":"3344_CR14","doi-asserted-by":"crossref","unstructured":"Akaike H. Information theory and an extension of the maximum likelihood principle. In: Selected papers of hirotugu akaike. New York, NY: Springer; 1998. p. 199\u2013213.","DOI":"10.1007\/978-1-4612-1694-0_15"},{"issue":"8","key":"3344_CR15","doi-asserted-by":"publisher","first-page":"832","DOI":"10.1109\/34.709601","volume":"20","author":"TK Ho","year":"1998","unstructured":"Ho TK. The random subspace method for constructing decision forests. Ieee T Pattern Anal. 1998;20(8):832\u201344.","journal-title":"Ieee T Pattern Anal"},{"issue":"5","key":"3344_CR16","doi-asserted-by":"publisher","first-page":"1190","DOI":"10.1137\/0916069","volume":"16","author":"RH Byrd","year":"1995","unstructured":"Byrd RH, Lu P, Nocedal J, Zhu C. A limited memory algorithm for bound constrained optimization. SIAM J Sci Comput. 1995;16(5):1190\u2013208.","journal-title":"SIAM J Sci Comput"},{"key":"3344_CR17","unstructured":"Deng H, Runger G. Feature selection via regularized trees. In: Neural Networks (IJCNN), The 2012 International Joint Conference on. New York, NY: IEEE; 2012. p. 1\u20138."},{"issue":"1","key":"3344_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v033.i01","volume":"33","author":"J Friedman","year":"2010","unstructured":"Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1\u201322.","journal-title":"J Stat Softw"},{"key":"3344_CR19","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/j.ins.2014.05.042","volume":"282","author":"V Bolon-Canedo","year":"2014","unstructured":"Bolon-Canedo V, Sanchez-Marono N, Alonso-Betanzos A, Benitez JM, Herrera F. A review of microarray datasets and applied feature selection methods. Inf Sci. 
2014;282:111\u201335.","journal-title":"Inf Sci"},{"issue":"5","key":"3344_CR20","doi-asserted-by":"publisher","first-page":"e2318","DOI":"10.1371\/journal.pone.0002318","volume":"3","author":"T Nakagawa","year":"2008","unstructured":"Nakagawa T, Kollmeyer TM, Morlan BW, Anderson SK, Bergstralh EJ, Davis BJ, Asmann YW, Klee GG, Ballman KV, Jenkins RB. A tissue biomarker panel predicting systemic progression after PSA recurrence post-definitive prostate cancer therapy. PLoS One. 2008;3(5):e2318.","journal-title":"PLoS One"},{"issue":"9","key":"3344_CR21","doi-asserted-by":"publisher","first-page":"855","DOI":"10.1038\/nmeth.2147","volume":"9","author":"S Kumar","year":"2012","unstructured":"Kumar S, Sanderford M, Gray VE, Ye J, Liu L. Evolutionary diagnosis method for variants in personal exomes. Nat Methods. 2012;9(9):855\u20136.","journal-title":"Nat Methods"},{"issue":"9","key":"3344_CR22","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1016\/j.tig.2011.06.004","volume":"27","author":"S Kumar","year":"2011","unstructured":"Kumar S, Dudley JT, Filipski A, Liu L. Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet. 2011;27(9):377\u201386.","journal-title":"Trends Genet"},{"issue":"3","key":"3344_CR23","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1038\/pcan.2016.17","volume":"19","author":"ES Antonarakis","year":"2016","unstructured":"Antonarakis ES, Armstrong AJ, Dehm SM, Luo J. Androgen receptor variant-driven prostate cancer: clinical implications and therapeutic targeting. Prostate Cancer Prostatic Dis. 2016;19(3):231\u201341.","journal-title":"Prostate Cancer Prostatic Dis"},{"issue":"7","key":"3344_CR24","doi-asserted-by":"publisher","first-page":"136","DOI":"10.21037\/atm.2016.03.35","volume":"4","author":"Z Zhang","year":"2016","unstructured":"Zhang Z. Variable selection with stepwise and best subset approaches. Ann Transl Med. 
2016;4(7):136.","journal-title":"Ann Transl Med"},{"issue":"11","key":"3344_CR25","doi-asserted-by":"publisher","first-page":"696","DOI":"10.1038\/s41568-018-0060-1","volume":"18","author":"Z Sondka","year":"2018","unstructured":"Sondka Z, Bamford S, Cole CG, Ward SA, Dunham I, Forbes SA. The COSMIC Cancer gene census: describing genetic dysfunction across all human cancers. Nat Rev Cancer. 2018;18(11):696\u2013705.","journal-title":"Nat Rev Cancer"},{"issue":"Database issue","key":"3344_CR26","doi-asserted-by":"publisher","first-page":"D670","DOI":"10.1093\/nar\/gku1177","volume":"43","author":"KR Rosenbloom","year":"2015","unstructured":"Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, et al. The UCSC genome browser database: 2015 update. Nucleic Acids Res. 2015;43(Database issue):D670\u201381.","journal-title":"Nucleic Acids Res"},{"issue":"6","key":"3344_CR27","doi-asserted-by":"publisher","first-page":"1252","DOI":"10.1093\/molbev\/mst037","volume":"30","author":"L Liu","year":"2013","unstructured":"Liu L, Kumar S. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants. Mol Biol Evol. 
2013;30(6):1252\u20137.","journal-title":"Mol Biol Evol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3344-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-020-3344-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3344-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,3,11]],"date-time":"2021-03-11T00:06:00Z","timestamp":1615421160000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-3344-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3]]},"references-count":27,"journal-issue":{"issue":"S2","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["3344"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-3344-x","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3]]},"assertion":[{"value":"11 March 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"77"}}