{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,1]],"date-time":"2026-05-01T21:10:38Z","timestamp":1777669838396,"version":"3.51.4"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,4,16]],"date-time":"2024-04-16T00:00:00Z","timestamp":1713225600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,4,16]],"date-time":"2024-04-16T00:00:00Z","timestamp":1713225600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"German Federal Ministry of Education and Research","award":["01ZX1510 532"],"award-info":[{"award-number":["01ZX1510 532"]}]},{"DOI":"10.13039\/501100004168","name":"Universit\u00e4t zu L\u00fcbeck","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004168","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BioData Mining"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF where the network information is summarized into a sampling probability of predictor variables which is further used in the construction of the RF.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent from genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusions<\/jats:title>\n                <jats:p>Gene networks can provide additional information to aid the gene expression analysis for disease module and pathway identification. But they need to be used with caution and validation on the results need to be carried out to guard against spurious gene selection. More robust approaches to incorporate such information into RF construction also warrant further study.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s13040-024-00361-5","type":"journal-article","created":{"date-parts":[[2024,4,16]],"date-time":"2024-04-16T15:13:02Z","timestamp":1713280382000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Evaluation of network-guided random forest for disease gene discovery"],"prefix":"10.1186","volume":"17","author":[{"given":"Jianchang","family":"Hu","sequence":"first","affiliation":[]},{"given":"Silke","family":"Szymczak","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,4,16]]},"reference":[{"issue":"3","key":"361_CR1","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1007\/s40484-018-0144-7","volume":"6","author":"WV Li","year":"2018","unstructured":"Li WV, Li JJ. Modeling and analysis of RNA-seq data: a review from a statistical perspective. Quant Biol. 2018;6(3):195\u2013209.","journal-title":"Quant Biol."},{"issue":"21","key":"361_CR2","doi-asserted-by":"publisher","first-page":"8685","DOI":"10.1073\/pnas.0701361104","volume":"104","author":"KI Goh","year":"2007","unstructured":"Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barab\u00e1si AL. The human disease network. Proc Natl Acad Sci. 2007;104(21):8685\u201390.","journal-title":"Proc Natl Acad Sci."},{"issue":"1","key":"361_CR3","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1038\/nrg2918","volume":"12","author":"AL Barab\u00e1si","year":"2011","unstructured":"Barab\u00e1si AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet. 2011;12(1):56\u201368.","journal-title":"Nat Rev Genet."},{"issue":"4","key":"361_CR4","doi-asserted-by":"publisher","first-page":"644","DOI":"10.1101\/gr.071852.107","volume":"18","author":"T Ideker","year":"2008","unstructured":"Ideker T, Sharan R. Protein networks in disease. Genome Res. 2008;18(4):644\u201352.","journal-title":"Genome Res."},{"issue":"1","key":"361_CR5","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001;45(1):5\u201332.","journal-title":"Mach Learn."},{"issue":"3","key":"361_CR6","doi-asserted-by":"publisher","first-page":"841","DOI":"10.1214\/08-AOAS169","volume":"2","author":"H Ishwaran","year":"2008","unstructured":"Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2(3):841\u201360.","journal-title":"Ann Appl Stat."},{"issue":"6","key":"361_CR7","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1016\/j.ygeno.2012.04.003","volume":"99","author":"X Chen","year":"2012","unstructured":"Chen X, Ishwaran H. Random forests for genomic data analysis. Genomics. 2012;99(6):323\u20139.","journal-title":"Genomics."},{"issue":"11","key":"361_CR8","doi-asserted-by":"publisher","first-page":"2783","DOI":"10.1890\/07-0539.1","volume":"88","author":"DR Cutler","year":"2007","unstructured":"Cutler DR, Edwards TC, Beard KH, Cutler A, Hess KT, Gibson J, et al. Random forests for classification in ecology. Ecology. 2007;88(11):2783\u201392. https:\/\/doi.org\/10.1890\/07-0539.1.","journal-title":"Ecology."},{"issue":"18","key":"361_CR9","doi-asserted-by":"publisher","first-page":"2010","DOI":"10.1093\/bioinformatics\/btn356","volume":"24","author":"D Amaratunga","year":"2008","unstructured":"Amaratunga D, Cabrera J, Lee YS. Enriched random forests. Bioinformatics. 2008;24(18):2010\u20134.","journal-title":"Bioinformatics."},{"key":"361_CR10","doi-asserted-by":"crossref","unstructured":"Liu Y, Zhao H. Variable importance-weighted random forests. Quant Biol. 2017;5:338\u201351.","DOI":"10.1007\/s40484-017-0121-6"},{"issue":"1","key":"361_CR11","doi-asserted-by":"publisher","first-page":"13202","DOI":"10.1038\/s41598-018-31497-0","volume":"8","author":"W Wang","year":"2018","unstructured":"Wang W, Liu W. Integration of gene interaction information into a reweighted random survival forest approach for accurate survival prediction and survival biomarker discovery. Sci Rep. 2018;8(1):13202.","journal-title":"Sci Rep."},{"issue":"2","key":"361_CR12","first-page":"151","volume":"4","author":"CA Lange","year":"2008","unstructured":"Lange CA, Yee D. Progesterone and breast cancer. Women\u2019s Health. 2008;4(2):151\u201362.","journal-title":"Women\u2019s Health."},{"key":"361_CR13","unstructured":"Breiman L, Friedman J, Stone CJ, Olshen RA. Classification and regression trees. CRC Press; 1984."},{"issue":"4","key":"361_CR14","doi-asserted-by":"publisher","first-page":"949","DOI":"10.1016\/j.ajhg.2008.02.013","volume":"82","author":"S K\u00f6hler","year":"2008","unstructured":"K\u00f6hler S, Bauer S, Horn D, Robinson PN. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008;82(4):949\u201358.","journal-title":"Am J Hum Genet."},{"key":"361_CR15","doi-asserted-by":"crossref","unstructured":"D\u00edaz-Uriarte R, Alvarez\u00a0de Andr\u00e9s S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7:1\u201313.","DOI":"10.1186\/1471-2105-7-3"},{"issue":"11","key":"361_CR16","doi-asserted-by":"publisher","first-page":"2074","DOI":"10.1002\/sim.8086","volume":"38","author":"TP Morris","year":"2019","unstructured":"Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074\u2013102.","journal-title":"Stat Med."},{"key":"361_CR17","doi-asserted-by":"crossref","unstructured":"Grimes T, Datta S. SeqNet: an R package for generating gene-gene networks and simulating RNA-seq data. J Stat Softw. 2021;98(12):1\u201349.","DOI":"10.18637\/jss.v098.i12"},{"issue":"5594","key":"361_CR18","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1126\/science.298.5594.824","volume":"298","author":"R Milo","year":"2002","unstructured":"Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U. Network motifs: simple building blocks of complex networks. Science. 2002;298(5594):824\u20137.","journal-title":"Science."},{"key":"361_CR19","doi-asserted-by":"publisher","unstructured":"Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77(1):1\u201317. https:\/\/doi.org\/10.18637\/jss.v077.i01.","DOI":"10.18637\/jss.v077.i01"},{"key":"361_CR20","doi-asserted-by":"publisher","first-page":"958","DOI":"10.1200\/CCI.19.00119","volume":"1","author":"M Ramos","year":"2020","unstructured":"Ramos M, Geistlinger L, Oh S, Schiffer L, Azhar R, Kodali H, et al. Multiomic integration of public oncology databases in bioconductor. JCO Clin Cancer Informat. 2020;1:958\u201371.","journal-title":"JCO Clin Cancer Informat."},{"issue":"2","key":"361_CR21","doi-asserted-by":"publisher","first-page":"492","DOI":"10.1093\/bib\/bbx124","volume":"20","author":"F Degenhardt","year":"2019","unstructured":"Degenhardt F, Seifert S, Szymczak S. Evaluation of variable selection methods for random forests and omics data sets. Brief Bioinform. 2019;20(2):492\u2013503.","journal-title":"Brief Bioinform."},{"issue":"D1","key":"361_CR22","doi-asserted-by":"publisher","first-page":"D638","DOI":"10.1093\/nar\/gkac1000","volume":"51","author":"D Szklarczyk","year":"2023","unstructured":"Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, et al. The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023;51(D1):D638\u201346.","journal-title":"Nucleic Acids Res."},{"issue":"19","key":"361_CR23","doi-asserted-by":"publisher","first-page":"3663","DOI":"10.1093\/bioinformatics\/btz149","volume":"35","author":"S Seifert","year":"2019","unstructured":"Seifert S, Gundlach S, Szymczak S. Surrogate minimal depth as an importance measure for variables in random forests. Bioinformatics. 2019;35(19):3663\u201371.","journal-title":"Bioinformatics."},{"issue":"18","key":"361_CR24","doi-asserted-by":"publisher","first-page":"10577","DOI":"10.3390\/ijms231810577","volume":"23","author":"S Kumar","year":"2022","unstructured":"Kumar S, Prajapati KS, Gupta S. The multifaceted role of signal peptide-CUB-EGF domain-containing protein (SCUBE) in cancer. Int J Mol Sci. 2022;23(18):10577.","journal-title":"Int J Mol Sci."},{"issue":"9","key":"361_CR25","doi-asserted-by":"publisher","first-page":"1401","DOI":"10.2967\/jnumed.116.188011","volume":"58","author":"C Morgat","year":"2017","unstructured":"Morgat C, MacGrogan G, Brouste V, V\u00e9lasco V, Sevenet N, Bonnefoi H, et al. Expression of gastrin-releasing peptide receptor in breast cancer and its association with pathologic, biologic, and clinical parameters: a study of 1,432 primary tumors. J Nucl Med. 2017;58(9):1401\u20137.","journal-title":"J Nucl Med."},{"key":"361_CR26","doi-asserted-by":"publisher","first-page":"213","DOI":"10.1007\/s10549-012-2340-x","volume":"137","author":"JJ De Ronde","year":"2013","unstructured":"De Ronde JJ, Lips EH, Mulder L, Vincent AD, Wesseling J, Nieuwland M, et al. SERPINA6, BEX1, AGTR1, SLC26A3, and LAPTM4B are markers of resistance to neoadjuvant chemotherapy in HER2-negative breast cancer. Breast Cancer Res Treat. 2013;137:213\u201323.","journal-title":"Breast Cancer Res Treat."},{"issue":"3","key":"361_CR27","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1038\/onc.2013.553","volume":"34","author":"I Moy","year":"2015","unstructured":"Moy I, Todorovi\u0107 V, Dubash A, Coon J, Parker JB, Buranapramest M, et al. Estrogen-dependent sushi domain containing 3 regulates cytoskeleton organization and migration in breast cancer cells. Oncogene. 2015;34(3):323\u201333.","journal-title":"Oncogene."},{"issue":"12","key":"361_CR28","doi-asserted-by":"publisher","first-page":"1190","DOI":"10.1038\/mp.2009.120","volume":"15","author":"T Bates","year":"2010","unstructured":"Bates T, Lind P, Luciano M, Montgomery G, Martin NG, Wright MJ. Dyslexia and DYX1C1: deficits in reading and spelling associated with a missense mutation. Mol Psychiatry. 2010;15(12):1190\u20136.","journal-title":"Mol Psychiatry."},{"key":"361_CR29","doi-asserted-by":"publisher","first-page":"1","DOI":"10.18637\/jss.v036.i11","volume":"36","author":"MB Kursa","year":"2010","unstructured":"Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010;36:1\u201313.","journal-title":"J Stat Softw."},{"key":"361_CR30","doi-asserted-by":"publisher","first-page":"885","DOI":"10.1007\/s11634-016-0276-4","volume":"12","author":"S Janitza","year":"2018","unstructured":"Janitza S, Celik E, Boulesteix AL. A computationally fast variable importance test for random forests for high-dimensional data. ADAC. 2018;12:885\u2013915.","journal-title":"ADAC."},{"issue":"21","key":"361_CR31","doi-asserted-by":"publisher","first-page":"3711","DOI":"10.1093\/bioinformatics\/bty373","volume":"34","author":"S Nembrini","year":"2018","unstructured":"Nembrini S, K\u00f6nig IR, Wright MN. The revival of the Gini importance? Bioinformatics. 2018;34(21):3711\u20138.","journal-title":"Bioinformatics."},{"issue":"9","key":"361_CR32","doi-asserted-by":"publisher","first-page":"926","DOI":"10.1038\/nbt.3001","volume":"32","author":"C Wang","year":"2014","unstructured":"Wang C, Gong B, Bushel PR, Thierry-Mieg J, Thierry-Mieg D, Xu J, et al. The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance. Nat Biotechnol. 2014;32(9):926\u201332.","journal-title":"Nat Biotechnol."},{"key":"361_CR33","doi-asserted-by":"publisher","first-page":"138","DOI":"10.12659\/MSMBR.892101","volume":"20","author":"KJ Mantione","year":"2014","unstructured":"Mantione KJ, Kream RM, Kuzelova H, Ptacek R, Raboch J, Samuel JM, et al. Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq. Med Sci Monit Basic Res. 2014;20:138.","journal-title":"Med Sci Monit Basic Res."},{"issue":"2","key":"361_CR34","first-page":"1","volume":"21","author":"X Guan","year":"2020","unstructured":"Guan X, Runger G, Liu L. Dynamic incorporation of prior knowledge from multiple domains in biomarker discovery. BMC Bioinformatics. 2020;21(2):1\u201310.","journal-title":"BMC Bioinformatics."},{"key":"361_CR35","doi-asserted-by":"publisher","first-page":"5160396","DOI":"10.1155\/2020\/5160396","volume":"2020","author":"R Zhao","year":"2020","unstructured":"Zhao R, Hu B, Chen L, Zhou B. Identification of latent oncogenes with a network embedding method and random forest. BioMed Res Int. 2020;2020:5160396.","journal-title":"BioMed Res Int."},{"issue":"14","key":"361_CR36","first-page":"1","volume":"21","author":"N Adnan","year":"2020","unstructured":"Adnan N, Lei C, Ruan J. Robust edge-based biomarker discovery improves prediction of breast cancer metastasis. BMC Bioinformatics. 2020;21(14):1\u201318.","journal-title":"BMC Bioinformatics."}],"container-title":["BioData Mining"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-024-00361-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13040-024-00361-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13040-024-00361-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,16]],"date-time":"2024-04-16T15:14:28Z","timestamp":1713280468000},"score":1,"resource":{"primary":{"URL":"https:\/\/biodatamining.biomedcentral.com\/articles\/10.1186\/s13040-024-00361-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,16]]},"references-count":36,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["361"],"URL":"https:\/\/doi.org\/10.1186\/s13040-024-00361-5","relation":{},"ISSN":["1756-0381"],"issn-type":[{"value":"1756-0381","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,16]]},"assertion":[{"value":"15 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"9 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 April 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"10"}}