{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T16:33:12Z","timestamp":1775579592068,"version":"3.50.1"},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2016,2,3]],"date-time":"2016-02-03T00:00:00Z","timestamp":1454457600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2016,2,3]],"date-time":"2016-02-03T00:00:00Z","timestamp":1454457600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"National Natural Science Foundation of China under Grant","award":["61202144"],"award-info":[{"award-number":["61202144"]}]},{"name":"Natural Science Foundation of Fujian Province in China","award":["2012J01274"],"award-info":[{"award-number":["2012J01274"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                <jats:title>Background<\/jats:title>\n                <jats:p>The stability of Variable Importance Measures (VIMs) based on random forest has recently received increased attention. Despite the extensive attention on traditional stability of data perturbations or parameter variations, few studies include influences coming from the intrinsic randomness in generating VIMs, i.e. bagging, randomization and permutation. To address these influences, in this paper we introduce a new concept of intrinsic stability of VIMs, which is defined as the self-consistence among feature rankings in repeated runs of VIMs without data perturbations and parameter variations. Two widely used VIMs, i.e., Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) are comprehensively investigated. The motivation of this study is two-fold. 
First, we empirically verify the prevalence of intrinsic stability of VIMs over many real-world datasets to highlight that the instability of VIMs does not originate exclusively from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. Second, through Spearman and Pearson tests we comprehensively investigate how different factors influence the intrinsic stability.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Results<\/jats:title>\n                <jats:p>The experiments are carried out on 19 benchmark datasets with diverse characteristics, including 10 high-dimensional and small-sample gene expression datasets. Experimental results demonstrate the prevalence of intrinsic stability of VIMs. Spearman and Pearson tests on the correlations between intrinsic stability and different factors show that #feature (number of features) and #sample (size of sample) have a coupling effect on the intrinsic stability. The synthetic indicator, #feature\/#sample, shows both negative monotonic correlation and negative linear correlation with the intrinsic stability, while OOB accuracy has monotonic correlations with intrinsic stability. This indicates that high-dimensional, small-sample and high complexity datasets may suffer more from intrinsic instability of VIMs. Furthermore, with respect to parameter settings of random forest, a large number of trees is preferred. No significant correlations can be seen between intrinsic stability and other factors. Finally, the magnitude of intrinsic stability is always smaller than that of traditional stability.<\/jats:p>\n              <\/jats:sec><jats:sec>\n                <jats:title>Conclusion<\/jats:title>\n                <jats:p>First, the prevalence of intrinsic stability of VIMs demonstrates that the instability of VIMs not only comes from data perturbations or parameter variations, but also stems from the intrinsic randomness of VIMs. 
This finding gives a better understanding of VIM stability, and may help reduce the instability of VIMs. Second, by investigating the potential factors of intrinsic stability, users would be more aware of the risks and hence more careful when using VIMs, especially on high-dimensional, small-sample and high complexity datasets.<\/jats:p>\n              <\/jats:sec>","DOI":"10.1186\/s12859-016-0900-5","type":"journal-article","created":{"date-parts":[[2016,2,3]],"date-time":"2016-02-03T02:10:51Z","timestamp":1454465451000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":190,"title":["An experimental study of the intrinsic stability of random forest variable importance measures"],"prefix":"10.1186","volume":"17","author":[{"given":"Huazhen","family":"Wang","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Fan","family":"Yang","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhiyuan","family":"Luo","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2016,2,3]]},"reference":[{"issue":"1","key":"900_CR1","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001; 45(1):5\u201332.","journal-title":"Mach Learn"},{"key":"900_CR2","volume-title":"Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB\u201906. 2006 IEEE Symposium On","author":"DM Reif","year":"2006","unstructured":"Reif DM, Motsinger AA, McKinney BA, Crowe JE, Moore JH. Feature selection using a random forests classifier for the integrated analysis of multiple data types. In: Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB\u201906. 2006 IEEE Symposium On. 
Toronto, Canada: IEEE: 2006. p. 1\u20138."},{"issue":"1","key":"900_CR3","doi-asserted-by":"publisher","first-page":"3","DOI":"10.1186\/1471-2105-7-3","volume":"7","author":"R D\u00edaz-Uriarte","year":"2006","unstructured":"D\u00edaz-Uriarte R, De Andres SA. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006; 7(1):3.","journal-title":"BMC Bioinformatics"},{"key":"900_CR4","volume-title":"Pattern Recognition and Image Analysis","author":"O Okun","year":"2007","unstructured":"Okun O, Priisalu H. Random forest for gene expression based cancer classification: overlooked issues. In: Pattern Recognition and Image Analysis. Girona, Spain: Springer: 2007. p. 483\u201390."},{"issue":"1","key":"900_CR5","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1186\/1471-2105-9-319","volume":"9","author":"A Statnikov","year":"2008","unstructured":"Statnikov A, Wang L, Aliferis CF. A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics. 2008; 9(1):319.","journal-title":"BMC Bioinformatics"},{"issue":"6","key":"900_CR6","first-page":"493","volume":"2","author":"AL Boulesteix","year":"2012","unstructured":"Boulesteix AL, Janitza S, Kruppa J, K\u00f6nig IR. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics. Wiley Interdiscip Rev: Data Min Knowl Discov. 2012; 2(6):493\u2013507.","journal-title":"Wiley Interdiscip Rev: Data Min Knowl Discov"},{"issue":"14","key":"900_CR7","doi-asserted-by":"publisher","first-page":"1603","DOI":"10.1093\/bioinformatics\/btn239","volume":"24","author":"SS Lee","year":"2008","unstructured":"Lee SS, Sun L, Kustra R, Bull SB. Em-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis. Bioinformatics. 
2008; 24(14):1603\u201310.","journal-title":"Bioinformatics"},{"issue":"10","key":"900_CR8","doi-asserted-by":"publisher","first-page":"1340","DOI":"10.1093\/bioinformatics\/btq134","volume":"26","author":"A Altmann","year":"2010","unstructured":"Altmann A, Tolo\u015fi L, Sander O, Lengauer T. Permutation importance: a corrected feature importance measure. Bioinformatics. 2010; 26(10):1340\u20137.","journal-title":"Bioinformatics"},{"issue":"3","key":"900_CR9","doi-asserted-by":"publisher","first-page":"131","DOI":"10.1016\/j.compbiolchem.2011.04.009","volume":"35","author":"D Ma","year":"2011","unstructured":"Ma D, Xiao J, Li Y, Diao Y, Guo Y, Li M. Feature importance analysis in guide strand identification of micrornas. Comput Biol Chem. 2011; 35(3):131\u20136.","journal-title":"Comput Biol Chem"},{"issue":"4","key":"900_CR10","doi-asserted-by":"publisher","first-page":"201","DOI":"10.1002\/cem.1375","volume":"25","author":"DS Cao","year":"2011","unstructured":"Cao DS, Liang YZ, Xu QS, Zhang LX, Hu QN, Li HD. Feature importance sampling-based adaptive random forest as a useful tool to screen underlying lead compounds. J Chemometrics. 2011; 25(4):201\u20137.","journal-title":"J Chemometrics"},{"key":"900_CR11","volume-title":"ECML Workshop on Solving Complex Machine Learning Problems with Ensemble Methods","author":"J Paul","year":"2013","unstructured":"Paul J, Verleysen M, Dupont P. Identification of statistically significant features from random forests. In: ECML Workshop on Solving Complex Machine Learning Problems with Ensemble Methods. Prague, Czech Republic: Springer: 2013."},{"key":"900_CR12","volume-title":"Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"L Yu","year":"2008","unstructured":"Yu L, Ding C, Loscalzo S. Stable feature selection via dense feature groups. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 
Las Vegas, Nevada, USA: ACM: 2008. p. 803\u201311."},{"key":"900_CR13","volume-title":"Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"S Loscalzo","year":"2009","unstructured":"Loscalzo S, Yu L, Ding C. Consensus group stable feature selection. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France: ACM: 2009. p. 567\u201376."},{"issue":"4","key":"900_CR14","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1016\/j.compbiolchem.2010.07.002","volume":"34","author":"Z He","year":"2010","unstructured":"He Z, Yu W. Stable feature selection for biomarker discovery. Comput Biol Chem. 2010; 34(4):215\u201325.","journal-title":"Comput Biol Chem"},{"issue":"1","key":"900_CR15","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1109\/TCBB.2011.47","volume":"9","author":"L Yu","year":"2012","unstructured":"Yu L, Han Y, Berens ME. Stable gene selection from microarray data via sample weighting. IEEE\/ACM Trans Comput Biol Bioinformatics (TCBB). 2012; 9(1):262\u201372.","journal-title":"IEEE\/ACM Trans Comput Biol Bioinformatics (TCBB)"},{"issue":"5","key":"900_CR16","doi-asserted-by":"publisher","first-page":"428","DOI":"10.1002\/sam.11152","volume":"5","author":"Y Han","year":"2012","unstructured":"Han Y, Yu L. A variance reduction framework for stable feature selection. Stat Anal Data Min: The ASA Data Science Journal. 2012; 5(5):428\u201345.","journal-title":"Stat Anal Data Min: The ASA Data Science Journal"},{"key":"900_CR17","first-page":"1532","volume":"53","author":"I Kamkar","year":"2014","unstructured":"Kamkar I, Gupta SK, Phung D, Venkatesh S. Stable feature selection for clinical prediction: Exploiting icd tree structure using tree-lasso. Journal of biomedical informatics. 
2014; 53:1532\u20130464.","journal-title":"Journal of biomedical informatics"},{"issue":"5","key":"900_CR18","doi-asserted-by":"publisher","first-page":"2336","DOI":"10.1016\/j.eswa.2014.10.044","volume":"42","author":"CH Park","year":"2015","unstructured":"Park CH, Kim SB. Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Syst Appl. 2015; 42(5):2336\u201342.","journal-title":"Expert Syst Appl"},{"issue":"1","key":"900_CR19","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1007\/s10115-006-0040-8","volume":"12","author":"A Kalousis","year":"2007","unstructured":"Kalousis A, Prados J, Hilario M. Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl Inform Syst. 2007; 12(1):95\u2013116.","journal-title":"Knowl Inform Syst"},{"issue":"12","key":"900_CR20","doi-asserted-by":"publisher","first-page":"28210","DOI":"10.1371\/journal.pone.0028210","volume":"6","author":"AC Haury","year":"2011","unstructured":"Haury AC, Gestraud P, Vert JP. The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures. PloS one. 2011; 6(12):28210.","journal-title":"PloS one"},{"issue":"1","key":"900_CR21","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1186\/1471-2105-10-147","volume":"10","author":"SY Kim","year":"2009","unstructured":"Kim SY. Effects of sample size on robustness and prediction accuracy of a prognostic gene signature. BMC Bioinformatics. 2009; 10(1):147.","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"900_CR22","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1093\/bib\/bbq011","volume":"12","author":"ML Calle","year":"2011","unstructured":"Calle ML, Urrea V. Letter to the editor: Stability of random forest importance measures. Brief Bioinformatics. 
2011; 12(1):86\u20139.","journal-title":"Brief Bioinformatics"},{"issue":"4","key":"900_CR23","doi-asserted-by":"publisher","first-page":"369","DOI":"10.1093\/bib\/bbr016","volume":"12","author":"KK Nicodemus","year":"2011","unstructured":"Nicodemus KK. Letter to the editor: On the stability and ranking of predictors from random forest variable importance measures. Briefings in bioinformatics. 2011; 12(4):369\u201373.","journal-title":"Briefings in bioinformatics"},{"issue":"2","key":"900_CR24","doi-asserted-by":"publisher","first-page":"330","DOI":"10.1016\/j.patcog.2010.08.011","volume":"44","author":"A Verikas","year":"2011","unstructured":"Verikas A, Gelzinis A, Bacauskiene M. Mining data with random forests: A survey and results of new tests. Pattern Recognit. 2011; 44(2):330\u201349.","journal-title":"Pattern Recognit"},{"issue":"1","key":"900_CR25","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1186\/1471-2105-15-8","volume":"15","author":"MB Kursa","year":"2014","unstructured":"Kursa MB. Robustness of random forest-based gene selection methods. BMC Bioinformatics. 2014; 15(1):8.","journal-title":"BMC Bioinformatics"},{"issue":"1\u20133","key":"900_CR26","doi-asserted-by":"publisher","first-page":"389","DOI":"10.1023\/A:1012487302797","volume":"46","author":"I Guyon","year":"2002","unstructured":"Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002; 46(1\u20133):389\u2013422.","journal-title":"Mach Learn"},{"issue":"Suppl 2","key":"900_CR27","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1186\/1471-2164-9-S2-S27","volume":"9","author":"Y Zhang","year":"2008","unstructured":"Zhang Y, Ding C, Li T. Gene selection algorithm by combining relieff and mrmr. BMC Genomics. 
2008; 9(Suppl 2):27.","journal-title":"BMC Genomics"},{"issue":"8","key":"900_CR28","doi-asserted-by":"publisher","first-page":"3241","DOI":"10.12733\/jics20105854","volume":"12","author":"H Wang","year":"2015","unstructured":"Wang H, Wang C, Lv B, Pan X. Improved variable importance measure of random forest via combining of proximity measure and support vector machine for stable feature selection. J Inform Comput Sci. 2015; 12(8):3241\u201352. doi:10.12733\/jics20105854.","journal-title":"J Inform Comput Sci."},{"key":"900_CR29","doi-asserted-by":"crossref","unstructured":"Boulesteix AL, Bender A, Bermejo JL, Strobl C. Brief Bioinform. 2012; 13(3):292\u2013304.","DOI":"10.1093\/bib\/bbr053"},{"issue":"3","key":"900_CR30","doi-asserted-by":"publisher","first-page":"543","DOI":"10.1080\/10485252.2012.677843","volume":"24","author":"R Genuer","year":"2012","unstructured":"Genuer R. Variance reduction in purely random forests. J Nonparametric Stat. 2012; 24(3):543\u201362.","journal-title":"J Nonparametric Stat"},{"issue":"16","key":"900_CR31","doi-asserted-by":"publisher","first-page":"6241","DOI":"10.1016\/j.eswa.2013.05.051","volume":"40","author":"JM Cadenas","year":"2013","unstructured":"Cadenas JM, Garrido MC, Mart\u00edNez R. Feature subset selection filter\u2013wrapper based on low quality data. Expert Syst Appl. 2013; 40(16):6241\u201352.","journal-title":"Expert Syst Appl"},{"issue":"1","key":"900_CR32","first-page":"1144","volume":"36","author":"VY Kulkarni","year":"2013","unstructured":"Kulkarni VY, Sinha PK. Random forest classifiers: a survey and future research directions. Int J Adv Comput. 2013; 36(1):1144\u201353.","journal-title":"Int J Adv Comput"},{"key":"900_CR33","volume-title":"Artificial Intelligence and Applications","author":"LI Kuncheva","year":"2007","unstructured":"Kuncheva LI. A stability index for feature selection. In: Artificial Intelligence and Applications. Innsbruck, Austria: Springer: 2007. p. 
421\u20137."},{"key":"900_CR34","volume-title":"High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference On","author":"S Alelyani","year":"2011","unstructured":"Alelyani S, Zhao Z, Liu H. A dilemma in assessing stability of feature selection algorithms. In: High Performance Computing and Communications (HPCC), 2011 IEEE 13th International Conference On. Banff, Canada: IEEE: 2011. p. 701\u20137."},{"issue":"1","key":"900_CR35","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1137\/S0895480102412856","volume":"17","author":"R Fagin","year":"2003","unstructured":"Fagin R, Kumar R, Sivakumar D. Comparing top k lists. SIAM J Discrete Math. 2003; 17(1):134\u201360.","journal-title":"SIAM J Discrete Math"},{"issue":"5","key":"900_CR36","doi-asserted-by":"publisher","first-page":"556","DOI":"10.1093\/bib\/bbp034","volume":"10","author":"AL Boulesteix","year":"2009","unstructured":"Boulesteix AL, Slawski M. Stability and aggregation of ranked gene lists. Brief Bioinformatics. 2009; 10(5):556\u201368.","journal-title":"Brief Bioinformatics"},{"key":"900_CR37","doi-asserted-by":"publisher","first-page":"470","DOI":"10.1093\/biomet\/44.3-4.470","volume":"44","author":"EC Fieller","year":"1957","unstructured":"Fieller EC, Hartley HO, Pearson ES. Tests for rank correlation coefficients. i.Biometrika. 1957; 44:470\u2013481.","journal-title":"Biometrika"},{"issue":"3","key":"900_CR38","doi-asserted-by":"publisher","first-page":"315","DOI":"10.1016\/0306-4573(89)90048-4","volume":"25","author":"L Hamers","year":"1989","unstructured":"Hamers L, Hemeryck Y, Herweyers G, Janssen M, Keters H, Rousseau R, et al. Similarity measures in scientometric research: the jaccard index versus salton\u2019s cosine formula. Inform Process Manag. 
1989; 25(3):315\u20138.","journal-title":"Inform Process Manag"},{"issue":"4","key":"900_CR39","doi-asserted-by":"crossref","first-page":"833","DOI":"10.1177\/193229681300700405","volume":"7","author":"S Pleus","year":"2013","unstructured":"Pleus S, Schmid C, Link M, Zschornack E, Kl\u00f6tzer HM, Haug C, et al. Performance evaluation of a continuous glucose monitoring system under conditions similar to daily life. J Diabetes Sci Technol. 2013; 7(4):833\u201341.","journal-title":"J Diabetes Sci Technol"},{"issue":"7","key":"900_CR40","doi-asserted-by":"publisher","first-page":"491","DOI":"10.1016\/j.ijmedinf.2005.05.002","volume":"74","author":"A Statnikov","year":"2005","unstructured":"Statnikov A, Tsamardinos I, Dosbayev Y, Aliferis CF. Gems: a system for automated cancer diagnosis and biomarker discovery from microarray gene expression data. Int J Med Inform. 2005; 74(7):491\u2013503.","journal-title":"Int J Med Inform"},{"issue":"2","key":"900_CR41","doi-asserted-by":"publisher","first-page":"102","DOI":"10.1007\/s100440200009","volume":"5","author":"TK Ho","year":"2002","unstructured":"Ho TK. A data complexity analysis of comparative advantages of decision forest constructors. Pattern Anal Appl. 2002; 5(2):102\u201312.","journal-title":"Pattern Anal Appl"},{"key":"900_CR42","unstructured":"Liaw A, Wiener M. The randomForest package. Software manual. 2003. 
https:\/\/cran.r-project.org\/web\/packages\/randomForest\/."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-0900-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-016-0900-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-0900-5","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-016-0900-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,1]],"date-time":"2024-02-01T18:25:46Z","timestamp":1706811946000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-016-0900-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,2,3]]},"references-count":42,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2016,12]]}},"alternative-id":["900"],"URL":"https:\/\/doi.org\/10.1186\/s12859-016-0900-5","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2016,2,3]]},"assertion":[{"value":"16 June 2015","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 December 2015","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 February 2016","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"60"}}