{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,28]],"date-time":"2026-02-28T09:08:09Z","timestamp":1772269689850,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,2,5]],"date-time":"2020-02-05T00:00:00Z","timestamp":1580860800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,2,5]],"date-time":"2020-02-05T00:00:00Z","timestamp":1580860800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"Natural Science Foundation of China","doi-asserted-by":"crossref","award":["61771165"],"award-info":[{"award-number":["61771165"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"China Postdoctoral Science Foundation Funded Project","award":["2014M551246, 2018T110302"],"award-info":[{"award-number":["2014M551246, 2018T110302"]}]},{"name":"Innovation Project of State Key Laboratory of Tree Genetics and Breeding","award":["2019A04"],"award-info":[{"award-number":["2019A04"]}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"crossref","award":["2572018BH01"],"award-info":[{"award-number":["2572018BH01"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"crossref"}]},{"name":"National Undergraduate Innovation Project","award":["201910225184"],"award-info":[{"award-number":["201910225184"]}]},{"name":"Specialized Personnel Start-up Grant","award":["41113237"],"award-info":[{"award-number":["41113237"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC 
Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n<jats:title>Background<\/jats:title>\n<jats:p>Various methods for differential expression analysis have been widely used to identify features which best distinguish between different categories of samples. Multiple hypothesis testing may leave out explanatory features, each of which may be composed of individually insignificant variables. Multivariate hypothesis testing holds a non-mainstream position, considering the large computation overhead of large-scale matrix operation. Random forest provides a classification strategy for calculation of variable importance. However, it may be unsuitable for different distributions of samples.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Results<\/jats:title>\n<jats:p>Based on the thought of using an <jats:bold><jats:underline>e<\/jats:underline><\/jats:bold>nsemble <jats:bold><jats:underline>c<\/jats:underline><\/jats:bold>lassifier, we develop a <jats:bold><jats:underline>f<\/jats:underline><\/jats:bold>eature <jats:bold><jats:underline>s<\/jats:underline><\/jats:bold>election tool for <jats:bold><jats:underline>d<\/jats:underline><\/jats:bold>ifferential <jats:bold><jats:underline>e<\/jats:underline><\/jats:bold>xpression <jats:bold><jats:underline>a<\/jats:underline><\/jats:bold>nalysis on expression profiles (i.e., ECFS-DEA for short). Considering the differences in sample distribution, a graphical user interface is designed to allow the selection of different base classifiers. Inspired by random forest, a common measure which is applicable to any base classifier is proposed for calculation of variable importance. After an interactive selection of a feature on sorted individual variables, a projection heatmap is presented using k-means clustering. 
ROC curve is also provided, both of which can intuitively demonstrate the effectiveness of the selected feature.<\/jats:p>\n<\/jats:sec><jats:sec>\n<jats:title>Conclusions<\/jats:title>\n<jats:p>Feature selection through ensemble classifiers helps to select important variables and thus is applicable for different sample distributions. Experiments on simulation and realistic data demonstrate the effectiveness of ECFS-DEA for differential expression analysis on expression profiles. The software is available at <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/bio-nefu.com\/resource\/ecfs-dea\">http:\/\/bio-nefu.com\/resource\/ecfs-dea<\/jats:ext-link>.<\/jats:p>\n<\/jats:sec>","DOI":"10.1186\/s12859-020-3388-y","type":"journal-article","created":{"date-parts":[[2020,2,5]],"date-time":"2020-02-05T14:03:48Z","timestamp":1580911428000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":64,"title":["ECFS-DEA: an ensemble classifier-based feature selection for differential expression analysis on expression profiles"],"prefix":"10.1186","volume":"21","author":[{"given":"Xudong","family":"Zhao","sequence":"first","affiliation":[]},{"given":"Qing","family":"Jiao","sequence":"additional","affiliation":[]},{"given":"Hangyu","family":"Li","sequence":"additional","affiliation":[]},{"given":"Yiming","family":"Wu","sequence":"additional","affiliation":[]},{"given":"Hanxu","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Shan","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Guohua","family":"Wang","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,2,5]]},"reference":[{"key":"3388_CR1","doi-asserted-by":"publisher","first-page":"721","DOI":"10.2174\/1574893614666190116170406","volume":"14","author":"GI Lambrou","year":"2019","unstructured":"Lambrou GI, Sdraka M, Koutsouris D. 
The \u201cGene Cube\u201d: a novel approach to three-dimensional clustering of gene expression data. Curr Bioinforma. 2019; 14:721\u20137.","journal-title":"Curr Bioinforma"},{"key":"3388_CR2","doi-asserted-by":"publisher","first-page":"272","DOI":"10.1007\/0-387-21679-0_12","volume-title":"Statistics for Biology and Health","author":"John D. Storey","year":"2003","unstructured":"Storey JD, Tibshirani R, Garrett ES, Irizarry R, Zeger SL. SAM thresholding and false discovery rates for detecting differential gene expression in DNA microarrays In: Parmigiani G, editor. The Analysis of Gene Expression Data. Springer: 2003. p. 272\u201390. https:\/\/doi.org\/10.1007\/0-387-21679-0_12."},{"key":"3388_CR3","doi-asserted-by":"publisher","first-page":"e47","DOI":"10.1093\/nar\/gkv007","volume":"43","author":"ME Ritchie","year":"2015","unstructured":"Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015; 43:e47.","journal-title":"Nucleic Acids Res"},{"key":"3388_CR4","doi-asserted-by":"crossref","unstructured":"Pollard KS, Dudoit S, van der Laan MJ. Multiple testing procedures: the multtest package and application to genomics In: Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer: 2005. p. 249\u201371. https:\/\/link.springer.com\/chapter\/10.1007%2F0-387-29362-0_15.","DOI":"10.1007\/0-387-29362-0_15"},{"key":"3388_CR5","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/s41598-016-0028-x","volume":"7","author":"XD Zhao","year":"2017","unstructured":"Zhao XD, Wang L, Chen GS. Joint covariate detection on expression profiles for identifying microRNAs related to venous metastasis in hepatocellular carcinoma. Sci Rep. 2017; 7:1\u201311.","journal-title":"Sci Rep"},{"key":"3388_CR6","doi-asserted-by":"publisher","unstructured":"Kanji GK. 
100 statistical tests, 3rd edition: SAGE Publication; 2006. https:\/\/doi.org\/10.4135\/9781849208499.","DOI":"10.4135\/9781849208499"},{"key":"3388_CR7","doi-asserted-by":"publisher","first-page":"e127","DOI":"10.1093\/nar\/gkz740","volume":"47","author":"B Liu","year":"2019","unstructured":"Liu B, Gao X, Zhang H. BioSeq-Analysis2.0: an updated platform for analyzing DNA, RNA, and protein sequences at sequence level and residue level based on machine learning approaches. Nucleic Acids Res. 2019; 47:e127.","journal-title":"Nucleic Acids Res"},{"key":"3388_CR8","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1016\/j.omtn.2019.05.028","volume":"17","author":"HY Lai","year":"2019","unstructured":"Lai HY, Zhang ZY, Su ZD, Su W, Ding H, Chen W, Lin H. iProEP: a computational predictor for predicting promoter. Mol Ther Nucleic Acids. 2019; 17:337\u201346.","journal-title":"Mol Ther Nucleic Acids"},{"key":"3388_CR9","doi-asserted-by":"crossref","unstructured":"Lv H, Zhang ZM, Li SH, Tan JX, Chen W, Lin H. Evaluation of different computational methods on 5-methylcytosine sites identification. Brief Bioinforma. 2019; bbz048. https:\/\/academic.oup.com\/bib\/advance-articleabstract\/doi\/10.1093\/bib\/bbz048\/5510088?redirectedFrom=fulltext.","DOI":"10.1093\/bib\/bbz048"},{"key":"3388_CR10","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1261\/rna.069112.118","volume":"25","author":"Q Zou","year":"2019","unstructured":"Zou Q, Xing PW, Wei LY, Liu B. Gene2vec: gene subsequence embedding for prediction of mammalian n6-methyladenosine sites from mRNA. RNA. 2019; 25:205\u201318.","journal-title":"RNA"},{"key":"3388_CR11","doi-asserted-by":"publisher","first-page":"2029","DOI":"10.1093\/bioinformatics\/bty039","volume":"34","author":"CZ Jia","year":"2018","unstructured":"Jia CZ, Zuo Y, Zou Q. O-GlcNAcPRED-II: an integrated classification algorithm for identifying O-GlcNAcylation sites based on fuzzy undersampling and a k-means PCA oversampling technique. 
Bioinformatics. 2018; 34:2029\u201336.","journal-title":"Bioinformatics"},{"key":"3388_CR12","first-page":"17","volume":"7","author":"SH Li","year":"2019","unstructured":"Li SH, Zhang J, Zhao YW, Dao FY, Ding H, Chen W, Tang H. iPhoPred: a predictor for identifying phosphorylation sites in human protein. IEEE Access. 2019; 7:17\u201328.","journal-title":"IEEE Access"},{"key":"3388_CR13","doi-asserted-by":"publisher","first-page":"215","DOI":"10.3389\/fbioe.2019.00215","volume":"7","author":"ZB Lv","year":"2019","unstructured":"Lv ZB, Jin SS, Ding H, Zou Q. A random forest sub-Golgi protein classifier optimized via dipeptide and amino acid composition features. Front Bioeng Biotechnol. 2019; 7:215.","journal-title":"Front Bioeng Biotechnol"},{"key":"3388_CR14","doi-asserted-by":"publisher","first-page":"787","DOI":"10.1016\/j.knosys.2018.10.007","volume":"163","author":"XJ Zhu","year":"2019","unstructured":"Zhu XJ, Feng CQ, Lai HY, Chen W, Lin H. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl Based Syst. 2019; 163:787\u201393.","journal-title":"Knowl Based Syst"},{"key":"3388_CR15","doi-asserted-by":"publisher","first-page":"2931","DOI":"10.1021\/acs.jproteome.9b00250","volume":"18","author":"XQ Ru","year":"2019","unstructured":"Ru XQ, Li LH, Zou Q. Incorporating distance-based top-n-gram and random forest to identify electron transport proteins. J Proteome Res. 2019; 18:2931\u20139.","journal-title":"J Proteome Res"},{"key":"3388_CR16","doi-asserted-by":"publisher","first-page":"1392","DOI":"10.1021\/acs.jproteome.9b00012","volume":"18","author":"YJ Li","year":"2019","unstructured":"Li YJ, Niu MT, Zou Q. ELM-MHC: an improved MHC identification method with extreme learning machine algorithm. J Proteome Res. 2019; 18:1392\u2013401.","journal-title":"J Proteome Res"},{"key":"3388_CR17","doi-asserted-by":"publisher","unstructured":"Li C, Liu B. 
MotifCNN-fold: protein fold recognition based on fold-specific features extracted by motif-based convolutional neural networks. Brief Bioinforma. 2019; bbz133. https:\/\/doi.org\/10.1093\/bib\/bbz133.","DOI":"10.1093\/bib\/bbz133"},{"key":"3388_CR18","doi-asserted-by":"publisher","unstructured":"Liu B, Zhu Y, Yan K. Fold-LTR-TCP: protein fold recognition based on triadic closure principle. Brief Bioinforma. 2019; bbz139. https:\/\/doi.org\/10.1093\/bib\/bbz139.","DOI":"10.1093\/bib\/bbz139"},{"key":"3388_CR19","doi-asserted-by":"publisher","unstructured":"Liu B, Li C, Yan K. DeepSVM-fold: Protein fold recognition by combining Support Vector Machines and pairwise sequence similarity scores generated by deep learning networks. Brief Bioinforma. 2019;bbz098. https:\/\/doi.org\/10.1093\/bib\/bbz098.","DOI":"10.1093\/bib\/bbz098"},{"key":"3388_CR20","doi-asserted-by":"publisher","first-page":"6862","DOI":"10.1038\/s41598-017-07199-4","volume":"7","author":"J Song","year":"2017","unstructured":"Song J, Wang H, Wang J, Leier A, Marquez-Lago T, Yang B, Zhang Z, Akutsu T, Webb GI, Daly RJ. PhosphoPredict: A bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection. Sci Rep. 2017; 7:6862.","journal-title":"Sci Rep"},{"key":"3388_CR21","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1142\/S0219720011005288","volume":"9","author":"J Song","year":"2011","unstructured":"Song J, Tan H, Boyd SE, Shen H, Mahmood K, Webb GI, Akutsu T, Whisstock JC, Pike RN. Bioinformatic approaches for predicting substrates of proteases. J Bioinforma Comput Biol. 2011; 9:149\u201378.","journal-title":"J Bioinforma Comput Biol"},{"key":"3388_CR22","doi-asserted-by":"publisher","first-page":"e30361","DOI":"10.1371\/journal.pone.0030361","volume":"7","author":"J Song","year":"2012","unstructured":"Song J, Tan H, Wang M, Webb GI, Akutsu T. 
TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences. PloS ONE. 2012; 7:e30361.","journal-title":"PloS ONE"},{"key":"3388_CR23","doi-asserted-by":"publisher","first-page":"219","DOI":"10.1186\/s12859-015-0629-6","volume":"16","author":"XP Cheng","year":"2015","unstructured":"Cheng XP, Cai HM, Zhang Y, Xu B, Su WF. Optimal combination of feature selection and classification via local hyperplane based learning strategy. BMC Bioinformatics. 2015; 16:219.","journal-title":"BMC Bioinformatics"},{"key":"3388_CR24","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1186\/1471-2105-15-70","volume":"15","author":"HM Cai","year":"2014","unstructured":"Cai HM, Ruan PY, Ng M, Akutsu T. Feature weight estimation for gene selection: a local hyperlinear learning approach. BMC Bioinformatics. 2014; 15:70.","journal-title":"BMC Bioinformatics"},{"key":"3388_CR25","doi-asserted-by":"publisher","unstructured":"Shmueli G. To Explain or to Predict? 2010; 25:289\u2013311. https:\/\/doi.org\/10.2139\/ssrn.1351252.","DOI":"10.2139\/ssrn.1351252"},{"key":"3388_CR26","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L. Random forests. Mach Learn. 2001; 45:5\u201332.","journal-title":"Mach Learn"},{"key":"3388_CR27","doi-asserted-by":"publisher","first-page":"215","DOI":"10.3389\/fbioe.2019.00215","volume":"7","author":"ZB Lv","year":"2019","unstructured":"Lv ZB, Jin SS, Ding H, Zou Q. A random forest sub-Golgi protein classifier optimized via dipeptide and amino acid composition features. Front Bioeng Biotechnol. 2019; 7:215.","journal-title":"Front Bioeng Biotechnol"},{"key":"3388_CR28","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1186\/s12859-019-2893-3","volume":"20","author":"Y Li","year":"2019","unstructured":"Li Y, Liu YN, Wu YM, Zhao XD. 
JCD-DEA: a joint covariate detection tool for differential expression analysis on tumor expression profiles. BMC Bioinformatics. 2019; 20:365.","journal-title":"BMC Bioinformatics"},{"key":"3388_CR29","doi-asserted-by":"publisher","first-page":"402","DOI":"10.1038\/msb.2010.58","volume":"24","author":"J Burchard","year":"2010","unstructured":"Burchard J, Zhang C, Liu AM, Poon RT, Lee NPY, Wong KF, Sham PC, Lam BY, Ferguson MD, Tokiwa G, Smith R, Leeson B, Beard R, Lamb JR, Lim L, Mao M, Dai H, Luk JM. microRNA-122 as a regulator of mitochondrial metabolic gene network in hepatocellular carcinoma. Mol Syst Biol. 2010; 24:402.","journal-title":"Mol Syst Biol"},{"key":"3388_CR30","doi-asserted-by":"publisher","first-page":"13494","DOI":"10.1002\/jcb.28623","volume":"120","author":"JC Ma","year":"2019","unstructured":"Ma JC, Qin CY, Yuan ZG, Liu SL. LncRNA PAPAS promotes hepatocellular carcinoma by interacting with miR-188-5p. J Cell Biochem. 2019; 120:13494\u2013500.","journal-title":"J Cell Biochem"},{"key":"3388_CR31","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1016\/j.ebiom.2019.05.053","volume":"44","author":"FZ Meng","year":"2019","unstructured":"Meng FZ, Zhang SG, Song RP, Liu Y, Wang JB, Liang YJ, Wang JZ, Han JH, Song X, Lu ZY, Yang GC, Pan SH, Li XY, Liu YF, Zhou F, Wang Y, Cui YF, Zhang B, Ma K, Zhang CY, Sun YF, Xin MY, Liu LX. NCAPG2 overexpression promotes hepatocellular carcinoma proliferation and metastasis through activating the STAT3 and NF-kappa B\/miR-188-3p pathways. Ebiomedicine. 
2019; 44:237\u201349.","journal-title":"Ebiomedicine"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3388-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-020-3388-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3388-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,4]],"date-time":"2021-02-04T00:13:26Z","timestamp":1612397606000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-3388-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,2,5]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3388"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-3388-y","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,5]]},"assertion":[{"value":"3 December 2019","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"27 January 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 February 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Not applicable.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for 
publication"}},{"value":"The authors declare that they have no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"43"}}