{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T16:34:05Z","timestamp":1769186045930,"version":"3.49.0"},"reference-count":39,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2023,6,29]],"date-time":"2023-06-29T00:00:00Z","timestamp":1687996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["81900134"],"award-info":[{"award-number":["81900134"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Feature selection plays an important role in improving the performance of classification or reducing the dimensionality of high-dimensional datasets, such as high-throughput genomics\/proteomics data in bioinformatics. As a popular approach with computational efficiency and scalability, information theory has been widely incorporated into feature selection. In this study, we propose a unique weight-based feature selection (WBFS) algorithm that assesses selected features and candidate features to identify the key protein biomarkers for classifying lung cancer subtypes from The Cancer Proteome Atlas (TCPA) database and we further explored the survival analysis between selected biomarkers and subtypes of lung cancer. Results show good performance of the combination of our WBFS method and Bayesian network for mining potential biomarkers. These candidate signatures have valuable biological significance in tumor classification and patient survival analysis. Taken together, this study proposes the WBFS method that helps to explore candidate biomarkers from biomedical datasets and provides useful information for tumor diagnosis or therapy strategies.<\/jats:p>","DOI":"10.3390\/e25071003","type":"journal-article","created":{"date-parts":[[2023,6,30]],"date-time":"2023-06-30T00:43:30Z","timestamp":1688085810000},"page":"1003","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["The Weight-Based Feature Selection (WBFS) Algorithm Classifies Lung Cancer Subtypes Using Proteomic Data"],"prefix":"10.3390","volume":"25","author":[{"given":"Yangyang","family":"Wang","sequence":"first","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"given":"Xiaoguang","family":"Gao","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"given":"Xinxin","family":"Ru","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"given":"Pengzhan","family":"Sun","sequence":"additional","affiliation":[{"name":"School of Electronics and Information, Northwestern Polytechnical University, Xi\u2019an 710129, China"}]},{"given":"Jihan","family":"Wang","sequence":"additional","affiliation":[{"name":"Xi\u2019an Key Laboratory of Stem Cell and Regenerative Medicine, Institute of Medical Research, Northwestern Polytechnical University, Xi\u2019an 710072, China"}]}],"member":"1968","published-online":{"date-parts":[[2023,6,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"585","DOI":"10.1016\/j.molmed.2019.04.012","article-title":"Abandoning the notion of non-small cell lung cancer","volume":"25","author":"Relli","year":"2019","journal-title":"Trends Mol. Med."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1046","DOI":"10.1038\/nmeth.2650","article-title":"TCPA: A resource for cancer functional proteomics data","volume":"10","author":"Li","year":"2013","journal-title":"Nat. Methods"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"BSR20194337","DOI":"10.1042\/BSR20194337","article-title":"Mining TCGA database for tumor mutation burden and their clinical significance in bladder cancer","volume":"40","author":"Lv","year":"2020","journal-title":"Biosci. Rep."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1971","DOI":"10.1007\/s00262-019-02408-7","article-title":"Identification of prognostic genes in the acute myeloid leukemia immune microenvironment based on TCGA data analysis","volume":"68","author":"Yan","year":"2019","journal-title":"Cancer Immunol. Immunother."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"882","DOI":"10.1109\/TEVC.2020.2968743","article-title":"Variable-size cooperative coevolutionary particle swarm optimization for feature selection on high-dimensional data","volume":"24","author":"Song","year":"2020","journal-title":"IEEE Trans. Evol. Comput."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"e9656","DOI":"10.7717\/peerj.9656","article-title":"Predictive models for stage and risk classification in head and neck squamous cell carcinoma (HNSCC)","volume":"8","author":"Kumar","year":"2020","journal-title":"PeerJ"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"2068","DOI":"10.1016\/j.jid.2019.07.682","article-title":"Research techniques made simple: Feature selection for biomarker discovery","volume":"139","author":"Torres","year":"2019","journal-title":"J. Investig. Dermatol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1007\/s00521-013-1368-0","article-title":"A review of feature selection methods based on mutual information","volume":"24","author":"Vergara","year":"2014","journal-title":"Neural Comput. Appl."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"537","DOI":"10.1109\/72.298224","article-title":"Using mutual information for selecting features in supervised neural net learning","volume":"5","author":"Battiti","year":"1994","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Lewis, D.D. (1992, January 23\u201326). Feature Selection and Feature Extraction for Text Categorization. Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, Harriman, NY, USA.","DOI":"10.3115\/1075527.1075574"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"143","DOI":"10.1109\/72.977291","article-title":"Input feature selection for classification problems","volume":"13","author":"Kwak","year":"2002","journal-title":"IEEE Trans. Neural Netw."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_13","unstructured":"Lin, D., and Tang, X. (2006). European Conference on Computer Vision, Springer."},{"key":"ref_14","first-page":"27","article-title":"Conditional likelihood maximisation: A unifying framework for information theoretic feature selection","volume":"13","author":"Brown","year":"2012","journal-title":"J. Mach. Learn. Res."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"891","DOI":"10.1016\/j.ins.2021.10.026","article-title":"Dynamic interaction feature selection based on fuzzy rough set","volume":"581","author":"Wan","year":"2021","journal-title":"Inf. Sci."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Nakariyakul, S. (2019). A hybrid gene selection algorithm based on interaction information for microarray-based cancer classification. PLoS ONE, 14.","DOI":"10.1371\/journal.pone.0212333"},{"key":"ref_17","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Krijthe, J., van der Maaten, L., Krijthe, M.J., and Package \u2018Rtsne\u2019 (2023, January 11). R Package Version 0.13 2017URL. Available online: https:\/\/github.com\/jkrijthe\/Rtsne."},{"key":"ref_19","first-page":"1","article-title":"FactoMineR: An R package for multivariate analysis","volume":"25","author":"Josse","year":"2008","journal-title":"J. Stat. Softw."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"524","DOI":"10.3389\/fgene.2019.00524","article-title":"Review of causal discovery methods based on graphical models","volume":"10","author":"Glymour","year":"2019","journal-title":"Front. Genet."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.envsoft.2012.03.012","article-title":"Good practice in Bayesian network modelling","volume":"37","author":"Chen","year":"2012","journal-title":"Environ. Model. Softw."},{"key":"ref_22","first-page":"300","article-title":"Towards Principled Feature Selection: Relevancy, Filters and Wrappers","volume":"Volume R4","author":"Christopher","year":"2003","journal-title":"International Workshop on Artificial Intelligence and Statistics"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3409382","article-title":"Causality-based feature selection: Methods and evaluations","volume":"53","author":"Yu","year":"2020","journal-title":"ACM Comput. Surv."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"92","DOI":"10.1016\/j.patrec.2022.09.021","article-title":"Causal learner: A toolbox for causal structure and markov blanket learning","volume":"163","author":"Ling","year":"2022","journal-title":"Pattern Recognit. Lett."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/0169-2607(95)01703-8","article-title":"MedCalc: A new computer program for medical statistics","volume":"48","author":"Schoonjans","year":"1995","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_26","unstructured":"Kassambara, A., Kosinski, M., Biecek, P., and Fabian, S. (2017). Survminer: Drawing Survival Curves Using \u2018ggplot2\u2032, R Core Team. R Package version 0.3."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Kramer, O., and Kramer, O. (2013). Dimensionality Reduction with Unsupervised Nearest Neighbors, Springer.","DOI":"10.1007\/978-3-642-38652-7"},{"key":"ref_28","first-page":"123","article-title":"Naive bayesian classifier","volume":"2007","author":"Leung","year":"2007","journal-title":"Polytech. Univ. Dep. Comput. Sci.\/Financ. Risk Eng."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_30","unstructured":"Meyer, P.E., and Bontempi, G. (2006). Proceedings of the Applications of Evolutionary Computing: EvoWorkshops 2006: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoINTERACTION, EvoMUSART, and EvoSTOC, Budapest, Hungary, 10\u201312 April 2006, Springer."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"910494","DOI":"10.3389\/fonc.2022.910494","article-title":"Identification of therapeutically potential targets and their ligands for the treatment of OSCC","volume":"12","author":"Kumari","year":"2022","journal-title":"Front. Oncol."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1007\/s11306-022-01937-0","article-title":"Feature selection approaches identify potential plasma metabolites in postmenopausal osteoporosis patients","volume":"18","author":"Wang","year":"2022","journal-title":"Metabolomics"},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"e933","DOI":"10.7717\/peerj-cs.933","article-title":"A hybrid feature selection algorithm and its application in bioinformatics","volume":"8","author":"Wang","year":"2022","journal-title":"PeerJ Comput. Sci."},{"key":"ref_34","first-page":"9","article-title":"Literature review on feature selection methods for high-dimensional data","volume":"136","author":"Gnana","year":"2016","journal-title":"Int. J. Comput. Appl."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"616","DOI":"10.1109\/TBME.2010.2068048","article-title":"Heartbeat Classification Using Feature Selection Driven by Database Generalization Criteria","volume":"58","author":"Llamedo","year":"2011","journal-title":"IEEE Trans. Biomed. Eng."},{"key":"ref_36","unstructured":"Koller, D., and Sahami, M. (1996). Toward Optimal Feature Selection, Stanford InfoLab."},{"key":"ref_37","first-page":"36","article-title":"Gait feature subset selection by mutual information","volume":"39","author":"Guo","year":"2008","journal-title":"IEEE Trans. Syst. MAN Cybern.-Part A Syst. Hum."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"107525","DOI":"10.1016\/j.patcog.2020.107525","article-title":"Mutual information based feature subset selection in multivariate time series classification","volume":"108","author":"Ircio","year":"2020","journal-title":"Pattern Recognit."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"4339","DOI":"10.1242\/jcs.123208","article-title":"Regulation of EGFR trafficking and cell signaling by Sprouty2 and MIG6 in lung cancer cells","volume":"126","author":"Walsh","year":"2013","journal-title":"J. Cell Sci."}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/7\/1003\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T20:04:03Z","timestamp":1760126643000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/25\/7\/1003"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,6,29]]},"references-count":39,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2023,7]]}},"alternative-id":["e25071003"],"URL":"https:\/\/doi.org\/10.3390\/e25071003","relation":{},"ISSN":["1099-4300"],"issn-type":[{"value":"1099-4300","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,6,29]]}}}