{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T16:11:33Z","timestamp":1772554293333,"version":"3.50.1"},"reference-count":59,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T00:00:00Z","timestamp":1641772800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Analysis of high-dimensional data, with more features (p) than observations (N) (p&gt;N), places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA) and recursive feature elimination to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that the graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers in both datasets. In association rule mining, features selected using the graph-based approach outperformed the other two feature-selection techniques at a support of 0.5 and lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. 
Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting and finding associations between features in high-dimensional RNAseq data.<\/jats:p>","DOI":"10.3390\/a15010021","type":"journal-article","created":{"date-parts":[[2022,1,10]],"date-time":"2022-01-10T17:42:25Z","timestamp":1641836545000},"page":"21","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":13,"title":["Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets"],"prefix":"10.3390","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2783-9992","authenticated-orcid":false,"given":"Consolata","family":"Gakii","sequence":"first","affiliation":[{"name":"Department of Computing and Information Technology, University of Embu, P.O. Box 6-60100, Embu 60100, Kenya"},{"name":"School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000-00200, Nairobi 00200, Kenya"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7965-2428","authenticated-orcid":false,"given":"Paul O.","family":"Mireji","sequence":"additional","affiliation":[{"name":"Biotechnology Research Institute, Kenya Agricultural and Livestock Research Organization, P.O. Box 362-00902, Kikuyu 00902, Kenya"}]},{"given":"Richard","family":"Rimiru","sequence":"additional","affiliation":[{"name":"School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, P.O. Box 62000-00200, Nairobi 00200, Kenya"}]}],"member":"1968","published-online":{"date-parts":[[2022,1,10]]},"reference":[{"key":"ref_1","first-page":"42","article-title":"A review on dimensionality reduction techniques","volume":"173","author":"Jindal","year":"2017","journal-title":"Int. J. Comput. Appl."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Nguyen, L.H., and Holmes, S. (2019). 
Ten quick tips for effective dimensionality reduction. PLoS Comput. Biol., 15.","DOI":"10.1371\/journal.pcbi.1006907"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"56","DOI":"10.38094\/jastt1224","article-title":"A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction","volume":"1","author":"Zebari","year":"2020","journal-title":"J. Appl. Sci. Technol. Trends"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Abdulrazzaq, M.B., and Saeed, J.N. (2019, January 2\u20134). A Comparison of Three Classification Algorithms for Handwritten Digit Recognition. Proceedings of the 2019 International Conference on Advanced Science and Engineering (ICOASE), Zakho-Duhok, Iraq.","DOI":"10.1109\/ICOASE.2019.8723702"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"441","DOI":"10.1016\/j.asoc.2017.11.006","article-title":"Whale optimization approaches for wrapper feature selection","volume":"62","author":"Mafarja","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_6","unstructured":"Yu, L., and Liu, H. (2003, January 21\u201324). Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. Proceedings of the 20th international conference on machine learning (ICML-03), Washington, DC, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.compeleceng.2013.11.024","article-title":"A survey on feature selection methods","volume":"40","author":"Chandrashekar","year":"2014","journal-title":"Comput. Electr. Eng."},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Jovi\u0107, A., Brki\u0107, K., and Bogunovi\u0107, N. (2015, January 25\u201329). A review of feature selection methods with applications. 
Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia.","DOI":"10.1109\/MIPRO.2015.7160458"},{"key":"ref_9","first-page":"57","article-title":"A survey and comparative study of filter and wrapper feature selection techniques","volume":"5","author":"Mlambo","year":"2016","journal-title":"Int. J. Eng. Sci."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1016\/j.jbi.2018.07.014","article-title":"Relief-based feature selection: Introduction and review","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J. Biomed. Inform."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"15091","DOI":"10.1007\/s00521-021-06406-8","article-title":"A systematic review of emerging feature selection optimization methods for optimal text classification: The present state and prospective opportunities","volume":"33","author":"Abiodun","year":"2021","journal-title":"Neural Comput. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"137","DOI":"10.3389\/fgene.2021.611506","article-title":"Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning","volume":"12","author":"Piles","year":"2021","journal-title":"Front. Genet."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1186\/s13059-021-02544-3","article-title":"Feature selection revisited in the single-cell era","volume":"22","author":"Yang","year":"2021","journal-title":"Genome Biol."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-021-00415-z","article-title":"Optimized hybrid investigative based dimensionality reduction methods for malaria vector using KNN classifier","volume":"8","author":"Arowolo","year":"2021","journal-title":"J. 
Big Data"},{"key":"ref_15","unstructured":"Cateni, S., Vannucci, M., Vannocci, M., and Colla, V. (2021, December 07). Variable Selection and Feature Extraction through Artificial Intelligence Techniques. Available online: https:\/\/www.intechopen.com\/chapters\/41752."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"49","DOI":"10.1016\/j.eswa.2018.05.023","article-title":"An improved semi-supervised dimensionality reduction using feature weighting: Application to sentiment analysis","volume":"109","author":"Kim","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1147\/rd.33.0210","article-title":"Some studies in machine learning using the game of checkers","volume":"3","author":"Samuel","year":"1959","journal-title":"IBM J. Res. Dev."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Das, H., Naik, B., and Behera, H. (2018). Classification of diabetes mellitus disease (DMD): A data mining (DM) approach. Progress in Computing, Analytics and Networking, Springer.","DOI":"10.1007\/978-981-10-7871-2_52"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"358","DOI":"10.4218\/etrij.2018-0522","article-title":"An enhanced feature selection filter for classification of microarray cancer data","volume":"41","author":"Mazumder","year":"2019","journal-title":"ETRI J."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1898-6","article-title":"Accuracy, robustness and scalability of dimensionality reduction methods for single-cell RNA-seq analysis","volume":"20","author":"Sun","year":"2019","journal-title":"Genome Biol."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1007\/s10015-018-0437-y","article-title":"Association rule mining algorithms on high-dimensional datasets","volume":"23","author":"Ai","year":"2018","journal-title":"Artif. 
Life Robot."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Agrawal, R., Imieli\u0144ski, T., and Swami, A. (1993, January 25\u201328). Mining Association Rules between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIGMOD international conference on Management of Data, Washington, DC, USA.","DOI":"10.1145\/170035.170072"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Liu, X., Sang, X., Chang, J., Zheng, Y., and Han, Y. (2021). The water supply association analysis method in Shenzhen based on kmeans clustering discretization and apriori algorithm. PLoS ONE, 16.","DOI":"10.1371\/journal.pone.0255684"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"2507","DOI":"10.1093\/bioinformatics\/btm344","article-title":"A review of feature selection techniques in bioinformatics","volume":"23","author":"Saeys","year":"2007","journal-title":"Bioinformatics"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"971","DOI":"10.1109\/TCBB.2015.2478454","article-title":"Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection","volume":"13","author":"Ang","year":"2016","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Ray, R.B., Kumar, M., and Rath, S.K. (2016, January 8\u20139). Fast In-Memory Cluster Computing of Sizeable Microarray Using Spark. Proceedings of the 2016 International Conference on Recent Trends in Information Technology (ICRTIT), Chennai, India.","DOI":"10.1109\/ICRTIT.2016.7569599"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Lokeswari, Y., and Jacob, S.G. (2017). Prediction of child tumours from microarray gene expression data through parallel gene selection and classification on spark. 
Computational Intelligence in Data Mining, Springer.","DOI":"10.1007\/978-981-10-3874-7_62"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Peralta, D., Del R\u00edo, S., Ram\u00edrez-Gallego, S., Triguero, I., Benitez, J.M., and Herrera, F. (2015). Evolutionary feature selection for big data classification: A mapreduce approach. Math. Probl. Eng., 2015.","DOI":"10.1155\/2015\/246139"},{"key":"ref_29","unstructured":"Sonnenburg, S., Franc, V., Yom-Tov, E., and Sebag, M. (2008, January 5\u20139). Pascal Large Scale Learning Challenge. Proceedings of the 25th International Conference on Machine Learning (ICML2008) Workshop, Helsinki, Finland."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"91535","DOI":"10.1109\/ACCESS.2019.2927080","article-title":"On the scalability of machine-learning algorithms for breast cancer prediction in big data context","volume":"7","author":"Alghunaim","year":"2019","journal-title":"IEEE Access"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Turgut, S., Da\u011ftekin, M., and Ensari, T. (2018, January 18\u201319). Microarray Breast Cancer Data Classification Using Machine Learning Methods. Proceedings of the 2018 Electric Electronics, Computer Science, Biomedical Engineerings\u2019 Meeting (EBBT), Istanbul, Turkey.","DOI":"10.1109\/EBBT.2018.8391468"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1098","DOI":"10.1373\/clinchem.2015.238691","article-title":"Tumor microRNA expression profiling identifies circulating microRNAs for early breast cancer detection","volume":"61","author":"Matamala","year":"2015","journal-title":"Clin. Chem."},{"key":"ref_33","first-page":"1","article-title":"An ensemble of filters and wrappers for microarray data classification","volume":"3","author":"Morovvat","year":"2016","journal-title":"Mach. Learn. Appl. An. Int. 
J."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"615","DOI":"10.1007\/s10044-017-0668-x","article-title":"An approach of feature selection using graph-theoretic heuristic and hill climbing","volume":"22","author":"Goswami","year":"2019","journal-title":"Pattern Anal. Appl."},{"key":"ref_35","doi-asserted-by":"crossref","unstructured":"Zhang, Z., and Hancock, E.R. (2011). A Graph-Based Approach to Feature Selection. International Workshop on Graph-Based Representations in Pattern Recognition, Springer.","DOI":"10.1007\/978-3-642-20844-7_21"},{"key":"ref_36","doi-asserted-by":"crossref","unstructured":"Schroeder, D.T., Styp-Rekowski, K., Schmidt, F., Acker, A., and Kao, O. (2019, January 22\u201325). Graph-Based Feature Selection Filter Utilizing Maximal Cliques. Proceedings of the 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), Granada, Spain.","DOI":"10.1109\/SNAMS.2019.8931841"},{"key":"ref_37","first-page":"12","article-title":"Infinite feature selection: A graph-based feature filtering approach","volume":"43","author":"Roffo","year":"2020","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_38","doi-asserted-by":"crossref","unstructured":"Rana, P., Thai, P., Dinh, T., and Ghosh, P. (2021). Relevant and Non-Redundant Feature Selection for Cancer Classification and Subtype Detection. Cancers, 13.","DOI":"10.3390\/cancers13174297"},{"key":"ref_39","unstructured":"Nguyen, H., Thai, P., Thai, M., Vu, T., and Dinh, T. (2019). Approximate k-Cover in Hypergraphs: Efficient Algorithms, and Applications. arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"7112","DOI":"10.3934\/mbe.2019357","article-title":"Identification of lncRNAs-gene interactions in transcription regulation based on co-expression analysis of RNA-seq data","volume":"16","author":"Lu","year":"2019","journal-title":"Math. Biosci. 
Eng."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1016\/j.knosys.2018.04.038","article-title":"ARM\u2013AMO: An efficient association rule mining algorithm based on animal migration optimization","volume":"154","author":"Chiclana","year":"2018","journal-title":"Knowl. Based Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"779","DOI":"10.1016\/j.cie.2019.03.020","article-title":"A hybrid temporal association rules mining method for traffic congestion prediction","volume":"130","author":"Wen","year":"2019","journal-title":"Comput. Ind. Eng."},{"key":"ref_43","unstructured":"Shui, Y., and Cho, Y.-R. (2016, January 15\u201318). Filtering Association Rules in GENE Ontology Based on Term Specificity. Proceedings of the 2016 IEEE international conference on bioinformatics and biomedicine (bibm), Shenzhen, China."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1016\/j.cmpb.2015.03.007","article-title":"Using GO-WAR for mining cross-ontology weighted association rules","volume":"120","author":"Agapito","year":"2015","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_45","first-page":"2231","article-title":"A comparative study of training algorithms for supervised machine learning","volume":"2","author":"Bhavsar","year":"2012","journal-title":"Int. J. Soft Comput. Eng. (IJSCE)"},{"key":"ref_46","unstructured":"Han, J., Pei, J., and Kamber, M. (2011). Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1145\/1656274.1656278","article-title":"The WEKA data mining software: An update","volume":"11","author":"Hall","year":"2009","journal-title":"ACM SIGKDD Explor. Newsl."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"988","DOI":"10.1109\/72.788640","article-title":"An overview of statistical learning theory","volume":"10","author":"Vapnik","year":"1999","journal-title":"IEEE Trans. 
Neural Netw."},{"key":"ref_49","doi-asserted-by":"crossref","unstructured":"Tanwani, A.K., Afridi, J., Shafiq, M.Z., and Farooq, M. (2009). Guidelines to Select Machine Learning Scheme for Classification of Biomedical Datasets. Proceedings of the European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Springer.","DOI":"10.1007\/978-3-642-01184-9_12"},{"key":"ref_50","unstructured":"Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. arXiv."},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"1247","DOI":"10.5194\/gmd-7-1247-2014","article-title":"Root mean square error (RMSE) or mean absolute error (MAE)?\u2014Arguments against avoiding RMSE in the literature","volume":"7","author":"Chai","year":"2014","journal-title":"Geosci. Model Dev."},{"key":"ref_52","unstructured":"Dunham, M.H., and Sridhar, S. (2006). Data Mining: Introductory and Advanced Topics, Dorling Kindersley, Pearson Education."},{"key":"ref_53","doi-asserted-by":"crossref","unstructured":"Jiang, L., Huang, J., Higgs, B.W., Hu, Z., Xiao, Z., Yao, X., Conley, S., Zhong, H., Liu, Z., and Brohawn, P. (2016). Genomic landscape survey identifies SRSF1 as a key oncodriver in small cell lung cancer. PLoS Genet., 12.","DOI":"10.1371\/journal.pgen.1005895"},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"e86837","DOI":"10.1172\/jci.insight.86837","article-title":"Profiling cancer testis antigens in non\u2013small-cell lung cancer","volume":"1","author":"Djureinovic","year":"2016","journal-title":"JCI Insight"},{"key":"ref_55","doi-asserted-by":"crossref","unstructured":"Bullard, J., Purdom, E., Hansen, K.D., and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mrna-seq experiments. BMC Bioinform., 11.","DOI":"10.1186\/1471-2105-11-94"},{"key":"ref_56","doi-asserted-by":"crossref","unstructured":"Ustebay, S., Turgut, Z., and Aydin, M.A. 
(2018, January 3\u20134). Intrusion Detection System with Recursive Feature Elimination by Using Random Forest and Deep Learning Classifier. Proceedings of the International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), Ankara, Turkey.","DOI":"10.1109\/IBIGDELFT.2018.8625318"},{"key":"ref_57","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40854-021-00243-3","article-title":"An efficient stock market prediction model using hybrid feature reduction method based on variational autoencoders and recursive feature elimination","volume":"7","author":"Gunduz","year":"2021","journal-title":"Financ. Innov."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"564","DOI":"10.1016\/j.procs.2021.06.066","article-title":"Review the performance of the Bernoulli Na\u00efve Bayes Classifier in Intrusion Detection Systems using Recursive Feature Elimination with Cross-validated selection of the best number of features","volume":"190","author":"Artur","year":"2021","journal-title":"Procedia Comput. 
Sci."},{"key":"ref_59","first-page":"13","article-title":"Tumor Type Detection Using Na\u00efve Bayes Algorithm on Gene Expression Cancer RNA-Seq Data Set","volume":"10","author":"Furat","year":"2019","journal-title":"Lung Cancer"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/1\/21\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T14:01:55Z","timestamp":1760364115000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/1\/21"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,10]]},"references-count":59,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2022,1]]}},"alternative-id":["a15010021"],"URL":"https:\/\/doi.org\/10.3390\/a15010021","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,10]]}}}