{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T20:35:50Z","timestamp":1776112550528,"version":"3.50.1"},"reference-count":30,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2024,4,3]],"date-time":"2024-04-03T00:00:00Z","timestamp":1712102400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["MAKE"],"abstract":"<jats:p>Effective data reduction must retain the greatest possible amount of informative content of the data under examination. Feature selection is the default for dimensionality reduction, as the relevant features of a dataset are usually retained through this method. In this study, we used unsupervised learning to discover the top-k discriminative features present in the large multivariate IoT dataset used. We used the statistics of principal component analysis to filter the relevant features based on the ranks of the features along the principal directions while also considering the coefficients of the components. The selected number of principal components was used to decide the number of features to be selected in the SVD process. A number of experiments were conducted using different benchmark datasets, and the effectiveness of the proposed method was evaluated based on the reconstruction error. The potency of the results was verified by subjecting the algorithm to a large IoT dataset, and we compared the performance based on accuracy and reconstruction error to the results of the benchmark datasets. The performance evaluation showed consistency with the results obtained with the benchmark datasets, which were of high accuracy and low reconstruction error.<\/jats:p>","DOI":"10.3390\/make6020037","type":"journal-article","created":{"date-parts":[[2024,4,3]],"date-time":"2024-04-03T11:01:41Z","timestamp":1712142101000},"page":"789-799","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Effective Data Reduction Using Discriminative Feature Selection Based on Principal Component Analysis"],"prefix":"10.3390","volume":"6","author":[{"given":"Faith","family":"Nwokoma","sequence":"first","affiliation":[{"name":"Electrical and Computer Engineering Department, Prairie View A&M University, Prairie View, TX 77446, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6344-6689","authenticated-orcid":false,"given":"Justin","family":"Foreman","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, Prairie View A&M University, Prairie View, TX 77446, USA"}]},{"given":"Cajetan M.","family":"Akujuobi","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering Department, Prairie View A&M University, Prairie View, TX 77446, USA"}]}],"member":"1968","published-online":{"date-parts":[[2024,4,3]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kavitha, R., and Kannan, E. (2016, January 24\u201326). An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining. Proceedings of the 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS), Pudukkottai, India.","DOI":"10.1109\/ICETETS.2016.7603000"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"1303","DOI":"10.1109\/JSTSP.2018.2873988","article-title":"PF-FELM: A Robust PCA Feature Selection for Fuzzy Extreme Learning Machine","volume":"12","author":"Kale","year":"2018","journal-title":"IEEE J. Sel. Top. Signal Process."},{"key":"ref_3","unstructured":"Duda, R.O., Hart, P.E., and Stork, D.G. (2012). Pattern Classification, John Wiley & Sons."},{"key":"ref_4","unstructured":"Cui, Y., and Fang, Y. (2020, January 23\u201325). Research on PCA Data Dimension Reduction Algorithm Based on Entropy Weight Method. Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Ibrahim, M.F.I., and Al-Jumaily, A.A. (2016, January 15\u201317). PCA indexing based feature learning and feature selection. Proceedings of the 2016 8th Cairo International Biomedical Engineering Conference (CIBEC), Cairo, Egypt.","DOI":"10.1109\/CIBEC.2016.7836122"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kane, A., and Shiri, N. (2016, January 12\u201315). Selecting the Top-K Discriminative Feature Using Principal Component Analysis. Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops, Barcelona, Spain.","DOI":"10.1109\/ICDMW.2016.0096"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Chandak, T., Ghorpade, C., and Shukla, S. (2019, January 26\u201328). Effective Analysis of Feature Selection Algorithms for Network based Intrusion Detection System. Proceedings of the 2019 IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India.","DOI":"10.1109\/IBSSC47189.2019.8973103"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Shah, H., and Verma, K. (2016, January 25\u201327). Voltage stability monitoring by different ANN architectures using PCA based feature selection. Proceedings of the 2016 IEEE 7th Power India International Conference (PIICON), Bikaner, India.","DOI":"10.1109\/POWERI.2016.8077157"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ahmadi, S.S., Rashad, S., and Elgazzar, H. (2019, January 10\u201312). Efficient Feature Selection for Intrusion Detection Systems. Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA.","DOI":"10.1109\/UEMCON47517.2019.8992960"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Sagar, S., Shrivastava, A., and Gupta, C. (2018, January 28\u201329). Feature Reduction and Selection Based Optimization for Hybrid Intrusion Detection System Using PGO followed by SVM. Proceedings of the 2018 International Conference on Advanced Computation and Telecommunication (ICACAT), Bhopal, India.","DOI":"10.1109\/ICACAT.2018.8933651"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Khonde, S.R., and Ulagamuthalvi, D.V. (2020, January 14\u201316). Ensemble and Feature Selection-Based Intrusion Detection System for Multi-attack Environment. Proceedings of the 2020 5th International Conference on Computing, Communication and Security (ICCCS), Patna, India.","DOI":"10.1109\/ICCCS49678.2020.9276875"},{"key":"ref_12","first-page":"148","article-title":"A Subset Feature Elimination Mechanism for Intrusion Detection System","volume":"7","author":"Nkiama","year":"2016","journal-title":"(IJACSA) Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hakim, L., Fatma, R. (2019, January 16\u201317). Influence Analysis for Feature Selection to Network Intrusion Detection System Performance Using NSL-KDD Dataset. Proceedings of the 2019 International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE), Jember, Indonesia.","DOI":"10.1109\/ICOMITEE.2019.8920961"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1023\/A:1008280620621","article-title":"Overcoming the myopia of inductive learning algorithms with RELIEFF","volume":"7","author":"Kononenko","year":"1997","journal-title":"Appl. Intell."},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Li, Y., Shi, K., Qiao, F., and Luo, H. (2020, January 23\u201325). A Feature Subset Selection Method Based on the Combination of PCA and Improved GA. Proceedings of the 2020 2nd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), Taiyuan, China.","DOI":"10.1109\/MLBDBI51377.2020.00042"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Jolliffe, I. (2014). Principal Component Analysis, Wiley Online Library.","DOI":"10.1002\/9781118445112.stat06472"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Patil, G.V., Pachghare, K.V., and Kshirsagar, D.D. (2018, January 18\u201319). Feature Reduction in Flow Based Intrusion Detection System. Proceedings of the 2018 3rd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT-2018), Bangalore, India.","DOI":"10.1109\/RTEICT42901.2018.9012554"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Divekar, A., Parekh, M., Savla, V., and Mishra, R. (2018, January 25\u201327). Benchmarking datasets for Anomaly-based Network Intrusion Detection: KDD CUP 99 alternatives. Proceedings of the 2018 IEEE 3rd International Conference on Computing, Communication and Security (ICCCS), Kathmandu, Nepal.","DOI":"10.1109\/CCCS.2018.8586840"},{"key":"ref_19","unstructured":"UCI Machine Learning Repository (2024, March 14). Arrhythmia Data Set. Available online: https:\/\/archive.ics.uci.edu\/dataset\/5\/arrhythmia."},{"key":"ref_20","unstructured":"UCI Machine Learning Repository (2024, March 14). Madelon Data Set. Available online: https:\/\/archive.ics.uci.edu\/dataset\/171\/madelon."},{"key":"ref_21","unstructured":"UCI Machine Learning Repository (2024, March 14). Gisette Data Set. Available online: https:\/\/archive.ics.uci.edu\/dataset\/170\/gisette."},{"key":"ref_22","unstructured":"UCI Machine Learning Repository (2024, March 14). Ionosphere Data Set. Available online: https:\/\/archive.ics.uci.edu\/dataset\/52\/ionosphere."},{"key":"ref_23","unstructured":"Simplilearn (2024, March 14). What Is Data Standardization?. Available online: https:\/\/www.simplilearn.com\/what-is-data-standardization-article#:~:text=Data%20standardization%20is%20converting%20data,YYYY%2DMM%2DDD."},{"key":"ref_24","unstructured":"Choudhary, A. (2024, March 14). Understanding the Covariance Matrix. DataScience+. Available online: https:\/\/datascienceplus.com\/understanding-the-covariance-matrix\/."},{"key":"ref_25","unstructured":"Wikipedia (2024, March 14). Singular Value Decomposition. Available online: https:\/\/en.wikipedia.org\/wiki\/Singular_value_decomposition."},{"key":"ref_26","unstructured":"Guruswami, V. (2024, March 14). Chapter 4: Error-Correcting Codes. Carnegie Mellon University. Available online: https:\/\/www.cs.cmu.edu\/~venkatg\/teaching\/CStheory-infoage\/book-chapter-4.pdf."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Megantara, A.A., and Ahmad, T. (2020, January 14\u201316). Feature Importance Ranking for Increasing Performance of Intrusion Detection System. Proceedings of the 2020 3rd International Conference on Computer and Information Engineering (IC2IE), Beijing, China.","DOI":"10.1109\/IC2IE50715.2020.9274570"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Ekici, B., Tarhan, A., and Ozsoy, A. (2019, January 11\u201315). Data Cleaning for Process Mining with Smart Contract. Proceedings of the (UBMK\u201919) 4th International Conference on Computer Science and Engineering-324, Samsun, Turkey.","DOI":"10.1109\/UBMK.2019.8907140"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"20150202","DOI":"10.1098\/rsta.2015.0202","article-title":"Principal Component Analysis: A Review and Recent Developments","volume":"374","author":"Jollife","year":"2016","journal-title":"Philos. Trans. R. Soc. A"},{"key":"ref_30","unstructured":"Junaid, A. (2024, March 14). Metrics to Evaluate Your Machine Learning Algorithm. Towards Data Science. Available online: https:\/\/towardsdatascience.com\/metrics-to-evaluate-your-machine-learning-algorithm-f10ba6e38234."}],"container-title":["Machine Learning and Knowledge Extraction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/2\/37\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T14:22:59Z","timestamp":1760106179000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2504-4990\/6\/2\/37"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4,3]]},"references-count":30,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2024,6]]}},"alternative-id":["make6020037"],"URL":"https:\/\/doi.org\/10.3390\/make6020037","relation":{},"ISSN":["2504-4990"],"issn-type":[{"value":"2504-4990","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,4,3]]}}}