{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T04:16:34Z","timestamp":1760242594517,"version":"build-2065373602"},"reference-count":18,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T00:00:00Z","timestamp":1512518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and improving the efficiency and accuracy of learning algorithms for discovering such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms that exhibit excellent accuracy have been developed, they are seldom used for analysis of high-dimensional data because high-dimensional data usually include too many instances and features, which make traditional feature selection algorithms inefficient. To eliminate this limitation, we tried to improve the run-time performance of two of the most accurate feature selection algorithms known in the literature. The result is two accurate and fast algorithms, namely sCwc and sLcc. Multiple experiments with real social media datasets have demonstrated that our algorithms improve the performance of their original algorithms remarkably. For example, we have two datasets, one with 15,568 instances and 15,741 features, and another with 200,569 instances and 99,672 features. sCwc performed feature selection on these datasets in 1.4 seconds and in 405 seconds, respectively. In addition, sLcc has turned out to be as fast as sCwc on average. 
This is a remarkable improvement because it is estimated that the original algorithms would need several hours to dozens of days to process the same datasets. In addition, we introduce a fast implementation of our algorithms: sCwc does not require any adjusting parameter, while sLcc requires a threshold parameter, which we can use to control the number of features that the algorithm selects.<\/jats:p>","DOI":"10.3390\/info8040159","type":"journal-article","created":{"date-parts":[[2017,12,6]],"date-time":"2017-12-06T11:29:36Z","timestamp":1512559776000},"page":"159","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["sCwc\/sLcc: Highly Scalable Feature Selection Algorithms"],"prefix":"10.3390","volume":"8","author":[{"given":"Kilho","family":"Shin","sequence":"first","affiliation":[{"name":"Graduate School of Applied Informatics, University of Hyogo, Kobe 651-2197, Japan"},{"name":"Information Networking Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA"}]},{"given":"Tetsuji","family":"Kuboyama","sequence":"additional","affiliation":[{"name":"Computer Centre, Gakushuin University, Tokyo 171-0031, Japan"}]},{"given":"Takako","family":"Hashimoto","sequence":"additional","affiliation":[{"name":"Institute of Economic Research, Chiba University of Commerce, Chiba 272-8512, Japan"}]},{"given":"Dave","family":"Shepard","sequence":"additional","affiliation":[{"name":"Center for Digital Humanities, University of California Los Angeles; Los Angeles, CA 90095, USA"}]}],"member":"1968","published-online":{"date-parts":[[2017,12,6]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1016\/0004-3702(94)90084-1","article-title":"Learning boolean concepts in the presence of many irrelevant features","volume":"69","author":"Almuallim","year":"1994","journal-title":"Artif. 
Intell."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Liu, H., Motoda, H., and Dash, M. (1998, January 21\u201323). A monotonic measure for optimal feature selection. Proceedings of the European Conference on Machine Learning, Chemnitz, Germany.","DOI":"10.1007\/BFb0026678"},{"key":"ref_3","unstructured":"Zhao, Z., and Liu, H. (2007, January 6\u201312). Searching for Interacting Features. Proceedings of the International Joint Conference on Artificial Intelligence, Hyderabad, India."},{"key":"ref_4","unstructured":"Shin, K., and Xu, X. (2009, January 28\u201330). Consistency-based feature selection. Proceedings of the 13th International Conference on Knowledge-Based and Intelligent Information & Engineering System, Santiago, Chile."},{"key":"ref_5","unstructured":"Shin, K., Fernandes, D., and Miyazaki, S. (2011, January 16\u201322). Consistency Measures for Feature Selection: A Formal Definition, Relative Sensitivity Comparison, and a Fast Algorithm. Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Spain."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kononenko, I. (1994). Estimating Attributes: Analysis and Extension of RELIEF, Springer.","DOI":"10.1007\/3-540-57868-4_57"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information: Criteria of max-dependency, max-relevance and min-redundancy","volume":"27","author":"Peng","year":"2005","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_8","unstructured":"Yu, L., and Liu, H. (2003, January 21\u201324). Feature selection for high-dimensional data: A fast correlation-based filter solution. Proceedings of the Twentieth International Conference on Machine Learning, Washington, DC, USA."},{"key":"ref_9","unstructured":"Hall, M.A. (July, January 29). 
Correlation-based feature selection for discrete and numeric class machine learning. Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford, CA, USA."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1016\/j.patcog.2012.07.028","article-title":"Selecting feature subset for high dimensional data via the propositional FOIL rules","volume":"46","author":"Wang","year":"2013","journal-title":"Pattern Recognit."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Quinlan, J., and Cameron-Jones, R. (1993, January 5\u20137). FOIL: A midterm report. Proceedings of the European Conference on Machine Learning, Vienna, Austria.","DOI":"10.1007\/3-540-56602-3_124"},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1111\/coin.12072","article-title":"A Fast and Accurate Feature Selection Algorithm based on Binary Consistency Measure","volume":"32","author":"Shin","year":"2016","journal-title":"Comput. Intell."},{"key":"ref_13","unstructured":"Kira, K., and Rendell, L. (1992, January 1\u20133). A practical approach to feature selection. Proceedings of the 9th International Workshop on Machine Learning, Aberdeen, UK."},{"key":"ref_14","unstructured":"Neural Information Processing Systems (NIPS) (2003). Neural Information Processing Systems Conference 2003: Feature Selection Challenge, NIPS."},{"key":"ref_15","unstructured":"(2006, January 16\u201321). World Congress on Computational Intelligence (WCCI). Proceedings of the IEEE World Congress on Computational Intelligence 2006: Performance Prediction Challenge, Vancouver, BC, Canada."},{"key":"ref_16","unstructured":"Blake, C.S., and Merz, C.J. (1998). UCI Repository of Machine Learning Databases, University of California. Technical Report."},{"key":"ref_17","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","year":"2006","journal-title":"J. Mach. Learn. 
Theory"},{"key":"ref_18","unstructured":"Snoek, J., Larochelle, H., and Adams, R.P. (arXiv, 2012). Practical Bayesian Optimization of Machine Learning Algorithms, arXiv."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/4\/159\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T18:52:50Z","timestamp":1760208770000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/8\/4\/159"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,12,6]]},"references-count":18,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2017,12]]}},"alternative-id":["info8040159"],"URL":"https:\/\/doi.org\/10.3390\/info8040159","relation":{},"ISSN":["2078-2489"],"issn-type":[{"type":"electronic","value":"2078-2489"}],"subject":[],"published":{"date-parts":[[2017,12,6]]}}}