{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T02:22:30Z","timestamp":1775787750564,"version":"3.50.1"},"reference-count":39,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T00:00:00Z","timestamp":1735689600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,1,13]],"date-time":"2025-01-13T00:00:00Z","timestamp":1736726400000},"content-version":"vor","delay-in-days":12,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100005005","name":"Ben-Gurion University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005005","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Mach Learn"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>The challenge of getting big amounts of high-quality labeled data is compounded by the fact that data labeling is often subjective and requires significant human effort. In many cases, the quality of the labeled data depends entirely on the expertise and experience of human annotators, making it challenging to ensure labeling accuracy in large and dynamic datasets. Moreover, there may be a significant delay between the arrival of a new instance and its manual labeling. This paper explores the use of fully unsupervised feature selection algorithms in non-stationary data streams, where the importance of features may change over time. We introduce a novel feature selection algorithm called Online Fast FEature SELection (OFFESEL), which calculates the feature importance scores in each incoming window based on their mean normalized values and without using any class labels. 
We evaluate OFFESEL on 17 benchmark data streams, both stationary and non-stationary, using popular online classifiers like PerceptronMask, VFDT, Online Boosting, and Linear SVM. We compare OFFESEL to several other feature selection algorithms, including state-of-the-art supervised ones like FIRES and ABFS, as well as popular unsupervised ones like MCFS, LS, and Max Variance, which we adapted to data streams. Our results indicate that OFFESEL outperforms all supervised and unsupervised feature selection algorithms in terms of classification accuracy. Specifically, OFFESEL preserves the accuracy level of the supervised FIRES algorithm, which proved more accurate than ABFS in our experiments, while maintaining the accuracy level achieved by the unsupervised Max Variance algorithm. Moreover, OFFESEL requires even less computation time than Max Variance and shows high stability on stationary datasets. Overall, our study demonstrates the potential benefits of using unlabeled data for feature ranking and selection in dynamic data streams.<\/jats:p>","DOI":"10.1007\/s10994-024-06712-x","type":"journal-article","created":{"date-parts":[[2025,1,13]],"date-time":"2025-01-13T21:16:39Z","timestamp":1736802999000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["Fast online feature selection in streaming data"],"prefix":"10.1007","volume":"114","author":[{"given":"Yael","family":"Hochma","sequence":"first","affiliation":[]},{"given":"Mark","family":"Last","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,1,13]]},"reference":[{"issue":"4","key":"6712_CR1","first-page":"e1364","volume":"10","author":"N Al Nuaimi","year":"2020","unstructured":"Al Nuaimi, N., & Masud, M. M. (2020). Online streaming feature selection with incremental feature grouping. 
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(4), e1364.","journal-title":"Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery"},{"key":"6712_CR2","doi-asserted-by":"crossref","unstructured":"Barddal, J.\u00a0P., Gomes, H.\u00a0M., Granatyr, J., de\u00a0Souza\u00a0Britto, A., & Enembreck, F. (2016). Overcoming feature drifts via dynamic feature weighted k-nearest neighbor learning. In 2016 23rd International conference on pattern recognition (ICPR) (pp. 2186\u20132191). IEEE.","DOI":"10.1109\/ICPR.2016.7899960"},{"key":"6712_CR3","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1016\/j.is.2019.02.003","volume":"83","author":"JP Barddal","year":"2019","unstructured":"Barddal, J. P., Enembreck, F., Gomes, H. M., Bifet, A., & Pfahringer, B. (2019). Boosting decision stumps for dynamic feature selection on data streams. Information Systems, 83, 13\u201329.","journal-title":"Information Systems"},{"key":"6712_CR4","doi-asserted-by":"publisher","first-page":"278","DOI":"10.1016\/j.jss.2016.07.005","volume":"127","author":"JP Barddal","year":"2017","unstructured":"Barddal, J. P., Gomes, H. M., Enembreck, F., & Pfahringer, B. (2017). A survey on feature drift adaptation: Definition, benchmark, challenges and future directions. Journal of Systems and Software, 127, 278\u2013294.","journal-title":"Journal of Systems and Software"},{"key":"6712_CR5","first-page":"1601","volume":"11","author":"A Bifet","year":"2010","unstructured":"Bifet, A., Holmes, G., Kirkby, R., & Pfahringer, B. (2010). MOA: Massive online analysis. Journal of Machine Learning Research, 11, 1601\u20131604.","journal-title":"Journal of Machine Learning Research"},{"key":"6712_CR6","doi-asserted-by":"crossref","unstructured":"Cai, D., Zhang, C., & He, X. (2010). Unsupervised feature selection for multi-cluster data. In Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 
333\u2013342).","DOI":"10.1145\/1835804.1835848"},{"key":"6712_CR7","doi-asserted-by":"crossref","unstructured":"Carvalho, V.\u00a0R., & Cohen, W.\u00a0W. (2006). Single-pass online learning: Performance, voting schemes and online feature selection. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 548\u2013553).","DOI":"10.1145\/1150402.1150466"},{"key":"6712_CR8","doi-asserted-by":"crossref","unstructured":"Chandra, B. (2016). Gene selection methods for microarray data. In Applied computing in medicine and health (pp. 45\u201378). Elsevier.","DOI":"10.1016\/B978-0-12-803468-2.00003-5"},{"key":"6712_CR9","doi-asserted-by":"crossref","unstructured":"Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 71\u201380).","DOI":"10.1145\/347090.347107"},{"key":"6712_CR10","doi-asserted-by":"crossref","unstructured":"Duarte, J., & Gama, J. (2017). Feature ranking in Hoeffding algorithms for regression. In Proceedings of the symposium on applied computing (pp. 836\u2013841).","DOI":"10.1145\/3019612.3019670"},{"key":"6712_CR11","doi-asserted-by":"publisher","first-page":"205","DOI":"10.1016\/j.knosys.2018.08.007","volume":"161","author":"MS Hammoodi","year":"2018","unstructured":"Hammoodi, M. S., Stahl, F., & Badii, A. (2018). Real-time feature selection technique with concept drift detection using adaptive micro-clusters for data stream mining. Knowledge-Based Systems, 161, 205\u2013239.","journal-title":"Knowledge-Based Systems"},{"key":"6712_CR12","doi-asserted-by":"crossref","unstructured":"Haug, J., Pawelczyk, M., Broelemann, K., & Kasneci, G. (2020). Leveraging model inherent variable importance for stable online feature selection. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 
1478\u20131502).","DOI":"10.1145\/3394486.3403200"},{"key":"6712_CR13","unstructured":"He, X., Cai, D., & Niyogi, P. (2005). Laplacian score for feature selection. Advances in Neural Information Processing Systems, 18."},{"issue":"4","key":"6712_CR14","doi-asserted-by":"publisher","first-page":"18","DOI":"10.1109\/5254.708428","volume":"13","author":"MA Hearst","year":"1998","unstructured":"Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., & Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and Their Applications, 13(4), 18\u201328.","journal-title":"IEEE Intelligent Systems and Their Applications"},{"key":"6712_CR15","doi-asserted-by":"crossref","unstructured":"Huang, H., Yoo, S., & Kasiviswanathan, S.\u00a0P. (2015). Unsupervised feature selection on data streams. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1031\u20131040).","DOI":"10.1145\/2806416.2806521"},{"key":"6712_CR16","unstructured":"Katakis, I., Tsoumakas, G., & Vlahavas, I. (2008). An ensemble of classifiers for coping with recurring contexts in data streams. In ECAI 2008 (pp. 763\u2013764). IOS Press."},{"key":"6712_CR17","doi-asserted-by":"publisher","first-page":"132","DOI":"10.1016\/j.inffus.2017.02.004","volume":"37","author":"B Krawczyk","year":"2017","unstructured":"Krawczyk, B., Minku, L. L., Gama, J., Stefanowski, J., & Wo\u017aniak, M. (2017). Ensemble learning for data stream analysis: A survey. Information Fusion, 37, 132\u2013156.","journal-title":"Information Fusion"},{"issue":"2","key":"6712_CR18","doi-asserted-by":"publisher","first-page":"129","DOI":"10.3233\/IDA-2002-6203","volume":"6","author":"M Last","year":"2002","unstructured":"Last, M. (2002). Online classification of nonstationary data streams. 
Intelligent Data Analysis, 6(2), 129\u2013147.","journal-title":"Intelligent Data Analysis"},{"issue":"6","key":"6712_CR19","doi-asserted-by":"publisher","first-page":"94","DOI":"10.1145\/3136625","volume":"50","author":"J Li","year":"2018","unstructured":"Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R. P., Tang, J., & Liu, H. (2018). Feature selection: A data perspective. ACM Computing Surveys (CSUR), 50(6), 94.","journal-title":"ACM Computing Surveys (CSUR)"},{"issue":"2","key":"6712_CR20","doi-asserted-by":"publisher","first-page":"627","DOI":"10.1007\/s10618-022-00911-7","volume":"37","author":"Y Lu","year":"2023","unstructured":"Lu, Y., Wu, R., Mueen, A., Zuluaga, M. A., & Keogh, E. (2023). Damp: Accurate time series anomaly detection on trillions of datapoints and ultra-fast arriving data streams. Data Mining and Knowledge Discovery, 37(2), 627\u2013669.","journal-title":"Data Mining and Knowledge Discovery"},{"key":"6712_CR21","doi-asserted-by":"publisher","unstructured":"Mitchell, T. (1999). Twenty newsgroups. UCI Machine Learning Repository. https:\/\/doi.org\/10.24432\/C5C323","DOI":"10.24432\/C5C323"},{"key":"6712_CR22","unstructured":"MOA. (2023). Moa website. https:\/\/moa.cms.waikato.ac.nz\/datasets\/."},{"key":"6712_CR23","unstructured":"Montiel, J., Read, J., Bifet, A., & Abdessalem, T. Skmultiflow onlineboosting."},{"key":"6712_CR24","unstructured":"Montiel, J., Read, J., Bifet, A., & Abdessalem, T. Skmultiflow perceptronmask. https:\/\/scikit-multiflow.readthedocs.io\/en\/stable\/api\/generated\/skmultiflow.neural_networks.PerceptronMask.html"},{"key":"6712_CR25","unstructured":"Montiel, J., Read, J., Bifet, A., & Abdessalem, T. Skmultiflow vfdt. https:\/\/scikit-multiflow.readthedocs.io\/en\/stable\/api\/generated\/skmultiflow.trees.HoeffdingTreeClassifier.html"},{"issue":"72","key":"6712_CR26","first-page":"1","volume":"19","author":"J Montiel","year":"2018","unstructured":"Montiel, J., Read, J., Bifet, A., & Abdessalem, T. (2018). 
Scikit-multiflow: A multi-output streaming framework. Journal of Machine Learning Research, 19(72), 1\u20135.","journal-title":"Journal of Machine Learning Research"},{"key":"6712_CR27","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. Scikit-learn sgdclassifier. https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.linear_model.SGDClassifier.html"},{"key":"6712_CR28","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825\u20132830.","journal-title":"Journal of Machine Learning Research"},{"key":"6712_CR29","doi-asserted-by":"crossref","unstructured":"Petkovi\u0107, M., D\u017eeroski, S., & Kocev, D. (2022). Feature ranking for semi-supervised learning. Machine Learning, 1\u201330.","DOI":"10.1007\/s10994-022-06181-0"},{"issue":"9","key":"6712_CR30","doi-asserted-by":"publisher","first-page":"5377","DOI":"10.1007\/s00500-022-07767-5","volume":"27","author":"DK Rakesh","year":"2023","unstructured":"Rakesh, D. K., Anwit, R., & Jana, P. K. (2023). A new ranking-based stability measure for feature selection algorithms. Soft Computing, 27(9), 5377\u20135396.","journal-title":"Soft Computing"},{"issue":"1","key":"6712_CR31","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1186\/s13634-018-0528-x","volume":"2018","author":"A Rosenfeld","year":"2018","unstructured":"Rosenfeld, A., Illuz, R., Gottesman, D., & Last, M. (2018). Using discretization for extending the set of predictive features. 
EURASIP Journal on Advances in Signal Processing, 2018(1), 1\u201311.","journal-title":"EURASIP Journal on Advances in Signal Processing"},{"key":"6712_CR32","doi-asserted-by":"publisher","first-page":"838","DOI":"10.1109\/TIP.2023.3234497","volume":"32","author":"D Shi","year":"2023","unstructured":"Shi, D., Zhu, L., Li, J., Zhang, Z., & Chang, X. (2023). Unsupervised adaptive feature selection with binary hashing. IEEE Transactions on Image Processing, 32, 838\u2013853.","journal-title":"IEEE Transactions on Image Processing"},{"key":"6712_CR33","doi-asserted-by":"publisher","first-page":"1805","DOI":"10.1007\/s10618-020-00698-5","volume":"34","author":"VM Souza","year":"2020","unstructured":"Souza, V. M., dos Reis, D. M., Maletzke, A. G., & Batista, G. E. (2020). Challenges in benchmarking stream learning algorithms with real-world data. Data Mining and Knowledge Discovery, 34, 1805\u20131858.","journal-title":"Data Mining and Knowledge Discovery"},{"issue":"1","key":"6712_CR34","doi-asserted-by":"publisher","first-page":"56","DOI":"10.1016\/j.inffus.2006.11.002","volume":"9","author":"A Tsymbal","year":"2008","unstructured":"Tsymbal, A., Pechenizkiy, M., Cunningham, P., & Puuronen, S. (2008). Dynamic integration of classifiers for handling concept drift. Information fusion, 9(1), 56\u201368.","journal-title":"Information fusion"},{"key":"6712_CR35","doi-asserted-by":"crossref","unstructured":"Wang, M., & Barbu, A. (2022). Online feature screening for data streams with concept drift. IEEE Transactions on Knowledge and Data Engineering.","DOI":"10.1109\/TKDE.2022.3232752"},{"issue":"12","key":"6712_CR36","doi-asserted-by":"publisher","first-page":"3353","DOI":"10.1109\/TKDE.2016.2609424","volume":"28","author":"B Wang","year":"2016","unstructured":"Wang, B., & Pineau, J. (2016). Online bagging and boosting for imbalanced data streams. 
IEEE Transactions on Knowledge and Data Engineering, 28(12), 3353\u20133366.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"issue":"3","key":"6712_CR37","doi-asserted-by":"publisher","first-page":"698","DOI":"10.1109\/TKDE.2013.32","volume":"26","author":"J Wang","year":"2013","unstructured":"Wang, J., Zhao, P., Hoi, S. C., & Jin, R. (2013). Online feature selection and its applications. IEEE Transactions on Knowledge and Data Engineering, 26(3), 698\u2013710.","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"6712_CR38","doi-asserted-by":"crossref","unstructured":"Zheng, X., Xu, N., Trinh, L., Wu, D., Huang, T., Sivaranjani, S., Liu, Y., & Xie, L. (2021). A multi-scale time-series dataset with benchmark for machine learning in decarbonized energy grids. arXiv preprint arXiv:2110.06324","DOI":"10.1038\/s41597-022-01455-7"},{"key":"6712_CR39","doi-asserted-by":"publisher","unstructured":"Zheng, X., Xu, N., Wu, D., Trinh, L., Huang, T., Sivaranjani, S., Liu, Y., & Xie, L. (2021). PSML: A multi-scale time-series dataset for machine learning in decarbonized energy grids (dataset). 
https:\/\/doi.org\/10.5281\/zenodo.5130612","DOI":"10.5281\/zenodo.5130612"}],"container-title":["Machine Learning"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06712-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10994-024-06712-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10994-024-06712-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,2,7]],"date-time":"2025-02-07T16:16:43Z","timestamp":1738945003000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10994-024-06712-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1]]},"references-count":39,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["6712"],"URL":"https:\/\/doi.org\/10.1007\/s10994-024-06712-x","relation":{},"ISSN":["0885-6125","1573-0565"],"issn-type":[{"value":"0885-6125","type":"print"},{"value":"1573-0565","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1]]},"assertion":[{"value":"21 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 October 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 October 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 January 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article 
History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"Not applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval"}},{"value":"Not applicable","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent to participate"}},{"value":"Not applicable","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}],"article-number":"1"}}