{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T02:05:12Z","timestamp":1774317912193,"version":"3.50.1"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"5","license":[{"start":{"date-parts":[[2022,8,13]],"date-time":"2022-08-13T00:00:00Z","timestamp":1660348800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,8,13]],"date-time":"2022-08-13T00:00:00Z","timestamp":1660348800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100010061","name":"University of Waikato","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100010061","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2022,9]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Most research in machine learning for data streams has focused on classification algorithms, whereas regression methods have received a lot less attention. This paper proposes Self-Optimising K-Nearest Leaves (SOKNL), a novel forest-based algorithm for streaming regression problems. Specifically, the Adaptive Random Forest Regression, a state-of-the-art online regression algorithm is extended like this: in each leaf, a representative data point \u2013 also called centroid \u2013 is generated by compressing the information from all instances in that leaf. During the prediction step, instead of letting all trees in the forest participate, the distances between the input instance and all centroids from relevant leaves are calculated, only <jats:italic>k<\/jats:italic> trees that possess the smallest distances are utilised for the prediction. Furthermore, we simplify the algorithm by introducing a mechanism for tuning the <jats:italic>k<\/jats:italic> values, which is dynamically and automatically optimised based on historical information. This new algorithm produces promising predictive results and achieves a superior ranking according to statistical testing when compared with several standard stream regression methods over typical benchmark datasets. This improvement incurs only a small increase in runtime and memory consumption over the basic Adaptive Random Forest Regressor.<\/jats:p>","DOI":"10.1007\/s10618-022-00858-9","type":"journal-article","created":{"date-parts":[[2022,8,13]],"date-time":"2022-08-13T20:16:56Z","timestamp":1660421816000},"page":"2006-2032","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":19,"title":["SOKNL: A novel way of integrating K-nearest neighbours with adaptive random forest regression for data streams"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8325-1889","authenticated-orcid":false,"given":"Yibin","family":"Sun","sequence":"first","affiliation":[]},{"given":"Bernhard","family":"Pfahringer","sequence":"additional","affiliation":[]},{"given":"Heitor Murilo","family":"Gomes","sequence":"additional","affiliation":[]},{"given":"Albert","family":"Bifet","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,8,13]]},"reference":[{"key":"858_CR1","doi-asserted-by":"crossref","unstructured":"Almeida E, Ferreira C, Gama J (2013) Adaptive model rules from data streams. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 480\u2013492. Springer","DOI":"10.1007\/978-3-642-40988-2_31"},{"key":"858_CR2","volume-title":"k-means++: The advantages of careful seeding","author":"D Arthur","year":"2006","unstructured":"Arthur D, Vassilvitskii S (2006) k-means++: The advantages of careful seeding. Technical report, Stanford"},{"key":"858_CR3","doi-asserted-by":"crossref","unstructured":"Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp 443\u2013448. SIAM","DOI":"10.1137\/1.9781611972771.42"},{"key":"858_CR4","first-page":"1601","volume":"11","author":"A Bifet","year":"2010","unstructured":"Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601\u20131604","journal-title":"J Mach Learn Res"},{"key":"858_CR5","doi-asserted-by":"crossref","unstructured":"Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 135\u2013150. Springer","DOI":"10.1007\/978-3-642-15880-3_15"},{"key":"858_CR6","doi-asserted-by":"crossref","unstructured":"Boulegane D, Bifet A, Madhusudan G (2019) Arbitrated dynamic ensemble with abstaining for time-series forecasting on data streams. In: 2019 IEEE International Conference on Big Data (Big Data), pp 1040\u20131045. IEEE","DOI":"10.1109\/BigData47090.2019.9005541"},{"issue":"1","key":"858_CR7","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","volume":"45","author":"L Breiman","year":"2001","unstructured":"Breiman L (2001) Random forests. Mach Learn 45(1):5\u201332","journal-title":"Mach Learn"},{"key":"858_CR8","doi-asserted-by":"crossref","unstructured":"Cerqueira V, Torgo L, Pinto F, Soares C (2017) Arbitrated ensemble for time series forecasting. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 478\u2013494. Springer","DOI":"10.1007\/978-3-319-71246-8_29"},{"key":"858_CR9","doi-asserted-by":"publisher","DOI":"10.7717\/peerj-cs.623","volume":"7","author":"D Chicco","year":"2021","unstructured":"Chicco D, Warrens MJ, Jurman G (2021) The coefficient of determination r-squared is more informative than smape, mae, mape, mse and rmse in regression analysis evaluation. PeerJ Comput Sci 7:e623","journal-title":"PeerJ Comput Sci"},{"key":"858_CR10","doi-asserted-by":"publisher","first-page":"733","DOI":"10.1007\/978-981-16-2712-5_57","volume-title":"Soft Computing for Problem Solving","author":"A Choudhary","year":"2021","unstructured":"Choudhary A, Jha P, Tiwari A, Bharill N (2021) A brief survey on concept drifted data stream regression. In: Tiwari A, Ahuja K, Yadav A, Bansal JC, Deep K, Nagar AK (eds) Soft Computing for Problem Solving. Singapore, Springer Singapore, pp 733\u2013744"},{"key":"858_CR11","unstructured":"Dhanabal S, Chandramathi S (2011) A review of various k-nearest neighbor query processing techniques. International Journal of Computer Applications"},{"key":"858_CR12","doi-asserted-by":"crossref","unstructured":"Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 71\u201380","DOI":"10.1145\/347090.347107"},{"key":"858_CR13","doi-asserted-by":"crossref","unstructured":"Friedman JH (1991) Multivariate adaptive regression splines. The Annals of Statistics, pp 1\u201367","DOI":"10.1214\/aos\/1176347963"},{"issue":"200","key":"858_CR14","doi-asserted-by":"publisher","first-page":"675","DOI":"10.1080\/01621459.1937.10503522","volume":"32","author":"M Friedman","year":"1937","unstructured":"Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675\u2013701","journal-title":"J Am Stat Assoc"},{"issue":"10","key":"858_CR15","doi-asserted-by":"publisher","first-page":"2044","DOI":"10.1016\/j.ins.2009.12.010","volume":"180","author":"S Garc\u00eda","year":"2010","unstructured":"Garc\u00eda S, Fern\u00e1ndez A, Luengo J, Herrera F (2010) Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Inf Sci 180(10):2044\u20132064","journal-title":"Inf Sci"},{"issue":"9\u201310","key":"858_CR16","doi-asserted-by":"publisher","first-page":"1469","DOI":"10.1007\/s10994-017-5642-8","volume":"106","author":"HM Gomes","year":"2017","unstructured":"Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106(9\u201310):1469\u20131495","journal-title":"Mach Learn"},{"key":"858_CR17","unstructured":"Gomes HM, Barddal JP, Ferreira LEB, Bifet A (2018) Adaptive random forests for data stream regression. In: ESANN"},{"key":"858_CR18","doi-asserted-by":"crossref","unstructured":"Gomes HM, Montiel J, Mastelini SM, Pfahringer B, Bifet A (2020) On ensemble techniques for data stream regression. In: IJCNN. IEEE","DOI":"10.1109\/IJCNN48605.2020.9206756"},{"key":"858_CR19","doi-asserted-by":"crossref","unstructured":"Hoeffding W (1994) Probability inequalities for sums of bounded random variables. In: The Collected Works of Wassily Hoeffding, pp 409\u2013426. Springer","DOI":"10.1007\/978-1-4612-0865-5_26"},{"key":"858_CR20","doi-asserted-by":"crossref","unstructured":"Hoeffding W (1994) Probability inequalities for sums of bounded random variables. In: The Collected Works of Wassily Hoeffding, pp 409\u2013426. Springer","DOI":"10.1007\/978-1-4612-0865-5_26"},{"issue":"2","key":"858_CR21","doi-asserted-by":"publisher","first-page":"3537","DOI":"10.1109\/LRA.2021.3064509","volume":"6","author":"J Huang","year":"2021","unstructured":"Huang J, Rojas J, Zimmer M, Wu H, Guan Y, Weng P (2021) Hyperparameter auto-tuning in self-supervised robotic learning. IEEE Robot Autom Lett 6(2):3537\u20133544","journal-title":"IEEE Robot Autom Lett"},{"issue":"1","key":"858_CR22","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1007\/s10618-010-0201-y","volume":"23","author":"E Ikonomovska","year":"2011","unstructured":"Ikonomovska E, Gama J, D\u017eeroski S (2011) Learning model trees from evolving data streams. Data Min Knowl Disc 23(1):128\u2013168","journal-title":"Data Min Knowl Disc"},{"key":"858_CR23","unstructured":"Ikonomovska E, Gama J, Zenko B, Dzeroski S (2011) Speeding-up hoeffding-based regression trees with options. In: ICML"},{"key":"858_CR24","doi-asserted-by":"publisher","first-page":"677","DOI":"10.1016\/j.asoc.2017.12.008","volume":"68","author":"B Krawczyk","year":"2018","unstructured":"Krawczyk B, Cano A (2018) Online ensemble learning with abstaining classifiers for drifting and noisy data streams. Appl Soft Comput 68:677\u2013692","journal-title":"Appl Soft Comput"},{"issue":"1","key":"858_CR25","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1007\/s10115-017-1137-y","volume":"54","author":"V Losing","year":"2018","unstructured":"Losing V, Hammer B, Wersing H (2018) Tackling heterogeneous concept drift with the self-adjusting memory (sam). Knowl Inf Syst 54(1):171\u2013201","journal-title":"Knowl Inf Syst"},{"key":"858_CR26","doi-asserted-by":"crossref","unstructured":"Louppe G, Geurts P (2012) Ensembles on random patches. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases pp 346\u2013361. Springer","DOI":"10.1007\/978-3-642-33460-3_28"},{"key":"858_CR27","doi-asserted-by":"crossref","unstructured":"Lu J, Liu A, Dong F, Gu F, Gama J, Zhang G (2018) Learning under concept drift: A review. IEEE TKDE","DOI":"10.1109\/TKDE.2018.2876857"},{"issue":"1","key":"858_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s13721-016-0125-6","volume":"5","author":"G Luo","year":"2016","unstructured":"Luo G (2016) A review of automatic selection methods for machine learning algorithms and hyper-parameter values. Network Modeling Analysis in Health Informatics and Bioinformatics 5(1):1\u201316","journal-title":"Network Modeling Analysis in Health Informatics and Bioinformatics"},{"key":"858_CR29","unstructured":"Mouss H, Mouss D, Mouss N, Sefouhi L (2004) Test of page-hinckley, an approach for fault detection in an agro-alimentary production system. In: 2004 5th Asian Control Conference (IEEE Cat. No. 04EX904) 2: 815\u2013818. IEEE"},{"key":"858_CR30","unstructured":"Nash WJ, Sellers TL, Talbot SR, Cawthorn AJ, Ford WB (1994) The population biology of abalone (haliotis species) in tasmania. i. blacklip abalone (h. rubra) from the north coast and islands of bass strait. Sea Fisheries Division, Technical Report, 48:p411"},{"issue":"1\/2","key":"858_CR31","doi-asserted-by":"publisher","first-page":"100","DOI":"10.2307\/2333009","volume":"41","author":"ES Page","year":"1954","unstructured":"Page ES (1954) Continuous inspection schemes. Biometrika 41(1\/2):100\u2013115","journal-title":"Biometrika"},{"issue":"367","key":"858_CR32","doi-asserted-by":"publisher","first-page":"680","DOI":"10.1080\/01621459.1979.10481670","volume":"74","author":"D Quade","year":"1979","unstructured":"Quade D (1979) Using weighted rankings in the analysis of complete blocks with additive block effects. J Am Stat Assoc 74(367):680\u2013683","journal-title":"J Am Stat Assoc"},{"issue":"4","key":"858_CR33","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1007\/s12530-012-9059-0","volume":"3","author":"A Shaker","year":"2012","unstructured":"Shaker A, H\u00fcllermeier E (2012) Iblstreams: A system for instance-based classification and regression on data streams. Evol Syst 3(4):235\u2013249","journal-title":"Evol Syst"},{"key":"858_CR34","doi-asserted-by":"crossref","unstructured":"Veloso B, Gama J, Malheiro B (2018) Self hyper-parameter tuning for data streams. In: International Conference on Discovery Science, pp 241\u2013255. Springer","DOI":"10.1007\/978-3-030-01771-2_16"},{"key":"858_CR35","unstructured":"Wright S (1921) Correlation and causation"},{"issue":"2","key":"858_CR36","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1145\/235968.233324","volume":"25","author":"T Zhang","year":"1996","unstructured":"Zhang T, Ramakrishnan R, Livny M (1996) Birch: An efficient data clustering method for very large databases. ACM SIGMOD Rec 25(2):103\u2013114","journal-title":"ACM SIGMOD Rec"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00858-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-022-00858-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-022-00858-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,6]],"date-time":"2022-10-06T10:12:20Z","timestamp":1665051140000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-022-00858-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,8,13]]},"references-count":36,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9]]}},"alternative-id":["858"],"URL":"https:\/\/doi.org\/10.1007\/s10618-022-00858-9","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"value":"1384-5810","type":"print"},{"value":"1573-756X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,8,13]]},"assertion":[{"value":"11 December 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"18 July 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"13 August 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}