{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T02:05:20Z","timestamp":1774317920864,"version":"3.50.1"},"reference-count":29,"publisher":"MDPI AG","issue":"6","license":[{"start":{"date-parts":[[2020,6,11]],"date-time":"2020-06-11T00:00:00Z","timestamp":1591833600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>As new cyberattacks are launched against systems and networks on a daily basis, the ability for network intrusion detection systems to operate efficiently in the big data era has become critically important, particularly as more low-power Internet-of-Things (IoT) devices enter the market. This has motivated research in applying machine learning algorithms that can operate on streams of data, trained online or \u201clive\u201d on only a small amount of data kept in memory at a time, as opposed to the more classical approaches that are trained solely offline on all of the data at once. In this context, one important concept from machine learning for improving detection performance is the idea of \u201censembles\u201d, where a collection of machine learning algorithms are combined to compensate for their individual limitations and produce an overall superior algorithm. Unfortunately, existing research lacks proper performance comparison between homogeneous and heterogeneous online ensembles. Hence, this paper investigates several homogeneous and heterogeneous ensembles, proposes three novel online heterogeneous ensembles for intrusion detection, and compares their performance accuracy, run-time complexity, and response to concept drifts. Out of the proposed novel online ensembles, the heterogeneous ensemble consisting of an adaptive random forest of Hoeffding Trees combined with a Hoeffding Adaptive Tree performed the best, by dealing with concept drift in the most effective way. While this scheme is less accurate than a larger size adaptive random forest, it offered a marginally better run-time, which is beneficial for online training.<\/jats:p>","DOI":"10.3390\/info11060315","type":"journal-article","created":{"date-parts":[[2020,6,12]],"date-time":"2020-06-12T05:02:24Z","timestamp":1591938144000},"page":"315","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":40,"title":["Ensemble-Based Online Machine Learning Algorithms for Network Intrusion Detection Systems Using Streaming Data"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5036-5433","authenticated-orcid":false,"given":"Nathan","family":"Martindale","sequence":"first","affiliation":[{"name":"Department of Computer Science, College of Engineering, Tennessee Tech University, Cookeville, TN 38505, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8051-9747","authenticated-orcid":false,"given":"Muhammad","family":"Ismail","sequence":"additional","affiliation":[{"name":"Department of Computer Science, College of Engineering, Tennessee Tech University, Cookeville, TN 38505, USA"}]},{"given":"Douglas A.","family":"Talbert","sequence":"additional","affiliation":[{"name":"Department of Computer Science, College of Engineering, Tennessee Tech University, Cookeville, TN 38505, USA"}]}],"member":"1968","published-online":{"date-parts":[[2020,6,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1016\/j.comnet.2019.01.023","article-title":"Internet of Things: A survey on machine learning-based intrusion detection approaches","volume":"151","author":"Papa","year":"2019","journal-title":"Comput. Netw."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"2823","DOI":"10.1007\/s13042-018-00906-1","article-title":"Review: Machine learning techniques applied to cybersecurity","volume":"10","year":"2019","journal-title":"Int. J. Mach. Learn. Cybern."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Srivastava, N., and Chandra Jaiswal, U. (2019, January 27\u201329). Big Data Analytics Technique in Cyber Security: A Review. Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), Erode, India.","DOI":"10.1109\/ICCMC.2019.8819634"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1613\/jair.614","article-title":"Popular ensemble methods: An empirical study","volume":"11","author":"Opitz","year":"1999","journal-title":"J. Artif. Intell. Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1023\/A:1022859003006","article-title":"Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy","volume":"51","author":"Kuncheva","year":"2003","journal-title":"Mach. Learn."},{"key":"ref_6","unstructured":"Domingos, P., and Hulten, G. (2001, January 20). Catching up with the data: Research issues in mining data streams. Proceedings of the Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, CA, USA."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1007\/s13748-011-0008-0","article-title":"Learning from streaming data with concept drift and imbalance: An overview","volume":"1","author":"Hoens","year":"2012","journal-title":"Prog. Artif. Intell."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1016\/j.asoc.2017.12.008","article-title":"Online ensemble learning with abstaining classifiers for drifting and noisy data streams","volume":"68","author":"Krawczyk","year":"2018","journal-title":"Appl. Soft Comput."},{"key":"ref_9","unstructured":"Cup, K. (2020, June 10). 2007. Available online: http:\/\/kdd.ics.uci.edu\/databases\/kddcup99\/kddcup99.html."},{"key":"ref_10","first-page":"103","article-title":"On diversity and accuracy of homogeneous and heterogeneous ensembles","volume":"4","author":"Bian","year":"2007","journal-title":"Int. J. Hybrid Intell. Syst."},{"key":"ref_11","first-page":"89","article-title":"Network Anomaly Detection by Means of Machine Learning: Random Forest Approach with Apache Spark","volume":"22","author":"Hajialian","year":"2018","journal-title":"Inf. Econ."},{"key":"ref_12","first-page":"1520","article-title":"Performance Analysis of Big Data Intrusion Detection System over Random Forest Algorithm","volume":"13","author":"Abd","year":"2018","journal-title":"Int. J. Appl. Eng. Res."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Verma, A., and Ranga, V. (2019). Machine Learning Based Intrusion Detection Systems for IoT Applications. Wireless Personal Communications, Springer.","DOI":"10.1007\/s11277-019-06986-8"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Rettig, L., Khayati, M., Cudre-Mauroux, P., and Piorkowski, M. (November, January 29). Online anomaly detection over Big Data streams. Proceedings of the 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, Santa Clara, CA, USA.","DOI":"10.1109\/BigData.2015.7363865"},{"key":"ref_15","unstructured":"Guha, S., Mishra, N., Roy, G., and Schrijvers, O. (2016, January 19\u201324). Robust random cut forest based anomaly detection on streams. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Mulinka, P., and Casas, P. (2018, January 20). Stream-based machine learning for network security and anomaly detection. Proceedings of the Big-DAMA 2018\u2014Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, Part of SIGCOMM 2018, Budapest, Hungary.","DOI":"10.1145\/3229607.3229612"},{"key":"ref_17","unstructured":"Tan, S.C., Ting, K.M., and Liu, T.F. (2011, January 16\u201322). Fast anomaly detection for streaming data. Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, Barcelona, Spain."},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Verma, A., and Ranga, V. (2019, January 18\u201319). ELNIDS: Ensemble Learning based Network Intrusion Detection System for RPL based Internet of Things. Proceedings of the 2019 4th International Conference on Internet of Things: Smart Innovation and Usages, IoT-SIU 2019, San Diego, CA, USA.","DOI":"10.1109\/IoT-SIU.2019.8777504"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Mirsky, Y., Doitshman, T., Elovici, Y., and Shabtai, A. (2018). Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. arXiv.","DOI":"10.14722\/ndss.2018.23204"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Hsu, Y.F., He, Z.Y., Tarutani, Y., and Matsuoka, M. (2019, January 8\u201313). Toward an online network intrusion detection system based on ensemble learning. Proceedings of the IEEE International Conference on Cloud Computing, CLOUD 2019, Milan, Italy.","DOI":"10.1109\/CLOUD.2019.00037"},{"key":"ref_21","first-page":"75","article-title":"An ensemble approach to big data security (Cyber Security)","volume":"9","author":"Hashmani","year":"2018","journal-title":"Int. J. Adv. Comput. Sci. Appl."},{"key":"ref_22","first-page":"1601","article-title":"MOA: Massive Online Analysis","volume":"11","author":"Bifet","year":"2010","journal-title":"J. Mach. Learn. Res."},{"key":"ref_23","unstructured":"Frank, E., and Mark, A. (2016). The WEKA Workbench. Online Appendix for \u201cData Mining: Practical Machine Learning Tools and Techniques\u201d, Morgan Kaufmann."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1109\/TIT.1967.1053964","article-title":"Nearest neighbor pattern classification","volume":"13","author":"Cover","year":"1967","journal-title":"IEEE Trans. Inf. Theory"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/BF00994018","article-title":"Support-vector networks","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Domingos, P., and Hulten, G. (2000, January 20\u201323). Mining high-speed data streams. Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA.","DOI":"10.1145\/347090.347107"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Bifet, A., and Gavald\u00e0, R. (September, January 31). Adaptive learning from evolving data streams. Proceedings of the 8th International Symposium on Intelligent Data Analysis, IDA 2009, Lyon, France.","DOI":"10.1007\/978-3-642-03915-7_22"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"1469","DOI":"10.1007\/s10994-017-5642-8","article-title":"Adaptive random forests for evolving data stream classification","volume":"106","author":"Gomes","year":"2017","journal-title":"Mach. Learn."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1007\/s10115-017-1022-8","article-title":"Prequential AUC: Properties of the area under the ROC curve for data streams with concept drift","volume":"52","author":"Brzezinski","year":"2017","journal-title":"Knowl. Inf. Syst."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/6\/315\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T09:37:55Z","timestamp":1760175475000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/11\/6\/315"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,11]]},"references-count":29,"journal-issue":{"issue":"6","published-online":{"date-parts":[[2020,6]]}},"alternative-id":["info11060315"],"URL":"https:\/\/doi.org\/10.3390\/info11060315","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,11]]}}}