{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,16]],"date-time":"2026-04-16T20:10:12Z","timestamp":1776370212714,"version":"3.51.2"},"reference-count":66,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T00:00:00Z","timestamp":1632268800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100000038","name":"Natural Sciences and Engineering Research Council of Canada","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Ontario Research Fund\u2013Research Excellence award in BRAIN Alliance"},{"name":"York Research Chairs"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Cyber-Phys. Syst."],"published-print":{"date-parts":[[2021,10,31]]},"abstract":"<jats:p>Industrial Information Technology infrastructures are often vulnerable to cyberattacks. To ensure security to the computer systems in an industrial environment, it is required to build effective intrusion detection systems to monitor the cyber-physical systems (e.g., computer networks) in the industry for malicious activities. This article aims to build such intrusion detection systems to protect the computer networks from cyberattacks. More specifically, we propose a novel unsupervised machine learning approach that combines the K-Means algorithm with the Isolation Forest for anomaly detection in industrial big data scenarios. Since our objective is to build the intrusion detection system for the big data scenario in the industrial domain, we utilize the Apache Spark framework to implement our proposed model that was trained in large network traffic data (about 123 million instances of network traffic) stored in Elasticsearch. Moreover, we evaluate our proposed model on the live streaming data and find that our proposed system can be used for real-time anomaly detection in the industrial setup. In addition, we address different challenges that we face while training our model on large datasets and explicitly describe how these issues were resolved. Based on our empirical evaluation in different use cases for anomaly detection in real-world network traffic data, we observe that our proposed system is effective to detect anomalies in big data scenarios. Finally, we evaluate our proposed model on several academic datasets to compare with other models and find that it provides comparable performance with other state-of-the-art approaches.<\/jats:p>","DOI":"10.1145\/3460976","type":"journal-article","created":{"date-parts":[[2021,9,22]],"date-time":"2021-09-22T21:36:34Z","timestamp":1632346594000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":58,"title":["Extending Isolation Forest for Anomaly Detection in Big Data via K-Means"],"prefix":"10.1145","volume":"5","author":[{"given":"Md Tahmid Rahman","family":"Laskar","sequence":"first","affiliation":[{"name":"York University, Toronto, On, Canada"}]},{"given":"Jimmy Xiangji","family":"Huang","sequence":"additional","affiliation":[{"name":"York University, Toronto, On, Canada"}]},{"given":"Vladan","family":"Smetana","sequence":"additional","affiliation":[{"name":"iSecurity Consulting Inc., Toronto, ON, Canada"}]},{"given":"Chris","family":"Stewart","sequence":"additional","affiliation":[{"name":"iSecurity Consulting Inc., Toronto, ON, Canada"}]},{"given":"Kees","family":"Pouw","sequence":"additional","affiliation":[{"name":"iSecurity Consulting Inc., Toronto, ON, Canada"}]},{"given":"Aijun","family":"An","sequence":"additional","affiliation":[{"name":"York University, Toronto, On, Canada"}]},{"given":"Stephen","family":"Chan","sequence":"additional","affiliation":[{"name":"Dapasoft Inc., Toronto, ON, Canada"}]},{"given":"Lei","family":"Liu","sequence":"additional","affiliation":[{"name":"York University, Toronto, On, Canada"}]}],"member":"320","published-online":{"date-parts":[[2021,9,22]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2015.11.016"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.adhoc.2019.02.001"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500853.2500857"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190664"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.5555\/1283383.1283494"},{"key":"e_1_2_1_6_1","volume-title":"Retrieved on","author":"Asuncion Arthur","year":"2007","unstructured":"Arthur Asuncion and David Newman . 2007 . UCI machine learning repository . Retrieved on March 15, 2021 from https:\/\/archive.ics.uci.edu\/ml\/index.php. Arthur Asuncion and David Newman. 2007. UCI machine learning repository. Retrieved on March 15, 2021 from https:\/\/archive.ics.uci.edu\/ml\/index.php."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/3157096.3157103"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/2180912.2180915"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956758"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Raghavendra Chalapathy and Sanjay Chawla. 2019. Deep learning for anomaly detection: A survey. arXiv:1901.03407.  Raghavendra Chalapathy and Sanjay Chawla. 2019. Deep learning for anomaly detection: A survey. arXiv:1901.03407.","DOI":"10.1145\/3394486.3406704"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1541880.1541882"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11227-019-02805-w"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00521-006-0030-5"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1629175.1629198"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2017.09.037"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2019.2935975"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/3001460.3001507"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.procs.2016.06.047"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.future.2013.06.027"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.5555\/1172965.1173603"},{"key":"e_1_2_1_21_1","volume-title":"How much can k-means be improved by using better initialization and repeats?Pattern Recognition 93","author":"Fr\u00e4nti Pasi","year":"2019","unstructured":"Pasi Fr\u00e4nti and Sami Sieranoja . 2019. How much can k-means be improved by using better initialization and repeats?Pattern Recognition 93 ( 2019 ), 95\u2013112. Pasi Fr\u00e4nti and Sami Sieranoja. 2019. How much can k-means be improved by using better initialization and repeats?Pattern Recognition 93 (2019), 95\u2013112."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2523813"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.5555\/2904394"},{"key":"e_1_2_1_24_1","unstructured":"Yuvraj Gupta. 2015. Kibana Essentials. Packt Publishing Ltd.  Yuvraj Gupta. 2015. Kibana Essentials. Packt Publishing Ltd."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijinfomgt.2018.08.006"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.3390\/a14010006"},{"key":"e_1_2_1_27_1","volume-title":"Matias Carrasco Kind, and Robert J. Brunner","author":"Hariri Sahand","year":"2018","unstructured":"Sahand Hariri , Matias Carrasco Kind, and Robert J. Brunner . 2018 . Extended isolation forest. arXiv:1811.02141. Sahand Hariri, Matias Carrasco Kind, and Robert J. Brunner. 2018. Extended isolation forest. arXiv:1811.02141."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.4108\/eai.3-12-2015.2262516"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2020.105659"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1082161.1082198"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-4048(02)00514-X"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2008.17"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/2133360.2133363"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.5555\/2946645.2946679"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.protcy.2012.05.017"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2002.1007774"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.1901\/jeab.2001.76-235"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICC.2018.8422401"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICC.2019.8761575"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/3406093"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSENS.2017.2752719"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the 2020 International Conference on Communications (ICC'20)","author":"Otoum Safa","unstructured":"Safa Otoum , Burak Kantarci , and Hussein T. Mouftah . 2020. A novel ensemble method for advanced intrusion detection in wireless sensor networks . In Proceedings of the 2020 International Conference on Communications (ICC'20) . IEEE, Los Alamitos, CA, 1\u20136. Safa Otoum, Burak Kantarci, and Hussein T. Mouftah. 2020. A novel ensemble method for advanced intrusion detection in wireless sensor networks. In Proceedings of the 2020 International Conference on Communications (ICC'20). IEEE, Los Alamitos, CA, 1\u20136."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.5555\/289373"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4530(02)00108-7"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.5555\/1898681.1898696"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1198\/106186006X94072"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/TETCI.2017.2772792"},{"key":"e_1_2_1_48_1","volume-title":"Elasticsearch for Hadoop","author":"Shukla Vishal","unstructured":"Vishal Shukla . 2015. Elasticsearch for Hadoop . Packt Publishing Ltd . Vishal Shukla. 2015. Elasticsearch for Hadoop. Packt Publishing Ltd."},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_2_1_50_1","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2010.25"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1109\/TR.2010.2048740"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1109\/UBMK.2017.8093473"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2003.814797"},{"key":"e_1_2_1_54_1","volume-title":"Proceedings of the European Conference of the Prognostics and Health Management Society. 1\u20139.","author":"Tian Jing","year":"2014","unstructured":"Jing Tian , Michael H. Azarian , and Michael Pecht . 2014 . Anomaly detection using self-organizing maps-based k-nearest neighbor algorithm . In Proceedings of the European Conference of the Prognostics and Health Management Society. 1\u20139. Jing Tian, Michael H. Azarian, and Michael Pecht. 2014. Anomaly detection using self-organizing maps-based k-nearest neighbor algorithm. In Proceedings of the European Conference of the Prognostics and Health Management Society. 1\u20139."},{"key":"e_1_2_1_55_1","first-page":"39","article-title":"Reinforced intrusion detection using pursuit reinforcement competitive learning","volume":"2","author":"Prafitaning Tiyas Indah Yulia","year":"2014","unstructured":"Indah Yulia Prafitaning Tiyas , Ali Ridho Barakbah , Tri Harsono , and Amang Sudarsono . 2014 . Reinforced intrusion detection using pursuit reinforcement competitive learning . EMITTER International Journal of Engineering Technology 2 , 1 (2014), 39 \u2013 49 . Indah Yulia Prafitaning Tiyas, Ali Ridho Barakbah, Tri Harsono, and Amang Sudarsono. 2014. Reinforced intrusion detection using pursuit reinforcement competitive learning. EMITTER International Journal of Engineering Technology 2, 1 (2014), 39\u201349.","journal-title":"EMITTER International Journal of Engineering Technology"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994526"},{"key":"e_1_2_1_57_1","volume-title":"linkedin-isolation-forest. Retrieved","author":"Verbus James","year":"2021","unstructured":"James Verbus . 2019. linkedin-isolation-forest. Retrieved July 7, 2021 from https:\/\/github.com\/linkedin\/isolation-forest. James Verbus. 2019. linkedin-isolation-forest. Retrieved July 7, 2021 from https:\/\/github.com\/linkedin\/isolation-forest."},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICACCI.2017.8126009"},{"key":"e_1_2_1_59_1","doi-asserted-by":"publisher","DOI":"10.5555\/2285539"},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1016\/0169-7439(87)80084-9"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3377408"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1007\/11538059_103"},{"key":"e_1_2_1_63_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:DAMI.0000023676.72185.7c"},{"key":"e_1_2_1_64_1","volume-title":"spark-iforest. Retrieved","author":"Yang Fangzhou","year":"2021","unstructured":"Fangzhou Yang . 2018. spark-iforest. Retrieved July 7, 2021 from https:\/\/github.com\/titicaca\/spark-iforest. Fangzhou Yang. 2018. spark-iforest. Retrieved July 7, 2021 from https:\/\/github.com\/titicaca\/spark-iforest."},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2017.2762418"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/2934664"}],"container-title":["ACM Transactions on Cyber-Physical Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460976","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3460976","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:22Z","timestamp":1750193302000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3460976"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,9,22]]},"references-count":66,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,10,31]]}},"alternative-id":["10.1145\/3460976"],"URL":"https:\/\/doi.org\/10.1145\/3460976","relation":{},"ISSN":["2378-962X","2378-9638"],"issn-type":[{"value":"2378-962X","type":"print"},{"value":"2378-9638","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,9,22]]},"assertion":[{"value":"2020-08-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-09-22","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}