{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,28]],"date-time":"2025-10-28T03:19:18Z","timestamp":1761621558194,"version":"build-2065373602"},"reference-count":40,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T00:00:00Z","timestamp":1609286400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Data"],"abstract":"<jats:p>In data mining, outlier detection is a major challenge as it has an important role in many applications such as medical data, image processing, fraud detection, intrusion detection, and so forth. An extensive variety of clustering based approaches have been developed to detect outliers. However they are by nature time consuming which restrict their utilization with real-time applications. Furthermore, outlier detection requests are handled one at a time, which means that each request is initiated individually with a particular set of parameters. In this paper, the first clustering based outlier detection framework, (On the Fly Clustering Based Outlier Detection (OFCOD)) is presented. OFCOD enables analysts to effectively find out outliers on time with request even within huge datasets. The proposed framework has been tested and evaluated using two real world datasets with different features and applications; one with 699 records, and another with five millions records. The experimental results show that the performance of the proposed framework outperforms other existing approaches while considering several evaluation metrics.<\/jats:p>","DOI":"10.3390\/data6010001","type":"journal-article","created":{"date-parts":[[2020,12,30]],"date-time":"2020-12-30T20:13:41Z","timestamp":1609359221000},"page":"1","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["OFCOD: On the Fly Clustering Based Outlier Detection Framework"],"prefix":"10.3390","volume":"6","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7282-6598","authenticated-orcid":false,"given":"Ahmed","family":"Elmogy","sequence":"first","affiliation":[{"name":"Computer Engineering Department, Prince Sattam Bin Abdulaziz University, Al-Kharj 16278, Saudi Arabia"},{"name":"Computers and Control Engineering Department, Faculty of Engineering, Tanta University, Tanta 31733, Egypt"}]},{"given":"Hamada","family":"Rizk","sequence":"additional","affiliation":[{"name":"Computers and Control Engineering Department, Faculty of Engineering, Tanta University, Tanta 31733, Egypt"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0151-9619","authenticated-orcid":false,"given":"Amany M.","family":"Sarhan","sequence":"additional","affiliation":[{"name":"Computers and Control Engineering Department, Faculty of Engineering, Tanta University, Tanta 31733, Egypt"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,30]]},"reference":[{"key":"ref_1","unstructured":"Simon, H., Hongxing, H., Graham, W., and Rohan, B. (2002). Outlier Detection Using Replicator Neural Networks. Data Warehousing and Knowledge Discovery, Springer."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"407","DOI":"10.15837\/ijccc.2013.3.466","article-title":"Automatic Growth Detection of Cell Cultures through Outlier Techniques using 2D Images","volume":"8","author":"Gagniuc","year":"2013","journal-title":"Int. J. Comput. Commun."},{"key":"ref_3","unstructured":"Han, J., Kamber, M., and Pei, J. (2012). Data Mining: Concepts and Techniques, Elsevier."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1145\/335191.335388","article-title":"LOF: Identifying Density-based Local Outliers","volume":"29","author":"Markus","year":"2000","journal-title":"SIGMOD Rec."},{"key":"ref_5","unstructured":"Lei, C., Mingrui, W., Di, Y., and Elke, R. (2015, January 10\u201313). Online Outlier Exploration Over Large Datasets. Proceedings of the KDD \u201915, 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia."},{"key":"ref_6","first-page":"93","article-title":"Outlier Detection Algorithm Combined with Decision Tree Classifier for Early Diagnosis of Breast Cancer","volume":"7","author":"Howsalya","year":"2009","journal-title":"Int. J. Adv. Eng. Technol."},{"key":"ref_7","first-page":"160","article-title":"Real time traffic flow outlier detection using short-term traffic conditional variance prediction","volume":"50","author":"Jianhua","year":"2014","journal-title":"Transp. Res. Part C Emerg. Technol."},{"key":"ref_8","first-page":"1","article-title":"Recent Progress of Anomaly Detection","volume":"2019","author":"Xiaodan","year":"2019","journal-title":"Complexity"},{"key":"ref_9","unstructured":"Jatindra, P., and Sukumar, N. (2011, January 19\u201320). An Outlier Detection Method Based on Clustering. Proceedings of the 2011 Second International Conference on Emerging Applications of Information Technology, Kolkata, India."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Chawla, S., and Gionis, A. (2013). k-means-: A Unified Approach to Clustering and Outlier Detection, SDM.","DOI":"10.1137\/1.9781611972832.21"},{"key":"ref_11","unstructured":"Kanishka, B., Bryan, M., and Chris, G. (2011, January 21\u201324). Algorithms for Speeding Up Distance-based Outlier Detection. Proceedings of the KDD \u201911, 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"652","DOI":"10.2991\/ijcis.11.1.50","article-title":"A Comparison of Outlier Detection Techniques for High-Dimensional Data","volume":"11","author":"Xu","year":"2018","journal-title":"Int. J. Comput. Intell. Syst."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"671","DOI":"10.1080\/17445760.2018.1446210","article-title":"Parallel and distributed clustering framework for big spatial data mining","volume":"34","author":"Bendechache","year":"2019","journal-title":"Int. J. Parallel Emergent Distrib. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"56","DOI":"10.15837\/ijccc.2014.1.50","article-title":"Improving the Efficiency of Image Clustering using Modified Non Euclidean Distance Measures in Data Mining","volume":"9","author":"Santhi","year":"2014","journal-title":"Int. Comput. Commun."},{"key":"ref_15","first-page":"12","article-title":"Comparative Analysis of Outlier Detection Techniques","volume":"97","author":"Kamal","year":"2014","journal-title":"Int. J. Comput. Appl."},{"key":"ref_16","first-page":"5:1","article-title":"Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection","volume":"10","author":"Ricardo","year":"2015","journal-title":"ACM Trans. Knowl. Discov. Data"},{"key":"ref_17","unstructured":"Edwin, K., and Raymond, N. (1999, January 7\u201310). Finding Intensional Knowledge of Distance-Based Outliers. Proceedings of the VLDB \u201999, 25th International Conference on Very Large Data Bases, Edinburgh, UK."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"349","DOI":"10.1007\/s10618-008-0093-2","article-title":"Fast mining of distance-based outliers in high-dimensional datasets","volume":"16","author":"Amol","year":"2008","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Tang, B., and He, H. (2016). A Local Density-Based Approach for Local Outlier Detection. arXiv.","DOI":"10.1016\/j.neucom.2017.02.039"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Su, S., Xiao, L., Zhang, Z., Gu, F., Ruan, L., Li, S., He, Z., Huo, Z., Yan, B., and Wang, H. (2017, January 18\u201320). N2DLOF: A New Local Density-Based Outlier Detection Approach for Scattered Data. Proceedings of the 2017 IEEE 19th International Conference on High Performance Computing and Communications, Bangkok, Thailand.","DOI":"10.1109\/HPCC-SmartCity-DSS.2017.60"},{"key":"ref_21","first-page":"9","article-title":"Discovering Cluster Based Local Outliers","volume":"2003","author":"He","year":"2003","journal-title":"Pattern Recognit. Lett."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Jiang, S., and An, Q. (2008, January 18\u201320). Clustering-Based Outlier Detection Method. Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, Shandong, China.","DOI":"10.1109\/FSKD.2008.244"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Rizk, H., Elgokhy, S., and Sarhan, A. (2015, January 23\u201324). A hybrid outlier detection algorithm based on partitioning clustering and density measures. Proceedings of the Tenth International Conference on Computer Engineering & Systems (ICCES), Cairo, Egypt.","DOI":"10.1109\/ICCES.2015.7393040"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"753","DOI":"10.1016\/j.asej.2013.01.003","article-title":"A hybrid network intrusion detection framework based on random forests and weighted k-means","volume":"4","author":"Elbasiony","year":"2013","journal-title":"Ain Shams Eng. J."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1007\/s007780050006","article-title":"Distance-based Outliers: Algorithms and Applications","volume":"8","author":"Edwin","year":"2000","journal-title":"Vldb J."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1016\/j.is.2015.07.006","article-title":"Efficient and Flexible Algorithms for Monitoring Distance-based Outliers over Data Streams","volume":"55","author":"Maria","year":"2016","journal-title":"Inf. Syst."},{"key":"ref_27","unstructured":"Justin, Z. (2007, January 7\u201310). Privacy preserving K-medoids clustering. Proceedings of the 2007 IEEE International Conference on Systems, Man and Cybernetics, Montreal, QC, Canada."},{"key":"ref_28","first-page":"3","article-title":"Dengue Fever in Perspective of Clustering Algorithms","volume":"6","author":"Shaukat","year":"2015","journal-title":"J. Data Min. Genom. Proteom."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1016\/S0167-8655(00)00131-8","article-title":"Two-phase Clustering Process for Outliers Detection","volume":"22","author":"Jaing","year":"2001","journal-title":"Pattern Recogn. Lett."},{"key":"ref_30","first-page":"839","article-title":"Comparative Study between K-Means and K-Medoids Clustering Algorithms","volume":"6","author":"Nirmal","year":"2019","journal-title":"J. Classif."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3390\/a12090177","article-title":"Simple K-Medoids Partitioning Algorithm for Mixed Variable Data","volume":"12","author":"Budiaji","year":"2019","journal-title":"Algorithms"},{"key":"ref_32","first-page":"13","article-title":"K-means with Three different Distance Metrics","volume":"67","author":"Archana","year":"2013","journal-title":"Int. J. Comput. Appl."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"85","DOI":"10.1023\/B:AIRE.0000045502.10941.a9","article-title":"A Survey of Outlier Detection Methodologies","volume":"22","author":"Victoria","year":"2004","journal-title":"Artif. Intell. Rev."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"3336","DOI":"10.1016\/j.eswa.2008.01.039","article-title":"A simple and fast algorithm for K-medoids clustering","volume":"36","author":"Park","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1007\/s10618-009-0148-z","article-title":"A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes","volume":"20","author":"Anna","year":"2010","journal-title":"Data Min. Knowl. Discov."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"861","DOI":"10.1016\/j.patrec.2005.10.010","article-title":"An introduction to ROC analysis","volume":"27","author":"Fawcett","year":"2006","journal-title":"Pattern Recognit. Lett."},{"key":"ref_37","unstructured":"Wolberg, W., Street, W., and Mangasarian, O. (1998). UCI Repository of Machine Learning Databases: Breast Cancer Wisconsin (Diagnostic) Data Set, UCI."},{"key":"ref_38","unstructured":"(2020, December 20). KDD\u201999: The KDD Intrusion Detection Dataset. Available online: http:\/\/kdd.ics.uci.edu\/databases\/kddcup99\/kddcup99.html."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"1476","DOI":"10.1016\/j.eswa.2013.08.044","article-title":"Breast cancer diagnosis based on feature extraction using a hybrid of K-means and support vector machine algorithms","volume":"41","author":"Zheng","year":"2014","journal-title":"Expert Syst. Appl."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Shanxiong, C., Maoling, P., Hailing, X., and Sheng, W. (2017). An anomaly detection method based on Lasso. Clust. Comput., 22.","DOI":"10.1007\/s10586-017-1255-z"}],"container-title":["Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/1\/1\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:48:05Z","timestamp":1760179685000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2306-5729\/6\/1\/1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,30]]},"references-count":40,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["data6010001"],"URL":"https:\/\/doi.org\/10.3390\/data6010001","relation":{},"ISSN":["2306-5729"],"issn-type":[{"type":"electronic","value":"2306-5729"}],"subject":[],"published":{"date-parts":[[2020,12,30]]}}}