{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,18]],"date-time":"2026-01-18T02:31:33Z","timestamp":1768703493008,"version":"3.49.0"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T00:00:00Z","timestamp":1645056000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T00:00:00Z","timestamp":1645056000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,8]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Outlier or anomaly detection is the process through which datum\/data with different properties from the rest of the data is\/are identified. Their importance lies in their use in various domains such as fraud detection, network intrusion detection, and spam filtering. In this paper, we introduce a new outlier detection algorithm based on an ensemble method and distance-based data filtering with an iterative approach to detect outliers in unlabeled data. The ensemble method is used to cluster the unlabeled data and to filter out potential isolated outliers from the same by iteratively using a cluster membership threshold until the Dunn index score for clustering is maximized. The distance-based data filtering, on the other hand, removes the potential outlier clusters from the post-clustered data based on a distance threshold using the Euclidean distance measure of each data point from the majority cluster as the filtering factor. The performance of our algorithm is evaluated by applying it to 10 real-world machine learning datasets. 
Finally, we compare the results of our algorithm to various supervised and unsupervised outlier detection algorithms using Precision@n and F-score evaluation metrics.<\/jats:p>","DOI":"10.1007\/s40747-022-00674-0","type":"journal-article","created":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T03:02:33Z","timestamp":1645066953000},"page":"3215-3230","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["An iterative approach to unsupervised outlier detection using ensemble method and distance-based data filtering"],"prefix":"10.1007","volume":"8","author":[{"given":"Bodhan","family":"Chakraborty","sequence":"first","affiliation":[]},{"given":"Agneet","family":"Chaterjee","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4217-2372","authenticated-orcid":false,"given":"Samir","family":"Malakar","sequence":"additional","affiliation":[]},{"given":"Ram","family":"Sarkar","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,2,17]]},"reference":[{"key":"674_CR1","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s40747-018-0085-9","volume":"5","author":"A Borah","year":"2019","unstructured":"Borah A, Nath B (2019) Rare pattern mining: challenges and future perspectives. Complex Intell Syst 5:1\u201323","journal-title":"Complex Intell Syst"},{"key":"674_CR2","doi-asserted-by":"crossref","unstructured":"Dhieb N, Ghazzai H, Besbes H, Massoud Y (2019) A very deep transfer learning model for vehicle damage detection and localization. In: 2019 31st international conference on microelectronics (ICM). IEEE, pp 158\u2013161","DOI":"10.1109\/ICM48031.2019.9021687"},{"key":"674_CR3","doi-asserted-by":"publisher","first-page":"133","DOI":"10.1007\/s40747-017-0040-1","volume":"3","author":"BK Sarkar","year":"2017","unstructured":"Sarkar BK (2017) Big data for secure healthcare system: a conceptual design. 
Complex Intell Syst 3:133\u2013151","journal-title":"Complex Intell Syst"},{"key":"674_CR4","first-page":"11","volume":"5","author":"V Shambharkar","year":"2016","unstructured":"Shambharkar V, Sahare V (2016) Survey on outlier detection for support vector machine. Int J Data Min Tech Appl 5:11\u201314","journal-title":"Int J Data Min Tech Appl"},{"key":"674_CR5","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1007\/s40747-016-0033-5","volume":"3","author":"V Shah","year":"2017","unstructured":"Shah V, Aggarwal AK, Chaubey N (2017) Performance improvement of intrusion detection with fusion of multiple sensors. Complex Intell Syst 3:33\u201339","journal-title":"Complex Intell Syst"},{"key":"674_CR6","doi-asserted-by":"publisher","first-page":"3575","DOI":"10.1007\/s10462-019-09771-y","volume":"53","author":"A Carre\u00f1o","year":"2020","unstructured":"Carre\u00f1o A, Inza I, Lozano JA (2020) Analyzing rare event, anomaly, novelty and outlier detection terms under the supervised classification framework. Artif Intell Rev 53:3575\u20133594","journal-title":"Artif Intell Rev"},{"key":"674_CR7","doi-asserted-by":"crossref","unstructured":"Tian W, Liu J (2009) Intrusion detection quantitative analysis with support vector regression and particle swarm optimization algorithm. In: 2009 international conference on wireless networks and information systems. IEEE, pp 133\u2013136","DOI":"10.1109\/WNIS.2009.79"},{"key":"674_CR8","first-page":"6","volume":"161","author":"P Save","year":"2017","unstructured":"Save P, Tiwarekar P, Jain KN, Mahyavanshi N (2017) A novel idea for credit card fraud detection using decision tree. Int J Comput Appl 161:6\u20139","journal-title":"Int J Comput Appl"},{"key":"674_CR9","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1007\/978-3-319-47578-3_4","volume-title":"Outlier analysis","author":"CC Aggarwal","year":"2017","unstructured":"Aggarwal CC (2017) Proximity-based outlier detection. Outlier analysis. 
Springer, Berlin, pp 111\u2013147"},{"key":"674_CR10","doi-asserted-by":"crossref","unstructured":"Zhang J, Zulkernine M (2006) Anomaly based network intrusion detection with unsupervised outlier detection. In: IEEE international conference on communications","DOI":"10.1109\/ICC.2006.255127"},{"key":"674_CR11","doi-asserted-by":"crossref","unstructured":"Zhang K, Shi S, Gao H, Li J (2007) Unsupervised outlier detection in sensor networks using aggregation tree. In: International conference on advanced data mining and applications. Springer, pp 158\u2013169","DOI":"10.1007\/978-3-540-73871-8_16"},{"key":"674_CR12","doi-asserted-by":"crossref","unstructured":"Dasgupta D, Majumdar NS (2002) Anomaly detection in multidimensional data using negative selection algorithm. In: Proceedings of the 2002 Congress on Evolutionary Computation. CEC\u201902 (Cat. No. 02TH8600). IEEE, pp 1039\u20131044","DOI":"10.1109\/CEC.2002.1004386"},{"key":"674_CR13","doi-asserted-by":"publisher","first-page":"2481","DOI":"10.1016\/j.sigpro.2003.07.018","volume":"83","author":"M Markou","year":"2003","unstructured":"Markou M, Singh S (2003) Novelty detection: a review\u2014part 1: statistical approaches. Signal Process 83:2481\u20132497","journal-title":"Signal Process"},{"key":"674_CR14","doi-asserted-by":"publisher","first-page":"35145","DOI":"10.1007\/s11042-020-09628-5","volume":"80","author":"A Saha","year":"2021","unstructured":"Saha A, Chatterjee A, Ghosh S et al (2021) An ensemble approach to outlier detection using some conventional clustering algorithms. Multimed Tools Appl 80:35145\u201335169. https:\/\/doi.org\/10.1007\/s11042-020-09628-5","journal-title":"Multimed Tools Appl"},{"key":"674_CR15","doi-asserted-by":"crossref","unstructured":"Hautam\u00e4ki V, Cherednichenko S, K\u00e4rkk\u00e4inen I, et al (2005) Improving K-means by outlier removal. In: Scandinavian conference on image analysis. 
Springer, pp 978\u2013987","DOI":"10.1007\/11499145_99"},{"key":"674_CR16","doi-asserted-by":"publisher","first-page":"1641","DOI":"10.1016\/S0167-8655(03)00003-5","volume":"24","author":"Z He","year":"2003","unstructured":"He Z, Xu X, Deng S (2003) Discovering cluster-based local outliers. Pattern Recognit Lett 24:1641\u20131650","journal-title":"Pattern Recognit Lett"},{"key":"674_CR17","doi-asserted-by":"publisher","unstructured":"Hawkins S, He H,\nWilliams G, Baxter R (2002) Outlier detection using replicator neural\nnetworks. In: Kambayashi Y, Winiwarter W, Arikawa M (eds) Data\nwarehousing and knowledge discovery. DaWaK 2002. Lecture Notes in\nComputer Science, vol 2454. Springer, Berlin, Heidelberg.\nhttps:\/\/doi.org\/10.1007\/3-540-46145-0_17","DOI":"10.1007\/3-540-46145-0_17"},{"key":"674_CR18","doi-asserted-by":"publisher","DOI":"10.1080\/1351847X.2019.1647864","author":"N Loperfido","year":"2019","unstructured":"Loperfido N (2019) Kurtosis-based projection pursuit for outlier detection in financial time series. Eur J Financ. https:\/\/doi.org\/10.1080\/1351847X.2019.1647864","journal-title":"Eur J Financ"},{"key":"674_CR19","doi-asserted-by":"publisher","unstructured":"Zhang K, Hutter M, Jin H (2009)\nA new local distance-based outlier detection approach for scattered real-world data. In: Theeramunkong T, Kijsirikul B, Cercone N, Ho TB (eds)\nAdvances in knowledge discovery and data mining. PAKDD 2009. Lecture\nNotes in Computer Science, vol 5476. Springer, Berlin, Heidelberg.\nhttps:\/\/doi.org\/10.1007\/978-3-642-01307-2_84","DOI":"10.1007\/978-3-642-01307-2_84"},{"key":"674_CR20","doi-asserted-by":"publisher","DOI":"10.1145\/956750.956758","author":"S Bay","year":"2003","unstructured":"Bay S, Schwabacher M (2003) Mining distance-based outliers in near linear time with randomization and a simple pruning rule. Proc ACM SIGKDD Int Conf Knowl Discov Data Min. 
https:\/\/doi.org\/10.1145\/956750.956758","journal-title":"Proc ACM SIGKDD Int Conf Knowl Discov Data Min"},{"key":"674_CR21","doi-asserted-by":"crossref","unstructured":"Ghoting A, Parthasarathy S, Otey ME Fast mining of distance-based outliers in high-dimensional datasets. In: Proceedings of the 2006 SIAM international conference on data mining. pp 609\u2013613","DOI":"10.1137\/1.9781611972764.70"},{"key":"674_CR22","doi-asserted-by":"publisher","first-page":"691","DOI":"10.1016\/S0167-8655(00)00131-8","volume":"22","author":"M-F Jiang","year":"2001","unstructured":"Jiang M-F, Tseng S, Su CM (2001) Two-phase clustering process for outliers detection. Pattern Recognit Lett 22:691\u2013700. https:\/\/doi.org\/10.1016\/S0167-8655(00)00131-8","journal-title":"Pattern Recognit Lett"},{"key":"674_CR23","doi-asserted-by":"publisher","first-page":"12059","DOI":"10.1088\/1742-6596\/1437\/1\/012059","volume":"1437","author":"W Chen","year":"2020","unstructured":"Chen W, Tian Z, Zhang L (2020) Interpolation-based outlier detection for sparse, high dimensional data. J Phys Conf Ser 1437:12059. https:\/\/doi.org\/10.1088\/1742-6596\/1437\/1\/012059","journal-title":"J Phys Conf Ser"},{"key":"674_CR24","doi-asserted-by":"publisher","first-page":"222","DOI":"10.1016\/j.patcog.2009.05.017","volume":"43","author":"C-F Tsai","year":"2010","unstructured":"Tsai C-F, Lin C-Y (2010) A triangle area based nearest neighbors approach to intrusion detection. Pattern Recognit 43:222\u2013229. https:\/\/doi.org\/10.1016\/j.patcog.2009.05.017","journal-title":"Pattern Recognit"},{"key":"674_CR25","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2895094","author":"C Wang","year":"2019","unstructured":"Wang C, Liu Z, Gao H, Fu Y (2019) Applying anomaly pattern score for outlier detection. IEEE Access. 
https:\/\/doi.org\/10.1109\/ACCESS.2019.2895094","journal-title":"IEEE Access"},{"key":"674_CR26","unstructured":"Feng Q, Zhang Z, Huang\nZ, Xu J, Wang J (2019) Improved algorithms for clustering with\noutliers. In: Proc. 30th International symposium on algorithms and computation\n(ISAAC 2019)"},{"key":"674_CR27","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1109\/TFUZZ.2010.2087382","volume":"19","author":"X Yang","year":"2011","unstructured":"Yang X, Zhang G, Lu J (2011) A kernel Fuzzy C-means clustering-based fuzzy support vector machine algorithm for classification problems with outliers or noises. Fuzzy Syst IEEE Trans 19:105\u2013115. https:\/\/doi.org\/10.1109\/TFUZZ.2010.2087382","journal-title":"Fuzzy Syst IEEE Trans"},{"key":"674_CR28","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1145\/2481244.2481252","volume":"14","author":"C Aggarwal","year":"2012","unstructured":"Aggarwal C (2012) Outlier ensembles: position paper. SIGKDD Explor 14:49\u201358","journal-title":"SIGKDD Explor"},{"key":"674_CR29","doi-asserted-by":"publisher","first-page":"260","DOI":"10.1186\/1471-2105-10-260","volume":"10","author":"E-Y Kim","year":"2009","unstructured":"Kim E-Y, Kim S-Y, Ashlock D, Nam D (2009) MULTI-K: accurate classification of microarray subtypes using ensemble K-means clustering. BMC Bioinform 10:260. https:\/\/doi.org\/10.1186\/1471-2105-10-260","journal-title":"BMC Bioinform"},{"key":"674_CR30","doi-asserted-by":"crossref","unstructured":"Chen J et al (2017) Outlier detection with autoencoder\nensembles. In: Proceedings of the 2017 SIAM international conference on data\nmining. Society for Industrial and Applied Mathematics","DOI":"10.1137\/1.9781611974973.11"},{"key":"674_CR31","doi-asserted-by":"publisher","first-page":"126","DOI":"10.2307\/2346830","volume":"28","author":"JA Hartigan","year":"1979","unstructured":"Hartigan JA (1979) A K-means clustering algorithm: Algorithm AS 136. Appl. Stat. 28:126\u2013130","journal-title":"Appl. 
Stat."},{"key":"674_CR32","doi-asserted-by":"publisher","first-page":"129","DOI":"10.1109\/TIT.1982.1056489","volume":"28","author":"S Lloyd","year":"1982","unstructured":"Lloyd S (1982) Least squares quantization in PCM\u2019s. IEEE Trans Inf Theory 28:129\u2013136. https:\/\/doi.org\/10.1109\/TIT.1982.1056489","journal-title":"IEEE Trans Inf Theory"},{"key":"674_CR33","unstructured":"Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proc. of the annu. ACM-SIAM Symp. on discrete algorithms. pp 1027\u20131035"},{"key":"674_CR34","doi-asserted-by":"publisher","first-page":"191","DOI":"10.1016\/0098-3004(84)90020-7","volume":"10","author":"J Bezdek","year":"1984","unstructured":"Bezdek J, Ehrlich R, Full W (1984) FCM\u2014the Fuzzy C-means clustering-algorithm. Comput Geosci 10:191\u2013203. https:\/\/doi.org\/10.1016\/0098-3004(84)90020-7","journal-title":"Comput Geosci"},{"key":"674_CR35","doi-asserted-by":"publisher","first-page":"95","DOI":"10.1080\/01969727408546059","volume":"4","author":"JC Dunn","year":"2008","unstructured":"Dunn JC (2008) Well-separated clusters and optimal fuzzy partitions. Cybern Syst 4:95\u2013104. https:\/\/doi.org\/10.1080\/01969727408546059","journal-title":"Cybern Syst"},{"key":"674_CR36","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1080\/01969727308546046","volume":"3","author":"J Dunn","year":"1973","unstructured":"Dunn J (1973) A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Cybern Syst 3:32\u201357. https:\/\/doi.org\/10.1080\/01969727308546046","journal-title":"Cybern Syst"},{"issue":"2","key":"674_CR37","doi-asserted-by":"publisher","first-page":"391","DOI":"10.1007\/s40747-020-00137-4","volume":"6","author":"R Pal","year":"2020","unstructured":"Pal R, Yadav S, Karnwal R (2020) EEWC: energy-efficient weighted clustering method based on genetic algorithm for HWSNs. 
Complex Intell Syst 6(2):391\u2013400","journal-title":"Complex Intell Syst"},{"key":"674_CR38","doi-asserted-by":"publisher","first-page":"59","DOI":"10.4018\/IJCVIP.2017010104","volume":"7","author":"S Malakar","year":"2017","unstructured":"Malakar S, Sharma P, Singh PK et al (2017) A holistic approach for handwritten Hindi word recognition. Int J Comput Vis Image Process 7:59\u201378. https:\/\/doi.org\/10.4018\/IJCVIP.2017010104","journal-title":"Int J Comput Vis Image Process"},{"key":"674_CR39","doi-asserted-by":"publisher","first-page":"243","DOI":"10.1145\/3130348.3130374","volume":"51","author":"K J\u00e4rvelin","year":"2017","unstructured":"J\u00e4rvelin K, Kek\u00e4l\u00e4inen J (2017) IR evaluation methods for retrieving highly relevant documents. ACM SIGIR Forum 51:243\u2013250. https:\/\/doi.org\/10.1145\/3130348.3130374","journal-title":"ACM SIGIR Forum"},{"issue":"1","key":"674_CR40","doi-asserted-by":"publisher","first-page":"100","DOI":"10.1017\/S1351324909005129","volume":"16","author":"C Manning","year":"2010","unstructured":"Manning C, Raghavan P, Sch\u00fctze H (2010) Introduction to information retrieval. Nat Lang Eng 16(1):100\u2013103","journal-title":"Nat Lang Eng"},{"key":"674_CR41","doi-asserted-by":"publisher","first-page":"1464","DOI":"10.1109\/5.58325","volume":"78","author":"T Kohonen","year":"1990","unstructured":"Kohonen T (1990) The self-organizing map. Proc IEEE 78:1464\u20131480","journal-title":"Proc IEEE"},{"key":"674_CR42","doi-asserted-by":"publisher","first-page":"419","DOI":"10.1016\/0360-8352(89)90160-5","volume":"16","author":"HK Seifoddini","year":"1989","unstructured":"Seifoddini HK (1989) Single linkage versus average linkage clustering in machine cells formation applications. 
Comput Ind Eng 16:419\u2013426","journal-title":"Comput Ind Eng"},{"key":"674_CR43","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-015-0444-8","author":"G Campos","year":"2016","unstructured":"Campos G, Zimek A, Sander J et al (2016) On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Discov. https:\/\/doi.org\/10.1007\/s10618-015-0444-8","journal-title":"Data Min Knowl Discov"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00674-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00674-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00674-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,8,3]],"date-time":"2022-08-03T10:25:53Z","timestamp":1659522353000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00674-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,2,17]]},"references-count":43,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2022,8]]}},"alternative-id":["674"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00674-0","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,17]]},"assertion":[{"value":"12 August 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 January 
2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 February 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}