{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,10]],"date-time":"2026-02-10T17:39:36Z","timestamp":1770745176153,"version":"3.49.0"},"reference-count":52,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2020,2,3]],"date-time":"2020-02-03T00:00:00Z","timestamp":1580688000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2020,2,29]]},"abstract":"<jats:p>We present a novel notion of outlier, called the Concentration Free Outlier Factor, or CFOF. As a main contribution, we formalize the notion of concentration of outlier scores and theoretically prove that CFOF does not concentrate in the Euclidean space for any arbitrarily large dimensionality. To the best of our knowledge, there are no other proposals of data analysis measures related to the Euclidean distance for which theoretical evidence of immunity to the concentration effect has been provided. We determine the closed form of the distribution of CFOF scores in arbitrarily large dimensionalities and show that the CFOF score of a point depends on its squared norm standard score and on the kurtosis of the data distribution, thus providing a clear and statistically founded characterization of this notion. Moreover, we leverage this closed form to provide evidence that the definition does not suffer from the hubness problem affecting other measures in high dimensions. We prove that the number of CFOF outliers coming from each cluster is proportional to cluster size and kurtosis, a property that we call semi-locality. We leverage theoretical findings to shed light on properties of well-known outlier scores. 
Indeed, we determine that semi-locality characterizes existing reverse nearest neighbor-based outlier definitions, thus clarifying the exact nature of their observed local behavior. We also formally prove that classical distance-based and density-based outliers concentrate both for bounded and unbounded sample sizes and for fixed and variable values of the neighborhood parameter. We introduce the fast-CFOF algorithm for detecting outliers in large high-dimensional datasets. The algorithm has linear cost, supports multi-resolution analysis, and is embarrassingly parallel. Experiments highlight that the technique is able to efficiently process huge datasets and to deal even with large values of the neighborhood parameter, to avoid concentration, and to obtain excellent accuracy.<\/jats:p>","DOI":"10.1145\/3362158","type":"journal-article","created":{"date-parts":[[2020,2,3]],"date-time":"2020-02-03T14:54:43Z","timestamp":1580741683000},"page":"1-53","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["CFOF"],"prefix":"10.1145","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9860-7569","authenticated-orcid":false,"given":"Fabrizio","family":"Angiulli","sequence":"first","affiliation":[{"name":"University of Calabria, Rende, CS, Italy"}]}],"member":"320","published-online":{"date-parts":[[2020,2,3]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/373626.373638"},{"key":"e_1_2_1_2_1","volume-title":"Outlier Analysis","author":"Aggarwal Charu C."},{"key":"e_1_2_1_3_1","volume-title":"Aggarwal and Saket Sathe","author":"Charu","year":"2017"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the International Conference on Management of Data (SIGMOD\u201901)","author":"Aggarwal C. 
C."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10618-014-0365-y"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-71249-9_1"},{"key":"e_1_2_1_7_1","first-page":"1","article-title":"On the behavior of intrinsically high-dimensional spaces: Distances, direct and reverse nearest neighbors, and hubness","volume":"18","author":"Angiulli Fabrizio","year":"2018","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_2_1_8_1","first-page":"18","article-title":"Distance-based detection and prediction of outliers","volume":"2","author":"Angiulli Fabrizio","year":"2006","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1497577.1497581"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/1508857.1508864"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/3-540-45681-3_2"},{"key":"e_1_2_1_12_1","first-page":"17","article-title":"Outlier mining in large high-dimensional data sets","volume":"2","author":"Angiulli Fabrizio","year":"2005","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD\u201996)","author":"Arning A."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2007.04.012"},{"key":"e_1_2_1_15_1","unstructured":"V. Barnett and T. Lewis. 1994. Outliers in Statistical Data. 
John Wiley & Sons."},{"key":"e_1_2_1_16_1","volume-title":"Adaptive Control Processes: A Guided Tour","author":"Bellman Richard"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.5555\/645503.656271"},{"key":"e_1_2_1_18_1","volume-title":"Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD\u201900)","author":"Breunig M. M."},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","unstructured":"V. Chandola, A. Banerjee, and V. Kumar. 2009. Anomaly detection: A survey. ACM Computing Surveys 41, 3 (2009), 15:1--15:58.","DOI":"10.1145\/1541880.1541882"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2010.235"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1145\/502807.502808"},{"key":"e_1_2_1_22_1","volume-title":"Nonparametric Statistics: A Step-by-Step Approach","author":"Corder G. W.","year":"2014"},{"key":"e_1_2_1_23_1","volume-title":"Precision at n","author":"Craswell Nick"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1993.10476339"},{"key":"e_1_2_1_26_1","doi-asserted-by":"crossref","first-page":"290","DOI":"10.5486\/PMD.1959.6.3-4.12","article-title":"On random graphs I","volume":"6","author":"Erd\u00f6s P.","year":"1959","journal-title":"Publicationes Mathematicae Debrecen"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1111\/j.1751-5823.2009.00076.x"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2007.1037"},{"key":"e_1_2_1_29_1","unstructured":"J. Han and M. Kamber. 2001. Data Mining: Concepts and Techniques. 
Morgan Kaufmann San Francisco."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2004.1334558"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1023\/B:AIRE.0000045502.10941.a9"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.14778\/2809974.2809988"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201901)","author":"Jin W."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1007\/11731139_68"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2012.88"},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 24th International Conference on Very Large Data Bases (VLDB\u201998)","author":"Knorr E."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD\u201908)","author":"Kriegel H.-P."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1081870.1081891"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1137\/1.9781611972788.60"},{"key":"e_1_2_1_40_1","volume-title":"Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data 6, 1","author":"Liu F. 
T.","year":"2012"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.2307\/1427321"},{"key":"e_1_2_1_42_1","volume-title":"Proceedings 19th International Conference on Data Engineering (ICDE\u201903)","author":"Papadimitriou S."},{"key":"e_1_2_1_43_1","first-page":"169","article-title":"Skew variation, a rejoinder","volume":"4","author":"Pearson Karl","year":"1905","journal-title":"Biometrika"},{"key":"e_1_2_1_44_1","doi-asserted-by":"crossref","volume-title":"Introduction to Parallel Computing","author":"Petersen Wesley","DOI":"10.1093\/oso\/9780198515760.001.0001"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1145\/1553374.1553485"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2014.2365790"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1145\/342009.335437"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0020-0190(98)00127-6"},{"key":"e_1_2_1_49_1","volume-title":"Non-parametric Statistics for the Behavioral Sciences","author":"Siegel Sidney"},{"key":"e_1_2_1_50_1","first-page":"2009","article-title":"Dimensionality Reduction","author":"van der Maaten Laurens","year":"2009","journal-title":"A Comparative Review. 
Technical Report TiCC-TR"},{"key":"e_1_2_1_51_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.tcs.2005.09.003"},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1080\/00031305.2014.917055"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1002\/sam.11161"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3362158","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3362158","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T23:44:54Z","timestamp":1750203894000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3362158"}},"subtitle":["A Concentration Free Measure for Anomaly Detection"],"short-title":[],"issued":{"date-parts":[[2020,2,3]]},"references-count":52,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,2,29]]}},"alternative-id":["10.1145\/3362158"],"URL":"https:\/\/doi.org\/10.1145\/3362158","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,2,3]]},"assertion":[{"value":"2018-11-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2019-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-02-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}