{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T18:40:43Z","timestamp":1777488043366,"version":"3.51.4"},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T00:00:00Z","timestamp":1622592000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T00:00:00Z","timestamp":1622592000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Qatar National Library"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Detection and removal of outliers in a dataset is a fundamental preprocessing task without which the analysis of the data can be misleading. Furthermore, the existence of anomalies in the data can heavily degrade the performance of machine learning algorithms. In order to detect the anomalies in a dataset in an unsupervised manner, some novel statistical techniques are proposed in this paper. The proposed techniques are based on statistical methods considering data compactness and other properties. The newly proposed ideas are found efficient in terms of performance, ease of implementation, and computational complexity. Furthermore, two proposed techniques presented in this paper use transformation of data to a unidimensional distance space to detect the outliers, so irrespective of the data\u2019s high dimensions, the techniques remain computationally inexpensive and feasible. 
Comprehensive performance analysis of the proposed anomaly detection schemes is presented in the paper, and the newly proposed schemes are found better than the state-of-the-art methods when tested on several benchmark datasets.<\/jats:p>","DOI":"10.1186\/s40537-021-00469-z","type":"journal-article","created":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T13:09:59Z","timestamp":1622639399000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":63,"title":["Unsupervised outlier detection in multidimensional data"],"prefix":"10.1186","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0248-7919","authenticated-orcid":false,"given":"Atiq","family":"ur Rehman","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Samir Brahim","family":"Belhaouari","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2021,6,2]]},"reference":[{"key":"469_CR1","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/j.arcontrol.2018.09.003","volume":"46","author":"J Zhu","year":"2018","unstructured":"Zhu J, Ge Z, Song Z, Gao F. Review and big data perspectives on robust data mining approaches for industrial process modeling with outliers and missing data. Annu Rev Control. 2018;46:107\u201333.","journal-title":"Annu Rev Control"},{"key":"469_CR2","volume-title":"Handbook of research methods in social and personality psychology","author":"GH McClelland","year":"2000","unstructured":"McClelland GH. Nasty data: unruly, ill-mannered observations can ruin your analysis. In: Handbook of research methods in social and personality psychology. 
Cambridge: Cambridge University Press; 2000."},{"issue":"12","key":"469_CR3","doi-asserted-by":"publisher","first-page":"3351","DOI":"10.1109\/TCYB.2015.2504404","volume":"46","author":"B Fr\u00e9nay","year":"2015","unstructured":"Fr\u00e9nay B, Verleysen M. Reinforced extreme learning machines for fast robust regression in the presence of outliers. IEEE Trans Cybern. 2015;46(12):3351\u201363.","journal-title":"IEEE Trans Cybern"},{"key":"469_CR4","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/978-981-15-9519-6_2","volume-title":"New Developments unsupervised outlier detection","author":"X Wang","year":"2021","unstructured":"Wang X, Wang X, Wilkes M, Wang X, Wang X, Wilkes M. Developments in unsupervised outlier detection research. In: New Developments unsupervised outlier detection. Springer: Singapore; 2021. p. 13\u201336."},{"issue":"6","key":"469_CR5","doi-asserted-by":"publisher","first-page":"e1280","DOI":"10.1002\/widm.1280","volume":"8","author":"A Zimek","year":"2018","unstructured":"Zimek A, Filzmoser P. There and back again: outlier detection between statistical reasoning and data mining algorithms. Wiley Interdiscip Rev Data Min Knowl Discov. 2018;8(6):e1280.","journal-title":"Wiley Interdiscip Rev Data Min Knowl Discov"},{"issue":"3","key":"469_CR6","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/1541880.1541882","volume":"41","author":"V Chandola","year":"2009","unstructured":"Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv. 2009;41(3):1\u201358.","journal-title":"ACM Comput Surv"},{"key":"469_CR7","doi-asserted-by":"crossref","unstructured":"Angelin B, Geetha A. Outlier detection using clustering techniques-K-means and K-median. In: Proceedings of the international conference on intelligent computing control system. ICICCS 2020; 2020. p. 373\u20138.","DOI":"10.1109\/ICICCS48265.2020.9120990"},{"key":"469_CR8","unstructured":"Bergman L, Hoshen Y. 
Classification-based anomaly detection for general data. arXiv; 2020."},{"key":"469_CR9","doi-asserted-by":"publisher","first-page":"2107","DOI":"10.1007\/s00521-020-05068-2","volume":"33","author":"A Wahid","year":"2020","unstructured":"Wahid A, Annavarapu CSR. NaNOD: a natural neighbour-based outlier detection algorithm. Neural Comput Appl. 2020;33:2107\u201323.","journal-title":"Neural Comput Appl"},{"key":"469_CR10","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1007\/s11633-020-1243-2","volume":"17","author":"PD Doma\u0144ski","year":"2020","unstructured":"Doma\u0144ski PD. Study on statistical outlier detection and labelling. Int J Autom Comput. 2020;17:788\u2013811.","journal-title":"Int J Autom Comput"},{"key":"469_CR11","unstructured":"Dong Y, Hopkins SB, Li J. Quantum entropy scoring for fast robust mean estimation and improved outlier detection. arXiv; 2019."},{"issue":"2","key":"469_CR12","doi-asserted-by":"publisher","first-page":"190714","DOI":"10.1098\/rsos.190714","volume":"7","author":"O Shetta","year":"2020","unstructured":"Shetta O, Niranjan M. Robust subspace methods for outlier detection in genomic data circumvents the curse of dimensionality. R Soc Open Sci. 2020;7(2):190714.","journal-title":"R Soc Open Sci"},{"key":"469_CR13","doi-asserted-by":"publisher","first-page":"103301","DOI":"10.1016\/j.engappai.2019.103301","volume":"87","author":"P Li","year":"2020","unstructured":"Li P, Niggemann O. Non-convex hull based anomaly detection in CPPS. Eng Appl Artif Intell. 2020;87:103301.","journal-title":"Eng Appl Artif Intell"},{"key":"469_CR14","first-page":"24","volume":"2495","author":"A Borghesi","year":"2019","unstructured":"Borghesi A, Bartolini A, Lombardi M, Milano M, Benini L. Anomaly detection using autoencoders in high performance computing systems. CEUR Workshop Proc. 2019;2495:24\u201332.","journal-title":"CEUR Workshop Proc"},{"key":"469_CR15","unstructured":"Knorr E, Ng R. 
A unified notion of outliers: properties and computation. In: Proceedings of the 3rd ACM international conference on knowledge discovery and data mining (KDD), Newport Beach; 1997, p. 219\u201322."},{"key":"469_CR16","unstructured":"Knorr E, Ng R. Algorithms for mining distance-based outliers in large datasets. In: Proceedings of the 24th international conference on very large data bases (VLDB), New York; 1998, p. 392\u2013403."},{"key":"469_CR17","doi-asserted-by":"crossref","unstructured":"Wu G et al. A fast kNN-based approach for time sensitive anomaly detection over data streams. In: International conference on computational science; 2019, p. 59\u201374.","DOI":"10.1007\/978-3-030-22741-8_5"},{"key":"469_CR18","doi-asserted-by":"publisher","first-page":"42749","DOI":"10.1109\/ACCESS.2020.2977114","volume":"8","author":"R Zhu","year":"2020","unstructured":"Zhu R, et al. KNN-based approximate outlier detection algorithm over IoT streaming data. IEEE Access. 2020;8:42749\u201359.","journal-title":"IEEE Access"},{"key":"469_CR19","doi-asserted-by":"crossref","unstructured":"Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas; 2000, p. 427\u201338.","DOI":"10.1145\/335191.335437"},{"issue":"2","key":"469_CR20","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1109\/TKDE.2005.31","volume":"17","author":"F Angiulli","year":"2005","unstructured":"Angiulli F, Pizzuti C. Outlier mining in large high-dimensional data sets. IEEE Trans Knowl Data Eng. 2005;17(2):203\u201315.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"469_CR21","doi-asserted-by":"crossref","unstructured":"Breunig M, Kriegel H, Ng R, Sander J. LOF: identifying density-based local outliers. In: Proceedings of the ACM international conference on management of data (SIGMOD), Dallas; 2000, p. 
93\u2013104.","DOI":"10.1145\/335191.335388"},{"key":"469_CR22","unstructured":"Tukey JW. Exploratory data analysis. Addison-Wesley Ser Behav Sci; 1977."},{"key":"469_CR23","doi-asserted-by":"publisher","first-page":"21","DOI":"10.2307\/2347808","volume":"39","author":"AC Kimber","year":"1990","unstructured":"Kimber AC. Exploratory data analysis for possibly censored data from skewed distributions. Appl Stat. 1990;39:21\u201330.","journal-title":"Appl Stat"},{"key":"469_CR24","doi-asserted-by":"publisher","first-page":"13","DOI":"10.1007\/978-3-0348-7958-3_2","volume-title":"Theory and applications of recent robust methods","author":"L Aucremanne","year":"2004","unstructured":"Aucremanne L, Brys G, Hubert M, Rousseeuw PJ, Struyf A. A study of Belgian inflation, relative prices and nominal rigidities using new robust measures of skewness and tail weight. In: Theory and applications of recent robust methods. Basel: Birkh\u00e4user; 2004. p. 13\u201325."},{"key":"469_CR25","doi-asserted-by":"publisher","first-page":"165","DOI":"10.1016\/j.csda.2003.10.012","volume":"47","author":"NC Schwertman","year":"2004","unstructured":"Schwertman NC, Owens MA, Adnan R. A simple more general boxplot method for identifying outliers. Comput Stat Data Anal. 2004;47:165\u201374.","journal-title":"Comput Stat Data Anal"},{"issue":"12","key":"469_CR26","doi-asserted-by":"publisher","first-page":"5186","DOI":"10.1016\/j.csda.2007.11.008","volume":"52","author":"M Hubert","year":"2008","unstructured":"Hubert M, Vandervieren E. An adjusted boxplot for skewed distributions. Comput Stat Data Anal. 2008;52(12):5186\u2013201.","journal-title":"Comput Stat Data Anal"},{"key":"469_CR27","doi-asserted-by":"crossref","unstructured":"Belhaouari SB, Ahmed S, Mansour S. Optimized K-means algorithm. Math Probl Eng. 2014; 2014.","DOI":"10.1155\/2014\/506480"},{"key":"469_CR28","unstructured":"N. Distribution. 
Encyclopedia.com: https:\/\/www.encyclopedia.com\/social-sciences\/applied-and-social-sciences-magazines\/distribution-normal. Gale encyclopedia of psychology."},{"key":"469_CR29","unstructured":"Casella G, Berger RL. Statistical inference, 2nd edn. Duxbury. ISBN 978-0-534-24312-8; 2001."},{"key":"469_CR30","doi-asserted-by":"publisher","first-page":"891","DOI":"10.1007\/s10618-015-0444-8","volume":"30","author":"GO Campos","year":"2016","unstructured":"Campos GO, et al. On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study. Data Min Knowl Discov. 2016;30:891\u2013927.","journal-title":"Data Min Knowl Discov"},{"key":"469_CR31","doi-asserted-by":"crossref","unstructured":"Angiulli F, Pizzuti C. Fast outlier detection in high dimensional spaces. In: Proceedings of the 6th European conference on principles of data mining and knowledge discovery (PKDD), Helsinki; 2002, p. 15\u201326.","DOI":"10.1007\/3-540-45681-3_2"},{"key":"469_CR32","doi-asserted-by":"crossref","unstructured":"Hautam\u00e4ki V, K\u00e4rkk\u00e4inen I, Fr\u00e4nti P. Outlier detection using k-nearest neighbor graph. In: Proceedings of the 17th international conference on pattern recognition (ICPR), Cambridge; 2004, p. 430\u20133.","DOI":"10.1109\/ICPR.2004.1334558"},{"issue":"1","key":"469_CR33","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1007\/s10618-012-0300-z","volume":"28","author":"E Schubert","year":"2014","unstructured":"Schubert E, Zimek A, Kriegel H. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Discov. 2014;28(1):190\u2013237.","journal-title":"Data Min Knowl Discov"},{"key":"469_CR34","doi-asserted-by":"crossref","unstructured":"Tang J, Chen Z, Fu A, Cheung D. Enhancing effectiveness of outlier detections for low density patterns. 
In: Proceedings of the 6th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Taipei; 2002, p. 535\u201348.","DOI":"10.1007\/3-540-47887-6_53"},{"key":"469_CR35","doi-asserted-by":"crossref","unstructured":"Jin W, Tung A, Han J, Wang W. Ranking outliers using symmetric neighborhood relationship. In: Proceedings of the 10th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Singapore; 2006, p. 577\u201393.","DOI":"10.1007\/11731139_68"},{"key":"469_CR36","doi-asserted-by":"crossref","unstructured":"Kriegel H, Kr\u00f6ger P, Schubert E, Zimek A. LoOP: local outlier probabilities. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM), Hong Kong; 2009, p. 1649\u201352.","DOI":"10.1145\/1645953.1646195"},{"key":"469_CR37","doi-asserted-by":"crossref","unstructured":"Zhang K, Hutter M, Jin H. A new local distance-based outlier detection approach for scattered real-world data. In: Proceedings of the 13th Pacific-Asia conference on knowledge discovery and data mining (PAKDD), Bangkok; 2009, p. 813\u201322.","DOI":"10.1007\/978-3-642-01307-2_84"},{"key":"469_CR38","doi-asserted-by":"crossref","unstructured":"Latecki L, Lazarevic A, Pokrajac D. Outlier detection with kernel density functions. In: Proceedings of the 5th international conference on machine learning and data mining in pattern recognition (MLDM), Leipzig; 2007, p. 61\u201375.","DOI":"10.1007\/978-3-540-73499-4_6"},{"key":"469_CR39","doi-asserted-by":"crossref","unstructured":"Schubert E, Zimek A, Kriegel H. Generalized outlier detection with flexible kernel density estimates. In: Proceedings of the 14th SIAM International Conference on Data Mining (SDM), Philadelphia; 2014, p. 542\u201350.","DOI":"10.1137\/1.9781611973440.63"},{"issue":"8","key":"469_CR40","doi-asserted-by":"publisher","first-page":"1517","DOI":"10.1109\/TKDE.2019.2905559","volume":"32","author":"Y Liu","year":"2020","unstructured":"Liu Y, et al. 
Generative adversarial active learning for unsupervised outlier detection. IEEE Trans Knowl Data Eng. 2020;32(8):1517\u201328.","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"469_CR41","doi-asserted-by":"crossref","unstructured":"Abe N, Zadrozny B, Langford J. Outlier detection by active learning. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining, vol. 2006; 2006, p. 504\u20139","DOI":"10.1145\/1150402.1150459"},{"key":"469_CR42","doi-asserted-by":"crossref","unstructured":"Yang X, Latecki LJ, Pokrajac D. Outlier detection with globally optimal exemplar-based GMM. In: Proceedings of the applied mathematics, society for industrial and applied mathematics\u20149th SIAM international conference on data mining 2009, vol. 1; 2009, p. 144\u201353.","DOI":"10.1137\/1.9781611972795.13"},{"key":"469_CR43","first-page":"21","volume":"136","author":"G Cohen","year":"2008","unstructured":"Cohen G, Sax H, Geissbuhler A. Novelty detection using one-class parzen density estimator. An application to surveillance of nosocomial infections. Stud Health Technol Inform. 2008;136:21\u20136.","journal-title":"Stud Health Technol Inform"},{"issue":"7","key":"469_CR44","doi-asserted-by":"publisher","first-page":"1443","DOI":"10.1162\/089976601750264965","volume":"13","author":"JC Platt","year":"2001","unstructured":"Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC, Scholkopf B. Estimating the support of a high-dimensional distribution. Neural Comput. 2001;13(7):1443\u201371.","journal-title":"Neural Comput"},{"key":"469_CR45","doi-asserted-by":"crossref","unstructured":"Kriegel H, Schubert M, Zimek A. Angle-based outlier detection in high-dimensional data. In: Proceedings of the 14th ACM international conference on knowledge discovery and data mining (SIGKDD), Las Vegas; 2008, p. 
444\u201352.","DOI":"10.1145\/1401890.1401946"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00469-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-021-00469-z\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-021-00469-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,6,2]],"date-time":"2021-06-02T13:14:41Z","timestamp":1622639681000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-021-00469-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,6,2]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["469"],"URL":"https:\/\/doi.org\/10.1186\/s40537-021-00469-z","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,6,2]]},"assertion":[{"value":"17 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 May 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 June 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"Authors declare no competing interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"80"}}