{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T05:59:01Z","timestamp":1760162341239,"version":"build-2065373602"},"reference-count":8,"publisher":"MDPI AG","issue":"9","license":[{"start":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T00:00:00Z","timestamp":1726012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>Outlier detection, or anomaly detection as it is known in the machine learning community, has gained interest in recent years, and it is commonly used when the sample size is smaller than the number of variables. In 2015, an outlier detection procedure was proposed for this high-dimensional setting, replacing the classic minimum covariance determinant estimator with the minimum diagonal product estimator. Computationally speaking, their method has two drawbacks: (a) it is not computationally efficient and does not scale up, and (b) it is not memory efficient and, in some cases, it is not possible to apply due to memory limits. We address the first issue via efficient code written in both R and C++, whereas for the second issue, we utilize the eigen decomposition and its properties. Experiments are conducted using simulated data to showcase the time improvement, while gene expression data are used to further examine some extra practicalities associated with the algorithm. 
The simulation studies yield a speed-up factor that ranges between 17 and 1800, implying a successful reduction in the estimator\u2019s computational burden.<\/jats:p>","DOI":"10.3390\/computation12090185","type":"journal-article","created":{"date-parts":[[2024,9,11]],"date-time":"2024-09-11T05:45:12Z","timestamp":1726033512000},"page":"185","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Computationally Efficient Outlier Detection for High-Dimensional Data Using the MDP Algorithm"],"prefix":"10.3390","volume":"12","author":[{"given":"Michail","family":"Tsagris","sequence":"first","affiliation":[{"name":"Department of Economics, University of Crete, Gallos Campus, 74100 Rethymnon, Greece"}]},{"given":"Manos","family":"Papadakis","sequence":"additional","affiliation":[{"name":"Independent Researcher, 71500 Heraklion, Greece"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4758-8871","authenticated-orcid":false,"given":"Abdulaziz","family":"Alenazi","sequence":"additional","affiliation":[{"name":"Department of Mathematics, College of Science, Northern Border University, Arar 73213, Saudi Arabia"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7122-4659","authenticated-orcid":false,"given":"Omar","family":"Alzeley","sequence":"additional","affiliation":[{"name":"Department of Mathematics, Al-Qunfudah University College, Umm Al-Qura University, Mecca 24382, Saudi Arabia"}]}],"member":"1968","published-online":{"date-parts":[[2024,9,11]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"73","DOI":"10.1002\/widm.2","article-title":"Robust statistics for outlier detection","volume":"1","author":"Rousseeuw","year":"2011","journal-title":"Wiley Interdiscip. Rev. Data Min. Knowl. Discov."},{"key":"ref_2","first-page":"33","article-title":"Machine Learning Techniques for Anomaly Detection: An Overview","volume":"79","author":"Omar","year":"2013","journal-title":"Int. J. Comput. 
Appl."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/biomet\/asv021","article-title":"Outlier detection for high-dimensional data","volume":"102","author":"Ro","year":"2015","journal-title":"Biometrika"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"212","DOI":"10.1080\/00401706.1999.10485670","article-title":"A fast algorithm for the minimum covariance determinant estimator","volume":"41","author":"Rousseeuw","year":"1999","journal-title":"Technometrics"},{"key":"ref_5","unstructured":"Papadakis, M., Tsagris, M., Dimitriadis, M., Fafalios, S., Tsamardinos, I., Fasiolo, M., Borboudakis, G., Burkardt, J., Zou, C., and Lakiotaki, K. (2024, September 03). Rfast: A Collection of Efficient and Extremely Fast R Functions; R Package Version 2.1.0; 2023. Available online: https:\/\/cloud.r-project.org\/web\/packages\/Rfast\/Rfast.pdf."},{"key":"ref_6","unstructured":"Strang, G. (2023). Introduction to Linear Algebra, Wellesley-Cambridge Press. [6th ed.]."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"bay011","DOI":"10.1093\/database\/bay011","article-title":"BioDataome: A collection of uniformly preprocessed and automatically annotated datasets for data-driven biology","volume":"2018","author":"Lakiotaki","year":"2018","journal-title":"Database"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"26","DOI":"10.1007\/s11222-023-10328-x","article-title":"Cauchy robust principal component analysis with applications to high-dimensional data sets","volume":"34","author":"Fayomi","year":"2024","journal-title":"Stat. 
Comput."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/9\/185\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T15:53:43Z","timestamp":1760111623000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/12\/9\/185"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,9,11]]},"references-count":8,"journal-issue":{"issue":"9","published-online":{"date-parts":[[2024,9]]}},"alternative-id":["computation12090185"],"URL":"https:\/\/doi.org\/10.3390\/computation12090185","relation":{},"ISSN":["2079-3197"],"issn-type":[{"type":"electronic","value":"2079-3197"}],"subject":[],"published":{"date-parts":[[2024,9,11]]}}}