{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T03:19:11Z","timestamp":1775186351476,"version":"3.50.1"},"reference-count":32,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,5,20]],"date-time":"2024-05-20T00:00:00Z","timestamp":1716163200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,20]],"date-time":"2024-05-20T00:00:00Z","timestamp":1716163200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100006752","name":"Universidade do Porto","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006752","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Data Sci Anal"],"published-print":{"date-parts":[[2025,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Outlier detection is a widely used technique for identifying anomalous or exceptional events across various contexts. It has proven to be valuable in applications like fault detection, fraud detection, and real-time monitoring systems. Detecting outliers in real time is crucial in several industries, such as financial fraud detection and quality control in manufacturing processes. In the context of big data, the amount of data generated is enormous, and traditional batch mode methods are not practical since the entire dataset is not available. The limited computational resources further compound this issue. Boxplot is a widely used batch mode algorithm for outlier detection that involves several derivations. However, the lack of an incremental closed form for statistical calculations during boxplot construction poses considerable challenges for its application within the realm of big data. We propose an incremental\/online version of the boxplot algorithm to address these challenges. Our proposed algorithm is based on an approximation approach that involves numerical integration of the histogram and calculation of the cumulative distribution function. This approach is independent of the dataset\u2019s distribution, making it effective for all types of distributions, whether skewed or not. To assess the efficacy of the proposed algorithm, we conducted tests using simulated datasets featuring varying degrees of skewness. Additionally, we applied the algorithm to a real-world dataset concerning software fault detection, which posed a considerable challenge. The experimental results underscored the robust performance of our proposed algorithm, highlighting its efficacy comparable to batch mode methods that access the entire dataset. Our online boxplot method, leveraging dataset distribution to define whiskers, consistently achieved exceptional outlier detection results. Notably, our algorithm demonstrated computational efficiency, maintaining constant memory usage with minimal hyperparameter tuning.<\/jats:p>","DOI":"10.1007\/s41060-024-00559-0","type":"journal-article","created":{"date-parts":[[2024,5,20]],"date-time":"2024-05-20T16:01:57Z","timestamp":1716220917000},"page":"83-97","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":46,"title":["Online boxplot derived outlier detection"],"prefix":"10.1007","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2966-4113","authenticated-orcid":false,"given":"Arefeh","family":"Mazarei","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8414-5826","authenticated-orcid":false,"given":"Ricardo","family":"Sousa","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2471-2833","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Mendes-Moreira","sequence":"additional","affiliation":[]},{"given":"Slavo","family":"Molchanov","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0732-342X","authenticated-orcid":false,"given":"Hugo Miguel","family":"Ferreira","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,20]]},"reference":[{"key":"559_CR1","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1016\/j.inffus.2022.12.027","volume":"93","author":"P Zhang","year":"2023","unstructured":"Zhang, P., Li, T., Wang, G., Wang, D., Lai, P., Zhang, F.: A multi-source information fusion model for outlier detection. Inf. Fusion 93, 192\u2013208 (2023)","journal-title":"Inf. Fusion"},{"key":"559_CR2","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.inffus.2023.02.007","volume":"95","author":"Z Yuan","year":"2023","unstructured":"Yuan, Z., Chen, H., Luo, C., Peng, D.: Mfgad: Multi-fuzzy granules anomaly detection. Inf. Fusion 95, 17\u201325 (2023)","journal-title":"Inf. Fusion"},{"key":"559_CR3","doi-asserted-by":"crossref","unstructured":"Xu, H., Pang, G., Wang, Y., Wang, Y.: Deep isolation forest for anomaly detection. IEEE Trans. Knowl. Data Eng. (2023)","DOI":"10.1109\/TKDE.2023.3270293"},{"key":"559_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2022.100463","volume":"44","author":"I Souiden","year":"2022","unstructured":"Souiden, I., Omri, M.N., Brahmi, Z.: A survey of outlier detection in high dimensional data streams. Comput. Sci. Rev. 44, 100463 (2022). https:\/\/doi.org\/10.1016\/j.cosrev.2022.100463","journal-title":"Comput. Sci. Rev."},{"issue":"4","key":"559_CR5","doi-asserted-by":"publisher","first-page":"348","DOI":"10.1080\/00031305.2018.1448891","volume":"72","author":"M Walker","year":"2018","unstructured":"Walker, M., Dovoedo, Y., Chakraborti, S., Hilton, C.: An improved boxplot for univariate data. Am. Stat. 72(4), 348\u2013353 (2018). https:\/\/doi.org\/10.1080\/00031305.2018.1448891","journal-title":"Am. Stat."},{"key":"559_CR6","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1007\/s12530-010-9017-7","volume":"1","author":"K Tschumitschew","year":"2010","unstructured":"Tschumitschew, K., Klawonn, F.: Incremental quantile estimation. Evol. Syst. 1, 253\u2013264 (2010). https:\/\/doi.org\/10.1007\/s12530-010-9017-7","journal-title":"Evol. Syst."},{"issue":"12","key":"559_CR7","doi-asserted-by":"publisher","first-page":"5186","DOI":"10.1016\/j.csda.2007.11.008","volume":"52","author":"M Hubert","year":"2008","unstructured":"Hubert, M., Vandervieren, E.: An adjusted boxplot for skewed distributions. Comput. Stat. Data Anal. 52(12), 5186\u20135201 (2008). https:\/\/doi.org\/10.1016\/j.csda.2007.11.008","journal-title":"Comput. Stat. Data Anal."},{"key":"559_CR8","doi-asserted-by":"publisher","unstructured":"Aggarwal, C.C.: An Introduction to Outlier Analysis. Springer, Switzerland (2017). https:\/\/doi.org\/10.1007\/978-3-319-47578-3_1","DOI":"10.1007\/978-3-319-47578-3_1"},{"key":"559_CR9","doi-asserted-by":"publisher","unstructured":"Hawkins, D.M.: Identification of Outliers vol. 11. Springer, London (1980). https:\/\/doi.org\/10.1007\/978-94-015-3994-4","DOI":"10.1007\/978-94-015-3994-4"},{"key":"559_CR10","doi-asserted-by":"publisher","unstructured":"Grubbs, F.E.: Procedures for detecting outlying observations in samples. Technometrics 11(1), 1\u201321 (1969) https:\/\/doi.org\/10.1080\/00401706.1969.10490657","DOI":"10.1080\/00401706.1969.10490657"},{"key":"559_CR11","volume-title":"Outliers in Statistical Data","author":"V Barnett","year":"1994","unstructured":"Barnett, V., Lewis, T., et al.: Outliers in Statistical Data, vol. 3. Wiley, New York (1994)"},{"issue":"3","key":"559_CR12","first-page":"355","volume":"49","author":"Y Zhang","year":"2007","unstructured":"Zhang, Y., Meratnia, N., Havinga, P.: A taxonomy framework for unsupervised outlier detection techniques for multi-type data sets. Computer 49(3), 355\u2013363 (2007)","journal-title":"Computer"},{"key":"559_CR13","doi-asserted-by":"publisher","first-page":"47921","DOI":"10.1109\/ACCESS.2022.3172345","volume":"10","author":"T Toliopoulos","year":"2022","unstructured":"Toliopoulos, T., Gounaris, A.: Explainable distance-based outlier detection in data streams. IEEE Access 10, 47921\u201347936 (2022). https:\/\/doi.org\/10.1109\/ACCESS.2022.3172345","journal-title":"IEEE Access"},{"key":"559_CR14","doi-asserted-by":"publisher","unstructured":"Muhr, D., Affenzeller, M.: Little data is often enough for distance-based outlier detection. Procedia Comput. Sci. 200, 984\u2013992 (2022) https:\/\/doi.org\/10.1016\/j.procs.2022.01.297","DOI":"10.1016\/j.procs.2022.01.297"},{"issue":"4","key":"559_CR15","doi-asserted-by":"publisher","first-page":"1998","DOI":"10.1016\/j.eswa.2014.09.053","volume":"42","author":"CS Hemalatha","year":"2015","unstructured":"Hemalatha, C.S., Vaidehi, V., Lakshmi, R.: Minimal infrequent pattern based approach for mining outliers in data streams. Expert Syst. Appl. 42(4), 1998\u20132012 (2015). https:\/\/doi.org\/10.1016\/j.eswa.2014.09.053","journal-title":"Expert Syst. Appl."},{"issue":"1","key":"559_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/2133360.2133363","volume":"6","author":"FT Liu","year":"2012","unstructured":"Liu, F.T., Ting, K.M., Zhou, Z.-H.: Isolation-based anomaly detection. ACM Trans. Knowl. Discovery Data (TKDD) 6(1), 1\u201339 (2012). https:\/\/doi.org\/10.1145\/2133360.2133363","journal-title":"ACM Trans. Knowl. Discovery Data (TKDD)"},{"issue":"3","key":"559_CR17","doi-asserted-by":"publisher","first-page":"589","DOI":"10.1109\/TKDE.2011.261","volume":"25","author":"S Wu","year":"2011","unstructured":"Wu, S., Wang, S.: Information-theoretic outlier detection for large-scale categorical data. IEEE Trans. Knowl. Data Eng. 25(3), 589\u2013602 (2011). https:\/\/doi.org\/10.1109\/TKDE.2011.261","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"559_CR18","doi-asserted-by":"publisher","first-page":"2483","DOI":"10.1007\/s13042-018-0884-8","volume":"10","author":"F Jiang","year":"2019","unstructured":"Jiang, F., Zhao, H., Du, J., Xue, Y., Peng, Y.: Outlier detection based on approximation accuracy entropy. Int. J. Mach. Learn. Cybernet. 10, 2483\u20132499 (2019). https:\/\/doi.org\/10.1007\/s13042-018-0884-8","journal-title":"Int. J. Mach. Learn. Cybernet."},{"key":"559_CR19","doi-asserted-by":"publisher","unstructured":"Garg, S., Jain, S.: A brief survey on mass-based dissimilarity measures. In: International Conference on Innovative Computing and Communications: Proceedings of ICICC 2018, vol. 2, pp. 387\u2013395 (2019). https:\/\/doi.org\/10.1007\/978-981-13-2354-6_41 . Springer","DOI":"10.1007\/978-981-13-2354-6_41"},{"key":"559_CR20","unstructured":"Saleem, S., Aslam, M., Shaukat, M.R.: A review and empirical comparison of univariate outlier detection methods. Pak. J. Stat. 37(4) (2021)"},{"key":"559_CR21","volume-title":"Exploratory Data Analysis","author":"JWEA Tukey","year":"1977","unstructured":"Tukey, J.W.E.A.: Exploratory Data Analysis, vol. 2. Addison-Wesley, Reading, MA (1977)"},{"key":"559_CR22","volume-title":"Introduction to the Practice of Statistics","author":"DS Moore","year":"2009","unstructured":"Moore, D.S.: Introduction to the Practice of Statistics. WH Freeman and company, New York (2009)"},{"issue":"1","key":"559_CR23","doi-asserted-by":"publisher","first-page":"21","DOI":"10.2307\/2347808","volume":"39","author":"A Kimber","year":"1990","unstructured":"Kimber, A.: Exploratory data analysis for possibly censored data from skewed distributions. J. R. Stat. Soc.: Ser. C: Appl. Stat. 39(1), 21\u201330 (1990). https:\/\/doi.org\/10.2307\/2347808","journal-title":"J. R. Stat. Soc.: Ser. C: Appl. Stat."},{"issue":"4","key":"559_CR24","doi-asserted-by":"publisher","first-page":"996","DOI":"10.1198\/106186004X12632","volume":"13","author":"G Brys","year":"2004","unstructured":"Brys, G., Hubert, M., Struyf, A.: A robust measure of skewness. J. Comput. Graph. Stat. 13(4), 996\u20131017 (2004). https:\/\/doi.org\/10.1198\/106186004X12632","journal-title":"J. Comput. Graph. Stat."},{"key":"559_CR25","unstructured":"Bowley, A.: Elements of Statistics, 4 eds. Charles Scribner\u2019s Sons, New York, 220\u2013224 (1920)"},{"key":"559_CR26","doi-asserted-by":"publisher","unstructured":"Odoh, K.: Real-time anomaly detection for multivariate data streams. arXiv preprint arXiv:2209.12398 (2022) https:\/\/doi.org\/10.48550\/arXiv.2209.12398","DOI":"10.48550\/arXiv.2209.12398"},{"issue":"1","key":"559_CR27","first-page":"4945","volume":"22","author":"J Montiel","year":"2021","unstructured":"Montiel, J., Halford, M., Mastelini, S.M., Bolmier, G., Sourty, R., Vaysse, R., Zouitine, A., Gomes, H.M., Read, J., Abdessalem, T., et al.: River: machine learning for streaming data in python. J. Mach. Learn. Res. 22(1), 4945\u20134952 (2021)","journal-title":"J. Mach. Learn. Res."},{"key":"559_CR28","doi-asserted-by":"publisher","unstructured":"Shevlyakov, G., Kan, M.: Stream data preprocessing: Outlier detection based on the chebyshev inequality with applications. In: 2020 26th Conference of Open Innovations Association (FRUCT), pp. 402\u2013407 (2020). https:\/\/doi.org\/10.23919\/FRUCT48808.2020.9087459 . IEEE","DOI":"10.23919\/FRUCT48808.2020.9087459"},{"key":"559_CR29","doi-asserted-by":"publisher","unstructured":"Wang, P., Wang, H., Hart, P., Guo, X., Mahapatra, K.: Application of chebyshev\u2019s inequality in online anomaly detection driven by streaming pmu data. In: 2020 IEEE Power & Energy Society General Meeting (PESGM), pp. 1\u20135 (2020). https:\/\/doi.org\/10.1109\/PESGM41954.2020.9281553 . IEEE","DOI":"10.1109\/PESGM41954.2020.9281553"},{"key":"559_CR30","doi-asserted-by":"publisher","unstructured":"Pang, G., Cao, L., Chen, L., Liu, H.: Learning representations of ultrahigh-dimensional data for random distance-based outlier detection. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2041\u20132050 (2018). https:\/\/doi.org\/10.1145\/3219819.3220042","DOI":"10.1145\/3219819.3220042"},{"issue":"3","key":"559_CR31","doi-asserted-by":"publisher","first-page":"417","DOI":"10.1111\/j.1467-9574.1996.tb01507.x","volume":"50","author":"J Moors","year":"1996","unstructured":"Moors, J., Wagemakers, R.T.A., Coenen, V., Heuts, R., Janssens, M.: Characterizing systems of distributions by quantile measures. Stat. Neerl. 50(3), 417\u2013430 (1996). https:\/\/doi.org\/10.1111\/j.1467-9574.1996.tb01507.x","journal-title":"Stat. Neerl."},{"key":"559_CR32","unstructured":"Feller, W.: An Introduction to Probability Theory and Its Applications, Volume 2 vol. 81. John Wiley & Sons, United States (1991)"}],"container-title":["International Journal of Data Science and Analytics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-024-00559-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41060-024-00559-0\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41060-024-00559-0.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,24]],"date-time":"2025-01-24T10:52:53Z","timestamp":1737715973000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41060-024-00559-0"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,20]]},"references-count":32,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,1]]}},"alternative-id":["559"],"URL":"https:\/\/doi.org\/10.1007\/s41060-024-00559-0","relation":{},"ISSN":["2364-415X","2364-4168"],"issn-type":[{"value":"2364-415X","type":"print"},{"value":"2364-4168","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,20]]},"assertion":[{"value":"18 October 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"28 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. We hereby declare that there are no conflict of interest associated with this research work.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"The authors have no relevant financial or non-financial interests to disclose. The authors have no conflict of interest to declare that are relevant to the content of this article.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Financial or non-financial interests"}}]}}