{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T21:06:54Z","timestamp":1774991214259,"version":"3.50.1"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2021,7]]},"abstract":"<jats:p>\n            The median absolute deviation (MAD) is a statistic measuring the variability of a set of quantitative elements. It is known to be more robust to outliers than the standard deviation (SD), and thereby widely used in outlier detection. Computing the exact MAD however is costly, e.g., by calling an algorithm of finding median twice, with space cost\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            ) over\n            <jats:italic>n<\/jats:italic>\n            elements in a set. In this paper, we propose the first fully mergeable approximate MAD algorithm, OP-MAD, with one-pass scan of the data. Remarkably, by calling the proposed algorithm at most twice, namely TP-MAD, it guarantees to return an (\u03f5, 1)-accurate MAD, i.e., the error relative to the exact MAD is bounded by the desired \u03f5 or 1. The space complexity is reduced to\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>m<\/jats:italic>\n            ) while the time complexity is\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            +\n            <jats:italic>m<\/jats:italic>\n            log\n            <jats:italic>m<\/jats:italic>\n            ), where\n            <jats:italic>m<\/jats:italic>\n            is the size of the sketch used to compress data, related to the desired error bound \u03f5. To get a more accurate MAD, i.e., with smaller \u03f5, the sketch size\n            <jats:italic>m<\/jats:italic>\n            will be larger, a trade-off between effectiveness and efficiency. In practice, we often have the sketch size\n            <jats:italic>m<\/jats:italic>\n            \u226a\n            <jats:italic>n<\/jats:italic>\n            , leading to constant space cost\n            <jats:italic>O<\/jats:italic>\n            (1) and linear time cost\n            <jats:italic>O<\/jats:italic>\n            (\n            <jats:italic>n<\/jats:italic>\n            ). The extensive experiments over various datasets demonstrate the superiority of our solution, e.g., 160000\u00d7 less memory and 18x faster than the aforesaid exact method in datasets\n            <jats:italic>pareto<\/jats:italic>\n            and\n            <jats:italic>norm<\/jats:italic>\n            . Finally, we further implement and evaluate the parallelizable TP-MAD in Apache Spark, and the fully mergeable OP-MAD in Structured Streaming.\n          <\/jats:p>","DOI":"10.14778\/3476249.3476266","type":"journal-article","created":{"date-parts":[[2021,10,27]],"date-time":"2021-10-27T16:46:23Z","timestamp":1635353183000},"page":"2114-2126","source":"Crossref","is-referenced-by-count":12,"title":["Approximating median absolute deviation with bounded error"],"prefix":"10.14778","volume":"14","author":[{"given":"Zhiwei","family":"Chen","sequence":"first","affiliation":[{"name":"Tsinghua University"}]},{"given":"Shaoxu","family":"Song","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Ziheng","family":"Wei","sequence":"additional","affiliation":[{"name":"HUAWEI Cloud BU"}]},{"given":"Jingyun","family":"Fang","sequence":"additional","affiliation":[{"name":"HUAWEI Cloud BU"}]},{"given":"Jiang","family":"Long","sequence":"additional","affiliation":[{"name":"HUAWEI Cloud BU"}]}],"member":"320","published-online":{"date-parts":[[2021,10,27]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. Full version technical report. https:\/\/sxsong.github.io\/doc\/mad.pdf ([n. d.]).  [n.d.]. Full version technical report. https:\/\/sxsong.github.io\/doc\/mad.pdf ([n. d.])."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.3844\/jmssp.2008.102.107"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.3923\/jas.2009.2835.2840"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3844\/jmssp.2012.37.41"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/2500128"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2017.04.070"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.3923\/jas.2011.528.534"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190664"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3183713.3190659"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1177\/1087057107312035"},{"key":"e_1_2_1_11_1","volume-title":"Bitcoin data from","author":"Dataset Kaggle Public","year":"2009"},{"key":"e_1_2_1_12_1","unstructured":"Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http:\/\/archive.ics.uci.edu\/ml  Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http:\/\/archive.ics.uci.edu\/ml"},{"key":"e_1_2_1_13_1","doi-asserted-by":"crossref","unstructured":"Philippe Flajolet \u00c9ric Fusy Olivier Gandouet and Fr\u00e9d\u00e9ric Meunier. 2007. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science. Discrete Mathematics and Theoretical Computer Science 137--156.  Philippe Flajolet \u00c9ric Fusy Olivier Gandouet and Fr\u00e9d\u00e9ric Meunier. 2007. Hyperloglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science. Discrete Mathematics and Theoretical Computer Science 137--156.","DOI":"10.46298\/dmtcs.3545"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.14778\/3236187.3236212"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1145\/376284.375670"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1002\/0470013192.bsa384"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/GMAI.2008.7"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1016\/jjesp.2013.03.013"},{"key":"e_1_2_1_19_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1002\/cem.1182","article-title":"Determination of rank by median absolute deviation (DRMAD): a simple method for determining the number of principal factors responsible for a data matrix","volume":"23","author":"Malinowski Edmund R","year":"2009","journal-title":"Journal of Chemometrics: A Journal of the Chemometrics Society"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0250-z"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352135"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-020-00116-2"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2723730"},{"key":"e_1_2_1_24_1","unstructured":"Engineering ToolBox. 2012. Discrete Data Sets - Mean Median and Mode Values. https:\/\/www.engineeringtoolbox.com\/mean-median-modal-d_1846.html. Accessed: 2021-05-11.  Engineering ToolBox. 2012. Discrete Data Sets - Mean Median and Mode Values. https:\/\/www.engineeringtoolbox.com\/mean-median-modal-d_1846.html. Accessed: 2021-05-11."},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1081\/SAC-120003850"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3476249.3476266","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T09:58:54Z","timestamp":1672221534000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3476249.3476266"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,7]]},"references-count":25,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2021,7]]}},"alternative-id":["10.14778\/3476249.3476266"],"URL":"https:\/\/doi.org\/10.14778\/3476249.3476266","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2021,7]]}}}