{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,11,27]],"date-time":"2024-11-27T15:40:08Z","timestamp":1732722008183,"version":"3.28.2"},"reference-count":25,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:p>While outlier detection has been widely studied over streaming data, the query of outliers in time series databases was largely overlooked. Apache IoTDB, an open-source time series database, employs LSM-tree based storage to support intensive writing workloads, yet this storage structure unfortunately encumbers the outlier query performing. In the system, data points of a time series may be stored in multiple files with overlapping time ranges, owing to the far delayed data arrivals, which are simply discarded in streaming outlier detection. Given the overlapping time ranges, it is not able to detect outliers in each file and merge them as the results. In this paper, we focus on optimizing the efficiency of distance-based outlier query in Apache IoTDB, with the consideration of overlapping files for delayed data. We propose to utilize bucket statistics of the values stored in files. Upper and lower bounds on the neighbor counts of data points are derived in buckets and overlapping files for efficient pruning. 
Extensive experiments demonstrate the efficiency of our proposal in the LSM-tree based time series database, Apache IoTDB, compared to the existing outlier detection methods designed for data streams.<\/jats:p>","DOI":"10.14778\/3681954.3681962","type":"journal-article","created":{"date-parts":[[2024,8,30]],"date-time":"2024-08-30T16:23:36Z","timestamp":1725035016000},"page":"2778-2790","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Distance-Based Outlier Query Optimization in Apache IoTDB"],"prefix":"10.14778","volume":"17","author":[{"given":"Yunxiang","family":"Su","sequence":"first","affiliation":[{"name":"Tsinghua University"}]},{"given":"Shaoxu","family":"Song","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Xiangdong","family":"Huang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Jianmin","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]}],"member":"320","published-online":{"date-parts":[[2024,8,30]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. Apache IoTDB. https:\/\/iotdb.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"2024. Documentation. https:\/\/iotdb.apache.org\/UserGuide\/V1.2.x\/Reference\/UDF-Libraries.html#outlier."},{"key":"e_1_2_1_3_1","unstructured":"2024. Code. https:\/\/github.com\/apache\/iotdb\/tree\/research\/outlier."},{"key":"e_1_2_1_4_1","unstructured":"2024. Experimental code. https:\/\/github.com\/iotdb-lsmod\/iotdb-lsmod."},{"key":"e_1_2_1_5_1","unstructured":"2024. Supplementary. https:\/\/iotdb-lsmod.github.io\/iotdb-lsmod\/supplementary.pdf."},{"key":"e_1_2_1_6_1","unstructured":"2024. Apache Flink. https:\/\/flink.apache.org\/."},{"key":"e_1_2_1_7_1","unstructured":"2024. Microsoft StreamInsight. 
https:\/\/download.microsoft.com\/documents\/uk\/bieb\/Microsoft_CEP_Overview.pdf."},{"key":"e_1_2_1_8_1","unstructured":"2024. Apache Beam. https:\/\/beam.apache.org\/."},{"key":"e_1_2_1_9_1","unstructured":"2024. Apache Beam Documentation. https:\/\/beam.apache.org\/documentation\/basics\/."},{"key":"e_1_2_1_10_1","unstructured":"2024. Apache Spark. https:\/\/spark.apache.org\/streaming\/."},{"key":"e_1_2_1_11_1","unstructured":"2024. Spark streaming Documentation. https:\/\/spark.apache.org\/docs\/latest\/structured-streaming-programming-guide.html."},{"key":"e_1_2_1_12_1","volume-title":"Zdonik","author":"Abadi Daniel J.","year":"2005","unstructured":"Daniel J. Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur \u00c7etintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, Nesime Tatbul, Ying Xing, and Stanley B. Zdonik. 2005. The Design of the Borealis Stream Processing Engine. In CIDR. www.cidrdb.org, 277--289."},{"key":"e_1_2_1_13_1","unstructured":"Roger S. Barga, Jonathan Goldstein, Mohamed H. Ali, and Mingsheng Hong. 2007. Consistent Streaming Through Time: A Vision for Event Stream Processing. In CIDR. www.cidrdb.org, 363--374."},{"volume-title":"Impatience Is a Virtue: Revisiting Disorder in High-Performance Log Analytics","author":"Chandramouli Badrish","key":"e_1_2_1_14_1","unstructured":"Badrish Chandramouli, Jonathan Goldstein, and Yinan Li. 2018. Impatience Is a Virtue: Revisiting Disorder in High-Performance Log Analytics. In ICDE. IEEE Computer Society, 677--688."},{"volume-title":"Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree","author":"Kang Yuyuan","key":"e_1_2_1_15_1","unstructured":"Yuyuan Kang, Xiangdong Huang, Shaoxu Song, Lingzhe Zhang, Jialin Qiao, Chen Wang, Jianmin Wang, and Julian Feinauer. 2022. Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree. In ICDE. 
IEEE, 3340--3352."},{"volume-title":"Continuous monitoring of distance-based outliers over data streams","author":"Kontaki Maria","key":"e_1_2_1_16_1","unstructured":"Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, and Yannis Manolopoulos. 2011. Continuous monitoring of distance-based outliers over data streams. In ICDE. IEEE Computer Society, 135--146."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.14778\/1453856.1453890"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994526"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.14778\/3425879.3425885"},{"key":"e_1_2_1_21_1","doi-asserted-by":"crossref","unstructured":"Joris van Rooij Vincenzo Gulisano and Marina Papatriantafilou. 2020. TinTiN: Travelling in time (if necessary) to deal with out-of-order data in streaming aggregation. In DEBS. ACM 141--152.","DOI":"10.1145\/3401025.3401769"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589775"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.14778\/3342263.3342269"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115410"},{"volume-title":"Backward-Sort for Time Series in Apache IoTDB","author":"Zhang Xiaojian","key":"e_1_2_1_25_1","unstructured":"Xiaojian Zhang, Hongyin Zhang, Shaoxu Song, Xiangdong Huang, Chen Wang, and Jianmin Wang. 2023. Backward-Sort for Time Series in Apache IoTDB. In ICDE. 
IEEE, 3196--3208."}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3681954.3681962","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,11,27]],"date-time":"2024-11-27T15:15:51Z","timestamp":1732720551000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3681954.3681962"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7]]},"references-count":25,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10.14778\/3681954.3681962"],"URL":"https:\/\/doi.org\/10.14778\/3681954.3681962","relation":{},"ISSN":["2150-8097"],"issn-type":[{"type":"print","value":"2150-8097"}],"subject":[],"published":{"date-parts":[[2024,7]]},"assertion":[{"value":"2024-08-30","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}