{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T02:18:33Z","timestamp":1767838713010,"version":"3.49.0"},"reference-count":13,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,8]]},"abstract":"<jats:p>Billions of data points are generated by devices equipped with thousands of sensors, leading to significant data quality issues in time series data. These errors not only complicate time series data management but also compromise the accuracy and reliability of analysis based on such data. Given the noteworthy characteristics of time series data, existing cleaning methods struggle to provide adequate repairs, and tools supporting expressive constraints for time series remain scarce. To address this, we develop Clean4TSDB, a specialized data cleaning system for time series databases. This system integrates three key modules: expressive data quality constraint discovery, violation detection, and multivariate time series repairing, forming a comprehensive \"profiling-detection-repair\" workflow. Technically, we introduce TSDD, a data quality constraint that effectively captures contextual relationships within multivariate time series, and implement an efficient algorithm for its automated mining. Leveraging both row- and column-based constraints, we propose an effective time series cleaning algorithm. From a system standpoint, Clean4TSDB is pre-configured for seamless integration with time series databases like Apache IoTDB. Using user-provided and algorithmically-mined constraints, it effectively identifies various error patterns and offers reliable cleaning solutions. Furthermore, we establish a comprehensive library of state-of-the-art time series repair algorithms to meet the diverse needs of different management scenarios.<\/jats:p>","DOI":"10.14778\/3685800.3685879","type":"journal-article","created":{"date-parts":[[2024,11,8]],"date-time":"2024-11-08T17:25:21Z","timestamp":1731086721000},"page":"4377-4380","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":12,"title":["Clean4TSDB: A Data Cleaning Tool for Time Series Databases"],"prefix":"10.14778","volume":"17","author":[{"given":"Xiaoou","family":"Ding","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yichen","family":"Song","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongzhi","family":"Wang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Donghua","family":"Yang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianmin","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2024,11,8]]},"reference":[{"key":"e_1_2_1_1_1","first-page":"78","article-title":"Data Quality for Temporal Streams","volume":"39","author":"Dasu Tamraparni","year":"2016","unstructured":"Tamraparni Dasu, Rong Duan, and Divesh Srivastava. 2016. Data Quality for Temporal Streams. IEEE Data Eng. Bull. 39, 2 (2016), 78--92.","journal-title":"IEEE Data Eng. Bull."},{"key":"e_1_2_1_2_1","volume-title":"Time Series Data Cleaning Under Expressive Constraints on Both Rows and Columns","author":"Ding Xiaoou","unstructured":"Xiaoou Ding, Genglong Li, Hongzhi Wang, Chen Wang, and Yichen Song. 2024. Time Series Data Cleaning Under Expressive Constraints on Both Rows and Columns. In ICDE. IEEE, 3682--3695."},{"key":"e_1_2_1_3_1","volume-title":"TSDDISCOVER: Discovering Data Dependency for Time Series Data. In ICDE. 3668--3681.","author":"Ding Xiaoou","year":"2024","unstructured":"Xiaoou Ding, Yingze Li, Hongzhi Wang, Chen Wang, Yida Liu, and Jianmin Wang. 2024. TSDDISCOVER: Discovering Data Dependency for Time Series Data. In ICDE. 3668--3681."},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-30678-5_54"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.14778\/3352063.3352066"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.jnca.2016.08.002"},{"key":"e_1_2_1_7_1","volume-title":"Automatic Data Repair: Are We Ready to Deploy? CoRR abs\/2310.00711","author":"Ni Wei","year":"2023","unstructured":"Wei Ni, Xiaoye Miao, Xiangyu Zhao, Yangyang Wu, and Jianwei Yin. 2023. Automatic Data Repair: Are We Ready to Deploy? CoRR abs\/2310.00711 (2023)."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538602"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/3465740"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Shaoxu Song and Aoqian Zhang. 2020. IoT Data Quality. In CIKM. ACM 3517--3518.","DOI":"10.1145\/3340531.3412173"},{"key":"e_1_2_1_11_1","volume-title":"Yu","author":"Song Shaoxu","year":"2015","unstructured":"Shaoxu Song, Aoqian Zhang, Jianmin Wang, and Philip S. Yu. 2015. SCREEN: Stream Data Cleaning under Speed Constraints. In SIGMOD. 827--841."},{"key":"e_1_2_1_12_1","doi-asserted-by":"crossref","unstructured":"Chen Wang Jialin Qiao Xiaodong Huang Shaoxu Song Haonan Hou Tian Jiang Lei Rui Jianmin Wang and Jiaguang Sun. 2023. Apache IoTDB: A Time Series Database for IoT Applications. In SIGMOD. ACM.","DOI":"10.1145\/3589775"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.14778\/3115404.3115410"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3685800.3685879","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,31]],"date-time":"2024-12-31T05:26:19Z","timestamp":1735622779000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3685800.3685879"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,8]]},"references-count":13,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2024,8]]}},"alternative-id":["10.14778\/3685800.3685879"],"URL":"https:\/\/doi.org\/10.14778\/3685800.3685879","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,8]]},"assertion":[{"value":"2024-11-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}