{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T19:00:51Z","timestamp":1774983651666,"version":"3.50.1"},"reference-count":43,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2025,2,10]],"date-time":"2025-02-10T00:00:00Z","timestamp":1739145600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100006374","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62021002,62072265,62232005"],"award-info":[{"award-number":["62021002,62072265,62232005"]}],"id":[{"id":"10.13039\/501100006374","id-type":"DOI","asserted-by":"publisher"}]},{"name":"National Key Research and Development Plan","award":["2021YFB330050"],"award-info":[{"award-number":["2021YFB330050"]}]},{"DOI":"10.13039\/501100017582","name":"Beijing National Research Center for Information Science and Technology","doi-asserted-by":"crossref","award":["BNR2022RC01011"],"award-info":[{"award-number":["BNR2022RC01011"]}],"id":[{"id":"10.13039\/501100017582","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Alibaba Innovative Research (AIR) Program"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. ACM Manag. Data"],"published-print":{"date-parts":[[2025,2,10]]},"abstract":"<jats:p>Time series data are often clustered repeatedly across various time ranges to mine frequent subsequence patterns from different periods, which could further support downstream applications. Existing state-of-the-art (SOTA) time series clustering method, such as K-Shape, can proficiently cluster time series data referring to their shapes. However, in-database time series clustering problem has been neglected, especially in IoT scenarios with large-volume data and high efficiency demands. Most time series databases employ LSM-Tree based storage to support intensive writings, yet causing underlying data points out-of-order in timestamps. Therefore, to apply existing out-of-database methods, all data points must be fully loaded into memory and chronologically sorted. Additionally, out-of-database methods must cluster from scratch each time, making them inefficient when handling queries across different time ranges. In this work, we propose an in-database adaptation of SOTA time series clustering method K-Shape. Moreover, to solve the problem that K-Shape cannot efficiently handle long time series, we propose Medoid-Shape, as well as its in-database adaptation for further acceleration. Extensive experiments are conducted to demonstrate the higher efficiency of our proposals, with comparable effectiveness. Remarkably, all proposals have already been implemented in an open-source commodity time series database, Apache IoTDB.<\/jats:p>","DOI":"10.1145\/3709696","type":"journal-article","created":{"date-parts":[[2025,2,11]],"date-time":"2025-02-11T15:45:06Z","timestamp":1739288706000},"page":"1-26","source":"Crossref","is-referenced-by-count":1,"title":["In-Database Time Series Clustering"],"prefix":"10.1145","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0009-0004-7584-6913","authenticated-orcid":false,"given":"Yunxiang","family":"Su","sequence":"first","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0005-1131-8457","authenticated-orcid":false,"given":"Kenny Ye","family":"Liang","sequence":"additional","affiliation":[{"name":"Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9503-2755","authenticated-orcid":false,"given":"Shaoxu","family":"Song","sequence":"additional","affiliation":[{"name":"BNRist, Tsinghua University, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,2,11]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2024. Apache IoTDB. https:\/\/iotdb.apache.org."},{"key":"e_1_2_1_2_1","unstructured":"2024. Code. https:\/\/github.com\/apache\/iotdb\/tree\/research\/indb-ts-cluster\/."},{"key":"e_1_2_1_3_1","unstructured":"2024. Experimental code. https:\/\/github.com\/indb-ts-cluster\/indb-ts-cluster\/."},{"key":"e_1_2_1_4_1","unstructured":"2024. InfluxData. https:\/\/www.influxdata.com."},{"key":"e_1_2_1_5_1","unstructured":"2024. PostgreSQL. https:\/\/www.postgresql.org."},{"key":"e_1_2_1_6_1","unstructured":"2024. RocksDB. https:\/\/rocksdb.org."},{"key":"e_1_2_1_7_1","unstructured":"2024. Supplementary. https:\/\/indb-ts-cluster.github.io\/supplementary.pdf."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.is.2015.04.007"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.14778\/3565816.3565822"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.14778\/3467861.3467863"},{"key":"e_1_2_1_11_1","doi-asserted-by":"crossref","unstructured":"Moses Charikar. 2002. Similarity estimation techniques from rounding algorithms. In STOC. ACM 380--388.","DOI":"10.1145\/509907.509965"},{"key":"e_1_2_1_12_1","volume-title":"On the Kullback-Leibler information divergence of locally stationary processes. Stochastic processes and their applications 62, 1","author":"Dahlhaus Rainer","year":"1996","unstructured":"Rainer Dahlhaus. 1996. On the Kullback-Leibler information divergence of locally stationary processes. Stochastic processes and their applications 62, 1 (1996), 139--168."},{"key":"e_1_2_1_13_1","volume-title":"Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML.","author":"Dau Hoang Anh","year":"2018","unstructured":"Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana, Yanping, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, Gustavo Batista, and Hexagon-ML. 2018. The UCR Time Series Classification Archive. https:\/\/www.cs.ucr.edu\/~eamonn\/time_series_ data_2018\/."},{"key":"e_1_2_1_14_1","volume-title":"Fast Subsequence Matching in Time-Series Databases. In SIGMOD Conference. ACM Press, 419--429","author":"Faloutsos Christos","year":"1994","unstructured":"Christos Faloutsos, M. Ranganathan, and Yannis Manolopoulos. 1994. Fast Subsequence Matching in Time-Series Databases. In SIGMOD Conference. ACM Press, 419--429."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.14778\/3538598.3538607"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2737792"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCYB.2022.3190705"},{"key":"e_1_2_1_18_1","volume-title":"Bagnall","author":"Holder Christopher","year":"2023","unstructured":"Christopher Holder, David Guijo-Rubio, and Anthony J. Bagnall. 2023. Clustering Time Series with k-Medoids Based Algorithms. In AALTD@ECML\/PKDD (Lecture Notes in Computer Science, Vol. 14343). Springer, 39--55."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-47426-3_25"},{"key":"e_1_2_1_20_1","doi-asserted-by":"crossref","unstructured":"Mahmoud Abo Khamis Hung Q. Ngo XuanLong Nguyen Dan Olteanu and Maximilian Schleich. 2018. In-Database Learning with Sparse Tensors. In PODS. ACM 325--340.","DOI":"10.1145\/3196959.3196960"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDMW.2010.39"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.2307\/2685263"},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2022.07.105"},{"key":"e_1_2_1_24_1","volume-title":"Proceedings of the fifth Berkeley symposium on mathematical statistics and probability","volume":"1","author":"James","unstructured":"James MacQueen et al. 1967. Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1. Oakland, CA, USA, 281--297."},{"key":"e_1_2_1_25_1","first-page":"359","article-title":"Evolutionary Active Constrained Clustering for Obstructive Sleep Apnea Analysis. Data Sci","volume":"3","author":"Mai Son T.","year":"2018","unstructured":"Son T. Mai, Sihem Amer-Yahia, S\u00e9bastien Bailly, Jean Louis P\u00e9pin, Ahlame Douzal Chouakria, Ky T. Nguyen, and Anh-Duong Nguyen. 2018. Evolutionary Active Constrained Clustering for Obstructive Sleep Apnea Analysis. Data Sci. Eng. 3, 4 (2018), 359--378.","journal-title":"Eng."},{"key":"e_1_2_1_26_1","volume-title":"CDFShop: Exploring and Optimizing Learned Index Structures. In SIGMOD Conference. ACM, 2789--2792","author":"Marcus Ryan","year":"2020","unstructured":"Ryan Marcus, Emily Zhang, and Tim Kraska. 2020. CDFShop: Exploring and Optimizing Learned Index Structures. In SIGMOD Conference. ACM, 2789--2792."},{"key":"e_1_2_1_27_1","volume-title":"Lazier Than Lazy Greedy","author":"Mirzasoleiman Baharan","year":"1812","unstructured":"Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondr\u00e1k, and Andreas Krause. 2015. Lazier Than Lazy Greedy. In AAAI. AAAI Press, 1812--1818."},{"key":"e_1_2_1_28_1","unstructured":"Michael Mitzenmacher. 2018. A Model for Learned Bloom Filters and Optimizing by Sandwiching. (2018) 462--471."},{"key":"e_1_2_1_29_1","volume-title":"Distribution-free multiple comparisons","author":"Nemenyi Peter Bjorn","unstructured":"Peter Bjorn Nemenyi. 1963. Distribution-free multiple comparisons. Princeton University."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/s002360050048"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1145\/2723372.2737793"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3044711"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csda.2003.11.009"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1080\/01621459.1971.10482356"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1978.1163055"},{"key":"e_1_2_1_36_1","volume-title":"Clustering Sequences with Hidden Markov Models","author":"Smyth Padhraic","unstructured":"Padhraic Smyth. 1996. Clustering Sequences with Hidden Markov Models. In NIPS. MIT Press, 648--654."},{"key":"e_1_2_1_37_1","doi-asserted-by":"crossref","unstructured":"Shaoxu Song Chunping Li and Xiaoquan Zhang. 2015. Turn Waste into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data. In KDD. ACM 1115--1124.","DOI":"10.1145\/2783258.2783317"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599405"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPDS.2023.3298148"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN52387.2021.9533427"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1098\/rspa.1912.0086"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1109\/TPAMI.2022.3198411"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1007\/s11390-015-1565-7"}],"container-title":["Proceedings of the ACM on Management of Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709696","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3709696","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T18:17:23Z","timestamp":1774981043000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3709696"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,2,10]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2025,2,10]]}},"alternative-id":["10.1145\/3709696"],"URL":"https:\/\/doi.org\/10.1145\/3709696","relation":{},"ISSN":["2836-6573"],"issn-type":[{"value":"2836-6573","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,2,10]]}}}