{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,21]],"date-time":"2026-04-21T14:58:26Z","timestamp":1776783506103,"version":"3.51.2"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2022,6]]},"abstract":"<jats:p>Not only the vast applications but also the distinct features of time series data stimulate the booming growth of time series database management systems, such as Apache IoTDB, InfluxDB, and OpenTSDB. Almost all of these systems employ columnar storage, with effective encoding of time series data. Given the distinct features of various time series data, it is not surprising that different encoding strategies may perform differently. In this study, we first summarize the features of time series data that may affect encoding performance, including scale, delta, repeat and increase. Then, we introduce the storage scheme of a typical time series database, Apache IoTDB, prescribing the limits to implementing encoding algorithms in the system. A qualitative analysis of encoding effectiveness with regard to various data features is then presented for the studied algorithms. To this end, we develop a benchmark for evaluating encoding algorithms, including a data generator covering the aforesaid data features and several real-world datasets from our industrial partners. Finally, we present an extensive experimental evaluation using the benchmark. 
Remarkably, a quantitative analysis of encoding effectiveness with regard to various data features is conducted in Apache IoTDB.<\/jats:p>","DOI":"10.14778\/3547305.3547319","type":"journal-article","created":{"date-parts":[[2022,9,7]],"date-time":"2022-09-07T16:09:53Z","timestamp":1662566993000},"page":"2148-2160","source":"Crossref","is-referenced-by-count":30,"title":["Time series data encoding for efficient storage"],"prefix":"10.14778","volume":"15","author":[{"given":"Jinzhao","family":"Xiao","sequence":"first","affiliation":[{"name":"Tsinghua University"}]},{"given":"Yuxiang","family":"Huang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Changyu","family":"Hu","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Shaoxu","family":"Song","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Xiangdong","family":"Huang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]},{"given":"Jianmin","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University"}]}],"member":"320","published-online":{"date-parts":[[2022,9,7]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"https:\/\/iotdb.apache.org\/."},{"key":"e_1_2_1_2_1","unstructured":"https:\/\/www.influxdata.com\/."},{"key":"e_1_2_1_3_1","unstructured":"http:\/\/opentsdb.net\/."},{"key":"e_1_2_1_4_1","unstructured":"https:\/\/prometheus.io\/."},{"key":"e_1_2_1_5_1","unstructured":"https:\/\/github.com\/apache\/iotdb\/tree\/research\/encoding-exp."},{"key":"e_1_2_1_6_1","unstructured":"https:\/\/github.com\/xjz17\/iotdb\/tree\/TSEncoding."},{"key":"e_1_2_1_7_1","unstructured":"
https:\/\/thulab.github.io\/iotdb-quality\/."},{"key":"e_1_2_1_8_1","unstructured":"https:\/\/iotdb.apache.org\/UserGuide\/Master\/Data-Concept\/Encoding.html."},{"key":"e_1_2_1_9_1","unstructured":"https:\/\/github.com\/thulab\/iotdb-benchmark."},{"key":"e_1_2_1_10_1","unstructured":"https:\/\/www.microsoft.com\/en-us\/download\/details.aspx."},{"key":"e_1_2_1_11_1","unstructured":"https:\/\/archive.ics.uci.edu."},{"key":"e_1_2_1_12_1","unstructured":"https:\/\/www.kaggle.com\/datasets\/eliasdabbas\/web-server-access-logs."},{"key":"e_1_2_1_13_1","unstructured":"https:\/\/www.kaggle.com\/datasets\/winmedals\/incident-event-log-dataset."},{"key":"e_1_2_1_14_1","unstructured":"https:\/\/www.kaggle.com\/datasets\/shawon10\/web-log-dataset."},{"key":"e_1_2_1_15_1","unstructured":"https:\/\/www.kaggle.com\/datasets\/."},{"key":"e_1_2_1_16_1","unstructured":"https:\/\/www.gnu.org\/software\/gzip\/."},{"key":"e_1_2_1_17_1","unstructured":"https:\/\/sxsong.github.io\/doc\/encoding.pdf."},{"key":"e_1_2_1_18_1","volume-title":"abs\/1908.05198","author":"Aamand Anders","year":"2019","unstructured":"Anders Aamand, Piotr Indyk, and Ali Vakilian. (learned) frequency estimation algorithms under zipfian distribution. 
CoRR, abs\/1908.05198, 2019."},{"key":"e_1_2_1_19_1","first-page":"179","volume-title":"2015 IEEE International Conference on Electronics, Circuits, and Systems, ICECS 2015","author":"Bartik Matej","year":"2015","unstructured":"Matej Bartik, Sven Ubik, and Pavel Kubal\u00edk. LZ4 compression algorithm on FPGA. In 2015 IEEE International Conference on Electronics, Circuits, and Systems, ICECS 2015, Cairo, Egypt, December 6-9, 2015, pages 179--182. IEEE, 2015."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/3264903"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.23919\/EUSIPCO.2017.8081677"},{"key":"e_1_2_1_22_1","volume-title":"Time series compression: a survey. CoRR, abs\/2101.08784","author":"Chiarot Giacomo","year":"2021","unstructured":"Giacomo Chiarot and Claudio Silvestri. Time series compression: a survey. 
CoRR, abs\/2101.08784, 2021."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1145\/358396.358400"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSP.2006.875394"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00778-014-0368-8"},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.1080\/0952813X.2010.505800"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1966.1053907"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1007\/s41019-022-00181-9"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0190(96)00090-7"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/2588555.2595638"},{"key":"e_1_2_1_31_1","volume-title":"Seventh Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings. www.cidrdb.org","author":"Katsis Yannis","year":"2015","unstructured":"Yannis Katsis, Yoav Freund, and Yannis Papakonstantinou. Combining databases and signal processing in plato. In Seventh Biennial Conference on Innovative Data Systems Research, CIDR 2015, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings. www.cidrdb.org, 2015."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData47090.2019.9005580"},{"key":"e_1_2_1_33_1","volume-title":"Proceedings of the 19th International Conference on Data Engineering","author":"Lazaridis Iosif","year":"2003","unstructured":"Iosif Lazaridis and Sharad Mehrotra. Capturing sensor-generated time series with quality guarantees. In Umeshwar Dayal, Krithi Ramamritham, and T. M. Vijayaraman, editors, Proceedings of the 19th International Conference on Data Engineering, March 5-8, 2003, Bangalore, India, pages 429--440. IEEE Computer Society, 2003."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2014.7004244"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/2509420.2509427"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0255(94)00108-N"},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824078"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-39071-5_33"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02574699"},{"key":"e_1_2_1_40_1","unstructured":"Bin Song, Limin Xiao, Guangjun Qin, Li Ruan, and Shida Qiu. A deduplication algorithm based on data similarity and delta encoding. 
In Hanning Yuan, Jing Geng, and Fuling Bian, editors, Geo-Spatial Knowledge and Intelligence - 4th International Conference on Geo-Informatics in Resource Management and Sustainable Ecosystem, GRMSE 2016, Hong Kong, China, November 18-20, 2016, Revised Selected Papers, Part II, volume 699 of Communications in Computer and Information Science, pages 245--253. Springer, 2016."},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/INDIN.2018.8471921"},{"key":"e_1_2_1_42_1","series-title":"CEUR Workshop Proceedings","first-page":"72","volume-title":"Proceedings of the Dateso 2010 Annual International Workshop on DAtabases, TExts, Specifications and Objects","author":"Walder Jir\u00ed","year":"2010","unstructured":"Jir\u00ed Walder, Michal Kr\u00e1tk\u00fd, and Jan Platos. Fast fibonacci encoding algorithm. In Jaroslav Pokorn\u00fd, V\u00e1clav Sn\u00e1sel, and Karel Richta, editors, Proceedings of the Dateso 2010 Annual International Workshop on DAtabases, TExts, Specifications and Objects, Stedronin-Plazy, Czech Republic, April 21-23, 2010, volume 567 of CEUR Workshop Proceedings, pages 72--83. CEUR-WS.org, 2010."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.14778\/3415478.3415504"},{"key":"e_1_2_1_44_1","doi-asserted-by":"publisher","DOI":"10.1109\/MC.1984.1659158"},{"key":"e_1_2_1_45_1","first-page":"516","volume-title":"Proceedings of the 2005 SIAM International Conference on Data Mining, SDM 2005","author":"Chi-Wing Wong Raymond","year":"2005","unstructured":"Raymond Chi-Wing Wong and Ada Wai-Chee Fu. Mining top-k itemsets over a sliding window based on zipfian distribution. In Hillol Kargupta, Jaideep Srivastava, Chandrika Kamath, and Arnold Goodman, editors, Proceedings of the 2005 SIAM International Conference on Data Mining, SDM 2005, Newport Beach, CA, USA, April 21-23, 2005, pages 516--520. SIAM, 2005."},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICEEM52022.2021.9480377"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00119"},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3547305.3547319","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,12,28]],"date-time":"2022-12-28T11:15:28Z","timestamp":1672226128000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3547305.3547319"}},"subtitle":["a comparative analysis in Apache IoTDB"],"short-title":[],"issued":{"date-parts":[[2022,6]]},"references-count":48,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2022,6]]}},"alternative-id":["10.14778\/3547305.3547319"],"URL":"https:\/\/doi.org\/10.14778\/3547305.3547319","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2022,6]]}}}