{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T19:55:42Z","timestamp":1772913342463,"version":"3.50.1"},"reference-count":61,"publisher":"Association for Computing Machinery (ACM)","issue":"11","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,7]]},"abstract":"<jats:p>Log data is a vital resource for capturing system events and states. With the increasing complexity and widespread adoption of modern software systems and IoT devices, the daily volume of log generation has surged to tens of petabytes, leading to significant collection and storage costs. To address this challenge, lossless log compression has emerged as an effective solution, enabling substantial resource savings without compromising log information. In this paper, we first conduct a characterization study on extensive public log datasets and identify four key observations. Building on these insights, we propose LogLite, a lightweight, plug-and-play, streaming lossless compression algorithm designed to handle both TEXT and JSON logs throughout their life cycle. LogLite requires no predefined rules or pre-training and is inherently adaptable to evolving log structures. Our evaluation shows that, compared to state-of-the-art baselines, LogLite achieves Pareto optimality in most scenarios, delivering an average improvement of up to 67.8% in compression ratio and up to 2.7X in compression speed.<\/jats:p>","DOI":"10.14778\/3749646.3749652","type":"journal-article","created":{"date-parts":[[2025,9,4]],"date-time":"2025-09-04T17:55:06Z","timestamp":1757008506000},"page":"3757-3770","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["LogLite: Lightweight Plug-and-Play Streaming Log Compression"],"prefix":"10.14778","volume":"18","author":[{"given":"Benzhao","family":"Tang","sequence":"first","affiliation":[{"name":"Cyberspace Institute of Advanced Technology of Guangzhou University &amp; Huangpu Research School of Guangzhou University"}]},{"given":"Shiyu","family":"Yang","sequence":"additional","affiliation":[{"name":"Cyberspace Institute of Advanced Technology of Guangzhou University &amp; Huangpu Research School of Guangzhou University"}]},{"given":"Zhitao","family":"Shen","sequence":"additional","affiliation":[{"name":"Ant Group"}]},{"given":"Wenjie","family":"Zhang","sequence":"additional","affiliation":[{"name":"University of New South Wales"}]},{"given":"Xuemin","family":"Lin","sequence":"additional","affiliation":[{"name":"Shanghai Jiao Tong University"}]},{"given":"Zhihong","family":"Tian","sequence":"additional","affiliation":[{"name":"Cyberspace Institute of Advanced Technology of Guangzhou University &amp; Huangpu Research School of Guangzhou University and Guangdong Key Laboratory of Industrial Control System Security"}]}],"member":"320","published-online":{"date-parts":[[2025,9,4]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"2025. LogLite. https:\/\/github.com\/benzhaotang\/LogLite"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigDataService.2018.00049"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3231935"},{"key":"e_1_2_1_4_1","unstructured":"aliyun. 2018. aliyunIoT. https:\/\/developer.aliyun.com\/article\/637406"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE.2019.00031"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.14778\/3407790.3407851"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1145\/3448016.3457565"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2463676.2465341"},{"key":"e_1_2_1_9_1","unstructured":"Yann Collet. 2011. LZ4: Fast Compression Algorithm. https:\/\/github.com\/lz4\/lz4"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Yann Collet and Murray Kucherawy. 2018. Zstandard Compression and the application\/zstd Media Type. Technical Report.","DOI":"10.17487\/RFC8478"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134015"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/PCS.2015.7170048"},{"key":"e_1_2_1_13_1","volume-title":"Understanding user behavior through log data and analysis. Ways of Knowing in HCI","author":"Dumais Susan","year":"2014","unstructured":"Susan Dumais, Robin Jeffries, Daniel M Russell, Diane Tang, and Jaime Teevan. 2014. Understanding user behavior through log data and analysis. Ways of Knowing in HCI (2014), 349\u2013372."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2015.2489657"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10844-017-0450-y"},{"key":"e_1_2_1_16_1","volume-title":"MLC: an efficient multi-level log compression method for cloud backup systems. In 2016 IEEE Trustcom\/BigDataSE\/ISPA","author":"Feng Bo","unstructured":"Bo Feng, Chentao Wu, and Jie Li. 2016. MLC: an efficient multi-level log compression method for cloud backup systems. In 2016 IEEE Trustcom\/BigDataSE\/ISPA. IEEE, 1358\u20131365."},{"key":"e_1_2_1_17_1","volume-title":"GNU gzip. GNU Operating System","author":"Mark Adler Gailly","year":"1992","unstructured":"Jean-loup Gailly and Mark Adler. 1992. GNU gzip. GNU Operating System (1992)."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1145\/3236024.3236083"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDEW.2010.5452747"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/JRPROC.1952.273898"},{"key":"e_1_2_1_21_1","unstructured":"Facebook Inc. 2012. RocksDB: A Persistent Key-Value Store for Flash and RAM Storage. https:\/\/github.com\/facebook\/rocksdb"},{"key":"e_1_2_1_22_1","volume-title":"Snappy: A Fast Compressor. https:\/\/github.com\/google\/snappy","author":"Google Inc.","year":"2011","unstructured":"Google Inc. 2011. Snappy: A Fast Compressor. https:\/\/github.com\/google\/snappy"},{"key":"e_1_2_1_23_1","unstructured":"Insanity Industries. 2021. Pareto-optimal compression. https:\/\/insanity.industries\/post\/pareto-optimal-compression\/"},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/SANER.2018.8330197"},{"key":"e_1_2_1_25_1","volume-title":"ZIP: A File Compression Standard. https:\/\/www.info-zip.org\/","author":"Katz Phil","year":"1989","unstructured":"Phil Katz. 1989. ZIP: A File Compression Standard. https:\/\/www.info-zip.org\/"},{"key":"e_1_2_1_26_1","unstructured":"Abraham Lempel and Jacob Ziv. 2001. LZMA Algorithm. http:\/\/www.7-zip.org\/"},{"key":"e_1_2_1_27_1","unstructured":"Roman Leshchinskiy. 2018. LZBench: Compression Benchmarking Tool. https:\/\/github.com\/lemire\/LZBench"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.14778\/3587136.3587149"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/TDSC.2022.3162857"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1145\/3597503.3608129"},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551852"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/CCGrid.2015.45"},{"key":"e_1_2_1_33_1","volume-title":"YCSB: A Workload Generation Framework for Cloud Databases. https:\/\/github.com\/brianfrankcooper\/YCSB","author":"Jimmy Lin","year":"2010","unstructured":"Jimmy Lin et al. 2010. YCSB: A Workload Generation Framework for Cloud Databases. https:\/\/github.com\/brianfrankcooper\/YCSB"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE60146.2024.00124"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASE.2019.00085"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3511561"},{"key":"e_1_2_1_37_1","first-page":"4739","article-title":"Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs","volume":"19","author":"Meng Weibin","year":"2019","unstructured":"Weibin Meng, Ying Liu, Yichen Zhu, Shenglin Zhang, Dan Pei, Yuqing Liu, Yihao Chen, Ruizhi Zhang, Shimin Tao, Pei Sun, et al. 2019. Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs.. In IJCAI, Vol. 19. 4739\u20134745.","journal-title":"IJCAI"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/DSN.2015.14"},{"key":"e_1_2_1_39_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055739"},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824078"},{"key":"e_1_2_1_41_1","volume-title":"15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21)","author":"Rodrigues Kirk","year":"2021","unstructured":"Kirk Rodrigues, Yu Luo, and Ding Yuan. 2021. CLP: Efficient and scalable search on compressed text logs. In 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 183\u2013198."},{"key":"e_1_2_1_42_1","volume-title":"An Empirical Study of Industrial Practitioners. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 855\u2013867","author":"Rong Guoping","year":"2023","unstructured":"Guoping Rong, Shenghui Gu, Haifeng Shen, He Zhang, and Hongyu Kuang. 2023. How Do Developers' Profiles and Experiences Influence their Logging Practices? An Empirical Study of Industrial Practitioners. In 2023 IEEE\/ACM 45th International Conference on Software Engineering (ICSE). IEEE, 855\u2013867."},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1145\/3382494.3410684"},{"key":"e_1_2_1_44_1","volume-title":"Amazon Redshift: Data Warehousing Service. https:\/\/aws.amazon.com\/redshift\/","author":"Services Amazon Web","year":"2012","unstructured":"Amazon Web Services. 2012. Amazon Redshift: Data Warehousing Service. https:\/\/aws.amazon.com\/redshift\/"},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSST.2010.5496972"},{"key":"e_1_2_1_46_1","doi-asserted-by":"crossref","unstructured":"Benzhao Tang Shiyu Yang Zhitao Shen Wenjie Zhang Xuemin Lin and Zhihong Tian. 2025. LogLite: Lightweight Plug-and-Play Streaming Log Compression. arXiv:2507.10337 [cs.DB] https:\/\/arxiv.org\/abs\/2507.10337","DOI":"10.14778\/3749646.3749652"},{"key":"e_1_2_1_47_1","volume-title":"18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)","author":"Wang Rui","year":"2024","unstructured":"Rui Wang, Devin Gibson, Kirk Rodrigues, Yu Luo, Yun Zhang, Kaibo Wang, Yupeng Fu, Ting Chen, and Ding Yuan. 2024. \u03bcSlope: High Compression and Fast Search on Semi-Structured Logs. In 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24). 529\u2013544."},{"key":"e_1_2_1_48_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035956"},{"key":"e_1_2_1_49_1","doi-asserted-by":"publisher","DOI":"10.1145\/3552326.3567484"},{"key":"e_1_2_1_50_1","volume-title":"19th USENIX Conference on File and Storage Technologies (FAST 21)","author":"Wei Junyu","year":"2021","unstructured":"Junyu Wei, Guangyan Zhang, Yang Wang, Zhiwei Liu, Zhanyang Zhu, Junchao Chen, Tingtao Sun, and Qi Zhou. 2021. On the feasibility of parser-based log compression in Large-Scale cloud systems. In 19th USENIX Conference on File and Storage Technologies (FAST 21). 249\u2013262."},{"key":"e_1_2_1_51_1","unstructured":"Wikipedia. 2024. Zstandard. https:\/\/en.wikipedia.org\/wiki\/Zstd"},{"key":"e_1_2_1_52_1","volume-title":"The Free Encyclopedia. https:\/\/en.wikipedia.org\/w\/index.php?title=Deflate&oldid=1148886022","author":"Wikipedia Wikipedia","unstructured":"Wikipedia contributors. 2023. Deflate \u2014 Wikipedia, The Free Encyclopedia. https:\/\/en.wikipedia.org\/w\/index.php?title=Deflate&oldid=1148886022"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1145\/3184407.3184416"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICSE48619.2023.00151"},{"key":"e_1_2_1_55_1","doi-asserted-by":"publisher","DOI":"10.1145\/3510003.3510180"},{"key":"e_1_2_1_56_1","doi-asserted-by":"publisher","DOI":"10.1145\/3626732"},{"key":"e_1_2_1_57_1","doi-asserted-by":"publisher","DOI":"10.14778\/3617838.3617839"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1145\/3338906.3338961"},{"key":"e_1_2_1_59_1","volume-title":"Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics. In IEEE International Symposium on Software Reliability Engineering (ISSRE).","author":"Zhu Jieming","unstructured":"Jieming Zhu, Shilin He, Pinjia He, Jinyang Liu, and Michael R. Lyu. 2023. Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics. In IEEE International Symposium on Software Reliability Engineering (ISSRE)."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1978.1055934"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3749646.3749652","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,5]],"date-time":"2025-09-05T03:15:57Z","timestamp":1757042157000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3749646.3749652"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7]]},"references-count":61,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,7]]}},"alternative-id":["10.14778\/3749646.3749652"],"URL":"https:\/\/doi.org\/10.14778\/3749646.3749652","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,7]]},"assertion":[{"value":"2025-09-04","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}