{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T05:12:35Z","timestamp":1768108355940,"version":"3.49.0"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"8","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2024,4]]},"abstract":"<jats:p>As the number of pre-trained machine learning (ML) models is growing exponentially, data reduction tools are not catching up. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files. This is largely due to a lack of understanding of the patterns and characteristics of these datasets, especially those relevant to data reduction and compressibility.<\/jats:p>\n          <jats:p>This paper presents the first, exhaustive analysis to date of PTM datasets on storage compressibility. Our analysis spans different types of data reduction and compression techniques, from hash-based data deduplication, data similarity detection, to dictionary-coding compression. Our analysis explores these techniques at three data granularity levels, from model layers, model chunks, to model parameters. We draw new observations that indicate that modern data reduction tools are not effective when handling PTM datasets. There is a pressing need for new compression methods that take into account PTMs' data characteristics for effective storage reduction.<\/jats:p>\n          <jats:p>Motivated by our findings, we design Elf, a simple yet effective, error-bounded, lossy floating-point compression method. Elf transforms floating-point parameters in such a way that the common exponent field of the transformed parameters can be completely eliminated to save storage space. We develop Elves, a compression framework that integrates Elf along with several other data reduction methods. 
Elves uses the most effective method to compress PTMs that exhibit different patterns. Evaluation shows that Elves achieves an overall compression ratio of 1.52\u00d7, which is 1.31\u00d7, 1.32\u00d7 and 1.29\u00d7 higher than a general-purpose compressor (zstd), an error-bounded lossy compressor (SZ3), and the uniform model quantization, respectively, with negligible model accuracy loss.<\/jats:p>","DOI":"10.14778\/3659437.3659456","type":"journal-article","created":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T16:22:27Z","timestamp":1717172547000},"page":"2036-2049","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask"],"prefix":"10.14778","volume":"17","author":[{"given":"Zhaoyuan","family":"Su","sequence":"first","affiliation":[{"name":"University of Virginia"}]},{"given":"Ammar","family":"Ahmed","sequence":"additional","affiliation":[{"name":"University of Minnesota"}]},{"given":"Zirui","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Virginia"}]},{"given":"Ali","family":"Anwar","sequence":"additional","affiliation":[{"name":"University of Minnesota"}]},{"given":"Yue","family":"Cheng","sequence":"additional","affiliation":[{"name":"University of Virginia"}]}],"member":"320","published-online":{"date-parts":[[2024,5,31]]},"reference":[{"key":"e_1_2_1_1_1","unstructured":"[n.d.]. gzip. https:\/\/www.gzip.org\/."},{"key":"e_1_2_1_2_1","unstructured":"[n.d.]. How Much Energy Do Data Centers Really Use? . https:\/\/energyinnovation.org\/2020\/03\/17\/how-much-energy-do-data-centers-really-use\/."},{"key":"e_1_2_1_3_1","unstructured":"[n.d.]. Hugging Face: The AI community building the future. https:\/\/huggingface.co\/."},{"key":"e_1_2_1_4_1","unstructured":"[n.d.]. Introducing LLaMA: A foundational 65-billion-parameter large language model. 
https:\/\/ai.meta.com\/blog\/large-language-model-llama-meta-ai\/."},{"key":"e_1_2_1_5_1","unstructured":"[n.d.]. pigz: A parallel implementation of gzip for modern multi-processor multi-core machines. https:\/\/zlib.net\/pigz\/."},{"key":"e_1_2_1_6_1","unstructured":"[n.d.]. Snappy a fast compressor\/decompressor. https:\/\/github.com\/google\/snappy."},{"key":"e_1_2_1_7_1","unstructured":"[n.d.]. TensorFlow Hub. https:\/\/www.tensorflow.org\/hub."},{"key":"e_1_2_1_8_1","unstructured":"[n.d.]. zfp. https:\/\/computing.llnl.gov\/projects\/zfp."},{"key":"e_1_2_1_9_1","unstructured":"[n.d.]. zip. https:\/\/www.iana.org\/assignments\/media-types\/application\/zip."},{"key":"e_1_2_1_10_1","unstructured":"[n.d.]. Zstandard. https:\/\/facebook.github.io\/zstd\/."},{"key":"e_1_2_1_11_1","volume-title":"Delta Compressed and Deduplicated Storage Using Stream-Informed Locality. In 4th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 12)","unstructured":"2012. Delta Compressed and Deduplicated Storage Using Stream-Informed Locality. In 4th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 12). USENIX Association, Boston, MA. https:\/\/www.usenix.org\/conference\/hotstorage12\/workshop-program\/presentation\/Shilane"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1145\/3005348"},{"key":"e_1_2_1_13_1","volume-title":"Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models. arXiv:2401.00625 [cs.LG]","author":"Bai Guangji","year":"2024","unstructured":"Guangji Bai, Zheng Chai, Chen Ling, Shiyu Wang, Jiaying Lu, Nan Zhang, Tingwei Shi, Ziyang Yu, Mengdan Zhu, Yifei Zhang, Carl Yang, Yue Cheng, and Liang Zhao. 2024. Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models. 
arXiv:2401.00625 [cs.LG]"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/3264903"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/SEQUEN.1997.666900"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/DCC.2007.44"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1177\/1094342019853336"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2020.3048260"},{"key":"e_1_2_1_19_1","volume-title":"International conference on machine learning. PMLR, 2285--2294","author":"Chen Wenlin","year":"2015","unstructured":"Wenlin Chen, James Wilson, Stephen Tyree, Kilian Weinberger, and Yixin Chen. 2015. Compressing neural networks with the hashing trick. In International conference on machine learning. PMLR, 2285--2294."},{"key":"e_1_2_1_20_1","volume-title":"Proceedings of the 27th International Conference on Neural Information Processing Systems -","volume":"1","author":"Denton Emily","year":"2014","unstructured":"Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. 2014. Exploiting Linear Structure within Convolutional Networks for Efficient Evaluation. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1 (Montreal, Canada) (NIPS'14). MIT Press, Cambridge, MA, USA, 1269--1277."},{"key":"e_1_2_1_21_1","volume-title":"19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22)","author":"Eisenman Assaf","year":"2022","unstructured":"Assaf Eisenman, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, and Murali Annavaram. 2022. {Check-N-Run}: A checkpointing system for training deep learning recommendation models. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). 929--943."},{"key":"e_1_2_1_22_1","volume-title":"An Empirical Study of the Robustness of Windows NT Applications Using Random Testing. 
In 4th USENIX Windows Systems Symposium (4th USENIX Windows Systems Symposium). USENIX Association","author":"Forrester Justin","year":"2000","unstructured":"Justin Forrester and Barton Miller. 2000. An Empirical Study of the Robustness of Windows NT Applications Using Random Testing. In 4th USENIX Windows Systems Symposium (4th USENIX Windows Systems Symposium). USENIX Association, Seattle, WA."},{"key":"e_1_2_1_23_1","volume-title":"Design Tradeoffs for Data Deduplication Performance in Backup Workloads. In 13th USENIX Conference on File and Storage Technologies (FAST 15)","author":"Fu Min","year":"2015","unstructured":"Min Fu, Dan Feng, Yu Hua, Xubin He, Zuoning Chen, Wen Xia, Yucheng Zhang, and Yujuan Tan. 2015. Design Tradeoffs for Data Deduplication Performance in Backup Workloads. In 13th USENIX Conference on File and Storage Technologies (FAST 15). USENIX Association, Santa Clara, CA, 331--344. https:\/\/www.usenix.org\/conference\/fast15\/technical-sessions\/presentation\/fu"},{"key":"e_1_2_1_24_1","volume-title":"Abhishek Vaidyanathan, Ritwik Kanodia, Chuan-Sheng Foo, Wu Min, and Lin Jie.","author":"Gupta Manas","year":"2022","unstructured":"Manas Gupta, Efe Camci, Vishandi Rudy Keneta, Abhishek Vaidyanathan, Ritwik Kanodia, Chuan-Sheng Foo, Wu Min, and Lin Jie. 2022. Is complexity required for neural network pruning? a case study on global magnitude pruning. arXiv preprint arXiv:2209.14624 (2022)."},{"key":"e_1_2_1_25_1","volume-title":"Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149","author":"Han Song","year":"2015","unstructured":"Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. 
arXiv preprint arXiv:1510.00149 (2015)."},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 28th International Conference on Neural Information Processing Systems -","volume":"1","author":"Han Song","unstructured":"Song Han, Jeff Pool, John Tran, and William J. Dally. 2015. Learning Both Weights and Connections for Efficient Neural Networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1 (Montreal, Canada) (NIPS'15). MIT Press, Cambridge, MA, USA, 1135--1143."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.3389\/frai.2021.676564"},{"key":"e_1_2_1_28_1","volume-title":"2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 245--249","author":"He Tianxing","year":"2014","unstructured":"Tianxing He, Yuchen Fan, Yanmin Qian, Tian Tan, and Kai Yu. 2014. Reshaping deep neural network for fast decoding by node-pruning. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 245--249."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.00681"},{"key":"e_1_2_1_30_1","series-title":"Lecture Notes on the Status of IEEE 754","volume-title":"IEEE Standard 754 for Binary Floating-Point Arithmetic","author":"Kahan William","year":"1996","unstructured":"William Kahan. 1996. IEEE Standard 754 for Binary Floating-Point Arithmetic. Lecture Notes on the Status of IEEE 754 (1996)."},{"key":"e_1_2_1_31_1","unstructured":"Raghuraman Krishnamoorthi. 2018. Quantizing deep convolutional networks for efficient inference: A whitepaper. 
arXiv:1806.08342 [cs.LG]"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/3589263"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.14778\/3551793.3551852"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2021.07.045"},{"key":"e_1_2_1_36_1","doi-asserted-by":"publisher","DOI":"10.1109\/BigData.2018.8622520"},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1402--1406","author":"Liao Zhu","year":"2023","unstructured":"Zhu Liao, Victor Qu\u00e9tu, Van-Tam Nguyen, and Enzo Tartaglione. 2023. Can Unstructured Pruning Reduce the Depth in Deep Neural Networks?. In Proceedings of the IEEE\/CVF International Conference on Computer Vision. 1402--1406."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.14778\/3476249.3476305"},{"key":"e_1_2_1_39_1","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12434--12443","author":"Liu Zhenhua","year":"2022","unstructured":"Zhenhua Liu, Yunhe Wang, Kai Han, Siwei Ma, and Wen Gao. 2022. Instance-aware dynamic neural network quantization. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition. 12434--12443."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1145\/1534530.1534541"},{"key":"e_1_2_1_41_1","volume-title":"9th USENIX Conference on File and Storage Technologies (FAST 11)","author":"Meyer Dutch T.","year":"2011","unstructured":"Dutch T. Meyer and William J. Bolosky. 2011. A Study of Practical Deduplication. In 9th USENIX Conference on File and Storage Technologies (FAST 11). USENIX Association, San Jose, CA. 
https:\/\/www.usenix.org\/conference\/fast11\/study-practical-deduplication"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/96267.96279"},{"key":"e_1_2_1_43_1","volume-title":"Vivekandanda Maganty, Ravi Murthy, Ajitkumar Natarajan, and Jeff Steidl.","author":"Miller Barton P","year":"1995","unstructured":"Barton P Miller, David Koski, Cjin Pheow Lee, Vivekandanda Maganty, Ravi Murthy, Ajitkumar Natarajan, and Jeff Steidl. 1995. Fuzz revisited: A re-examination of the reliability of UNIX utilities and services. Technical Report. University of Wisconsin-Madison Department of Computer Sciences."},{"key":"e_1_2_1_44_1","first-page":"11","article-title":"Loss aware post-training quantization","volume":"110","author":"Nahshan Yury","year":"2021","unstructured":"Yury Nahshan, Brian Chmiel, Chaim Baskin, Evgenii Zheltonozhskii, Ron Banner, Alex M Bronstein, and Avi Mendelson. 2021. Loss aware post-training quantization. Machine Learning 110, 11--12 (2021), 3245--3262.","journal-title":"Machine Learning"},{"key":"e_1_2_1_45_1","unstructured":"Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright Pamela Mishkin Chong Zhang Sandhini Agarwal Katarina Slama Alex Ray John Schulman Jacob Hilton Fraser Kelton Luke Miller Maddie Simens Amanda Askell Peter Welinder Paul Christiano Jan Leike and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. arXiv:2203.02155 [cs.CL]"},{"key":"e_1_2_1_46_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-030-01225-0_36"},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.14778\/2824032.2824078"},{"key":"e_1_2_1_48_1","volume-title":"Redundancy Elimination Within Large Collections of Files. In 2004 USENIX Annual Technical Conference (USENIX ATC 04)","author":"Kulkarni Purushottam","year":"2004","unstructured":"Purushottam Kulkarni, Fred Douglis, Jason LaVoie, and John M. Tracey. 2004. Redundancy Elimination Within Large Collections of Files. In 2004 USENIX Annual Technical Conference (USENIX ATC 04). USENIX Association, Boston, MA. 
https:\/\/www.usenix.org\/conference\/2004-usenix-annual-technical-conference\/redundancy-elimination-within-large-collections"},{"key":"e_1_2_1_49_1","volume-title":"Conference on file and storage technologies (FAST 02)","author":"Quinlan Sean","year":"2002","unstructured":"Sean Quinlan and Sean Dorward. 2002. Venti: A new approach to archival data storage. In Conference on file and storage technologies (FAST 02)."},{"key":"e_1_2_1_50_1","unstructured":"Michael O. Rabin. 1981. Fingerprinting by random polynomials. Note: Harvard Aiken Computational Laboratory TR-15-81."},{"key":"e_1_2_1_51_1","unstructured":"Alec Radford Karthik Narasimhan Tim Salimans and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. (2018)."},{"key":"e_1_2_1_52_1","doi-asserted-by":"publisher","DOI":"10.1145\/322344.322346"},{"key":"e_1_2_1_53_1","doi-asserted-by":"publisher","DOI":"10.1109\/DRBSD56682.2022.00011"},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDCSW53096.2021.00012"},{"key":"e_1_2_1_55_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4842-2199-0","volume-title":"Apache parquet. Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools","author":"Vohra Deepak","year":"2016","unstructured":"Deepak Vohra. 2016. Apache parquet. Practical Hadoop Ecosystem: A Definitive Guide to Hadoop-Related Frameworks and Tools (2016), 325--335."},{"key":"e_1_2_1_56_1","volume-title":"Structured pruning of large language models. arXiv preprint arXiv:1910.04732","author":"Wang Ziheng","year":"2019","unstructured":"Ziheng Wang, Jeremy Wohlwend, and Tao Lei. 2019. Structured pruning of large language models. 
arXiv preprint arXiv:1910.04732 (2019)."},{"key":"e_1_2_1_57_1","volume-title":"Workshop and Scao et al.","year":"2023","unstructured":"BigScience Workshop and Scao et al. 2023. BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. arXiv:2211.05100 [cs.CL]"},{"key":"e_1_2_1_58_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2016.2571298"},{"key":"e_1_2_1_59_1","volume-title":"2016 USENIX Annual Technical Conference (USENIX ATC 16)","author":"Xia Wen","year":"2016","unstructured":"Wen Xia, Yukun Zhou, Hong Jiang, Dan Feng, Yu Hua, Yuchong Hu, Qing Liu, and Yucheng Zhang. 2016. {FastCDC}: A fast and efficient {Content-Defined} chunking approach for data deduplication. In 2016 USENIX Annual Technical Conference (USENIX ATC 16). 101--114."},{"key":"e_1_2_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3035918.3035938"},{"key":"e_1_2_1_61_1","doi-asserted-by":"publisher","DOI":"10.1145\/3502181.3531473"},{"key":"e_1_2_1_62_1","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00071"},{"key":"e_1_2_1_63_1","volume-title":"2021 IEEE 39th International Conference on Computer Design (ICCD). IEEE, 542--550","author":"Zhang Shuyu","year":"2021","unstructured":"Shuyu Zhang, Donglei Wu, Haoyu Jin, Xiangyu Zou, Wen Xia, and Xiaojia Huang. 2021. QD-Compressor: a Quantization-based Delta Compression Framework for Deep Neural Networks. In 2021 IEEE 39th International Conference on Computer Design (ICCD). IEEE, 542--550."},{"key":"e_1_2_1_64_1","volume-title":"17th USENIX Conference on File and Storage Technologies (FAST 19)","author":"Zhang Yucheng","year":"2019","unstructured":"Yucheng Zhang, Wen Xia, Dan Feng, Hong Jiang, Yu Hua, and Qiang Wang. 2019. Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression. In 17th USENIX Conference on File and Storage Technologies (FAST 19). USENIX Association, Boston, MA, 121--128. 
https:\/\/www.usenix.org\/conference\/fast19\/presentation\/zhang"},{"key":"e_1_2_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE51399.2021.00145"},{"key":"e_1_2_1_66_1","doi-asserted-by":"publisher","DOI":"10.1145\/3369583.3392688"},{"key":"e_1_2_1_67_1","volume-title":"Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In 6th USENIX Conference on File and Storage Technologies (FAST 08)","author":"Zhu Benjamin","year":"2008","unstructured":"Benjamin Zhu, Kai Li, and Hugo Patterson. 2008. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In 6th USENIX Conference on File and Storage Technologies (FAST 08). USENIX Association, San Jose, CA. https:\/\/www.usenix.org\/conference\/fast-08\/avoiding-disk-bottleneck-data-domain-deduplication-file-system"},{"key":"e_1_2_1_68_1","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1977.1055714"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3659437.3659456","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T16:30:51Z","timestamp":1717173051000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3659437.3659456"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,4]]},"references-count":68,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2024,4]]}},"alternative-id":["10.14778\/3659437.3659456"],"URL":"https:\/\/doi.org\/10.14778\/3659437.3659456","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2024,4]]},"assertion":[{"value":"2024-05-31","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}