{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T05:40:08Z","timestamp":1769146808916,"version":"3.49.0"},"publisher-location":"New York, NY, USA","reference-count":44,"publisher":"ACM","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":[],"published-print":{"date-parts":[[2026,1,26]]},"DOI":"10.1145\/3784828.3784832","type":"proceedings-article","created":{"date-parts":[[2026,1,22]],"date-time":"2026-01-22T13:19:17Z","timestamp":1769087957000},"page":"274-283","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["DeepEBC: Compressing the Pre-Trained LLMs with Error-Bounded Lossy Compression"],"prefix":"10.1145","author":[{"ORCID":"https:\/\/orcid.org\/0009-0007-0512-7675","authenticated-orcid":false,"given":"Jiaqi","family":"Xu","sequence":"first","affiliation":[{"name":"The Hong Kong Polytechnic University, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0284-1113","authenticated-orcid":false,"given":"Zhaorui","family":"Zhang","sequence":"additional","affiliation":[{"name":"Hong Kong Polytechnic University, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5203-5163","authenticated-orcid":false,"given":"Gaolin","family":"Wei","sequence":"additional","affiliation":[{"name":"Hong Kong Polytechnic University, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9935-5674","authenticated-orcid":false,"given":"Sheng","family":"Di","sequence":"additional","affiliation":[{"name":"Argonne National Laboratory, Lemont, IL, USA"}]},{"ORCID":"https:\/\/orcid.org\/0009-0009-8300-9740","authenticated-orcid":false,"given":"Benben","family":"Liu","sequence":"additional","affiliation":[{"name":"The University of Hong Kong, Hong Kong, Hong Kong"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6244-1264","authenticated-orcid":false,"given":"Xiaodong","family":"Yu","sequence":"additional","affiliation":[{"name":"Stevens Institute of Technology, Hoboken, New Jersey, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7581-8905","authenticated-orcid":false,"given":"Xiaoyi","family":"Lu","sequence":"additional","affiliation":[{"name":"University of California, Merced, Merced, USA"}]}],"member":"320","published-online":{"date-parts":[[2026,1,25]]},"reference":[{"key":"e_1_3_3_1_2_2","unstructured":"Josh Achiam Steven Adler Sandhini Agarwal Lama Ahmad Ilge Akkaya Florencia\u00a0Leoni Aleman Diogo Almeida Janko Altenschmidt Sam Altman Shyamal Anadkat et\u00a0al. 2023. Gpt-4 technical report. arXiv preprint arXiv:https:\/\/arXiv.org\/abs\/2303.08774 (2023)."},{"key":"e_1_3_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v38i10.28960"},{"key":"e_1_3_3_1_4_2","unstructured":"Peter\u00a0F. Brown John Cocke Stephen A.\u00a0Della Pietra Vincent J.\u00a0Della Pietra Fredrick Jelinek John\u00a0D. Lafferty Robert\u00a0L. Mercer and Paul\u00a0S. Roossin. 1990. A Statistical Approach to Machine Translation. Computational Linguistics (1990) 79\u201385."},{"key":"e_1_3_3_1_5_2","first-page":"1877","volume-title":"Advances in Neural Information Processing Systems","author":"Brown Tom","year":"2020","unstructured":"Tom Brown, Benjamin Mann, and etc. Ryder. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems , H.\u00a0Larochelle, M.\u00a0Ranzato, R.\u00a0Hadsell, M.F. Balcan, and H.\u00a0Lin (Eds.), Vol.\u00a033. 
      {"key": "e_1_3_3_1_6_2", "doi-asserted-by": "crossref", "unstructured": "Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, et al. 2025. A survey on error-bounded lossy compression for scientific datasets. ACM Computing Surveys 57, 11 (2025), 1–38.", "DOI": "10.1145/3733104"},
      {"key": "e_1_3_3_1_7_2", "unstructured": "Peijie Dong, Zhenheng Tang, Xiang Liu, Lujun Li, Xiaowen Chu, and Bo Li. 2025. Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression. arXiv preprint arXiv:2505.19433 (2025)."},
      {"key": "e_1_3_3_1_8_2", "unstructured": "Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. 2023. GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. arXiv:2210.17323 [cs.LG]"},
      {"key": "e_1_3_3_1_9_2", "doi-asserted-by": "publisher", "DOI": "10.1201/9781003162810-13"},
      {"key": "e_1_3_3_1_10_2", "doi-asserted-by": "crossref", "unstructured": "Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, et al. 2022. A survey on vision transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 1 (2022), 87–110.", "DOI": "10.1109/TPAMI.2022.3152247"},
      {"key": "e_1_3_3_1_11_2", "volume-title": "Advances in Neural Information Processing Systems", "author": "Han Song", "year": "2015", "unstructured": "Song Han, Jeff Pool, John Tran, and William Dally. 2015. Learning both Weights and Connections for Efficient Neural Network. In Advances in Neural Information Processing Systems, Vol. 28. Curran Associates, Inc."},
      {"key": "e_1_3_3_1_12_2", "unstructured": "Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685 [cs.CL]"},
      {"key": "e_1_3_3_1_13_2", "unstructured": "Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Jinyang Liu, Ken Raffenetti, Hui Zhou, Kai Zhao, Zizhong Chen, Franck Cappello, et al. 2023. C-Coll: Introducing error-bounded lossy compression into MPI collectives. arXiv preprint arXiv:2304.03890 (2023)."},
      {"key": "e_1_3_3_1_14_2", "doi-asserted-by": "publisher", "DOI": "10.1109/IPDPS57955.2024.00072"},
      {"key": "e_1_3_3_1_15_2", "unstructured": "Jiajun Huang, Sheng Di, Xiaodong Yu, Yujia Zhai, Zhaorui Zhang, Jinyang Liu, Xiaoyi Lu, Ken Raffenetti, Hui Zhou, Kai Zhao, et al. 2025. ZCCL: Significantly improving collective communication with error-bounded lossy compression. arXiv preprint arXiv:2502.18554 (2025)."},
      {"key": "e_1_3_3_1_16_2", "doi-asserted-by": "crossref", "unstructured": "Yafan Huang, Sheng Di, Xiaodong Yu, Guanpeng Li, and Franck Cappello. 2023. cuSZp: An Ultra-fast GPU Error-bounded Lossy Compression Framework with Optimized End-to-End Performance. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’23) (2023), 1–12.", "DOI": "10.1145/3581784.3607048"},
      {"key": "e_1_3_3_1_17_2", "unstructured": "Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, et al. 2024. DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model. arXiv preprint arXiv:2405.04434 (2024)."},
      {"key": "e_1_3_3_1_18_2", "unstructured": "Qianli Liu, Zhaorui Zhang, Xin Yao, and Benben Liu. 2025. HLoRA: Efficient federated learning system for LLM heterogeneous fine-tuning. arXiv preprint arXiv:2503.00813 (2025)."},
      {"key": "e_1_3_3_1_19_2", "unstructured": "Yuanjian Liu, Sheng Di, Jiajun Huang, Zhaorui Zhang, Kyle Chard, and Ian Foster. 2025. Ocelot: An Interactive Efficient Distributed Compression-As-a-Service Platform With Optimized Data Compression Techniques. IEEE Transactions on Parallel and Distributed Systems (2025)."},
      {"key": "e_1_3_3_1_20_2", "unstructured": "Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. 2018. Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270 (2018)."},
      {"key": "e_1_3_3_1_21_2", "doi-asserted-by": "publisher", "DOI": "10.1145/3731599.3767377"},
      {"key": "e_1_3_3_1_22_2", "unstructured": "Songkai Ma, Zhaorui Zhang, Sheng Di, Benben Liu, Xiaodong Yu, Xiaoyi Lu, and Dan Wang. 2025. MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model? arXiv preprint arXiv:2509.07727 (2025)."},
      {"key": "e_1_3_3_1_23_2", "doi-asserted-by": "crossref", "unstructured": "Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, Mostofa Patwary, Mohammad Shoeybi, Bryan Catanzaro, Jan Kautz, and Pavlo Molchanov. 2024. Compact language models via pruning and knowledge distillation. Advances in Neural Information Processing Systems 37 (2024), 41076–41102.", "DOI": "10.52202/079017-1299"},
      {"key": "e_1_3_3_1_24_2", "unstructured": "Antonio Polino, Razvan Pascanu, and Dan Alistarh. 2018. Model compression via distillation and quantization. arXiv preprint arXiv:1802.05668 (2018)."},
      {"key": "e_1_3_3_1_25_2", "unstructured": "Antonio Polino, Razvan Pascanu, and Dan Alistarh. 2018. Model compression via distillation and quantization. CoRR abs/1802.05668 (2018)."},
      {"key": "e_1_3_3_1_26_2", "unstructured": "A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. 2019. Language models are unsupervised multitask learners. 9 pages."},
      {"key": "e_1_3_3_1_27_2", "unstructured": "Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, and Douglas Orr. 2024. SparQ Attention: Bandwidth-Efficient LLM Inference. arXiv:2312.04985 [cs.LG]"},
      {"key": "e_1_3_3_1_28_2", "unstructured": "Siqi Sun, Yu Cheng, Zhe Gan, and Jingjing Liu. 2019. Patient Knowledge Distillation for BERT Model Compression. arXiv:1908.09355 [cs.CL]"},
      {"key": "e_1_3_3_1_29_2", "unstructured": "Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023)."},
      {"key": "e_1_3_3_1_30_2", "doi-asserted-by": "publisher", "DOI": "10.1145/3679240.3734604"},
      {"key": "e_1_3_3_1_31_2", "unstructured": "Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, and Shuaiwen Leon Song. 2023. Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity. arXiv preprint arXiv:2309.10285 (2023)."},
      {"key": "e_1_3_3_1_32_2", "unstructured": "Haotian Xu, Zhaorui Zhang, Sheng Di, Benben Liu, Khalid Ayed Alharthi, and Jiannong Cao. 2024. FedFa: A fully asynchronous training paradigm for federated learning. arXiv preprint arXiv:2404.11015 (2024)."},
      {"key": "e_1_3_3_1_33_2", "unstructured": "Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, and Tianyi Zhou. 2024. A survey on knowledge distillation of large language models. arXiv preprint arXiv:2402.13116 (2024)."},
      {"key": "e_1_3_3_1_34_2", "unstructured": "Chuanpeng Yang, Yao Zhu, Wang Lu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, and Yiqiang Chen. 2024. Survey on knowledge distillation for large language models: methods, evaluation, and application. ACM Transactions on Intelligent Systems and Technology (2024)."},
      {"key": "e_1_3_3_1_35_2", "doi-asserted-by": "crossref", "unstructured": "Yifei Yang, Zouying Cao, and Hai Zhao. 2024. LaCo: Large language model pruning via layer collapse. arXiv preprint arXiv:2402.11187 (2024).", "DOI": "10.18653/v1/2024.findings-emnlp.372"},
      {"key": "e_1_3_3_1_36_2", "doi-asserted-by": "publisher", "DOI": "10.1109/NaNA63151.2024.00084"},
      {"key": "e_1_3_3_1_37_2", "unstructured": "Zhijing Ye, Sheng Di, Jiamin Wang, Zhiqing Zhong, Zhaorui Zhang, and Xiaodong Yu. 2025. An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning. arXiv preprint arXiv:2511.05770 (2025)."},
      {"key": "e_1_3_3_1_38_2", "unstructured": "Ping Zhang, Zhaorui Zhang, Sheng Di, Yao Xin, and Benben Liu. 2025. CLLoRA: An approach to measure the effects of the context length for LLM fine-tuning. arXiv preprint arXiv:2502.18910 (2025)."},
      {"key": "e_1_3_3_1_39_2", "unstructured": "Tianyi Zhang, Mohsen Hariri, Shaochen Zhong, Vipin Chaudhary, Yang Sui, Xia Hu, and Anshumali Shrivastava. 2025. 70% size, 100% accuracy: Lossless LLM compression for efficient GPU inference via dynamic-length float. arXiv preprint arXiv:2504.11651 (2025)."},
      {"key": "e_1_3_3_1_40_2", "doi-asserted-by": "crossref", "unstructured": "Zhaorui Zhang, Sheng Di, Benben Liu, Zhuoran Ji, Guanpeng Li, Xiaoyi Lu, Amelie Chi Zhou, Khalid Ayed Alharthi, and Jiannong Cao. 2025. FedEFsz: Fair Cross-Silo Federated Learning System with Error-Bounded Lossy Compression. IEEE Transactions on Parallel and Distributed Systems (2025).", "DOI": "10.1109/TPDS.2025.3593896"},
      {"key": "e_1_3_3_1_41_2", "doi-asserted-by": "crossref", "unstructured": "Zhaorui Zhang, Sheng Di, Kai Zhao, Sian Jin, Dingwen Tao, Zhuoran Ji, Benben Liu, Khalid Ayed Alharthi, Jiannong Cao, and Franck Cappello. 2025. FedCSpc: A Cross-Silo Federated Learning System with Error-Bounded Lossy Parameter Compression. IEEE Transactions on Parallel and Distributed Systems (2025).", "DOI": "10.1109/TPDS.2025.3564736"},
      {"key": "e_1_3_3_1_42_2", "doi-asserted-by": "crossref", "unstructured": "Zhaorui Zhang, Zhuoran Ji, and Cho-Li Wang. 2022. Momentum-driven adaptive synchronization model for distributed DNN training on HPC clusters. J. Parallel and Distrib. Comput. 159 (2022), 65–84.", "DOI": "10.1016/j.jpdc.2021.09.007"},
      {"key": "e_1_3_3_1_43_2", "doi-asserted-by": "crossref", "unstructured": "Zhaorui Zhang and Cho-Li Wang. 2021. SaPus: Self-adaptive parameter update strategy for DNN training on Multi-GPU clusters. IEEE Transactions on Parallel and Distributed Systems 33, 7 (2021), 1569–1580.", "DOI": "10.1109/TPDS.2021.3118609"},
      {"key": "e_1_3_3_1_44_2", "unstructured": "Zhaorui Zhang and Cho-Li Wang. 2022. MIPD: An adaptive gradient sparsification framework for distributed DNNs training. IEEE Transactions on Parallel and Distributed Systems 33, 11 (2022), 3053–3066."},
      {"key": "e_1_3_3_1_45_2", "unstructured": "Michael Zhu and Suyog Gupta. 2017. To prune or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878 (2017)."}
    ],
    "event": {"name": "SCA/HPCAsiaWS 2026: SCA/HPCAsia 2026 Workshops: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops", "location": "Osaka, Japan", "acronym": "SCA/HPCAsiaWS 2026"},
    "container-title": ["Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops"],
    "original-title": [],
    "deposited": {"date-parts": [[2026, 1, 22]], "date-time": "2026-01-22T13:39:46Z", "timestamp": 1769089186000},
    "score": 1,
    "resource": {"primary": {"URL": "https://dl.acm.org/doi/10.1145/3784828.3784832"}},
    "subtitle": [],
    "short-title": [],
    "issued": {"date-parts": [[2026, 1, 25]]},
    "references-count": 44,
    "alternative-id": ["10.1145/3784828.3784832", "10.1145/3784828"],
    "URL": "https://doi.org/10.1145/3784828.3784832",
    "relation": {},
    "subject": [],
    "published": {"date-parts": [[2026, 1, 25]]},
    "assertion": [{"value": "2026-01-25", "order": 3, "name": "published", "label": "Published", "group": {"name": "publication_history", "label": "Publication History"}}]
  }
}