{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T15:05:46Z","timestamp":1774451146917,"version":"3.50.1"},"reference-count":67,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2025,5,9]],"date-time":"2025-05-09T00:00:00Z","timestamp":1746748800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>\n            Neural network (NN) compression aims at reducing the model size and receives much research attention. Nevertheless, we observe that when compressing convolutional neural networks (CNNs), previous approaches may not well measure the impact of filters to loss, resulting in a significant performance degradation after compression. On the other hand, for compressing the fully connected neural networks (FCNNs), we observe that converting the weight matrix to the\n            <jats:italic>block diagonal structure<\/jats:italic>\n            would result in better compression. Therefore, for compressing CNNs, we propose a new pipeline in this article, named\n            <jats:italic>Retraining-Aware Pruning (RAP)<\/jats:italic>\n            , with a new self-distillation approach, named\n            <jats:italic>High-Level Activation-Guided Attention-Preserving Self-Distillation (HAP)<\/jats:italic>\n            and a novel filter pruning strategy, named\n            <jats:italic>Normalized Gradients and Geometric Median (NGGM)<\/jats:italic>\n            to effectively improve the accuracy and reduce the model size. Further, for reducing the model size of FCNNs, we formulate a new research problem, i.e.,\n            <jats:italic>Compression with Difference-Minimized Block Diagonal Structure (COMIS)<\/jats:italic>\n            , and propose a new algorithm,\n            <jats:italic>Memory-Efficient and Structure-Aware Compression (MESA)<\/jats:italic>\n            to effectively prune the weights into a block diagonal structure to significantly boost the compression rate. 
Extensive experiments on different models show that our approaches significantly outperform the state-of-the-art baselines in terms of compression rate, accuracy, and inference speed-up.\n          <\/jats:p>","DOI":"10.1145\/3721293","type":"journal-article","created":{"date-parts":[[2025,3,4]],"date-time":"2025-03-04T14:57:47Z","timestamp":1741100267000},"page":"1-27","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Compressing Deep Neural Networks with Goal-Specific Pruning and Self-Distillation"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-6926-9993","authenticated-orcid":false,"given":"Fa-You","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-7644-4068","authenticated-orcid":false,"given":"Yun-Jui","family":"Hsu","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-5966-8987","authenticated-orcid":false,"given":"Chia-Hsun","family":"Lu","sequence":"additional","affiliation":[{"name":"National Tsing Hua University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2216-077X","authenticated-orcid":false,"given":"Hong-Han","family":"Shuai","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9764-0455","authenticated-orcid":false,"given":"Lo-Yao","family":"Yeh","sequence":"additional","affiliation":[{"name":"Department of Information Management, National Central University, Zhongli District, Taoyuan, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0377-7945","authenticated-orcid":false,"given":"Chih-Ya","family":"Shen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2025,5,9]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"crossref","unstructured":"Yash Akhauri. 2019. HadaNets: Flexible quantization strategies for neural networks. arXiv:1905.10759. Retrieved from https:\/\/arxiv.org\/abs\/1905.10759","DOI":"10.1109\/CVPRW.2019.00078"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSS.2022.3152179"},{"key":"e_1_3_3_4_2","doi-asserted-by":"crossref","unstructured":"Fa-You Chen Yun-Jui Hsu Chih-Ya Shen and Hong-Han Shuai. 2025. Source codes: Compressing deep neural networks with goal-specific pruning and self-distillation. Retrieved from https:\/\/github.com\/kantlulu\/NetworkCompression","DOI":"10.1145\/3721293"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2018.2875911"},{"key":"e_1_3_3_6_2","unstructured":"Luke N. Darlow Elliot J. Crowley Antreas Antoniou and Amos J. Storkey. 2018. Cinic-10 is not imagenet or cifar-10. arXiv:1810.03505. Retrieved from https:\/\/arxiv.org\/abs\/1810.03505"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"e_1_3_3_8_2","volume-title":"International Conference on Learning Representations, 1\u201342.","author":"Frankle Jonathan","year":"2019","unstructured":"Jonathan Frankle and Michael Carbin. 2019. The lottery ticket hypothesis: Finding sparse, trainable neural networks. 
In International Conference on Learning Representations, 1\u201342."},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451339"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1145\/3487045"},{"key":"e_1_3_3_11_2","first-page":"1737","volume-title":"International Conference on Machine Learning","author":"Gupta Suyog","year":"2015","unstructured":"Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. Deep learning with limited numerical precision. In International Conference on Machine Learning, 1737\u20131746."},{"key":"e_1_3_3_12_2","unstructured":"Song Han Huizi Mao and William J. Dally.2015. Deep compression: Compressing deep neural networks with pruning trained quantization and huffman coding. arXiv:1510.00149. Retrieved from https:\/\/arxiv.org\/abs\/1510.00149"},{"key":"e_1_3_3_13_2","article-title":"Learning both weights and connections for efficient neural network","author":"Han Song","year":"2015","unstructured":"Song Han, Jeff Pool, John Tran, and William Dally.2015. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_14_2","first-page":"2009","volume-title":"IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"He Yang","year":"2020","unstructured":"Yang He, Yuhang Ding, Ping Liu, Linchao Zhu, Hanwang Zhang, and Yi Yang. 2020. Learning filter pruning criteria for deep convolutional neural networks acceleration. In IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 2009\u20132018."},{"key":"e_1_3_3_15_2","unstructured":"Yang He Xuanyi Dong Guoliang Kang Yanwei Fu and Yi Yang. 2018. Progressive deep neural networks acceleration via soft filter pruning. arXiv:1808.07471. Retrieved from https:\/\/arxiv.org\/abs\/1808.07471"},{"key":"e_1_3_3_16_2","unstructured":"Yang He Guoliang Kang Xuanyi Dong Yanwei Fu and Yi Yang. 2018. Soft filter pruning for accelerating deep convolutional neural networks. arXiv:1808.06866. Retrieved from https:\/\/arxiv.org\/abs\/1808.06866"},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2019.00447"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.155"},{"key":"e_1_3_3_19_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv:1503.02531. Retrieved from https:\/\/arxiv.org\/abs\/1503.02531"},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00110"},{"key":"e_1_3_3_21_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861. Retrieved from https:\/\/arxiv.org\/abs\/1704.04861"},{"key":"e_1_3_3_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3340531.3411867"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/SSIRI.2009.62"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/279310.279321"},{"key":"e_1_3_3_25_2","unstructured":"Forrest N. Iandola Song Han Matthew W. Moskewicz Khalid Ashraf William J. Dally and Kurt Keutzer. 2016. Squeezenet: Alexnet-level accuracy with 50x fewer parameters and \\( < 0.5\\) mb model size. arXiv:1602.07360. 
Retrieved from https:\/\/arxiv.org\/abs\/1602.07360"},{"key":"e_1_3_3_26_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00807"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i9.16969"},{"key":"e_1_3_3_28_2","volume-title":"International Conference on Learning Representations","author":"Kim Yong-Deok","year":"2016","unstructured":"Yong-Deok Kim, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. 2016. Compression of deep convolutional neural networks for fast and low power mobile applications. In International Conference on Learning Representations."},{"key":"e_1_3_3_29_2","unstructured":"Alex Krizhevsky and Geoffrey Hinton. 2009. Learning multiple layers of features from tiny images. Technical Report."},{"key":"e_1_3_3_30_2","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems."},{"key":"e_1_3_3_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/MPRV.2017.2940968"},{"key":"e_1_3_3_32_2","volume-title":"International Conference on Artificial Neural Networks","author":"LeCun Yann","year":"1995","unstructured":"Yann LeCun, L. D. Jackel, Leon Bottou, A. Brunot, Corinna Cortes, J. S. Denker, Harris Drucker, I. Guyon, U. A. Muller, Eduard Sackinger, et al. 1995. Comparison of learning algorithms for handwritten digit recognition. In International Conference on Artificial Neural Networks."},{"key":"e_1_3_3_33_2","unstructured":"Dongsoo Lee Se Jung Kwon Byeongwook Kim and Gu-Yeon Wei. 2019. Learning low-rank approximation for CNNs. arXiv:1905.10145. Retrieved from https:\/\/arxiv.org\/abs\/1905.10145"},{"key":"e_1_3_3_34_2","volume-title":"International Conference on Learning Representations (ICLR","author":"Li Hao","year":"2017","unstructured":"Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient convnets. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_3_35_2","first-page":"1","article-title":"Knowledge distillation with attention for deep transfer learning of convolutional networks","volume":"16","author":"Li Xingjian","year":"2021","unstructured":"Xingjian Li, Haoyi Xiong, Zeyu Chen, Jun Huan, Ji Liu, Cheng-Zhong Xu, and Dejing Dou. 2021. Knowledge distillation with attention for deep transfer learning of convolutional networks. ACM Transactions on Knowledge Discovery from Data 16, 1\u201320 (2021), 3","journal-title":"ACM Transactions on Knowledge Discovery from Data"},{"key":"e_1_3_3_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.00637"},{"key":"e_1_3_3_37_2","volume-title":"International Conference on Learning Representations","author":"Li Yige","year":"2021","unstructured":"Yige Li, Xixiang Lyu, Nodens Koren, Lingjuan Lyu, Bo Li, and Xingjun Ma.2021. Neural attention distillation: Erasing backdoor triggers from deep neural networks. In International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=9l0K4OM-oXE"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPA\/IUCC.2017.00030"},{"key":"e_1_3_3_39_2","first-page":"5328","article-title":"Compressing neural networks: Towards determining the optimal layer-wise decomposition","volume":"34","author":"Liebenwein Lucas","year":"2021","unstructured":"Lucas Liebenwein, Alaa Maalouf, Dan Feldman, and Daniela Rus. 2021. Compressing neural networks: Towards determining the optimal layer-wise decomposition. Advances in Neural Information Processing Systems 34 (2021), 5328\u20135344.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/GCCE46687.2019.9015425"},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1109\/SOCA.2017.35"},{"key":"e_1_3_3_42_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2021.3084856"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR42600.2020.00160"},{"key":"e_1_3_3_44_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.298"},{"key":"e_1_3_3_45_2","volume-title":"International Conference on Learning Representations","author":"Liu Zhuang","year":"2019","unstructured":"Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, and Trevor Darrell. 2019. Rethinking the value of network pruning. In International Conference on Learning Representations."},{"key":"e_1_3_3_46_2","first-page":"655","volume-title":"Joint European Conference on Machine Learning and Knowledge Discovery in Databases","author":"O\u2019Neill James","year":"2022","unstructured":"James O\u2019Neill, Sourav Dutta, and Haytham Assem. 2022. Self-distilled pruning of deep neural networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 655\u2013670."},{"key":"e_1_3_3_47_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i3.16351"},{"key":"e_1_3_3_48_2","volume-title":"International Conference on Learning Representations","author":"Schindler G\u00fcnther","year":"2019","unstructured":"G\u00fcnther Schindler, Wolfgang Roth, Franz Pernkopf, and Holger Fr\u00f6ning. 2019. N-Ary quantization for CNN model compression and inference acceleration. In International Conference on Learning Representations."},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2011.173"},{"key":"e_1_3_3_50_2","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2020.2980516"},{"key":"e_1_3_3_51_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2013.14"},{"key":"e_1_3_3_52_2","volume-title":"International Conference on Learning Representations","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations."},{"key":"e_1_3_3_53_2","doi-asserted-by":"crossref","unstructured":"Pravendra Singh Vinay Kumar Verma Piyush Rai and Vinay P. Namboodiri. 2019. Play and prune: Adaptive filter pruning for deep model compression. arXiv:1905.04446. Retrieved from https:\/\/arxiv.org\/abs\/1905.04446","DOI":"10.24963\/ijcai.2019\/480"},{"key":"e_1_3_3_54_2","volume-title":"International Conference on Learning Representations","author":"Song Han","year":"2016","unstructured":"Han Song, Huizi Mao, and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. 
In International Conference on Learning Representations."},{"key":"e_1_3_3_55_2","unstructured":"Lazar Supic Rawan Naous Ranko Sredojevic Aleksandra Faust and Vladimir Stojanovic. 2018. MPDCompress-matrix permutation decomposition algorithm for deep neural network compression. arXiv:1805.12085. Retrieved from https:\/\/arxiv.org\/abs\/1805.12085"},{"key":"e_1_3_3_56_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2019.00145"},{"key":"e_1_3_3_57_2","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021744"},{"key":"e_1_3_3_58_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.78"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v37i9.26244"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1145\/3530836"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1145\/3534678.3539437"},{"key":"e_1_3_3_62_2","volume-title":"International Conference on Learning Representations","author":"Yang Qing","year":"2019","unstructured":"Qing Yang, Wei Wen, Zuoguan Wang, Yiran Chen, and Hai Li. 2019. Integral pruning on activations and weights for efficient neural networks. In International Conference on Learning Representations."},{"key":"e_1_3_3_63_2","volume-title":"International Conference on Learning Representations","author":"Ye Shaokai","year":"2019","unstructured":"Shaokai Ye, Tianyun Zhang, Kaiqi Zhang, Jiayu Li, Kaidi Xu, Yunfei Yang, Fuxun Yu, Jian Tang, Makan Fardad, Sijia Liu, et al. 2019. Progressive weight pruning of deep neural networks using ADMM. In International Conference on Learning Representations."},{"key":"e_1_3_3_64_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00958"},{"key":"e_1_3_3_65_2","unstructured":"Sergey Zagoruyko and Nikos Komodakis. 2016. Wide residual networks. arXiv:1605.07146. Retrieved from https:\/\/arxiv.org\/abs\/1605.07146"},{"key":"e_1_3_3_66_2","volume-title":"International Conference on Learning Representations (ICLR","author":"Zagoruyko Sergey","year":"2017","unstructured":"Sergey Zagoruyko, and Nikos Komodakis. 2017. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. In International Conference on Learning Representations (ICLR)."},{"key":"e_1_3_3_67_2","first-page":"649","article-title":"Character-level convolutional networks for text classification","author":"Zhang Xiang","year":"2015","unstructured":"Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in Neural Information Processing Systems, 649\u2013657.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_68_2","unstructured":"Zaida Zhou Chaoran Zhuge Xinwei Guan and Wen Liu. 2020. Channel distillation: Channel-wise attention for knowledge distillation. arXiv:2006.01683. 
Retrieved from https:\/\/arxiv.org\/abs\/2006.01683"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721293","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3721293","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:39Z","timestamp":1750298259000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3721293"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,9]]},"references-count":67,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3721293"],"URL":"https:\/\/doi.org\/10.1145\/3721293","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,9]]},"assertion":[{"value":"2024-02-04","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-17","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-09","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}
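The abstract above describes, for fully connected networks, pruning the weight matrix into a block diagonal structure so that only the diagonal blocks need to be stored (the COMIS problem, whose name indicates the retained blocks should minimize the difference from the original weights). The following is a minimal, hypothetical NumPy sketch of that structural idea only; the function name, the fixed identity block assignment, and the toy layer size are illustrative assumptions, and this is not the paper's MESA algorithm, which additionally optimizes how weights are assigned to blocks.

```python
# Illustrative sketch (not the paper's MESA algorithm): keeping only the
# diagonal blocks of a fully connected layer's weight matrix, as in the
# block diagonal structure described in the abstract. Assumes the matrix
# dimensions are divisible by the chosen number of blocks.
import numpy as np

def block_diagonal_prune(weight: np.ndarray, num_blocks: int):
    """Zero out every entry outside the diagonal blocks and return the
    pruned matrix together with the list of retained dense blocks."""
    rows, cols = weight.shape
    assert rows % num_blocks == 0 and cols % num_blocks == 0
    br, bc = rows // num_blocks, cols // num_blocks

    pruned = np.zeros_like(weight)
    blocks = []
    for i in range(num_blocks):
        block = weight[i * br:(i + 1) * br, i * bc:(i + 1) * bc]
        pruned[i * br:(i + 1) * br, i * bc:(i + 1) * bc] = block
        blocks.append(block)  # only these entries need to be stored
    return pruned, blocks

# Toy usage: a 512x512 layer with 8 diagonal blocks keeps 1/8 of the weights.
w = np.random.randn(512, 512).astype(np.float32)
pruned, blocks = block_diagonal_prune(w, num_blocks=8)
kept = sum(b.size for b in blocks)
print(f"kept {kept}/{w.size} weights -> compression rate {w.size / kept:.1f}x")
```

With a fixed block assignment the compression rate equals the number of blocks; the difference-minimized formulation in the paper would instead choose which weights fall inside the blocks so that the accuracy loss from the discarded off-block entries is as small as possible.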