{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T09:03:29Z","timestamp":1764061409899,"version":"3.45.0"},"reference-count":52,"publisher":"PeerJ","license":[{"start":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T00:00:00Z","timestamp":1764028800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"abstract":"<jats:p>Despite the efficacy of network sparsity in reducing the complexity of convolutional neural networks (CNNs), the performance of sparse networks often falls significantly below that of their dense counterparts. Knowledge distillation is regarded as a potent strategy for utilizing large models to augment the performance of smaller counterparts; however, its advantages for sparse networks remain substantially constrained. In this article, we find that the underlying issue is that sparse student models behave differently when processing foreground and background features, which hinders the uniform transfer of knowledge from dense models that address both feature types concurrently. Building on this insight, we introduce a novel sparsity-friendly knowledge distillation (SF-KD) method, which independently supervises the two feature types using feature decoupling to facilitate effective knowledge distillation for sparse networks. Specifically, we decouple the foreground and background features through unique pooling techniques and implement separate mean squared error (MSE) feature distillation. Furthermore, we dynamically adjust the weights of the two loss components to optimize performance. Experimental results on Canadian Institute for Advanced Research (CIFAR) datasets (including CIFAR-10 and CIFAR-100) and Mini-ImageNet benchmarks show significant performance improvements, underscoring the effectiveness of the proposed method.<\/jats:p>","DOI":"10.7717\/peerj-cs.3388","type":"journal-article","created":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T08:54:52Z","timestamp":1764060892000},"page":"e3388","source":"Crossref","is-referenced-by-count":0,"title":["Towards optimal sparse CNNs: sparsity-friendly knowledge distillation through feature decoupling"],"prefix":"10.7717","volume":"11","author":[{"given":"Weihong","family":"He","sequence":"first","affiliation":[{"name":"School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China"},{"name":"School of Electrical and Computer Engineering, Nanfang College Guangzhou, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuli","family":"Fu","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Youjun","family":"Xiang","sequence":"additional","affiliation":[{"name":"School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"4443","published-online":{"date-parts":[[2025,11,25]]},"reference":[{"key":"10.7717\/peerj-cs.3388\/ref-1","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v34i04.5746","article-title":"Online knowledge distillation with diverse peers","author":"Chen","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-2","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR52688.2022.01163","article-title":"Knowledge distillation with the reused teacher classifier","author":"Chen","year":"2022a"},{"key":"10.7717\/peerj-cs.3388\/ref-3","article-title":"GP-NAS-ensemble: a model for the NAS performance prediction","author":"Chen","year":"2022b"},{"key":"10.7717\/peerj-cs.3388\/ref-4","article-title":"Feature-map-level online adversarial knowledge distillation","author":"Chung","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-5","article-title":"Progressive skeletonization: trimming more fat from a network at initialization","author":"de Jorge","year":"2021"},{"key":"10.7717\/peerj-cs.3388\/ref-6","first-page":"248","article-title":"ImageNet: a large-scale hierarchical image database","author":"Deng","year":"2009"},{"key":"10.7717\/peerj-cs.3388\/ref-7","first-page":"6382","article-title":"Global sparse momentum SGD for pruning very deep neural networks","author":"Ding","year":"2019"},{"key":"10.7717\/peerj-cs.3388\/ref-8","doi-asserted-by":"crossref","DOI":"10.1109\/ICASSP49357.2023.10094709","article-title":"RD-NAS: enhancing one-shot supernet ranking ability via ranking distillation from zero-cost proxies","author":"Dong","year":"2023"},{"key":"10.7717\/peerj-cs.3388\/ref-9","first-page":"2943","article-title":"Rigging the lottery: making all tickets winners","author":"Evci","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-10","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1902.09574","article-title":"The state of sparsity in deep neural networks","author":"Gale","year":"2019"},{"issue":"2","key":"10.7717\/peerj-cs.3388\/ref-11","doi-asserted-by":"publisher","first-page":"601","DOI":"10.1109\/tnnls.2018.2846646","article-title":"Dendritic neuron model with effective learning algorithms for classification, approximation and prediction","volume":"30","author":"Gao","year":"2019","journal-title":"IEEE Transactions on Neural Networks and Learning Systems"},{"key":"10.7717\/peerj-cs.3388\/ref-12","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR52729.2023.01142","article-title":"Class attention transfer based knowledge distillation","author":"Guo","year":"2023"},{"key":"10.7717\/peerj-cs.3388\/ref-13","first-page":"1135","article-title":"Learning both weights and connections for efficient neural network","author":"Han","year":"2015"},{"key":"10.7717\/peerj-cs.3388\/ref-14","first-page":"5552","article-title":"Exploring unexplored tensor network decompositions for convolutional neural networks","author":"Hayashi","year":"2019"},{"key":"10.7717\/peerj-cs.3388\/ref-15","article-title":"Sparse friendly distillation using feature decoupling. Research Square","author":"He","year":"2024"},{"key":"10.7717\/peerj-cs.3388\/ref-16","first-page":"770","article-title":"Deep residual learning for image recognition","author":"He","year":"2016"},{"key":"10.7717\/peerj-cs.3388\/ref-17","first-page":"1389","article-title":"Channel pruning for accelerating very deep neural networks","author":"He","year":"2017"},{"key":"10.7717\/peerj-cs.3388\/ref-18","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1503.02531","article-title":"Distilling the knowledge in a neural network","author":"Hinton","year":"2015"},{"key":"10.7717\/peerj-cs.3388\/ref-19","doi-asserted-by":"publisher","first-page":"108025","DOI":"10.1016\/j.patcog.2021.108025","article-title":"Improving one-shot NAS with shrinking-and-expanding supernet","volume":"118","author":"Hu","year":"2021","journal-title":"Pattern Recognition"},{"key":"10.7717\/peerj-cs.3388\/ref-20","doi-asserted-by":"publisher","first-page":"1365","DOI":"10.1109\/tip.2022.3141255","article-title":"Feature map distillation of thin nets for low-resolution object recognition","volume":"31","author":"Huang","year":"2022","journal-title":"IEEE Transactions on Image Processing"},{"key":"10.7717\/peerj-cs.3388\/ref-21","first-page":"4107","article-title":"Binarized neural networks","author":"Hubara","year":"2016"},{"key":"10.7717\/peerj-cs.3388\/ref-22","article-title":"Learning multiple layers of features from tiny images","author":"Krizhevsky","year":"2009"},{"key":"10.7717\/peerj-cs.3388\/ref-23","first-page":"598","article-title":"Optimal brain damage","author":"LeCun","year":"1990"},{"key":"10.7717\/peerj-cs.3388\/ref-24","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2202.02643","article-title":"The unreasonable effectiveness of random pruning: return of the most naive baseline for sparse training","author":"Liu","year":"2022"},{"key":"10.7717\/peerj-cs.3388\/ref-25","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.13803","article-title":"NORM: knowledge distillation via n-to-one representation matching","author":"Liu","year":"2023"},{"issue":"1","key":"10.7717\/peerj-cs.3388\/ref-26","doi-asserted-by":"publisher","first-page":"202","DOI":"10.1007\/s11263-019-01227-8","article-title":"Bi-Real Net: binarizing deep network towards real-network performance","volume":"128","author":"Liu","year":"2020","journal-title":"International Journal of Computer Vision (IJCV)"},{"key":"10.7717\/peerj-cs.3388\/ref-27","first-page":"6336","article-title":"Finding trainable sparse networks through neural tangent transfer","author":"Liu","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-28","article-title":"Diversity networks: neural network compression using determinantal point processes","author":"Mariet","year":"2016"},{"issue":"1","key":"10.7717\/peerj-cs.3388\/ref-29","doi-asserted-by":"publisher","first-page":"2383","DOI":"10.1038\/s41467-018-04316-3","article-title":"Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science","volume":"9","author":"Mocanu","year":"2018","journal-title":"Nature Communications"},{"key":"10.7717\/peerj-cs.3388\/ref-30","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2019.00409","article-title":"Relational knowledge distillation","author":"Park","year":"2019"},{"key":"10.7717\/peerj-cs.3388\/ref-31","first-page":"300","article-title":"Extreme network compression via filter group approximation","author":"Peng","year":"2018"},{"key":"10.7717\/peerj-cs.3388\/ref-32","article-title":"Distilling knowledge via knowledge review","author":"Chen","year":"2021"},{"key":"10.7717\/peerj-cs.3388\/ref-33","doi-asserted-by":"crossref","DOI":"10.1609\/aaai.v36i2.20108","article-title":"Activation modulation and recalibration scheme for weakly supervised semantic segmentation","author":"Qin","year":"2022"},{"key":"10.7717\/peerj-cs.3388\/ref-34","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1412.6550","article-title":"FitNets: hints for thin deep nets","author":"Romero","year":"2014"},{"key":"10.7717\/peerj-cs.3388\/ref-35","article-title":"FitNets: hints for thin deep nets","author":"Romero","year":"2015"},{"key":"10.7717\/peerj-cs.3388\/ref-36","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2305.10769","article-title":"Catch-up distillation: you only need to train once for accelerating sampling","author":"Shao","year":"2023"},{"key":"10.7717\/peerj-cs.3388\/ref-37","doi-asserted-by":"crossref","DOI":"10.1109\/TPAMI.2021.3127492","article-title":"Distilled siamese networks for visual tracking","author":"Shen","year":"2022"},{"key":"10.7717\/peerj-cs.3388\/ref-38","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1604.03540","article-title":"Training region-based object detectors with online hard example mining","author":"Shrivastava","year":"2016"},{"key":"10.7717\/peerj-cs.3388\/ref-39","article-title":"Filter distillation for network compression","author":"Suau","year":"2019"},{"key":"10.7717\/peerj-cs.3388\/ref-40","article-title":"Pruning neural networks without any data by iteratively conserving synaptic flow","author":"Tanaka","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-41","article-title":"Contrastive representation distillation","author":"Tian","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-42","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2006.00896","article-title":"Pruning via iterative ranking of sensitivity statistics","author":"Verdenius","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-43","article-title":"Picking winning tickets before training by preserving gradient flow","author":"Wang","year":"2020"},{"key":"10.7717\/peerj-cs.3388\/ref-44","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2311.14337","article-title":"TVT: training-free vision transformer search on tiny datasets","author":"Wei","year":"2023"},{"key":"10.7717\/peerj-cs.3388\/ref-45","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2209.07738","article-title":"DMFormer: closing the gap between CNN and vision transformers","author":"Wei","year":"2022"},{"key":"10.7717\/peerj-cs.3388\/ref-46","article-title":"Training convolutional neural networks with cheap convolutions and online distillation","author":"Xie","year":"2019"},{"issue":"6","key":"10.7717\/peerj-cs.3388\/ref-47","doi-asserted-by":"publisher","first-page":"4188","DOI":"10.1109\/tpami.2024.3354928","article-title":"Learning from human educational wisdom: a student-centered knowledge distillation method","volume":"46","author":"Yang","year":"2024","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"10.7717\/peerj-cs.3388\/ref-48","doi-asserted-by":"crossref","DOI":"10.1109\/CVPR.2017.754","article-title":"A gift from knowledge distillation: fast optimization, network minimization and transfer learning","author":"Yim","year":"2017"},{"key":"10.7717\/peerj-cs.3388\/ref-49","article-title":"Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer","author":"Zagoruyko","year":"2017"},{"issue":"2","key":"10.7717\/peerj-cs.3388\/ref-50","doi-asserted-by":"publisher","first-page":"487","DOI":"10.1109\/jas.2023.124029","article-title":"A novel tensor decomposition-based efficient detector for low-altitude aerial objects with knowledge distillation scheme","volume":"11","author":"Zeng","year":"2024","journal-title":"IEEE\/CAA Journal of Automatica Sinica"},{"issue":"2","key":"10.7717\/peerj-cs.3388\/ref-51","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1109\/jas.2017.7510817","article-title":"An online fault detection model and strategies based on SVM-Grid in clouds","volume":"5","author":"Zhang","year":"2018","journal-title":"IEEE\/CAA Journal of Automatica Sinica"},{"key":"10.7717\/peerj-cs.3388\/ref-52","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.1710.01878","article-title":"To prune, or not to prune: exploring the efficacy of pruning for model compression","author":"Zhu","year":"2017"}],"container-title":["PeerJ Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/peerj.com\/articles\/cs-3388.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3388.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3388.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/peerj.com\/articles\/cs-3388.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,11,25]],"date-time":"2025-11-25T08:54:57Z","timestamp":1764060897000},"score":1,"resource":{"primary":{"URL":"https:\/\/peerj.com\/articles\/cs-3388"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,11,25]]},"references-count":52,"alternative-id":["10.7717\/peerj-cs.3388"],"URL":"https:\/\/doi.org\/10.7717\/peerj-cs.3388","archive":["CLOCKSS","LOCKSS","Portico"],"relation":{},"ISSN":["2376-5992"],"issn-type":[{"value":"2376-5992","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,11,25]]},"article-number":"e3388"}}