{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,3]],"date-time":"2026-03-03T10:17:33Z","timestamp":1772533053583,"version":"3.50.1"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2021,1,10]],"date-time":"2021-01-10T00:00:00Z","timestamp":1610236800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,1,10]],"date-time":"2021-01-10T00:00:00Z","timestamp":1610236800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Open Fund of Shenzhen Institue of Artificial Intelligence and Robotics for Societ","award":["AC01202005016"],"award-info":[{"award-number":["AC01202005016"]}]},{"DOI":"10.13039\/501100004750","name":"Aeronautical Science Foundation of China","doi-asserted-by":"publisher","award":["2019ZE057001"],"award-info":[{"award-number":["2019ZE057001"]}],"id":[{"id":"10.13039\/501100004750","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100013105","name":"Shanghai Rising-Star Program","doi-asserted-by":"publisher","award":["20QC1401100"],"award-info":[{"award-number":["20QC1401100"]}],"id":[{"id":"10.13039\/501100013105","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>To deploy deep neural networks to edge devices with limited computation and storage costs, model compression is necessary for the application of deep learning. Pruning, as a traditional way of model compression, seeks to reduce the parameters of model weights. However, when a deep neural network is pruned, the accuracy of the network will significantly decrease. The traditional way to decrease the accuracy loss is fine-tuning. When over many parameters are pruned, the pruned network\u2019s capacity is reduced heavily and cannot recover to high accuracy. In this paper, we apply the knowledge distillation strategy to abate the accuracy loss of pruned models. The original network of the pruned network was used as the teacher network, aiming to transfer the dark knowledge from the original network to the pruned sub-network. We have applied three mainstream knowledge distillation methods: response-based knowledge, feature-based knowledge, and relation-based knowledge (Gou et al. in Knowledge distillation: a survey. <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"http:\/\/arxiv.org\/abs\/200605525\">arXiv:200605525<\/jats:ext-link>, 2020), and compare the result to the traditional fine-tuning method with grand-truth labels. Experiments have been done on the CIFAR100 dataset with several deep convolution neural network. Results show that the pruned network recovered by knowledge distillation with its original network performs better accuracy than it recovered by fine-tuning with sample labels. 
It has also been validated in this paper that using the original network as the teacher performs better than using differently structured networks with the same accuracy as the teacher.<\/jats:p>","DOI":"10.1007\/s40747-020-00248-y","type":"journal-article","created":{"date-parts":[[2021,1,10]],"date-time":"2021-01-10T02:07:58Z","timestamp":1610244478000},"page":"709-718","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":29,"title":["Knowledge from the original network: restore a better pruned network with knowledge distillation"],"prefix":"10.1007","volume":"8","author":[{"given":"Liyang","family":"Chen","sequence":"first","affiliation":[]},{"given":"Yongquan","family":"Chen","sequence":"additional","affiliation":[]},{"given":"Juntong","family":"Xi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0318-9497","authenticated-orcid":false,"given":"Xinyi","family":"Le","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,1,10]]},"reference":[{"issue":"32","key":"248_CR1","doi-asserted-by":"publisher","first-page":"15849","DOI":"10.1073\/pnas.1903070116","volume":"116","author":"M Belkin","year":"2019","unstructured":"Belkin M, Hsu D, Ma S, Mandal S (2019) Reconciling modern machine-learning practice and the classical bias-variance trade-off. Proc Natl Acad Sci 116(32):15849\u201315854","journal-title":"Proc Natl Acad Sci"},{"key":"248_CR2","first-page":"1928","volume":"30","author":"H Chen","year":"2020","unstructured":"Chen H, Wang Y, Xu C, Xu C, Tao D (2020) Learning student networks via feature embedding. IEEE Trans Neural Netw Learn Syst 30:1928\u20131942","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"248_CR3","first-page":"2148","volume":"2013","author":"M Denil","year":"2013","unstructured":"Denil M, Shakibi B, Dinh L, Ranzato M, De Freitas N (2013) Predicting parameters in deep learning. Adv Neural Inf Process Syst 2013:2148\u20132156","journal-title":"Adv Neural Inf Process Syst"},{"key":"248_CR4","unstructured":"Frankle J, Carbin M (2018) The lottery ticket hypothesis: finding sparse, trainable neural networks. In: International conference on learning representations (ICLR)"},{"key":"248_CR5","unstructured":"Furlanello T, Lipton ZC, Tschannen M, Itti L, Anandkumar A (2018) Born again neural networks. In: International conference on machine learning (ICML)"},{"key":"248_CR6","unstructured":"Gou J, Yu B, Maybank SJ, Tao D (2020) Knowledge distillation: a survey. arXiv:2006.05525"},{"key":"248_CR7","first-page":"1379","volume":"2016","author":"Y Guo","year":"2016","unstructured":"Guo Y, Yao A, Chen Y (2016) Dynamic network surgery for efficient DNNs. Adv Neural Inf Process Syst 2016:1379\u20131387","journal-title":"Adv Neural Inf Process Syst"},{"key":"248_CR8","unstructured":"Hagiwara M (1993) Removal of hidden units and weights for back propagation networks. In: International joint conference on neural networks (IJCNN)"},{"key":"248_CR9","first-page":"1135","volume":"2015","author":"S Han","year":"2015","unstructured":"Han S, Pool J, Tran J, Dally W (2015) Learning both weights and connections for efficient neural network. Adv Neural Inf Process Syst 2015:1135\u20131143","journal-title":"Adv Neural Inf Process Syst"},{"key":"248_CR10","unstructured":"Han S, Mao H, Dally WJ (2016) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. 
In: International conference on learning representations (ICLR), pp 1\u201314"},{"key":"248_CR11","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J (2016a) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 770\u2013778","DOI":"10.1109\/CVPR.2016.90"},{"key":"248_CR12","unstructured":"He K, Zhang X, Ren S, Sun J (2016b) Identity mappings in deep residual networks. In: Proceedings of the European conference on computer vision (ECCV)"},{"key":"248_CR13","doi-asserted-by":"crossref","unstructured":"Heo B, Kim J, Yun S, Park H, Choi JY (2019) A comprehensive overhaul of feature distillation. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1921\u20131930","DOI":"10.1109\/ICCV.2019.00201"},{"key":"248_CR14","unstructured":"Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531"},{"key":"248_CR15","doi-asserted-by":"crossref","unstructured":"Huang Z, Yu Y, Xu J, Ni F, Le X (2020) PF-Net: point fractal network for 3D point cloud completion. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7662\u20137670","DOI":"10.1109\/CVPR42600.2020.00768"},{"issue":"1","key":"248_CR16","first-page":"6869","volume":"18","author":"I Hubara","year":"2017","unstructured":"Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2017) Quantized neural networks: training neural networks with low precision weights and activations. J Mach Learn Res 18(1):6869\u20136898","journal-title":"J Mach Learn Res"},{"key":"248_CR17","first-page":"1","volume":"2020","author":"Y Jia","year":"2020","unstructured":"Jia Y, Chen X, Yu J, Wang L, Wang Y (2020) Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network. Complex Intell Syst 2020:1\u20139","journal-title":"Complex Intell Syst"},{"key":"248_CR18","first-page":"2760","volume":"2018","author":"J Kim","year":"2018","unstructured":"Kim J, Park S, Kwak N (2018) Paraphrasing complex network: network compression via factor transfer. Adv Neural Inf Process Syst 2018:2760\u20132769","journal-title":"Adv Neural Inf Process Syst"},{"issue":"2","key":"248_CR19","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1007\/s40747-017-0064-6","volume":"4","author":"D Kollias","year":"2018","unstructured":"Kollias D, Tagaris A, Stafylopatis A, Kollias S, Tagaris G (2018) Deep neural architectures for prediction in healthcare. Complex Intell Syst 4(2):119\u2013131","journal-title":"Complex Intell Syst"},{"key":"248_CR20","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1016\/j.neucom.2019.09.107","volume":"408","author":"X Le","year":"2020","unstructured":"Le X, Mei J, Zhang H, Zhou B, Xi J (2020) A learning-based approach for surface defect detection using small image datasets. Neurocomputing 408:112\u2013120","journal-title":"Neurocomputing"},{"key":"248_CR21","first-page":"598","volume":"1990","author":"Y LeCun","year":"1990","unstructured":"LeCun Y, Denker JS, Solla SA (1990) Optimal brain damage. Adv Neural Inf Process Syst 1990:598\u2013605","journal-title":"Adv Neural Inf Process Syst"},{"key":"248_CR22","unstructured":"Li H, Kadav A, Durdanovic I, Samet H, Graf HP (2017) Pruning filters for efficient convnets. 
In: International conference on learning representations (ICLR)"},{"issue":"12","key":"248_CR23","doi-asserted-by":"publisher","first-page":"2935","DOI":"10.1109\/TPAMI.2017.2773081","volume":"40","author":"Z Li","year":"2017","unstructured":"Li Z, Hoiem D (2017) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935\u20132947","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"248_CR24","first-page":"6754","volume":"33","author":"J Liu","year":"2019","unstructured":"Liu J, Chen Y, Liu K (2019) Exploiting the ground-truth: an adversarial imitation based knowledge distillation approach for event detection. Proc Conf AAAI Artif Intell 33:6754\u20136761","journal-title":"Proc Conf AAAI Artif Intell"},{"key":"248_CR25","doi-asserted-by":"crossref","unstructured":"Mirzadeh SI, Farajtabar M, Li A, Levine N, Matsukawa A, Ghasemzadeh H (2019) Improved knowledge distillation via teacher assistant. arXiv:1902.03393","DOI":"10.1609\/aaai.v34i04.5963"},{"key":"248_CR26","doi-asserted-by":"crossref","unstructured":"Park W, Kim D, Lu Y, Cho M (2019) Relational knowledge distillation. In: 2019 IEEE\/CVF conference on computer vision and pattern recognition (CVPR)","DOI":"10.1109\/CVPR.2019.00409"},{"key":"248_CR27","unstructured":"Ruffy F, Chahal K (2019) The state of knowledge distillation for classification. arXiv:1912.10850"},{"key":"248_CR28","unstructured":"Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations (ICLR)"},{"key":"248_CR29","unstructured":"Tian Y, Krishnan D, Isola P (2019) Contrastive representation distillation. arXiv:1910.10699"},{"key":"248_CR30","doi-asserted-by":"crossref","unstructured":"Tung F, Mori G (2019) Similarity-preserving knowledge distillation. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1365\u20131374","DOI":"10.1109\/ICCV.2019.00145"},{"key":"248_CR31","unstructured":"Turc I, Chang MW, Lee K, Toutanova K (2019) Well-read students learn better: the impact of student initialization on knowledge distillation. arXiv:1908.08962"},{"key":"248_CR32","doi-asserted-by":"crossref","unstructured":"Wei Y, Pan X, Qin H, Ouyang W, Yan J (2018) Quantization mimic: towards very tiny CNN for object detection. In: Proceedings of the European conference on computer vision (ECCV), pp 267\u2013283","DOI":"10.1007\/978-3-030-01237-3_17"},{"key":"248_CR33","doi-asserted-by":"publisher","first-page":"2089","DOI":"10.1007\/s11263-019-01286-x","volume":"128","author":"X Wu","year":"2020","unstructured":"Wu X, He R, Hu Y, Sun Z (2020) Learning an evolutionary embedding via massive knowledge distillation. Int J Comput Vis 128:2089\u20132106","journal-title":"Int J Comput Vis"},{"key":"248_CR34","doi-asserted-by":"publisher","first-page":"192","DOI":"10.1016\/j.neucom.2019.12.032","volume":"384","author":"Y Yu","year":"2020","unstructured":"Yu Y, Huang Z, Li F, Zhang H, Le X (2020) Point encoder GAN: a deep learning model for 3D point cloud inpainting. Neurocomputing 384:192\u2013199","journal-title":"Neurocomputing"},{"key":"248_CR35","unstructured":"Zagoruyko S, Komodakis N (2016) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. arXiv:1612.03928"},{"key":"248_CR36","doi-asserted-by":"crossref","unstructured":"Zagoruyko S, Komodakis N (2017) Wide residual networks. 
In: Proceedings of the British Machine Vision Conference (BMVC)","DOI":"10.5244\/C.30.87"},{"key":"248_CR37","unstructured":"Zhu M, Gupta S (2018) To prune, or not to prune: exploring the efficacy of pruning for model compression. In: International conference on learning representations (ICLR)"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-020-00248-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-020-00248-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-020-00248-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T17:27:27Z","timestamp":1651253247000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-020-00248-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1,10]]},"references-count":37,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["248"],"URL":"https:\/\/doi.org\/10.1007\/s40747-020-00248-y","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,10]]},"assertion":[{"value":"9 October 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 January 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical standards"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}