{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,26]],"date-time":"2025-07-26T08:45:25Z","timestamp":1753519525750,"version":"3.37.3"},"reference-count":48,"publisher":"Springer Science and Business Media LLC","issue":"3","license":[{"start":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T00:00:00Z","timestamp":1710892800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T00:00:00Z","timestamp":1710892800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100017668","name":"Anhui Provincial Key Research and Development Plan","doi-asserted-by":"publisher","award":["202104a06020012","202204c06020022","201904a06020056"],"award-info":[{"award-number":["202104a06020012","202204c06020022","201904a06020056"]}],"id":[{"id":"10.13039\/501100017668","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Independent Project of Anhui Key Laboratory of Smart Agricultural Technology and Equipment","award":["APKLSATE2019X001"],"award-info":[{"award-number":["APKLSATE2019X001"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2024,6]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Knowledge distillation can transfer the knowledge from the pre-trained teacher model to the student model, thus effectively accomplishing model compression. Previous studies have carefully crafted knowledge representation, targeting loss function design, and distillation location selection, but there have been few studies on the role of classifiers in distillation. Previous experiences have shown that the final classifier of the model has an essential role in making inferences, so this paper attempts to narrow the gap in performance between models by having the student model directly use the classifier of the teacher model for the final inference, which requires an additional projector to help match features of the student encoder with the teacher's classifier. However, a single projector cannot fully align the features, and integrating multiple projectors may result in better performance. Considering the balance between projector size and performance, through experiments, we obtain the size of projectors for different network combinations and propose a simple method for projector integration. In this way, the student model undergoes feature projection and then uses the classifiers of the teacher model for inference, obtaining a similar performance to the teacher model. 
Through extensive experiments on the CIFAR-100 and Tiny-ImageNet datasets, we show that our approach applies to various teacher\u2013student frameworks simply and effectively.<\/jats:p>","DOI":"10.1007\/s40747-024-01394-3","type":"journal-article","created":{"date-parts":[[2024,3,20]],"date-time":"2024-03-20T02:01:40Z","timestamp":1710900100000},"page":"4521-4533","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["Knowledge distillation based on projector integration and classifier sharing"],"prefix":"10.1007","volume":"10","author":[{"given":"Guanpeng","family":"Zuo","sequence":"first","affiliation":[]},{"given":"Chenlu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Zhe","family":"Zheng","sequence":"additional","affiliation":[]},{"given":"Wu","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Ruiqing","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Jingqi","family":"Lu","sequence":"additional","affiliation":[]},{"given":"Xiu","family":"Jin","sequence":"additional","affiliation":[]},{"given":"Zhaohui","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Yuan","family":"Rao","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,3,20]]},"reference":[{"issue":"11","key":"1394_CR1","doi-asserted-by":"publisher","first-page":"2278","DOI":"10.1109\/5.726791","volume":"86","author":"Y LeCun","year":"1998","unstructured":"LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278\u20132324. https:\/\/doi.org\/10.1109\/5.726791","journal-title":"Proc IEEE"},{"issue":"6","key":"1394_CR2","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1145\/3065386","volume":"60","author":"A Krizhevsky","year":"2017","unstructured":"Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84\u201390. https:\/\/doi.org\/10.1145\/3065386","journal-title":"Commun ACM"},{"key":"1394_CR3","doi-asserted-by":"publisher","unstructured":"Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: 3rd international conference on learning representations (ICLR 2015), pp 1\u201314. https:\/\/doi.org\/10.48550\/arXiv.1409.1556","DOI":"10.48550\/arXiv.1409.1556"},{"key":"1394_CR4","doi-asserted-by":"publisher","unstructured":"Redmon J, Divvala S, Girshick R et al (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779\u2013788. https:\/\/doi.org\/10.1109\/CVPR.2016.91","DOI":"10.1109\/CVPR.2016.91"},{"key":"1394_CR5","doi-asserted-by":"publisher","unstructured":"Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440\u20131448. https:\/\/doi.org\/10.48550\/arXiv.1504.08083","DOI":"10.48550\/arXiv.1504.08083"},{"key":"1394_CR6","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1109\/TPAMI.2016.2577031","volume":"2015","author":"S Ren","year":"2015","unstructured":"Ren S, He K, Girshick R et al (2015) Faster r-cnn: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 2015:28. 
https:\/\/doi.org\/10.1109\/TPAMI.2016.2577031","journal-title":"Adv Neural Inf Process Syst"},{"key":"1394_CR7","doi-asserted-by":"publisher","first-page":"30","DOI":"10.48550\/arXiv.1706.03762","volume":"2017","author":"A Vaswani","year":"2017","unstructured":"Vaswani A, Shazeer N, Parmar N et al (2017) Attention is all you need. Adv Neural Inf Process Syst 2017:30. https:\/\/doi.org\/10.48550\/arXiv.1706.03762","journal-title":"Adv Neural Inf Process Syst"},{"key":"1394_CR8","doi-asserted-by":"publisher","first-page":"27730","DOI":"10.48550\/arXiv.2203.02155","volume":"35","author":"L Ouyang","year":"2022","unstructured":"Ouyang L, Wu J, Jiang X et al (2022) Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst 35:27730\u201327744. https:\/\/doi.org\/10.48550\/arXiv.2203.02155","journal-title":"Adv Neural Inf Process Syst"},{"key":"1394_CR9","doi-asserted-by":"publisher","first-page":"2533","DOI":"10.1007\/s00521-018-3937-8","volume":"32","author":"S Malakar","year":"2020","unstructured":"Malakar S, Ghosh M, Bhowmik S et al (2020) A GA based hierarchical feature selection approach for handwritten word recognition. Neural Comput Appl 32:2533\u20132552. https:\/\/doi.org\/10.1007\/s00521-018-3937-8","journal-title":"Neural Comput Appl"},{"key":"1394_CR10","doi-asserted-by":"publisher","unstructured":"Zagoruyko S, Komodakis N (2016) Wide residual networks. Preprint arXiv:1605.07146. https:\/\/doi.org\/10.48550\/arXiv.1605.07146","DOI":"10.48550\/arXiv.1605.07146"},{"key":"1394_CR11","doi-asserted-by":"publisher","first-page":"5779","DOI":"10.1007\/s40747-023-01036-0","volume":"9","author":"R Wang","year":"2023","unstructured":"Wang R, Wan S, Zhang W et al (2023) Progressive multi-level distillation learning for pruning network. Complex Intell Syst 9:5779\u20135791. https:\/\/doi.org\/10.1007\/s40747-023-01036-0","journal-title":"Complex Intell Syst"},{"key":"1394_CR12","doi-asserted-by":"publisher","unstructured":"Liu Z, Li J, Shen Z et al (2017) Learning efficient convolutional networks through network slimming. In: Proceedings of the IEEE international conference on computer vision, pp 2736\u20132744. https:\/\/doi.org\/10.48550\/arXiv.1708.06519","DOI":"10.48550\/arXiv.1708.06519"},{"key":"1394_CR13","doi-asserted-by":"publisher","unstructured":"Gholami A, Kim S, Dong Z et al (2022) A survey of quantization methods for efficient neural network inference. In: Low-power computer vision. Chapman and Hall\/CRC, London, pp 291\u2013326. https:\/\/doi.org\/10.48550\/arXiv.2103.1363","DOI":"10.48550\/arXiv.2103.1363"},{"key":"1394_CR14","doi-asserted-by":"publisher","unstructured":"Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. Preprint arXiv:1503.02531. https:\/\/doi.org\/10.48550\/arXiv.1503.02531","DOI":"10.48550\/arXiv.1503.02531"},{"key":"1394_CR15","doi-asserted-by":"publisher","unstructured":"Zhang X, Zhou X, Lin M et al (2018) Shufflenet: an extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848\u20136856. https:\/\/doi.org\/10.1109\/CVPR.2018.00716","DOI":"10.1109\/CVPR.2018.00716"},{"key":"1394_CR16","doi-asserted-by":"publisher","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","volume":"129","author":"J Gou","year":"2021","unstructured":"Gou J, Yu B, Maybank SJ et al (2021) Knowledge distillation: a survey. Int J Comput Vis 129:1789\u20131819. 
https:\/\/doi.org\/10.1007\/s11263-021-01453-z","journal-title":"Int J Comput Vis"},{"key":"1394_CR17","doi-asserted-by":"publisher","unstructured":"Romero A, Ballas N, Kahou SE et al (2014) Fitnets: hints for thin deep nets. Preprint arXiv:1412.6550. https:\/\/doi.org\/10.48550\/arXiv.1412.6550","DOI":"10.48550\/arXiv.1412.6550"},{"key":"1394_CR18","doi-asserted-by":"publisher","unstructured":"Ahn S, Hu SX, Damianou A et al (2019) Variational information distillation for knowledge transfer. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 9163\u20139171. https:\/\/doi.org\/10.48550\/arXiv.1904.05835","DOI":"10.48550\/arXiv.1904.05835"},{"key":"1394_CR19","doi-asserted-by":"publisher","unstructured":"Chen D, Mei JP, Zhang Y et al (2021) Cross-layer distillation with semantic calibration. In: Proceedings of the AAAI conference on artificial intelligence, vol 35(8), pp 7028\u20137036. https:\/\/doi.org\/10.48550\/arXiv.2012.03236","DOI":"10.48550\/arXiv.2012.03236"},{"key":"1394_CR20","doi-asserted-by":"publisher","unstructured":"Tian Y, Krishnan D, Isola P (2019) Contrastive representation distillation. Preprint arXiv:1910.10699. https:\/\/doi.org\/10.48550\/arXiv.1910.10699","DOI":"10.48550\/arXiv.1910.10699"},{"key":"1394_CR21","doi-asserted-by":"publisher","unstructured":"Tung F, Mori G (2019) Similarity-preserving knowledge distillation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1365\u20131374. https:\/\/doi.org\/10.1109\/ICCV.2019.00145","DOI":"10.1109\/ICCV.2019.00145"},{"key":"1394_CR22","unstructured":"Yang J, Martinez B, Bulat A et al (2021) Knowledge distillation via softmax regression representation learning. In: International conference on learning representations (ICLR)"},{"key":"1394_CR23","doi-asserted-by":"publisher","unstructured":"Zagoruyko S, Komodakis N (2016) Paying more attention to attention: improving the performance of convolutional neural networks via attention transfer. Preprint arXiv:1612.03928. https:\/\/doi.org\/10.48550\/arXiv.1612.03928","DOI":"10.48550\/arXiv.1612.03928"},{"key":"1394_CR24","doi-asserted-by":"publisher","unstructured":"Ben-Baruch E, Karklinsky M, Biton Y et al (2022) It's all in the head: representation knowledge distillation through classifier sharing. Preprint arXiv:2201.06945. https:\/\/doi.org\/10.48550\/arXiv.2201.06945","DOI":"10.48550\/arXiv.2201.06945"},{"key":"1394_CR25","doi-asserted-by":"publisher","unstructured":"Zhou Z-H, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137(1\u20132):239\u2013263. ISSN 0004-3702. https:\/\/doi.org\/10.1016\/S0004-3702(02)00190-X","DOI":"10.1016\/S0004-3702(02)00190-X"},{"key":"1394_CR26","doi-asserted-by":"publisher","unstructured":"Wang X, Kondratyuk D, Christiansen E et al (2020) Wisdom of committees: an overlooked approach to faster and more accurate models. Preprint arXiv:2012.01988. https:\/\/doi.org\/10.48550\/arXiv.2012.01988","DOI":"10.48550\/arXiv.2012.01988"},{"key":"1394_CR27","doi-asserted-by":"publisher","unstructured":"Chen Z, Wang S, Li J et al (2020) Rethinking generative zero-shot learning: an ensemble learning perspective for recognising visual patches. In: Proceedings of the 28th ACM international conference on multimedia, pp 3413\u20133421. 
https:\/\/doi.org\/10.48550\/arXiv.2007.13314","DOI":"10.48550\/arXiv.2007.13314"},{"key":"1394_CR28","doi-asserted-by":"publisher","first-page":"5991","DOI":"10.1007\/s40747-023-01025-3","volume":"9","author":"X Li","year":"2023","unstructured":"Li X, Zheng X, Zhang T et al (2023) Robust fault diagnosis of a high-voltage circuit breaker via an ensemble echo state network with evidence fusion. Complex Intell Syst 9:5991\u20136007. https:\/\/doi.org\/10.1007\/s40747-023-01025-3","journal-title":"Complex Intell Syst"},{"key":"1394_CR29","doi-asserted-by":"publisher","unstructured":"Heo B, Lee M, Yun S et al (2019) Knowledge transfer via distillation of activation boundaries formed by hidden neurons. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01), pp 3779\u20133787. https:\/\/doi.org\/10.48550\/arXiv.1811.03233","DOI":"10.48550\/arXiv.1811.03233"},{"key":"1394_CR30","doi-asserted-by":"publisher","unstructured":"Park W, Kim D, Lu Y et al (2019) Relational knowledge distillation. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 3967\u20133976. https:\/\/doi.org\/10.48550\/arXiv.1904.05068","DOI":"10.48550\/arXiv.1904.05068"},{"key":"1394_CR31","doi-asserted-by":"publisher","unstructured":"Chen P, Liu S, Zhao H et al (2021) Distilling knowledge via knowledge review. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 5008\u20135017. https:\/\/doi.org\/10.48550\/arXiv.2104.09044","DOI":"10.48550\/arXiv.2104.09044"},{"key":"1394_CR32","unstructured":"Yang J, Martinez B, Bulat A et al (2020) Knowledge distillation via softmax regression representation learning. In: International conference on learning representations"},{"key":"1394_CR33","doi-asserted-by":"publisher","unstructured":"Kim J, Park SU, Kwak N (2018) Paraphrasing complex network: network compression via factor transfer. Adv Neural Inf Process Syst. https:\/\/doi.org\/10.48550\/arXiv.1802.04977","DOI":"10.48550\/arXiv.1802.04977"},{"key":"1394_CR34","doi-asserted-by":"publisher","unstructured":"Heo B, Kim J, Yun S et al (2019) A comprehensive overhaul of feature distillation. In: Proceedings of the IEEE\/CVF international conference on computer vision, pp 1921\u20131930. https:\/\/doi.org\/10.48550\/arXiv.1904.01866","DOI":"10.48550\/arXiv.1904.01866"},{"issue":"1","key":"1394_CR35","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1214\/aoms\/1177729694","volume":"22","author":"S Kullback","year":"1951","unstructured":"Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79\u201386. https:\/\/doi.org\/10.1214\/aoms\/1177729694","journal-title":"Ann Math Stat"},{"key":"1394_CR36","doi-asserted-by":"publisher","unstructured":"Chen X, He K (2021) Exploring simple siamese representation learning. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 15750\u201315758. https:\/\/doi.org\/10.48550\/arXiv.2011.10566","DOI":"10.48550\/arXiv.2011.10566"},{"key":"1394_CR37","doi-asserted-by":"publisher","unstructured":"Grill JB, Strub F, Altch\u00e9 F et al (2020) Bootstrap your own latent\u2014a new approach to self-supervised learning. Adv Neural Inf Process Syst, vol 33, pp 21271\u201321284. 
https:\/\/doi.org\/10.48550\/arXiv.2006.07733","DOI":"10.48550\/arXiv.2006.07733"},{"issue":"1","key":"1394_CR38","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1023\/A:1007379606734","volume":"28","author":"R Caruana","year":"1997","unstructured":"Caruana R (1997) Multitask learning. Mach Learn 28(1):41\u201375","journal-title":"Mach Learn"},{"key":"1394_CR39","doi-asserted-by":"publisher","unstructured":"Donahue J, Jia Y, Vinyals O et al (2014) Decaf: a deep convolutional activation feature for generic visual recognition. In: International conference on machine learning. PMLR, pp 647\u2013655. https:\/\/doi.org\/10.48550\/arXiv.1310.1531","DOI":"10.48550\/arXiv.1310.1531"},{"key":"1394_CR40","doi-asserted-by":"publisher","unstructured":"Li Z, Hoiem D (2017) Learning without forgetting. IEEE Trans Pattern Anal Mach Intell 40(12):2935\u20132947. https:\/\/doi.org\/10.48550\/arXiv.1606.09282","DOI":"10.48550\/arXiv.1606.09282"},{"key":"1394_CR41","unstructured":"Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images"},{"key":"1394_CR42","unstructured":"Le Y, Yang X (2015) Tiny imagenet visual recognition challenge. CS 231N 7(7):3"},{"key":"1394_CR43","doi-asserted-by":"publisher","unstructured":"He K, Zhang X, Ren S et al (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770\u2013778. https:\/\/doi.org\/10.48550\/arXiv.1512.03385","DOI":"10.48550\/arXiv.1512.03385"},{"key":"1394_CR44","doi-asserted-by":"publisher","unstructured":"Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Preprint arXiv:1409.1556. https:\/\/doi.org\/10.48550\/arXiv.1409.1556","DOI":"10.48550\/arXiv.1409.1556"},{"key":"1394_CR45","doi-asserted-by":"publisher","unstructured":"Sandler M, Howard A, Zhu M et al (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510\u20134520. https:\/\/doi.org\/10.48550\/arXiv.1801.04381","DOI":"10.48550\/arXiv.1801.04381"},{"key":"1394_CR46","doi-asserted-by":"publisher","unstructured":"Ma N, Zhang X, Zheng HT et al (2018) Shufflenet v2: practical guidelines for efficient cnn architecture design. In: Proceedings of the European conference on computer vision (ECCV), pp 116\u2013131. https:\/\/doi.org\/10.48550\/arXiv.1807.11164","DOI":"10.48550\/arXiv.1807.11164"},{"key":"1394_CR47","doi-asserted-by":"publisher","unstructured":"Deng X, Zhang Z (2021) Learning with retrospection. In: Proceedings of the AAAI conference on artificial intelligence, vol 35(8), pp 7201\u20137209. https:\/\/doi.org\/10.48550\/arXiv.2012.13098","DOI":"10.48550\/arXiv.2012.13098"},{"key":"1394_CR48","doi-asserted-by":"publisher","unstructured":"Mobahi H, Farajtabar M, Bartlett P (2020) Self-distillation amplifies regularization in hilbert space. Adv Neural Inf Process Syst, vol 33, pp 3351\u20133361. 
https:\/\/doi.org\/10.48550\/arXiv.2002.05715","DOI":"10.48550\/arXiv.2002.05715"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01394-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-024-01394-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-024-01394-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,16]],"date-time":"2024-05-16T18:29:59Z","timestamp":1715884199000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-024-01394-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,20]]},"references-count":48,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2024,6]]}},"alternative-id":["1394"],"URL":"https:\/\/doi.org\/10.1007\/s40747-024-01394-3","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2024,3,20]]},"assertion":[{"value":"18 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 February 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}
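
The abstract of the record above describes a pipeline in which the student's encoder features pass through an integrated set of projectors and are then scored by the teacher's shared classifier. Below is a minimal PyTorch sketch of that idea; the class and parameter names (ProjectorEnsembleStudent, feat_dim_s, feat_dim_t, num_projectors) and the mean-based integration rule are illustrative assumptions, not the paper's exact method: the paper selects projector sizes per teacher-student combination experimentally and defines its own integration scheme (see https://doi.org/10.1007/s40747-024-01394-3).

# Illustrative sketch only: a student that projects its encoder features into the
# teacher's feature space and reuses the teacher's (frozen) classifier, as the
# abstract describes. Names and the averaging rule are assumptions, not the paper's method.
import torch
import torch.nn as nn

class ProjectorEnsembleStudent(nn.Module):
    def __init__(self, student_encoder, teacher_classifier,
                 feat_dim_s, feat_dim_t, num_projectors=3):
        super().__init__()
        self.encoder = student_encoder  # trainable student backbone; assumed to output flat (B, feat_dim_s) features
        # Several projectors mapping student features into the teacher's feature dimension.
        self.projectors = nn.ModuleList(
            [nn.Linear(feat_dim_s, feat_dim_t) for _ in range(num_projectors)]
        )
        self.classifier = teacher_classifier  # shared from the teacher
        for p in self.classifier.parameters():
            p.requires_grad = False  # the teacher's classifier is reused as-is, not retrained

    def forward(self, x):
        f_s = self.encoder(x)  # student features
        # Integrate the projectors by averaging their outputs (one simple integration rule).
        f_int = torch.stack([proj(f_s) for proj in self.projectors]).mean(dim=0)
        return self.classifier(f_int)  # final inference through the teacher's classifier

# Hypothetical training step: align the integrated student features with the
# teacher's penultimate features (one plausible feature-matching objective).
def distill_step(student, teacher_encoder, x, optimizer):
    with torch.no_grad():
        f_t = teacher_encoder(x)  # target features from the frozen teacher encoder
    f_s = student.encoder(x)
    f_int = torch.stack([p(f_s) for p in student.projectors]).mean(dim=0)
    loss = nn.functional.mse_loss(f_int, f_t)  # assumes matching feature shapes
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the projector outputs are simply averaged before classification; treat it only as a structural illustration of classifier sharing with multiple projectors, since the paper's experimentally chosen projector sizes and integration method may differ.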