{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,7]],"date-time":"2026-02-07T19:45:19Z","timestamp":1770493519360,"version":"3.49.0"},"reference-count":23,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T00:00:00Z","timestamp":1636934400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Nowadays, the transfer learning technique can be successfully applied in the deep learning field through techniques that fine-tune the CNN\u2019s starting point so it may learn over a huge dataset such as ImageNet and continue to learn on a fixed dataset to achieve better performance. In this paper, we designed a transfer learning methodology that combines the learned features of different teachers to a student network in an end-to-end model, improving the performance of the student network in classification tasks over different datasets. In addition to this, we tried to answer the following questions which are in any case directly related to the transfer learning problem addressed here. Is it possible to improve the performance of a small neural network by using the knowledge gained from a more powerful neural network? Can a deep neural network outperform the teacher using transfer learning? Experimental results suggest that neural networks can transfer their learning to student networks using our proposed architecture, designed to bring to light a new interesting approach for transfer learning techniques. Finally, we provide details of the code and the experimental settings.<\/jats:p>","DOI":"10.3390\/a14110334","type":"journal-article","created":{"date-parts":[[2021,11,15]],"date-time":"2021-11-15T20:46:47Z","timestamp":1637009207000},"page":"334","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Is One Teacher Model Enough to Transfer Knowledge to a Student Model?"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0565-7496","authenticated-orcid":false,"given":"Nicola","family":"Landro","sequence":"first","affiliation":[{"name":"Department of Theoretical and Applied Sciences\u2014DISTA, University of Insubria, Via J.H. Dunant, 3, 21100 Varese, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7076-8328","authenticated-orcid":false,"given":"Ignazio","family":"Gallo","sequence":"additional","affiliation":[{"name":"Department of Theoretical and Applied Sciences\u2014DISTA, University of Insubria, Via J.H. Dunant, 3, 21100 Varese, Italy"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4355-0366","authenticated-orcid":false,"given":"Riccardo","family":"La Grassa","sequence":"additional","affiliation":[{"name":"Department of Theoretical and Applied Sciences\u2014DISTA, University of Insubria, Via J.H. Dunant, 3, 21100 Varese, Italy"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,15]]},"reference":[{"key":"ref_1","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv."},{"key":"ref_2","unstructured":"Romero, A., Ballas, N., Kahou, S.E., Chassang, A., Gatta, C., and Bengio, Y. (2015, January 7\u20139). FitNets: Hints for Thin Deep Nets. Proceedings of the 3rd International Conference on Learning Representations, ICLR, San Diego, CA, USA."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Ahn, S., Hu, S.X., Damianou, A., Lawrence, N.D., and Dai, Z. (2019, January 15\u201320). Variational information distillation for knowledge transfer. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00938"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Tung, F., and Mori, G. (2019, January 23). Similarity-preserving knowledge distillation. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea. Available online: https:\/\/openaccess.thecvf.com\/content_ICCV_2019\/papers\/Tung_Similarity-Preserving_Knowledge_Distillation_ICCV_2019_paper.pdf.","DOI":"10.1109\/ICCV.2019.00145"},{"key":"ref_5","unstructured":"Srinivas, S., and Fleuret, F. (2018). Knowledge transfer with jacobian matching. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Lee, S.H., Kim, D.H., and Song, B.C. (2018, January 8\u201314). Self-supervised knowledge distillation using singular value decomposition. Proceedings of the European Conference on Computer Vision, Munich, Germany.","DOI":"10.1007\/978-3-030-01231-1_21"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Kornblith, S., Shlens, J., and Le, Q.V. (2019, January 16\u201320). Do better imagenet models transfer better?. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","DOI":"10.1109\/CVPR.2019.00277"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Yang, Z., Luo, T., Wang, D., Hu, Z., Gao, J., and Wang, L. (2018, January 8\u201314). Learning to navigate for fine-grained classification. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","DOI":"10.1007\/978-3-030-01264-9_26"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Agarwal, N., Sondhi, A., Chopra, K., and Singh, G. (2021). Transfer learning: Survey and classification. Smart Innovations in Communication and Computational Sciences, Springer.","DOI":"10.1007\/978-981-15-5345-5_13"},{"key":"ref_10","unstructured":"Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013). Maxout networks. arXiv."},{"key":"ref_11","unstructured":"Landro, N. (2021, November 11). Features Transfer Learning. Available online: https:\/\/gitlab.com\/nicolalandro\/features_transfer_learning."},{"key":"ref_12","unstructured":"Krizhevsky, A., and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, University of Toronto. Available online: http:\/\/citeseerx.ist.psu.edu\/viewdoc\/download?doi=10.1.1.222.9220&rep=rep1&type=pdf."},{"key":"ref_13","unstructured":"Khosla, A., Jayadevaprakash, N., Yao, B., and Fei-Fei, L. (2011, January 20\u201325). Novel Dataset for Fine-Grained Image Categorization. Proceedings of the First Workshop on Fine-Grained Visual Categorization, Colorado Springs, CO, USA. Available online: https:\/\/people.csail.mit.edu\/khosla\/papers\/fgvc2011.pdf."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., and Fei-Fei, L. (2009, January 20\u201325). ImageNet: A Large-Scale Hierarchical Image Database. Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA.","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Parkhi, O.M., Vedaldi, A., Zisserman, A., and Jawahar, C. (2012, January 16\u201321). Cats and dogs. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6248092"},{"key":"ref_16","unstructured":"Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. (2011). The Caltech-UCSD Birds-200-2011 Dataset, California Institute of Technology. Technical Report CNS-TR-2011-001."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1007\/s11263-015-0816-y","article-title":"ImageNet Large Scale Visual Recognition Challenge","volume":"115","author":"Russakovsky","year":"2015","journal-title":"Int. J. Comput. Vis. (IJCV)"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18\u201323). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00474"},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Nawaz, S., Calefati, A., Caraffini, M., Landro, N., and Gallo, I. (2019, January 2\u20134). Are These Birds Similar: Learning Branched Networks for Fine-grained Representations. Proceedings of the 2019 International Conference on Image and Vision Computing New Zealand (IVCNZ), Dunedin, New Zealand.","DOI":"10.1109\/IVCNZ48456.2019.8960960"},{"key":"ref_20","unstructured":"Landro, N. (2021, November 11). 131 Dog\u2019s Species Classification. Available online: https:\/\/github.com\/nicolalandro\/dogs_species_prediction."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv.","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_22","unstructured":"Targ, S., Almeida, D., and Lyman, K. (2016). Resnet in resnet: Generalizing residual architectures. arXiv."},{"key":"ref_23","unstructured":"Frankle, J., and Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/11\/334\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:30:38Z","timestamp":1760167838000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/14\/11\/334"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,15]]},"references-count":23,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["a14110334"],"URL":"https:\/\/doi.org\/10.3390\/a14110334","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,15]]}}}