{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,12]],"date-time":"2025-10-12T03:36:12Z","timestamp":1760240172376,"version":"build-2065373602"},"reference-count":47,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2019,4,2]],"date-time":"2019-04-02T00:00:00Z","timestamp":1554163200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Entropy"],"abstract":"<jats:p>Recently, deep learning has achieved state-of-the-art performance in more aspects than traditional shallow architecture-based machine-learning methods. However, in order to achieve higher accuracy, it is usually necessary to extend the network depth or ensemble the results of different neural networks. Increasing network depth or ensembling different networks increases the demand for memory resources and computing resources. This leads to difficulties in deploying deep-learning models in resource-constrained scenarios such as drones, mobile phones, and autonomous driving. Improving network performance without expanding the network scale has become a hot topic for research. In this paper, we propose a cross-architecture online-distillation approach to solve this problem by transmitting supplementary information on different networks. We use the ensemble method to aggregate networks of different structures, thus forming better teachers than traditional distillation methods. In addition, discontinuous distillation with progressively enhanced constraints is used to replace fixed distillation in order to reduce loss of information diversity in the distillation process. Our training method improves the distillation effect and achieves strong network-performance improvement. We used some popular models to validate the results. 
On the CIFAR100 dataset, AlexNet\u2019s accuracy was improved by 5.94%, VGG by 2.88%, ResNet by 5.07%, and DenseNet by 1.28%. Extensive experiments were conducted to demonstrate the effectiveness of the proposed method. On the CIFAR10, CIFAR100, and ImageNet datasets, we observed significant improvements over traditional knowledge distillation.<\/jats:p>","DOI":"10.3390\/e21040357","type":"journal-article","created":{"date-parts":[[2019,4,3]],"date-time":"2019-04-03T03:39:28Z","timestamp":1554262768000},"page":"357","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Multistructure-Based Collaborative Online Distillation"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0896-8177","authenticated-orcid":false,"given":"Liang","family":"Gao","sequence":"first","affiliation":[{"name":"National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Xu","family":"Lan","sequence":"additional","affiliation":[{"name":"School of Electronic Engineering and Computer Science, Queen Mary University of London, London E14NS, UK"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Haibo","family":"Mi","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Dawei","family":"Feng","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, 
China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5997-5169","authenticated-orcid":false,"given":"Kele","family":"Xu","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yuxing","family":"Peng","sequence":"additional","affiliation":[{"name":"National Key Laboratory of Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2019,4,2]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"LeCun","year":"2015","journal-title":"Nature"},{"key":"ref_2","unstructured":"Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press."},{"key":"ref_3","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (July, January 26). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, January 7\u201312). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Girshick, R. (2015, January 7\u201313). Fast r-cnn. 
Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","DOI":"10.1109\/ICCV.2015.169"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"1834","DOI":"10.1109\/TIP.2015.2510583","article-title":"Deeptrack: Learning discriminative feature representations online for robust visual tracking","volume":"25","author":"Li","year":"2016","journal-title":"IEEE Trans. Image Process."},{"key":"ref_7","unstructured":"Mikolov, T., Chen, K., Corrado, G., and Dean, J. (arXiv, 2013). Efficient estimation of word representations in vector space, arXiv."},{"key":"ref_8","unstructured":"Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_9","unstructured":"Hermann, K.M., Kocisky, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., and Blunsom, P. (2015). Teaching machines to read and comprehend. Advances in Neural Information Processing Systems, Palais des Congr\u00e8s de Montr\u00e9al."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Joulin, A., Grave, E., Bojanowski, P., and Mikolov, T. (arXiv, 2017). Bag of tricks for efficient text classification, arXiv.","DOI":"10.18653\/v1\/E17-2068"},{"key":"ref_11","unstructured":"Weston, J., Bordes, A., Chopra, S., Rush, A.M., van Merri\u00ebnboer, B., Joulin, A., and Mikolov, T. (arXiv, 2016). Towards AI-complete question answering: A set of prerequisite toy tasks, arXiv."},{"key":"ref_12","unstructured":"Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Long, J., Shelhamer, E., and Darrell, T. (2015, January 7\u201312). 
Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298965"},{"key":"ref_14","first-page":"35","article-title":"Wide residual networks","volume":"8","author":"Zagoruyko","year":"2016","journal-title":"Br. Mach. Vis. Conf."},{"key":"ref_15","unstructured":"Canziani, A., Paszke, A., and Culurciello, E. (arXiv, 2017). An analysis of deep neural network models for practical applications, arXiv."},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Deng, L., and Platt, J.C. (2014, January 14\u201318). Ensemble deep learning for speech recognition. Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore.","DOI":"10.21437\/Interspeech.2014-433"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Qiu, X., Zhang, L., Ren, Y., Suganthan, P.N., and Amaratunga, G. (2014, January 9\u201312). Ensemble deep learning for regression and time series forecasting. Proceedings of the IEEE Computational Intelligence in Ensemble Learning, Orlando, FL, USA.","DOI":"10.1109\/CIEL.2014.7015739"},{"key":"ref_18","unstructured":"Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., and Keutzer, K. (arXiv, 2017). SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, arXiv."},{"key":"ref_19","unstructured":"Cheng, Y., Wang, D., Zhou, P., and Zhang, T. (arXiv, 2017). A survey of model compression and acceleration for deep neural networks, arXiv."},{"key":"ref_20","unstructured":"Han, S., Mao, H., and Dally, W.J. (arXiv, 2016). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv."},{"key":"ref_21","unstructured":"Ba, J., and Caruana, R. (2014). Do deep nets really need to be deep?. 
Advances in Neural Information Processing Systems, Palais des Congr\u00e8s de Montr\u00e9al."},{"key":"ref_22","unstructured":"Hinton, G., Vinyals, O., and Dean, J. (arXiv, 2015). Distilling the knowledge in a neural network, arXiv."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Verma, N., Mahajan, D., Sellamanickam, S., and Nair, V. (2012, January 16\u201321). Learning hierarchical similarity metrics. Proceedings of the IEEE Computer Vision and Pattern Recognition, Providence, RI, USA.","DOI":"10.1109\/CVPR.2012.6247938"},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Deng, J., Berg, A.C., Li, K., and Fei-Fei, L. (2010). What does classifying more than 10,000 image categories tell us?. European Conference on Computer Vision, Springer.","DOI":"10.1007\/978-3-642-15555-0_6"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Zhang, Y., Xiang, T., Hospedales, T.M., and Lu, H. (2018, January 18\u201322). Deep mutual learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","DOI":"10.1109\/CVPR.2018.00454"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"107","DOI":"10.1145\/1327452.1327492","article-title":"MapReduce: Simplified data processing on large clusters","volume":"51","author":"Dean","year":"2008","journal-title":"Commun. ACM"},{"key":"ref_27","unstructured":"Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Senior, A., Tucker, P., Yang, K., and Le, Q.V. (2012). Large scale distributed deep networks. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_28","unstructured":"Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D.P., and Wilson, A. (arXiv, 2018). Averaging weights leads to wider optima and better generalization, arXiv."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Zhang, X., Trmal, J., Povey, D., and Khudanpur, S. (2014, January 4\u20139). 
Improving deep neural network acoustic models using generalized maxout networks. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy.","DOI":"10.1109\/ICASSP.2014.6853589"},{"key":"ref_30","unstructured":"Xu, K., Mi, H., Feng, D., Wang, H., Chen, C., Zheng, Z., and Lan, X. (arXiv, 2018). Collaborative deep learning across multiple data centers, arXiv."},{"key":"ref_31","unstructured":"Chen, J., Pan, X., Monga, R., Bengio, S., and Jozefowicz, R. (arXiv, 2016). Revisiting distributed synchronous SGD, arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1023\/A:1007379606734","article-title":"Multitask learning","volume":"28","author":"Caruana","year":"1997","journal-title":"Mach. Learn."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Evgeniou, T., and Pontil, M. (2004, January 22\u201325). Regularized multi\u2013task learning. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA.","DOI":"10.1145\/1014052.1014067"},{"key":"ref_34","unstructured":"Sun, Y., Chen, Y., Wang, X., and Tang, X. (2014). Deep learning face representation by joint identification-verification. Advances in Neural Information Processing Systems, Palais des Congr\u00e8s de Montr\u00e9al."},{"key":"ref_35","unstructured":"Yim, J., Jung, H., Yoo, B., Choi, C., Park, D., and Kim, J. (2015, January 7\u201312). Rotating your face using multi-task deep neural network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"806","DOI":"10.1162\/neco_a_01169","article-title":"Multiclass alpha integration of scores from multiple classifiers","volume":"31","author":"Safont","year":"2019","journal-title":"Neural Comput."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"1371","DOI":"10.1016\/j.advwatres.2006.11.014","article-title":"Multi-model ensemble hydrologic prediction using Bayesian model averaging","volume":"30","author":"Duan","year":"2007","journal-title":"Water Resour."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1983","DOI":"10.1162\/NECO_a_00766","article-title":"Fusion of scores in a detection context based on alpha integration","volume":"27","author":"Soriano","year":"2015","journal-title":"Neural Comput."},{"key":"ref_39","unstructured":"Anil, R., Pereyra, G., Passos, A.T., Ormandi, R., Dahl, G.E., and Hinton, G.E. (arXiv, 2018). Large scale distributed neural network training through online distillation, arXiv."},{"key":"ref_40","unstructured":"Tarvainen, A., and Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Inc."},{"key":"ref_41","unstructured":"Lan, X., Zhu, X., and Gong, S. (arXiv, 2018). Knowledge distillation by On-the-Fly native ensemble, arXiv."},{"key":"ref_42","unstructured":"Yang, C., Xie, L., Qiao, S., and Yuille, A.L. (arXiv, 2018). Knowledge distillation in generations: More tolerant teachers educate better students, arXiv."},{"key":"ref_43","unstructured":"Krizhevsky, A., and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, Technical Report; University of Toronto."},{"key":"ref_44","unstructured":"Krizhevsky, A. (arXiv, 2014). 
One weird trick for parallelizing convolutional neural networks, arXiv."},{"key":"ref_45","unstructured":"Simonyan, K., and Zisserman, A. (arXiv, 2015). Very deep convolutional networks for large-scale image recognition, arXiv."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Huang, G., Liu, Z., Van Der Maaten, L., and Weinberger, K.Q. (2017, January 21\u201326). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","DOI":"10.1109\/CVPR.2017.243"},{"key":"ref_47","doi-asserted-by":"crossref","unstructured":"Sun, S., Chen, W., Bian, J., Liu, X., and Liu, T.Y. (2017). Ensemble-compression: A new method for parallel training of deep neural networks. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer.","DOI":"10.1007\/978-3-319-71249-9_12"}],"container-title":["Entropy"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/4\/357\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T12:42:29Z","timestamp":1760186549000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1099-4300\/21\/4\/357"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,4,2]]},"references-count":47,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2019,4]]}},"alternative-id":["e21040357"],"URL":"https:\/\/doi.org\/10.3390\/e21040357","relation":{},"ISSN":["1099-4300"],"issn-type":[{"type":"electronic","value":"1099-4300"}],"subject":[],"published":{"date-parts":[[2019,4,2]]}}}