{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,7]],"date-time":"2026-03-07T18:35:15Z","timestamp":1772908515954,"version":"3.50.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,11,16]],"date-time":"2022-11-16T00:00:00Z","timestamp":1668556800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,11,16]],"date-time":"2022-11-16T00:00:00Z","timestamp":1668556800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2023,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Accessibility on edge devices and the trade-off between latency and accuracy are key concerns when deploying deep learning models. This paper explores a Mixture of Experts system, DynK-Hydra, which trains an ensemble of multiple similar branches on data sets with a high number of classes, but uses only the necessary subset of branches during inference. We achieve this by training a cohort of specialized branches (deep networks of reduced size) and a gater\/supervisor that dynamically decides which branches to use for any specific input. An original contribution is that the number of selected branches is set dynamically, based on the gater\u2019s confidence (similar works use a static parameter for this). Another contribution is the way we ensure the branches\u2019 specialization: we divide the data set classes into multiple clusters, assign a cluster to each branch, and enforce its specialization on that cluster through a separate loss function. 
We evaluate DynK-Hydra on the CIFAR-100, Food-101, CUB-200, and ImageNet32 data sets and obtain accuracy improvements of up to 4.3% compared with state-of-the-art ResNet, while reducing inference FLOPs by a factor of 2\u20135.5. Compared to the similar HydraRes system, we obtain marginal accuracy improvements of up to 1.2% on architectures paired by inference time. However, we improve inference times by up to 2.8 times compared to HydraRes.<\/jats:p>","DOI":"10.1007\/s40747-022-00897-1","type":"journal-article","created":{"date-parts":[[2022,11,16]],"date-time":"2022-11-16T11:05:52Z","timestamp":1668596752000},"page":"2177-2188","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":1,"title":["DynK-hydra: improved dynamic architecture ensembling for efficient inference"],"prefix":"10.1007","volume":"9","author":[{"given":"Tudor Alexandru","family":"Ileni","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7580-5722","authenticated-orcid":false,"given":"Adrian Sergiu","family":"Darabant","sequence":"additional","affiliation":[]},{"given":"Diana Laura","family":"Borza","sequence":"additional","affiliation":[]},{"given":"Alexandru Ion","family":"Marinescu","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,11,16]]},"reference":[{"key":"897_CR1","doi-asserted-by":"crossref","unstructured":"Zhang Y, Xiang T, Hospedales TM, Lu H (2018) Deep mutual learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4320\u2013 4328","DOI":"10.1109\/CVPR.2018.00454"},{"key":"897_CR2","unstructured":"Fort S, Hu H, Lakshminarayanan B (2019) Deep ensembles: A loss landscape perspective. 
arXiv preprint arXiv:1912.02757"},{"issue":"1","key":"897_CR3","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1016\/j.inffus.2004.04.004","volume":"6","author":"G Brown","year":"2005","unstructured":"Brown G, Wyatt J, Harris R, Yao X (2005) Diversity creation methods: a survey and categorisation. Inform Fusion 6(1):5\u201320","journal-title":"Inform Fusion"},{"issue":"2","key":"897_CR4","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1007\/s10462-012-9338-y","volume":"42","author":"S Masoudnia","year":"2014","unstructured":"Masoudnia S, Ebrahimpour R (2014) Mixture of experts: a literature survey. Artif Intell Rev 42(2):275\u2013293","journal-title":"Artif Intell Rev"},{"key":"897_CR5","unstructured":"Mullapudi RT, Mark WR, Shazeer N, Fatahalian K.( 2018) Hydranets: Specialized dynamic architectures for efficient inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8080\u2013 8089"},{"key":"897_CR6","unstructured":"Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images"},{"key":"897_CR7","doi-asserted-by":"publisher","first-page":"446","DOI":"10.1007\/978-3-319-10599-4_29","volume-title":"Computer vision - ECCV 2014","author":"L Bossard","year":"2014","unstructured":"Bossard L, Guillaumin M, Van Gool L (2014) Food-101 - mining discriminative components with random forests. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer vision - ECCV 2014. Springer, Cham, pp 446\u2013461"},{"key":"897_CR8","unstructured":"Welinder P, Branson S, Mita T, Wah C, Schroff F, Belongie S, Perona P (2010) Caltech-ucsd birds 200. Technical Report Cns-tr-2010-001, California Institute of Technology"},{"key":"897_CR9","doi-asserted-by":"crossref","unstructured":"Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L ( 2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248\u2013 255 . 
Ieee","DOI":"10.1109\/CVPR.2009.5206848"},{"key":"897_CR10","doi-asserted-by":"crossref","unstructured":"He K, Zhang X, Ren S, Sun J ( 2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770\u2013 778","DOI":"10.1109\/CVPR.2016.90"},{"key":"897_CR11","unstructured":"Eigen D, Ranzato M, Sutskever I (2013) Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314"},{"key":"897_CR12","doi-asserted-by":"publisher","DOI":"10.1109\/tii.2022.3156658","author":"W Dong","year":"2022","unstructured":"Dong W, Wozniak M, Wu J, Li W, Bai Z (2022) De-noising aggregation of graph neural networks by using principal component analysis. IEEE Transact Industrial Inform. https:\/\/doi.org\/10.1109\/tii.2022.3156658","journal-title":"IEEE Transact Industrial Inform"},{"key":"897_CR13","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2022.109616","volume":"254","author":"W Dong","year":"2022","unstructured":"Dong W, Wu J, Zhang X, Bai Z, Wang P, Wo\u017aniak M (2022) Improving performance and efficiency of graph neural networks by injective aggregation. Knowl-Based Syst 254:109616. https:\/\/doi.org\/10.1016\/j.knosys.2022.109616","journal-title":"Knowl-Based Syst"},{"issue":"4","key":"897_CR14","doi-asserted-by":"publisher","first-page":"820","DOI":"10.1109\/TNN.2003.813832","volume":"14","author":"MM Islam","year":"2003","unstructured":"Islam MM, Yao X, Murase K (2003) A constructive algorithm for training cooperative neural network ensembles. IEEE Trans Neural Netw 14(4):820\u2013834","journal-title":"IEEE Trans Neural Netw"},{"issue":"6","key":"897_CR15","doi-asserted-by":"publisher","first-page":"1289","DOI":"10.1162\/neco.1994.6.6.1289","volume":"6","author":"H Drucker","year":"1994","unstructured":"Drucker H, Cortes C, Jackel LD, LeCun Y, Vapnik V (1994) Boosting and other ensemble methods. 
Neural Comput 6(6):1289\u20131301","journal-title":"Neural Comput"},{"issue":"2","key":"897_CR16","first-page":"123","volume":"24","author":"L Breiman","year":"1996","unstructured":"Breiman L (1996) Bagging predictors. Machine Learn 24(2):123\u2013140","journal-title":"Bagging predictors. Machine Learn"},{"issue":"2","key":"897_CR17","doi-asserted-by":"publisher","first-page":"241","DOI":"10.1016\/S0893-6080(05)80023-1","volume":"5","author":"DH Wolpert","year":"1992","unstructured":"Wolpert DH (1992) Stacked generalization. Neural Netw 5(2):241\u2013259","journal-title":"Neural Netw"},{"key":"897_CR18","unstructured":"Huang G, Li Y, Pleiss G, Liu Z, Hopcroft JE, Weinberger KQ (2017) Snapshot ensembles: Train 1, get m for free. arXiv preprint arXiv:1704.00109"},{"issue":"7","key":"897_CR19","first-page":"231","volume":"7","author":"A Krogh","year":"1995","unstructured":"Krogh A, Vedelsby J (1995) Validation, and active learning. Adv Neural Inform Proces Syst 7(7):231","journal-title":"Adv Neural Inform Proces Syst"},{"issue":"3","key":"897_CR20","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1017\/S0269888997003123","volume":"12","author":"AJ Sharkey","year":"1997","unstructured":"Sharkey AJ, Sharkey NE (1997) Combining diverse neural nets. Knowledge Eng Rev 12(3):231\u2013247","journal-title":"Knowledge Eng Rev"},{"key":"897_CR21","unstructured":"Lan X, Zhu X, Gong S (2018) Knowledge distillation by on-the-fly native ensemble. arXiv preprint arXiv:1806.04606"},{"key":"897_CR22","doi-asserted-by":"crossref","unstructured":"Chen Z, Li Y, Bengio S, Si S (2019) You look twice: Gaternet for dynamic filter selection in cnns. In: Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, pp. 9172\u2013 9180","DOI":"10.1109\/CVPR.2019.00939"},{"key":"897_CR23","unstructured":"Wang X, Yu F, Dunlap L, Ma Y-A, Wang R, Mirhoseini A, Darrell T, Gonzalez JE ( 2020)Deep mixture of experts via shallow embedding. 
In: Uncertainty in Artificial Intelligence, pp. 552\u2013 562 . Pmlr"},{"key":"897_CR24","unstructured":"Shazeer N, Mirhoseini A, Maziarz K, Davis A, Le Q, Hinton G, Dean J (2017) Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538"},{"key":"897_CR25","unstructured":"Han Y, Huang G, Song S, Yang L, Wang H, Wang Y (2021) Dynamic neural networks: A survey. arXiv preprint arXiv:2102.04906"},{"key":"897_CR26","unstructured":"Tan M, Le Q ( 2019) Efficientnet: Rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105\u2013 6114 . Pmlr"},{"issue":"10","key":"897_CR27","first-page":"2","volume":"1","author":"M Wattenberg","year":"2016","unstructured":"Wattenberg M, Vi\u00e9gas F, Johnson I (2016) How to use t-sne effectively. Distill 1(10):2","journal-title":"How to use t-sne effectively. Distill"},{"key":"897_CR28","doi-asserted-by":"crossref","unstructured":"Zenobi G, Cunningham P ( 2001) Using diversity in preparing ensembles of classifiers based on different feature subsets to minimize generalization error. In: European Conference on Machine Learning, pp. 576\u2013 587. Springer","DOI":"10.1007\/3-540-44795-4_49"},{"key":"897_CR29","doi-asserted-by":"crossref","unstructured":"Huang G, Liu Z, Van Der\u00a0Maaten L, Weinberger KQ( 2017) Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700\u2013 4708","DOI":"10.1109\/CVPR.2017.243"},{"key":"897_CR30","unstructured":"Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. 
arXiv preprint arXiv:1711.05101"},{"key":"897_CR31","unstructured":"Im DJ, Tao M, Branson K (2016) An empirical analysis of deep network loss surfaces"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00897-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-022-00897-1\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-022-00897-1.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,18]],"date-time":"2023-04-18T09:45:54Z","timestamp":1681811154000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-022-00897-1"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,11,16]]},"references-count":31,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,4]]}},"alternative-id":["897"],"URL":"https:\/\/doi.org\/10.1007\/s40747-022-00897-1","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"value":"2199-4536","type":"print"},{"value":"2198-6053","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,11,16]]},"assertion":[{"value":"21 July 2022","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 October 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 November 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have 
no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}},{"value":"This article does not contain any studies with human participants or animals performed by any of the authors.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethical approval"}},{"value":"All contributors to the works for this paper have consented for its publication.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}}]}}