{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,9]],"date-time":"2026-04-09T15:38:22Z","timestamp":1775749102669,"version":"3.50.1"},"reference-count":58,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T00:00:00Z","timestamp":1754956800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Artif. Intell."],"abstract":"<jats:sec><jats:title>Introduction<\/jats:title><jats:p>Accurate disease diagnosis is critical in the medical field, yet it remains a challenging task due to the limited, heterogeneous, and complex nature of medical data. These challenges are particularly pronounced in multimodal tasks requiring the integration of diverse data sources. While lightweight models offer computational efficiency, they often lack the comprehensive understanding necessary for reliable clinical predictions. Conversely, large vision models, trained on extensive general-domain datasets, provide strong generalization but fall short in specialized medical applications due to domain mismatch and limited medical data availability.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>To bridge the gap between general and specialized performance, we propose MedAlmighty, a knowledge distillation-based framework that synergizes the strengths of both large and small models. In this approach, we utilize DINOv2\u2014a pre-trained large vision model\u2014as a frozen teacher, and a lightweight convolutional neural network (CNN) as the trainable student. The student model is trained using both hard labels from the ground truth and soft targets generated by the teacher model. 
We adopt a hybrid loss function that combines cross-entropy loss (for classification accuracy) and Kullback-Leibler divergence (for distillation), enabling the student model to capture rich semantic features while remaining efficient and domain-aware.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Experimental evaluations reveal that MedAlmighty significantly improves disease diagnosis performance across datasets characterized by sparse and diverse medical data. The proposed model outperforms baselines by effectively integrating the generalizable representations of large models with the specialized knowledge from smaller models. The results confirm improved robustness and accuracy in complex diagnostic scenarios.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>The MedAlmighty framework demonstrates that incorporating general-domain representations via frozen large vision models\u2014when guided by task-specific distillation strategies\u2014can enhance the performance of lightweight medical models. This approach offers a promising solution to data scarcity and domain gap issues in medical imaging. 
Future work may explore extending this distillation strategy to other medical modalities and incorporating multimodal alignment for even richer representation learning.<\/jats:p><\/jats:sec>","DOI":"10.3389\/frai.2025.1527980","type":"journal-article","created":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T05:30:23Z","timestamp":1754976623000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["MedAlmighty: enhancing disease diagnosis with large vision model distillation"],"prefix":"10.3389","volume":"8","author":[{"given":"Yajing","family":"Ren","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zheng","family":"Gu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Wen","family":"Liu","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1965","published-online":{"date-parts":[[2025,8,12]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"7","DOI":"10.1186\/s13036-022-00319-3","article-title":"RN-autoencoder: reduced noise autoencoder for classifying imbalanced cancer genomic data","volume":"17","author":"Arafa","year":"2023","journal-title":"J. Biol. Eng"},{"key":"B2","doi-asserted-by":"publisher","first-page":"121453","DOI":"10.1016\/j.eswa.2023.121453","article-title":"Crossover smell agent optimized multilayer perceptron for precise brain tumor classification on mri images","volume":"238","author":"Arumugam","year":"2024","journal-title":"Expert Syst. 
Appl"},{"key":"B3","article-title":"Beit: bert pre-training of image transformers","author":"Bao","year":"2021","journal-title":"arXiv preprint arXiv:2106.08254"},{"key":"B4","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV48922.2021.00951","article-title":"\u201cEmerging properties in self-supervised vision transformers,\u201d","author":"Caron","year":"2021","journal-title":"Proceedings of the IEEE\/CVF International Conference on Computer Vision"},{"key":"B5","doi-asserted-by":"publisher","first-page":"106034","DOI":"10.1016\/j.compbiomed.2022.106034","article-title":"Uncertainty teacher with dense focal loss for semi-supervised medical image segmentation","volume":"149","author":"Chen","year":"2022","journal-title":"Comput. Biol. Med"},{"key":"B6","article-title":"An image is worth 16x16 words: transformers for image recognition at scale","author":"Dosovitskiy","year":"2020","journal-title":"arXiv preprint arXiv:2010.11929"},{"key":"B7","first-page":"570","article-title":"Classification of heart diseases using fusion based learning approach","volume":"12","author":"Edupuganti","year":"2024","journal-title":"Int. J. Intell. Syst. Applic. Eng"},{"key":"B8","doi-asserted-by":"crossref","first-page":"1192","DOI":"10.1109\/ICETCI61221.2024.10594540","article-title":"\u201cEnhancing medical imaging with gans synthesizing realistic images from limited data,\u201d","volume-title":"2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI)","author":"Feng","year":"2024"},{"key":"B9","doi-asserted-by":"publisher","first-page":"107758","DOI":"10.1016\/j.compbiomed.2023.107758","article-title":"DM-CNN: dynamic multi-scale convolutional neural network with uncertainty quantification for medical image classification","volume":"168","author":"Han","year":"2024","journal-title":"Comput. Biol. 
Med"},{"key":"B10","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01553","article-title":"\u201cMasked autoencoders are scalable vision learners,\u201d","author":"He","year":"2021","journal-title":"2022 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B11","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00746","article-title":"\u201cAre natural domain foundation models useful for medical image classification?\u201d","author":"Huix","year":"2024","journal-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision"},{"key":"B12","article-title":"Mmunlearner: Reformulating multimodal machine unlearning in the era of multimodal large language models","author":"Huo","year":"2025","journal-title":"arXiv preprint arXiv:2502.11051"},{"key":"B13","doi-asserted-by":"crossref","first-page":"1083","DOI":"10.1109\/ICMLA61862.2024.00164","article-title":"\u201cMulti-modal contrastive learning for medical image classification with limited training data,\u201d","volume-title":"2024 International Conference on Machine Learning and Applications (ICMLA)","author":"Jiao","year":"2024"},{"key":"B14","doi-asserted-by":"publisher","first-page":"3448","DOI":"10.1007\/s10489-024-05358-5","article-title":"A self-supervised learning model based on variational autoencoder for limited-sample mammogram classification","volume":"54","author":"Karagoz","year":"2024","journal-title":"Appl. Intell"},{"key":"B15","first-page":"468","article-title":"\u201cImproving medical multi-modal contrastive learning with expert annotations,\u201d","volume-title":"European Conference on Computer Vision","author":"Kumar","year":"2024"},{"key":"B16","doi-asserted-by":"publisher","first-page":"2463","DOI":"10.1109\/TMI.2025.3534436","article-title":"Enhancing medical vision-language contrastive learning via inter-matching relation modelling","volume":"44","author":"Li","year":"2025","journal-title":"IEEE Trans. Med. 
Imag"},{"key":"B17","doi-asserted-by":"publisher","first-page":"129809","DOI":"10.1016\/j.neucom.2025.129809","article-title":"Decoupled contrastive learning for multilingual multimodal medical pre-trained model","volume":"633","author":"Li","year":"2025","journal-title":"Neurocomputing"},{"key":"B18","article-title":"Memorysam: memorize modalities and semantics with segment anything model 2 for multi-modal semantic segmentation","author":"Liao","year":"2025","journal-title":"arXiv preprint arXiv:2503.06700"},{"key":"B19","doi-asserted-by":"publisher","first-page":"qzaf011","DOI":"10.1093\/gpbjnl\/qzaf011","article-title":"Challenges in AI-driven biomedical multimodal data fusion and analysis","volume":"23","author":"Liu","year":"2025","journal-title":"Genom. Prot. Bioinform"},{"key":"B20","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52729.2023.00327","article-title":"\u201cMultiple instance learning via iterative self-paced supervised contrastive learning,\u201d","author":"Liu","year":"2022","journal-title":"2023 IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR)"},{"key":"B21","doi-asserted-by":"publisher","first-page":"111323","DOI":"10.1016\/j.asoc.2024.111323","article-title":"An efficient medical image classification network based on multi-branch cnn, token grouping transformer and mixer MLP","volume":"153","author":"Liu","year":"2024","journal-title":"Appl. Soft Comput"},{"key":"B22","article-title":"Omnibind: teach to build unequal-scale modality interaction for omni-bind of all","author":"Lyu","year":"2024","journal-title":"arXiv preprint arXiv:2405.16108"},{"key":"B23","doi-asserted-by":"publisher","first-page":"106791","DOI":"10.1016\/j.compbiomed.2023.106791","article-title":"Medvit: a robust vision transformer for generalized medical image classification","volume":"157","author":"Manzari","year":"2023","journal-title":"Comput. Biol. 
Med"},{"key":"B24","doi-asserted-by":"publisher","first-page":"112536","DOI":"10.1016\/j.asoc.2024.112536","article-title":"Medical supervised masked autoencoder: crafting a better masking strategy and efficient fine-tuning schedule for medical image classification","volume":"169","author":"Mao","year":"2025","journal-title":"Appl. Soft Comput"},{"key":"B25","doi-asserted-by":"publisher","first-page":"3265","DOI":"10.1007\/s41870-024-01798-x","article-title":"Constructing a hybrid activation and parameter-fusion based cnn medical image classifier","volume":"16","author":"Maree","year":"2024","journal-title":"Int. J. Inf. Technol"},{"key":"B26","doi-asserted-by":"crossref","first-page":"1540","DOI":"10.1109\/ICMSCI62561.2025.10894026","article-title":"\u201cA densenet-enhanced gan model for classification of medical images into original and fake,\u201d","volume-title":"2025 International Conference on Multi-Agent Systems for Collaborative Intelligence (ICMSCI)","author":"MeenaPrakash","year":"2025"},{"key":"B27","doi-asserted-by":"publisher","first-page":"e210315","DOI":"10.1148\/ryai.210315","article-title":"Radimagenet: an open radiologic deep learning research dataset for effective transfer learning","volume":"4","author":"Mei","year":"2022","journal-title":"Radiology"},{"key":"B28","doi-asserted-by":"publisher","DOI":"10.1016\/j.ebiom.2023.104930","article-title":"Clinical phenotypes among patients with normal cardiac perfusion using unsupervised learning: a retrospective observational study","author":"Miller","year":"2024","journal-title":"EBioMedicine"},{"key":"B29","doi-asserted-by":"publisher","first-page":"4345","DOI":"10.1007\/s00521-024-10862-3","article-title":"Pp-cnn: probabilistic pooling cnn for enhanced image classification","volume":"37","author":"Mishra","year":"2025","journal-title":"Neur. Comput. 
Applic"},{"key":"B30","doi-asserted-by":"publisher","DOI":"10.1109\/CVPRW63382.2024.00517","article-title":"\u201cConpro: learning severity representation for medical images using contrastive learning and preference optimization,\u201d","author":"Nguyen","year":"2024","journal-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition"},{"key":"B31","article-title":"Dinov2: learning robust visual features without supervision","author":"Oquab","year":"2023","journal-title":"arXiv preprint arXiv:2304.07193"},{"key":"B32","doi-asserted-by":"publisher","first-page":"692","DOI":"10.1038\/s41598-024-51329-8","article-title":"A twin convolutional neural network with hybrid binary optimizer for multimodal breast cancer digital image classification","volume":"14","author":"Oyelade","year":"2024","journal-title":"Sci. Rep"},{"key":"B33","doi-asserted-by":"publisher","first-page":"107627","DOI":"10.1016\/j.bspc.2025.107627","article-title":"A novel cnn-vit-based deep learning model for early skin cancer diagnosis","volume":"104","author":"Pacal","year":"2025","journal-title":"Biomed. Signal Process. Control"},{"key":"B34","doi-asserted-by":"publisher","first-page":"3755","DOI":"10.1109\/TMM.2025.3535321","article-title":"Stream-VIT: learning streamlined convolutions in vision transformer","volume":"44","author":"Pan","year":"2025","journal-title":"IEEE Trans. 
Multim"},{"key":"B35","article-title":"Beit v2: masked image modeling with vector-quantized visual tokenizers","author":"Peng","year":"2022","journal-title":"arXiv preprint arXiv:2208.06366"},{"key":"B36","article-title":"Cha-maevit: unifying channel-aware masked autoencoders and multi-channel vision transformers for improved cross-channel learning","author":"Pham","year":"2025","journal-title":"arXiv preprint arXiv:2503.19331"},{"key":"B37","article-title":"Vit-ae++: improving vision transformer autoencoder for self-supervised medical image representations","author":"Prabhakar","year":"2023","journal-title":"arXiv preprint arXiv:2301.07382"},{"key":"B38","first-page":"8748","article-title":"\u201cLearning transferable visual models from natural language supervision,\u201d","volume-title":"International Conference on Machine Learning","author":"Radford","year":"2021"},{"key":"B39","doi-asserted-by":"publisher","first-page":"127145","DOI":"10.1016\/j.eswa.2025.127145","article-title":"Multiple teachers are beneficial: a lightweight and noise-resistant student model for point-of-care imaging classification","volume":"275","author":"Song","year":"2025","journal-title":"Exp. Syst. Applic"},{"key":"B40","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1007\/s10916-024-02105-8","article-title":"Comparison of vision transformers and convolutional neural networks in medical image analysis: a systematic review","volume":"48","author":"Takahashi","year":"2024","journal-title":"J. Med. 
Syst"},{"key":"B41","first-page":"516","article-title":"\u201cDeit III: revenge of the vit,\u201d","volume-title":"European Conference on Computer Vision","author":"Touvron","year":"2022"},{"key":"B42","doi-asserted-by":"publisher","DOI":"10.1145\/3700906.3700937","article-title":"\u201cBreast cancer image classification method based on deep transfer learning,\u201d","author":"Wang","year":"2024","journal-title":"Proceedings of the International Conference on Image Processing, Machine Learning and Pattern Recognition"},{"key":"B43","doi-asserted-by":"publisher","DOI":"10.1109\/WACV57701.2024.00114","article-title":"\u201cLearning quality labels for robust image classification,\u201d","author":"Wang","year":"2024","journal-title":"Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision"},{"key":"B44","doi-asserted-by":"publisher","first-page":"901","DOI":"10.3390\/bioengineering10080901","article-title":"Self-supervised learning application on covid-19 chest x-ray image classification using masked autoencoder","volume":"10","author":"Xing","year":"2023","journal-title":"Bioengineering"},{"key":"B45","doi-asserted-by":"publisher","first-page":"102500","DOI":"10.1016\/j.compmedimag.2025.102500","article-title":"Contrastive learning in brain imaging","volume":"121","author":"Xu","year":"2025","journal-title":"Computer. Med. Imag. Graph"},{"key":"B46","article-title":"A survey of mathematical reasoning in the era of multimodal large language model: benchmark, method and challenges","author":"Yan","year":"2024","journal-title":"arXiv preprint arXiv:2412.11936"},{"key":"B47","doi-asserted-by":"publisher","author":"Yang","year":"2020","DOI":"10.1109\/ISBI48211.2021.9434062"},{"key":"B48","doi-asserted-by":"publisher","first-page":"41","DOI":"10.1038\/s41597-022-01721-8","article-title":"MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification","volume":"10","author":"Yang","year":"2023","journal-title":"Sci. 
Data"},{"key":"B49","article-title":"Medkan: an advanced kolmogorov-arnold network for medical image classification","author":"Yang","year":"2025","journal-title":"arXiv preprint arXiv:2502.18416"},{"key":"B50","doi-asserted-by":"crossref","first-page":"4498","DOI":"10.1109\/ICASSP43922.2022.9747534","article-title":"\u201cConfidence-aware multi-teacher knowledge distillation,\u201d","volume-title":"ICASSP 2022\u20132022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","author":"Zhang","year":"2022"},{"key":"B51","article-title":"Goodsam: bridging domain and capacity gaps via segment anything model for distortion-aware panoramic semantic segmentation","author":"Zhang","year":"2024","journal-title":"arXiv preprint arXiv:2403.16370"},{"key":"B52","doi-asserted-by":"publisher","first-page":"6802","DOI":"10.1109\/TNNLS.2024.3399164","article-title":"Pyramid pixel context adaption network for medical image classification with supervised contrastive learning","volume":"36","author":"Zhang","year":"2024","journal-title":"IEEE Trans. Neural Netw. Learn. Syst"},{"key":"B53","article-title":"Unveiling the potential of segment anything model 2 for RGB-thermal semantic segmentation with language guidance","author":"Zhao","year":"2025","journal-title":"arXiv preprint arXiv:2503.02581"},{"key":"B54","doi-asserted-by":"publisher","first-page":"106051","DOI":"10.1016\/j.compbiomed.2022.106051","article-title":"Uncertainty-aware deep co-training for semi-supervised medical image segmentation","volume":"149","author":"Zheng","year":"2022","journal-title":"Comput. Biol. 
Med"},{"key":"B55","article-title":"Retrieval augmented generation and understanding in vision: a survey and new outlook","author":"Zheng","year":"2025","journal-title":"arXiv preprint arXiv:2503.18016"},{"key":"B56","article-title":"Omnisam: omnidirectional segment anything model for UDA in panoramic semantic segmentation","author":"Zhong","year":"2025","journal-title":"arXiv preprint arXiv:2503.07098"},{"key":"B57","doi-asserted-by":"publisher","DOI":"10.48550\/arXiv.2412.04220","article-title":"Customize segment anything model for multi-modal semantic segmentation with mixture of lora experts","author":"Zhu","year":"2024","journal-title":"arXiv preprint arXiv:2412.04220"},{"key":"B58","doi-asserted-by":"crossref","first-page":"e231219","DOI":"10.1148\/radiol.231219","article-title":"Radiologic features of nodules attached to the mediastinal or diaphragmatic pleura at low-dose CT for lung cancer screening","volume":"310","author":"Zhu","year":"2024","journal-title":"Radiology"}],"container-title":["Frontiers in Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1527980\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,8,12]],"date-time":"2025-08-12T05:30:26Z","timestamp":1754976626000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/frai.2025.1527980\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,12]]},"references-count":58,"alternative-id":["10.3389\/frai.2025.1527980"],"URL":"https:\/\/doi.org\/10.3389\/frai.2025.1527980","relation":{},"ISSN":["2624-8212"],"issn-type":[{"value":"2624-8212","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,12]]},"article-number":"1527980"}}