{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T19:13:09Z","timestamp":1778699589064,"version":"3.51.4"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,5,4]],"date-time":"2024-05-04T00:00:00Z","timestamp":1714780800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,4]],"date-time":"2024-05-04T00:00:00Z","timestamp":1714780800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"\"FAIR \u2013 Future Artificial Intelligence Research\" project","award":["CUP H23C22000860006"],"award-info":[{"award-number":["CUP H23C22000860006"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Large Language Models (LLMs) are characterized by their inherent memory inefficiency and compute-intensive nature, making them impractical to run on low-resource devices and hindering their applicability in edge AI contexts. To address this issue, Knowledge Distillation approaches have been adopted to transfer knowledge from a complex model, referred to as the teacher, to a more compact, computationally efficient one, known as the student. The aim is to retain the performance of the original model while substantially reducing computational requirements. However, traditional knowledge distillation methods may struggle to effectively transfer crucial explainable knowledge from an LLM teacher to the student, potentially leading to explanation inconsistencies and decreased performance. This paper presents <jats:italic>DiXtill<\/jats:italic>, a method based on a novel approach to distilling knowledge from LLMs into lightweight neural architectures. 
The main idea is to leverage local explanations provided by an eXplainable Artificial Intelligence (XAI) method to guide the cross-architecture distillation of a teacher LLM into a self-explainable student, specifically a bi-directional LSTM network. Experimental results show that our XAI-driven distillation method allows the teacher explanations to be effectively transferred to the student, resulting in better agreement compared to classical distillation methods, thus enhancing the student interpretability. Furthermore, it enables the student to achieve comparable performance to the teacher LLM while also delivering a significantly higher compression ratio and speedup compared to other techniques such as post-training quantization and pruning, which paves the way for more efficient and sustainable edge AI applications.<\/jats:p>","DOI":"10.1186\/s40537-024-00928-3","type":"journal-article","created":{"date-parts":[[2024,5,4]],"date-time":"2024-05-04T11:01:34Z","timestamp":1714820494000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Xai-driven knowledge distillation of large language models for efficient deployment on low-resource devices"],"prefix":"10.1186","volume":"11","author":[{"given":"Riccardo","family":"Cantini","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Alessio","family":"Orsino","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Domenico","family":"Talia","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,5,4]]},"reference":[{"key":"928_CR1","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown T, et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 
2020;33:1877\u2013901.","journal-title":"Adv Neural Inf Process Syst"},{"key":"928_CR2","doi-asserted-by":"crossref","unstructured":"Chang Y, Wang X, Wang J, Wu Y, Yang, L, Zhu K, Chen H, Yi X, Wang C, Wang Y. et al A survey on evaluation of large language models. ACM Trans Intell Syst Technol 2023.","DOI":"10.1145\/3641289"},{"key":"928_CR3","doi-asserted-by":"crossref","unstructured":"Cantini R, Cosentino C, Kilanioti I, Marozzo F, Talia D. Unmasking covid-19 false information on twitter: A topic-based approach with bert. In: International Conference on Discovery Science, Springer, 2023; pp. 126\u2013140","DOI":"10.1007\/978-3-031-45275-8_9"},{"key":"928_CR4","unstructured":"Frantar E, Alistarh D. Sparsegpt: Massive language models can be accurately pruned inone-shot. In: International Conference on Machine Learning, PMLR, 2023; pp. 10323\u201310337"},{"key":"928_CR5","doi-asserted-by":"crossref","unstructured":"Marozzo F, Orsino A, Talia D, Trunfio P. Edge computing solutions for distributed machine learning. In: 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC\/PiCom\/CBDCom\/CyberSciTech), IEEE, 2022; pp. 1\u20138","DOI":"10.1109\/DASC\/PiCom\/CBDCom\/Cy55231.2022.9927824"},{"issue":"1","key":"928_CR6","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1186\/s40537-021-00555-2","volume":"9","author":"L Belcastro","year":"2022","unstructured":"Belcastro L, Cantini R, Marozzo F, Orsino A, Talia D, Trunfio P. Programming big data analysis: principles and solutions. J Big Data. 2022;9(1):4.","journal-title":"J Big Data"},{"key":"928_CR7","unstructured":"Ba J, Caruana R. Do deep nets really need to be deep? Adv Neural Inf Process Syst 27; 2014;"},{"key":"928_CR8","unstructured":"Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. 
arXiv preprint arXiv:1503.02531 2015"},{"issue":"3","key":"928_CR9","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3446374","volume":"54","author":"D Saxena","year":"2021","unstructured":"Saxena D, Cao J. Generative adversarial networks (GANS) challenges, solutions, and future directions. ACM Comput Surv (CSUR). 2021;54(3):1\u201342.","journal-title":"ACM Comput Surv (CSUR)"},{"key":"928_CR10","doi-asserted-by":"crossref","unstructured":"Alharbi R, Vu MN, Thai MT. Learning interpretation with explainable knowledge distillation. In: 2021 IEEE International Conference on Big Data (Big Data), pp. 705\u2013714, 2021","DOI":"10.1109\/BigData52589.2021.9671988"},{"key":"928_CR11","doi-asserted-by":"publisher","first-page":"1789","DOI":"10.1007\/s11263-021-01453-z","volume":"129","author":"J Gou","year":"2021","unstructured":"Gou J, Yu B, Maybank SJ, Tao D. Knowledge distillation: a survey. Int J Comput Vision. 2021;129:1789\u2013819.","journal-title":"Int J Comput Vision"},{"key":"928_CR12","first-page":"22243","volume":"33","author":"T Chen","year":"2020","unstructured":"Chen T, Kornblith S, Swersky K, Norouzi M, Hinton GE. Big self-supervised models are strong semi-supervised learners. Adv Neural Inf Process Syst. 2020;33:22243\u201355.","journal-title":"Adv Neural Inf Process Syst"},{"key":"928_CR13","doi-asserted-by":"crossref","unstructured":"Zhang L, Song J, Gao A, Chen J, Bao C, Ma, K. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In: Proceedings of the IEEE\/CVF International Conference on Computer Vision, 2019, pp. 3713\u20133722","DOI":"10.1109\/ICCV.2019.00381"},{"key":"928_CR14","doi-asserted-by":"crossref","unstructured":"Kim T, Oh J, Kim N, Cho S, Yun S.-Y. Comparing kullback-leibler divergence and mean squared error loss in knowledge distillation. arXiv preprint, 2021. 
arXiv:2105.08919","DOI":"10.24963\/ijcai.2021\/362"},{"key":"928_CR15","unstructured":"Tang R, Lu Y, Liu L, Mou L, Vechtomova O, Lin J. Distilling task-specific knowledge from bert into simple neural networks. arXiv preprint, 2019. arXiv:1903.12136"},{"key":"928_CR16","unstructured":"Zhu X, Li J, Liu Y, Ma C, Wang W. A survey on model compression for large language models. arXiv preprint, 2023. arXiv:2308.07633"},{"key":"928_CR17","unstructured":"Sanh V, Debut L, Chaumond J, Wolf T. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint, 2019. arXiv:1910.01108"},{"key":"928_CR18","unstructured":"Liu, Y, Cao J, Li B, Hu W, Ding J, Li L. Cross-architecture knowledge distillation. In: Proceedings of the Asian Conference on Computer Vision, 2022, pp. 3396\u20133411"},{"key":"928_CR19","doi-asserted-by":"crossref","unstructured":"Jacob B, Kligys S, Chen B, Zhu M, Tang M, Howard A, Adam H, Kalenichenko D. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 2704\u20132713","DOI":"10.1109\/CVPR.2018.00286"},{"key":"928_CR20","unstructured":"Lin J, Tang J, Tang H, Yang S, Dang X, Han S. Awq: Activation-aware weight quantization for llm compression and acceleration. arXiv preprint, 2023. arXiv:2306.00978"},{"key":"928_CR21","unstructured":"Frantar E, Ashkboos S, Hoefler T, Alistarh D. Gptq: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint, 2022. arXiv:2210.17323"},{"key":"928_CR22","doi-asserted-by":"crossref","unstructured":"Wang Z, Wohlwend J, Lei T. Structured pruning of large language models. arXiv preprint, 2019. arXiv:1910.04732","DOI":"10.18653\/v1\/2020.emnlp-main.496"},{"key":"928_CR23","first-page":"24101","volume":"35","author":"W Kwon","year":"2022","unstructured":"Kwon W, Kim S, Mahoney MW, Hassoun J, Keutzer K, Gholami A. 
A fast post-training pruning framework for transformers. Adv Neural Inf Process Syst. 2022;35:24101\u201316.","journal-title":"Adv Neural Inf Process Syst"},{"key":"928_CR24","unstructured":"Michel P, Levy O, Neubig G. Are sixteen heads really better than one? Adv Neural Inf Process Syst 32; 2019;"},{"issue":"1","key":"928_CR25","doi-asserted-by":"publisher","first-page":"68","DOI":"10.1145\/3359786","volume":"63","author":"M Du","year":"2019","unstructured":"Du M, Liu N, Hu X. Techniques for interpretable machine learning. Commun ACM. 2019;63(1):68\u201377.","journal-title":"Commun ACM"},{"key":"928_CR26","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101805","volume":"99","author":"S Ali","year":"2023","unstructured":"Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, Guidotti R, Del Ser J, D\u00edaz-Rodr\u00edguez N, Herrera F. Explainable artificial intelligence (XAI): What we know and what is left to attain trustworthy artificial intelligence. Inf Fusion. 2023;99: 101805.","journal-title":"Inf Fusion"},{"key":"928_CR27","doi-asserted-by":"crossref","unstructured":"Rajani NF, McCann B, Xiong C, Socher R. Explain yourself! leveraging language models for commonsense reasoning. preprint, 2019. arXiv:1906.02361","DOI":"10.18653\/v1\/P19-1487"},{"key":"928_CR28","doi-asserted-by":"publisher","first-page":"392","DOI":"10.1016\/j.neunet.2022.03.017","volume":"150","author":"P Kumar","year":"2022","unstructured":"Kumar P, Raman B. A bert based dual-channel explainable text emotion recognition system. Neural Netw. 2022;150:392\u2013407.","journal-title":"Neural Netw"},{"key":"928_CR29","doi-asserted-by":"crossref","unstructured":"Ribeiro MT, Singh S, Guestrin C. \u201cwhy should I trust you?\u201d: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, 2016; pp. 
1135\u20131144","DOI":"10.1145\/2939672.2939778"},{"key":"928_CR30","unstructured":"Lundberg SM, Lee S.-I. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 30:2017"},{"key":"928_CR31","unstructured":"Sundararajan, M, Taly A, Yan Q. Axiomatic attribution for deep networks. In: International Conference on Machine Learning, PMLR, 2017; pp. 3319\u20133328"},{"key":"928_CR32","unstructured":"Gao Y, Gu S, Jiang J, Hong SR, Yu D, Zhao, L. Going beyond XAI: a systematic survey for explanation-guided learning. ACM Comput Surv 2022."},{"key":"928_CR33","unstructured":"Zeng G, Kowsar Y, Erfani S, Bailey J. Generating deep networks explanations with robust attribution alignment. In: Asian Conference on Machine Learning, PMLR, 2021; pp. 753\u2013768"},{"key":"928_CR34","unstructured":"Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser, \u0141, Polosukhin I. Attention is all you need. Adv Neural Inf Process Syst 30: 2017"},{"key":"928_CR35","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint, 2018. arXiv:1810.04805"},{"key":"928_CR36","unstructured":"Radford A, Narasimhan K, Salimans T, Sutskever I, et al. Improving language understanding by generative pre-training 2018;"},{"key":"928_CR37","unstructured":"Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. arXiv preprint, 2014. arXiv:1409.0473"},{"key":"928_CR38","doi-asserted-by":"crossref","unstructured":"Ghorbani A, Abid A, Zou J. Interpretation of neural networks is fragile. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019; pp. 3681\u20133688","DOI":"10.1609\/aaai.v33i01.33013681"},{"key":"928_CR39","unstructured":"Yang Y, UY MCS, Huang A. 
FinBERT: a pretrained language model for financial communications 2020;<arxivurl>2006.08097<\/arxivurl>"},{"key":"928_CR40","doi-asserted-by":"crossref","unstructured":"Krishna S, Han T, Gu A, Pombra J, Jabbari S, Wu S, Lakkaraju H. The disagreement problem in explainable machine learning: a practitioner\u2019s perspective. arXiv preprint, 2022. arXiv:2202.01602","DOI":"10.21203\/rs.3.rs-2963888\/v1"},{"key":"928_CR41","unstructured":"Kokhlikyan N, Miglani V, Martin M, Wang E, Alsallakh B, Reynolds J, Melnikov A, Kliushkina N, Araya C, Yan S, et al. Captum: A unified and generic model interpretability library for pytorch. arXiv preprint, 2022. arXiv:2009.07896"}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-00928-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-024-00928-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-00928-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,21]],"date-time":"2024-05-21T08:12:26Z","timestamp":1716279146000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-024-00928-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,4]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["928"],"URL":"https:\/\/doi.org\/10.1186\/s40537-024-00928-3","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,4]]},"assertion":[{"value":"31 January 
2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"22 May 2024","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The rendering error in References was corrected.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no Conflict of interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"63"}}