{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,2]],"date-time":"2026-06-02T04:29:36Z","timestamp":1780374576068,"version":"3.54.1"},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T00:00:00Z","timestamp":1717113600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T00:00:00Z","timestamp":1717113600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100018696","name":"HORIZON EUROPE Health","doi-asserted-by":"publisher","award":["848098"],"award-info":[{"award-number":["848098"]}],"id":[{"id":"10.13039\/100018696","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010677","name":"H2020 Health","doi-asserted-by":"publisher","award":["101017453"],"award-info":[{"award-number":["101017453"]}],"id":[{"id":"10.13039\/100010677","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Cogn Comput"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Gaining clinicians\u2019 trust will unleash the full potential of artificial intelligence (AI) in medicine, and explaining AI decisions is seen as the way to build trustworthy systems. However, explainable artificial intelligence (XAI) methods in medicine often lack a proper evaluation. In this paper, we present our evaluation methodology for XAI methods using forward simulatability. We define the Forward Simulatability Score (FSS) and analyze its limitations in the context of clinical predictors. 
Then, we apply FSS to our XAI approach defined over ML-RO, a machine learning clinical predictor based on random optimization over a multiple kernel support vector machine (SVM) algorithm. To compare FSS values before and after the explanation phase, we test our evaluation methodology for XAI methods on three clinical datasets: breast cancer, venous thromboembolism (VTE), and migraine. The ML-RO system is a good model on which to test our XAI evaluation strategy based on the FSS. Indeed, ML-RO outperforms two other base models\u2014a decision tree (DT) and a plain SVM\u2014on all three datasets and makes it possible to define different XAI models: TOPK, MIGF, and F4G. The FSS evaluation score suggests that the explanation method F4G for ML-RO is the most effective on two of the three datasets tested, and it shows the limits of the learned model for one dataset. Our study aims to introduce a standard practice for evaluating XAI methods in medicine. By establishing a rigorous evaluation framework, we seek to provide healthcare professionals with reliable tools for assessing the performance of XAI methods and thereby to encourage the adoption of AI systems in clinical practice.<\/jats:p>","DOI":"10.1007\/s12559-024-10297-x","type":"journal-article","created":{"date-parts":[[2024,5,31]],"date-time":"2024-05-31T03:42:28Z","timestamp":1717126948000},"page":"1436-1446","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Evaluating Explainable Machine Learning Models for 
Clinicians"],"prefix":"10.1007","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6573-8095","authenticated-orcid":false,"given":"Noemi","family":"Scarpato","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Aria","family":"Nourbakhsh","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Patrizia","family":"Ferroni","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Silvia","family":"Riondino","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Mario","family":"Roselli","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Francesca","family":"Fallucchi","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Piero","family":"Barbanti","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fiorella","family":"Guadagni","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7301-3596","authenticated-orcid":false,"given":"Fabio Massimo","family":"Zanzotto","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,5,31]]},"reference":[{"key":"10297_CR1","doi-asserted-by":"publisher","unstructured":"May M. Eight ways machine learning is assisting medicine. Nat Med. 2021;27(1):2\u20133. https:\/\/doi.org\/10.1038\/s41591-020-01197-2.","DOI":"10.1038\/s41591-020-01197-2"},{"key":"10297_CR2","doi-asserted-by":"publisher","first-page":"1129380","DOI":"10.3389\/fonc.2023.1129380","volume":"13","author":"SC Lu","year":"2023","unstructured":"Lu SC, Swisher CL, Chung C, Jaffray D, Sidey-Gibbons C. 
On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front Oncol. 2023;13:1129380. https:\/\/doi.org\/10.3389\/fonc.2023.1129380.","journal-title":"Front. Oncol."},{"key":"10297_CR3","doi-asserted-by":"publisher","DOI":"10.1186\/s12911-020-01332-6","author":"J Amann","year":"2020","unstructured":"Amann J, Blasimme A, Vayena E, Frey D, Madai VI. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020. https:\/\/doi.org\/10.1186\/s12911-020-01332-6.","journal-title":"BMC Med. Inform. Decis. Mak."},{"issue":"9","key":"10297_CR4","doi-asserted-by":"publisher","first-page":"4394","DOI":"10.3390\/ijms22094394","volume":"22","author":"AJ Banegas-Luna","year":"2021","unstructured":"Banegas-Luna AJ, Pe\u00f1a-Garc\u00eda J, Iftene A, Guadagni F, Ferroni P, Scarpato N, et al. Towards the interpretability of machine learning predictions for medical applications targeting personalised therapies: a cancer case survey. Int J Mol Sci. 2021;22(9):4394. https:\/\/doi.org\/10.3390\/ijms22094394.","journal-title":"Int. J. Mol. Sci."},{"key":"10297_CR5","doi-asserted-by":"crossref","unstructured":"Sokol K, Flach P. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. FAT* \u201920. New York, NY, USA: Association for Computing Machinery; 2020. pp. 56\u201367.","DOI":"10.1145\/3351095.3372870"},{"key":"10297_CR6","doi-asserted-by":"crossref","unstructured":"Coroama L, Groza A. Evaluation metrics in explainable artificial intelligence (XAI). In: Guarda T, Portela F, Augusto MF, editors. Advanced Research in Technologies, Information, Innovation and Sustainability. Cham: Springer Nature Switzerland; 2022. pp. 
401\u201313.","DOI":"10.1007\/978-3-031-20319-0_30"},{"key":"10297_CR7","doi-asserted-by":"publisher","first-page":"688969","DOI":"10.3389\/fdata.2021.688969","volume":"4","author":"V Belle","year":"2021","unstructured":"Belle V, Papantonis I. Principles and practice of explainable machine learning. Front Big Data. 2021;4:688969. https:\/\/doi.org\/10.3389\/fdata.2021.688969.","journal-title":"Front Big Data"},{"key":"10297_CR8","doi-asserted-by":"crossref","unstructured":"Hase P, Bansal M. Evaluating explainable AI: which algorithmic explanations help users predict model behavior? In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics; 2020. pp. 5540\u201352. https:\/\/aclanthology.org\/2020.acl-main.491.","DOI":"10.18653\/v1\/2020.acl-main.491"},{"issue":"1","key":"10297_CR9","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1609\/hcomp.v7i1.5285","volume":"7","author":"G Bansal","year":"2019","unstructured":"Bansal G, Nushi B, Kamar E, Lasecki WS, Weld DS, Horvitz E. Beyond accuracy: the role of mental models in human-AI team performance. Proceedings of the AAAI Conference on Human Computation and Crowdsourcing. 2019;7(1):2\u201311. https:\/\/doi.org\/10.1609\/hcomp.v7i1.5285.","journal-title":"Proceedings of the AAAI Conference on Human Computation and Crowdsourcing."},{"issue":"2","key":"10297_CR10","doi-asserted-by":"publisher","first-page":"234","DOI":"10.1177\/0272989X16662654","volume":"37","author":"P Ferroni","year":"2016","unstructured":"Ferroni P, Zanzotto FM, Scarpato N, Riondino S, Nanni U, Roselli M, et al. Risk assessment for venous thromboembolism in chemotherapy-treated ambulatory cancer patients. Med Decis Making. 2016;37(2):234\u201342. https:\/\/doi.org\/10.1177\/0272989X16662654.","journal-title":"Med. Decis. Making"},{"key":"10297_CR11","doi-asserted-by":"publisher","unstructured":"Gilpin LH, Bau D, Yuan BZ, Bajwa A, Specter M, Kagal L. 
Explaining explanations: an overview of interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy. 2018. pp. 80\u20139. https:\/\/doi.org\/10.1109\/DSAA.2018.00018.","DOI":"10.1109\/DSAA.2018.00018"},{"key":"10297_CR12","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1613\/jair.1.12228","volume":"70","author":"N Burkart","year":"2021","unstructured":"Burkart N, Huber MF. A survey on the explainability of supervised machine learning. J Artif Intell Res. 2021;70:245\u2013317. https:\/\/doi.org\/10.1613\/jair.1.12228.","journal-title":"J Artif Intell Res"},{"key":"10297_CR13","doi-asserted-by":"crossref","unstructured":"Arrieta AB, D\u00edaz-Rodr\u00edguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. 2020;58:82\u2013115.","DOI":"10.1016\/j.inffus.2019.12.012"},{"issue":"1","key":"10297_CR14","doi-asserted-by":"publisher","first-page":"18","DOI":"10.3390\/e23010018","volume":"23","author":"P Linardatos","year":"2020","unstructured":"Linardatos P, Papastefanopoulos V, Kotsiantis S. Explainable AI: a review of machine learning interpretability methods. Entropy. 2020;23(1):18.","journal-title":"Entropy."},{"key":"10297_CR15","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101805","author":"S Ali","year":"2023","unstructured":"Ali S, Abuhmed T, El-Sappagh S, Muhammad K, Alonso-Moral JM, Confalonieri R, et al. Explainable artificial intelligence (XAI): what we know and what is left to attain trustworthy artificial intelligence. Inf Fusion. 2023. https:\/\/doi.org\/10.1016\/j.inffus.2023.101805.","journal-title":"Inf Fusion"},{"key":"10297_CR16","doi-asserted-by":"publisher","unstructured":"Zhou J, Gandomi AH, Chen F, Holzinger A. Evaluating the quality of machine learning explanations: a survey on methods and metrics. Electronics. 
2021;10(5):593. https:\/\/doi.org\/10.3390\/electronics10050593.","DOI":"10.3390\/electronics10050593"},{"key":"10297_CR17","doi-asserted-by":"publisher","first-page":"113941","DOI":"10.1016\/j.eswa.2020.113941","volume":"165","author":"M Moradi","year":"2021","unstructured":"Moradi M, Samwald M. Post-hoc explanation of black-box classifiers using confident itemsets. Expert Syst Appl. 2021;165:113941. https:\/\/doi.org\/10.1016\/j.eswa.2020.113941.","journal-title":"Expert Syst. Appl."},{"key":"10297_CR18","volume-title":"C4.5: programs for machine learning","author":"JR Quinlan","year":"2014","unstructured":"Quinlan JR. C4.5: programs for machine learning. Elsevier; 2014."},{"issue":"3","key":"10297_CR19","doi-asserted-by":"publisher","first-page":"31","DOI":"10.1145\/3236386.3241340","volume":"16","author":"ZC Lipton","year":"2018","unstructured":"Lipton ZC. The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue. 2018;16(3):31\u201357.","journal-title":"Queue."},{"issue":"2","key":"10297_CR20","doi-asserted-by":"publisher","first-page":"110787","DOI":"10.1016\/j.ejrad.2023.110787","volume":"162","author":"K Borys","year":"2023","unstructured":"Borys K, Schmitt YA, Nauta M, Seifert C, Kr\u00e4mer N, Friedrich CM, et al. Explainable AI in medical imaging: an overview for clinical practitioners - Saliency-based XAI approaches. Eur J Radiol. 2023;162:110787. https:\/\/doi.org\/10.1016\/j.ejrad.2023.110787.","journal-title":"Eur. J. Radiol."},{"issue":"9","key":"10297_CR21","doi-asserted-by":"publisher","first-page":"1342","DOI":"10.1038\/s41591-018-0107-6","volume":"24","author":"J De Fauw","year":"2018","unstructured":"De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 
2018;24(9):1342\u201350.","journal-title":"Nat. Med."},{"key":"10297_CR22","doi-asserted-by":"crossref","unstructured":"Ribeiro MT, Singh S, Guestrin C. \u201cWhy should I trust you?\u201d: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD \u201916. New York, NY, USA: Association for Computing Machinery; 2016. pp. 1135\u201344.","DOI":"10.1145\/2939672.2939778"},{"key":"10297_CR23","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v32i1.11491","volume-title":"Proceedings of the AAAI Conference on Artificial Intelligence.","author":"MT Ribeiro","year":"2018","unstructured":"Ribeiro MT, Singh S, Guestrin C. Anchors: high-precision model-agnostic explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2018. https:\/\/doi.org\/10.1609\/aaai.v32i1.11491."},{"issue":"3","key":"10297_CR24","doi-asserted-by":"publisher","first-page":"725","DOI":"10.2337\/diacare.26.3.725","volume":"26","author":"J Lindstr\u00f6m","year":"2003","unstructured":"Lindstr\u00f6m J, Tuomilehto J. The diabetes risk score: a practical tool to predict type 2 diabetes risk. Diabetes Care. 2003;26(3):725\u201331. https:\/\/doi.org\/10.2337\/diacare.26.3.725.","journal-title":"Diabetes Care"},{"key":"10297_CR25","doi-asserted-by":"crossref","unstructured":"Khorana AA, Kuderer NM, Culakova E, Lyman GH, Francis CW. Development and validation of a predictive model for chemotherapy-associated thrombosis. Blood. 2008;111(10):4902\u20137.","DOI":"10.1182\/blood-2007-10-116327"},{"issue":"2","key":"10297_CR26","doi-asserted-by":"publisher","first-page":"346","DOI":"10.1097\/ta.0b013e3181961c35","volume":"66","author":"TC Nunez","year":"2009","unstructured":"Nunez TC, Voskresensky IV, Dossett LA, Shinall R, Dutton WD, Cotton BA. Early prediction of massive transfusion in trauma: simple as ABC (Assessment of Blood Consumption)? 
The Journal of Trauma: Injury, Infection, and Critical Care. 2009;66(2):346\u201352. https:\/\/doi.org\/10.1097\/ta.0b013e3181961c35.","journal-title":"The Journal of Trauma: Injury, Infection, and Critical Care."},{"key":"10297_CR27","first-page":"2211","volume":"12","author":"M G\u00f6nen","year":"2011","unstructured":"G\u00f6nen M, Alpayd\u0131n E. Multiple kernel learning algorithms. J Mach Learn Res. 2011;12:2211\u201368.","journal-title":"J Mach Learn Res"},{"key":"10297_CR28","doi-asserted-by":"publisher","first-page":"1","DOI":"10.3390\/cancers11030328","volume":"11","author":"P Ferroni","year":"2019","unstructured":"Ferroni P, Zanzotto FM, Riondino S, Scarpato N, Guadagni F, Roselli M. Breast cancer prognosis using a machine learning approach. Cancers. 2019;11:1\u20139. https:\/\/doi.org\/10.3390\/cancers11030328.","journal-title":"Cancers"},{"key":"10297_CR29","doi-asserted-by":"publisher","unstructured":"Ferroni P, Zanzotto FM, Scarpato N, Spila A, Fofi L, Egeo G, et al. Machine learning approach to predict medication overuse in migraine patients. Comput Struct Biotechnol J. 2020;18:1487\u201396. https:\/\/doi.org\/10.1016\/j.csbj.2020.06.006.","DOI":"10.1016\/j.csbj.2020.06.006"},{"issue":"2","key":"10297_CR30","doi-asserted-by":"publisher","first-page":"541","DOI":"10.1021\/acs.chemrestox.0c00373","volume":"34","author":"L Wu","year":"2021","unstructured":"Wu L, Huang R, Tetko IV, Xia Z, Xu J, Tong W. Trade-off predictivity and explainability for machine-learning powered predictive toxicology: an in-depth investigation with Tox21 data sets. Chem Res Toxicol. 2021;34(2):541\u20139.","journal-title":"Chem. Res. Toxicol."},{"key":"10297_CR31","doi-asserted-by":"publisher","unstructured":"Nauta M, Trienes J, Pathak S, Nguyen E, Peters M, Schmitt Y, et\u00a0al. From anecdotal evidence to quantitative evaluation methods: a systematic review on evaluating explainable AI. ACM Comput Surv. 2023;55(13). 
https:\/\/doi.org\/10.1145\/3583558.","DOI":"10.1145\/3583558"}],"container-title":["Cognitive Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12559-024-10297-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s12559-024-10297-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s12559-024-10297-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,5]],"date-time":"2024-07-05T09:46:07Z","timestamp":1720172767000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s12559-024-10297-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,31]]},"references-count":31,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["10297"],"URL":"https:\/\/doi.org\/10.1007\/s12559-024-10297-x","relation":{},"ISSN":["1866-9956","1866-9964"],"issn-type":[{"value":"1866-9956","type":"print"},{"value":"1866-9964","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,31]]},"assertion":[{"value":"19 July 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 May 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"In this paper, we used three distinct datasets obtained from reputable research institutions (BIOBIM, University of Rome Tor Vergata, IRCCS 
San Raffaele Roma). The data collection, compilation, and storage procedures adhered to the current regulations, including the General Data Protection Regulation (GDPR), and were conducted in accordance with the ethical principles outlined in the Declaration of Helsinki.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Legal and Ethical Aspects"}},{"value":"The authors declare no competing interests.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of Interest"}}]}}