{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:49:28Z","timestamp":1773802168314,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Automatic medical report generation has the potential to support clinical diagnosis, reduce the workload of radiologists, and enhance diagnostic consistency. However, current evaluation metrics often fail to reflect the clinical reliability of generated reports. Overlap-based methods overlook fine-grained details (e.g., location, severity). Some diagnostic metrics are limited by fixed vocabularies or templates, reducing their ability to capture diverse clinical expressions. LLM-based metrics lack interpretable reasoning, limiting trust in clinical settings. Therefore, in this paper we propose a Granular Explainable Multi-Agent Score (GEMA-Score), which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs stable calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. 
Extensive experiments show that GEMA-Score achieves the highest correlation with human experts on public datasets (Kendall = 0.69 on ReXVal; 0.45 on RadEvalX), demonstrating improved clinical scoring reliability.<\/jats:p>","DOI":"10.1609\/aaai.v40i15.38302","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:21:09Z","timestamp":1773793269000},"page":"13025-13033","source":"Crossref","is-referenced-by-count":0,"title":["GEMA-Score: Granular Explainable Multi-Agent Scoring Framework for Radiology Report Evaluation"],"prefix":"10.1609","volume":"40","author":[{"given":"Zhenxuan","family":"Zhang","sequence":"first","affiliation":[]},{"given":"KinHei","family":"Lee","sequence":"additional","affiliation":[]},{"given":"Peiyuan","family":"Jing","sequence":"additional","affiliation":[]},{"given":"Weihang","family":"Deng","sequence":"additional","affiliation":[]},{"given":"Huichi","family":"Zhou","sequence":"additional","affiliation":[]},{"given":"Zihao","family":"Jin","sequence":"additional","affiliation":[]},{"given":"Jiahao","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Zhifan","family":"Gao","sequence":"additional","affiliation":[]},{"given":"Dominic C.","family":"Marshall","sequence":"additional","affiliation":[]},{"given":"Yingying","family":"Fang","sequence":"additional","affiliation":[]},{"given":"Guang","family":"Yang","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial 
Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38302\/42264","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38302\/42264","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:21:10Z","timestamp":1773793270000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38302"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"15","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i15.38302","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}