{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,24]],"date-time":"2025-12-24T14:49:02Z","timestamp":1766587742686,"version":"3.41.0"},"reference-count":48,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2025,5,22]],"date-time":"2025-05-22T00:00:00Z","timestamp":1747872000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"crossref","award":["62176217, 62206624"],"award-info":[{"award-number":["62176217, 62206624"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100002858","name":"Postdoctoral Science Foundation of China","doi-asserted-by":"crossref","award":["2023M732428"],"award-info":[{"award-number":["2023M732428"]}],"id":[{"id":"10.13039\/501100002858","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Sichuan Science and Technology Program","award":["2024ZYD0272, 2025ZNSFSC0456"],"award-info":[{"award-number":["2024ZYD0272, 2025ZNSFSC0456"]}]},{"name":"Doctoral Research Innovation Project","award":["21E025"],"award-info":[{"award-number":["21E025"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2025,6,30]]},"abstract":"<jats:p>Medical report generation which extracts pathological information within medical images and subsequently produces diagnostic text autonomously aims to alleviate the workload of medical experts and offers auxiliary support in diagnoses. 
Although some preliminary progress has been made, several limitations persist, including a lack of specificity in the extracted visual features, insufficient consideration of cross-modal alignment, and the extensive preparatory work required for prior knowledge. To address these issues, in this article we propose a novel deep label-guided graph convolutional network for medical report generation, which uses disease labels to guide the extraction of pathological information from medical images. Specifically, we first construct a graph convolutional network that guides the model to extract specific visual features based on disease labels, allowing us to selectively extract disease-specific information residing in medical images. We then develop a cross-modal alignment module to guide alignment across the medical image, diagnostic report, and disease label, which enables more accurate generation with more precise descriptions. In addition, we build a pre-constructed relational matrix that guides the report generation model to learn the relationship between visual features and disease types, further reducing the intensive workload with minimal additional effort. Extensive experiments on three benchmark datasets, i.e., IU X-ray, MIMIC-CXR, and COV-CTR, demonstrate that the proposed method outperforms recent state-of-the-art medical report generation methods: ours shows a 9.2% improvement in BLEU-4 score on the IU X-ray dataset, and both BLEU-4 and CIDEr scores improve by 6.31% on the MIMIC-CXR dataset. 
Additionally, the results show that it can be easily applied and extended to medical image report generation with different modalities.<\/jats:p>","DOI":"10.1145\/3722226","type":"journal-article","created":{"date-parts":[[2025,3,10]],"date-time":"2025-03-10T16:21:48Z","timestamp":1741623708000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":1,"title":["Deep Disease Label-guided Graph Convolutional Network for Medical Report Generation"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0671-8182","authenticated-orcid":false,"given":"Liming","family":"Xu","sequence":"first","affiliation":[{"name":"China West Normal University, Sichuan, Nanchong, China and Sichuan Artificial Intelligence Research Institute, Yibin, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9891-8803","authenticated-orcid":false,"given":"Yongheng","family":"Wang","sequence":"additional","affiliation":[{"name":"China West Normal University, Nanchong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9135-5098","authenticated-orcid":false,"given":"Chunlin","family":"He","sequence":"additional","affiliation":[{"name":"China West Normal University, Nanchong, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3075-4758","authenticated-orcid":false,"given":"Quan","family":"Tang","sequence":"additional","affiliation":[{"name":"China West Normal University, Nanchong, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5892-2372","authenticated-orcid":false,"given":"Xianhua","family":"Zeng","sequence":"additional","affiliation":[{"name":"Chongqing University of Posts and Telecommunications, Chongqing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6551-3884","authenticated-orcid":false,"given":"Jiancheng","family":"Lv","sequence":"additional","affiliation":[{"name":"Sichuan University, Chengdu, 
China"}]}],"member":"320","published-online":{"date-parts":[[2025,5,22]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41591-022-01981-2"},{"issue":"4","key":"e_1_3_3_3_2","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1007\/s10462-023-10652-8","article-title":"Cost-sensitive learning for imbalanced medical data: A review","volume":"57","author":"Araf Imane","year":"2024","unstructured":"Imane Araf, Ali Idri, and Ikram Chairi. 2024. Cost-sensitive learning for imbalanced medical data: A review. Artificial Intelligence Review 57, 4 (2024), 80\u2013151.","journal-title":"Artificial Intelligence Review"},{"key":"e_1_3_3_4_2","first-page":"119","volume-title":"In British Machine Vision Conference","volume":"1","author":"Balntas Vassileios","year":"2016","unstructured":"Vassileios Balntas, Edgar Riba, Daniel Ponsa, and Krystian Mikolajczyk. 2016. Learning local feature descriptors with triplets and shallow convolutional neural networks. In British Machine Vision Conference, Vol. 1, 119\u2013129."},{"issue":"1","key":"e_1_3_3_5_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3608954","article-title":"Multi-view graph convolutional networks with differentiable node selection","volume":"18","author":"Chen Zhaoliang","year":"2023","unstructured":"Zhaoliang Chen, Lele Fu, Shunxin Xiao, Shiping Wang, Claudia Plant, and Wenzhong Guo. 2023. Multi-view graph convolutional networks with differentiable node selection. ACM Transactions on Knowledge Discovery from Data 18, 1 (2023), 1\u201321.","journal-title":"ACM Transactions on Knowledge Discovery from Data"},{"key":"e_1_3_3_6_2","first-page":"5904","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Chen Zhihong","year":"2021","unstructured":"Zhihong Chen, Yaling Shen, Yan Song, and Xiang Wan. 2021. Cross-modal memory networks for radiology report generation. 
In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 5904\u20135914."},{"key":"e_1_3_3_7_2","first-page":"1439","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Chen Zhihong","year":"2020","unstructured":"Zhihong Chen, Yan Song, Tsunghui Chang, and Xiang Wan. 2020. Generating radiology reports via memory-driven transformer. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 1439\u20131449."},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocv080"},{"key":"e_1_3_3_9_2","first-page":"85","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing","author":"Denkowski Michael","year":"2021","unstructured":"Michael Denkowski and Alon Lavie. 2021. Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 85\u201391."},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2024.3371894"},{"key":"e_1_3_3_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3322425"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2020.101872"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2023.104496"},{"key":"e_1_3_3_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00745"},{"key":"e_1_3_3_15_2","first-page":"19809","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Huang Zhongzhen","year":"2023","unstructured":"Zhongzhen Huang, Xiaofan Zhang, and Shaoting Zhang. 2023. Kiut: Knowledge-injected u-transformer for radiology report generation. 
In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 19809\u201319818."},{"key":"e_1_3_3_16_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301590"},{"key":"e_1_3_3_17_2","first-page":"2577","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Jing Baoyu","year":"2018","unstructured":"Baoyu Jing, Pengtao Xie, and Eric Xing. 2018. On the automatic generation of medical imaging reports. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2577\u20132586."},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3565575"},{"key":"e_1_3_3_19_2","first-page":"4171","volume-title":"Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics","author":"Kenton Jacob","year":"2019","unstructured":"Jacob Kenton and Lee Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, 4171\u20134186."},{"key":"e_1_3_3_20_2","doi-asserted-by":"publisher","DOI":"10.1145\/3643034"},{"key":"e_1_3_3_21_2","first-page":"20656","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Mingjie","year":"2022","unstructured":"Mingjie Li, Wenjia Cai, and Xiaojun Chang. 2022. Cross-modal clinical graph transformer for ophthalmic report generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 20656\u201320665."},{"key":"e_1_3_3_22_2","first-page":"3334","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Li Mingjie","year":"2023","unstructured":"Mingjie Li, Bingqian Lin, Zicong Chen, Haokun Lin, Xiaodan Liang, and Xiaojun Chang. 2023. 
Dynamic graph enhanced contrastive learning for chest x-ray report generation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 3334\u20133343."},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11280-022-01013-6"},{"key":"e_1_3_3_24_2","first-page":"74","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Lin Chinyew","year":"2004","unstructured":"Chinyew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 74\u201381."},{"key":"e_1_3_3_25_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR46437.2021.01354"},{"key":"e_1_3_3_26_2","first-page":"16266","article-title":"Auto-encoding knowledge graph for unsupervised medical report generation","volume":"34","author":"Liu Fenglin","year":"2021","unstructured":"Fenglin Liu, Chenyu You, Xian Wu, Shen Ge, and Xu Sun. 2021. Auto-encoding knowledge graph for unsupervised medical report generation. In Advances in Neural Information Processing Systems, Vol. 34, 16266\u201316279.","journal-title":"Advances in Neural Information Processing Systems, Vol"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/TMM.2023.3342691"},{"issue":"10","key":"e_1_3_3_28_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3522747","article-title":"A survey on deep learning and explainability for automatic report generation from medical images","volume":"54","author":"Messina Pablo","year":"2022","unstructured":"Pablo Messina, Pablo Pino, Denis Parra, Alvaro Soto, Cecilia Besa, Claudia Prieto, and Daniel Capurro. 2022. A survey on deep learning and explainability for automatic report generation from medical images. 
ACM Computing Surveys 54, 10 (2022), 1\u201340.","journal-title":"ACM Computing Surveys"},{"key":"e_1_3_3_29_2","first-page":"1448","article-title":"S3-Net: A self-supervised dual-stream network for radiology report generation","volume":"28","author":"Pan Renjie","year":"2023","unstructured":"Renjie Pan, Ruisheng Ran, Wei Hu, Wenfeng Zhang, Qibing Qin, and Shaoguo Cui. 2023. S3-Net: A self-supervised dual-stream network for radiology report generation. IEEE Journal of Biomedical and Health Informatics 28 (2023), 1448\u20131459.","journal-title":"IEEE Journal of Biomedical and Health Informatics"},{"key":"e_1_3_3_30_2","first-page":"311","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Weijing Zhu. 2002. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 311\u2013318."},{"key":"e_1_3_3_31_2","first-page":"448","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics","author":"Qin Han","year":"2022","unstructured":"Han Qin and Yan Song. 2022. Reinforced cross-modal alignment for radiology report generation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 448\u2013458."},{"key":"e_1_3_3_32_2","first-page":"6105","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. Efficientnet: Rethinking model scaling for convolutional neural networks. 
In Proceedings of the International Conference on Machine Learning, 6105\u20136114."},{"issue":"1","key":"e_1_3_3_33_2","article-title":"Medical image description based on multimodal auxiliary signals and transformer","volume":"2024","author":"Tan Yun","year":"2024","unstructured":"Yun Tan, Chunzhi Li, Jiaohua Qin, Youyuan Xue, and Xuyu Xiang. 2024. Medical image description based on multimodal auxiliary signals and transformer. International Journal of Intelligent Systems 2024, 1 (2024), 6680546.","journal-title":"International Journal of Intelligent Systems"},{"key":"e_1_3_3_34_2","first-page":"4566","volume-title":"Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition","author":"Vedantam Ramakrishna","year":"2015","unstructured":"Ramakrishna Vedantam, C. Zitnick, Devi Parikh, and Virgnia Tech. 2015. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, 4566\u20134575."},{"key":"e_1_3_3_35_2","doi-asserted-by":"crossref","first-page":"2199","DOI":"10.1109\/JBHI.2024.3354712","article-title":"CAMANet: Class activation map guided attention network for radiology report generation","volume":"28","author":"Wang Jun","year":"2024","unstructured":"Jun Wang, Abhir Bhalerao, Terry Yin, Simon See, and Yulan He. 2024. CAMANet: Class activation map guided attention network for radiology report generation. IEEE Journal of Biomedical and Health Informatics 28 (2024), 2199\u20132210.","journal-title":"IEEE Journal of Biomedical and Health Informatics"},{"key":"e_1_3_3_36_2","first-page":"486","volume-title":"AMIA Summits on Translational Science Proceedings","volume":"2022","author":"Wang Song","year":"2022","unstructured":"Song Wang, Liyan Tang, Mingquan Lin, George Shih, Ying Ding, and Yifan Peng. 2022. Prior knowledge enhances radiology report generation. 
AMIA Summits on Translational Science Proceedings 2022 (2022), 486\u2013495."},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.inffus.2023.101817"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neucom.2023.126287"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2023.102798"},{"key":"e_1_3_3_40_2","first-page":"72","volume-title":"Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention","author":"You Di","year":"2021","unstructured":"Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, and Xian Wu. 2021. Aligntransformer: Hierarchical alignment of visual regions and disease tags for medical report generation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 72\u201382."},{"key":"e_1_3_3_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.cmpb.2023.107979"},{"key":"e_1_3_3_42_2","doi-asserted-by":"crossref","first-page":"105700","DOI":"10.1016\/j.cmpb.2020.105700","article-title":"Generating diagnostic report for medical image by high-middle-level visual information incorporation on double deep learning models","volume":"197","author":"Zeng Xianhua","year":"2020","unstructured":"Xianhua Zeng, Li Wen, Yang Xu, and Conghui Ji. 2020. Generating diagnostic report for medical image by high-middle-level visual information incorporation on double deep learning models. Computer Methods and Programs in Biomedicine 197 (2020), 105700\u2013105710.","journal-title":"Computer Methods and Programs in Biomedicine"},{"issue":"21","key":"e_1_3_3_43_2","doi-asserted-by":"crossref","first-page":"11111","DOI":"10.3390\/app122111111","article-title":"Improving medical x-ray report generation by using knowledge graph","volume":"12","author":"Zhang Dehai","year":"2022","unstructured":"Dehai Zhang, Anquan Ren, Jiashu Liang, Qing Liu, Haoxing Wang, and Yu Ma. 2022. 
Improving medical x-ray report generation by using knowledge graph. Applied Sciences 12, 21 (2022), 11111\u201311125.","journal-title":"Applied Sciences"},{"key":"e_1_3_3_44_2","doi-asserted-by":"crossref","first-page":"904","DOI":"10.1109\/TMM.2023.3273390","article-title":"Semi-supervised medical report generation via graph-guided hybrid feature consistency","volume":"26","author":"Zhang Ke","year":"2023","unstructured":"Ke Zhang, Hanliang Jiang, Jian Zhang, Qingming Huang, Jianping Fan, Jun Yu, and Weidong Han. 2023. Semi-supervised medical report generation via graph-guided hybrid feature consistency. IEEE Transactions on Multimedia 26 (2023), 904\u2013915.","journal-title":"IEEE Transactions on Multimedia"},{"key":"e_1_3_3_45_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01521"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2023.107522"},{"issue":"1","key":"e_1_3_3_47_2","doi-asserted-by":"crossref","first-page":"4542","DOI":"10.1038\/s41467-023-40260-7","article-title":"Knowledge-enhanced visual-language pre-training on chest radiology images","volume":"14","author":"Zhang Xiaoman","year":"2023","unstructured":"Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Weidi Xie, and Yanfeng Wang. 2023. Knowledge-enhanced visual-language pre-training on chest radiology images. 
Nature Communications 14, 1 (2023), 4542.","journal-title":"Nature Communications"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i07.6989"},{"key":"e_1_3_3_49_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2023.102714"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722226","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3722226","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:57:04Z","timestamp":1750298224000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3722226"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,22]]},"references-count":48,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2025,6,30]]}},"alternative-id":["10.1145\/3722226"],"URL":"https:\/\/doi.org\/10.1145\/3722226","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"type":"print","value":"1556-4681"},{"type":"electronic","value":"1556-472X"}],"subject":[],"published":{"date-parts":[[2025,5,22]]},"assertion":[{"value":"2024-09-14","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-28","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-22","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}