{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,7]],"date-time":"2026-05-07T15:11:41Z","timestamp":1778166701245,"version":"3.51.4"},"reference-count":47,"publisher":"MDPI AG","issue":"7","license":[{"start":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T00:00:00Z","timestamp":1750636800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"\u201cRomanian Hub for Artificial Intelligence-HRIA\u201d, Smart Growth, Digitization and Financial Instruments Program","award":["334906"],"award-info":[{"award-number":["334906"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Information"],"abstract":"<jats:p>Medical imaging is crucial for diagnosing, monitoring, and treating medical conditions. The medical reports of radiology images are the primary medium through which medical professionals can attest to their findings, but their writing is time-consuming and requires specialized clinical expertise. Therefore, the automated generation of radiography reports has the potential to improve and standardize patient care and significantly reduce the workload of clinicians. Through our work, we have designed and evaluated an end-to-end transformer-based method to generate accurate and factually complete radiology reports for X-ray images. Additionally, we are the first to introduce curriculum learning for end-to-end transformers in medical imaging and demonstrate its impact in obtaining improved performance. The experiments were conducted using the MIMIC-CXR-JPG database, the largest available chest X-ray dataset. 
The results obtained are comparable with the current state of the art on the natural language generation (NLG) metrics BLEU and ROUGE-L, while setting new state-of-the-art results on the example-averaged F1, F1-macro, and F1-micro metrics for clinical accuracy and on the METEOR metric, which is widely used for NLG.<\/jats:p>","DOI":"10.3390\/info16070524","type":"journal-article","created":{"date-parts":[[2025,6,23]],"date-time":"2025-06-23T07:42:34Z","timestamp":1750664554000},"page":"524","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-4889-8379","authenticated-orcid":false,"given":"Iustin","family":"S\u00eerbu","sequence":"first","affiliation":[{"name":"Faculty of Automatic Control and Computer Science, National University of Science and Technology Politehnica of Bucharest, 060042 Bucharest, Romania"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-3783-1659","authenticated-orcid":false,"given":"Iulia-Renata","family":"S\u00eerbu","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control and Computer Science, National University of Science and Technology Politehnica of Bucharest, 060042 Bucharest, Romania"},{"name":"School of Engineering, Zurich University of Applied Sciences, 8401 Winterthur, Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5634-7984","authenticated-orcid":false,"given":"Jasmina","family":"Bogojeska","sequence":"additional","affiliation":[{"name":"School of Engineering, Zurich University of Applied Sciences, 8401 Winterthur, 
Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7255-5537","authenticated-orcid":false,"given":"Traian","family":"Rebedea","sequence":"additional","affiliation":[{"name":"Faculty of Automatic Control and Computer Science, National University of Science and Technology Politehnica of Bucharest, 060042 Bucharest, Romania"},{"name":"NVIDIA, Santa Clara, CA 95051, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,6,23]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Jing, B., Xie, P., and Xing, E. (2017). On the automatic generation of medical imaging reports. arXiv.","DOI":"10.18653\/v1\/P18-1240"},{"key":"ref_2","first-page":"1537","article-title":"Hybrid retrieval-generation reinforced agent for medical image report generation","volume":"31","author":"Li","year":"2018","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Delrue, L., Gosselin, R., Ilsen, B., Van Landeghem, A., de Mey, J., and Duyck, P. (2011). Difficulties in the interpretation of chest radiography. Comparative Interpretation of CT and Standard Radiography of the Chest, Springer.","DOI":"10.1007\/978-3-540-79942-9_2"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Cao, Y., Cui, L., Zhang, L., Yu, F., Li, Z., and Xu, Y. (2023, January 7\u201314). MMTN: Multi-modal memory transformer network for image-report consistent medical report generation. Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA.","DOI":"10.1609\/aaai.v37i1.25100"},{"key":"ref_5","unstructured":"Wang, J., Yang, Z., Hu, X., Li, L., Lin, K., Gan, Z., Liu, Z., Liu, C., and Wang, L. (2022). Git: A generative image-to-text transformer for vision and language. arXiv."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Tanida, T., M\u00fcller, P., Kaissis, G., and Rueckert, D. 
(2023, January 17\u201324). Interactive and explainable region-guided radiology report generation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada.","DOI":"10.1109\/CVPR52729.2023.00718"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Bu, S., Li, T., Yang, Y., and Dai, Z. (2024, January 16\u201322). Instance-level Expert Knowledge and Aggregate Discriminative Attention for Radiology Report Generation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR52733.2024.01346"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"102714","DOI":"10.1016\/j.artmed.2023.102714","article-title":"Radiology report generation with medical knowledge and multilevel image-report alignment: A new method and its verification","volume":"146","author":"Zhao","year":"2023","journal-title":"Artif. Intell. Med."},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Johnson, A.E., Pollard, T.J., Greenbaum, N.R., Lungren, M.P., Deng, C.y., Peng, Y., Lu, Z., Mark, R.G., Berkowitz, S.J., and Horng, S. (2019). MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv.","DOI":"10.1038\/s41597-019-0322-0"},{"key":"ref_10","unstructured":"Banerjee, S., and Lavie, A. (2005, January 29). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and\/or Summarization, Ann Arbor, MI, USA."},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Papineni, K., Roukos, S., Ward, T., and Zhu, W.J. (2002, January 6\u201312). Bleu: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA.","DOI":"10.3115\/1073083.1073135"},{"key":"ref_12","unstructured":"Lin, C.Y. (2004). 
Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out, Association for Computational Linguistics."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Cornia, M., Stefanini, M., Baraldi, L., and Cucchiara, R. (2020, January 13\u201319). Meshed-memory transformer for image captioning. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","DOI":"10.1109\/CVPR42600.2020.01059"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Nguyen, V.Q., Suganuma, M., and Okatani, T. (2022, January 23\u201327). Grit: Faster and better image captioning transformer using dual visual features. Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel.","DOI":"10.1007\/978-3-031-20059-5_10"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Zhang, X., Sun, X., Luo, Y., Ji, J., Zhou, Y., Wu, Y., Huang, F., and Ji, R. (2021, January 20\u201325). Rstnet: Captioning with adaptive attention on visual and non-visual words. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","DOI":"10.1109\/CVPR46437.2021.01521"},{"key":"ref_16","unstructured":"Yuan, L., Chen, D., Chen, Y.L., Codella, N., Dai, X., Gao, J., Hu, H., Huang, X., Li, B., and Li, C. (2021). Florence: A new foundation model for computer vision. arXiv."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Vinyals, O., Toshev, A., Bengio, S., and Erhan, D. (2015, January 7\u201312). Show and tell: A neural image caption generator. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA.","DOI":"10.1109\/CVPR.2015.7298935"},{"key":"ref_18","unstructured":"Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., and Bengio, Y. (2015, January 7\u20139). Show, attend and tell: Neural image caption generation with visual attention. 
Proceedings of the International Conference on Machine Learning, Lille, France."},{"key":"ref_19","unstructured":"Li, C.Y., Liang, X., Hu, Z., and Xing, E.P. (February, January 27). Knowledge-driven encode, retrieve, paraphrase for medical image report generation. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_20","unstructured":"Srinivasan, P., Thapar, D., Bhavsar, A., and Nigam, A. (December, January 30). Hierarchical X-ray report generation via pathology tags and multi head attention. Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Yin, C., Qian, B., Wei, J., Li, X., Zhang, X., Li, Y., and Zheng, Q. (2019, January 8\u201311). Automatic generation of medical imaging diagnostic report with hierarchical recurrent neural network. Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China.","DOI":"10.1109\/ICDM.2019.00083"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","article-title":"Long short-term memory","volume":"9","author":"Hochreiter","year":"1997","journal-title":"Neural Comput."},{"key":"ref_23","first-page":"5998","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","unstructured":"Miura, Y., Zhang, Y., Tsai, E.B., Langlotz, C.P., and Jurafsky, D. (2020). Improving factual completeness and consistency of image-to-text radiology report generation. arXiv.","DOI":"10.18653\/v1\/2021.naacl-main.416"},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Lovelace, J., and Mortazavi, B. (2020, January 16\u201320). Learning to generate clinically coherent chest X-ray reports. 
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online.","DOI":"10.18653\/v1\/2020.findings-emnlp.110"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Nguyen, H.T., Nie, D., Badamdorj, T., Liu, Y., Zhu, Y., Truong, J., and Cheng, L. (2021). Automated generation of accurate & fluent medical X-ray reports. arXiv.","DOI":"10.18653\/v1\/2021.emnlp-main.288"},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Chen, Z., Song, Y., Chang, T.H., and Wan, X. (2020, January 16\u201320). Generating Radiology Reports via Memory-driven Transformer. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online.","DOI":"10.18653\/v1\/2020.emnlp-main.112"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"2803","DOI":"10.1109\/TMI.2022.3171661","article-title":"Automated radiographic report generation purely on transformer: A multicriteria supervised approach","volume":"41","author":"Wang","year":"2022","journal-title":"IEEE Trans. Med Imaging"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"102633","DOI":"10.1016\/j.artmed.2023.102633","article-title":"Improving chest X-ray report generation by leveraging warm starting","volume":"144","author":"Nicolson","year":"2023","journal-title":"Artif. Intell. Med."},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bengio, Y., Louradour, J., Collobert, R., and Weston, J. (2009, January 14\u201318). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada.","DOI":"10.1145\/1553374.1553380"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Subramanian, S., Rajeswar, S., Dutil, F., Pal, C., and Courville, A. (2017, January 3). Adversarial generation of natural language. 
Proceedings of the 2nd Workshop on Representation Learning for NLP, Vancouver, BC, Canada.","DOI":"10.18653\/v1\/W17-2629"},{"key":"ref_32","unstructured":"Spitkovsky, V.I., Alshawi, H., and Jurafsky, D. (2009). Baby Steps: How \u201cLess is More\u201d in unsupervised dependency parsing. NIPS: Grammar Induction, Representation of Language and Language Learning, Neural Information Processing Systems Foundation."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Chang, E., Yeh, H.S., and Demberg, V. (2021). Does the order of training samples matter? improving neural data-to-text generation with curriculum learning. arXiv.","DOI":"10.18653\/v1\/2021.eacl-main.61"},{"key":"ref_34","unstructured":"Lotter, W., Sorensen, G., and Cox, D. (2017, January 14). A multi-scale CNN and curriculum learning strategy for mammogram classification. Proceedings of the Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Held in Conjunction with MICCAI 2017, Qu\u00e9bec City, QC, Canada. Proceedings 3."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"102273","DOI":"10.1016\/j.media.2021.102273","article-title":"Curriculum learning for improved femur fracture classification: Scheduling data with prior knowledge and uncertainty","volume":"75","author":"Mateus","year":"2022","journal-title":"Med. Image Anal."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"136","DOI":"10.1016\/j.media.2019.04.009","article-title":"Automatic CNN-based detection of cardiac MR motion artefacts using k-space data augmentation and curriculum learning","volume":"55","author":"Oksuz","year":"2019","journal-title":"Med. Image Anal."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Wei, J., Suriawinata, A., Ren, B., Liu, X., Lisovsky, M., Vaickus, L., Brown, C., Baker, M., Nasir-Moin, M., and Tomita, N. (2021, January 5\u20139). 
Learn like a pathologist: Curriculum learning by annotator agreement for histopathology image classification. Proceedings of the IEEE\/CVF Winter Conference on Applications of Computer Vision, Virtual.","DOI":"10.1109\/WACV48630.2021.00252"},{"key":"ref_38","unstructured":"Alsharid, M., El-Bouri, R., Sharma, H., Drukker, L., Papageorghiou, A.T., and Noble, J.A. (2020, January 4\u20138). A curriculum learning based approach to captioning ultrasound images. Proceedings of the Medical Ultrasound, and Preterm, Perinatal and Paediatric Image Analysis: First International Workshop, ASMUS 2020, and 5th International Workshop, PIPPI 2020, Held in Conjunction with MICCAI 2020, Lima, Peru. Proceedings 1."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1108\/eb026526","article-title":"A statistical interpretation of term specificity and its application in retrieval","volume":"28","year":"1972","journal-title":"J. Doc."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Liu, F., Ge, S., Zou, Y., and Wu, X. (2022). Competence-based multimodal curriculum learning for medical report generation. arXiv.","DOI":"10.18653\/v1\/2021.acl-long.234"},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Smit, A., Jain, S., Rajpurkar, P., Pareek, A., Ng, A.Y., and Lungren, M.P. (2020). CheXbert: Combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-main.117"},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"317","DOI":"10.1038\/s41597-019-0322-0","article-title":"MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports","volume":"6","author":"Johnson","year":"2019","journal-title":"Sci. Data"},{"key":"ref_43","unstructured":"Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., and Shpanskaya, K. (February, January 27). 
Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA."},{"key":"ref_44","unstructured":"Endo, M., Krishnan, R., Krishna, V., Ng, A.Y., and Rajpurkar, P. (2021, January 6\u20137). Retrieval-based chest x-ray report generation using a pre-trained contrastive language-image model. Proceedings of the Machine Learning for Health, Virtual."},{"key":"ref_45","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_46","unstructured":"Boag, W., Hsu, T.M.H., McDermott, M., Berner, G., Alsentzer, E., and Szolovits, P. (2020, January 11). Baselines for chest X-ray report generation. Proceedings of the Machine Learning for Health Workshop, Virtual."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/jamia\/ocv080","article-title":"Preparing a collection of radiology examinations for distribution and retrieval","volume":"23","author":"Kohli","year":"2016","journal-title":"J. Am. Med. Inform. Assoc."}],"container-title":["Information"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/7\/524\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:56:58Z","timestamp":1760032618000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2078-2489\/16\/7\/524"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,6,23]]},"references-count":47,"journal-issue":{"issue":"7","published-online":{"date-parts":[[2025,7]]}},"alternative-id":["info16070524"],"URL":"https:\/\/doi.org\/10.3390\/info16070524","relation":{},"ISSN":["2078-2489"],"issn-type":[{"value":"2078-2489","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,6,23]]}}}