{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T21:05:45Z","timestamp":1775509545174,"version":"3.50.1"},"reference-count":41,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,5,3]],"date-time":"2024-05-03T00:00:00Z","timestamp":1714694400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,5,3]],"date-time":"2024-05-03T00:00:00Z","timestamp":1714694400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Start-up Fund for RAPs under the Strategic Hiring Scheme"},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["82171034"],"award-info":[{"award-number":["82171034"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Global STEM Professorship Scheme from HKSAR"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["npj Digit. Med."],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Fundus fluorescein angiography (FFA) is a crucial diagnostic tool for chorioretinal diseases, but its interpretation requires significant expertise and time. Prior studies have used Artificial Intelligence (AI)-based systems to assist FFA interpretation, but these systems lack user interaction and comprehensive evaluation by ophthalmologists. Here, we used large language models (LLMs) to develop an automated interpretation pipeline for both report generation and medical question-answering (QA) for FFA images. The pipeline comprises two parts: an image-text alignment module (Bootstrapping Language-Image Pre-training) for report generation and an LLM (Llama 2) for interactive QA. The model was developed using 654,343 FFA images with 9392 reports. It was evaluated both automatically, using language-based and classification-based metrics, and manually by three experienced ophthalmologists. The automatic evaluation of the generated reports demonstrated that the system can generate coherent and comprehensible free-text reports, achieving a BERTScore of 0.70 and F1 scores ranging from 0.64 to 0.82 for detecting top-5 retinal conditions. The manual evaluation revealed acceptable accuracy (68.3%, Kappa 0.746) and completeness (62.3%, Kappa 0.739) of the generated reports. The generated free-form answers were evaluated manually, with the majority meeting the ophthalmologists\u2019 criteria (error-free: 70.7%, complete: 84.0%, harmless: 93.7%, satisfied: 65.3%, Kappa: 0.762\u20130.834). This study introduces an innovative framework that combines multi-modal transformers and LLMs, enhancing ophthalmic image interpretation, and facilitating interactive communications during medical consultation.<\/jats:p>","DOI":"10.1038\/s41746-024-01101-z","type":"journal-article","created":{"date-parts":[[2024,5,3]],"date-time":"2024-05-03T14:02:06Z","timestamp":1714744926000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":56,"title":["FFA-GPT: an automated pipeline for fundus fluorescein angiography interpretation and question-answer"],"prefix":"10.1038","volume":"7","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1581-5045","authenticated-orcid":false,"given":"Xiaolan","family":"Chen","sequence":"first","affiliation":[]},{"given":"Weiyi","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Pusheng","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Ziwei","family":"Zhao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0914-7864","authenticated-orcid":false,"given":"Yingfeng","family":"Zheng","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6094-137X","authenticated-orcid":false,"given":"Danli","family":"Shi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6912-2810","authenticated-orcid":false,"given":"Mingguang","family":"He","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,5,3]]},"reference":[{"key":"1101_CR1","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1016\/j.survophthal.2023.05.004","volume":"68","author":"M Kvopka","year":"2023","unstructured":"Kvopka, M., Chan, W., Lake, S. R., Durkin, S. & Taranath, D. Fundus fluorescein angiography imaging of retinopathy of prematurity in infants: A review. Surv. Ophthalmol. 68, 849\u2013860 (2023).","journal-title":"Surv. Ophthalmol."},{"key":"1101_CR2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-020-71622-6","volume":"10","author":"K Jin","year":"2020","unstructured":"Jin, K. et al. Automatic detection of non-perfusion areas in diabetic macular edema from fundus fluorescein angiography for decision making using deep learning. Sci. Rep. 10, 15138 (2020).","journal-title":"Sci. Rep."},{"key":"1101_CR3","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1109\/TPAMI.2022.3148210","volume":"45","author":"M Stefanini","year":"2023","unstructured":"Stefanini, M. et al. From Show to Tell: A Survey on Deep Learning-Based Image Captioning. IEEE Trans. pattern Anal. Mach. Intell. 45, 539\u2013559 (2023).","journal-title":"IEEE Trans. pattern Anal. Mach. Intell."},{"key":"1101_CR4","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2023.104281","volume":"138","author":"Z Lin","year":"2023","unstructured":"Lin, Z. et al. Contrastive pre-training and linear interaction attention-based transformer for universal medical reports generation. J. Biomed. Inform. 138, 104281 (2023).","journal-title":"J. Biomed. Inform."},{"key":"1101_CR5","doi-asserted-by":"publisher","unstructured":"Li, M. et al. Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation. IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20624-20633 https:\/\/doi.org\/10.1109\/CVPR52688.2022.02000 (2022).","DOI":"10.1109\/CVPR52688.2022.02000"},{"key":"1101_CR6","doi-asserted-by":"publisher","first-page":"e917","DOI":"10.1016\/S2589-7500(23)00201-7","volume":"5","author":"BK Betzler","year":"2023","unstructured":"Betzler, B. K. et al. Large language models and their impact in ophthalmology. Lancet Digi. Health 5, e917\u2013e924 (2023).","journal-title":"Lancet Digi. Health"},{"key":"1101_CR7","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1016\/j.ajo.2023.05.024","volume":"254","author":"LZ Cai","year":"2023","unstructured":"Cai, L. Z. et al. Performance of Generative Large Language Models on Ophthalmology Board-Style Questions. Am. J. Ophthalmol. 254, 141\u2013149 (2023).","journal-title":"Am. J. Ophthalmol."},{"key":"1101_CR8","doi-asserted-by":"publisher","unstructured":"Xu, P. et al. Evaluation of a digital ophthalmologist app built by GPT4-V (ision). medRxiv, 2023.2011. 2027.23299056 https:\/\/doi.org\/10.1101\/2023.11.27.23299056 (2023).","DOI":"10.1101\/2023.11.27.23299056"},{"key":"1101_CR9","doi-asserted-by":"publisher","unstructured":"Touvron, H. et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 https:\/\/doi.org\/10.48550\/arXiv.2307.09288 (2023).","DOI":"10.48550\/arXiv.2307.09288"},{"key":"1101_CR10","doi-asserted-by":"publisher","unstructured":"Ge, J. et al. Development of a liver disease-Specific large language model chat Interface using retrieval augmented generation. Hepatology https:\/\/doi.org\/10.1097\/hep.0000000000000834 (2024).","DOI":"10.1097\/hep.0000000000000834"},{"key":"1101_CR11","doi-asserted-by":"publisher","unstructured":"Civettini, I. et al. Evaluating the performance of large language models in haematopoietic stem cell transplantation decision-making. Br. J. Haematol https:\/\/doi.org\/10.1111\/bjh.19200 (2023).","DOI":"10.1111\/bjh.19200"},{"key":"1101_CR12","doi-asserted-by":"publisher","DOI":"10.1038\/s41467-024-46411-8","volume":"15","author":"S Sandmann","year":"2024","unstructured":"Sandmann, S., Riepenhausen, S., Plagwitz, L. & Varghese, J. Systematic analysis of ChatGPT, Google search and Llama 2 for clinical decision support tasks. Nat. Commun. 15, 2050 (2024).","journal-title":"Nat. Commun."},{"key":"1101_CR13","doi-asserted-by":"crossref","unstructured":"Masalkhi, M. et al. A side-by-side evaluation of Llama 2 by meta with ChatGPT and its application in ophthalmology. Eye. 1\u20134 (2024).","DOI":"10.1038\/s41433-024-02972-y"},{"key":"1101_CR14","doi-asserted-by":"publisher","first-page":"2401","DOI":"10.1007\/s00417-021-05151-x","volume":"259","author":"M Chen","year":"2021","unstructured":"Chen, M. et al. Automatic detection of leakage point in central serous chorioretinopathy of fundus fluorescein angiography based on time sequence deep learning. Graefe\u2019s Arch. Clin. Exp. Ophthalmol. 259, 2401\u20132411 (2021).","journal-title":"Graefe\u2019s Arch. Clin. Exp. Ophthalmol."},{"key":"1101_CR15","doi-asserted-by":"publisher","first-page":"1852","DOI":"10.1136\/bjo-2022-321472","volume":"107","author":"Z Gao","year":"2023","unstructured":"Gao, Z. et al. Automatic interpretation and clinical evaluation for fundus fluorescein angiography images of diabetic retinopathy patients by deep learning. Br. J. Ophthalmol. 107, 1852\u20131858 (2023).","journal-title":"Br. J. Ophthalmol."},{"key":"1101_CR16","doi-asserted-by":"publisher","first-page":"1663","DOI":"10.1007\/s00417-021-05503-7","volume":"260","author":"Z Gao","year":"2022","unstructured":"Gao, Z. et al. End-to-end diabetic retinopathy grading based on fundus fluorescein angiography images using deep learning. Graefe\u2019s Arch. Clin. Exp. Ophthalmol. 260, 1663\u20131673 (2022).","journal-title":"Graefe\u2019s Arch. Clin. Exp. Ophthalmol."},{"key":"1101_CR17","doi-asserted-by":"publisher","first-page":"1405","DOI":"10.3390\/bioengineering10121405","volume":"10","author":"B Zhang","year":"2023","unstructured":"Zhang, B. et al. An Improved Microaneurysm Detection Model Based on SwinIR and YOLOv8. Bioengineering 10, 1405 (2023).","journal-title":"Bioengineering"},{"key":"1101_CR18","doi-asserted-by":"publisher","first-page":"779","DOI":"10.1007\/s00417-019-04575-w","volume":"258","author":"X Pan","year":"2020","unstructured":"Pan, X. et al. Multi-label classification of retinal lesions in diabetic retinopathy for automatic analysis of fundus fluorescein angiography based on deep learning. Graefe\u2019s Arch. Clin. Exp. Ophthalmol. 258, 779\u2013785 (2020).","journal-title":"Graefe\u2019s Arch. Clin. Exp. Ophthalmol."},{"key":"1101_CR19","doi-asserted-by":"publisher","first-page":"e51926","DOI":"10.2196\/51926","volume":"26","author":"X Liu","year":"2024","unstructured":"Liu, X. et al. Uncovering Language Disparity of ChatGPT on Retinal Vascular Disease Classification: Cross-Sectional Study. J. Med. Internet Res. 26, e51926 (2024).","journal-title":"J. Med. Internet Res."},{"key":"1101_CR20","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2023.102798","volume":"86","author":"S Yang","year":"2023","unstructured":"Yang, S. et al. Radiology report generation with a learned knowledge base and multi-modal alignment. Med Image Anal. 86, 102798 (2023).","journal-title":"Med Image Anal."},{"key":"1101_CR21","doi-asserted-by":"publisher","first-page":"1226225","DOI":"10.3389\/fpsyt.2023.1226225","volume":"14","author":"F Marino","year":"2023","unstructured":"Marino, F., Alby, F., Zucchermaglio, C. & Fatigante, M. Digital technology in medical visits: a critical review of its impact on doctor-patient communication. Front. Psychiatry 14, 1226225 (2023).","journal-title":"Front. Psychiatry"},{"key":"1101_CR22","doi-asserted-by":"publisher","first-page":"e222976","DOI":"10.1148\/radiol.222976","volume":"307","author":"JH Lee","year":"2023","unstructured":"Lee, J. H., Hong, H., Nam, G., Hwang, E. J. & Park, C. M. Effect of human-AI interaction on detection of malignant lung nodules on chest radiographs. Radiology 307, e222976 (2023).","journal-title":"Radiology"},{"key":"1101_CR23","doi-asserted-by":"publisher","first-page":"e2313674","DOI":"10.1001\/jamanetworkopen.2023.13674","volume":"6","author":"W-J Tong","year":"2023","unstructured":"Tong, W.-J. et al. Integration of artificial intelligence decision aids to reduce workload and enhance efficiency in thyroid nodule management. JAMA Netw. Open 6, e2313674\u2013e2313674 (2023).","journal-title":"JAMA Netw. Open"},{"key":"1101_CR24","doi-asserted-by":"publisher","first-page":"1006","DOI":"10.1038\/s42256-023-00711-8","volume":"5","author":"R Achtibat","year":"2023","unstructured":"Achtibat, R. et al. From attribution maps to human-understandable explanations through Concept Relevance Propagation. Nat. Mach. Intell. 5, 1006\u20131019 (2023).","journal-title":"Nat. Mach. Intell."},{"key":"1101_CR25","doi-asserted-by":"publisher","DOI":"10.3389\/fcvm.2022.823436","volume":"9","author":"D Shi","year":"2022","unstructured":"Shi, D. et al. A deep learning system for fully automated retinal vessel measurement in high throughput image analysis. Front. Cardiovasc. Med. 9, 823436 (2022).","journal-title":"Front. Cardiovasc. Med."},{"key":"1101_CR26","doi-asserted-by":"publisher","first-page":"55","DOI":"10.7326\/M14-0697","volume":"162","author":"GS Collins","year":"2015","unstructured":"Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann. Intern. Med. 162, 55\u201363 (2015).","journal-title":"Ann. Intern. Med."},{"key":"1101_CR27","unstructured":"Li, J., Li, D., Xiong, C. & Hoi, S. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. International conference on machine learning, 12888\u201312900 (2022)."},{"key":"1101_CR28","unstructured":"Dosovitskiy, A. et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. International Conference on Learning Representations, (2021)."},{"key":"1101_CR29","first-page":"4171","volume":"1","author":"J Devlin","year":"2019","unstructured":"Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019 1, 4171\u20134186 (2019).","journal-title":"NAACL-HLT 2019"},{"key":"1101_CR30","unstructured":"Loshchilov, I. & Hutter, F. Decoupled Weight Decay Regularization. International Conference on Learning Representations (2018)."},{"key":"1101_CR31","doi-asserted-by":"publisher","first-page":"862","DOI":"10.1016\/j.oret.2023.05.022","volume":"7","author":"B Momenaei","year":"2023","unstructured":"Momenaei, B. et al. Appropriateness and readability of ChatGPT-4-generated responses for surgical treatment of retinal diseases. Ophthalmol. Retin. 7, 862\u2013868 (2023).","journal-title":"Ophthalmol. Retin."},{"key":"1101_CR32","doi-asserted-by":"publisher","unstructured":"Chang, Y. et al. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology https:\/\/doi.org\/10.1145\/3641289 (2023).","DOI":"10.1145\/3641289"},{"key":"1101_CR33","doi-asserted-by":"publisher","unstructured":"Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. Bleu: a Method for Automatic Evaluation of Machine Translation. ACL 2002, 311\u2013318 https:\/\/doi.org\/10.3115\/1073083.1073135 (2002).","DOI":"10.3115\/1073083.1073135"},{"key":"1101_CR34","doi-asserted-by":"crossref","unstructured":"Vedantam, R., Lawrence Zitnick, C. & Parikh, D. Cider: Consensus-based image description evaluation. Proceedings of the IEEE conference on computer vision and pattern recognition, 4566\u20134575 (2015).","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"1101_CR35","first-page":"81","volume":"74","author":"C-Y Lin","year":"2004","unstructured":"Lin, C.-Y. ROUGE: A Package for Automatic Evaluation of Summaries. Text. Summariz. Branches Out. 74, 81 (2004).","journal-title":"Text. Summariz. Branches Out."},{"key":"1101_CR36","doi-asserted-by":"publisher","unstructured":"Anderson, P., Fernando, B., Johnson, M. & Gould, S. SPICE: Semantic Propositional Image Caption Evaluation. 2016 European Conference on Computer Vision, 382\u2013398 https:\/\/doi.org\/10.1007\/978-3-319-46454-1_24 (2016).","DOI":"10.1007\/978-3-319-46454-1_24"},{"key":"1101_CR37","unstructured":"Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q. & Artzi, Y. BERTScore: Evaluating Text Generation with BERT. International Conference on Learning Representations (2020)."},{"key":"1101_CR38","doi-asserted-by":"publisher","first-page":"102381","DOI":"10.1016\/j.artmed.2022.102381","volume":"132","author":"K Rjoob","year":"2022","unstructured":"Rjoob, K. et al. Machine learning and the electrocardiogram over two decades: Time series and meta-analysis of the algorithms, evaluation metrics and applications. Artif. Intell. Med 132, 102381 (2022).","journal-title":"Artif. Intell. Med"},{"key":"1101_CR39","doi-asserted-by":"publisher","first-page":"172","DOI":"10.1038\/s41586-023-06291-2","volume":"620","author":"K Singhal","year":"2023","unstructured":"Singhal, K. et al. Large language models encode clinical knowledge. Nature 620, 172\u2013180 (2023).","journal-title":"Nature"},{"key":"1101_CR40","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1097\/JTO.0b013e318200f983","volume":"6","author":"JN Mandrekar","year":"2011","unstructured":"Mandrekar, J. N. Measures of interrater agreement. J. Thorac. Oncol. 6, 6\u20137 (2011).","journal-title":"J. Thorac. Oncol."},{"key":"1101_CR41","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1097\/PTS.0b013e3182948ef9","volume":"11","author":"T Williams","year":"2015","unstructured":"Williams, T., Szekendi, M., Pavkovic, S., Clevenger, W. & Cerese, J. The reliability of AHRQ Common Format Harm Scales in rating patient safety events. J. Patient Saf. 11, 52\u201359 (2015).","journal-title":"J. Patient Saf."}],"container-title":["npj Digital Medicine"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01101-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01101-z","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01101-z.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,5,3]],"date-time":"2024-05-03T14:08:01Z","timestamp":1714745281000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.nature.com\/articles\/s41746-024-01101-z"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,5,3]]},"references-count":41,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["1101"],"URL":"https:\/\/doi.org\/10.1038\/s41746-024-01101-z","relation":{"has-preprint":[{"id-type":"doi","id":"10.21203\/rs.3.rs-3307492\/v1","asserted-by":"object"}]},"ISSN":["2398-6352"],"issn-type":[{"value":"2398-6352","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,5,3]]},"assertion":[{"value":"29 August 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 April 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"3 May 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The authors declare no competing interests.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"111"}}