{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,7,15]],"date-time":"2026-07-15T20:16:50Z","timestamp":1784146610426,"version":"3.55.0"},"reference-count":37,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T00:00:00Z","timestamp":1753056000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T00:00:00Z","timestamp":1753056000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/doi.wiley.com\/10.1002\/tdm_license_1.1"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["IIS\u20102145625"],"award-info":[{"award-number":["IIS\u20102145625"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R01AI188576"],"award-info":[{"award-number":["R01AI188576"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["advanced.onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Advanced Intelligent Systems"],"published-print":{"date-parts":[[2026,1]]},"abstract":"<jats:p>Large vision language models (LVLMs) have achieved superior performance on natural image and text tasks, inspiring extensive fine\u2010tuning research. However, their robustness against hallucination in clinical contexts remains understudied. We propose the Medical Visual Hallucination Test (MedVH), a novel evaluation framework assessing hallucination tendencies in both medical\u2010specific and general\u2010purpose LVLMs. MedVH encompasses six tasks targeting medical hallucinations, including two traditional tasks and four novel tasks formatted as multi\u2010choice visual question answering and long response generation. Our extensive experiments with six evaluation metrics reveal that medical LVLMs, despite promising performance on standard medical tasks, are particularly susceptible to hallucinations\u2014often more so than general models. This raises significant concerns about domain\u2010specific model reliability. For real\u2010world applications, medical LVLMs must accurately integrate medical knowledge while maintaining robust reasoning to prevent hallucination. We explore mitigation methods without model\u2010specific fine\u2010tuning, including prompt engineering and collaboration between general and domain\u2010specific models. Our work provides a foundation for future evaluation studies. The dataset is available at PhysioNet: https:\/\/physionet.org\/content\/medvh.<\/jats:p>","DOI":"10.1002\/aisy.202500255","type":"journal-article","created":{"date-parts":[[2025,7,21]],"date-time":"2025-07-21T18:49:56Z","timestamp":1753123796000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["MedVH: Toward Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context"],"prefix":"10.1002","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0168-8658","authenticated-orcid":false,"given":"Zishan","family":"Gu","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering The Ohio State University  Columbus Ohio 43210 USA"},{"name":"Department of Biomedical Informatics The Ohio State University  Columbus Ohio 43210 USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Jiayuan","family":"Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering The Ohio State University  Columbus Ohio 43210 USA"},{"name":"Department of Biomedical Informatics The Ohio State University  Columbus Ohio 43210 USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Fenglin","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Engineering Science Institute of Biomedical Engineering University of Oxford  Oxford OX1 2JD UK"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Changchang","family":"Yin","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering The Ohio State University  Columbus Ohio 43210 USA"},{"name":"Department of Biomedical Informatics The Ohio State University  Columbus Ohio 43210 USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4601-0779","authenticated-orcid":false,"given":"Ping","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering The Ohio State University  Columbus Ohio 43210 USA"},{"name":"Department of Biomedical Informatics The Ohio State University  Columbus Ohio 43210 USA"},{"name":"Translational Data Analytics institute The Ohio State University  Columbus Ohio 43210 USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"311","published-online":{"date-parts":[[2025,7,21]]},"reference":[{"key":"e_1_2_9_2_1","doi-asserted-by":"crossref","unstructured":"W.Fu S.Li Y.Zhao H.Ma inProc. of the IEEE Asia and South Pacific Design Automation Conference (ASP\u2010DAC) 2024 Incheon Korea Jan. 22\u2013252024 pp.349\u2013354.","DOI":"10.1109\/ASP-DAC58780.2024.10473927"},{"key":"e_1_2_9_3_1","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocae122"},{"key":"e_1_2_9_4_1","doi-asserted-by":"publisher","DOI":"10.1145\/3652594"},{"key":"e_1_2_9_5_1","doi-asserted-by":"publisher","DOI":"10.1038\/s41586-023-06291-2"},{"key":"e_1_2_9_6_1","unstructured":"T.Han L. C.Adams J.Papaioannou P.Grundmann arXiv 2023 abs\/2304.08247."},{"key":"e_1_2_9_7_1","doi-asserted-by":"crossref","unstructured":"C.Wu W.Lin X.Zhang Y.Zhang W.Xie Y.Wang Journal of the American Medical Informatics Association2024 31 1833\u20131843.","DOI":"10.1093\/jamia\/ocae045"},{"key":"e_1_2_9_8_1","unstructured":"Z.Chen A.Hern\u00e1ndez Cano A.Romanou A.BonnetarXiv2023 abs\/2311.16079."},{"key":"e_1_2_9_9_1","first-page":"19730","volume":"202","author":"Li J.","year":"2023","journal-title":"in Proc. of the International Conference on Machine Learning"},{"key":"e_1_2_9_10_1","first-page":"34892","volume":"36","author":"Liu H.","year":"2023","journal-title":"in Proc. of the Conference on Neural Information Processing Systems"},{"key":"e_1_2_9_11_1","unstructured":"M.Moor Q.Huang S.Wu M.Yasunaga inProc. of the 3rd Machine Learning for Health Symposium 2023 (PMLR 225:353) PMLR Online\/Cambridge MA2023."},{"key":"e_1_2_9_12_1","first-page":"28541","volume":"36","author":"Li C.","year":"2023","journal-title":"in Proc. of the Conference on Neural Information Processing Systems"},{"key":"e_1_2_9_13_1","unstructured":"S.Lee W. J.Kim J.Chang J. C.Ye inProc. of the International Conference on Learning Representations 2024 Poster #25."},{"key":"e_1_2_9_14_1","unstructured":"Z.Chen M.Varma J.\u2010B.Delbrouck M.Paschali arXiv 2024 abs\/2401.12208."},{"key":"e_1_2_9_15_1","doi-asserted-by":"crossref","unstructured":"Y.Bang S.Cahyawijaya N.Lee W.Dai inProc. of the International Joint Conference on Natural Language Processing \u2013 Asia\u2010Pacific Chapter of the Association for Computational Linguistics2023 pp.675\u2013688.","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"e_1_2_9_16_1","unstructured":"H.Liu W.Xue Y.Chen D.Chen arXiv 2024 abs\/2402.00253."},{"key":"e_1_2_9_17_1","doi-asserted-by":"crossref","unstructured":"K.Wu E.Wu A.Cassasola A.Zhang arXiv 2024 abs\/2402.02008.","DOI":"10.1051\/shsconf\/202418802008"},{"key":"e_1_2_9_18_1","doi-asserted-by":"crossref","unstructured":"P.Manakul A.Liusie M. J. F.Gales inProc. of the Conference on Empirical Methods in Natural Language Processing2023 pp.9004\u20139017.","DOI":"10.18653\/v1\/2023.emnlp-main.557"},{"key":"e_1_2_9_19_1","unstructured":"K.Shuster S.Poff M.Chen D.Kiela J.Weston arXiv 2021 abs\/2104.07567."},{"key":"e_1_2_9_20_1","doi-asserted-by":"crossref","unstructured":"J.Li X.Cheng X.Zhao J.\u2010Y.Nie J.\u2010R.Wen in Proc. of the Conference on Empirical Methods in Natural Language Processing2023 pp.6449\u20136468.","DOI":"10.18653\/v1\/2023.emnlp-main.397"},{"key":"e_1_2_9_21_1","unstructured":"H.Ye T.Liu A.Zhang W.Hua W.Jia arXiv 2023 abs\/2309.06794."},{"key":"e_1_2_9_22_1","unstructured":"L. K.Umapathi A.Pal M.Sankarasubbu arXiv 2023 abs\/2307.15343."},{"key":"e_1_2_9_23_1","doi-asserted-by":"crossref","unstructured":"Y.Li Y.Du K.Zhou J.Wang W. X.Zhao J.\u2010R.Wen inProc. of the Conference on Empirical Methods in Natural Language Processing2023 pp.292\u2013310.","DOI":"10.18653\/v1\/2023.emnlp-main.20"},{"key":"e_1_2_9_24_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.media.2021.102125"},{"key":"e_1_2_9_25_1","doi-asserted-by":"publisher","DOI":"10.1007\/s00500-020-05424-3"},{"key":"e_1_2_9_26_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.aej.2022.10.053"},{"key":"e_1_2_9_27_1","doi-asserted-by":"publisher","DOI":"10.1038\/sdata.2018.251"},{"key":"e_1_2_9_28_1","doi-asserted-by":"crossref","unstructured":"B.Liu L. M.Zhan L.Xu L.Ma in Proc. of the IEEE 18th International Symposium on Biomedical Imaging IEEE Nice France2021 1650.","DOI":"10.1109\/ISBI48211.2021.9434010"},{"key":"e_1_2_9_29_1","doi-asserted-by":"crossref","unstructured":"X.Zhang C.Wu Z.Zhao W.Lin Y.Zhang Y.Wang W.Xie arXiv 2023 abs\/2305.10415.","DOI":"10.1155\/2023\/2198259"},{"key":"e_1_2_9_30_1","unstructured":"X.He Y.Zhang L.Mou E.Xing P.Xie arXiv 2020 abs\/2003.10286."},{"key":"e_1_2_9_31_1","unstructured":"A.Ben Abacha M.Sarrouti D.Demner\u2010Fushman S. A.Hasan H.M\u00fcller In Proc. of the CLEF 2021 Conference and Labs of the Evaluation Forum\u2010working notes CEUR\u2010WS.org Bucharest Romania2021 pp.21\u201324."},{"key":"e_1_2_9_32_1","doi-asserted-by":"crossref","unstructured":"X.Hu L.Gu Q.An M.Zhang L. Liu K. Kobayashi T. Harada R. M. Summers Y. Zhu In Pro. of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining ACM NY USA2023 pp.4156\u20134165.","DOI":"10.1145\/3580305.3599819"},{"key":"e_1_2_9_33_1","unstructured":"J.Chen D.Zhu X.Shen X.Li arXiv 2023 abs\/2310.09478."},{"key":"e_1_2_9_34_1","doi-asserted-by":"crossref","unstructured":"O. C.Thawakar A.Shaker S. S.Mullappilly H.Cholakkal in Proc. of the Workshop on Biomedical Natural Language Processing2024 pp.440\u2013448.","DOI":"10.18653\/v1\/2024.bionlp-1.35"},{"key":"e_1_2_9_35_1","doi-asserted-by":"crossref","unstructured":"A.Rohrbach L. A.Hendricks K.Burns T.Darrell K.Saenko in Proc. of the Conference on Empirical Methods in Natural Language Processing2018 pp.4035\u20134045.","DOI":"10.18653\/v1\/D18-1437"},{"key":"e_1_2_9_36_1","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v33i01.3301590"},{"key":"e_1_2_9_37_1","doi-asserted-by":"crossref","unstructured":"X.Tang A.Zou Z.Zhang Z.Li Findings of the Association for Computational Linguistics Association for Computational Linguistics Bangkok Thailand2024 599.","DOI":"10.18653\/v1\/2024.findings-acl.33"},{"key":"e_1_2_9_38_1","doi-asserted-by":"publisher","DOI":"10.1002\/aisy.202400840"}],"container-title":["Advanced Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/aisy.202500255","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/full-xml\/10.1002\/aisy.202500255","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/aisy.202500255","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/pdf\/10.1002\/aisy.202500255","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T16:51:49Z","timestamp":1769014309000},"score":1,"resource":{"primary":{"URL":"https:\/\/advanced.onlinelibrary.wiley.com\/doi\/10.1002\/aisy.202500255"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,21]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2026,1]]}},"alternative-id":["10.1002\/aisy.202500255"],"URL":"https:\/\/doi.org\/10.1002\/aisy.202500255","archive":["Portico"],"relation":{},"ISSN":["2640-4567","2640-4567"],"issn-type":[{"value":"2640-4567","type":"print"},{"value":"2640-4567","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,21]]},"assertion":[{"value":"2025-03-02","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"2500255"}}