{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T03:41:19Z","timestamp":1760326879231,"version":"build-2065373602"},"publisher-location":"Cham","reference-count":56,"publisher":"Springer Nature Switzerland","isbn-type":[{"value":"9783032083296","type":"print"},{"value":"9783032083302","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T00:00:00Z","timestamp":1760400000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,14]],"date-time":"2025-10-14T00:00:00Z","timestamp":1760400000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2026]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>In the dynamic landscape of artificial intelligence, the exploration of hallucinations within vision-language (VL) models emerges as a critical frontier. This work delves into the intricacies of hallucinatory phenomena exhibited by widely used image captioners, unraveling interesting patterns. Specifically, we step upon previously introduced techniques of conceptual counterfactual explanations to address VL hallucinations. The deterministic and efficient nature of the employed conceptual counterfactuals backbone is able to suggest semantically minimal edits driven by hierarchical knowledge, so that the transition from a hallucinated caption to a non-hallucinated one is performed in a black-box manner. HalCECE, our proposed hallucination detection framework is highly interpretable, by providing semantically meaningful edits apart from standalone numbers, while the hierarchical decomposition of hallucinated concepts leads to a thorough hallucination analysis. 
Another novelty of the current work is the investigation of role hallucinations, making it one of the first works to consider interconnections between visual concepts in hallucination detection. Overall, HalCECE offers an explainable direction for the crucial field of VL hallucination detection, thus fostering trustworthy evaluation of current and future VL systems.<\/jats:p>","DOI":"10.1007\/978-3-032-08330-2_5","type":"book-chapter","created":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T03:11:01Z","timestamp":1760325061000},"page":"87-111","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["HalCECE: A Framework for\u00a0Explainable Hallucination Detection Through Conceptual Counterfactuals in\u00a0Image Captioning"],"prefix":"10.1007","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9442-4186","authenticated-orcid":false,"given":"Maria","family":"Lymperaiou","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7015-7746","authenticated-orcid":false,"given":"Giorgos","family":"Filandrianos","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0001-5817-3794","authenticated-orcid":false,"given":"Angeliki","family":"Dimitriou","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0632-9769","authenticated-orcid":false,"given":"Athanasios","family":"Voulodimos","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1210-9874","authenticated-orcid":false,"given":"Giorgos","family":"Stamou","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,10,14]]},"reference":[{"key":"5_CR1","unstructured":"Anthropic: The claude 3 model family: Opus, sonnet, haiku. https:\/\/api.semanticscholar.org\/CorpusID:268232499"},{"key":"5_CR2","unstructured":"Bai, Z., et al.: Hallucination of multimodal large language models: A survey (2024). 
https:\/\/arxiv.org\/abs\/2404.18930"},{"key":"5_CR3","doi-asserted-by":"crossref","unstructured":"Chu, X., Su, J., Zhang, B., Shen, C.: Visionllama: a unified llama backbone for vision tasks (2024). https:\/\/arxiv.org\/abs\/2403.00522","DOI":"10.1007\/978-3-031-72848-8_1"},{"key":"5_CR4","unstructured":"Chung, H.W., et al.: Scaling instruction-finetuned language models (2022). https:\/\/arxiv.org\/abs\/2210.11416"},{"key":"5_CR5","doi-asserted-by":"crossref","unstructured":"Dai, W., Liu, Z., Ji, Z., Su, D., Fung, P.: Plausible may not be faithful: Probing object hallucination in vision-language pre-training. ArXiv abs\/2210.07688 (2022). https:\/\/api.semanticscholar.org\/CorpusID:252907639","DOI":"10.18653\/v1\/2023.eacl-main.156"},{"key":"5_CR6","unstructured":"Datta, S., Sundararaman, D.: Evaluating hallucination in large vision-language models based on context-aware object similarities (2025). https:\/\/arxiv.org\/abs\/2501.15046"},{"issue":"1","key":"5_CR7","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1007\/BF01386390","volume":"1","author":"EW Dijkstra","year":"1959","unstructured":"Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math. 1(1), 269\u2013271 (1959)","journal-title":"Numer. Math."},{"key":"5_CR8","unstructured":"Dimitriou, A., Lymperaiou, M., Filandrianos, G., Thomas, K., Stamou, G.: Structure your data: Towards semantic graph counterfactuals (2024). https:\/\/arxiv.org\/abs\/2403.06514"},{"key":"5_CR9","unstructured":"Dosovitskiy, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale (2021)"},{"key":"5_CR10","unstructured":"Filandrianos, G., Thomas, K., Dervakos, E., Stamou, G.: Conceptual edits as counterfactual explanations. In: AAAI Spring Symposium: MAKE (2022)"},{"key":"5_CR11","unstructured":"Fischer, T., Remus, S., Biemann, C.: Measuring faithfulness of abstractive summaries. In: Schaefer, R., Bai, X., Stede, M., Zesch, T. (eds.) 
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022), pp. 63\u201373. KONVENS 2022 Organizers, Potsdam, Germany (12\u201315 Sep 2022). https:\/\/aclanthology.org\/2022.konvens-1.8\/"},{"key":"5_CR12","doi-asserted-by":"crossref","unstructured":"Ghandi, T., Pourreza, H.R., Mahyar, H.: Deep learning approaches on image captioning: a review. ACM Comput. Surv. 56, 1 \u2013 39 (2022). https:\/\/api.semanticscholar.org\/CorpusID:246430542","DOI":"10.1145\/3617592"},{"issue":"S1","key":"5_CR13","doi-asserted-by":"publisher","first-page":"S63","DOI":"10.1121\/1.2016299","volume":"62","author":"F Jelinek","year":"2005","unstructured":"Jelinek, F., Mercer, R.L., Bahl, L.R., Baker, J.K.: Perplexity\u2013a measure of the difficulty of speech recognition tasks. J. Acoustical Soc. Am. 62(S1), S63\u2013S63 (2005). https:\/\/doi.org\/10.1121\/1.2016299","journal-title":"J. Acoustical Soc. Am."},{"key":"5_CR14","unstructured":"Jing, L., Li, R., Chen, Y., Jia, M., Du, X.: Faithscore: Evaluating hallucinations in large vision-language models (2023). ArXiv abs\/2311.01477 (2023). https:\/\/api.semanticscholar.org\/CorpusID:265019245"},{"issue":"4","key":"5_CR15","doi-asserted-by":"publisher","first-page":"325","DOI":"10.1007\/BF02278710","volume":"38","author":"R Jonker","year":"1987","unstructured":"Jonker, R., Volgenant, A.: A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4), 325\u2013340 (1987)","journal-title":"Computing"},{"key":"5_CR16","doi-asserted-by":"crossref","unstructured":"Kornblith, S., Li, L., Wang, Z., Nguyen, T.: Guiding image captioning models toward more specific captions (2023)","DOI":"10.1109\/ICCV51070.2023.01400"},{"key":"5_CR17","doi-asserted-by":"crossref","unstructured":"Krishna, R., et al.: Visual genome: Connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vis. 123, 32 \u2013 73 (2016). 
https:\/\/api.semanticscholar.org\/CorpusID:4492210","DOI":"10.1007\/s11263-016-0981-7"},{"key":"5_CR18","doi-asserted-by":"publisher","unstructured":"Kryscinski, W., McCann, B., Xiong, C., Socher, R.: Evaluating the factual consistency of abstractive text summarization. In: Webber, B., Cohn, T., He, Y., Liu, Y. (eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 9332\u20139346. Association for Computational Linguistics, Online, November 2020. https:\/\/doi.org\/10.18653\/v1\/2020.emnlp-main.750, https:\/\/aclanthology.org\/2020.emnlp-main.750\/","DOI":"10.18653\/v1\/2020.emnlp-main.750"},{"key":"5_CR19","unstructured":"Kumar, A.: The illustrated image captioning using transformers. ankur3107.github.io (2022). https:\/\/ankur3107.github.io\/blogs\/the-illustrated-image-captioning-using-transformers\/"},{"key":"5_CR20","doi-asserted-by":"crossref","unstructured":"Leng, S., Zhang, H., Chen, G., Li, X., Lu, S., Miao, C., Bing, L.: Mitigating object hallucinations in large vision-language models through visual contrastive decoding (2023)","DOI":"10.1109\/CVPR52733.2024.01316"},{"key":"5_CR21","unstructured":"Li, J., Li, D., Savarese, S., Hoi, S.: Blip-2: bootstrapping language-image pre-training with frozen image encoders and large language models. In: Proceedings of the 40th International Conference on Machine Learning, ICML\u201923, JMLR.org (2023)"},{"key":"5_CR22","unstructured":"Li, J., Li, D., Savarese, S., Hoi, S.C.H.: Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. In: International Conference on Machine Learning (2023). 
https:\/\/api.semanticscholar.org\/CorpusID:256390509"},{"key":"5_CR23","unstructured":"Li, J., Li, D., Xiong, C., Hoi, S.: Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation (2022)"},{"key":"5_CR24","doi-asserted-by":"crossref","unstructured":"Li, Y., Du, Y., Zhou, K., Wang, J., Zhao, W.X., rong Wen, J.: Evaluating object hallucination in large vision-language models. ArXiv abs\/2305.10355 (2023). https:\/\/api.semanticscholar.org\/CorpusID:258740697","DOI":"10.18653\/v1\/2023.emnlp-main.20"},{"key":"5_CR25","unstructured":"Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out. pp. 74\u201381. Association for Computational Linguistics, Barcelona, Spain, July 2004. https:\/\/aclanthology.org\/W04-1013"},{"key":"5_CR26","doi-asserted-by":"crossref","unstructured":"Lin, T.Y., et al.: Microsoft coco: Common objects in context (2015)","DOI":"10.1007\/978-3-319-10602-1_48"},{"key":"5_CR27","unstructured":"Liu, F., Lin, K., Li, L., Wang, J., Yacoob, Y., Wang, L.: Mitigating hallucination in large multi-modal models via robust instruction tuning (2023)"},{"key":"5_CR28","unstructured":"Liu, H., et al.: A survey on hallucination in large vision-language models (2024). https:\/\/arxiv.org\/abs\/2402.00253"},{"key":"5_CR29","doi-asserted-by":"crossref","unstructured":"Liu, H., Li, C., Li, Y., Lee, Y.J.: Improved baselines with visual instruction tuning (2023)","DOI":"10.1109\/CVPR52733.2024.02484"},{"key":"5_CR30","unstructured":"Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning (2023)"},{"key":"5_CR31","doi-asserted-by":"publisher","unstructured":"Lovenia, H., Dai, W., Cahyawijaya, S., Ji, Z., Fung, P.: Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models. In: Gu, J., Fu, T.J.R., Hudson, D., Celikyilmaz, A., Wang, W. (eds.) Proceedings of the 3rd Workshop on Advances in Language and Vision Research (ALVR), pp. 
37\u201358. Association for Computational Linguistics, Bangkok, Thailand, August 2024. https:\/\/doi.org\/10.18653\/v1\/2024.alvr-1.4, https:\/\/aclanthology.org\/2024.alvr-1.4","DOI":"10.18653\/v1\/2024.alvr-1.4"},{"key":"5_CR32","unstructured":"Lymperaiou, M., Filandrianos, G., Thomas, K., Stamou, G.: Counterfactual edits for generative evaluation (2023)"},{"key":"5_CR33","unstructured":"Lymperaiou, M., Manoliadis, G., Menis\u00a0Mastromichalakis, O., Dervakos, E.G., Stamou, G.: Towards explainable evaluation of language models on the semantic similarity of visual concepts. In: Calzolari, N., et al. (eds.) Proceedings of the 29th International Conference on Computational Linguistics, pp. 3639\u20133658. International Committee on Computational Linguistics, Gyeongju, Republic of Korea, October 2022. https:\/\/aclanthology.org\/2022.coling-1.321\/"},{"key":"5_CR34","doi-asserted-by":"publisher","unstructured":"Manevich, A., Tsarfaty, R.: Mitigating hallucinations in large vision-language models (LVLMs) via language-contrastive decoding (LCD). In: Ku, L.W., Martins, A., Srikumar, V. (eds.) Findings of the Association for Computational Linguistics: ACL 2024. pp. 6008\u20136022. Association for Computational Linguistics, Bangkok, Thailand, August 2024. https:\/\/doi.org\/10.18653\/v1\/2024.findings-acl.359, https:\/\/aclanthology.org\/2024.findings-acl.359","DOI":"10.18653\/v1\/2024.findings-acl.359"},{"issue":"11","key":"5_CR35","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1145\/219717.219748","volume":"38","author":"GA Miller","year":"1995","unstructured":"Miller, G.A.: Wordnet: a lexical database for English. Commun. ACM 38(11), 39\u201341 (1995)","journal-title":"Commun. ACM"},{"key":"5_CR36","unstructured":"OpenAI: Gpt-4 technical report (2023)"},{"key":"5_CR37","doi-asserted-by":"publisher","unstructured":"Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: Bleu: a method for automatic evaluation of machine translation. 
In: Isabelle, P., Charniak, E., Lin, D. (eds.) Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311\u2013318. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, July 2002. https:\/\/doi.org\/10.3115\/1073083.1073135, https:\/\/aclanthology.org\/P02-1040","DOI":"10.3115\/1073083.1073135"},{"key":"5_CR38","doi-asserted-by":"publisher","unstructured":"Petryk, S., Chan, D., Kachinthaya, A., Zou, H., Canny, J., Gonzalez, J., Darrell, T.: ALOHa: A new measure for hallucination in captioning models. In: Duh, K., Gomez, H., Bethard, S. (eds.) Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pp. 342\u2013357. Association for Computational Linguistics, Mexico City, Mexico, June 2024. https:\/\/doi.org\/10.18653\/v1\/2024.naacl-short.30, https:\/\/aclanthology.org\/2024.naacl-short.30\/","DOI":"10.18653\/v1\/2024.naacl-short.30"},{"key":"5_CR39","unstructured":"Pillutla, K., et al.: Mauve: measuring the gap between neural text and human text using divergence frontiers (2021)"},{"key":"5_CR40","unstructured":"Qu, X., Chen, Q., Wei, W., Sun, J., Dong, J.: Alleviating hallucination in large vision-language models with active retrieval augmentation (2024). 
https:\/\/arxiv.org\/abs\/2408.00555"},{"key":"5_CR41","unstructured":"Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners (2019)"},{"key":"5_CR42","doi-asserted-by":"crossref","unstructured":"Rohrbach, A., Hendricks, L.A., Burns, K., Darrell, T., Saenko, K.: Object hallucination in image captioning (2019)","DOI":"10.18653\/v1\/D18-1437"},{"key":"5_CR43","unstructured":"Tonmoy, S.M.T.I., Zaman, S.M.M., Jain, V., Rani, A., Rawte, V., Chadha, A., Das, A.: A comprehensive survey of hallucination mitigation techniques in large language models (2024)"},{"key":"5_CR44","unstructured":"Touvron, H., et al.: Llama: open and efficient foundation language models (2023)"},{"key":"5_CR45","doi-asserted-by":"crossref","unstructured":"Vedantam, R., Zitnick, C.L., Parikh, D.: Cider: Consensus-based image description evaluation (2015)","DOI":"10.1109\/CVPR.2015.7299087"},{"key":"5_CR46","unstructured":"Wang, J., et al.: Git: A generative image-to-text transformer for vision and language. ArXiv abs\/2205.14100 (2022). https:\/\/api.semanticscholar.org\/CorpusID:249152323"},{"key":"5_CR47","unstructured":"Wang, J., et al.: Evaluation and analysis of hallucination in large vision-language models (2023)"},{"key":"5_CR48","unstructured":"Wang, W., et al.: Image as a foreign language: Beit pretraining for all vision and vision-language tasks. ArXiv abs\/2208.10442 (2022). https:\/\/api.semanticscholar.org\/CorpusID:251719655"},{"key":"5_CR49","unstructured":"Wu, M., Ji, J., Huang, O., Li, J., Wu, Y., Sun, X., Ji, R.: Evaluating and analyzing relationship hallucinations in large vision-language models (2024). https:\/\/arxiv.org\/abs\/2406.16449"},{"key":"5_CR50","doi-asserted-by":"crossref","unstructured":"Wu, Z., Palmer, M.: Verb semantics and lexical selection. 
arXiv preprint cmp-lg\/9406033 (1994)","DOI":"10.3115\/981732.981751"},{"key":"5_CR51","doi-asserted-by":"publisher","unstructured":"Xiao, Y., Wang, W.Y.: On hallucination and predictive uncertainty in conditional language generation. In: Merlo, P., Tiedemann, J., Tsarfaty, R. (eds.) Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. pp. 2734\u20132744. Association for Computational Linguistics, Online, April 2021. https:\/\/doi.org\/10.18653\/v1\/2021.eacl-main.236, https:\/\/aclanthology.org\/2021.eacl-main.236\/","DOI":"10.18653\/v1\/2021.eacl-main.236"},{"key":"5_CR52","unstructured":"Zhang, R., Zhang, H., Zheng, Z.: Vl-uncertainty: Detecting hallucination in large vision-language model via uncertainty estimation (2024). https:\/\/arxiv.org\/abs\/2411.11919"},{"key":"5_CR53","unstructured":"Zhang, S., et al.: Opt: Open pre-trained transformer language models (2022). https:\/\/arxiv.org\/abs\/2205.01068"},{"key":"5_CR54","unstructured":"Zhang, Y., et al.: Siren\u2019s song in the ai ocean: A survey on hallucination in large language models (2023)"},{"key":"5_CR55","unstructured":"Zhao, Z., Wang, B., Ouyang, L., Dong, X., Wang, J., He, C.: Beyond hallucinations: Enhancing lvlms through hallucination-aware direct preference optimization (2023)"},{"key":"5_CR56","unstructured":"Zhou, Y., et al.: Analyzing and mitigating object hallucination in large vision-language models. ArXiv abs\/2310.00754 (2023). 
https:\/\/api.semanticscholar.org\/CorpusID:263334335"}],"container-title":["Communications in Computer and Information Science","Explainable Artificial Intelligence"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/978-3-032-08330-2_5","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,13]],"date-time":"2025-10-13T03:11:14Z","timestamp":1760325074000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/978-3-032-08330-2_5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,14]]},"ISBN":["9783032083296","9783032083302"],"references-count":56,"URL":"https:\/\/doi.org\/10.1007\/978-3-032-08330-2_5","relation":{},"ISSN":["1865-0929","1865-0937"],"issn-type":[{"value":"1865-0929","type":"print"},{"value":"1865-0937","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,14]]},"assertion":[{"value":"14 October 2025","order":1,"name":"first_online","label":"First Online","group":{"name":"ChapterHistory","label":"Chapter History"}},{"value":"The authors have no competing interests to declare that are relevant to the content of this article.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Disclosure of Interest"}},{"value":"xAI","order":1,"name":"conference_acronym","label":"Conference Acronym","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"World Conference on Explainable Artificial Intelligence","order":2,"name":"conference_name","label":"Conference Name","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"Istanbul","order":3,"name":"conference_city","label":"Conference City","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"T\u00fcrkiye","order":4,"name":"conference_country","label":"Conference Country","group":{"name":"ConferenceInfo","label":"Conference 
Information"}},{"value":"2025","order":5,"name":"conference_year","label":"Conference Year","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"9 July 2025","order":7,"name":"conference_start_date","label":"Conference Start Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"11 July 2025","order":8,"name":"conference_end_date","label":"Conference End Date","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"3","order":9,"name":"conference_number","label":"Conference Number","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"xai2025","order":10,"name":"conference_id","label":"Conference ID","group":{"name":"ConferenceInfo","label":"Conference Information"}},{"value":"https:\/\/xaiworldconference.com\/2025\/","order":11,"name":"conference_url","label":"Conference URL","group":{"name":"ConferenceInfo","label":"Conference Information"}}]}}