{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T22:21:49Z","timestamp":1761949309534,"version":"build-2065373602"},"reference-count":36,"publisher":"Springer Science and Business Media LLC","issue":"11","license":[{"start":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T00:00:00Z","timestamp":1760054400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T00:00:00Z","timestamp":1760054400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100024023","name":"Korea Institute for Advanced Study","doi-asserted-by":"publisher","award":["IITP-2025-RS-2023-00254592"],"award-info":[{"award-number":["IITP-2025-RS-2023-00254592"]}],"id":[{"id":"10.13039\/501100024023","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Korea Institute of Police Technology","award":["092021D75000000"],"award-info":[{"award-number":["092021D75000000"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Complex Intell. Syst."],"published-print":{"date-parts":[[2025,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Traditional vision-language models demonstrate strong performance in tasks such as image captioning and visual question answering, but they remain limited by issues such as hallucination, lack of self-correction, and shallow reasoning. These shortcomings compromise the safety, robustness, and consistency of their reasoning, particularly in ambiguous or high-stakes scenarios. In this paper, we propose three complementary frameworks aimed at enabling more trustworthy visual reasoning through structured deliberation. The first is the self-reflective reasoning single-agent framework, which facilitates iterative self-revision without requiring external supervision. The second is the structured debate agent framework, in which turn-based rebuttals between agents promote contrastive, multi-perspective refinement. The third is the progressive two-stage debate agent framework, which enables efficient yet accurate decision-making through model-to-model deliberation between smaller and larger agents. Experiments on the COCO dataset demonstrate that all three frameworks significantly enhance reasoning performance, achieving up to a 5.4% improvement in Intersection over Union (IoU) and over a 40% reduction in localization error compared to a single-pass baseline. Further evaluation across robustness (IoU), safety (self-revision rate, SRR), and consistency (consistency score, CS) confirms the effectiveness of multi-round, self-corrective, and multi-agent reasoning strategies. These results establish a practical path toward safer, more robust, and more interpretable vision-language models through lightweight, deliberative inference frameworks.<\/jats:p>","DOI":"10.1007\/s40747-025-02093-3","type":"journal-article","created":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T10:21:39Z","timestamp":1760091699000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Enhancing safety of vision-language reasoning through model-to-model deliberation"],"prefix":"10.1007","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0009-0001-0118-3342","authenticated-orcid":false,"given":"Sungwoo","family":"Kim","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8025-700X","authenticated-orcid":false,"given":"Yongjin","family":"Lee","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3732-5346","authenticated-orcid":false,"given":"Yunsick","family":"Sung","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,10,10]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Everitt T, Lea G, Hutter M (2018) AGI safety literature review. arXiv preprint arXiv:1805.01109","key":"2093_CR1","DOI":"10.24963\/ijcai.2018\/768"},{"doi-asserted-by":"crossref","unstructured":"Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, Parikh D (2015) VQA: visual question answering. In: Proceedings of the IEEE international conference on computer vision, pp 2425\u20132433","key":"2093_CR2","DOI":"10.1109\/ICCV.2015.279"},{"unstructured":"Herdade S, Kappeler A, Boakye K, Soares J (2019) Image captioning: transforming objects into words. In: Advances in neural information processing systems, vol 32","key":"2093_CR3"},{"unstructured":"Mostafazadeh N, Brockett C, Dolan B, Galley M, Gao J, Spithourakis GP, Vanderwende L (2017) Image-grounded conversations: multimodal context for natural question and response generation. arXiv preprint arXiv:1701.08251","key":"2093_CR4"},{"unstructured":"Rae JW, Borgeaud S, Cai T, Millican K, Hoffmann J, Song F, Aslanides J, Henderson S, Ring R, Young S, et\u00a0al (2021) Scaling language models: methods, analysis & insights from training gopher. arXiv preprint arXiv:2112.11446","key":"2093_CR5"},{"unstructured":"Amodei D, Olah C, Steinhardt J, Christiano P, Schulman J, Man\u00e9 D (2016) Concrete problems in AI safety. arXiv preprint arXiv:1606.06565","key":"2093_CR6"},{"unstructured":"Dong H, Xiong W, Goyal D, Zhang Y, Chow W, Pan R, Diao S, Zhang J, Shum K, Zhang T (2023) Raft: reward ranked finetuning for generative foundation model alignment. arXiv preprint arXiv:2304.06767","key":"2093_CR7"},{"key":"2093_CR8","first-page":"27730","volume":"35","author":"L Ouyang","year":"2022","unstructured":"Ouyang L, Jeffrey W, Jiang X, Almeida D, Wainwright C, Mishkin P, Zhang C, Agarwal S, Slama K, Ray A et al (2022) Training language models to follow instructions with human feedback. Adv Neural Inf Process Syst 35:27730\u201327744","journal-title":"Adv Neural Inf Process Syst"},{"unstructured":"Du Y, Li S, Torralba A, Tenenbaum JB, Mordatch I (2023) Improving factuality and reasoning in language models through multiagent debate. In: Forty-first international conference on machine learning","key":"2093_CR9"},{"unstructured":"Anwar U, Saparov A, Rando J, Paleka D, Turpin M, Hase P, Lubana ES, Jenner E, Casper S, Sourbut O, et\u00a0al (2024) Foundational challenges in assuring alignment and safety of large language models. arXiv preprint arXiv:2404.09932","key":"2093_CR10"},{"issue":"3","key":"2093_CR11","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1007\/s11023-020-09539-2","volume":"30","author":"I Gabriel","year":"2020","unstructured":"Gabriel I (2020) Artificial intelligence, values, and alignment. Mind Mach 30(3):411\u2013437","journal-title":"Mind Mach"},{"issue":"9","key":"2093_CR12","doi-asserted-by":"publisher","first-page":"2337","DOI":"10.1007\/s11263-022-01653-1","volume":"130","author":"K Zhou","year":"2022","unstructured":"Zhou K, Yang J, Loy CC, Liu Z (2022) Learning to prompt for vision-language models. Int J Comput Vis 130(9):2337\u20132348","journal-title":"Int J Comput Vis"},{"issue":"3","key":"2093_CR13","doi-asserted-by":"publisher","first-page":"1651","DOI":"10.3390\/s23031651","volume":"23","author":"G Bonwoo","year":"2023","unstructured":"Bonwoo G, Sung Y (2023) UX framework including imbalanced UX dataset reduction method for analyzing interaction trends of agent systems. Sensors 23(3):1651","journal-title":"Sensors"},{"issue":"2","key":"2093_CR14","first-page":"101","volume":"66","author":"A Kumar","year":"2014","unstructured":"Kumar A, Sato A, Oishi T, Ono S, Ikeuchi K (2014) Improving GPS position accuracy by identification of reflected GPS signals using range data for modeling of urban structures. Seisan Kenkyu 66(2):101\u2013107","journal-title":"Seisan Kenkyu"},{"issue":"2","key":"2093_CR15","first-page":"91","volume":"65","author":"A Kumar","year":"2013","unstructured":"Kumar A, Banno A, Ono S, Oishi T, Ikeuchi K (2013) Global coordinate adjustment of the 3D survey models under unstable GPS condition. Seisan Kenkyu 65(2):91\u201395","journal-title":"Seisan Kenkyu"},{"unstructured":"Aggarwal AK, Chauhan APS (2025) Robust feature extraction from omnidirectional outdoor images for computer vision applications. Int J Instrum Meas 10:8\u201313","key":"2093_CR16"},{"issue":"4","key":"2093_CR17","doi-asserted-by":"publisher","first-page":"344","DOI":"10.1504\/IJBET.2009.027798","volume":"2","author":"A Kumar","year":"2009","unstructured":"Kumar A (2009) Light propagation through biological tissue: comparison between Monte Carlo simulation and deterministic models. Int J Biomed Eng Technol 2(4):344\u2013351","journal-title":"Int J Biomed Eng Technol"},{"issue":"4","key":"2093_CR18","doi-asserted-by":"publisher","first-page":"798","DOI":"10.3390\/math11040798","volume":"11","author":"S Li","year":"2023","unstructured":"Li S, Sung Y (2023) MRBERT: pre-training of melody and rhythm for automatic music generation. Mathematics 11(4):798","journal-title":"Mathematics"},{"issue":"4","key":"2093_CR19","doi-asserted-by":"publisher","first-page":"1050","DOI":"10.3390\/math11041050","volume":"11","author":"Y Zhang","year":"2023","unstructured":"Zhang Y, Sung Y (2023) Hybrid traffic accident classification models. Mathematics 11(4):1050","journal-title":"Mathematics"},{"unstructured":"Rahman S, Issaka S, Suvarna A, Liu G, Shiffer J, Lee J, Parvez MR, Palangi H, Feng S, Peng N, et\u00a0al (2025) AI debate aids assessment of controversial claims. arXiv preprint arXiv:2506.02175","key":"2093_CR20"},{"unstructured":"Brown-Cohen J, Irving G, Piliouras G (2023) Scalable AI safety via doubly-efficient debate. arXiv preprint arXiv:2311.14125","key":"2093_CR21"},{"doi-asserted-by":"crossref","unstructured":"Deepak P (2024) AI safety: necessary, but insufficient and possibly problematic. AI Soc 40(2):1143\u20131145","key":"2093_CR22","DOI":"10.1007\/s00146-024-01899-y"},{"unstructured":"Irving G, Christiano P, Amodei D (2018) AI safety via debate. arXiv preprint arXiv:1805.00899","key":"2093_CR23"},{"issue":"2","key":"2093_CR24","doi-asserted-by":"publisher","DOI":"10.1007\/s11432-024-4222-0","volume":"68","author":"Z Xi","year":"2025","unstructured":"Xi Z, Chen W, Guo X, He W, Ding Y, Hong B, Zhang M, Wang J, Jin S, Zhou E et al (2025) The rise and potential of large language model based agents: a survey. Sci China Inf Sci 68(2):121101","journal-title":"Sci China Inf Sci"},{"key":"2093_CR25","first-page":"24824","volume":"35","author":"J Wei","year":"2022","unstructured":"Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le QV, Zhou D et al (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst 35:24824\u201324837","journal-title":"Adv Neural Inf Process Syst"},{"unstructured":"Wang X, Wei J, Schuurmans D, Le Q, Chi E, Narang S, Chowdhery A, Zhou D (2022) Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171","key":"2093_CR26"},{"unstructured":"Yao S, Yu D, Zhao J, Shafran I, Griffiths TL, Cao Y, Narasimhan K (2023) Tree of thoughts: deliberate problem solving with large language models, 2023. https:\/\/arxiv.org\/abs\/2305.10601, 3","key":"2093_CR27"},{"unstructured":"Chan C-M, Chen W, Su Y, Yu J, Xue W, Zhang S, Fu J, Liu Z (2023) ChatEval: towards better LLM-based evaluators through multi-agent debate. arXiv preprint arXiv:2308.07201","key":"2093_CR28"},{"doi-asserted-by":"crossref","unstructured":"Liang T, He Z, Jiao W, Wang X, Wang Y, Wang R, Yang Y, Shi S, Tu Z (2023) Encouraging divergent thinking in large language models through multi-agent debate. arXiv preprint arXiv:2305.19118","key":"2093_CR29","DOI":"10.18653\/v1\/2024.emnlp-main.992"},{"doi-asserted-by":"crossref","unstructured":"Wang B, Yue X, Sun H (2023) Can ChatGPT defend its belief in truth? Evaluating LLM reasoning via debate. arXiv preprint arXiv:2305.13160","key":"2093_CR30","DOI":"10.18653\/v1\/2023.findings-emnlp.795"},{"unstructured":"Zhang Y, Yang X, Feng S, Wang D, Zhang Y, Song K (2024) Can LLMs beat humans in debating? a dynamic multi-agent framework for competitive debate. arXiv preprint arXiv:2408.04472","key":"2093_CR31"},{"key":"2093_CR32","first-page":"34892","volume":"36","author":"H Liu","year":"2023","unstructured":"Liu H, Li C, Wu Q, Lee YJ (2023) Visual instruction tuning. Adv Neural Inf Process Syst 36:34892\u201334916","journal-title":"Adv Neural Inf Process Syst"},{"doi-asserted-by":"crossref","unstructured":"Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll\u00e1r P, Zitnick CL (2014) Microsoft COCO: common objects in context. In: Computer vision\u2013ECCV 2014: 13th European conference, Zurich, Switzerland, September 6\u201312, 2014, proceedings, part V 13. Springer, pp 740\u2013755","key":"2093_CR33","DOI":"10.1007\/978-3-319-10602-1_48"},{"doi-asserted-by":"crossref","unstructured":"Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In: Proceedings of the IEEE\/CVF conference on computer vision and pattern recognition, pp 658\u2013666","key":"2093_CR34","DOI":"10.1109\/CVPR.2019.00075"},{"unstructured":"Khan A, Hughes J, Valentine D, Ruis L, Sachan K, Radhakrishnan A, Grefenstette E, Bowman SR, Rockt\u00e4schel T, Perez E (2024) Debating with more persuasive LLMs leads to more truthful answers. arXiv preprint arXiv:2402.06782","key":"2093_CR35"},{"key":"2093_CR36","first-page":"68559","volume":"37","author":"R Ren","year":"2024","unstructured":"Ren R, Basart S, Khoja A, Gatti A, Phan L, Yin X, Mazeika M, Pan A, Mukobi G, Kim R et al (2024) Safetywashing: do AI safety benchmarks actually measure safety progress? Adv Neural Inf Process Syst 37:68559\u201368594","journal-title":"Adv Neural Inf Process Syst"}],"container-title":["Complex &amp; Intelligent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02093-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s40747-025-02093-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s40747-025-02093-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,31]],"date-time":"2025-10-31T22:15:23Z","timestamp":1761948923000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s40747-025-02093-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,10]]},"references-count":36,"journal-issue":{"issue":"11","published-print":{"date-parts":[[2025,11]]}},"alternative-id":["2093"],"URL":"https:\/\/doi.org\/10.1007\/s40747-025-02093-3","relation":{},"ISSN":["2199-4536","2198-6053"],"issn-type":[{"type":"print","value":"2199-4536"},{"type":"electronic","value":"2198-6053"}],"subject":[],"published":{"date-parts":[[2025,10,10]]},"assertion":[{"value":"8 July 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 September 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"10 October 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}],"article-number":"464"}}