{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T23:35:28Z","timestamp":1761176128996,"version":"build-2065373602"},"reference-count":0,"publisher":"IOS Press","isbn-type":[{"value":"9781643686318","type":"electronic"}],"license":[{"start":{"date-parts":[[2025,10,21]],"date-time":"2025-10-21T00:00:00Z","timestamp":1761004800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,10,21]]},"abstract":"<jats:p>Large-scale Vision-Language Models (VLMs) have demonstrated impressive zero-shot performance in sample-level downstream tasks (e.g., image classification), driven by their powerful generalization ability. However, they still struggle in instance-level tasks, e.g., zero-shot Referring Expression Comprehension (REC), which requires precisely locating the target instance in an image based on a provided text caption. To address this issue, we propose Multimodal Semantic Decoupled Prompting (MSDP), a simple yet effective prompt engineering approach that contains both textual- and visual-focused instance-level understanding prompting. Specifically, we first propose a novel textual restructure strategy to eliminate the impact of task-irrelevant semantic information, steering the model\u2019s attention at the textual understanding level. Meanwhile, we design a united visual prompt at the visual understanding level that maximally activates the instance-level understanding capabilities of VLMs. Experiments on several benchmarks reveal that the proposed approach outperforms state-of-the-art (SOTA) methods. The code is available at repository.<\/jats:p>","DOI":"10.3233\/faia250850","type":"book-chapter","created":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:43:58Z","timestamp":1761126238000},"source":"Crossref","is-referenced-by-count":0,"title":["Multimodal Semantic Decoupled Prompt for Zero-Shot Referring Expression Comprehension"],"prefix":"10.3233","author":[{"given":"Yuxuan","family":"Zhang","sequence":"first","affiliation":[{"name":"Nanjing University of Science and Technology"}]},{"given":"Longfei","family":"Huang","sequence":"additional","affiliation":[{"name":"Nanjing University of Science and Technology"}]},{"given":"Yang","family":"Yang","sequence":"additional","affiliation":[{"name":"Nanjing University of Science and Technology"}]}],"member":"7437","container-title":["Frontiers in Artificial Intelligence and Applications","ECAI 2025"],"original-title":[],"link":[{"URL":"https:\/\/ebooks.iospress.nl\/pdf\/doi\/10.3233\/FAIA250850","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,22]],"date-time":"2025-10-22T09:43:58Z","timestamp":1761126238000},"score":1,"resource":{"primary":{"URL":"https:\/\/ebooks.iospress.nl\/doi\/10.3233\/FAIA250850"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,10,21]]},"ISBN":["9781643686318"],"references-count":0,"URL":"https:\/\/doi.org\/10.3233\/faia250850","relation":{},"ISSN":["0922-6389","1879-8314"],"issn-type":[{"value":"0922-6389","type":"print"},{"value":"1879-8314","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,10,21]]}}}