{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:38:48Z","timestamp":1773801528324,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>Vision-language retrieval (VLR), which uses text or image queries to retrieve corresponding cross-modal content, plays a crucial role in multimedia and computer vision tasks. However, challenging concepts in queries often confuse retrievers, limiting their ability to align concepts with visual content. Existing query optimization methods neglect retrievers\u2019 preferences (i.e., text descriptions that better match their corresponding visual content), producing queries that are not adapted to the retriever and leading to suboptimal performance. To address this, we propose Retriever-Adaptive Query Optimization (RAQO), an interpretable framework that rewrites queries based on retriever-specific preferences. Specifically, we first leverage multimodal large language models (MLLMs) and retrieval feedback to construct the MLLMs-Driven Preference-Aware Dataset Engine (MPADE), which automatically refines queries offline, capturing the retriever\u2019s implicit preferences. Then, we introduce a \"detect-then-rewrite\" chain-of-thought rewriting (ReCoT) strategy equipped with a progressive preference alignment pipeline, including three stages: ambiguity detection fine-tuning, query rewriting fine-tuning, and preference rank optimization. This design enables the rewriter to focus on confusing concepts and produce retriever-adapted, high-quality queries. 
Extensive VLR benchmark experiments have demonstrated the superiority of RAQO in cross-modal retrieval, as well as its interpretability, generalizability and transferability.<\/jats:p>","DOI":"10.1609\/aaai.v40i10.37741","type":"journal-article","created":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:39:29Z","timestamp":1773790769000},"page":"7963-7971","source":"Crossref","is-referenced-by-count":0,"title":["Suit the Remedy to the Retriever: Interpretable Query Optimization with Retriever Preference Alignment for Vision-Language Retrieval"],"prefix":"10.1609","volume":"40","author":[{"given":"GuangHao","family":"Meng","sequence":"first","affiliation":[]},{"given":"Jinpeng","family":"Wang","sequence":"additional","affiliation":[]},{"given":"Jieming","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Letian","family":"Zhang","sequence":"additional","affiliation":[]},{"given":"Yong","family":"Jiang","sequence":"additional","affiliation":[]},{"given":"Dan","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Qing","family":"Li","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial 
Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37741\/41703","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/37741\/41703","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,17]],"date-time":"2026-03-17T23:39:29Z","timestamp":1773790769000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/37741"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i10.37741","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}