{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T02:55:01Z","timestamp":1773802501229,"version":"3.50.1"},"reference-count":0,"publisher":"Association for the Advancement of Artificial Intelligence (AAAI)","issue":"17","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["AAAI"],"abstract":"<jats:p>High quality datasets are critical for training reliable machine learning models, yet data faults caused by insufficient annotation expertise or malicious poisoning attacks remain prevalent. Traditional classifier based methods rely on manually curated subsets for fault detection, but their limited scale frequently leads to model overfitting. While multimodal large language models (MLLMs) based methods offer promising detection capabilities, their few-shot learning limitations hinder generalization in domain specific tasks. To address these challenges, we propose MLLM Guided Iterative Sample Filtering (MISF), a novel framework that combines the strengths of MLLM based initialization and iterative data refinement. Our framework initializes the detection model with MLLM generated synthetic images and a curated clean subset, then iteratively refines it by progressively selecting high certainty clean samples, improving both domain adaptation and detection accuracy. Extensive experiments on RESISC45 and Oxford-IIIT Pets datasets demonstrate that MISF effectively identifies data faults, outperforming existing approaches. MISF provides a robust, scalable solution for improving dataset quality in specialized domains.<\/jats:p>","DOI":"10.1609\/aaai.v40i17.38471","type":"journal-article","created":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:33:16Z","timestamp":1773793996000},"page":"14538-14546","source":"Crossref","is-referenced-by-count":0,"title":["MISF: MLLM Guided Iterative Sample Filtering for Data Fault Detection"],"prefix":"10.1609","volume":"40","author":[{"given":"Guoying","family":"Chen","sequence":"first","affiliation":[]},{"given":"Ruizhuo","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Zhewei","family":"Xu","sequence":"additional","affiliation":[]},{"given":"Bo","family":"Yang","sequence":"additional","affiliation":[]},{"given":"Kunlong","family":"Wang","sequence":"additional","affiliation":[]}],"member":"9382","published-online":{"date-parts":[[2026,3,14]]},"container-title":["Proceedings of the AAAI Conference on Artificial Intelligence"],"original-title":[],"link":[{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38471\/42433","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/download\/38471\/42433","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T00:33:16Z","timestamp":1773793996000},"score":1,"resource":{"primary":{"URL":"https:\/\/ojs.aaai.org\/index.php\/AAAI\/article\/view\/38471"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2026,3,14]]},"references-count":0,"journal-issue":{"issue":"17","published-online":{"date-parts":[[2026,3,17]]}},"URL":"https:\/\/doi.org\/10.1609\/aaai.v40i17.38471","relation":{},"ISSN":["2374-3468","2159-5399"],"issn-type":[{"value":"2374-3468","type":"electronic"},{"value":"2159-5399","type":"print"}],"subject":[],"published":{"date-parts":[[2026,3,14]]}}}