{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,9,17]],"date-time":"2025-09-17T03:16:28Z","timestamp":1758078988809,"version":"3.44.0"},"reference-count":10,"publisher":"Association for Computing Machinery (ACM)","issue":"12","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. VLDB Endow."],"published-print":{"date-parts":[[2025,8]]},"abstract":"<jats:p>Missing data prevalent in information systems impacts data diversity and fidelity, which systematically degrade clustering performance through biased similarity measures and unstable cluster boundaries. Current large-scale environments lack standardized imputation-clustering pipelines, as existing methods operate independently of downstream tasks without analyzing error propagation effects, leading to unreliable results. To address this, we propose TARImpute, a Task-Aware auto-Recommender system for missing value imputation for clustering. It has three integrated features: an Imputation Impact Profiler for quantitative evaluation of imputation-clustering interactions, an Error Propagation Interpreter that enables explainable modeling of imputation error diffusion, and an Adaptive Strategy Optimizer for dynamic selection of optimal imputation methods. TARImpute provides state-of-the-art imputation methods to evaluate their effects on clustering tasks. 
TARImpute also provides robust, interpretable solutions for low-quality data and shows extensibility to other analytical tasks.<\/jats:p>","DOI":"10.14778\/3750601.3750667","type":"journal-article","created":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:37:51Z","timestamp":1758029871000},"page":"5343-5346","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["TARImpute: Task-Aware Auto-Recommender System for Missing Value Imputation Algorithms with Clustering Case Studies"],"prefix":"10.14778","volume":"18","author":[{"given":"Xiaoou","family":"Ding","sequence":"first","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yanshuo","family":"Liu","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zhounan","family":"Chen","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hongzhi","family":"Wang","sequence":"additional","affiliation":[{"name":"Harbin Institute of Technology, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chen","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jianmin","family":"Wang","sequence":"additional","affiliation":[{"name":"Tsinghua University, China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2025,9,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"TSDDISCOVER: Discovering Data Dependency for Time Series Data","author":"Ding Xiaoou","year":"2024","unstructured":"Xiaoou Ding, Yingze Li, Hongzhi Wang, Chen Wang, Yida Liu, and Jianmin Wang. 2024. TSDDISCOVER: Discovering Data Dependency for Time Series Data. 
IEEE ICDE, 3668\u20133681."},{"key":"e_1_2_1_2_1","first-page":"4377","article-title":"Clean4TSDB: A Data Cleaning Tool for Time Series Databases","volume":"17","author":"Ding Xiaoou","year":"2024","unstructured":"Xiaoou Ding, Yichen Song, Hongzhi Wang, Donghua Yang, Chen Wang, and Jianmin Wang. 2024. Clean4TSDB: A Data Cleaning Tool for Time Series Databases. VLDB 17, 12 (2024), 4377\u20134380.","journal-title":"VLDB"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.23919\/ICN.2022.0026"},{"key":"e_1_2_1_4_1","first-page":"1786","article-title":"Cleanits: A Data Cleaning System for Industrial Time Series","volume":"12","author":"Ding Xiaoou","year":"2019","unstructured":"Xiaoou Ding, Hongzhi Wang, Jiaxuan Su, Zijue Li, Jianzhong Li, and Hong Gao. 2019. Cleanits: A Data Cleaning System for Industrial Time Series. VLDB 12, 12 (2019), 1786\u20131789.","journal-title":"VLDB"},{"key":"e_1_2_1_5_1","first-page":"9916","article-title":"HyperImpute: Generalized Iterative Imputation with Automatic Model Selection","volume":"162","author":"Jarrett Daniel","year":"2022","unstructured":"Daniel Jarrett, Bogdan Cebere, Tennison Liu, Alicia Curth, and Mihaela van der Schaar. 2022. HyperImpute: Generalized Iterative Imputation with Automatic Model Selection. In ICML, Vol. 162. 9916\u20139937.","journal-title":"ICML"},{"key":"e_1_2_1_6_1","first-page":"613","article-title":"Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm","volume":"8","author":"Kalisch Markus","year":"2007","unstructured":"Markus Kalisch and Peter B\u00fchlmann. 2007. Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm. J. Mach. Learn. Res. 8 (2007), 613\u2013636.","journal-title":"J. Mach. Learn. 
Res."},{"key":"e_1_2_1_7_1","first-page":"768","article-title":"Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series","volume":"13","author":"Khayati Mourad","year":"2020","unstructured":"Mourad Khayati, Alberto Lerner, Zakhar Tymchenko, and Philippe Cudr\u00e9-Mauroux. 2020. Mind the Gap: An Experimental Evaluation of Imputation of Missing Values Techniques in Time Series. VLDB 13, 5 (2020), 768\u2013782.","journal-title":"VLDB"},{"key":"e_1_2_1_8_1","volume-title":"Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient. CoRR abs\/2301.04431","author":"Latafat Puya","year":"2023","unstructured":"Puya Latafat, Andreas Themelis, Lorenzo Stella, and Panagiotis Patrinos. 2023. Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient. CoRR abs\/2301.04431 (2023)."},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2022.3186498"},{"key":"e_1_2_1_10_1","doi-asserted-by":"crossref","unstructured":"Keyu Yang, Yunjun Gao, Rui Ma, Lu Chen, Sai Wu, and Gang Chen. 2019. DBSCAN-MS: Distributed Density-Based Clustering in Metric Spaces. In ICDE. 
1346\u20131357.","DOI":"10.1109\/ICDE.2019.00122"}],"container-title":["Proceedings of the VLDB Endowment"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.14778\/3750601.3750667","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,16]],"date-time":"2025-09-16T13:38:15Z","timestamp":1758029895000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.14778\/3750601.3750667"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8]]},"references-count":10,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,8]]}},"alternative-id":["10.14778\/3750601.3750667"],"URL":"https:\/\/doi.org\/10.14778\/3750601.3750667","relation":{},"ISSN":["2150-8097"],"issn-type":[{"value":"2150-8097","type":"print"}],"subject":[],"published":{"date-parts":[[2025,8]]},"assertion":[{"value":"2025-09-16","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}