{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,29]],"date-time":"2026-05-29T12:44:17Z","timestamp":1780058657662,"version":"3.54.0"},"reference-count":194,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T00:00:00Z","timestamp":1734652800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T00:00:00Z","timestamp":1734652800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62302241"],"award-info":[{"award-number":["62302241"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["62072265"],"award-info":[{"award-number":["62072265"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006606","name":"Natural Science Foundation of Tianjin","doi-asserted-by":"crossref","award":["22JCQNJC01520"],"award-info":[{"award-number":["22JCQNJC01520"]}],"id":[{"id":"10.13039\/501100006606","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Fundamental Research Funds for the Central Universities, Nankai University","award":["63231147"],"award-info":[{"award-number":["63231147"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Sci. 
Eng."],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Relational data play a crucial role in various fields, but they are often plagued by low-quality issues such as erroneous and missing values, which can severely impact downstream applications. To tackle these issues, relational data cleaning with traditional signals (e.g., statistics, constraints, and clusters) has been extensively studied and offers interpretability and efficiency. Recently, given their strong capability for modeling complex relationships, artificial intelligence (AI) techniques have been introduced into the data cleaning field. These AI-based methods either consider multiple cleaning signals, integrate various techniques into the cleaning system, or incorporate neural networks. Among them, methods utilizing deep neural networks are classified as deep learning (DL) based, while those that do not are classified as machine learning (ML) based. In this study, we focus on three essential tasks (i.e., error detection, data repairing, and data imputation) for cleaning relational data, and comprehensively review the representative methods using traditional or AI techniques. By comparing and analyzing the two types of methods across five dimensions (cost, generalization, interpretability, efficiency, and effectiveness), we provide insights into their strengths, weaknesses, and suitable application scenarios. 
Finally, we analyze the challenges and open issues currently faced in data cleaning and discuss possible directions for future studies.<\/jats:p>","DOI":"10.1007\/s41019-024-00266-7","type":"journal-article","created":{"date-parts":[[2024,12,20]],"date-time":"2024-12-20T07:09:04Z","timestamp":1734678544000},"page":"147-174","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":10,"title":["Relational Data Cleaning Meets Artificial Intelligence: A Survey"],"prefix":"10.1007","volume":"10","author":[{"given":"Jingyu","family":"Zhu","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xintong","family":"Zhao","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-7398-2972","authenticated-orcid":false,"given":"Yu","family":"Sun","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Shaoxu","family":"Song","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Xiaojie","family":"Yuan","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"297","published-online":{"date-parts":[[2024,12,20]]},"reference":[{"key":"266_CR1","doi-asserted-by":"crossref","unstructured":"Bharwad ND, Goswami MM ( 2014) Proposed efficient approach for classification for multi-relational data mining using bayesian belief network. In: 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE), pp. 1\u2013 4 . IEEE","DOI":"10.1109\/ICGCCEE.2014.6922401"},{"key":"266_CR2","doi-asserted-by":"crossref","unstructured":"Poulis G, Gkoulalas-Divanis A, Loukides G, Skiadopoulos S, Tryfonopoulos C (2015)Secreta: A tool for anonymizing relational, transaction and rt-datasets. 
Medical data privacy handbook, 83\u2013109","DOI":"10.1007\/978-3-319-23633-9_5"},{"key":"266_CR3","doi-asserted-by":"crossref","unstructured":"Li, T., Anand, S.S.: Hirel: An incremental clustering algorithm for relational datasets. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 887\u2013 892 ( 2008). IEEE","DOI":"10.1109\/ICDM.2008.116"},{"key":"266_CR4","unstructured":"https:\/\/www.oracle.com\/"},{"key":"266_CR5","unstructured":"https:\/\/learn.microsoft.com\/sql\/"},{"key":"266_CR6","unstructured":"https:\/\/www.mysql.com\/"},{"issue":"24","key":"266_CR7","doi-asserted-by":"publisher","first-page":"2619","DOI":"10.2146\/ajhp050138","volume":"62","author":"J Sakowski","year":"2005","unstructured":"Sakowski J, Leonard T, Colburn S, Michaelsen B, Schiro T, Schneider J, Newman JM (2005) Using a bar-coded medication administration system to prevent medication errors in a community hospital network. Am J Health Syst Pharm 62(24):2619\u20132625","journal-title":"Am J Health Syst Pharm"},{"issue":"1","key":"266_CR8","doi-asserted-by":"publisher","first-page":"208","DOI":"10.1016\/j.ymssp.2013.05.007","volume":"40","author":"J Kullaa","year":"2013","unstructured":"Kullaa J (2013) Detection, identification, and quantification of sensor fault in a sensor network. Mech Syst Signal Process 40(1):208\u2013221","journal-title":"Mech Syst Signal Process"},{"issue":"1","key":"266_CR9","doi-asserted-by":"publisher","first-page":"914","DOI":"10.1109\/TVCG.2018.2864914","volume":"25","author":"H Song","year":"2018","unstructured":"Song H, Szafir DA (2018) Where\u2019s my data? Evaluating visualizations with missing data. 
IEEE Trans Visual Comput Graphics 25(1):914\u2013924","journal-title":"IEEE Trans Visual Comput Graphics"},{"issue":"2","key":"266_CR10","doi-asserted-by":"publisher","first-page":"422","DOI":"10.1109\/TAC.2012.2211411","volume":"58","author":"E Garcia","year":"2012","unstructured":"Garcia E, Antsaklis PJ (2012) Model-based event-triggered control for systems with quantization and time-varying network delays. IEEE Trans Autom Control 58(2):422\u2013434","journal-title":"IEEE Trans Autom Control"},{"key":"266_CR11","doi-asserted-by":"crossref","unstructured":"Nguyen TSL, Jourjon G, Potop-Butucaru M, Thai KL ( 2019) Impact of network delays on hyperledger fabric. In: IEEE INFOCOM 2019-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), pp. 222\u2013 227 . IEEE","DOI":"10.1109\/INFCOMW.2019.8845168"},{"key":"266_CR12","unstructured":"Eckerson WW (2002) Data quality and the bottom line. TDWI Report, The Data Warehouse Institute, 1\u201332"},{"issue":"7","key":"266_CR13","doi-asserted-by":"publisher","first-page":"757","DOI":"10.14778\/3067421.3067425","volume":"10","author":"S Gupta","year":"2017","unstructured":"Gupta S, Kumar R, Lu K, Moseley B, Vassilvitskii S (2017) Local search methods for k-means with outliers. PVLDB 10(7):757\u2013768. https:\/\/doi.org\/10.14778\/3067421.3067425","journal-title":"PVLDB"},{"issue":"1145\/2783258","key":"266_CR14","doi-asserted-by":"publisher","first-page":"2783317","DOI":"10.1145\/2783258.2783317","volume":"10","author":"S Song","year":"2015","unstructured":"Song S, Li C, Zhang X (2015) Turn waste into wealth: on simultaneous clustering and cleaning over dirty data. SIGKDD 10(1145\/2783258):2783317. https:\/\/doi.org\/10.1145\/2783258.2783317","journal-title":"SIGKDD"},{"key":"266_CR15","doi-asserted-by":"crossref","unstructured":"Li P, Rao X, Blase J, Zhang Y, Chu X, Zhang C ( 2021) Cleanml: A study for evaluating the impact of data cleaning on ml classification tasks. 
In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), pp. 13\u2013 24. IEEE","DOI":"10.1109\/ICDE51399.2021.00009"},{"key":"266_CR16","doi-asserted-by":"publisher","unstructured":"Song S, Gao F, Huang R, Wang Y ( 2021) On saving outliers for better clustering over noisy data. In: Proceedings of the 2021 International Conference on Management of Data. SIGMOD \u201921, pp. 1692\u2013 1704. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3448016.3457271","DOI":"10.1145\/3448016.3457271"},{"key":"266_CR17","doi-asserted-by":"publisher","first-page":"81","DOI":"10.1023\/A:1021564703268","volume":"7","author":"W Kim","year":"2003","unstructured":"Kim W, Choi B-J, Hong E-K, Kim S-K, Lee D (2003) A taxonomy of dirty data. Data Min Knowl Disc 7:81\u201399","journal-title":"Data Min Knowl Disc"},{"key":"266_CR18","doi-asserted-by":"publisher","first-page":"806","DOI":"10.1007\/s11390-021-1344-6","volume":"36","author":"Z-X Qi","year":"2021","unstructured":"Qi Z-X, Wang H-Z, Wang A-J (2021) Impacts of dirty data on classification and clustering models: an experimental evaluation. J Comput Sci Technol 36:806\u2013821","journal-title":"J Comput Sci Technol"},{"key":"266_CR19","first-page":"935","volume":"75","author":"DW Opderbeck","year":"2015","unstructured":"Opderbeck DW (2015) Cybersecurity, data breaches, and the economic loss doctrine in the payment card industry. Md. L. Rev. 75:935","journal-title":"Md. L. Rev."},{"issue":"8","key":"266_CR20","doi-asserted-by":"publisher","first-page":"10631","DOI":"10.1364\/OE.27.010631","volume":"27","author":"B Yan","year":"2019","unstructured":"Yan B, Zhao Y, Rahman S, Li Y, Yu X, Liu D, He Y, Zhang J (2019) Dirty-data-based alarm prediction in self-optimizing large-scale optical networks. 
Opt Express 27(8):10631\u201310643","journal-title":"Opt Express"},{"key":"266_CR21","doi-asserted-by":"crossref","unstructured":"Secci F, Ceccarelli A ( 2020) On failures of rgb cameras and their effects in autonomous driving applications. In: 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), pp. 13\u2013 24 . IEEE","DOI":"10.1109\/ISSRE5003.2020.00011"},{"key":"266_CR22","doi-asserted-by":"crossref","unstructured":"Ceccarelli A, Secci F (2022) Rgb cameras failures and their effects in autonomous driving applications. IEEE Transactions on Dependable and Secure Computing","DOI":"10.1109\/TDSC.2022.3156941"},{"issue":"2","key":"266_CR23","doi-asserted-by":"publisher","first-page":"105","DOI":"10.1016\/j.artmed.2010.05.002","volume":"50","author":"JM Jerez","year":"2010","unstructured":"Jerez JM, Molina I, Garc\u00eda-Laencina PJ, Alba E, Ribelles N, Mart\u00edn M, Franco L (2010) Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artif Intell Med 50(2):105\u2013115","journal-title":"Artif Intell Med"},{"issue":"3","key":"266_CR24","doi-asserted-by":"publisher","first-page":"259","DOI":"10.1023\/A:1008334909089","volume":"11","author":"K Lakshminarayan","year":"1999","unstructured":"Lakshminarayan K, Harp SA, Samad T (1999) Imputation of missing data in industrial databases. Appl Intell 11(3):259\u2013275","journal-title":"Appl Intell"},{"issue":"1","key":"266_CR25","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/j.artmed.2013.01.003","volume":"58","author":"F Cismondi","year":"2013","unstructured":"Cismondi F, Fialho AS, Vieira SM, Reti SR, Sousa JM, Finkelstein SN (2013) Missing data in medical databases: impute, delete or classify? Artif Intell Med 58(1):63\u201372","journal-title":"Artif Intell Med"},{"key":"266_CR26","doi-asserted-by":"crossref","unstructured":"Aljuaid T, Sasi S ( 2016) Proper imputation techniques for missing values in data sets. 
In: 2016 International Conference on Data Science and Engineering (ICDSE), pp. 1\u2013 5 . IEEE","DOI":"10.1109\/ICDSE.2016.7823957"},{"key":"266_CR27","doi-asserted-by":"crossref","unstructured":"Chu X, Ilyas IF, Krishnan S, Wang J (2016) Data cleaning: Overview and emerging challenges. In: Proceedings of the 2016 International Conference on Management of Data, pp. 2201\u2013 2206","DOI":"10.1145\/2882903.2912574"},{"key":"266_CR28","unstructured":"Shyu M-L, Chen S, Sarinnapakorn K, Chang L ( 2003) A novel anomaly detection scheme based on principal component classifier. https:\/\/api.semanticscholar.org\/CorpusID:6319694"},{"issue":"1145\/1807167","key":"266_CR29","doi-asserted-by":"publisher","first-page":"1807178","DOI":"10.1145\/1807167.1807178","volume":"10","author":"C Mayfield","year":"2010","unstructured":"Mayfield C, Neville J, Prabhakar S (2010) ERACER: A database approach for statistical inference and data cleaning. SIGMOD 10(1145\/1807167):1807178. https:\/\/doi.org\/10.1145\/1807167.1807178","journal-title":"SIGMOD"},{"key":"266_CR30","doi-asserted-by":"crossref","unstructured":"Yakout M, Berti-\u00c9quille L, Elmagarmid AK ( 2013) Don\u2019t be scared: use scalable automatic repairing with maximal likelihood and bounded changes. In: ACM SIGMOD Conference. https:\/\/api.semanticscholar.org\/CorpusID:3177872","DOI":"10.1145\/2463676.2463706"},{"key":"266_CR31","doi-asserted-by":"crossref","unstructured":"Grzymala-Busse JW, Goodwin LK, Grzymala-Busse WJ, Zheng X ( 2005) Handling missing attribute values in preterm birth data sets. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing . 
https:\/\/api.semanticscholar.org\/CorpusID:16844449","DOI":"10.1007\/11548706_36"},{"issue":"6","key":"266_CR32","doi-asserted-by":"publisher","first-page":"520","DOI":"10.1093\/BIOINFORMATICS\/17.6.520","volume":"17","author":"OG Troyanskaya","year":"2001","unstructured":"Troyanskaya OG, Cantor MN, Sherlock G, Brown PO, Hastie T, Tibshirani R, Botstein D, Altman RB (2001) Missing value estimation methods for DNA microarrays. Bioinform 17(6):520\u2013525. https:\/\/doi.org\/10.1093\/BIOINFORMATICS\/17.6.520","journal-title":"Bioinform"},{"key":"266_CR33","unstructured":"Ester M, Kriegel H, Sander J, Xu X ( 1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226\u2013 231 . http:\/\/www.aaai.org\/Library\/KDD\/1996\/kdd96-037.php"},{"issue":"1145\/304182","key":"266_CR34","doi-asserted-by":"publisher","DOI":"10.1145\/304182.304187","volume":"10","author":"M Ankerst","year":"1999","unstructured":"Ankerst M, Breunig MM, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. SIGMOD 10(1145\/304182):304187. https:\/\/doi.org\/10.1145\/304182.304187","journal-title":"SIGMOD"},{"key":"266_CR35","doi-asserted-by":"crossref","unstructured":"Song S, Li C, Zhang X (2015) Turn waste into wealth: On simultaneous clustering and cleaning over dirty data. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","DOI":"10.1145\/2783258.2783317"},{"key":"266_CR36","doi-asserted-by":"publisher","first-page":"573","DOI":"10.1007\/978-3-540-25929-9_70","volume":"3066","author":"D Li","year":"2004","unstructured":"Li D, Deogun J, Spaulding W, Shuart B (2004) Towards missing data imputation: a study of fuzzy k-means clustering method. 
Rough Sets Curr Trends Comput 3066:573\u2013579 (Springer)","journal-title":"Rough Sets Curr Trends Comput"},{"key":"266_CR37","doi-asserted-by":"publisher","first-page":"128","DOI":"10.1007\/978-3-540-79299-4_7","volume":"1","author":"S Zhang","year":"2008","unstructured":"Zhang S, Zhang J, Zhu X, Qin Y, Zhang C (2008) Missing value imputation based on data clustering. Trans Comput Sci 1:128\u2013138. https:\/\/doi.org\/10.1007\/978-3-540-79299-4_7","journal-title":"Trans Comput Sci"},{"issue":"1145\/1989323","key":"266_CR38","doi-asserted-by":"publisher","first-page":"1989373","DOI":"10.1145\/1989323.1989373","volume":"10","author":"W Fan","year":"2011","unstructured":"Fan W, Li J, Ma S, Tang N, Yu W (2011) Interaction between record matching and data repairing. SIGMOD 10(1145\/1989323):1989373. https:\/\/doi.org\/10.1145\/1989323.1989373","journal-title":"SIGMOD"},{"key":"266_CR39","doi-asserted-by":"crossref","unstructured":"Chu X, Ilyas IF, Papotti P (2013) Holistic data cleaning: Putting violations into context. 2013 IEEE 29th International Conference on Data Engineering (ICDE), 458\u2013469","DOI":"10.1109\/ICDE.2013.6544847"},{"key":"266_CR40","doi-asserted-by":"crossref","unstructured":"Khayyat Z, Ilyas IF, Jindal A, Madden S, Ouzzani M, Papotti P, Quian\u00e9-Ruiz J-A, Tang N, Yin S (2015) Bigdansing: A system for big data cleansing. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data","DOI":"10.1145\/2723372.2747646"},{"key":"266_CR41","first-page":"1","volume":"61","author":"C Ye","year":"2018","unstructured":"Ye C, Li Q, Zhang H, Wang H, Gao J, Li J (2018) Autorepair: an automatic repairing approach over multi-source data. Knowl Inf Syst 61:1\u201331","journal-title":"Knowl Inf Syst"},{"issue":"11","key":"266_CR42","first-page":"1286","volume":"8","author":"S Song","year":"2015","unstructured":"Song S, Zhang A, Chen L, Wang J (2015) Enriching data imputation with extensive similarity neighbors. 
PVLDB 8(11):1286\u20131297","journal-title":"PVLDB"},{"issue":"2","key":"266_CR43","doi-asserted-by":"publisher","first-page":"275","DOI":"10.1109\/TKDE.2018.2883103","volume":"32","author":"S Song","year":"2020","unstructured":"Song S, Sun Y, Zhang A, Chen L, Wang J (2020) Enriching data imputation under similarity rule constraints. TKDE 32(2):275\u2013287. https:\/\/doi.org\/10.1109\/TKDE.2018.2883103","journal-title":"TKDE"},{"issue":"1","key":"266_CR44","doi-asserted-by":"publisher","first-page":"9","DOI":"10.21037\/atm-20-3623","volume":"4","author":"Z Zhang","year":"2016","unstructured":"Zhang Z (2016) Missing data imputation: focusing on single imputation. Ann Translat. Med 4(1):9","journal-title":"Ann Translat. Med"},{"issue":"3","key":"266_CR45","first-page":"343","volume":"86","author":"JD Dziura","year":"2013","unstructured":"Dziura JD, Post LA, Zhao Q, Fu Z, Peduzzi P (2013) Strategies for dealing with missing data in clinical trials: from design to analysis. Yale J Biol Med 86(3):343","journal-title":"Yale J Biol Med"},{"issue":"6","key":"266_CR46","doi-asserted-by":"publisher","first-page":"1453","DOI":"10.3233\/IDA-205497","volume":"25","author":"C Tang","year":"2021","unstructured":"Tang C, Wang H, Wang Z, Zeng X, Yan H, Xiao Y (2021) An improved optics clustering algorithm for discovering clusters with uneven densities. Intell Data Anal 25(6):1453\u20131471","journal-title":"Intell Data Anal"},{"key":"266_CR47","doi-asserted-by":"crossref","unstructured":"Mahdavi M, Abedjan Z, Fernandez RC, Madden S, Ouzzani M, Stonebraker M, Tang N (2019) Raha: A configuration-free error detection system. 
Proceedings of the 2019 International Conference on Management of Data","DOI":"10.1145\/3299869.3324956"},{"issue":"11","key":"266_CR48","doi-asserted-by":"publisher","first-page":"1190","DOI":"10.14778\/3137628.3137631","volume":"10","author":"T Rekatsinas","year":"2017","unstructured":"Rekatsinas T, Chu X, Ilyas IF, R\u00e9 C (2017) Holoclean: holistic data repairs with probabilistic inference. Proc VLDB Endow 10(11):1190\u20131201. https:\/\/doi.org\/10.14778\/3137628.3137631","journal-title":"Proc VLDB Endow"},{"key":"266_CR49","unstructured":"Krishnan S, Franklin MJ, Goldberg K, Wu E (2017) Boostclean: Automated error detection and repair for machine learning. ArXiv: abs\/1711.01299"},{"issue":"3","key":"266_CR50","doi-asserted-by":"publisher","first-page":"218","DOI":"10.1145\/3617338","volume":"1","author":"S Siddiqi","year":"2023","unstructured":"Siddiqi S, Kern R, Boehm M (2023) SAGA: a scalable framework for optimizing data cleaning pipelines for machine learning applications. Proc ACM Manag Data 1(3):218\u2013121826. https:\/\/doi.org\/10.1145\/3617338","journal-title":"Proc ACM Manag Data"},{"key":"266_CR51","doi-asserted-by":"publisher","first-page":"533","DOI":"10.1038\/323533a0","volume":"323","author":"DE Rumelhart","year":"1986","unstructured":"Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533\u2013536","journal-title":"Nature"},{"key":"266_CR52","doi-asserted-by":"publisher","first-page":"263","DOI":"10.1007\/s00521-009-0295-6","volume":"19","author":"PJ Garc\u00eda-Laencina","year":"2010","unstructured":"Garc\u00eda-Laencina PJ, Sancho-G\u00f3mez J-L, Figueiras-Vidal AR (2010) Pattern classification with missing data: a review. Neural Comput Appl 19:263\u2013282","journal-title":"Neural Comput Appl"},{"key":"266_CR53","doi-asserted-by":"crossref","unstructured":"Rekatsinas T, Chu X, Ilyas IF, R\u00e9 C (2017) Holoclean: Holistic data repairs with probabilistic inference. 
ArXiv: abs\/1702.00820","DOI":"10.14778\/3137628.3137631"},{"key":"266_CR54","unstructured":"Yoon J, Jordon J, Schaar M (2018) Gain: Missing data imputation using generative adversarial nets. ArXiv: abs\/1806.02920"},{"issue":"3","key":"266_CR55","doi-asserted-by":"publisher","first-page":"433","DOI":"10.14778\/3570690.3570694","volume":"16","author":"J Peng","year":"2022","unstructured":"Peng J, Shen D, Tang N, Liu T, Kou Y, Nie T, Cui H, Yu G (2022) Self-supervised and interpretable data cleaning with sequence generative adversarial networks. Proc VLDB Endow 16(3):433\u2013446. https:\/\/doi.org\/10.14778\/3570690.3570694","journal-title":"Proc VLDB Endow"},{"key":"266_CR56","unstructured":"Jarrett D, Cebere BC, Liu T, Curth A, Schaar M ( 2022) HyperImpute: Generalized iterative imputation with automatic model selection. In: Proceedings of the 39th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 162, pp. 9916\u2013 9937"},{"key":"266_CR57","unstructured":"Gondara L, Wang K (2017) Multiple imputation using deep denoising autoencoders. ArXiv: abs\/1705.02737"},{"key":"266_CR58","doi-asserted-by":"crossref","unstructured":"Costa AF, Santos MS, Soares JP, Abreu PH ( 2018) Missing data imputation via denoising autoencoders: The untold story. In: International Symposium on Intelligent Data Analysis . https:\/\/api.semanticscholar.org\/CorpusID:52961991","DOI":"10.1007\/978-3-030-01768-2_8"},{"key":"266_CR59","unstructured":"You J, Ma X, Ding DY, Kochenderfer MJ, Leskovec J (2020) Handling missing data with graph representation learning. ArXiv: abs\/2010.16418"},{"key":"266_CR60","unstructured":"Cappuzzo R, Thirumuruganathan S, Papotti P ( 2024) Relational Data Imputation with Graph Neural Networks. In: EDBT\/ICDT 2024, 27th International Conference on Extending Database Technology, Paestum, Italy . 
https:\/\/hal.science\/hal-04378971"},{"key":"266_CR61","unstructured":"Li A, Zhao Y, Qiu C, Kloft M, Smyth P, Rudolph M, Mandt S (2024) Anomaly detection of tabular data using llms. arXiv preprint arXiv:2406.16308"},{"key":"266_CR62","doi-asserted-by":"crossref","unstructured":"Biester F, Abdelaal M, Del\u00a0Gaudio D (2024) Llmclean: Context-aware tabular data cleaning via llm-generated ofds. arXiv preprint arXiv:2404.18681","DOI":"10.1007\/978-3-031-70421-5_7"},{"key":"266_CR63","doi-asserted-by":"crossref","unstructured":"Narayan A, Chami I, Orr L, Arora S, R\u00e9 C (2022) Can Foundation Models Wrangle Your Data?. https:\/\/arxiv.org\/abs\/2205.09911","DOI":"10.14778\/3574245.3574258"},{"key":"266_CR64","unstructured":"Pang G, Hengel A, Shen C, Cao L (2020) Deep reinforcement learning for unknown anomaly detection. arXiv preprint arXiv:2009.06847"},{"issue":"3","key":"266_CR65","doi-asserted-by":"publisher","first-page":"540","DOI":"10.3390\/agriculture13030540","volume":"13","author":"M Albahar","year":"2023","unstructured":"Albahar M (2023) A survey on deep learning and its impact on agriculture: challenges and opportunities. Agriculture 13(3):540","journal-title":"Agriculture"},{"issue":"12","key":"266_CR66","doi-asserted-by":"publisher","first-page":"3197","DOI":"10.1007\/s10115-022-01756-8","volume":"64","author":"X Li","year":"2022","unstructured":"Li X, Xiong H, Li X, Wu X, Zhang X, Liu J, Bian J, Dou D (2022) Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond. Knowl Inf Syst 64(12):3197\u20133234","journal-title":"Knowl Inf Syst"},{"key":"266_CR67","doi-asserted-by":"crossref","unstructured":"Lazarevic A, Kumar V ( 2005) Feature bagging for outlier detection. In: Knowledge Discovery and Data Mining . 
https:\/\/api.semanticscholar.org\/CorpusID:2054204","DOI":"10.1145\/1081870.1081891"},{"key":"266_CR68","unstructured":"Mariet Z, Harding R, Madden S, et al (2016) Outlier detection in heterogeneous datasets using automatic tuple expansion"},{"key":"266_CR69","doi-asserted-by":"crossref","unstructured":"Huang Z, He Y (2018) Auto-detect: Data-driven error detection in tables. Proceedings of the 2018 International Conference on Management of Data","DOI":"10.1145\/3183713.3196889"},{"key":"266_CR70","doi-asserted-by":"publisher","unstructured":"Mandros P, Boley M, Vreeken J ( 2017) Discovering reliable approximate functional dependencies. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD \u201917, pp. 355\u2013 363. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3097983.3098062","DOI":"10.1145\/3097983.3098062"},{"issue":"1145\/1366102","key":"266_CR71","first-page":"1366103","volume":"10","author":"W Fan","year":"2008","unstructured":"Fan W, Geerts F, Jia X, Kementsietsidis A (2008) Conditional functional dependencies for capturing data inconsistencies. ACM Trans Database Syst 10(1145\/1366102):1366103","journal-title":"ACM Trans Database Syst"},{"issue":"13","key":"266_CR72","doi-asserted-by":"publisher","first-page":"1498","DOI":"10.14778\/2536258.2536262","volume":"6","author":"X Chu","year":"2013","unstructured":"Chu X, Ilyas IF, Papotti P (2013) Discovering denial constraints. Proc VLDB Endow 6(13):1498\u20131509. https:\/\/doi.org\/10.14778\/2536258.2536262","journal-title":"Proc VLDB Endow"},{"key":"266_CR73","doi-asserted-by":"publisher","unstructured":"Qahtan A, Tang N, Ouzzani M, Cao Y, Stonebraker M ( 2019). Anmat: Automatic knowledge discovery and error detection through pattern functional dependencies. In: Proceedings of the 2019 International Conference on Management of Data. SIGMOD \u201919, pp. 
1977\u2013 1980, New York, NY, USA https:\/\/doi.org\/10.1145\/3299869.3320209","DOI":"10.1145\/3299869.3320209"},{"key":"266_CR74","doi-asserted-by":"publisher","unstructured":"Yan JN, Schulte O, Zhang M, Wang J, Cheng R ( 2020) Scoded: Statistical constraint oriented data error detection. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. SIGMOD \u201920, pp. 845\u2013 860. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3318464.3380568","DOI":"10.1145\/3318464.3380568"},{"key":"266_CR75","doi-asserted-by":"publisher","unstructured":"Chai C, Cao L, Li G, Li J, Luo Y, Madden S ( 2020) Human-in-the-loop outlier detection. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. SIGMOD \u201920, pp. 19\u2013 33. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3318464.3389772","DOI":"10.1145\/3318464.3389772"},{"key":"266_CR76","first-page":"392","volume":"98","author":"EM Knorr","year":"1998","unstructured":"Knorr EM, Ng RT (1998) Algorithms for mining distance-based outliers in large datasets. VLDB 98:392\u2013403","journal-title":"VLDB"},{"issue":"1145\/342009","key":"266_CR77","volume":"10","author":"MM Breunig","year":"2000","unstructured":"Breunig MM, Kriegel H, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. SIGMOD 10(1145\/342009):335388","journal-title":"SIGMOD"},{"key":"266_CR78","doi-asserted-by":"crossref","unstructured":"Angiulli F, Pizzuti C ( 2002) Fast outlier detection in high dimensional spaces. In: European Conference on Principles of Data Mining and Knowledge Discovery . https:\/\/api.semanticscholar.org\/CorpusID:41515630","DOI":"10.1007\/3-540-45681-3_2"},{"key":"266_CR79","doi-asserted-by":"crossref","unstructured":"Kriegel H-P, Schubert M, Zimek A ( 2008)Angle-based outlier detection in high-dimensional data. In: Knowledge Discovery and Data Mining . 
https:\/\/api.semanticscholar.org\/CorpusID:3072058","DOI":"10.1145\/1401890.1401946"},{"key":"266_CR80","doi-asserted-by":"crossref","unstructured":"Liu FT, Ting KM, Zhou Z-H (2008) Isolation forest. 2008 Eighth IEEE International Conference on Data Mining, 413\u2013422","DOI":"10.1109\/ICDM.2008.17"},{"key":"266_CR81","unstructured":"Goldstein M, Dengel AR ( 2012) Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. https:\/\/api.semanticscholar.org\/CorpusID:3590788"},{"key":"266_CR82","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM.2013.132","author":"B Micenkov\u00e1","year":"2013","unstructured":"Micenkov\u00e1 B, Ng RT, Dang X, Assent I (2013) Explaining outliers by subspace separability. ICDM. https:\/\/doi.org\/10.1109\/ICDM.2013.132","journal-title":"ICDM"},{"key":"266_CR83","doi-asserted-by":"publisher","unstructured":"Qahtan AA, Elmagarmid A, Castro\u00a0Fernandez R, Ouzzani M, Tang N ( 2018) Fahes: A robust disguised missing values detector. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD \u201918, pp. 2100\u2013 2109. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3219819.3220109","DOI":"10.1145\/3219819.3220109"},{"key":"266_CR84","doi-asserted-by":"publisher","unstructured":"Visengeriyeva L, Abedjan Z ( 2018) Metadata-driven error detection. In: Proceedings of the 30th International Conference on Scientific and Statistical Database Management. SSDBM \u201918. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3221269.3223028","DOI":"10.1145\/3221269.3223028"},{"key":"266_CR85","doi-asserted-by":"crossref","unstructured":"Heidari A, McGrath J, Ilyas IF, Rekatsinas T (2019) Holodetect: Few-shot learning for error detection. 
Proceedings of the 2019 International Conference on Management of Data","DOI":"10.1145\/3299869.3319888"},{"key":"266_CR86","doi-asserted-by":"crossref","unstructured":"Wang P, He Y (2019) Uni-detect: A unified approach to automated error detection in tables. Proceedings of the 2019 International Conference on Management of Data","DOI":"10.1145\/3299869.3319855"},{"key":"266_CR87","doi-asserted-by":"publisher","unstructured":"Neutatz F, Mahdavi M, Abedjan Z ( 2019) Ed2: A case for active learning in error detection. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. CIKM \u201919, pp. 2249\u2013 2252. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3357384.3358129","DOI":"10.1145\/3357384.3358129"},{"issue":"5","key":"266_CR88","doi-asserted-by":"publisher","first-page":"927","DOI":"10.1007\/S00778-021-00699-W","volume":"31","author":"Z Liu","year":"2022","unstructured":"Liu Z, Zhou Z, Rekatsinas T (2022) Picket: guarding against corrupted data in tabular data during learning and inference. VLDB J 31(5):927\u2013955. https:\/\/doi.org\/10.1007\/S00778-021-00699-W","journal-title":"VLDB J"},{"key":"266_CR89","doi-asserted-by":"crossref","unstructured":"Chen J, Sathe S, Aggarwal C, Turaga D ( 2017) Outlier detection with autoencoder ensembles. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 90\u2013 98 . SIAM","DOI":"10.1137\/1.9781611974973.11"},{"key":"266_CR90","doi-asserted-by":"publisher","unstructured":"Pang G, Cao L, Chen L, Liu H ( 2018) Learning representations of ultrahigh-dimensional data for random distance-based outlier detection. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD \u201918, pp. 2041\u2013 2050. Association for Computing Machinery, New York, NY, USA . 
https:\/\/doi.org\/10.1145\/3219819.3220042","DOI":"10.1145\/3219819.3220042"},{"key":"266_CR91","doi-asserted-by":"crossref","unstructured":"Pang G, Shen C, Van Den\u00a0Hengel, A ( 2019) Deep anomaly detection with deviation networks. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 353\u2013 362","DOI":"10.1145\/3292500.3330871"},{"key":"266_CR92","unstructured":"Ruff L, Vandermeulen RA, G\u00f6rnitz N, Binder A, M\u00fcller E, M\u00fcller K-R, Kloft M ( 2020) Deep semi-supervised anomaly detection. In: International Conference on Learning Representations . https:\/\/openreview.net\/forum?id=HkgH0TEYwH"},{"key":"266_CR93","doi-asserted-by":"crossref","unstructured":"Wang X, Meliou A, Wu E (2016) Qfix: Diagnosing errors through query histories. Proceedings of the 2017 ACM International Conference on Management of Data","DOI":"10.1145\/3035918.3035925"},{"key":"266_CR94","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE48307.2020.00068","author":"Y Sun","year":"2020","unstructured":"Sun Y, Song S, Wang C, Wang J (2020) Swapping repair for misplaced attribute values. ICDE. https:\/\/doi.org\/10.1109\/ICDE48307.2020.00068","journal-title":"ICDE"},{"key":"266_CR95","doi-asserted-by":"crossref","unstructured":"Hao S, Tang N, Li G, Li J (2017) Cleaning relations using knowledge bases. 2017 IEEE 33rd International Conference on Data Engineering (ICDE), 933\u2013944","DOI":"10.1109\/ICDE.2017.141"},{"key":"266_CR96","doi-asserted-by":"publisher","unstructured":"Chu X, Morcos J, Ilyas IF, Ouzzani M, Papotti P, Tang N, Ye Y ( 2015) Katara: A data cleaning system powered by knowledge bases and crowdsourcing. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. SIGMOD \u201915, pp. 1247\u2013 1261. Association for Computing Machinery, New York, NY, USA . 
https:\/\/doi.org\/10.1145\/2723372.2749431","DOI":"10.1145\/2723372.2749431"},{"key":"266_CR97","doi-asserted-by":"crossref","unstructured":"Chiang F, Miller RJ (2011) A unified model for data and constraint repair. 2011 IEEE 27th International Conference on Data Engineering, 446\u2013457","DOI":"10.1109\/ICDE.2011.5767833"},{"key":"266_CR98","doi-asserted-by":"crossref","unstructured":"Beskales G, Ilyas IF, Golab L, Galiullin A (2012) On the relative trust between inconsistent data and inaccurate constraints. 2013 IEEE 29th International Conference on Data Engineering (ICDE), 541\u2013552","DOI":"10.1109\/ICDE.2013.6544854"},{"key":"266_CR99","unstructured":"Livshits E, Kimelfeld B, Roy S (2017) Computing optimal repairs for functional dependencies. CoRR arXiv: abs\/1712.07705"},{"key":"266_CR100","doi-asserted-by":"publisher","first-page":"1218","DOI":"10.14778\/2536274.2536280","volume":"6","author":"A Ebaid","year":"2013","unstructured":"Ebaid A, Elmagarmid AK, Ilyas IF, Ouzzani M, Quian\u00e9-Ruiz J-A, Tang N, Yin S (2013) Nadeef: a generalized data cleaning system. Proc VLDB Endow 6:1218\u20131221","journal-title":"Proc VLDB Endow"},{"key":"266_CR101","first-page":"2048","volume":"34","author":"Y Gao","year":"2019","unstructured":"Gao Y, Ge C, Miao X, Wang H, Yao B, Li Q (2019) A hybrid data cleaning framework using markov logic networks. IEEE Trans Knowl Data Eng 34:2048\u20132062","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"266_CR102","doi-asserted-by":"publisher","first-page":"1489","DOI":"10.1109\/TKDE.2019.2905548","volume":"32","author":"J Rammelaere","year":"2020","unstructured":"Rammelaere J, Geerts F, Goethals B (2020) Cleaning data with forbidden itemsets. 
IEEE Trans Knowl Data Eng 32:1489\u20131501","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"1145\/2882903","key":"266_CR103","first-page":"2882955","volume":"10","author":"S Song","year":"2016","unstructured":"Song S, Zhu H, Wang J (2016) Constraint-variance tolerant data repairing. SIGMOD 10(1145\/2882903):2882955","journal-title":"SIGMOD"},{"key":"266_CR104","doi-asserted-by":"crossref","unstructured":"Giannakopoulou S, Karpathiotakis M, Ailamaki A (2020) Cleaning denial constraint violations through relaxation. Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data","DOI":"10.1145\/3318464.3389775"},{"key":"266_CR105","doi-asserted-by":"publisher","first-page":"2546","DOI":"10.14778\/3476249.3476301","volume":"14","author":"EK Rezig","year":"2021","unstructured":"Rezig EK, Ouzzani M, Aref WG, Elmagarmid AK, Mahmood AR, Stonebraker M (2021) Horizon: scalable dependency-driven data cleaning. Proc VLDB Endow 14:2546\u20132554","journal-title":"Proc VLDB Endow"},{"key":"266_CR106","first-page":"429","volume":"56","author":"S Al-janabi","year":"2021","unstructured":"Al-janabi S, Janicki R (2021) Data repair of density-based data cleaning approach using conditional functional dependencies. Data Technol Appl 56:429\u2013446","journal-title":"Data Technol Appl"},{"key":"266_CR107","doi-asserted-by":"crossref","unstructured":"Sun Y, Song S (2021) From minimum change to maximum density: on s-repair under integrity constraints. ICDE, pp. 1943\u2013 1948 ( 2021)","DOI":"10.1109\/ICDE51399.2021.00181"},{"issue":"2","key":"266_CR108","doi-asserted-by":"publisher","first-page":"627","DOI":"10.1109\/TKDE.2023.3294401","volume":"36","author":"Y Sun","year":"2024","unstructured":"Sun Y, Song S, Yuan X (2024) From minimum change to maximum density: on determining near-optimal s-repair. IEEE Trans Knowl Data Eng 36(2):627\u2013639. 
https:\/\/doi.org\/10.1109\/TKDE.2023.3294401","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"266_CR109","doi-asserted-by":"publisher","first-page":"1288","DOI":"10.1109\/TKDE.2020.2992456","volume":"34","author":"X Ding","year":"2022","unstructured":"Ding X, Wang H, Su J, Wang M, Li J, Gao H (2022) Leveraging currency for repairing inconsistent and incomplete data. IEEE Trans Knowl Data Eng 34:1288\u20131302","journal-title":"IEEE Trans Knowl Data Eng"},{"issue":"11","key":"266_CR110","doi-asserted-by":"publisher","first-page":"987","DOI":"10.14778\/2732967.2732974","volume":"7","author":"S Song","year":"2014","unstructured":"Song S, Cheng H, Yu JX, Chen L (2014) Repairing vertex labels under neighborhood constraints. PVLDB 7(11):987\u2013998. https:\/\/doi.org\/10.14778\/2732967.2732974","journal-title":"PVLDB"},{"key":"266_CR111","doi-asserted-by":"crossref","unstructured":"Song S, Gao F, Huang R, Wang Y (2021) On saving outliers for better clustering over noisy data. Proceedings of the 2021 International Conference on Management of Data","DOI":"10.1145\/3448016.3457271"},{"issue":"12","key":"266_CR112","doi-asserted-by":"publisher","first-page":"948","DOI":"10.14778\/2994509.2994514","volume":"9","author":"S Krishnan","year":"2016","unstructured":"Krishnan S, Wang J, Wu E, Franklin MJ, Goldberg K (2016) Activeclean: interactive data cleaning for statistical modeling. Proc VLDB Endow 9(12):948\u2013959. https:\/\/doi.org\/10.14778\/2994509.2994514","journal-title":"Proc VLDB Endow"},{"key":"266_CR113","doi-asserted-by":"publisher","first-page":"113511","DOI":"10.1016\/J.ESWA.2020.113511","volume":"159","author":"M Ataeyan","year":"2020","unstructured":"Ataeyan M, Daneshpour N (2020) A novel data repairing approach based on constraints and ensemble learning. Expert Syst Appl 159:113511. 
https:\/\/doi.org\/10.1016\/J.ESWA.2020.113511","journal-title":"Expert Syst Appl"},{"issue":"11","key":"266_CR114","doi-asserted-by":"publisher","first-page":"1948","DOI":"10.14778\/3407790.3407801","volume":"13","author":"M Mahdavi","year":"2020","unstructured":"Mahdavi M, Abedjan Z (2020) Baran: effective error correction via a unified context representation and transfer learning. Proc VLDB Endow 13(11):1948\u20131961","journal-title":"Proc VLDB Endow"},{"key":"266_CR115","doi-asserted-by":"publisher","unstructured":"Berti-Equille L ( 2019) Learn2clean: Optimizing the sequence of tasks for web data preparation. In: The World Wide Web Conference. WWW \u201919, pp. 2580\u2013 2586. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3308558.3313602","DOI":"10.1145\/3308558.3313602"},{"key":"266_CR116","doi-asserted-by":"publisher","unstructured":"Zhang X, Ji Y, Nguyen C, Wang T ( 2018) Deepclean: Data cleaning via question asking. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pp. 283\u2013 292 . https:\/\/doi.org\/10.1109\/DSAA.2018.00039","DOI":"10.1109\/DSAA.2018.00039"},{"issue":"1","key":"266_CR117","doi-asserted-by":"publisher","first-page":"663","DOI":"10.1038\/s41598-017-19120-0","volume":"8","author":"R Wei","year":"2018","unstructured":"Wei R, Wang J, Su M, Jia E, Chen S, Chen T, Ni Y (2018) Missing value imputation approach for mass spectrometry-based metabolomics data. Sci Rep 8(1):663","journal-title":"Sci Rep"},{"issue":"4","key":"266_CR118","doi-asserted-by":"publisher","first-page":"377","DOI":"10.1002\/sim.4067","volume":"30","author":"IR White","year":"2011","unstructured":"White IR, Royston P, Wood AM (2011) Multiple imputation using chained equations: issues and guidance for practice. 
Stat Med 30(4):377\u201399","journal-title":"Stat Med"},{"issue":"16","key":"266_CR119","doi-asserted-by":"publisher","first-page":"2088","DOI":"10.1093\/BIOINFORMATICS\/BTG287","volume":"19","author":"S Oba","year":"2003","unstructured":"Oba S, Sato M, Takemasa I, Monden M, Matsubara K, Ishii S (2003) A bayesian missing value estimation method for gene expression profile data. Bioinform 19(16):2088\u20132096. https:\/\/doi.org\/10.1093\/BIOINFORMATICS\/BTG287","journal-title":"Bioinform"},{"key":"266_CR120","unstructured":"Twala B, Cartwright M, Shepperd MJ (2005) Comparison of various methods for handling incomplete data in software engineering databases. 2005 International Symposium on Empirical Software Engineering, (2005)"},{"issue":"3","key":"266_CR121","doi-asserted-by":"publisher","first-page":"34","DOI":"10.1093\/nar\/gnh026","volume":"32","author":"T Hellem","year":"2004","unstructured":"Hellem T, Dysvik B, Jonassen I (2004) LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Res 32(3):34\u201334. https:\/\/doi.org\/10.1093\/nar\/gnh026 (https:\/\/academic.oup.com\/nar\/article-pdf\/32\/3\/e34\/9490860\/gnh026.pdf)","journal-title":"Nucleic Acids Res"},{"key":"266_CR122","doi-asserted-by":"publisher","first-page":"913","DOI":"10.1080\/08839514.2019.1637138","volume":"33","author":"AS Jadhav","year":"2019","unstructured":"Jadhav AS, Pramod D, Ramanathan K (2019) Comparison of performance of data imputation methods for numeric dataset. Appl Artif Intell 33:913\u2013933","journal-title":"Appl Artif Intell"},{"key":"266_CR123","doi-asserted-by":"publisher","first-page":"773","DOI":"10.1016\/j.csda.2006.12.036","volume":"52","author":"S Iacus","year":"2007","unstructured":"Iacus S, Porro G (2007) Missing data imputation, matching and other applications of random recursive partitioning. 
Comput Stat Data Anal 52:773\u2013789","journal-title":"Comput Stat Data Anal"},{"key":"266_CR124","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.knosys.2017.06.010","volume":"132","author":"X Chen","year":"2017","unstructured":"Chen X, Wei Z, Li Z, Liang J, Cai Y, Zhang B (2017) Ensemble correlation-based low-rank matrix completion with applications to traffic data imputation. Knowl Based Syst 132:249\u2013262","journal-title":"Knowl Based Syst"},{"key":"266_CR125","doi-asserted-by":"publisher","first-page":"12983","DOI":"10.1109\/ACCESS.2018.2803755","volume":"6","author":"X Xu","year":"2018","unstructured":"Xu X, Chong WK, Li S, Arabo A, Xiao J (2018) Miaec: missing data imputation based on the evidence chain. IEEE Access 6:12983\u201312992","journal-title":"IEEE Access"},{"key":"266_CR126","doi-asserted-by":"publisher","first-page":"32","DOI":"10.1186\/1471-2105-7-32","volume":"7","author":"X Wang","year":"2006","unstructured":"Wang X, Li A, Jiang Z, Feng H (2006) Missing value estimation for dna microarray gene expression data by support vector regression imputation and orthogonal coding scheme. BMC Bioinform 7:32\u201332","journal-title":"BMC Bioinform"},{"key":"266_CR127","doi-asserted-by":"publisher","first-page":"2794","DOI":"10.1016\/j.eswa.2008.01.059","volume":"36","author":"Y Qin","year":"2009","unstructured":"Qin Y, Zhang S, Zhu X, Zhang J, Zhang C (2009) Pop algorithm: Kernel-based imputation to treat missing values in knowledge discovery from databases. Expert Syst Appl 36:2794\u20132804","journal-title":"Expert Syst Appl"},{"key":"266_CR128","doi-asserted-by":"publisher","first-page":"79","DOI":"10.1007\/s10489-006-0032-0","volume":"27","author":"Y Qin","year":"2007","unstructured":"Qin Y, Zhang S, Zhu X, Zhang J, Zhang C (2007) Semi-parametric optimization for missing data imputation. 
Appl Intell 27:79\u201388","journal-title":"Appl Intell"},{"key":"266_CR129","doi-asserted-by":"crossref","unstructured":"Grzymala-Busse JW, Grzymala-Busse WJ, Goodwin LK ( 1999) A closest fit approach to missing attribute values in preterm birth data. In: Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing . https:\/\/api.semanticscholar.org\/CorpusID:18555094","DOI":"10.1007\/978-3-540-48061-7_49"},{"issue":"2","key":"266_CR130","doi-asserted-by":"publisher","first-page":"187","DOI":"10.1093\/bioinformatics\/bth499","volume":"21","author":"H Kim","year":"2005","unstructured":"Kim H, Golub GH, Park H (2005) Missing value estimation for dna microarray gene expression data: local least squares imputation. Bioinformatics 21(2):187\u201398","journal-title":"Bioinformatics"},{"issue":"10","key":"266_CR131","doi-asserted-by":"publisher","first-page":"1112","DOI":"10.1016\/j.compbiomed.2008.08.006","volume":"38","author":"X Zhang","year":"2008","unstructured":"Zhang X, Song X, Wang H, Zhang H (2008) Sequential local least squares imputation estimating missing value of microarray data. Comput Biol Med 38(10):1112\u201320","journal-title":"Comput Biol Med"},{"key":"266_CR132","doi-asserted-by":"publisher","first-page":"61","DOI":"10.1007\/s10489-010-0244-1","volume":"36","author":"B Zhu","year":"2010","unstructured":"Zhu B, He C, Liatsis P (2010) A robust missing value imputation method for noisy data. Appl Intell 36:61\u201374","journal-title":"Appl Intell"},{"key":"266_CR133","doi-asserted-by":"publisher","first-page":"452","DOI":"10.1016\/j.jss.2010.11.887","volume":"84","author":"S Zhang","year":"2011","unstructured":"Zhang S, Jin Z, Zhu X (2011) Missing data imputation by utilizing information within incomplete instances. 
J Syst Softw 84:452\u2013459","journal-title":"J Syst Softw"},{"key":"266_CR134","first-page":"153","volume":"6","author":"P Zuccolotto","year":"2008","unstructured":"Zuccolotto P (2008) A symbolic data approach for missing values treatment in principal component analysis. Stat Appl 6:153\u2013180","journal-title":"Stat Appl"},{"key":"266_CR135","doi-asserted-by":"publisher","first-page":"115","DOI":"10.1016\/j.ins.2013.03.043","volume":"240","author":"E Eirola","year":"2013","unstructured":"Eirola E, Doquire G, Verleysen M, Lendasse A (2013) Distance estimation in numerical data sets with missing values. Inf Sci 240:115\u2013128","journal-title":"Inf Sci"},{"key":"266_CR136","doi-asserted-by":"crossref","unstructured":"Schafer JL ( 1997) Analysis of incomplete multivariate data. https:\/\/api.semanticscholar.org\/CorpusID:61972012","DOI":"10.1201\/9781439821862"},{"key":"266_CR137","first-page":"85","volume":"27","author":"TE Raghunathan","year":"2001","unstructured":"Raghunathan TE, Lepkowski JM, Hoewyk JV, Solenberger PW (2001) A multivariate technique for multiply imputing missing values using a sequence of regression models. Surv Methodol 27:85\u201395","journal-title":"Surv Methodol"},{"key":"266_CR138","doi-asserted-by":"publisher","first-page":"4013","DOI":"10.1016\/j.csda.2006.12.022","volume":"51","author":"JRV Ginkel","year":"2007","unstructured":"Ginkel JRV, Ark LAV, Sijtsma K, Vermunt JK (2007) Two-way imputation: a bayesian method for estimating missing scores in tests and questionnaires, and an accurate approximation. Comput Stat Data Anal 51:4013\u20134027","journal-title":"Comput Stat Data Anal"},{"key":"266_CR139","doi-asserted-by":"publisher","first-page":"376","DOI":"10.1007\/s10489-013-0469-x","volume":"40","author":"J Tian","year":"2013","unstructured":"Tian J, Yu T, Yu D, Ma S (2013) Missing data analyses: a hybrid multiple imputation algorithm using gray system theory and entropy based on clustering. 
Appl Intell 40:376\u2013388","journal-title":"Appl Intell"},{"key":"266_CR140","first-page":"1","volume":"45","author":"S Van Buuren","year":"2011","unstructured":"Van Buuren S, Groothuis-Oudshoorn K (2011) mice: multivariate imputation by chained equations in r. JOSS 45:1\u201367","journal-title":"JOSS"},{"key":"266_CR141","doi-asserted-by":"publisher","DOI":"10.1145\/3639326","author":"M Perini","year":"2024","unstructured":"Perini M, Nikolic M (2024) In-database data imputation. Proc ACM Manag Data. https:\/\/doi.org\/10.1145\/3639326","journal-title":"Proc ACM Manag Data"},{"issue":"1","key":"266_CR142","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1186\/s40537-020-00313-w","volume":"7","author":"SI Khan","year":"2020","unstructured":"Khan SI, Hoque ASML (2020) Sice: an improved missing data imputation technique. J Big Data 7(1):37","journal-title":"J Big Data"},{"key":"266_CR143","doi-asserted-by":"publisher","first-page":"231","DOI":"10.1007\/s11634-011-0086-7","volume":"5","author":"J Josse","year":"2011","unstructured":"Josse J, Pag\u00e8s J, Husson F (2011) Multiple imputation in principal component analysis. Adv Data Anal Classif 5:231\u2013246","journal-title":"Adv Data Anal Classif"},{"key":"266_CR144","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1002\/bimj.201900360","volume":"63","author":"APD Silva","year":"2020","unstructured":"Silva APD, Livera AMD, Lee KJ, Moreno-Betancur M, Simpson JA (2020) Multiple imputation methods for handling missing values in longitudinal studies with sampling weights: comparison of methods implemented in stata. Biom J 63:354\u2013371","journal-title":"Biom J"},{"issue":"1","key":"266_CR145","doi-asserted-by":"publisher","first-page":"112","DOI":"10.1093\/bioinformatics\/btr597","volume":"28","author":"DJ Stekhoven","year":"2011","unstructured":"Stekhoven DJ, B\u00fchlmann P (2011) Missforest - non-parametric missing value imputation for mixed-type data. 
Bioinformatics 28(1):112\u20138","journal-title":"Bioinformatics"},{"key":"266_CR146","doi-asserted-by":"publisher","first-page":"52","DOI":"10.1016\/j.patcog.2017.04.005","volume":"69","author":"J Xia","year":"2017","unstructured":"Xia J, Zhang S, Cai G, Li L, Pan Q, Yan J, Ning G (2017) Adjusted weight voting algorithm for random forests in handling missing values. Pattern Recognit 69:52\u201360","journal-title":"Pattern Recognit"},{"key":"266_CR147","doi-asserted-by":"publisher","first-page":"51","DOI":"10.1016\/j.knosys.2013.08.023","volume":"53","author":"MG Rahman","year":"2013","unstructured":"Rahman MG, Islam MZ (2013) Missing value imputation using decision trees and decision forests by splitting and merging records: two novel techniques. Knowl Based Syst 53:51\u201365","journal-title":"Knowl Based Syst"},{"key":"266_CR148","doi-asserted-by":"publisher","first-page":"1001","DOI":"10.1007\/s00180-020-00987-z","volume":"35","author":"C Beaulac","year":"2018","unstructured":"Beaulac C, Rosenthal JS (2018) Best: a decision tree algorithm that handles missing values. Comput Stat 35:1001\u20131026","journal-title":"Comput Stat"},{"key":"266_CR149","doi-asserted-by":"publisher","first-page":"163","DOI":"10.1016\/j.ins.2015.03.018","volume":"311","author":"H Cevallos-Valdiviezo","year":"2015","unstructured":"Cevallos-Valdiviezo H, Aelst SV (2015) Tree-based prediction on incomplete data using imputation or surrogate decisions. Inf Sci 311:163\u2013181","journal-title":"Inf Sci"},{"key":"266_CR150","doi-asserted-by":"crossref","unstructured":"Madhu G, Bharadwaj BL, Nagachandrika G, Vardhan K (2019) A novel algorithm for missing data imputation on machine learning. 
2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), 173\u2013177","DOI":"10.1109\/ICSSIT46314.2019.8987895"},{"key":"266_CR151","doi-asserted-by":"publisher","first-page":"796","DOI":"10.1109\/TPAMI.1987.4767986","volume":"9","author":"AKC Wong","year":"1987","unstructured":"Wong AKC, Chiu DKY (1987) Synthesizing statistical knowledge from incomplete mixed-mode data. IEEE Trans Pattern Anal Mach Intell 9:796\u2013805","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"266_CR152","unstructured":"MacQueen, J( 1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, pp. 281\u2013 297. University of California Press, Berkeley, Calif . https:\/\/projecteuclid.org\/euclid.bsmsp\/1200512992"},{"key":"266_CR153","doi-asserted-by":"publisher","DOI":"10.1109\/FUZZ-IEEE.2017.8015560","author":"S Nikfalazar","year":"2017","unstructured":"Nikfalazar S, Yeh C, Bedingfield SE, Khorshidi HA (2017) A new iterative fuzzy clustering algorithm for multiple imputation of missing data. FUZZ-IEEE. https:\/\/doi.org\/10.1109\/FUZZ-IEEE.2017.8015560","journal-title":"FUZZ-IEEE"},{"key":"266_CR154","doi-asserted-by":"publisher","first-page":"91","DOI":"10.1080\/00031305.2015.1086685","volume":"70","author":"JT Chi","year":"2014","unstructured":"Chi JT, Chi EC, Baraniuk R (2014) k-pod: A method for k-means clustering of missing data. Am Stat 70:91\u201399","journal-title":"Am Stat"},{"key":"266_CR155","first-page":"1","volume":"2015","author":"X Yan","year":"2015","unstructured":"Yan X, Xiong W, Hu L, Wang F, Zhao K (2015) Missing value imputation based on gaussian mixture model for the internet of things. 
Math Probl Eng 2015:1\u20138","journal-title":"Math Probl Eng"},{"key":"266_CR156","doi-asserted-by":"publisher","first-page":"134","DOI":"10.1016\/j.neucom.2014.12.073","volume":"156","author":"C Gautam","year":"2015","unstructured":"Gautam C, Ravi V (2015) Data imputation via evolutionary computation, clustering and a neural network. Neurocomputing 156:134\u2013142","journal-title":"Neurocomputing"},{"issue":"11","key":"266_CR157","doi-asserted-by":"publisher","first-page":"3045","DOI":"10.14778\/3681954.3681982","volume":"17","author":"Y Sun","year":"2024","unstructured":"Sun Y, Zhu J, Xu X, Xu X, Sun Y, Song S, Li X, Yuan X (2024) Win-win: on simultaneous clustering and imputing over incomplete data. Proc VLDB Endow 17(11):3045\u20133057","journal-title":"Proc VLDB Endow"},{"key":"266_CR158","doi-asserted-by":"publisher","first-page":"418","DOI":"10.1016\/j.ins.2021.04.076","volume":"571","author":"D-T Dinh","year":"2021","unstructured":"Dinh D-T, Huynh V-N, Sriboonchitta S (2021) Clustering mixed numerical and categorical data with missing values. Inf Sci 571:418\u2013442","journal-title":"Inf Sci"},{"key":"266_CR159","doi-asserted-by":"publisher","first-page":"60","DOI":"10.3390\/sym14010060","volume":"14","author":"K Gao","year":"2022","unstructured":"Gao K, Khan HA, Qu W (2022) Clustering with missing features: a density-based approach. Symmetry 14:60","journal-title":"Symmetry"},{"issue":"3","key":"266_CR160","doi-asserted-by":"publisher","first-page":"175","DOI":"10.1080\/00031305.1992.10475879","volume":"46","author":"NS Altman","year":"1992","unstructured":"Altman NS (1992) An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat 46(3):175\u2013185","journal-title":"Am Stat"},{"key":"266_CR161","doi-asserted-by":"publisher","DOI":"10.1109\/ICPR.2004.1334065","author":"C Domeniconi","year":"2004","unstructured":"Domeniconi C, Yan B (2004) Nearest neighbor ensemble. ICPR. 
https:\/\/doi.org\/10.1109\/ICPR.2004.1334065","journal-title":"ICPR"},{"key":"266_CR162","doi-asserted-by":"publisher","DOI":"10.1109\/ICSMC.2012.6378177","author":"S Wu","year":"2012","unstructured":"Wu S, Feng X, Han Y, Wang Q (2012) Missing categorical data imputation approach based on similarity. SMC. https:\/\/doi.org\/10.1109\/ICSMC.2012.6378177","journal-title":"SMC"},{"key":"266_CR163","doi-asserted-by":"publisher","first-page":"2541","DOI":"10.1016\/j.jss.2012.05.073","volume":"85","author":"S Zhang","year":"2012","unstructured":"Zhang S (2012) Nearest neighbor selection for iteratively knn imputation. J Syst Softw 85:2541\u20132552","journal-title":"J Syst Softw"},{"key":"266_CR164","doi-asserted-by":"publisher","first-page":"1483","DOI":"10.1016\/j.neucom.2008.11.026","volume":"72","author":"PJ Garc\u00eda-Laencina","year":"2009","unstructured":"Garc\u00eda-Laencina PJ, Sancho-G\u00f3mez J-L, Figueiras-Vidal AR, Verleysen M (2009) K nearest neighbours with mutual information for simultaneous classification and missing data imputation. Neurocomputing 72:1483\u20131493","journal-title":"Neurocomputing"},{"key":"266_CR165","doi-asserted-by":"publisher","first-page":"614","DOI":"10.1007\/s10489-015-0666-x","volume":"43","author":"R Pan","year":"2015","unstructured":"Pan R, Yang T, Cao J, Lu K, Zhang Z (2015) Missing data imputation by k nearest neighbours based on grey relational structure and mutual information. Appl Intell 43:614\u2013632","journal-title":"Appl Intell"},{"key":"266_CR166","first-page":"32","volume":"9","author":"S Zhang","year":"2008","unstructured":"Zhang S (2008) Parimputation: from imputation and null-imputation to partially imputation. 
IEEE Intell Informatics Bull 9:32\u201338","journal-title":"IEEE Intell Informatics Bull"},{"key":"266_CR167","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1016\/j.jss.2017.07.012","volume":"132","author":"J Huang","year":"2017","unstructured":"Huang J, Keung JW, Sarro F, Li Y, Yu Y-T, Chan WK, Sun H (2017) Cross-validation based k nearest neighbor imputation for software quality datasets: an empirical study. J Syst Softw 132:226\u2013252","journal-title":"J Syst Softw"},{"key":"266_CR168","first-page":"197","volume":"16","author":"L Beretta","year":"2016","unstructured":"Beretta L, Santaniello A (2016) Nearest neighbor imputation algorithms: a critical evaluation. BMC Med Inf Decis Mak 16:197\u2013208","journal-title":"BMC Med Inf Decis Mak"},{"key":"266_CR169","doi-asserted-by":"publisher","first-page":"5993","DOI":"10.1007\/s00500-021-05590-y","volume":"25","author":"BM Al-Helali","year":"2021","unstructured":"Al-Helali BM, Chen Q, Xue B, Zhang M (2021) A new imputation method based on genetic programming and weighted knn for symbolic regression with incomplete data. Soft Comput 25:5993\u20136012","journal-title":"Soft Comput"},{"key":"266_CR170","doi-asserted-by":"publisher","unstructured":"Cleveland, W.S., Loader, C( 1996) In: H\u00e4rdle, W., Schimek, M.G. (eds.) Smoothing by Local Regression: Principles and Methods, pp. 10\u2013 49. Physica-Verlag HD, Heidelberg . https:\/\/doi.org\/10.1007\/978-3-642-48425-4_2","DOI":"10.1007\/978-3-642-48425-4_2"},{"key":"266_CR171","doi-asserted-by":"publisher","unstructured":"Zhang A, Song S, Sun Y, Wang J ( 2019) Learning individual models for imputation. In: ICDE, pp. 160\u2013 171 . https:\/\/doi.org\/10.1109\/ICDE.2019.00023","DOI":"10.1109\/ICDE.2019.00023"},{"key":"266_CR172","doi-asserted-by":"publisher","unstructured":"Song S, Sun Y ( 2020) Imputing various incomplete attributes via distance likelihood maximization. 
In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. KDD \u201920, pp. 535\u2013 545. Association for Computing Machinery, New York, NY, USA . https:\/\/doi.org\/10.1145\/3394486.3403096","DOI":"10.1145\/3394486.3403096"},{"key":"266_CR173","unstructured":"Muzellec B, Josse J, Boyer C, Cuturi M (2020) Missing data imputation using optimal transport. In: III, H.D., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research 119:7130\u20137140 (https:\/\/proceedings.mlr.press\/v119\/muzellec20a.html)"},{"key":"266_CR174","unstructured":"Zhao H, Sun K, Dezfouli A, Bonilla E.V (2023) Transformed distribution matching for missing value imputation. In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research 202:42159\u201342186 (https:\/\/proceedings.mlr.press\/v202\/zhao23h.html)"},{"key":"266_CR175","doi-asserted-by":"publisher","first-page":"17","DOI":"10.1016\/j.neucom.2016.08.044","volume":"218","author":"KJ Nishanth","year":"2016","unstructured":"Nishanth KJ, Ravi V (2016) Probabilistic neural network based categorical data imputation. Neurocomputing 218:17\u201325","journal-title":"Neurocomputing"},{"key":"266_CR176","doi-asserted-by":"publisher","first-page":"141","DOI":"10.1016\/j.ifacol.2018.09.406","volume":"51","author":"JT McCoy","year":"2018","unstructured":"McCoy JT, Kroon S, Auret L (2018) Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 51:141\u2013146","journal-title":"IFAC-PapersOnLine"},{"key":"266_CR177","unstructured":"Naz\u00e1bal A, Olmos PM, Ghahramani Z, Valera I (2018) Handling incomplete heterogeneous data using vaes. 
ArXiv: abs\/1807.03653"},{"key":"266_CR178","unstructured":"Mattei P-A, Frellsen J ( 2019) Miwae: Deep generative modelling and imputation of incomplete data sets. In: International Conference on Machine Learning . https:\/\/api.semanticscholar.org\/CorpusID:174800427"},{"key":"266_CR179","first-page":"249","volume":"129","author":"I Spinelli","year":"2019","unstructured":"Spinelli I, Scardapane S, Uncini A (2019) Missing data imputation with adversarially-trained graph convolutional networks. Neural Netw Off J Int Neural Netw Soc 129:249\u2013260","journal-title":"Neural Netw Off J Int Neural Netw Soc"},{"key":"266_CR180","unstructured":"Zhong JR, Ye W, Gui N ( 2022) Data imputation with iterative graph reconstruction. In: AAAI Conference on Artificial Intelligence . https:\/\/api.semanticscholar.org\/CorpusID:254275250"},{"issue":"7","key":"266_CR181","doi-asserted-by":"publisher","first-page":"1202","DOI":"10.14778\/3450980.3450989","volume":"14","author":"T Liu","year":"2021","unstructured":"Liu T, Fan J, Luo Y, Tang N, Li G, Du X (2021) Adaptive data augmentation for supervised learning over missing data. Proc VLDB Endow 14(7):1202\u20131214. https:\/\/doi.org\/10.14778\/3450980.3450989","journal-title":"Proc VLDB Endow"},{"key":"266_CR182","unstructured":"Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International Conference on Machine Learning . https:\/\/api.semanticscholar.org\/CorpusID:2057420"},{"key":"266_CR183","doi-asserted-by":"publisher","unstructured":"Thanh-Tung H, Tran T ( 2020) Catastrophic forgetting and mode collapse in gans. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1\u2013 10 . https:\/\/doi.org\/10.1109\/IJCNN48605.2020.9207181","DOI":"10.1109\/IJCNN48605.2020.9207181"},{"key":"266_CR184","unstructured":"Kyono T, Zhang Y, Bellot A, Schaar M (2021) Miracle: Causally-aware imputation via learning missing data mechanisms. 
In: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P.S., Vaughan, J.W. (eds.) Advances in Neural Information Processing Systems 34:23806\u201323817"},{"key":"266_CR185","doi-asserted-by":"publisher","first-page":"9316","DOI":"10.1109\/TIP.2020.3026622","volume":"29","author":"J Xu","year":"2020","unstructured":"Xu J, Huang Y, Cheng M-M, Liu L, Zhu F, Xu Z, Shao L (2020) Noisy-as-clean: learning self-supervised denoising from corrupted image. IEEE Trans Image Process 29:9316\u20139329","journal-title":"IEEE Trans Image Process"},{"key":"266_CR186","unstructured":"Sportisse A, Marbac M, Laporte F, Celeux G, Boyer C, Josse J, Biernacki C (2021) Model-based clustering with missing not at random data. arXiv preprint arXiv:2112.10425"},{"key":"266_CR187","doi-asserted-by":"publisher","unstructured":"Huang Z, He Y (2018) Auto-detect: Data-driven error detection in tables. In: Proceedings of the 2018 International Conference on Management of Data. SIGMOD \u201918, pp. 1377\u20131392. Association for Computing Machinery, New York, NY, USA. https:\/\/doi.org\/10.1145\/3183713.3196889","DOI":"10.1145\/3183713.3196889"},{"issue":"1","key":"266_CR188","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1053\/j.nainr.2009.12.009","volume":"10","author":"JW Osborne","year":"2010","unstructured":"Osborne JW (2010) Data cleaning basics: best practices in dealing with extreme scores. Newborn Infant Nurs Rev 10(1):37\u201343","journal-title":"Newborn Infant Nurs Rev"},{"issue":"3","key":"266_CR189","doi-asserted-by":"publisher","first-page":"581","DOI":"10.1093\/biomet\/63.3.581","volume":"63","author":"DB Rubin","year":"1976","unstructured":"Rubin DB (1976) Inference and missing data. 
Biometrika 63(3):581\u2013592","journal-title":"Biometrika"},{"key":"266_CR190","doi-asserted-by":"publisher","first-page":"107079","DOI":"10.1016\/j.knosys.2021.107079","volume":"224","author":"J Han","year":"2021","unstructured":"Han J, Kang S (2021) Active learning with missing values considering imputation uncertainty. Knowl-Based Syst 224:107079","journal-title":"Knowl-Based Syst"},{"key":"266_CR191","doi-asserted-by":"crossref","unstructured":"Sun Y, Zheng Z, Song S, Chiang F (2022) Confidence bounded replica currency estimation. In: SIGMOD 2022, pp. 730\u2013743","DOI":"10.1145\/3514221.3517852"},{"key":"266_CR192","doi-asserted-by":"crossref","unstructured":"Iida H, Thai D, Manjunatha V, Iyyer M (2021) Tabbie: Pretrained representations of tabular data. arXiv preprint arXiv:2105.02584","DOI":"10.18653\/v1\/2021.naacl-main.270"},{"issue":"8","key":"266_CR193","doi-asserted-by":"publisher","first-page":"1254","DOI":"10.14778\/3457390.3457391","volume":"14","author":"N Tang","year":"2021","unstructured":"Tang N, Fan J, Li F, Tu J, Du X, Li G, Madden S, Ouzzani M (2021) Rpt: relational pre-trained transformer is almost all you need towards democratizing data preparation. Proc VLDB Endow 14(8):1254\u20131261. https:\/\/doi.org\/10.14778\/3457390.3457391","journal-title":"Proc VLDB Endow"},{"issue":"1","key":"266_CR194","doi-asserted-by":"publisher","first-page":"33","DOI":"10.1145\/3542700.3542709","volume":"51","author":"X Deng","year":"2022","unstructured":"Deng X, Sun H, Lees A, Wu Y, Yu C (2022) Turl: table understanding through representation learning. SIGMOD Rec 51(1):33\u201340. 
https:\/\/doi.org\/10.1145\/3542700.3542709","journal-title":"SIGMOD Rec"}],"container-title":["Data Science and Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41019-024-00266-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s41019-024-00266-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s41019-024-00266-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,6]],"date-time":"2025-06-06T11:02:47Z","timestamp":1749207767000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s41019-024-00266-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,12,20]]},"references-count":194,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["266"],"URL":"https:\/\/doi.org\/10.1007\/s41019-024-00266-7","relation":{},"ISSN":["2364-1185","2364-1541"],"issn-type":[{"value":"2364-1185","type":"print"},{"value":"2364-1541","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,12,20]]},"assertion":[{"value":"16 April 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 August 2024","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"5 October 2024","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 December 2024","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}