{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T02:22:27Z","timestamp":1773886947504,"version":"3.50.1"},"reference-count":79,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,5,14]],"date-time":"2025-05-14T00:00:00Z","timestamp":1747180800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100012166","name":"National Key R&D Program of China","doi-asserted-by":"crossref","award":["2023YFB4503600, 2022YFB2702100"],"award-info":[{"award-number":["2023YFB4503600, 2022YFB2702100"]}],"id":[{"id":"10.13039\/501100012166","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001809","name":"NSF of China","doi-asserted-by":"crossref","award":["61925205, 62232009, 62102215"],"award-info":[{"award-number":["61925205, 62232009, 62102215"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Shenzhen Project","award":["CJGJZD20230724093403007"],"award-info":[{"award-number":["CJGJZD20230724093403007"]}]},{"name":"Zhongguancun Lab, Huawei, and Beijing National Research Center for Information Science and Technology"},{"name":"National Key Research and Development Program of China","award":["2024YFC3308200"],"award-info":[{"award-number":["2024YFC3308200"]}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"crossref","award":["62472031, 62436010, 62441230, 62372138, 61932004, 62225203, U21A20516, 62402409, 62427808, U2001211"],"award-info":[{"award-number":["62472031, 62436010, 62441230, 62372138, 61932004, 62225203, U21A20516, 62402409, 62427808, U2001211"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"crossref"}]},{"name":"CCF-Baidu Open 
Fund","award":["CCF-Baidu202402"],"award-info":[{"award-number":["CCF-Baidu202402"]}]},{"name":"Huawei"},{"name":"Beijing Natural Science Foundation","award":["L244010 and L222006, L241010"],"award-info":[{"award-number":["L244010 and L222006, L241010"]}]},{"name":"Research Funds of Renmin University of China"},{"DOI":"10.13039\/501100005046","name":"Heilongjiang Natural Science Foundation","doi-asserted-by":"crossref","award":["HSF20230095, 2024ZXJ01A04"],"award-info":[{"award-number":["HSF20230095, 2024ZXJ01A04"]}],"id":[{"id":"10.13039\/501100005046","id-type":"DOI","asserted-by":"crossref"}]},{"name":"CCF-Huawei Populus Grove Fund","award":["CCF-HuaweiDB202406"],"award-info":[{"award-number":["CCF-HuaweiDB202406"]}]},{"name":"Guangzhou Municipality Big Data Intelligence Key Lab","award":["2023A03J0012"],"award-info":[{"award-number":["2023A03J0012"]}]},{"DOI":"10.13039\/501100021171","name":"Guangdong Basic and Applied Basic Research Foundation","doi-asserted-by":"crossref","award":["2023A1515110545"],"award-info":[{"award-number":["2023A1515110545"]}],"id":[{"id":"10.13039\/501100021171","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Guangzhou-HKUST(GZ) Joint Funding Program","award":["2025A03J3714"],"award-info":[{"award-number":["2025A03J3714"]}]},{"DOI":"10.13039\/501100018617","name":"Liaoning Revitalization Talents Program","doi-asserted-by":"crossref","award":["XLYC2204005"],"award-info":[{"award-number":["XLYC2204005"]}],"id":[{"id":"10.13039\/501100018617","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Database Syst."],"published-print":{"date-parts":[[2025,9,30]]},"abstract":"<jats:p>Given a dataset with incomplete data (e.g., missing values), training a machine learning model over the incomplete data requires two steps. 
First, it requires a data-effective step that cleans the data in order to improve the data quality (and the model quality on the cleaned data). Second, it requires a data-efficient step that selects a core subset of the data (called a coreset) such that the models trained on the entire data and on the coreset have similar quality, in order to save the computational cost of training. First-data-effective-then-data-efficient methods are too costly, because cleaning the whole dataset is expensive; first-data-efficient-then-data-effective methods have low model quality, because they cannot select a high-quality coreset from incomplete data.<\/jats:p>\n          <jats:p>\n            In this article, we investigate the problem of coreset selection over incomplete data for data-effective and data-efficient machine learning. The essential challenge is how to model the incomplete data so as to select a high-quality coreset. To this end, we propose the\n            <jats:monospace>GoodCore<\/jats:monospace>\n            framework for selecting a good coreset over incomplete data at low cost. To model the unknown complete data, we treat the combinations of possible repairs as possible worlds of the incomplete data. Based on these possible worlds,\n            <jats:monospace>GoodCore<\/jats:monospace>\n            selects an expected optimal coreset through gradient approximation, without training ML models. We formally define the expected optimal coreset selection problem, prove its NP-hardness, and propose a greedy algorithm with an approximation ratio. To make\n            <jats:monospace>GoodCore<\/jats:monospace>\n            more efficient, we propose optimization methods that incorporate human-in-the-loop or automatic imputation into our framework. Moreover, a group-based strategy further accelerates coreset selection over incomplete data on large datasets. 
Experimental results show the effectiveness and efficiency of our framework at low cost.\n          <\/jats:p>","DOI":"10.1145\/3716376","type":"journal-article","created":{"date-parts":[[2025,3,8]],"date-time":"2025-03-08T11:43:06Z","timestamp":1741434186000},"page":"1-36","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["Cost-effective Missing Value Imputation for Data-effective Machine Learning"],"prefix":"10.1145","volume":"50","author":[{"ORCID":"https:\/\/orcid.org\/0009-0003-5386-1330","authenticated-orcid":false,"given":"Chengliang","family":"Chai","sequence":"first","affiliation":[{"name":"Computer Science and Technology, Beijing Institute of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0009-0004-7020-5404","authenticated-orcid":false,"given":"Kaisen","family":"Jin","sequence":"additional","affiliation":[{"name":"Computer Science and Technology, Beijing Institute of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2832-0295","authenticated-orcid":false,"given":"Nan","family":"Tang","sequence":"additional","affiliation":[{"name":"Computer Science and Technology, HKUST (GZ), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4729-9903","authenticated-orcid":false,"given":"Ju","family":"Fan","sequence":"additional","affiliation":[{"name":"Renmin University of China, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9370-7088","authenticated-orcid":false,"given":"Dongjing","family":"Miao","sequence":"additional","affiliation":[{"name":"Faculty of Computing, Harbin Institute of Technology, Harbin, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9700-9751","authenticated-orcid":false,"given":"Jiayi","family":"Wang","sequence":"additional","affiliation":[{"name":"Computer Science and Technology, Tsinghua University, Beijing, 
China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9530-3327","authenticated-orcid":false,"given":"Yuyu","family":"Luo","sequence":"additional","affiliation":[{"name":"HKUST(GZ), Guangzhou, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1398-0621","authenticated-orcid":false,"given":"Guoliang","family":"Li","sequence":"additional","affiliation":[{"name":"Computer Science, Tsinghua University, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6282-6057","authenticated-orcid":false,"given":"Ye","family":"Yuan","sequence":"additional","affiliation":[{"name":"Beijing Institute of Technology, Beijing, China"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0181-8379","authenticated-orcid":false,"given":"Guoren","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China"}]}],"member":"320","published-online":{"date-parts":[[2025,5,14]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"2024. Retrieved February 24 2024 from https:\/\/github.com\/awslabs\/datawig"},{"key":"e_1_3_1_3_2","unstructured":"2024. Retrieved February 24 2024 from https:\/\/archive.ics.uci.edu\/ml\/datasets\/nursery"},{"key":"e_1_3_1_4_2","unstructured":"2024. Retrieved February 24 2024 from https:\/\/archive.ics.uci.edu\/ml\/datasets\/adult"},{"key":"e_1_3_1_5_2","unstructured":"2024. Retrieved February 24 2024 from https:\/\/www.kaggle.com\/"},{"key":"e_1_3_1_6_2","unstructured":"2024. Retrieved February 24 2024 from https:\/\/ride.capitalbikeshare.com\/system-data"},{"key":"e_1_3_1_7_2","unstructured":"2024. Retrieved February 24 2024 from https:\/\/auctus.vida-nyu.org\/"},{"key":"e_1_3_1_8_2","doi-asserted-by":"crossref","unstructured":"Sharat Agarwal Himanshu Arora Saket Anand and Chetan Arora. 2020. Contextual diversity for active learning. 
In Computer Vision\u2013ECCV 2020: 16th European Conference Glasgow UK August 23\u201328 2020 Proceedings Part XVI 16 Springer 137\u2013153.","DOI":"10.1007\/978-3-030-58517-4_9"},{"key":"e_1_3_1_9_2","article-title":"Exploiting the structure: Stochastic gradient methods using raw clusters","volume":"29","author":"Allen-Zhu Zeyuan","year":"2016","unstructured":"Zeyuan Allen-Zhu, Yang Yuan, and Karthik Sridharan. 2016. Exploiting the structure: Stochastic gradient methods using raw clusters. NeurIPS 29 (2016), 1642\u20131650.","journal-title":"NeurIPS"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1080\/00031305.1992.10475879"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/FOCS.2006.49"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1145\/303976.303983"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-031-01883-1"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1145\/3294052.3322190"},{"issue":"175","key":"e_1_3_1_15_2","first-page":"1","article-title":"DataWig: Missing value imputation for tables","volume":"20","year":"2019","unstructured":"Felix Biessmann, Tammo Rukat, Phillipp Schmidt, Prathik Naidu, Sebastian Schelter, Andrey Taptunov, Dustin Lange, and David Salinas. 2019. DataWig: Missing value imputation for tables. JMLR 20, 175 (2019), 1\u20136.","journal-title":"JMLR"},{"key":"e_1_3_1_16_2","first-page":"14879","article-title":"Coresets via bilevel optimization for continual learning and streaming","volume":"33","author":"Borsos Zal\u00e1n","year":"2020","unstructured":"Zal\u00e1n Borsos, Mojmir Mutny, and Andreas Krause. 2020. Coresets via bilevel optimization for continual learning and streaming. 
Advances in Neural Information Processing Systems 33 (2020), 14879\u201314890.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511804441"},{"key":"e_1_3_1_18_2","unstructured":"Vladimir Braverman Dan Feldman and Harry Lang. 2016. New frameworks for offline and streaming coreset constructions. CoRR abs\/1612.00889 (2016). Retrieved from http:\/\/arxiv.org\/abs\/1612.00889"},{"key":"e_1_3_1_19_2","first-page":"697","volume-title":"Proceedings of the ICML 2018","author":"Campbell Trevor","year":"2018","unstructured":"Trevor Campbell and Tamara Broderick. 2018. Bayesian coreset construction via greedy iterative geodesic ascent. In Proceedings of the ICML 2018. PMLR, 697\u2013705."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.14778\/3685800.3685880"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.14778\/3523210.3523223"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1145\/3589302"},{"key":"e_1_3_1_23_2","unstructured":"Chengliang Chai Jiayi Wang Yuyu Luo Zeping Niu and Guoliang Li. 2022. Data management for machine learning: A survey. TKDE 35 5 (2022) 4646\u20134667."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1145\/3580305.3599326"},{"key":"e_1_3_1_25_2","unstructured":"Yutian Chen Max Welling and Alexander J. Smola. 2012. Super-Samples from Kernel Herding. CoRR abs\/1203.3472 (2012). Retrieved from http:\/\/arxiv.org\/abs\/1203.3472"},{"key":"e_1_3_1_26_2","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177728716"},{"key":"e_1_3_1_27_2","doi-asserted-by":"crossref","unstructured":"Ting Deng Wenfei Fan and Floris Geerts. 2016. Capturing missing tuples and missing values. 
ACM Transactions on Database Systems 41 2 (2016) 10:1\u201310:47.","DOI":"10.1145\/2901737"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.14778\/3648160.3648161"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.14778\/3659437.3659448"},{"key":"e_1_3_1_30_2","volume-title":"Proceedings of the SIGMOD","author":"Deng Yuhao","year":"2025","unstructured":"Yuhao Deng, Chengliang Chai, Kaisen Jin, Linan Zheng, Lei Cao, Ye Yuan, and Guoren Wang. 2025. Two birds with one stone: Efficient deep learning over mislabeled data through subset selection. In Proceedings of the SIGMOD."},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3626246.3654737"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.14778\/3654621.3654627"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.4007\/annals.2005.162.439"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2007.902631"},{"key":"e_1_3_1_35_2","unstructured":"Dan Feldman. 2020. Introduction to core-sets: An Updated Survey. CoRR abs\/2011.09384 (2020). Retrieved from https:\/\/arxiv.org\/abs\/2011.09384"},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","unstructured":"Lovedeep Gondara and Ke Wang. 2018. MIDA: Multiple imputation using denoising autoencoders. In Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference PAKDD 2018 Melbourne VIC Australia June 3-6 2018 Proceedings Part III (Lecture Notes in Computer Science) Dinh Q. Phung Vincent S. Tseng Geoffrey I. Webb Bao Ho Mohadeseh Ganji and Lida Rashidi (Eds.). Springer 260\u2013272. 
DOI:10.1007\/978-3-319-93040-4_21","DOI":"10.1007\/978-3-319-93040-4_21"},{"key":"e_1_3_1_37_2","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_3_1_38_2","article-title":"Variance reduced stochastic gradient descent with neighbors","volume":"28","author":"Hofmann Thomas","year":"2015","unstructured":"Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, and Brian McWilliams. 2015. Variance reduced stochastic gradient descent with neighbors. Advances in Neural Information Processing Systems 28 (2015), 2305\u20132313.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_39_2","first-page":"4412","volume-title":"Proceedings of the ICML 2021","year":"2021","unstructured":"Jiawei Huang, Ruomin Huang, Wenjie Liu, Nikolaos Freris, and Hu Ding. 2021. A novel sequential coreset method for gradient descent algorithms. In Proceedings of the ICML 2021. PMLR, 4412\u20134422."},{"key":"e_1_3_1_40_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2010.05.002"},{"key":"e_1_3_1_41_2","doi-asserted-by":"publisher","DOI":"10.14778\/3430915.3430917"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-020-00313-w"},{"key":"e_1_3_1_43_2","first-page":"5464","volume-title":"Proceedings of the ICML","year":"2021","unstructured":"Krishnateja Killamsetty, S. Durga, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. 2021. Grad-match: Gradient matching based data subset selection for efficient deep model training. In Proceedings of the ICML. 5464\u20135474."},{"key":"e_1_3_1_44_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i9.16988"},{"key":"e_1_3_1_45_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v35i9.16988"},{"key":"e_1_3_1_46_2","first-page":"14488","article-title":"Retrieve: Coreset selection for efficient and robust semi-supervised learning","volume":"34","author":"Killamsetty Krishnateja","year":"2021","unstructured":"Krishnateja Killamsetty, Xujiang Zhao, Feng Chen, and Rishabh Iyer. 
2021. Retrieve: Coreset selection for efficient and robust semi-supervised learning. Advances in Neural Information Processing Systems 34 (2021), 14488\u201314501.","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_1_47_2","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1014"},{"issue":"3","key":"e_1_3_1_48_2","first-page":"59","article-title":"SampleClean: Fast and reliable analytics on dirty data","volume":"38","year":"2015","unstructured":"Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo, and Eugene Wu. 2015. SampleClean: Fast and reliable analytics on dirty data. IEEE Data Engineering Bulletin 38, 3 (2015), 59\u201375.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_3_1_49_2","doi-asserted-by":"publisher","DOI":"10.14778\/2994509.2994514"},{"key":"e_1_3_1_50_2","unstructured":"Sanjay Krishnan Michael J. Franklin Ken Goldberg and Eugene Wu. 2017. BoostClean: Automated error detection and repair for machine learning. CoRR abs\/1711.01299 (2017). Retrieved from http:\/\/arxiv.org\/abs\/1711.01299"},{"key":"e_1_3_1_51_2","doi-asserted-by":"publisher","DOI":"10.14778\/2850583.2850594"},{"issue":"254","key":"e_1_3_1_52_2","first-page":"10","article-title":"Cauchy and the gradient method","volume":"251","author":"Lemar\u00e9chal Claude","year":"2012","unstructured":"Claude Lemar\u00e9chal. 2012. Cauchy and the gradient method. Doc Math Extra 251, 254 (2012), 10.","journal-title":"Doc Math Extra"},{"key":"e_1_3_1_53_2","first-page":"13","volume-title":"Proceedings of the ICDE","year":"2021","unstructured":"Peng Li, Xi Rao, Jennifer Blase, Yue Zhang, Xu Chu, and Ce Zhang. 2021. CleanML: A study for evaluating the impact of data cleaning on ML classification tasks. In Proceedings of the ICDE. 
13\u201324."},{"key":"e_1_3_1_54_2","doi-asserted-by":"publisher","DOI":"10.5555\/2002472.2002537"},{"key":"e_1_3_1_55_2","doi-asserted-by":"publisher","DOI":"10.14778\/3450980.3450989"},{"key":"e_1_3_1_56_2","doi-asserted-by":"publisher","DOI":"10.1145\/1807167.1807178"},{"key":"e_1_3_1_57_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ifacol.2018.09.406"},{"key":"e_1_3_1_58_2","unstructured":"Xiaoye Miao Yangyang Wu Lu Chen Yunjun Gao and Jianwei Yin. 2022. An experimental survey of missing data imputation algorithms. TKDE 35 7 (2022) 6630\u20136650."},{"key":"e_1_3_1_59_2","first-page":"6950","volume-title":"Proceedings of the ICML 2020","author":"Mirzasoleiman Baharan","year":"2020","unstructured":"Baharan Mirzasoleiman, Jeff A. Bilmes, and Jure Leskovec. 2020. Coresets for data-efficient training of machine learning models. In Proceedings of the ICML 2020. 6950\u20136960."},{"key":"e_1_3_1_60_2","first-page":"11465","article-title":"Coresets for robust training of deep neural networks against noisy labels","volume":"33","author":"Mirzasoleiman Baharan","year":"2020","unstructured":"Baharan Mirzasoleiman, Kaidi Cao, and Jure Leskovec. 2020. Coresets for robust training of deep neural networks against noisy labels. NeurIPS 33 (2020), 11465\u201311477.","journal-title":"NeurIPS"},{"key":"e_1_3_1_61_2","unstructured":"Baharan Mirzasoleiman Kaidi Cao and Jure Leskovec. 2020. Coresets for robust training of deep neural networks against noisy labels. Advances in Neural Information Processing Systems 33 (2020) 11465\u201311477."},{"key":"e_1_3_1_62_2","volume-title":"Proceedings of the AAAI","year":"2015","unstructured":"Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi, Jan Vondrak, and Andreas Krause. 2015. Lazier than lazy greedy. 
In Proceedings of the AAAI."},{"key":"e_1_3_1_63_2","doi-asserted-by":"publisher","DOI":"10.1007\/s13218-017-0519-3"},{"key":"e_1_3_1_64_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2020.107501"},{"key":"e_1_3_1_65_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4757-6594-6_11"},{"key":"e_1_3_1_66_2","article-title":"From cleaning before ml to cleaning for ML","author":"Neutatz Felix","year":"2021","unstructured":"Felix Neutatz, Binger Chen, Ziawasch Abedjan, and Eugene Wu. 2021. From cleaning before ml to cleaning for ML. IEEE Data Engineering Bulletin 44, 1 (2021), 24\u201341.","journal-title":"IEEE Data Engineering Bulletin"},{"key":"e_1_3_1_67_2","unstructured":"Andrew Ng. 2021. MLOps: From model-centric to data-centric AI. In DeepLearning. AI https:\/\/www.deeplearning.ai\/wp-content\/uploads\/2021\/06\/MLOps-From-Model-centric-to-Data-centric-AI.pdf"},{"key":"e_1_3_1_68_2","volume-title":"Proceedings of the Statistical Data Analysis Based on the L1 Norm Conference, Neuchatel, Switzerland","author":"Rdusseeun L. K. P. J.","year":"1987","unstructured":"L. K. P. J. Rdusseeun and P. Kaufman. 1987. Clustering by means of medoids. In Proceedings of the Statistical Data Analysis Based on the L1 Norm Conference, Neuchatel, Switzerland."},{"key":"e_1_3_1_69_2","doi-asserted-by":"publisher","DOI":"10.18637\/jss.v045.i04"},{"key":"e_1_3_1_70_2","unstructured":"Ozan Sener and Silvio Savarese. 2018. Active learning for convolutional neural networks: A core-set approach. 6th International Conference on Learning Representations ICLR 2018 Vancouver BC Canada April 30 - May 3 2018 Conference Track Proceedings (2018). Retrieved from https:\/\/openreview.net\/forum?id=H1aIuk-RW"},{"key":"e_1_3_1_71_2","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"e_1_3_1_72_2","unstructured":"Samarth Sinha Han Zhang Anirudh Goyal Yoshua Bengio Hugo Larochelle and Augustus Odena. 2020. Small-GAN: Speeding up GAN training using core-sets. 
In International Conference on Machine Learning. PMLR 9005\u20139015."},{"key":"e_1_3_1_73_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2020.06.005"},{"key":"e_1_3_1_74_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btr597"},{"key":"e_1_3_1_75_2","volume-title":"Linear Algebra and its Applications.","author":"Strang Gilbert","year":"2006","unstructured":"Gilbert Strang. 2006. Linear Algebra and its Applications.Belmont, CA: Thomson, Brooks\/Cole."},{"key":"e_1_3_1_76_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISESE.2005.1541819"},{"key":"e_1_3_1_77_2","doi-asserted-by":"publisher","DOI":"10.14778\/3561261.3561267"},{"key":"e_1_3_1_78_2","volume-title":"Proceedings of the MLSys 2020","author":"Wu Richard","year":"2020","unstructured":"Richard Wu, Aoqian Zhang, Ihab F. Ilyas, and Theodoros Rekatsinas. 2020. Attention-based learning for missing data imputation in holoclean. In Proceedings of the MLSys 2020. mlsys.org."},{"key":"e_1_3_1_79_2","first-page":"5675","volume-title":"Proceedings of the ICML","author":"Yoon Jinsung","year":"2018","unstructured":"Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2018. GAIN: Missing data imputation using generative adversarial nets. In Proceedings of the ICML. 
5675\u20135684."},{"key":"e_1_3_1_80_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDE.2019.00023"}],"container-title":["ACM Transactions on Database Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716376","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3716376","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T18:43:43Z","timestamp":1750272223000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3716376"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,14]]},"references-count":79,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,9,30]]}},"alternative-id":["10.1145\/3716376"],"URL":"https:\/\/doi.org\/10.1145\/3716376","relation":{},"ISSN":["0362-5915","1557-4644"],"issn-type":[{"value":"0362-5915","type":"print"},{"value":"1557-4644","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,5,14]]},"assertion":[{"value":"2024-04-16","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-02-03","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-14","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}