{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,8]],"date-time":"2026-04-08T05:41:49Z","timestamp":1775626909539,"version":"3.50.1"},"reference-count":68,"publisher":"Association for Computing Machinery (ACM)","issue":"6","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Knowl. Discov. Data"],"published-print":{"date-parts":[[2025,7,31]]},"abstract":"<jats:p>Data imputation and data generation have important applications across many domains where incomplete or missing data can hinder accurate analysis and decision-making. Diffusion models have emerged as powerful generative models capable of capturing complex data distributions across various data modalities such as image, audio, and time series. Recently, they have been also adapted to generate tabular data. In this article, we propose a diffusion model for tabular data that introduces three key enhancements: (1) a conditioning attention mechanism, (2) an encoder\u2013decoder transformer as the denoising network, and (3) dynamic masking. The conditioning attention mechanism is designed to improve the model\u2019s ability to capture the relationship between the condition and synthetic data. The transformer layers help model interactions within the condition (encoder) or synthetic data (decoder), while dynamic masking enables our model to efficiently handle both missing data imputation and synthetic data generation tasks within a unified framework. We conduct a comprehensive evaluation by comparing the performance of diffusion models with transformer conditioning against state-of-the-art techniques such as Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models, on benchmark datasets. 
Our evaluation focuses on the assessment of the generated samples with respect to three important criteria, namely: (1) machine learning efficiency, (2) statistical similarity, and (3) privacy risk mitigation. For the task of data imputation, we consider the efficiency of the generated samples across different levels of missing features. The results demonstrate average superior machine learning efficiency and statistical accuracy compared to the baselines, while maintaining privacy risks at a comparable level, particularly showing increased performance in datasets with a large number of features. By conditioning the data generation on a desired target variable, the model can mitigate systemic biases, generate augmented datasets to address data imbalance issues, and improve data quality for subsequent analysis. This has significant implications for domains such as healthcare and finance, where accurate, unbiased, and privacy-preserving data are critical for informed decision-making and fair model outcomes.<\/jats:p>","DOI":"10.1145\/3742435","type":"journal-article","created":{"date-parts":[[2025,6,10]],"date-time":"2025-06-10T12:20:40Z","timestamp":1749558040000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":13,"title":["Diffusion Models for Tabular Data Imputation and Synthetic Data Generation"],"prefix":"10.1145","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0009-0002-0754-1742","authenticated-orcid":false,"given":"Mario","family":"Villaiz\u00e1n-Vallelado","sequence":"first","affiliation":[{"name":"Universidad de Valladolid, Valladolid, Spain and Telef\u00f3nica Scientific Research, Madrid, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1499-6024","authenticated-orcid":false,"given":"Matteo","family":"Salvatori","sequence":"additional","affiliation":[{"name":"Telef\u00f3nica Scientific Research, Madrid, 
Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5867-281X","authenticated-orcid":false,"given":"Carlos","family":"Segura","sequence":"additional","affiliation":[{"name":"Telef\u00f3nica Scientific Research, Madrid, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2528-6597","authenticated-orcid":false,"given":"Ioannis","family":"Arapakis","sequence":"additional","affiliation":[{"name":"Telef\u00f3nica Scientific Research, Madrid, Spain"}]}],"member":"320","published-online":{"date-parts":[[2025,7,21]]},"reference":[{"key":"e_1_3_3_2_2","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330701"},{"key":"e_1_3_3_3_2","doi-asserted-by":"publisher","unstructured":"Barry Becker and Ronny Kohavi. 1996. Adult. UCI Machine Learning Repository. DOI: 10.24432\/C5XW20","DOI":"10.24432\/C5XW20"},{"key":"e_1_3_3_4_2","unstructured":"Benjamin Bossan, Josef Feigl, and Wendy Kan. 2015. Otto Group Product Classification Challenge. Retrieved from https:\/\/kaggle.com\/competitions\/otto-group-product-classification-challenge"},{"key":"e_1_3_3_5_2","doi-asserted-by":"publisher","DOI":"10.5555\/3122009.3242053"},{"issue":"175","key":"e_1_3_3_6_2","first-page":"1","article-title":"DataWig: Missing value imputation for tables","volume":"20","author":"Biessmann Felix","year":"2019","unstructured":"Felix Biessmann, Tammo Rukat, Philipp Schmidt, Prathik Naidu, Sebastian Schelter, Andrey Taptunov, Dustin Lange, and David Salinas. 2019. DataWig: Missing value imputation for tables. Journal of Machine Learning Research 20, 175 (2019), 1\u20136.","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_7_2","doi-asserted-by":"publisher","unstructured":"Jock Blackard. 1998. Covertype. UCI Machine Learning Repository. 
DOI: 10.24432\/C50K5N","DOI":"10.24432\/C50K5N"},{"key":"e_1_3_3_8_2","doi-asserted-by":"publisher","DOI":"10.1109\/TNNLS.2022.3229161"},{"key":"e_1_3_3_9_2","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_3_3_10_2","doi-asserted-by":"publisher","DOI":"10.1353\/rhe.2014.0026"},{"key":"e_1_3_3_11_2","first-page":"8780","volume-title":"Advances in Neural Information Processing Systems","author":"Dhariwal Prafulla","year":"2021","unstructured":"Prafulla Dhariwal and Alexander Nichol. 2021. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34, Curran Associates, Inc., 8780\u20138794. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf"},{"key":"e_1_3_3_12_2","doi-asserted-by":"publisher","DOI":"10.1007\/11787006_1"},{"key":"e_1_3_3_13_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2021.114582"},{"key":"e_1_3_3_14_2","unstructured":"FICO. 2019. Home Equity Line of Credit (HELOC) Dataset. Retrieved from https:\/\/community.fico.com\/s\/explainable-machine-learning-challenge"},{"key":"e_1_3_3_15_2","article-title":"Generative adversarial nets","volume":"27","author":"Goodfellow Ian","year":"2014","unstructured":"Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems. Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger (Eds.), Curran Associates, Inc., Vol. 27. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2014\/file\/f033ed80deb0234979a61f95710dbe25-Paper.pdf","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_16_2","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS \u201922)","author":"Grinsztajn L\u00e9o","year":"2024","unstructured":"L\u00e9o Grinsztajn, Edouard Oyallon, and Ga\u00ebl Varoquaux. 2024. Why do tree-based models still outperform deep learning on typical tabular data? In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS \u201922). Curran Associates Inc., Red Hook, NY, Article 37, 14 pages."},{"key":"e_1_3_3_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.jclinepi.2022.08.016"},{"key":"e_1_3_3_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3133956.3134012"},{"key":"e_1_3_3_19_2","first-page":"6840","volume-title":"Advances in Neural Information Processing Systems","author":"Ho Jonathan","year":"2020","unstructured":"Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems. H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin (Eds.), Vol. 33, Curran Associates, Inc., 6840\u20136851. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2020\/file\/4c5bcfec8584af0d967f1ab10179ca4b-Paper.pdf"},{"key":"e_1_3_3_20_2","first-page":"12454","volume-title":"Advances in Neural Information Processing Systems","author":"Hoogeboom Emiel","year":"2021","unstructured":"Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forr\u00e9, and Max Welling. 2021. Argmax flows and multinomial diffusion: Learning categorical distributions. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34, Curran Associates, Inc., 12454\u201312465. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/67d96d458abdef21792e6d8e590244e7-Paper.pdf"},{"key":"e_1_3_3_21_2","volume-title":"Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922)","author":"Ipsen Niels Bruun","year":"2022","unstructured":"Niels Bruun Ipsen, Pierre-Alexandre Mattei, and Jes Frellsen. 2022. How to deal with missing data in supervised deep learning? In Proceedings of the 10th International Conference on Learning Representations (ICLR \u201922)."},{"key":"e_1_3_3_22_2","unstructured":"Shruti Iyyer. 2019. Churn Modelling. Kaggle. Retrieved from https:\/\/www.kaggle.com\/datasets\/shrutimechlearn\/churn-modelling"},{"key":"e_1_3_3_23_2","doi-asserted-by":"publisher","DOI":"10.3389\/fdata.2021.693674"},{"key":"e_1_3_3_24_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patter.2021.100271"},{"key":"e_1_3_3_25_2","series-title":"Proceedings of Machine Learning Research","first-page":"9916","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Jarrett Daniel","year":"2022","unstructured":"Daniel Jarrett, Bogdan C. Cebere, Tennison Liu, Alicia Curth, and Mihaela van der Schaar. 2022. HyperImpute: Generalized iterative imputation with automatic model selection. In Proceedings of the 39th International Conference on Machine Learning. Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.), Proceedings of Machine Learning Research, Vol. 162, PMLR, 9916\u20139937. Retrieved from https:\/\/proceedings.mlr.press\/v162\/jarrett22a.html"},{"key":"e_1_3_3_26_2","unstructured":"Kaggle. 2016. House Sales in King County USA. Kaggle. 
Retrieved from https:\/\/www.kaggle.com\/datasets\/harlfoxem\/housesalesprediction"},{"key":"e_1_3_3_27_2","doi-asserted-by":"publisher","DOI":"10.4097\/kjae.2013.64.5.402"},{"key":"e_1_3_3_28_2","volume-title":"Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS \u201922)","author":"Karras Tero","year":"2024","unstructured":"Tero Karras, Miika Aittala, Samuli Laine, and Timo Aila. 2024. Elucidating the design space of diffusion-based generative models. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS \u201922). Curran Associates Inc., Red Hook, NY, Article 1926, 13 pages."},{"key":"e_1_3_3_29_2","article-title":"LightGBM: A highly efficient gradient boosting decision tree","volume":"30","author":"Ke Guolin","year":"2017","unstructured":"Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems. I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Curran Associates, Inc., Vol. 30. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf","journal-title":"Advances in Neural Information Processing Systems"},{"key":"e_1_3_3_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/3442381.3449999"},{"key":"e_1_3_3_31_2","volume-title":"Proceedings of the 11th International Conference on Learning Representations","author":"Kim Jayoung","year":"2023","unstructured":"Jayoung Kim, Chaejeong Lee, and Noseong Park. 2023. STaSy: Score-based tabular data synthesis. In Proceedings of the 11th International Conference on Learning Representations. 
Retrieved from https:\/\/openreview.net\/forum?id=1mNssCWt_v"},{"key":"e_1_3_3_32_2","first-page":"17564","volume-title":"Proceedings of the International Conference on Machine Learning","author":"Kotelnikov Akim","year":"2023","unstructured":"Akim Kotelnikov, Dmitry Baranchuk, Ivan Rubachev, and Artem Babenko. 2023. TabDDPM: Modelling tabular data with diffusion models. In Proceedings of the International Conference on Machine Learning. PMLR, 17564\u201317579."},{"key":"e_1_3_3_33_2","volume-title":"Advances in Neural Information Processing Systems","author":"Kyono Trent","year":"2021","unstructured":"Trent Kyono, Yao Zhang, Alexis Bellot, and Mihaela van der Schaar. 2021. MIRACLE: Causally-aware imputation via learning missing data mechanisms. In Advances in Neural Information Processing Systems. A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan (Eds.), Curran Associates, Inc. Retrieved from https:\/\/openreview.net\/forum?id=GzeqcAUFGl0"},{"key":"e_1_3_3_34_2","series-title":"Proceedings of Machine Learning Research","first-page":"18940","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Lee Chaejeong","year":"2023","unstructured":"Chaejeong Lee, Jayoung Kim, and Noseong Park. 2023. CoDi: Co-evolving contrastive diffusion models for mixed-type tabular synthesis. In Proceedings of the 40th International Conference on Machine Learning. Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.), Proceedings of Machine Learning Research, Vol. 202, PMLR, 18940\u201318956. 
Retrieved from https:\/\/proceedings.mlr.press\/v202\/lee23i.html"},{"key":"e_1_3_3_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/18.61115"},{"key":"e_1_3_3_36_2","series-title":"Proceedings of Machine Learning Research","first-page":"21450","volume-title":"Proceedings of the 40th International Conference on Machine Learning","author":"Liu Haohe","year":"2023","unstructured":"Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, and Mark D. Plumbley. 2023. AudioLDM: Text-to-audio generation with latent diffusion models. In Proceedings of the 40th International Conference on Machine Learning. Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett (Eds.), Proceedings of Machine Learning Research, Vol. 202, PMLR, 21450\u201321474. Retrieved from https:\/\/proceedings.mlr.press\/v202\/liu23f.html"},{"key":"e_1_3_3_37_2","doi-asserted-by":"publisher","DOI":"10.5244\/C.34.191"},{"key":"e_1_3_3_38_2","doi-asserted-by":"publisher","DOI":"10.1145\/3458864.3466628"},{"key":"e_1_3_3_39_2","doi-asserted-by":"publisher","DOI":"10.5555\/3524938.3525599"},{"key":"e_1_3_3_40_2","doi-asserted-by":"publisher","DOI":"10.1109\/SP.2019.00065"},{"key":"e_1_3_3_41_2","series-title":"Proceedings of Machine Learning Research","first-page":"8162","volume-title":"Proceedings of the 38th International Conference on Machine Learning","author":"Nichol Alexander Quinn","year":"2021","unstructured":"Alexander Quinn Nichol and Prafulla Dhariwal. 2021. Improved denoising diffusion probabilistic models. In Proceedings of the 38th International Conference on Machine Learning. Marina Meila and Tong Zhang (Eds.), Proceedings of Machine Learning Research, Vol. 139, PMLR, 8162\u20138171. 
Retrieved from https:\/\/proceedings.mlr.press\/v139\/nichol21a.html"},{"key":"e_1_3_3_42_2","series-title":"Proceedings of Machine Learning Research","first-page":"16906","volume-title":"Proceedings of the 39th International Conference on Machine Learning","author":"Nock Richard","year":"2022","unstructured":"Richard Nock and Mathieu Guillame-Bert. 2022. Generative trees: Adversarial and copycat. In Proceedings of the 39th International Conference on Machine Learning. Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (Eds.), Proceedings of Machine Learning Research, Vol. 162, PMLR, 16906\u201316951. Retrieved from https:\/\/proceedings.mlr.press\/v162\/nock22a.html"},{"key":"e_1_3_3_43_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-7152(96)00140-X"},{"issue":"85","key":"e_1_3_3_44_2","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa Fabian","year":"2011","unstructured":"Fabian Pedregosa, Ga\u00ebl Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 85 (2011), 2825\u20132830. Retrieved from http:\/\/jmlr.org\/papers\/v12\/pedregosa11a.html","journal-title":"Journal of Machine Learning Research"},{"key":"e_1_3_3_45_2","volume-title":"Advances in Neural Information Processing Systems.","author":"Prokhorenkova Liudmila","year":"2018","unstructured":"Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems. S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Curran Associates, Inc., Vol. 31. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf"},{"key":"e_1_3_3_46_2","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-024-56706-x"},{"key":"e_1_3_3_47_2","unstructured":"Prakhar Rathi and Arpan Mishra. 2019. Insurance Company. Kaggle. Retrieved from https:\/\/www.kaggle.com\/datasets\/prakharrathi25\/insurance-company-dataset"},{"key":"e_1_3_3_48_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01042"},{"key":"e_1_3_3_49_2","article-title":"Deep unsupervised learning using nonequilibrium thermodynamics","author":"Sohl-Dickstein Jascha","year":"2015","unstructured":"Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML).","journal-title":"Proceedings of the International Conference on Machine Learning (ICML)"},{"key":"e_1_3_3_50_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Song Jiaming","year":"2021","unstructured":"Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising diffusion implicit models. In Proceedings of the International Conference on Learning Representations."},{"key":"e_1_3_3_51_2","volume-title":"Proceedings of the International Conference on Learning Representations","author":"Song Yang","year":"2021","unstructured":"Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-based generative modeling through stochastic differential equations. In Proceedings of the International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=PxTIG12RRHS"},{"key":"e_1_3_3_52_2","unstructured":"Daniel J. Stekhoven. 2013. Package \u2018missForest\u2019. 
R Package Version 1 21."},{"key":"e_1_3_3_53_2","first-page":"24804","volume-title":"Advances in Neural Information Processing Systems","author":"Tashiro Yusuke","year":"2021","unstructured":"Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. 2021. CSDI: Conditional score-based diffusion models for probabilistic time series imputation. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34, Curran Associates, Inc., 24804\u201324816. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/cfe8504bda37b575c70ee1a8276f3486-Paper.pdf"},{"key":"e_1_3_3_54_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2021.12.018"},{"key":"e_1_3_3_55_2","unstructured":"Svetlana Ulianova. 2020. Cardiovascular Disease. Kaggle. Retrieved from https:\/\/www.kaggle.com\/datasets\/sulianova\/cardiovascular-disease-dataset"},{"key":"e_1_3_3_56_2","first-page":"11287","volume-title":"Advances in Neural Information Processing Systems","author":"Vahdat Arash","year":"2021","unstructured":"Arash Vahdat, Karsten Kreis, and Jan Kautz. 2021. Score-based generative modeling in latent space. In Advances in Neural Information Processing Systems. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34, Curran Associates, Inc., 11287\u201311302. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2021\/file\/5dca4c6b9e244d24a30b4c45601d9720-Paper.pdf"},{"key":"e_1_3_3_57_2","first-page":"1","article-title":"MICE: Multivariate imputation by chained equations in R","volume":"45","author":"Van Buuren Stef","year":"2011","unstructured":"Stef Van Buuren and Karin Groothuis-Oudshoorn. 2011. MICE: Multivariate imputation by chained equations in R. 
Journal of Statistical Software 45 (2011), 1\u201367.","journal-title":"Journal of Statistical Software"},{"key":"e_1_3_3_58_2","volume-title":"Advances in Neural Information Processing Systems","author":"Vaswani Ashish","year":"2017","unstructured":"Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, \u0141ukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems. I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.), Vol. 30, Curran Associates, Inc. Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2017\/file\/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf"},{"key":"e_1_3_3_59_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.snb.2012.01.074"},{"key":"e_1_3_3_60_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISIT45174.2021.9518186"},{"key":"e_1_3_3_61_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.neunet.2021.05.033"},{"key":"e_1_3_3_62_2","article-title":"Causal-TGAN: Modeling tabular data using causally-aware GAN","author":"Wen Bingyang","year":"2022","unstructured":"Bingyang Wen, Yupeng Cao, Fan Yang, Koduvayur Subbalakshmi, and Rajarathnam Chandramouli. 2022. Causal-TGAN: Modeling tabular data using causally-aware GAN. In Proceedings of the ICLR Workshop on Deep Generative Models for Highly Structured Data.","journal-title":"Proceedings of the ICLR Workshop on Deep Generative Models for Highly Structured Data"},{"key":"e_1_3_3_63_2","volume-title":"Advances in Neural Information Processing Systems","author":"Xu Lei","year":"2019","unstructured":"Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. 2019. Modeling tabular data using conditional GAN. In Advances in Neural Information Processing Systems. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch\u00e9-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32, Curran Associates, Inc. 
Retrieved from https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2019\/file\/254ed7d2de3b23ab10936522dd547b78-Paper.pdf"},{"key":"e_1_3_3_64_2","series-title":"Proceedings of Machine Learning Research","first-page":"5689","volume-title":"Proceedings of the 35th International Conference on Machine Learning","author":"Yoon Jinsung","year":"2018","unstructured":"Jinsung Yoon, James Jordon, and Mihaela van der Schaar. 2018. GAIN: Missing data imputation using generative adversarial nets. In Proceedings of the 35th International Conference on Machine Learning. Jennifer Dy and Andreas Krause (Eds.), Proceedings of Machine Learning Research, Vol. 80, PMLR, 5689\u20135698."},{"key":"e_1_3_3_65_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR52688.2022.01102"},{"key":"e_1_3_3_66_2","volume-title":"Proceedings of the 12th International Conference on Learning Representations","author":"Zhang Hengrui","year":"2024","unstructured":"Hengrui Zhang, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, and George Karypis. 2024. Mixed-type tabular data synthesis with score-based diffusion in latent space. In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https:\/\/openreview.net\/forum?id=4Ay23yeuz0"},{"key":"e_1_3_3_67_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICDM51629.2021.00103"},{"key":"e_1_3_3_68_2","series-title":"Proceedings of Machine Learning Research","first-page":"97","volume-title":"Proceedings of the 13th Asian Conference on Machine Learning","author":"Zhao Zilong","year":"2021","unstructured":"Zilong Zhao, Aditya Kunar, Robert Birke, and Lydia Y. Chen. 2021. CTAB-GAN: Effective table data synthesizing. In Proceedings of the 13th Asian Conference on Machine Learning. Vineeth N. Balasubramanian and Ivor Tsang (Eds.), Proceedings of Machine Learning Research, Vol. 157, PMLR, 97\u2013112. 
Retrieved from https:\/\/proceedings.mlr.press\/v157\/zhao21a.html"},{"key":"e_1_3_3_69_2","unstructured":"Shuhan Zheng and Nontawat Charoenphakdee. 2022. Diffusion models for missing value imputation in tabular data. In Proceedings of the NeurIPS 2022 1st Table Representation Workshop. Retrieved from https:\/\/openreview.net\/forum?id=4q9kFrXC2Ae"}],"container-title":["ACM Transactions on Knowledge Discovery from Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3742435","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,26]],"date-time":"2025-09-26T18:52:11Z","timestamp":1758912731000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3742435"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,7,21]]},"references-count":68,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,7,31]]}},"alternative-id":["10.1145\/3742435"],"URL":"https:\/\/doi.org\/10.1145\/3742435","relation":{},"ISSN":["1556-4681","1556-472X"],"issn-type":[{"value":"1556-4681","type":"print"},{"value":"1556-472X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,7,21]]},"assertion":[{"value":"2024-07-05","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-16","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-07-21","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}