{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T03:49:46Z","timestamp":1775274586189,"version":"3.50.1"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"6","license":[{"start":{"date-parts":[[2025,3,15]],"date-time":"2025-03-15T00:00:00Z","timestamp":1741996800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,3,15]],"date-time":"2025-03-15T00:00:00Z","timestamp":1741996800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100022887","name":"Ho Chi Minh City Open University","doi-asserted-by":"crossref","award":["E2024.03.1"],"award-info":[{"award-number":["E2024.03.1"]}],"id":[{"id":"10.13039\/100022887","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100022887","name":"Ho Chi Minh City Open University","doi-asserted-by":"crossref","award":["E2024.03.1"],"award-info":[{"award-number":["E2024.03.1"]}],"id":[{"id":"10.13039\/100022887","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100022887","name":"Ho Chi Minh City Open University","doi-asserted-by":"crossref","award":["E2024.03.1"],"award-info":[{"award-number":["E2024.03.1"]}],"id":[{"id":"10.13039\/100022887","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100001538","name":"Victoria University of Wellington","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001538","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Knowl Inf Syst"],"published-print":{"date-parts":[[2025,6]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:p>Class imbalance, where datasets often lack sufficient samples for minority classes, is a persistent challenge in machine learning. 
Existing solutions often generate synthetic data to mitigate this issue, but they typically struggle with complex data distributions, primarily because they focus on oversampling the minority class while neglecting the relationships with the majority class. To overcome these limitations, we propose the Contrastive Tabular Variational Autoencoder (CTVAE), which integrates conditional Variational Autoencoders with contrastive learning techniques. CTVAE excels at generating high-quality synthetic samples that capture the intricate data distributions of both minority and majority classes. Additionally, it can be seamlessly integrated with variants of the Synthetic Minority Oversampling Technique (SMOTE) for enhanced effectiveness. Experimental results demonstrate that CTVAE substantially improves classification performance on imbalanced datasets, offering a more robust and holistic solution to the class imbalance problem.\n<\/jats:p>","DOI":"10.1007\/s10115-025-02377-7","type":"journal-article","created":{"date-parts":[[2025,3,15]],"date-time":"2025-03-15T06:11:46Z","timestamp":1742019106000},"page":"5335-5354","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["CTVAE: Contrastive Tabular Variational Autoencoder for imbalance data"],"prefix":"10.1007","volume":"67","author":[{"given":"Alex X.","family":"Wang","sequence":"first","affiliation":[]},{"given":"Minh Quang","family":"Le","sequence":"additional","affiliation":[]},{"given":"Huu-Thanh","family":"Duong","sequence":"additional","affiliation":[]},{"given":"Bay Nguyen","family":"Van","sequence":"additional","affiliation":[]},{"given":"Binh P.","family":"Nguyen","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,3,15]]},"reference":[{"key":"2377_CR1","doi-asserted-by":"crossref","unstructured":"Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. 
J Big Data 6(1):1\u201354","DOI":"10.1186\/s40537-019-0192-5"},{"issue":"7","key":"2377_CR2","doi-asserted-by":"publisher","first-page":"3039","DOI":"10.1002\/int.22388","volume":"36","author":"L Dongdong","year":"2021","unstructured":"Dongdong L, Ziqiu C, Bolu W, Zhe W, Hai Y, Wenli D (2021) Entropy-based hybrid sampling ensemble learning for imbalanced data. Int J Intell Syst 36(7):3039\u20133067","journal-title":"Int J Intell Syst"},{"issue":"1","key":"2377_CR3","doi-asserted-by":"publisher","first-page":"6111312","DOI":"10.1155\/2024\/6111312","volume":"2024","author":"AU Rehman","year":"2024","unstructured":"Rehman AU, Butt WH, Ali TM, Javaid S, Almufareh MF, Humayun M, Rahman H, Mir A, Shaheen M (2024) A machine learning-based framework for accurate and early diagnosis of liver diseases: a comprehensive study on feature selection, data imbalance, and algorithmic performance. Int J Intell Syst 2024(1):6111312","journal-title":"Int J Intell Syst"},{"key":"2377_CR4","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321\u2013357","journal-title":"J Artif Intell Res"},{"issue":"9","key":"2377_CR5","doi-asserted-by":"publisher","first-page":"6390","DOI":"10.1109\/TNNLS.2021.3136503","volume":"34","author":"D Dablain","year":"2022","unstructured":"Dablain D, Krawczyk B, Chawla NV (2022) DeepSMOTE: fusing deep learning and SMOTE for imbalanced data. IEEE Trans Neural Netw Learn Syst 34(9):6390\u20136404","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"issue":"6","key":"2377_CR6","doi-asserted-by":"publisher","first-page":"7499","DOI":"10.1109\/TNNLS.2022.3229161","volume":"35","author":"V Borisov","year":"2022","unstructured":"Borisov V, Leemann T, Se\u00dfler K, Haug J, Pawelczyk M, Kasneci G (2022) Deep neural networks and tabular data: a survey. 
IEEE Trans Neural Netw Learn Syst 35(6):7499\u20137519","journal-title":"IEEE Trans Neural Netw Learn Syst"},{"key":"2377_CR7","doi-asserted-by":"publisher","first-page":"7820","DOI":"10.1109\/TII.2024.3366991","volume":"5","author":"X Yang","year":"2024","unstructured":"Yang X, Ye T, Yuan X, Zhu W, Mei X, Zhou F (2024) A novel data augmentation method based on denoising diffusion probabilistic model for fault diagnosis under imbalanced data. IEEE Trans Ind Inf 5:7820\u20137831","journal-title":"IEEE Trans Ind Inf"},{"key":"2377_CR8","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2024.112223","volume":"166","author":"AX Wang","year":"2024","unstructured":"Wang AX, Chukova SS, Simpson CR, Nguyen BP (2024) Challenges and opportunities of generative models on tabular data. Appl Soft Comput 166:112223","journal-title":"Appl Soft Comput"},{"key":"2377_CR9","unstructured":"Vivekananthan S (2024) Comparative analysis of generative models: enhancing image synthesis with VAEs, GANs, and stable diffusion. arXiv:2408.08751"},{"key":"2377_CR10","unstructured":"Bai J, Kong S, Gomes CP (2022) Gaussian mixture variational autoencoder with contrastive learning for multi-label classification. In: International conference on machine learning. PMLR, pp 1383\u20131398"},{"key":"2377_CR11","doi-asserted-by":"crossref","unstructured":"Wang Y, Zhang H, Liu Z, Yang L, Yu PS (2022) Contrastvae: contrastive variational autoencoder for sequential recommendation. In: Proceedings of the 31st ACM international conference on information and knowledge management, pp 2056\u20132066","DOI":"10.1145\/3511808.3557268"},{"key":"2377_CR12","doi-asserted-by":"publisher","DOI":"10.1016\/j.asoc.2023.110895","volume":"148","author":"AX Wang","year":"2023","unstructured":"Wang AX, Chukova SS, Nguyen BP (2023) Synthetic minority oversampling using edited displacement-based k-nearest neighbors. 
Appl Soft Comput 148:110895","journal-title":"Appl Soft Comput"},{"issue":"4","key":"2377_CR13","doi-asserted-by":"publisher","first-page":"1619","DOI":"10.1002\/int.22354","volume":"36","author":"A Valdivia","year":"2021","unstructured":"Valdivia A, S\u00e1nchez-Monedero J, Casillas J (2021) How fair can we go in machine learning? Assessing the boundaries of accuracy and fairness. Int J Intell Syst 36(4):1619\u20131643","journal-title":"Int J Intell Syst"},{"key":"2377_CR14","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2023.119059","volume":"640","author":"R Sonoda","year":"2023","unstructured":"Sonoda R (2023) Fair oversampling technique using heterogeneous clusters. Inf Sci 640:119059","journal-title":"Inf Sci"},{"issue":"11","key":"2377_CR15","doi-asserted-by":"publisher","first-page":"2781","DOI":"10.1109\/TPAMI.2019.2914680","volume":"42","author":"C Huang","year":"2019","unstructured":"Huang C, Li Y, Loy CC, Tang X (2019) Deep imbalanced learning for face recognition and attribute prediction. IEEE Trans Pattern Anal Mach Intell 42(11):2781\u20132794","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"2377_CR16","doi-asserted-by":"publisher","first-page":"9294","DOI":"10.1109\/TKDE.2024.3419834","volume":"12","author":"K Yang","year":"2024","unstructured":"Yang K, Yu Z, Chen W, Liang Z, Chen CP (2024) Solving the imbalanced problem by metric learning and oversampling. IEEE Trans Knowl Data Eng 12:9294\u20139307","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2377_CR17","first-page":"5622","volume":"4","author":"M Lin","year":"2023","unstructured":"Lin M, Yang K, Yu Z, Shi Y, Chen CP (2023) Hybrid ensemble broad learning system for network intrusion detection. 
IEEE Trans Ind Inf 4:5622\u20135633","journal-title":"IEEE Trans Ind Inf"},{"issue":"6","key":"2377_CR18","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10462-024-10759-6","volume":"57","author":"W Chen","year":"2024","unstructured":"Chen W, Yang K, Yu Z, Shi Y, Chen C (2024) A survey on imbalanced learning: latest research, applications and future directions. Artif Intell Rev 57(6):1\u201351","journal-title":"Artif Intell Rev"},{"issue":"1","key":"2377_CR19","doi-asserted-by":"publisher","DOI":"10.1016\/j.ipm.2023.103558","volume":"61","author":"AX Wang","year":"2024","unstructured":"Wang AX, Chukova SS, Sporle A, Milne BJ, Simpson CR, Nguyen BP (2024) Enhancing public research on citizen data: an empirical investigation of data synthesis using Statistics New Zealand\u2019s integrated data infrastructure. Inf Process Manag 61(1):103558","journal-title":"Inf Process Manag"},{"key":"2377_CR20","unstructured":"Nguyen HM, Cooper EW, Kamei K (2009) Borderline over-sampling for imbalanced data classification. In: Proceedings of the fifth international workshop on computational intelligence and applications. IEEE, pp 24\u201329"},{"key":"2377_CR21","doi-asserted-by":"crossref","unstructured":"He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: Proceedings of the 2008 IEEE international joint conference on neural networks. IEEE, pp 1322\u20131328","DOI":"10.1109\/IJCNN.2008.4633969"},{"key":"2377_CR22","doi-asserted-by":"publisher","first-page":"114692","DOI":"10.1109\/ACCESS.2020.3003346","volume":"8","author":"I Kunakorntum","year":"2020","unstructured":"Kunakorntum I, Hinthong W, Phunchongharn P (2020) A synthetic minority based on probabilistic distribution (SyMProD) oversampling for imbalanced datasets. 
IEEE Access 8:114692\u2013114704","journal-title":"IEEE Access"},{"issue":"1","key":"2377_CR23","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1145\/1007730.1007735","volume":"6","author":"GE Batista","year":"2004","unstructured":"Batista GE, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newsl 6(1):20\u201329","journal-title":"ACM SIGKDD Explor Newsl"},{"key":"2377_CR24","unstructured":"Kim J, Kong J, Son J (2021) Conditional variational autoencoder with adversarial learning for end-to-end text-to-speech. In: International conference on machine learning. PMLR, pp 5530\u20135540"},{"key":"2377_CR25","unstructured":"Damm S, Forster D, Velychko D, Dai Z, Fischer A, L\u00fccke J (2023) The ELBO of variational autoencoders converges to a sum of entropies. In: International conference on artificial intelligence and statistics. PMLR, pp 3931\u20133960"},{"key":"2377_CR26","first-page":"34709","volume":"35","author":"Y Zheng","year":"2022","unstructured":"Zheng Y, He T, Qiu Y, Wipf DP (2022) Learning manifold dimensions with conditional variational autoencoders. Adv Neural Inf Process Syst 35:34709","journal-title":"Adv Neural Inf Process Syst"},{"key":"2377_CR27","first-page":"480","volume":"34","author":"J Aneja","year":"2021","unstructured":"Aneja J, Schwing A, Kautz J, Vahdat A (2021) A contrastive learning approach for training variational autoencoder priors. Adv Neural Inf Process Syst 34:480\u2013493","journal-title":"Adv Neural Inf Process Syst"},{"key":"2377_CR28","doi-asserted-by":"crossref","unstructured":"Xie Z, Liu C, Zhang Y, Lu H, Wang D, Ding Y (2021) Adversarial and contrastive variational autoencoder for sequential recommendation. 
In: Proceedings of the web conference 2021, pp 449\u2013459","DOI":"10.1145\/3442381.3449873"},{"issue":"2","key":"2377_CR29","doi-asserted-by":"publisher","first-page":"1229","DOI":"10.1007\/s00371-023-02843-9","volume":"40","author":"M-f Hu","year":"2024","unstructured":"Hu M-f, Liu Z-Y, Liu J-W (2024) mcVAE: disentangling by mean constraint. Vis Comput 40(2):1229\u20131243","journal-title":"Vis Comput"},{"key":"2377_CR30","unstructured":"Xu L, Skoularidou M, Cuesta-Infante A, Veeramachaneni K (2019) Modeling tabular data using conditional GAN. In: Advances in neural information processing systems, pp 7335\u20137345"},{"issue":"4","key":"2377_CR31","doi-asserted-by":"publisher","first-page":"849","DOI":"10.1109\/TSMCB.2006.872273","volume":"36","author":"N Nasios","year":"2006","unstructured":"Nasios N, Bors AG (2006) Variational learning for Gaussian mixture models. IEEE Trans Syst Man Cybern Part B (Cybernetics) 36(4):849\u2013862","journal-title":"IEEE Trans Syst Man Cybern Part B (Cybernetics)"},{"key":"2377_CR32","doi-asserted-by":"publisher","DOI":"10.1016\/j.artint.2025.104292","volume":"340","author":"AX Wang","year":"2025","unstructured":"Wang AX, Nguyen BP (2025) TTVAE: transformer-based generative modeling for tabular data generation. Artif Intell 340:104292","journal-title":"Artif Intell"},{"key":"2377_CR33","doi-asserted-by":"publisher","unstructured":"Wang AX, Nguyen BP (2025) Deterministic Autoencoder using Wasserstein loss for tabular data generation. Neural Netw 185:107208. https:\/\/doi.org\/10.1016\/j.neunet.2025.107208","DOI":"10.1016\/j.neunet.2025.107208"},{"key":"2377_CR34","unstructured":"Zhang H, Zhang J, Srinivasan B, Shen Z, Qin X, Faloutsos C, Rangwala H, Karypis G (2024) Mixed-type tabular data synthesis with score-based diffusion in latent space. In: The 12th international conference on learning representations (ICLR). 
https:\/\/openreview.net\/pdf?id=4Ay23yeuz0"},{"key":"2377_CR35","unstructured":"Borisov V, Se\u00dfler K, Leemann T, Pawelczyk M, Kasneci G (2023) Language models are realistic tabular data generators. In: International conference on learning representations, pp 1\u201318"},{"key":"2377_CR36","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2024.121610","volume":"691","author":"AX Wang","year":"2025","unstructured":"Wang AX, Simpson CR, Nguyen BP (2025) Blending is all you need: data-centric ensemble synthetic data. Inf Sci 691:121610","journal-title":"Inf Sci"},{"issue":"7","key":"2377_CR37","first-page":"6651","volume":"35","author":"NA Azhar","year":"2022","unstructured":"Azhar NA, Pozi MSM, Din AM, Jatowt A (2022) An investigation of SMOTE based methods for imbalanced datasets with data complexity analysis. IEEE Trans Knowl Data Eng 35(7):6651\u20136672","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"2377_CR38","doi-asserted-by":"crossref","unstructured":"Wang AX, Chukova SS, Nguyen BP (2023) Data-centric AI to improve churn prediction with synthetic data. In: The 3rd international conference on computer, control and robotics (ICCCR 2023). IEEE, pp 409\u2013413","DOI":"10.1109\/ICCCR56747.2023.10194217"},{"key":"2377_CR39","doi-asserted-by":"publisher","DOI":"10.3389\/fdata.2023.1296508","author":"Z Zhao","year":"2024","unstructured":"Zhao Z, Kunar A, Birke R, Scheer H, Chen LY (2024) CTAB-GAN+: enhancing tabular data synthesis. Front Big Data. https:\/\/doi.org\/10.3389\/fdata.2023.1296508","journal-title":"Front Big Data"},{"key":"2377_CR40","unstructured":"Kotelnikov A, Baranchuk D, Rubachev I, Babenko A (2023) TABDDPM: modelling tabular data with diffusion models. In: International conference on machine learning, p 17564"},{"key":"2377_CR41","doi-asserted-by":"crossref","unstructured":"Wang AX, Chukova SS, Simpson CR, Nguyen BP (2023) Data-centric AI to improve early detection of mental illness. 
In: The 2023 IEEE statistical signal processing workshop (SSP 2023). IEEE, pp 369\u2013373","DOI":"10.1109\/SSP53291.2023.10207938"},{"issue":"7","key":"2377_CR42","doi-asserted-by":"publisher","first-page":"0271260","DOI":"10.1371\/journal.pone.0271260","volume":"17","author":"M Kim","year":"2022","unstructured":"Kim M, Hwang K-B (2022) An empirical evaluation of sampling methods for the classification of imbalanced data. PLoS ONE 17(7):0271260","journal-title":"PLoS ONE"},{"key":"2377_CR43","doi-asserted-by":"crossref","unstructured":"Wang AX, Chukova SS, Nguyen BP (2022) Implementation and analysis of centroid displacement-based k-nearest neighbors. In: International conference on advanced data mining and applications. Springer, Berlin, pp 431\u2013443","DOI":"10.1007\/978-3-031-22064-7_31"}],"container-title":["Knowledge and Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-025-02377-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10115-025-02377-7\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10115-025-02377-7.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,20]],"date-time":"2025-05-20T12:37:35Z","timestamp":1747744655000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10115-025-02377-7"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,15]]},"references-count":43,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2025,6]]}},"alternative-id":["2377"],"URL":"https:\/\/doi.org\/10.1007\/s10115-025-02377-7","relation":{},"ISSN":["0219-1377","0219-3116"],"issn-type":[{"value":"0219-1377","type":"print"}
,{"value":"0219-3116","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,15]]},"assertion":[{"value":"5 November 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"26 January 2025","order":2,"name":"revised","label":"Revised","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 February 2025","order":3,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 March 2025","order":4,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}