{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,26]],"date-time":"2026-03-26T20:26:50Z","timestamp":1774556810794,"version":"3.50.1"},"reference-count":52,"publisher":"MDPI AG","issue":"4","license":[{"start":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T00:00:00Z","timestamp":1607040000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computation"],"abstract":"<jats:p>The recurrent use of databases with categorical variables in different applications demands new alternatives for identifying relevant patterns. Classification is an interesting approach for recognizing this type of data. However, few methods for this purpose exist in the literature, and those techniques focus specifically on kernels, suffering from accuracy problems and high computational cost. For this reason, we propose an identification approach for categorical variables using conventional classifiers (LDC, QDC, KNN, and SVM) and different mapping techniques to increase the separability of classes. Specifically, we map the initial features (categorical attributes) to another space, using the Chi-square (C-S) as a dissimilarity measure. Then, we employ t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimensionality of the data to two or three features, allowing a significant reduction of computational times in learning methods. We evaluate the performance of the proposed approach in terms of accuracy for several experimental configurations and public categorical datasets downloaded from the UCI repository, and we compare it with relevant state-of-the-art methods. Results show that C-S mapping and t-SNE considerably diminish the computational times in recognition tasks, while the accuracy is preserved. 
Also, when we apply only the C-S mapping to the datasets, the separability of classes is enhanced; thus, the performance of the learning algorithms clearly increases.<\/jats:p>","DOI":"10.3390\/computation8040104","type":"journal-article","created":{"date-parts":[[2020,12,4]],"date-time":"2020-12-04T11:59:00Z","timestamp":1607083140000},"page":"104","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE"],"prefix":"10.3390","volume":"8","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3985-4014","authenticated-orcid":false,"given":"Luis Ariosto Serna","family":"Cardona","sequence":"first","affiliation":[{"name":"Department of Electric Engineering, Universidad Tecnol\u00f3gica de Pereira, Pereira 660002, Colombia"},{"name":"Department of Engineering, Corporaci\u00f3n Instituto de Administraci\u00f3n y Finanzas (CIAF), Pereira 660002, Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2086-3722","authenticated-orcid":false,"given":"Hern\u00e1n Dar\u00edo","family":"Vargas-Cardona","sequence":"additional","affiliation":[{"name":"Department of Electronics and Computer Science, Pontificia Universidad Javeriana Cali, Cali 760031, Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Piedad","family":"Navarro Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Department of Engineering, Corporaci\u00f3n Instituto de Administraci\u00f3n y Finanzas (CIAF), Pereira 660002, Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0522-8683","authenticated-orcid":false,"given":"David Augusto","family":"Cardenas Pe\u00f1a","sequence":"additional","affiliation":[{"name":"Department of Electric Engineering, Universidad Tecnol\u00f3gica de Pereira, Pereira 660002, 
Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"\u00c1lvaro \u00c1ngel","family":"Orozco Guti\u00e9rrez","sequence":"additional","affiliation":[{"name":"Department of Electric Engineering, Universidad Tecnol\u00f3gica de Pereira, Pereira 660002, Colombia"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,4]]},"reference":[{"key":"ref_1","unstructured":"Janert, P.K. (2010). Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists, O\u2019Reilly Media, Inc."},{"key":"ref_2","unstructured":"Ng, A.Y., Jordan, M.I., and Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_3","first-page":"23","article-title":"Support vector machines","volume":"1","author":"Meyer","year":"2001","journal-title":"R News"},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Rasmussen, C.E. (2004). Gaussian processes in machine learning. Advanced Lectures on Machine Learning, Springer.","DOI":"10.7551\/mitpress\/3206.001.0001"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"e5140","DOI":"10.1002\/cpe.5140","article-title":"Research on improved text classification method based on combined weighted model","volume":"32","author":"Wang","year":"2020","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"446","DOI":"10.1109\/91.784206","article-title":"A fuzzy k-modes algorithm for clustering categorical data","volume":"7","author":"Huang","year":"1999","journal-title":"IEEE Trans. 
Fuzzy Syst."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"857","DOI":"10.2307\/2528823","article-title":"A general coefficient of similarity and some of its properties","volume":"27","author":"Gower","year":"1971","journal-title":"Biometrics"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"567","DOI":"10.1016\/0031-3203(91)90022-W","article-title":"Symbolic clustering using a new dissimilarity measure","volume":"24","author":"Gowda","year":"1991","journal-title":"Pattern Recognit."},{"key":"ref_9","unstructured":"Kaufman, L. (2009). Finding Groups in Data: An Introduction to Cluster Analysis, John Wiley and Sons."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"396","DOI":"10.1109\/TPAMI.1983.4767409","article-title":"Automated construction of classifications: Conceptual clustering versus numerical taxonomy","volume":"4","author":"Michalski","year":"1983","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1002\/sam.11402","article-title":"Dissimilarity measure for ranking data via mixture of copulae","volume":"12","author":"Bonanomi","year":"2019","journal-title":"Stat. Anal. Data Min. ASA Data Sci. J."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"e5094","DOI":"10.1002\/cpe.5094","article-title":"Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis","volume":"31","author":"Seshadri","year":"2019","journal-title":"Concurr. Comput. Pract. Exp."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"2831","DOI":"10.1109\/TNNLS.2016.2598722","article-title":"A fast and efficient method for training categorical radial basis function networks","volume":"28","author":"Alexandridis","year":"2017","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Zheng, Z., Cai, Y., Yang, Y., and Li, Y. 
(2018, January 18\u201321). Sparse Weighted Naive Bayes Classifier for Efficient Classification of Categorical Data. Proceedings of the 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), Guangzhou, China.","DOI":"10.1109\/DSC.2018.00110"},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/j.neucom.2017.03.085","article-title":"The na\u00efve associative classifier (NAC): A novel, simple, transparent, and accurate classification model evaluated on financial data","volume":"265","year":"2017","journal-title":"Neurocomputing"},{"key":"ref_16","unstructured":"(2020, September 05). Computation, Special Issue \u201cExplainable Computational Intelligence, Theory, Methods and Applications\u201d. Available online: https:\/\/www.mdpi.com\/journal\/computation\/special_issues\/explainable_computational_intelligence."},{"key":"ref_17","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Maaten","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_18","unstructured":"Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics, Sage."},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"781","DOI":"10.1109\/TNNLS.2014.2325872","article-title":"Coupled attribute similarity learning on categorical data","volume":"26","author":"Wang","year":"2015","journal-title":"IEEE Trans. Neural Netw. Learn. Syst."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Polato, M., Lauriola, I., and Aiolli, F. (2018). A novel boolean kernels family for categorical data. Entropy, 20.","DOI":"10.3390\/e20060444"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1723","DOI":"10.3233\/JIFS-15372","article-title":"A new classifier for categorical data based on a possibilistic estimation and a novel generalized minimum-based algorithm","volume":"33","author":"Baati","year":"2017","journal-title":"J. Intell. 
Fuzzy Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1016\/0167-8655(95)00075-R","article-title":"A conceptual version of the k-means algorithm","volume":"16","author":"Ralambondrainy","year":"1995","journal-title":"Pattern Recognit. Lett."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"111","DOI":"10.1080\/01969727408621685","article-title":"Woodbury and Jonathan Clive. Clinical pure types as a fuzzy partition","volume":"4","author":"Max","year":"1974","journal-title":"J. Cybern."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"110","DOI":"10.1016\/j.patrec.2006.06.006","article-title":"A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set","volume":"28","author":"Ahmad","year":"2007","journal-title":"Pattern Recognit. Lett."},{"key":"ref_25","unstructured":"Jain, A.K., and Dubes, R.C. (1988). Algorithms for Clustering Data, Prentice-Hall, Inc."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1613\/jair.346","article-title":"Improved heterogeneous distance functions","volume":"6","author":"Wilson","year":"1997","journal-title":"J. Artif. Intell. Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"2047","DOI":"10.1109\/TNNLS.2015.2451151","article-title":"Space structure and clustering of categorical data","volume":"27","author":"Qian","year":"2016","journal-title":"IEEE Trans. Neural Netw. Learn. 
Syst."},{"key":"ref_28","first-page":"34","article-title":"A fast clustering algorithm to cluster very large categorical data sets in data mining","volume":"3","author":"Huang","year":"1997","journal-title":"DMKD"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1016\/j.patcog.2003.11.003","article-title":"An optimization algorithm for clustering using weighted dissimilarity measures","volume":"37","author":"Chan","year":"2004","journal-title":"Pattern Recognit."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"1509","DOI":"10.1109\/TPAMI.2012.228","article-title":"The impact of cluster representatives on the convergence of the k-modes type clustering","volume":"35","author":"Bai","year":"2013","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Kobayashi, Y., Song, L., Tomita, M., and Chen, P. (2019). Automatic Fault Detection and Isolation Method for Roller Bearing Using Hybrid-GA and Sequential Fuzzy Inference. Sensors, 19.","DOI":"10.3390\/s19163553"},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"16","DOI":"10.1016\/j.apacoust.2014.08.016","article-title":"Application of empirical mode decomposition and artificial neural network for automatic bearing fault diagnosis based on vibration signals","volume":"89","author":"Ali","year":"2015","journal-title":"Appl. Acoust."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"658","DOI":"10.1016\/j.ymssp.2016.04.028","article-title":"Self-adaptive bearing fault diagnosis based on permutation entropy and manifold-based dynamic time warping","volume":"114","author":"Tian","year":"2019","journal-title":"Mech. Syst. Signal Process."},{"key":"ref_34","doi-asserted-by":"crossref","unstructured":"Tan, J., Fu, W., Wang, K., Xue, X., Hu, W., and Shan, Y. (2019). 
Fault Diagnosis for Rolling Bearing Based on Semi-Supervised Clustering and Support Vector Data Description with Adaptive Parameter Optimization and Improved Decision Strategy. Appl. Sci., 9.","DOI":"10.3390\/app9081676"},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"79","DOI":"10.2478\/fcds-2014-0006","article-title":"Aspects in classification Learning\u2014Review of recent developments in learning vector quantization","volume":"39","author":"Kaden","year":"2014","journal-title":"Found. Comput. Decis. Sci."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"175","DOI":"10.1016\/j.mechmachtheory.2015.03.014","article-title":"Rolling bearing fault diagnosis under variable conditions using LMD-SVD and extreme learning machine","volume":"90","author":"Tian","year":"2015","journal-title":"Mech. Mach. Theory"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1016\/j.jsv.2016.12.017","article-title":"Novel synthetic index-based adaptive stochastic resonance method and its application in bearing fault diagnosis","volume":"391","author":"Zhou","year":"2017","journal-title":"J. Sound Vib."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"1431","DOI":"10.1177\/1077546314534870","article-title":"A fault diagnosis approach for roller bearing based on improved intrinsic timescale decomposition de-noising and kriging-variable predictive model-based class discriminate","volume":"22","author":"Yang","year":"2016","journal-title":"J. Vib. Control"},{"key":"ref_39","doi-asserted-by":"crossref","unstructured":"Chen, Y., Zhang, T., Zhao, W., Luo, Z., and Sun, K. (2019). Fault Diagnosis of Rolling Bearing Using Multiscale Amplitude-Aware Permutation Entropy and Random Forest. 
Algorithms, 12.","DOI":"10.3390\/a12090184"},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"5011","DOI":"10.1016\/j.eswa.2014.11.047","article-title":"Kurtosis forecasting of bearing vibration signal based on the hybrid model of empirical mode decomposition and RVM with artificial bee colony algorithm","volume":"42","author":"Fei","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_41","doi-asserted-by":"crossref","unstructured":"Shen, C., Xie, J., Wang, D., Jiang, X., and Shi, J. (2019). Improved Hierarchical Adaptive Deep Belief Network for Bearing Fault Diagnosis. Appl. Sci., 9.","DOI":"10.3390\/app9163374"},{"key":"ref_42","doi-asserted-by":"crossref","unstructured":"Anbu, S., Thangavelu, A., and Ashok, S.D. (2019). Fuzzy C-Means Based Clustering and Rule Formation Approach for Classification of Bearing Faults Using Discrete Wavelet Transform. Computation, 7.","DOI":"10.3390\/computation7040054"},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"691","DOI":"10.1016\/j.dss.2012.08.014","article-title":"Mutual information based input feature selection for classification problems","volume":"54","author":"Cang","year":"2012","journal-title":"Decis. Support Syst."},{"key":"ref_44","doi-asserted-by":"crossref","unstructured":"Sani, L., Pecori, R., Mordonini, M., and Cagnoni, S. (2019). From Complex System Analysis to Pattern Recognition: Experimental Assessment of an Unsupervised Feature Extraction Method Based on the Relevance Index Metrics. Computation, 7.","DOI":"10.3390\/computation7030039"},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Weber, M. (2018). Implications of PCCA+ in molecular simulation. Computation, 6.","DOI":"10.3390\/computation6010020"},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Tang, Y., Zu, Q., and Rodr\u00edguez Garc\u00eda, J. (2019). A K-Means Clustering Algorithm: Using the Chi-Square as a Distance. International Conference on Human Centered Computing, Springer. 
Lecture Notes in Computer Science.","DOI":"10.1007\/978-3-030-15127-0"},{"key":"ref_47","unstructured":"Hinton, G.E., and Roweis, S.T. (2003). Stochastic neighbor embedding. Advances in Neural Information Processing Systems, MIT Press."},{"key":"ref_48","first-page":"3221","article-title":"Accelerating t-SNE using tree-based algorithms","volume":"15","year":"2014","journal-title":"J. Mach. Learn. Res."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/BF00994018","article-title":"Support-vector network","volume":"20","author":"Cortes","year":"1995","journal-title":"Mach. Learn."},{"key":"ref_50","first-page":"26","article-title":"Building sparse multiple-kernel SVM classifiers","volume":"3","author":"Hu","year":"2009","journal-title":"Learning (MKL)"},{"key":"ref_51","first-page":"73","article-title":"Discriminant function analysis: Concept and application","volume":"33","year":"2008","journal-title":"E\u011fitim Ara\u015ft\u0131rmalar\u0131 Dergisi"},{"key":"ref_52","unstructured":"Li, W., and Zhao, J. (2020). Wasserstein information matrix. arXiv."}],"container-title":["Computation"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2079-3197\/8\/4\/104\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T10:41:21Z","timestamp":1760179281000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2079-3197\/8\/4\/104"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,4]]},"references-count":52,"journal-issue":{"issue":"4","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["computation8040104"],"URL":"https:\/\/doi.org\/10.3390\/computation8040104","relation":{},"ISSN":["2079-3197"],"issn-type":[{"value":"2079-3197","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,4]]}}}