{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T21:10:30Z","timestamp":1775509830541,"version":"3.50.1"},"reference-count":31,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T00:00:00Z","timestamp":1736380800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>This research presents an innovative approach in text mining based on rough set theory. This study fundamentally utilizes the concept of symmetry from rough set theory to construct indiscernibility matrices and model uncertainties in data analysis, ensuring both methodological structure and solution processes remain symmetric. The effective management and analysis of large-scale textual data heavily relies on automated text classification technologies. In this context, term weighting plays a crucial role in determining classification performance. Particularly, supervised term weighting methods that utilize class information have emerged as the most effective approaches. However, the optimal representation of class\u2013term relationships remains an area requiring further research. This study proposes the Rough Multivariate Weighting Scheme (RMWS) and presents its mathematical derivative, the Square Root Rough Multivariate Weighting Scheme (SRMWS). The RMWS model employs rough sets to identify information-carrying documents within the document\u2013term\u2013class space and adopts a computational methodology incorporating \u03b1, \u03b2, and \u03b3 coefficients. Moreover, the distribution of the term among classes is again effectively revealed. 
Comprehensive experimental studies were conducted on three different datasets featuring imbalanced-multiclass, balanced-multiclass, and imbalanced-binary class structures to evaluate the model\u2019s effectiveness. The results show that RMWS and its derivative SRMWS methods outperform existing approaches by exhibiting superior performance on balanced and unbalanced datasets without being affected by class imbalance and number of classes. Furthermore, the SRMWS method is found to be the most effective for SVM and KNN classifiers, while the RMWS method achieves the best results for NB classifiers. These results show that the proposed methods significantly improve the text classification performance.<\/jats:p>","DOI":"10.3390\/sym17010090","type":"journal-article","created":{"date-parts":[[2025,1,9]],"date-time":"2025-01-09T07:59:42Z","timestamp":1736409582000},"page":"90","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":2,"title":["Effective Text Classification Through Supervised Rough Set-Based Term Weighting"],"prefix":"10.3390","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7820-413X","authenticated-orcid":false,"given":"Ras\u0131m","family":"\u00c7ekik","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Faculty of Engineering, \u015e\u0131rnak University, 73000 \u015e\u0131rnak, Turkey"}]}],"member":"1968","published-online":{"date-parts":[[2025,1,9]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"546","DOI":"10.1109\/IJCNN.2005.1555890","article-title":"A Comparative Study on Term Weighting Schemes for Text Categorization","volume":"Volume 1","author":"Lan","year":"2005","journal-title":"Proceedings of the 2005 IEEE International Joint Conference on Neural Networks"},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"e6909","DOI":"10.1002\/cpe.6909","article-title":"A New Metric for Feature Selection on Short Text 
Datasets","volume":"34","author":"Cekik","year":"2022","journal-title":"Concurr. Comput."},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1177\/0165551521991037","article-title":"A Novel Filter Feature Selection Method for Text Classification: Extensive Feature Selector","volume":"49","author":"Parlak","year":"2023","journal-title":"J. Inf. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"113691","DOI":"10.1016\/j.eswa.2020.113691","article-title":"A Novel Filter Feature Selection Method Using Rough Set for Short Text Data","volume":"160","author":"Cekik","year":"2020","journal-title":"Expert Syst. Appl."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Manning, C.D. (2008). Introduction to Information Retrieval, Syngress Publishing.","DOI":"10.1017\/CBO9780511809071"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Debole, F., and Sebastiani, F. (2003, January 9\u201312). Supervised Term Weighting for Automated Text Categorization. Proceedings of the 2003 ACM Symposium on Applied Computing, Melbourne, FL, USA.","DOI":"10.1145\/952532.952688"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Emmanuel, M., Khatri, S.M., and Babu, D.R.R. (2013, January 13\u201316). A Novel Scheme for Term Weighting in Text Categorization: Positive Impact Factor. Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK.","DOI":"10.1109\/SMC.2013.392"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"109","DOI":"10.1016\/j.ins.2013.02.029","article-title":"Class-Indexing-Based Term Weighting for Automatic Text Classification","volume":"236","author":"Ren","year":"2013","journal-title":"Inf. Sci."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1016\/j.ins.2019.09.013","article-title":"YAKE! 
Keyword Extraction from Single Documents Using Multiple Local Features","volume":"509","author":"Campos","year":"2020","journal-title":"Inf. Sci."},{"key":"ref_10","unstructured":"Do\u011fan, T. (2019). Metin S\u0131n\u0131fland\u0131rma I\u00e7in Terim A\u011f\u0131rl\u0131kland\u0131rma. [Ph.D. Thesis, Eski\u015fehir Technical University]."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1016\/j.eswa.2007.10.042","article-title":"Imbalanced Text Classification: A Term Weighting Approach","volume":"36","author":"Liu","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1108\/00220410410560573","article-title":"A Statistical Interpretation of Term Specificity and Its Application in Retrieval","volume":"60","year":"2004","journal-title":"J. Doc."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1109\/TPAMI.2008.110","article-title":"Supervised and Traditional Term Weighting Methods for Automatic Text Categorization","volume":"31","author":"Lan","year":"2008","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"2553","DOI":"10.1002\/asi.23338","article-title":"A New Term-weighting Scheme for Text Classification Using the Odds of Positive and Negative Class Probabilities","volume":"66","author":"Ko","year":"2015","journal-title":"J. Assoc. Inf. Sci. Technol."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.eswa.2016.09.009","article-title":"Turning from TF-IDF to TF-IGM for Term Weighting in Text Classification","volume":"66","author":"Chen","year":"2016","journal-title":"Expert Syst. 
Appl."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"45","DOI":"10.1016\/j.eswa.2019.04.015","article-title":"Improved Inverse Gravity Moment Term Weighting for Text Classification","volume":"130","author":"Dogan","year":"2019","journal-title":"Expert Syst. Appl."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"119578","DOI":"10.1016\/j.eswa.2023.119578","article-title":"TF-IGM Revisited: Imbalance Text Classification with Relative Imbalance Ratio","volume":"217","author":"Okkalioglu","year":"2023","journal-title":"Expert Syst. Appl."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"9761","DOI":"10.1007\/s11042-022-12538-3","article-title":"The Importance of Term Weighting in Semantic Understanding of Text: A Review of Techniques","volume":"82","author":"Rathi","year":"2023","journal-title":"Multimed. Tools Appl."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Dai, Z., and Callan, J. (2020, January 25\u201330). Context-Aware Term Weighting for First Stage Passage Retrieval. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual.","DOI":"10.1145\/3397271.3401204"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching Word Vectors with Subword Information","volume":"5","author":"Bojanowski","year":"2017","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_21","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018). Deep Contextualized Word Representations. arXiv.","DOI":"10.18653\/v1\/N18-1202"},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"1857","DOI":"10.1016\/j.eswa.2014.09.011","article-title":"Chinese Comments Sentiment Classification Based on Word2vec and SVMperf","volume":"42","author":"Zhang","year":"2015","journal-title":"Expert Syst. 
Appl."},{"key":"ref_23","unstructured":"Yang, K., Cai, Y., Chen, Z., Leung, H., and Lau, R. (2016, January 11\u201316). Exploring Topic Discriminating Power of Words in Latent Dirichlet Allocation. Proceedings of the COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1080\/019697298125470","article-title":"Rough Set Theory and Its Applications to Data Analysis","volume":"29","author":"Pawlak","year":"1998","journal-title":"Cybern. Syst."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"1881","DOI":"10.1007\/s00500-016-2443-0","article-title":"A New Classification Method Based on Rough Sets Theory","volume":"22","author":"Cekik","year":"2018","journal-title":"Soft Comput."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"134","DOI":"10.1016\/j.ins.2016.05.025","article-title":"Cost-Sensitive Feature Selection Based on Adaptive Neighborhood Granularity with Multi-Level Confidence","volume":"366","author":"Zhao","year":"2016","journal-title":"Inf. Sci."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1567","DOI":"10.1007\/s00500-014-1307-8","article-title":"Self-Adjusting Harmony Search-Based Feature Selection","volume":"19","author":"Zheng","year":"2015","journal-title":"Soft Comput."},{"key":"ref_28","unstructured":"Asuncion, A., and Newman, D. (2024, September 25). UCI Machine Learning Repository 2007. Available online: https:\/\/archive.ics.uci.edu\/dataset\/137\/reuters+21578+text+categorization+collection."},{"key":"ref_29","unstructured":"Metsis, V., Androutsopoulos, I., and Paliouras, G. (2006, January 27\u201328). Spam Filtering with Naive Bayes-Which Naive Bayes?. 
Proceedings of the CEAS, Mountain View, CA, USA."},{"key":"ref_30","doi-asserted-by":"crossref","first-page":"472","DOI":"10.54287\/gujsa.1379024","article-title":"A New Feature Selection Metric Based on Rough Sets and Information Gain in Text Classification","volume":"10","author":"Mahmut","year":"2023","journal-title":"Gazi Univ. J. Sci. Part A Eng. Innov."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.knosys.2012.06.005","article-title":"A Novel Probabilistic Feature Selection Method for Text Classification","volume":"36","author":"Uysal","year":"2012","journal-title":"Knowl. Based Syst."}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/1\/90\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T10:25:16Z","timestamp":1759919116000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/17\/1\/90"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,9]]},"references-count":31,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,1]]}},"alternative-id":["sym17010090"],"URL":"https:\/\/doi.org\/10.3390\/sym17010090","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,9]]}}}