{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,24]],"date-time":"2026-03-24T16:17:32Z","timestamp":1774369052342,"version":"3.50.1"},"reference-count":37,"publisher":"MDPI AG","issue":"8","license":[{"start":{"date-parts":[[2022,7,30]],"date-time":"2022-07-30T00:00:00Z","timestamp":1659139200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000038","name":"NSERC Discovery Grant","doi-asserted-by":"publisher","award":["194376"],"award-info":[{"award-number":["194376"]}],"id":[{"id":"10.13039\/501100000038","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Text classification aims to assign labels to textual units such as documents, sentences and paragraphs. Some applications of text classification include sentiment classification and news categorization. In this paper, we present a soft computing technique-based algorithm (TSC) to classify sentiment polarities of tweets as well as news categories from text. The TSC algorithm is a supervised learning method based on tolerance near sets. Near sets theory is a more recent soft computing methodology inspired by rough sets where instead of set approximation operators used by rough sets to induce tolerance classes, the tolerance classes are directly induced from the feature vectors using a tolerance level parameter and a distance function. The proposed TSC algorithm takes advantage of the recent advances in efficient feature extraction and vector generation from pre-trained bidirectional transformer encoders for creating tolerance classes. Experiments were performed on ten well-researched datasets which include both short and long text. Both pre-trained SBERT and TF-IDF vectors were used in the experimental analysis. 
Results from transformer-based vectors demonstrate that TSC outperforms five well-known machine learning algorithms on four datasets, and it is comparable with all other datasets based on the weighted F1, Precision and Recall scores. The highest AUC-ROC (Area under the Receiver Operating Characteristics) score was obtained in two datasets and comparable in six other datasets. The highest ROC-PRC (Area under the Precision\u2013Recall Curve) score was obtained in one dataset and comparable in four other datasets. Additionally, significant differences were observed in most comparisons when examining the statistical difference between the weighted F1-score of TSC and other classifiers using a Wilcoxon signed-ranks test.<\/jats:p>","DOI":"10.3390\/a15080267","type":"journal-article","created":{"date-parts":[[2022,7,31]],"date-time":"2022-07-31T21:49:02Z","timestamp":1659304142000},"page":"267","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":7,"title":["Short Text Classification with Tolerance-Based Soft Computing Method"],"prefix":"10.3390","volume":"15","author":[{"given":"Vrushang","family":"Patel","sequence":"first","affiliation":[{"name":"Deloitte Inc., Calgary, AB T2P 0R8, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4169-6115","authenticated-orcid":false,"given":"Sheela","family":"Ramanna","sequence":"additional","affiliation":[{"name":"Department of Applied Computer Science, University of Winnipeg, Winnipeg, MB R3B 2E9, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2653-3780","authenticated-orcid":false,"given":"Ketan","family":"Kotecha","sequence":"additional","affiliation":[{"name":"Symbiosis Institute of Technology (SIT), Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis International (Deemed University), Pune 412115, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1745-5231","authenticated-orcid":false,"given":"Rahee","family":"Walambe","sequence":"additional","affiliation":[{"name":"Symbiosis Institute of Technology (SIT), Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis International (Deemed University), Pune 412115, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2022,7,30]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1109\/MCI.2018.2840738","article-title":"Recent trends in deep learning based natural language processing","volume":"13","author":"Young","year":"2018","journal-title":"IEEE Comput. Intell. Mag."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Pang, B., and Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends (R) in Information Retrieval, Now Publishers.","DOI":"10.1561\/1500000011"},{"key":"ref_3","unstructured":"Bollen, J., and Pepe, A. (2011, January 17\u201321). Modeling public mood and emotion: Twitter sentiment and socioeconomic phenomena. Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM\u201911), Barcelona, Spain."},{"key":"ref_4","first-page":"46","article-title":"Like It or Not: A Survey of Twitter Sentiment Analysis Methods","volume":"49","author":"Giachanou","year":"2016","journal-title":"ACM Comput. Surv."},{"key":"ref_5","unstructured":"Kouloumpis, E., Wilson, T., and Moore, J. (2011, January 17\u201321). Twitter sentiment analysis: The good the bad and the omg!. Proceedings of the International AAAI Conference on Web and Social Media, Barcelona, Spain."},{"key":"ref_6","unstructured":"Saif, H., He, Y., and Alani, H. (2012, January 16). Alleviating data sparsity for twitter sentiment analysis. 
Proceedings of the 2nd Workshop on Making Sense of Microposts: Big Things Come in Small Packages at the 21st International Conference on the World Wide Web (WWW\u201912), CEUR Workshop Proceedings (CEUR-WS.org), Lyon, France."},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Saif, H., Fernandez, M., He, Y., and Alani, H. (2014, January 26\u201331). On stopwords, filtering and data sparsity for sentiment analysis of twitter. Proceedings of the LREC 2014, Ninth International Conference on Language Resources and Evaluation, Reykjavik, Iceland.","DOI":"10.1007\/978-3-319-11915-1_21"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Terrana, D., Augello, A., and Pilato, G. (2014, January 16\u201318). Automatic unsupervised polarity detection on a twitter data stream. Proceedings of the 2014 IEEE International Conference on Semantic Computing, Newport Beach, CA, USA.","DOI":"10.1109\/ICSC.2014.17"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Pang, B., Lee, L., and Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. arXiv.","DOI":"10.3115\/1118693.1118704"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Turney, P.D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv.","DOI":"10.3115\/1073083.1073153"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2337542.2337551","article-title":"Twitter, MySpace, Digg: Unsupervised Sentiment Analysis in Social Media","volume":"3","author":"Paltoglou","year":"2012","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1177\/0165551515593686","article-title":"Combining resources to improve unsupervised sentiment analysis at aspect-level","volume":"42","year":"2016","journal-title":"J. Inf. 
Sci."},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Hutto, C., and Gilbert, E. (2014, January 1\u20134). Vader: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA.","DOI":"10.1609\/icwsm.v8i1.14550"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Dong, L., Wei, F., Tan, C., Tang, D., Zhou, M., and Xu, K. (2014, January 22\u201327). Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-2009"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., and Qin, B. (2014, January 23\u201325). Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014\u2014Proceedings of the Conference, Baltimore, MD, USA.","DOI":"10.3115\/v1\/P14-1146"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Yenter, A., and Verma, A. (2017, January 19\u201321). Deep CNN-LSTM with combined kernels from multiple branches for IMDb review sentiment analysis. Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA.","DOI":"10.1109\/UEMCON.2017.8249013"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"23253","DOI":"10.1109\/ACCESS.2017.2776930","article-title":"Deep Convolution Neural Networks for Twitter Sentiment Analysis","volume":"6","author":"Jianqiang","year":"2018","journal-title":"IEEE Access"},{"key":"ref_18","first-page":"2609","article-title":"Near sets. General theory about nearness of objects","volume":"1","author":"Peters","year":"2007","journal-title":"Appl. Math. 
Sci."},{"key":"ref_19","first-page":"407","article-title":"Near sets. Special theory about nearness of objects","volume":"75","author":"Peters","year":"2007","journal-title":"Fundam. Inform."},{"key":"ref_20","unstructured":"Pawlak, Z. (1992). Rough Sets: Theoretical Aspects of Reasoning About Data, Springer Science and Business Media."},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"199","DOI":"10.1002\/int.10016","article-title":"Nonhierarchical document clustering based on a tolerance rough set model","volume":"17","author":"Ho","year":"2002","journal-title":"Int. J. Intell. Syst."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1016\/j.knosys.2014.07.012","article-title":"Solar flare detection system based on tolerance near sets in a GPU-CUDA framework","volume":"70","author":"Poli","year":"2014","journal-title":"Knowl.-Based Syst. J."},{"key":"ref_23","unstructured":"Patel, V. (2021). Short Text Classification with Tolerance Near Sets. [Master\u2019s Thesis, University of Winnipeg]."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"143","DOI":"10.3233\/FI-2010-281","article-title":"Perception and classification. A Note on Near sets and Rough sets","volume":"101","author":"Wolski","year":"2010","journal-title":"Fundam. Inform."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Patel, V., and Ramanna, S. (2021). Tolerance-based short text Sentiment Classifier. International Joint Conference on Rough Sets, Lecture Notes in Artificial Intelligence, Springer.","DOI":"10.1007\/978-3-030-87334-9_22"},{"key":"ref_26","unstructured":"Devlin, J., Chang, M.W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Reimers, N., and Gurevych, I. (2019, January 3\u20137). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Hong Kong, China.","DOI":"10.18653\/v1\/D19-1410"},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"e19273","DOI":"10.2196\/19273","article-title":"Tracking social media discourse about the covid-19 pandemic: Development of a public coronavirus twitter data set","volume":"6","author":"Chen","year":"2020","journal-title":"JMIR Public Health Surveill."},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Lang, K. (1995). Newsweeder: Learning to filter netnews. Machine Learning Proceedings 1995, Elsevier.","DOI":"10.1016\/B978-1-55860-377-6.50048-7"},{"key":"ref_30","doi-asserted-by":"crossref","unstructured":"Bowman, S.R., Angeli, G., Potts, C., and Manning, C.D. (2015, January 17\u201321). A large annotated corpus for learning natural language inference. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal.","DOI":"10.18653\/v1\/D15-1075"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Williams, A., Nangia, N., and Bowman, S. (2018). A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), Association for Computational Linguistics.","DOI":"10.18653\/v1\/N18-1101"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Conneau, A., Kiela, D., Schwenk, H., Barrault, L., and Bordes, A. (2017, January 7\u201311). Supervised Learning of Universal Sentence Representations from Natural Language Inference Data. 
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark.","DOI":"10.18653\/v1\/D17-1070"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Cer, D., Yang, Y., Kong, S.Y., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-C\u00e9spedes, M., Yuan, S., and Tar, C. (2018). Universal sentence encoder. arXiv.","DOI":"10.18653\/v1\/D18-2029"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random Forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach. Learn."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: A library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_36","unstructured":"Bottou, L., and Bousquet, O. (2007). The Tradeoffs of Large Scale Learning, MIT Press."},{"key":"ref_37","unstructured":"Tianqi, C., and Carlos, G. (2016, January 13\u201317). XGBoost. 
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/8\/267\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T00:00:12Z","timestamp":1760140812000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/15\/8\/267"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,30]]},"references-count":37,"journal-issue":{"issue":"8","published-online":{"date-parts":[[2022,8]]}},"alternative-id":["a15080267"],"URL":"https:\/\/doi.org\/10.3390\/a15080267","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,30]]}}}