{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T14:05:07Z","timestamp":1773842707969,"version":"3.50.1"},"reference-count":44,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2025,4,29]],"date-time":"2025-04-29T00:00:00Z","timestamp":1745884800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Computers"],"abstract":"<jats:p>This study conducts an in-depth investigation of the performance of six transformer models using 12 different datasets\u201410 with three classes and two with two classes\u2014on sentiment classification. We use these six models and generate all combinations of triple schema ensembles, Majority and Soft vote. In total, we compare 46 classifiers on each dataset and see in one case up to a 7.6% increase in accuracy on a dataset with three classes from an ensemble scheme and, in a second case, up to 8.5% increase in accuracy on a dataset with two classes. Our study contributes to the field of natural language processing by exploring the reasons for the predominance, in this particular task, of Majority vote over Soft vote. 
The conclusions are drawn after a thorough investigation of the classifiers that are co-compared with each other through reliability charts, analyses of the confidence the models have in their predictions and their metrics, concluding with statistical analyses using the Friedman test and the Nemenyi post-hoc test with useful conclusions.<\/jats:p>","DOI":"10.3390\/computers14050167","type":"journal-article","created":{"date-parts":[[2025,4,30]],"date-time":"2025-04-30T05:05:57Z","timestamp":1745989557000},"page":"167","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":5,"title":["From Transformers to Voting Ensembles for Interpretable Sentiment Classification: A Comprehensive Comparison"],"prefix":"10.3390","volume":"14","author":[{"ORCID":"https:\/\/orcid.org\/0009-0005-9986-9236","authenticated-orcid":false,"given":"Konstantinos","family":"Kyritsis","sequence":"first","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Peloponnese, 26334 Patras, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Charalampos M.","family":"Liapis","sequence":"additional","affiliation":[{"name":"Department of Mathematics, University of Patras, 26504 Patras, Greece"},{"name":"Computer Technology Institute & Press \u201cDiophantus\u201d, 26504 Patras, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6581-4676","authenticated-orcid":false,"given":"Isidoros","family":"Perikos","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Peloponnese, 26334 Patras, Greece"},{"name":"Computer Technology Institute & Press \u201cDiophantus\u201d, 26504 Patras, Greece"},{"name":"Department of Computer Engineering & Informatics, University of Patras, 26504 Patras, 
Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Michael","family":"Paraskevas","sequence":"additional","affiliation":[{"name":"Department of Electrical & Computer Engineering, University of Peloponnese, 26334 Patras, Greece"},{"name":"Computer Technology Institute & Press \u201cDiophantus\u201d, 26504 Patras, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Vaggelis","family":"Kapoulas","sequence":"additional","affiliation":[{"name":"Computer Technology Institute & Press \u201cDiophantus\u201d, 26504 Patras, Greece"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"1968","published-online":{"date-parts":[[2025,4,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Liapis, C.M., Kyritsis, K., Perikos, I., Spatiotis, N., and Paraskevas, M. (2024). A Hybrid Ensemble Approach for Greek Text Classification Based on Multilingual Models. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8100137"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, J.-H., Norouzi, M., and Tsai, S.M. (2024). Augmenting Multimodal Content Representation with Transformers for Misinformation Detection. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8100134"},{"key":"ref_3","first-page":"21","article-title":"Multi-Class Sentiment Classification on Bengali Social Media Comments Using Machine Learning","volume":"4","author":"Haque","year":"2023","journal-title":"Int. J. Cogn. Comput. Eng."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"e13701","DOI":"10.1111\/exsy.13701","article-title":"Exploring Transformer Models for Sentiment Classification: A Comparison of BERT, RoBERTa, ALBERT, DistilBERT, and XLNet","volume":"41","author":"Areshey","year":"2024","journal-title":"Expert Syst."},{"key":"ref_5","first-page":"40","article-title":"Comparative Analysis of Transformer Based Pre-Trained NLP Models","volume":"8","author":"Singla","year":"2020","journal-title":"Int. J. Comput. Sci. 
Eng."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Michailidis, P.D. (2024). A Comparative Study of Sentiment Classification Models for Greek Reviews. Big Data Cogn. Comput., 8.","DOI":"10.3390\/bdcc8090107"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Ashbaugh, L., and Zhang, Y. (2024). A Comparative Study of Sentiment Analysis on Customer Reviews Using Machine Learning and Deep Learning. Computers, 13.","DOI":"10.20944\/preprints202411.0741.v1"},{"key":"ref_8","doi-asserted-by":"crossref","unstructured":"Alotaibi, A., and Nadeem, F. (2024). Leveraging Social Media and Deep Learning for Sentiment Analysis for Smart Governance: A Case Study of Public Reactions to Educational Reforms in Saudi Arabia. Computers, 13.","DOI":"10.3390\/computers13110280"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Ali, H., Hashmi, E., Yayilgan Yildirim, S., and Shaikh, S. (2024). Analyzing Amazon Products Sentiment: A Comparative Study of Machine and Deep Learning, and Transformer-Based Techniques. Electronics, 13.","DOI":"10.3390\/electronics13071305"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Radecki, A., and Rybicki, T. (2024). Comparison of Sentiment Analysis Methods Used to Investigate the Quality of Teaching Aids Based on Virtual Simulators of Embedded Systems. Electronics, 13.","DOI":"10.3390\/electronics13101811"},{"key":"ref_11","doi-asserted-by":"crossref","unstructured":"Cui, S., Han, Y., Duan, Y., Li, Y., Zhu, S., and Song, C. (2023). A Two-Stage Voting-Boosting Technique for Ensemble Learning in Social Network Sentiment Classification. Entropy, 25.","DOI":"10.3390\/e25040555"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Wan, Y., and Gao, Q. (2015, January 14\u201317). An Ensemble Sentiment Classification System of Twitter Data for Airline Services Analysis. 
Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA.","DOI":"10.1109\/ICDMW.2015.7"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., and Funtowicz, M. (2020). HuggingFace\u2019s Transformers: State-of-the-Art Natural Language Processing. arXiv.","DOI":"10.18653\/v1\/2020.emnlp-demos.6"},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Howard, J., and Ruder, S. (2018). Universal Language Model Fine-Tuning for Text Classification. arXiv.","DOI":"10.18653\/v1\/P18-1031"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2019). BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension. arXiv.","DOI":"10.18653\/v1\/2020.acl-main.703"},{"key":"ref_16","unstructured":"He, P., Liu, X., Gao, J., and Chen, W. (2021). DeBERTa: Decoding-Enhanced BERT with Disentangled Attention. arXiv."},{"key":"ref_17","unstructured":"He, P., Gao, J., and Chen, W. (2023). DeBERTaV3: Improving DeBERTa Using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing. arXiv."},{"key":"ref_18","first-page":"9","article-title":"Language Models Are Unsupervised Multitask Learners","volume":"1","author":"Radford","year":"2019","journal-title":"OpenAI Blog"},{"key":"ref_19","unstructured":"Biderman, S., Schoelkopf, H., Anthony, Q., Bradley, H., O\u2019Brien, K., Hallahan, E., Khan, M.A., Purohit, S., Prashanth, U.S., and Raff, E. (2023). Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. arXiv."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Black, S., Biderman, S., Hallahan, E., Anthony, Q., Gao, L., Golding, L., He, H., Leahy, C., McDonell, K., and Phang, J. (2022). 
GPT-NeoX-20B: An Open-Source Autoregressive Language Model. arXiv.","DOI":"10.18653\/v1\/2022.bigscience-1.9"},{"key":"ref_21","unstructured":"Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P.J. (2023). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv."},{"key":"ref_22","unstructured":"Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. (2020). ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv."},{"key":"ref_23","unstructured":"(2025, February 01). Causal Language Modeling. Available online: https:\/\/huggingface.co\/docs\/transformers\/tasks\/language_modeling."},{"key":"ref_24","unstructured":"Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"427","DOI":"10.1016\/j.ipm.2009.03.002","article-title":"A Systematic Analysis of Performance Measures for Classification Tasks","volume":"45","author":"Sokolova","year":"2009","journal-title":"Inf. Process. Manag."},{"key":"ref_26","unstructured":"Baeza-Yates, R., and Ribeiro-Neto, B. (1999). Modern Information Retrieval, Association for Computing Machinery."},{"key":"ref_27","first-page":"1","article-title":"The Truth of the F-Measure","volume":"1","author":"Sasaki","year":"2007","journal-title":"Teach Tutor Mater"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Chicco, D., and Jurman, G. (2020). The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation. 
BMC Genom., 21.","DOI":"10.1186\/s12864-019-6413-7"},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"37","DOI":"10.1177\/001316446002000104","article-title":"A Coefficient of Agreement for Nominal Scales","volume":"20","author":"Cohen","year":"1960","journal-title":"Educ. Psychol. Meas."},{"key":"ref_30","unstructured":"Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017). Simple and Scalable Predictive Uncertainty Estimation Using Deep Ensembles. arXiv."},{"key":"ref_31","unstructured":"Guo, C., Pleiss, G., Sun, Y., and Weinberger, K.Q. (2017). On Calibration of Modern Neural Networks. arXiv."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1109\/34.667881","article-title":"On Combining Classifiers","volume":"20","author":"Kittler","year":"1998","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Dietterich, T.G. (2000, January 21\u201323). Ensemble Methods in Machine Learning. Proceedings of the Multiple Classifier Systems, Cagliari, Italy.","DOI":"10.1007\/3-540-45014-9_1"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1613\/jair.614","article-title":"Popular Ensemble Methods: An Empirical Study","volume":"11","author":"Maclin","year":"1999","journal-title":"J. Artif. Intell. Res."},{"key":"ref_35","unstructured":"Gupta, N., Smith, J., Adlam, B., and Mariet, Z. (2022). Ensembling over Classifiers: A Bias-Variance Perspective. arXiv."},{"key":"ref_36","first-page":"51","article-title":"A Hybrid Approach to Credit Card Fraud Detection: Integrating Adaboost and Majority Voting for Enhanced Accuracy and Robustness","volume":"2","author":"Kumar","year":"2024","journal-title":"Front. Collab. Res."},{"key":"ref_37","doi-asserted-by":"crossref","unstructured":"Cheng, J., Huang, L., Tang, B., Wu, Q., Wang, M., and Zhang, Z. (2025). 
A Minority Sample Enhanced Sampler for Crop Classification in Unmanned Aerial Vehicle Remote Sensing Images with Class Imbalance. Agriculture, 15.","DOI":"10.3390\/agriculture15040388"},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1007\/s10462-016-9518-2","article-title":"The Robustness of Majority Voting Compared to Filtering Misclassified Instances in Supervised Classification Tasks","volume":"49","author":"Smith","year":"2018","journal-title":"Artif. Intell. Rev."},{"key":"ref_39","unstructured":"Khurana, U., Nalisnick, E., Fokkens, A., and Swayamdipta, S. (2024). Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks? arXiv."},{"key":"ref_40","doi-asserted-by":"crossref","unstructured":"Taha, A. (2021). Intelligent Ensemble Learning Approach for Phishing Website Detection Based on Weighted Soft Voting. Mathematics, 9.","DOI":"10.3390\/math9212799"},{"key":"ref_41","unstructured":"DeVries, T., and Taylor, G.W. (2018). Learning Confidence for Out-of-Distribution Detection in Neural Networks. arXiv."},{"key":"ref_42","unstructured":"Minderer, M., Djolonga, J., Romijnders, R., Hubis, F., Zhai, X., Houlsby, N., Tran, D., and Lucic, M. (2021). Revisiting the Calibration of Modern Neural Networks. arXiv."},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Niculescu-Mizil, A., and Caruana, R. (2005, January 7\u201311). Predicting Good Probabilities with Supervised Learning. Proceedings of the 22nd International Conference on Machine Learning-ICML \u201905, Bonn, Germany.","DOI":"10.1145\/1102351.1102430"},{"key":"ref_44","first-page":"1","article-title":"Statistical Comparisons of Classifiers over Multiple Data Sets","volume":"7","year":"2006","journal-title":"J. Mach. Learn. 
Res."}],"container-title":["Computers"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/5\/167\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,9]],"date-time":"2025-10-09T17:24:32Z","timestamp":1760030672000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-431X\/14\/5\/167"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,4,29]]},"references-count":44,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2025,5]]}},"alternative-id":["computers14050167"],"URL":"https:\/\/doi.org\/10.3390\/computers14050167","relation":{},"ISSN":["2073-431X"],"issn-type":[{"value":"2073-431X","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,4,29]]}}}