{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,14]],"date-time":"2026-04-14T16:03:36Z","timestamp":1776182616779,"version":"3.50.1"},"reference-count":137,"publisher":"MDPI AG","issue":"5","license":[{"start":{"date-parts":[[2023,4,29]],"date-time":"2023-04-29T00:00:00Z","timestamp":1682726400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"<jats:p>Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, which include spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Widely used machine-learning-based research differs in terms of the datasets, training methods, performance evaluation, and comparison methods used. In this paper, we surveyed 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews (PRISMA) statement is used as the guidelines for the systematic review process. The comprehensive differences in the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification.<\/jats:p>","DOI":"10.3390\/a16050236","type":"journal-article","created":{"date-parts":[[2023,5,1]],"date-time":"2023-05-01T12:10:03Z","timestamp":1682943003000},"page":"236","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":70,"title":["Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2531-1326","authenticated-orcid":false,"given":"Ashokkumar","family":"Palanivinayagam","sequence":"first","affiliation":[{"name":"Sri Ramachandra Faculty of Engineering and Technology, Sri Ramachandra Institute of Higher Education and Research, Chennai 600116, India"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8268-8878","authenticated-orcid":false,"given":"Claude Ziad","family":"El-Bayeh","sequence":"additional","affiliation":[{"name":"Department of Electrical Engineering, Bayeh Institute, Amchit 4307, Lebanon"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9990-1084","authenticated-orcid":false,"given":"Robertas","family":"Dama\u0161evi\u010dius","sequence":"additional","affiliation":[{"name":"Department of Software Engineering, Kaunas University of Technology, 44249 Kaunas, Lithuania"}]}],"member":"1968","published-online":{"date-parts":[[2023,4,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/505282.505283","article-title":"Machine Learning in Automated Text Categorization","volume":"34","author":"Sebastiani","year":"2002","journal-title":"ACM Comput. Surv."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., and Brown, D. (2019). Text Classification Algorithms: A Survey. Information, 10.","DOI":"10.3390\/info10040150"},{"key":"ref_3","doi-asserted-by":"crossref","unstructured":"Kapo\u010diute-Dzikiene, J. (2020). A domain-specific generative chatbot trained from little data. Appl. Sci., 10.","DOI":"10.3390\/app10072221"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"1154","DOI":"10.1109\/TCSS.2021.3120138","article-title":"Real-Time Text Classification of User-Generated Content on Social Media: Systematic Review","volume":"9","author":"Rogers","year":"2022","journal-title":"IEEE Trans. Comput. Soc. Syst."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"409","DOI":"10.5755\/j01.itc.51.3.30276","article-title":"BERT-based Transfer Learning Model for COVID-19 Sentiment Analysis on Turkish Instagram Comments","volume":"51","author":"Karayigit","year":"2022","journal-title":"Inf. Technol. Control"},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Kapo\u010di\u016bt\u0117-Dzikien\u0117, J., Dama\u0161evi\u010dius, R., and Wo\u017aniak, M. (2019). Sentiment analysis of Lithuanian texts using traditional and deep learning approaches. Computers, 8.","DOI":"10.3390\/computers8010004"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Tesfagergish, S.G., Kapo\u010di\u016bt\u0117-Dzikien\u0117, J., and Dama\u0161evi\u010dius, R. (2022). Zero-Shot Emotion Detection for Semi-Supervised Sentiment Analysis Using Sentence Transformers and Ensemble Learning. Appl. Sci., 12.","DOI":"10.3390\/app12178662"},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"356","DOI":"10.5755\/j01.itc.51.2.29988","article-title":"Homophobic and Hate Speech Detection Using Multilingual-BERT Model on Turkish Social Media","volume":"51","author":"Karayigit","year":"2022","journal-title":"Inf. Technol. Control"},{"key":"ref_9","doi-asserted-by":"crossref","unstructured":"Aldjanabi, W., Dahou, A., Al-Qaness, M.A.A., Elaziz, M.A., Helmi, A.M., and Dama\u0161evi\u010dius, R. (2021). Arabic offensive and hate speech detection using a cross-corpora multi-task learning model. Informatics, 8.","DOI":"10.3390\/informatics8040069"},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Kapociute-Dzikiene, J., Venckauskas, A., and Damasevicius, R. (2017, January 3\u20136). A comparison of authorship attribution approaches applied on the Lithuanian language. Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, FedCSIS 2017, Prague, Czech Republic.","DOI":"10.15439\/2017F110"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"124","DOI":"10.14445\/22315381\/IJETT-V70I10P214","article-title":"Text Based and Image Based Recommender Systems: Fundamental Concepts, Comprehensive Review and Future Directions","volume":"70","author":"Mathews","year":"2022","journal-title":"Int. J. Eng. Trends Technol."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"40416","DOI":"10.1109\/ACCESS.2019.2897586","article-title":"Recommendation Based on Review Texts and Social Communities: A Hybrid Model","volume":"7","author":"Ji","year":"2019","journal-title":"IEEE Access"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"458","DOI":"10.5755\/j01.itc.50.3.28047","article-title":"Automatic text summarization using deep reinforcement learning and beyond","volume":"50","author":"Sun","year":"2021","journal-title":"Inf. Technol. Control"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"345","DOI":"10.5755\/j01.itc.51.2.30796","article-title":"GATSum: Graph-Based Topic-Aware Abstract Text Summarization","volume":"51","author":"Jiang","year":"2022","journal-title":"Inf. Technol. Control"},{"key":"ref_15","first-page":"411","article-title":"Development of proposed ensemble model for spam e-mail classification","volume":"50","author":"Shrivas","year":"2021","journal-title":"Inf. Technol. Control."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"65703","DOI":"10.1109\/ACCESS.2022.3183083","article-title":"A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques","volume":"10","author":"Salloum","year":"2022","journal-title":"IEEE Access"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kapo\u010di\u016bt\u0117-Dzikien\u0117, J., Balodis, K., and Skadi\u0146\u0161, R. (2020). Intent detection problem solving via automatic DNN hyperparameter optimization. Appl. Sci., 10.","DOI":"10.3390\/app10217426"},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"375","DOI":"10.5755\/j01.itc.50.2.25470","article-title":"Big data full-text search index minimization using text summarization","volume":"50","author":"Iqbal","year":"2021","journal-title":"Inf. Technol. Control"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"1883698","DOI":"10.1155\/2022\/1883698","article-title":"A Complete Process of Text Classification System Using State-of-the-Art NLP Models","volume":"2022","author":"Dogra","year":"2022","journal-title":"Comput. Intell. Neurosci."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"526","DOI":"10.1016\/j.compeleceng.2018.05.004","article-title":"Intelligent optimal route recommendation among heterogeneous objects with keywords","volume":"68","author":"Ashokkumar","year":"2018","journal-title":"Comput. Electr. Eng."},{"key":"ref_21","first-page":"21","article-title":"Multi-class sentiment classification on Bengali social media comments using machine learning","volume":"4","author":"Haque","year":"2023","journal-title":"Int. J. Cogn. Comput. Eng."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"39","DOI":"10.1186\/s40854-020-00205-1","article-title":"Comprehensive review of text-mining applications in finance","volume":"6","author":"Gupta","year":"2020","journal-title":"Financ. Innov."},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Li, Q., Li, S., Zhang, S., Hu, J., and Hu, J. (2019). A review of text corpus-based tourism big data mining. Appl. Sci., 9.","DOI":"10.3390\/app9163300"},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"8839524","DOI":"10.1155\/2020\/8839524","article-title":"Text messaging-based medical diagnosis using natural language processing and fuzzy logic","volume":"2020","author":"Omoregbe","year":"2020","journal-title":"J. Healthc. Eng."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Tesfagergish, S.G., Dama\u0161evi\u010dius, R., and Kapo\u010di\u016bt\u0117-Dzikien\u0117, J. (2021). Deep Fake Recognition in Tweets Using Text Augmentation, Word Embeddings and Deep Learning, Springer.","DOI":"10.1007\/978-3-030-86979-3_37"},{"key":"ref_26","first-page":"117","article-title":"Text Classification Techniques: A Literature Review","volume":"13","author":"Thangaraj","year":"2018","journal-title":"Interdiscip. J. Inf. Knowl. Manag."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3439726","article-title":"Deep Learning\u2013based Text Classification","volume":"54","author":"Minaee","year":"2021","journal-title":"ACM Comput. Surv."},{"key":"ref_28","first-page":"3544558","article-title":"A Survey on Data Augmentation for Text Classification","volume":"55","author":"Bayer","year":"2022","journal-title":"ACM Comput. Surv."},{"key":"ref_29","first-page":"1","article-title":"A Survey on Text Classification: From Traditional to Deep Learning","volume":"13","author":"Li","year":"2022","journal-title":"ACM Trans. Intell. Syst. Technol."},{"key":"ref_30","first-page":"1309","article-title":"Review of text classification methods on deep learning","volume":"63","author":"Wu","year":"2020","journal-title":"Comput. Mater. Contin."},{"key":"ref_31","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1016\/j.eswa.2018.03.058","article-title":"A recent overview of the state-of-the-art elements of text classification","volume":"106","author":"Protasiewicz","year":"2018","journal-title":"Expert Syst. Appl."},{"key":"ref_32","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/2046-4053-4-1","article-title":"Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015 statement","volume":"4","author":"Moher","year":"2015","journal-title":"Syst. Rev."},{"key":"ref_33","doi-asserted-by":"crossref","first-page":"1264","DOI":"10.1109\/TKDE.2008.76","article-title":"Text Document Preprocessing with the Bayes Formula for Classification Using the Support Vector Machine","volume":"20","author":"Isa","year":"2008","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1281","DOI":"10.1016\/j.ipm.2006.11.003","article-title":"Using the revised EM algorithm to remove noisy data for improving the one-against-the-rest method in binary text classification","volume":"43","author":"Han","year":"2007","journal-title":"Inf. Process. Manag."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"102371","DOI":"10.1016\/j.ipm.2020.102371","article-title":"Shallow and deep learning for event relatedness classification","volume":"57","author":"Haneczok","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1016\/j.ipm.2006.09.011","article-title":"Fuzzy support vector machine for multi-class text categorization","volume":"43","author":"Wang","year":"2007","journal-title":"Inf. Process. Manag."},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"101757","DOI":"10.1016\/j.ijdrr.2020.101757","article-title":"Machine-learning methods for identifying social media-based requests for urgent help during hurricanes","volume":"51","author":"Devaraj","year":"2020","journal-title":"Int. J. Disaster Risk Reduct."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.procs.2018.10.374","article-title":"Design of an Interactive Biomedical Text Mining Framework to Recognize Real-Time Drug Entities Using Machine Learning Algorithms","volume":"143","author":"Chukwuocha","year":"2018","journal-title":"Procedia Comput. Sci."},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"102121","DOI":"10.1016\/j.ipm.2019.102121","article-title":"Arabic text classification using deep learning models","volume":"57","author":"Elnagar","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_40","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1016\/j.procs.2016.11.017","article-title":"Machine Learning Models of Text Categorization by Author Gender Using Topic-independent Features","volume":"101","author":"Sboev","year":"2016","journal-title":"Procedia Comput. Sci."},{"key":"ref_41","doi-asserted-by":"crossref","first-page":"794","DOI":"10.1109\/TFUZZ.2017.2690222","article-title":"Fuzzy Bag-of-Words Model for Document Representation","volume":"26","author":"Zhao","year":"2018","journal-title":"IEEE Trans. Fuzzy Syst."},{"key":"ref_42","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.inffus.2020.06.002","article-title":"Deep learning based emotion analysis of microblog texts","volume":"64","author":"Xu","year":"2020","journal-title":"Inf. Fusion"},{"key":"ref_43","doi-asserted-by":"crossref","unstructured":"Baker, L.D., and McCallum, A.K. (1998, January 24\u201328). Distributional Clustering of Words for Text Classification. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR \u201998, Melbourne, Australia.","DOI":"10.1145\/290941.290970"},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"92120","DOI":"10.1109\/ACCESS.2020.2994450","article-title":"A Hybrid Classification Method via Character Embedding in Chinese Short Text With Few Words","volume":"8","author":"Zhu","year":"2020","journal-title":"IEEE Access"},{"key":"ref_45","doi-asserted-by":"crossref","first-page":"106443","DOI":"10.1016\/j.knosys.2020.106443","article-title":"A machine learning-based investigation utilizing the in-text features for the identification of dominant emotion in an email","volume":"208","author":"Halim","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_46","doi-asserted-by":"crossref","first-page":"932","DOI":"10.1016\/j.future.2019.09.009","article-title":"Automating orthogonal defect classification using machine learning algorithms","volume":"102","author":"Lopes","year":"2020","journal-title":"Future Gener. Comput. Syst."},{"key":"ref_47","doi-asserted-by":"crossref","first-page":"104302","DOI":"10.1016\/j.ijmedinf.2020.104302","article-title":"Automatic classification of scanned electronic health record documents","volume":"144","author":"Goodrum","year":"2020","journal-title":"Int. J. Med. Inform."},{"key":"ref_48","doi-asserted-by":"crossref","first-page":"428","DOI":"10.1016\/j.procs.2019.09.197","article-title":"A New Method to Identify Short-Text Authors Using Combinations of Machine Learning and Natural Language Processing Techniques","volume":"159","author":"Vijayakumar","year":"2019","journal-title":"Procedia Comput. Sci."},{"key":"ref_49","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1016\/j.dss.2017.03.007","article-title":"A machine learning approach to product review disambiguation based on function, form and behavior classification","volume":"97","author":"Singh","year":"2017","journal-title":"Decis. Support Syst."},{"key":"ref_50","doi-asserted-by":"crossref","first-page":"29051","DOI":"10.1109\/ACCESS.2019.2901933","article-title":"Supervised Paragraph Vector: Distributed Representations of Words, Documents and Class Labels","volume":"7","author":"Park","year":"2019","journal-title":"IEEE Access"},{"key":"ref_51","doi-asserted-by":"crossref","first-page":"146070","DOI":"10.1109\/ACCESS.2019.2944973","article-title":"Topic Modeling Technique for Text Mining Over Biomedical Text Corpora Through Hybrid Inverse Documents Frequency and Fuzzy K-Means Clustering","volume":"7","author":"Rashid","year":"2019","journal-title":"IEEE Access"},{"key":"ref_52","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1109\/TSMCC.2011.2136334","article-title":"Movie Rating and Review Summarization in Mobile Environment","volume":"42","author":"Liu","year":"2012","journal-title":"IEEE Trans. Syst. Man Cybern. Part Appl. Rev."},{"key":"ref_53","doi-asserted-by":"crossref","first-page":"355","DOI":"10.1016\/j.knosys.2008.01.001","article-title":"A comparative study for content-based dynamic spam classification using four machine learning algorithms","volume":"21","author":"Yu","year":"2008","journal-title":"Knowl.-Based Syst."},{"key":"ref_54","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1016\/j.compag.2018.05.007","article-title":"Machine learning for automatic rule classification of agricultural regulations: A case study in Spain","volume":"150","year":"2018","journal-title":"Comput. Electron. Agric."},{"key":"ref_55","doi-asserted-by":"crossref","first-page":"107023","DOI":"10.1016\/j.asoc.2020.107023","article-title":"Analyzing the effectiveness of semi-supervised learning approaches for opinion spam classification","volume":"101","author":"Ligthart","year":"2021","journal-title":"Appl. Soft Comput."},{"key":"ref_56","doi-asserted-by":"crossref","first-page":"101718","DOI":"10.1016\/j.is.2021.101718","article-title":"Multi-label legal document classification: A deep learning-based approach with label-attention and domain-specific pre-training","volume":"106","author":"Song","year":"2021","journal-title":"Inf. Syst."},{"key":"ref_57","first-page":"658","article-title":"Text categorisation in Quran and Hadith: Overcoming the interrelation challenges using machine learning and term weighting","volume":"33","author":"Rostam","year":"2019","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_58","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1016\/j.engappai.2015.03.015","article-title":"A corpus-based semantic kernel for text classification by using meaning values of terms","volume":"43","author":"Diri","year":"2015","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_59","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1016\/j.neucom.2015.10.137","article-title":"Using unsupervised clustering approach to train the Support Vector Machine for text classification","volume":"211","author":"Shafiabady","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_60","doi-asserted-by":"crossref","first-page":"193","DOI":"10.1016\/j.asoc.2017.04.069","article-title":"Modified frequency-based term weighting schemes for text classification","volume":"58","author":"Sabbah","year":"2017","journal-title":"Appl. Soft Comput."},{"key":"ref_61","doi-asserted-by":"crossref","first-page":"266","DOI":"10.1016\/j.compeleceng.2017.02.013","article-title":"Machine learning aided Android malware classification","volume":"61","author":"Milosevic","year":"2017","journal-title":"Comput. Electr. Eng."},{"key":"ref_62","doi-asserted-by":"crossref","first-page":"42689","DOI":"10.1109\/ACCESS.2020.2976744","article-title":"Document-Level Text Classification Using Single-Layer Multisize Filters Convolutional Neural Network","volume":"8","author":"Akhter","year":"2020","journal-title":"IEEE Access"},{"key":"ref_63","doi-asserted-by":"crossref","first-page":"105641","DOI":"10.1016\/j.mejo.2022.105641","article-title":"Linear regression combined KNN algorithm to identify latent defects for imbalance data of ICs","volume":"131","author":"Huang","year":"2023","journal-title":"Microelectron. J."},{"key":"ref_64","doi-asserted-by":"crossref","first-page":"2030","DOI":"10.1016\/j.eswa.2010.07.139","article-title":"Two-level hierarchical combination method for text classification","volume":"38","author":"Li","year":"2011","journal-title":"Expert Syst. Appl."},{"key":"ref_65","doi-asserted-by":"crossref","first-page":"11880","DOI":"10.1016\/j.eswa.2012.02.068","article-title":"A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine","volume":"39","author":"Wan","year":"2012","journal-title":"Expert Syst. Appl."},{"key":"ref_66","doi-asserted-by":"crossref","first-page":"1684","DOI":"10.1016\/j.eswa.2014.09.031","article-title":"Learning to classify short text from scientific documents using topic models with various types of knowledge","volume":"42","author":"Vo","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_67","doi-asserted-by":"crossref","first-page":"1566","DOI":"10.1109\/TSMCC.2012.2208102","article-title":"Employing Structural and Textual Feature Extraction for Semistructured Document Classification","volume":"42","author":"Khabbaz","year":"2012","journal-title":"IEEE Trans. Syst. Man Cybern. Part Appl. Rev."},{"key":"ref_68","doi-asserted-by":"crossref","first-page":"461","DOI":"10.1016\/j.compeleceng.2017.08.001","article-title":"Significance of machine learning algorithms in professional blogger\u2019s classification","volume":"65","author":"Asim","year":"2018","journal-title":"Comput. Electr. Eng."},{"key":"ref_69","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1016\/j.ijresmar.2018.09.009","article-title":"Comparing automated text classification methods","volume":"36","author":"Hartmann","year":"2019","journal-title":"Int. J. Res. Mark."},{"key":"ref_70","doi-asserted-by":"crossref","first-page":"301109","DOI":"10.1016\/j.fsidi.2021.301109","article-title":"Digital forensics supported by machine learning for the detection of online sexual predatory chats","volume":"36","author":"Ngejane","year":"2021","journal-title":"Forensic Sci. Int. Digit. Investig."},{"key":"ref_71","doi-asserted-by":"crossref","first-page":"482","DOI":"10.5755\/j01.itc.49.4.26808","article-title":"Part-of-speech tagging via deep neural networks for northern-Ethiopic languages","volume":"49","author":"Tesfagergish","year":"2020","journal-title":"Inf. Technol. Control"},{"key":"ref_72","unstructured":"Mikolov, T., Chen, K., Corrado, G.S., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv."},{"key":"ref_73","unstructured":"Le, Q.V., and Mikolov, T. (2014, January 21\u201326). Distributed Representations of Sentences and Documents. Proceedings of the International Conference on Machine Learning, Beijing, China."},{"key":"ref_74","doi-asserted-by":"crossref","first-page":"135","DOI":"10.1162\/tacl_a_00051","article-title":"Enriching Word Vectors with Subword Information","volume":"5","author":"Bojanowski","year":"2016","journal-title":"Trans. Assoc. Comput. Linguist."},{"key":"ref_75","doi-asserted-by":"crossref","first-page":"165","DOI":"10.1016\/j.patrec.2020.03.003","article-title":"Improving FastText with inverse document frequency of subwords","volume":"133","author":"Choi","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_76","doi-asserted-by":"crossref","unstructured":"Athiwaratkun, B., Wilson, A.G., and Anandkumar, A. (2018, January 15\u201320). Probabilistic FastText for Multi-Sense Word Embeddings. Proceedings of the ACL, Melbourne, Australia.","DOI":"10.18653\/v1\/P18-1001"},{"key":"ref_77","doi-asserted-by":"crossref","unstructured":"Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. (2018, January 1\u20136). Deep Contextualized Word Representations. Proceedings of the NAACL, New Orleans, LA, USA.","DOI":"10.18653\/v1\/N18-1202"},{"key":"ref_78","doi-asserted-by":"crossref","unstructured":"Damasevicius, R., Valys, R., and Wozniak, M. (2016, January 6\u20139). Intelligent tagging of online texts using fuzzy logic. Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence, SSCI, Athens, Greece.","DOI":"10.1109\/SSCI.2016.7849917"},{"key":"ref_79","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1016\/j.procs.2021.05.103","article-title":"Sentiment Classification Using fastText Embedding and Deep Learning Model","volume":"189","author":"Khasanah","year":"2021","journal-title":"Procedia Comput. Sci."},{"key":"ref_80","doi-asserted-by":"crossref","first-page":"102122","DOI":"10.1016\/j.ipm.2019.102122","article-title":"Towards a real-time processing framework based on improved distributed recurrent neural network variants with fastText for social big data analytics","volume":"57","author":"Mouline","year":"2020","journal-title":"Inf. Process. Manag."},{"key":"ref_81","doi-asserted-by":"crossref","first-page":"101764","DOI":"10.1016\/j.cose.2020.101764","article-title":"Detecting malicious JavaScript code based on semantic analysis","volume":"93","author":"Fang","year":"2020","journal-title":"Comput. Secur."},{"key":"ref_82","doi-asserted-by":"crossref","first-page":"3401","DOI":"10.1016\/j.aej.2021.02.009","article-title":"Efficient English text classification using selected Machine Learning Techniques","volume":"60","author":"Luo","year":"2021","journal-title":"Alex. Eng. J."},{"key":"ref_83","doi-asserted-by":"crossref","first-page":"103699","DOI":"10.1016\/j.jbi.2021.103699","article-title":"GHS-NET a generic hybridized shallow neural network for multi-label biomedical text classification","volume":"116","author":"Ibrahim","year":"2021","journal-title":"J. Biomed. Inform."},{"key":"ref_84","doi-asserted-by":"crossref","first-page":"177","DOI":"10.1016\/j.neucom.2019.08.082","article-title":"Finding decision jumps in text classification","volume":"371","author":"Liu","year":"2020","journal-title":"Neurocomputing"},{"key":"ref_85","doi-asserted-by":"crossref","first-page":"113987","DOI":"10.1016\/j.eswa.2020.113987","article-title":"Multi-view ensemble learning method for microblog sentiment classification","volume":"166","author":"Ye","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_86","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1016\/j.sbspro.2014.07.098","article-title":"Combining Probabilistic Classifiers for Text Classification","volume":"147","author":"Fragos","year":"2014","journal-title":"Procedia-Soc. Behav. Sci."},{"key":"ref_87","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1016\/j.knosys.2013.09.019","article-title":"Feature selection via maximizing global information gain for text classification","volume":"54","author":"Shang","year":"2013","journal-title":"Knowl.-Based Syst."},{"key":"ref_88","doi-asserted-by":"crossref","unstructured":"Mato\u0161evi\u0107, G., Dob\u0161a, J., and Mladeni\u0107, D. (2021). Using Machine Learning for Web Page Classification in Search Engine Optimization. Future Internet, 13.","DOI":"10.3390\/fi13010009"},{"key":"ref_89","doi-asserted-by":"crossref","first-page":"1922","DOI":"10.1016\/j.patrec.2011.07.010","article-title":"Feature sub-set selection metrics for Arabic text classification","volume":"32","author":"Mesleh","year":"2011","journal-title":"Pattern Recognit. Lett."},{"key":"ref_90","doi-asserted-by":"crossref","unstructured":"Santucci, V., Santarelli, F., Forti, L., and Spina, S. (2020). Automatic Classification of Text Complexity. Appl. Sci., 10.","DOI":"10.3390\/app10207285"},{"key":"ref_91","first-page":"375","article-title":"Leveraging Higher Order Dependencies between Features for Text Classification","volume":"5781","author":"Ganiz","year":"2009","journal-title":"Mach. Learn. Knowl. Discov. Databases Lect. Notes Comput. Sci."},{"key":"ref_92","doi-asserted-by":"crossref","first-page":"1908","DOI":"10.1016\/j.neucom.2015.09.063","article-title":"Hybridized term-weighting method for Dark Web classification","volume":"173","author":"Sabbah","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_93","doi-asserted-by":"crossref","first-page":"1415","DOI":"10.1109\/TKDE.2012.148","article-title":"On the Use of Side Information for Mining Text Data","volume":"26","author":"Aggarwal","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_94","first-page":"e01165","article-title":"Performance evaluation of machine learning tools for detection of phishing attacks on web pages","volume":"16","author":"Ojewumi","year":"2022","journal-title":"Sci. Afr."},{"key":"ref_95","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1109\/TKDE.2018.2883446","article-title":"Learning to Weight for Text Classification","volume":"32","author":"Moreo","year":"2020","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_96","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1016\/j.jbi.2016.05.004","article-title":"A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories","volume":"62","author":"Hasan","year":"2016","journal-title":"J. Biomed. Inform."},{"key":"ref_97","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1016\/j.engappai.2012.09.017","article-title":"Machine learning of syntactic parse trees for search and classification of text","volume":"26","author":"Galitsky","year":"2013","journal-title":"Eng. Appl. Artif. Intell."},{"key":"ref_98","doi-asserted-by":"crossref","first-page":"506","DOI":"10.1016\/j.procs.2013.05.065","article-title":"An EMM-based Approach for Text Classification","volume":"17","author":"Liang","year":"2013","journal-title":"Procedia Comput. Sci."},{"key":"ref_99","doi-asserted-by":"crossref","first-page":"40707","DOI":"10.1109\/ACCESS.2019.2907992","article-title":"Long Document Classification From Local Word Glimpses via Recurrent Attention Learning","volume":"7","author":"He","year":"2019","journal-title":"IEEE Access"},{"key":"ref_100","doi-asserted-by":"crossref","unstructured":"Alhaj, Y.A., Dahou, A., Al-Qaness, M.A.A., Abualigah, L., Abbasi, A.A., Almaweri, N.A.O., Elaziz, M.A., and Dama\u0161evi\u010dius, R. (2022). A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language. Future Internet, 14.","DOI":"10.3390\/fi14070194"},{"key":"ref_101","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1109\/TKDE.2013.19","article-title":"A Similarity Measure for Text Classification and Clustering","volume":"26","author":"Lin","year":"2014","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_102","doi-asserted-by":"crossref","first-page":"843","DOI":"10.1016\/j.is.2011.02.002","article-title":"Word co-occurrence features for text classification","volume":"36","author":"Figueiredo","year":"2011","journal-title":"Inf. Syst."},{"key":"ref_103","doi-asserted-by":"crossref","first-page":"914","DOI":"10.1109\/TIFS.2016.2621888","article-title":"Statistical Features-Based Real-Time Detection of Drifted Twitter Spam","volume":"12","author":"Chen","year":"2017","journal-title":"IEEE Trans. Inf. Forensics Secur."},{"key":"ref_104","doi-asserted-by":"crossref","unstructured":"Babapour, S.M., and Roostaee, M. (2017, January 22). Web pages classification: An effective approach based on text mining techniques. Proceedings of the 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI), Tehran, Iran.","DOI":"10.1109\/KBEI.2017.8324994"},{"key":"ref_105","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1016\/j.neucom.2018.07.002","article-title":"Towards perfect text classification with Wikipedia-based semantic Na\u00efve Bayes learning","volume":"315","author":"Kim","year":"2018","journal-title":"Neurocomputing"},{"key":"ref_106","doi-asserted-by":"crossref","unstructured":"Fesseha, A., Xiong, S., Emiru, E.D., Diallo, M., and Dahou, A. (2021). Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya. Information, 12.","DOI":"10.3390\/info12020052"},{"key":"ref_107","doi-asserted-by":"crossref","unstructured":"Lilleberg, J., Zhu, Y., and Zhang, Y. (2015, January 6\u20138). Support vector machines and Word2vec for text classification with semantic features. Proceedings of the 2015 IEEE 14th International Conference on Cognitive Informatics Cognitive Computing (ICCI*CC), Beijing, China.","DOI":"10.1109\/ICCI-CC.2015.7259377"},{"key":"ref_108","doi-asserted-by":"crossref","first-page":"1022","DOI":"10.1109\/TKDE.2010.160","article-title":"Higher Order Naive Bayes: A Novel Non-IID Approach to Text Classification","volume":"23","author":"Ganiz","year":"2011","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_109","doi-asserted-by":"crossref","unstructured":"Feng, X., Liang, Y., Shi, X., Xu, D., Wang, X., and Guan, R. (2017). Overfitting Reduction of Text Classification Based on AdaBELM. Entropy, 19.","DOI":"10.3390\/e19070330"},{"key":"ref_110","doi-asserted-by":"crossref","first-page":"113898","DOI":"10.1016\/j.eswa.2020.113898","article-title":"Hierarchical and lateral multiple timescales gated recurrent units with pre-trained encoder for long text classification","volume":"165","author":"Moirangthem","year":"2021","journal-title":"Expert Syst. Appl."},{"key":"ref_111","doi-asserted-by":"crossref","first-page":"806","DOI":"10.1016\/j.neucom.2015.09.096","article-title":"Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification","volume":"174","author":"Wang","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_112","doi-asserted-by":"crossref","first-page":"247","DOI":"10.1016\/j.neucom.2016.03.088","article-title":"Multi-label maximum entropy model for social emotion classification over short text","volume":"210","author":"Li","year":"2016","journal-title":"Neurocomputing"},{"key":"ref_113","doi-asserted-by":"crossref","unstructured":"Wang, X., Chen, R., Jia, Y., and Zhou, B. (2013, January 16\u201317). Short Text Classification Using Wikipedia Concept Based Document Representation. Proceedings of the 2013 International Conference on Information Technology and Applications, Chengdu, China.","DOI":"10.1109\/ITA.2013.114"},{"key":"ref_114","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1016\/j.patrec.2020.05.007","article-title":"Learning transferable features in meta-learning for few-shot text classification","volume":"135","author":"Xu","year":"2020","journal-title":"Pattern Recognit. Lett."},{"key":"ref_115","doi-asserted-by":"crossref","first-page":"102410","DOI":"10.1016\/j.ipm.2020.102410","article-title":"Automatic classification of citizen requests for transportation using deep learning: Case study from Boston city","volume":"58","author":"Kim","year":"2021","journal-title":"Inf. Process. Manag."},{"key":"ref_116","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1016\/j.eswa.2007.10.042","article-title":"Imbalanced text classification: A term weighting approach","volume":"36","author":"Liu","year":"2009","journal-title":"Expert Syst. Appl."},{"key":"ref_117","doi-asserted-by":"crossref","first-page":"191","DOI":"10.1016\/j.dss.2009.07.011","article-title":"On strategies for imbalanced text classification using SVM: A comparative study","volume":"48","author":"Sun","year":"2009","journal-title":"Decis. Support Syst."},{"key":"ref_118","doi-asserted-by":"crossref","unstructured":"Triantafyllou, I., Drivas, I.C., and Giannakopoulos, G. (2020). How to Utilize My App Reviews? A Novel Topics Extraction Machine Learning Schema for Strategic Business Purposes. Entropy, 22.","DOI":"10.3390\/e22111310"},{"key":"ref_119","doi-asserted-by":"crossref","first-page":"105949","DOI":"10.1016\/j.knosys.2020.105949","article-title":"A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques","volume":"198","author":"Basiri","year":"2020","journal-title":"Knowl.-Based Syst."},{"key":"ref_120","doi-asserted-by":"crossref","first-page":"216","DOI":"10.1016\/j.ins.2018.09.001","article-title":"An analysis of hierarchical text classification using word embeddings","volume":"471","author":"Stein","year":"2019","journal-title":"Inf. Sci."},{"key":"ref_121","doi-asserted-by":"crossref","first-page":"1305","DOI":"10.1109\/TKDE.2004.50","article-title":"Blocking reduction strategies in hierarchical text classification","volume":"16","author":"Sun","year":"2004","journal-title":"IEEE Trans. Knowl. Data Eng."},{"key":"ref_122","first-page":"46","article-title":"Clustering and classification of email contents","volume":"27","author":"Alsmadi","year":"2015","journal-title":"J. King Saud Univ.-Comput. Inf. Sci."},{"key":"ref_123","doi-asserted-by":"crossref","first-page":"6391","DOI":"10.1016\/j.eswa.2015.04.022","article-title":"LEXA: Building knowledge bases for automatic legal citation classification","volume":"42","author":"Galgani","year":"2015","journal-title":"Expert Syst. Appl."},{"key":"ref_124","doi-asserted-by":"crossref","first-page":"438","DOI":"10.1016\/j.eswa.2015.10.003","article-title":"Active learning for text classification with reusability","volume":"45","author":"Hu","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_125","doi-asserted-by":"crossref","first-page":"100917","DOI":"10.1016\/j.aei.2019.04.007","article-title":"Automated classification of building information modeling (BIM) case studies by BIM use based on natural language processing (NLP) and unsupervised learning","volume":"41","author":"Jung","year":"2019","journal-title":"Adv. Eng. Inform."},{"key":"ref_126","doi-asserted-by":"crossref","first-page":"2839","DOI":"10.1109\/TVCG.2012.277","article-title":"Visual Classifier Training for Text Document Retrieval","volume":"18","author":"Heimerl","year":"2012","journal-title":"IEEE Trans. Vis. Comput. Graph."},{"key":"ref_127","doi-asserted-by":"crossref","first-page":"767","DOI":"10.1007\/s10772-020-09728-5","article-title":"An optimized iterative clustering framework for recognizing speech","volume":"23","author":"Palanivinayagam","year":"2020","journal-title":"Int. J. Speech Technol."},{"key":"ref_128","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1016\/j.eswa.2017.03.020","article-title":"Text classification method based on self-training and LDA topic models","volume":"80","author":"Pavlinek","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_129","doi-asserted-by":"crossref","first-page":"152","DOI":"10.1016\/j.knosys.2016.11.018","article-title":"MDLText: An efficient and lightweight text classifier","volume":"118","author":"Silva","year":"2017","journal-title":"Knowl.-Based Syst."},{"key":"ref_130","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1016\/j.procs.2017.08.058","article-title":"Integrating Low-rank Approximation and Word Embedding for Feature Transformation in the High-dimensional Text Classification","volume":"112","author":"Quoc","year":"2017","journal-title":"Procedia Comput. Sci."},{"key":"ref_131","doi-asserted-by":"crossref","first-page":"232","DOI":"10.1016\/j.eswa.2016.03.045","article-title":"Ensemble of keyword extraction methods and classifiers in text classification","volume":"57","author":"Onan","year":"2016","journal-title":"Expert Syst. Appl."},{"key":"ref_132","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1016\/j.knosys.2012.06.005","article-title":"A novel probabilistic feature selection method for text classification","volume":"36","author":"Uysal","year":"2012","journal-title":"Knowl.-Based Syst."},{"key":"ref_133","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1016\/j.cmpb.2016.08.018","article-title":"Improving the text classification using clustering and a novel HMM to reduce the dimensionality","volume":"136","author":"Borrajo","year":"2016","journal-title":"Comput. Methods Programs Biomed."},{"key":"ref_134","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.ins.2003.03.003","article-title":"Web page feature selection and classification using neural networks","volume":"158","author":"Selamat","year":"2004","journal-title":"Inf. Sci."},{"key":"ref_135","doi-asserted-by":"crossref","first-page":"101182","DOI":"10.1016\/j.csl.2020.101182","article-title":"Attention-based BiLSTM fused CNN with gating mechanism model for Chinese long text classification","volume":"68","author":"Deng","year":"2021","journal-title":"Comput. Speech Lang."},{"key":"ref_136","doi-asserted-by":"crossref","first-page":"323","DOI":"10.1016\/j.eswa.2017.03.042","article-title":"Multi-class sentiment classification: The experimental comparisons of feature selection and machine learning algorithms","volume":"80","author":"Liu","year":"2017","journal-title":"Expert Syst. Appl."},{"key":"ref_137","doi-asserted-by":"crossref","first-page":"104","DOI":"10.1016\/j.jss.2013.12.034","article-title":"Evolutionary instance selection for text classification","volume":"90","author":"Tsai","year":"2014","journal-title":"J. Syst. Softw."}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/5\/236\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T19:26:35Z","timestamp":1760124395000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/5\/236"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,4,29]]},"references-count":137,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2023,5]]}},"alternative-id":["a16050236"],"URL":"https:\/\/doi.org\/10.3390\/a16050236","relation":{},"ISSN":["1999-4893"],"issn-type":[{"value":"1999-4893","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,4,29]]}}}