{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,14]],"date-time":"2026-05-14T11:05:45Z","timestamp":1778756745219,"version":"3.51.4"},"reference-count":52,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T00:00:00Z","timestamp":1718582400000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T00:00:00Z","timestamp":1718582400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"the European University of Atlantic"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Big Data"],"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The classification of imbalanced datasets is a prominent task in text mining and machine learning. The number of samples in each class is not uniformly distributed; one class contains a large number of samples while the other has a small number. Overfitting of the model occurs as a result of imbalanced datasets, resulting in poor performance. In this study, we compare different oversampling techniques like synthetic minority oversampling technique (SMOTE), support vector machine SMOTE (SVM-SMOTE), Border-line SMOTE, K-means SMOTE, and adaptive synthetic (ADASYN) oversampling to address the issue of imbalanced datasets and enhance the performance of machine learning models. Preprocessing significantly enhances the quality of input data by reducing noise, redundant data, and unnecessary data. This enables the machines to identify crucial patterns that facilitate the extraction of significant and pertinent information from the preprocessed data. This study preprocesses the data using various top-level preprocessing steps. Furthermore, two imbalanced Twitter datasets are used to compare the performance of oversampling techniques with six machine learning models including random forest (RF), SVM, K-nearest neighbor (KNN), AdaBoost (ADA), logistic regression (LR), and decision tree (DT). In addition, the bag of words (BoW) and term frequency and inverse document frequency (TF-IDF) features extraction approaches are used to extract features from the tweets. The experiments indicate that SMOTE and ADASYN perform much better than other techniques thus providing higher accuracy. Additionally, overall results show that SVM with \u2019linear\u2019 kernel tends to attain the highest accuracy and recall score of 99.67% and 1.00% on ADASYN oversampled datasets and 99.57% accuracy on SMOTE oversampled dataset with TF-IDF features. The SVM model using 10-fold cross-validation experiments achieved 97.40 mean accuracy with a 0.008 standard deviation. Our approach achieved 2.62% greater accuracy as compared to other current methods.<\/jats:p>","DOI":"10.1186\/s40537-024-00943-4","type":"journal-article","created":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T15:01:57Z","timestamp":1718636517000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":146,"title":["Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering"],"prefix":"10.1186","volume":"11","author":[{"given":"Muhammad","family":"Mujahid","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"EROL","family":"K\u0131na","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Furqan","family":"Rustam","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Monica Gracia","family":"Villar","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Eduardo Silva","family":"Alvarado","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Isabel","family":"De La Torre Diez","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Imran","family":"Ashraf","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2024,6,17]]},"reference":[{"issue":"1","key":"943_CR1","doi-asserted-by":"publisher","first-page":"80","DOI":"10.1145\/1007730.1007741","volume":"6","author":"Z Zheng","year":"2004","unstructured":"Zheng Z, Wu X, Srihari R. Feature selection for text categorization on imbalanced data. ACM Sigkdd Explor Newsl. 2004;6(1):80\u20139.","journal-title":"ACM Sigkdd Explor Newsl"},{"key":"943_CR2","doi-asserted-by":"publisher","first-page":"148","DOI":"10.1016\/B978-1-55860-335-6.50026-X","volume-title":"Machine learning proceedings 1994","author":"DD Lewis","year":"1994","unstructured":"Lewis DD, Catlett J. Heterogeneous uncertainty sampling for supervised learning. In: Cohen WW, Hirsh H, editors. Machine learning proceedings 1994. New Brunswick: Elsevier; 1994. p. 148\u201356."},{"key":"943_CR3","first-page":"237","volume-title":"Pacific Rim international conference on artificial intelligence","author":"RA Mohammed","year":"2018","unstructured":"Mohammed RA, Wong K-W, Shiratuddin MF, Wang X. Scalable machine learning techniques for highly imbalanced credit card fraud detection: a comparative study. In: Geng X, Kang BH, editors. Pacific Rim international conference on artificial intelligence. Nanjing: Springer; 2018. p. 237\u201346."},{"issue":"5","key":"943_CR4","doi-asserted-by":"publisher","first-page":"429","DOI":"10.3233\/IDA-2002-6504","volume":"6","author":"N Japkowicz","year":"2002","unstructured":"Japkowicz N, Stephen S. The class imbalance problem: a systematic study. Intelligent data analysis. 2002;6(5):429\u201349.","journal-title":"Intelligent data analysis"},{"key":"943_CR5","volume-title":"2019 IEEE 10th international conference on awareness science and technology (iCAST)","author":"K Ghosh","year":"2019","unstructured":"Ghosh K, Banerjee A, Chatterjee S, Sen S. Imbalanced twitter sentiment analysis using minority oversampling. In: Ghosh K, editor. 2019 IEEE 10th international conference on awareness science and technology (iCAST). Morioka: IEEE; 2019."},{"key":"943_CR6","volume-title":"Workshop on interactions between data mining and natural language processing (DMNLP 2016)","author":"J Ah-Pine","year":"2016","unstructured":"Ah-Pine J, Soriano-Morales E-P. A study of synthetic oversampling for twitter imbalanced sentiment analysis. In: Ah-Pine J, editor. Workshop on interactions between data mining and natural language processing (DMNLP 2016). Riva del Garda: DMNLP; 2016."},{"key":"943_CR7","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1109\/ASEW52652.2021.00053","volume-title":"2021 36th IEEE\/ACM International conference on automated software engineering workshops (ASEW)","author":"W Aljedaani","year":"2021","unstructured":"Aljedaani W, Rustam F, Ludi S, Ouni A, Mkaouer MW. Learning sentiment analysis for accessibility user reviews. In: Aljedaani W, editor. 2021 36th IEEE\/ACM International conference on automated software engineering workshops (ASEW). Melbourne: IEEE; 2021. p. 239\u201346."},{"key":"943_CR8","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2023.3309697","author":"KM Hasib","year":"2023","unstructured":"Hasib KM, Azam S, Karim A, Al Marouf A, Shamrat FJM, Montaha S, Yeo KC, Jonkman M, Alhajj R, Rokne JG. Mcnn-lstm: combining CNN and LSTM to classify multi-class text in imbalanced news data. IEEE Access. 2023. https:\/\/doi.org\/10.1109\/ACCESS.2023.3309697.","journal-title":"IEEE Access"},{"key":"943_CR9","doi-asserted-by":"publisher","DOI":"10.1016\/j.engappai.2023.106688","volume":"125","author":"KM Hasib","year":"2023","unstructured":"Hasib KM, Towhid NA, Faruk KO, Al Mahmud J, Mridha M. Strategies for enhancing the performance of news article classification in bangla: handling imbalance and interpretation. Eng Appl Artif Intell. 2023;125: 106688.","journal-title":"Eng Appl Artif Intell"},{"key":"943_CR10","first-page":"1","volume-title":"2015 2nd International conference on advanced informatics: concepts, theory and applications (ICAICTA)","author":"P Sarakit","year":"2015","unstructured":"Sarakit P, Theeramunkong T, Haruechaiyasak C. Improving emotion classification in imbalanced youtube dataset using smote algorithm. In: Sarakit P, editor. 2015 2nd International conference on advanced informatics: concepts, theory and applications (ICAICTA). Chonburi: IEEE; 2015. p. 1\u20135."},{"key":"943_CR11","doi-asserted-by":"publisher","first-page":"78621","DOI":"10.1109\/ACCESS.2021.3083638","volume":"9","author":"V Rupapara","year":"2021","unstructured":"Rupapara V, Rustam F, Shahzad HF, Mehmood A, Ashraf I, Choi GS. Impact of smote on imbalanced text features for toxic comments classification using rvvc model. IEEE Access. 2021;9:78621\u201334.","journal-title":"IEEE Access"},{"key":"943_CR12","first-page":"1","volume-title":"2018 International conference on engineering, applied sciences, and technology (ICEAST)","author":"AC Flores","year":"2018","unstructured":"Flores AC, Icoy RI, Pe\u00f1a CF, Gorro KD. An evaluation of SVM and naive bayes with smote on sentiment analysis data set. In: Flores AC, editor. 2018 International conference on engineering, applied sciences, and technology (ICEAST). Phuket: IEEE; 2018. p. 1\u20134."},{"key":"943_CR13","doi-asserted-by":"publisher","DOI":"10.1155\/2022\/6614730","author":"A Al-Hashedi","year":"2022","unstructured":"Al-Hashedi A, Al-Fuhaidi B, Mohsen AM, Ali Y, Gamal Al-Kaf HA, Al-Sorori W, Maqtary N. Ensemble classifiers for Arabic sentiment analysis of social network (twitter data) towards COVID-19-related conspiracy theories. Appl Comput Intell Soft Comput. 2022. https:\/\/doi.org\/10.1155\/2022\/6614730.","journal-title":"Appl Comput Intell Soft Comput"},{"key":"943_CR14","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1016\/j.procs.2017.05.365","volume":"109","author":"S Al-Azani","year":"2017","unstructured":"Al-Azani S, El-Alfy E-SM. Using word embedding and ensemble learning for highly imbalanced data sentiment analysis in short arabic text. Proc Comput Sci. 2017;109:359\u201366.","journal-title":"Proc Comput Sci"},{"issue":"18","key":"943_CR15","doi-asserted-by":"publisher","first-page":"6253","DOI":"10.3390\/app10186253","volume":"10","author":"G Rivera","year":"2020","unstructured":"Rivera G, Florencia R, Garc\u00eda V, Ruiz A, S\u00e1nchez-Sol\u00eds JP. News classification for identifying traffic incident points in a spanish-speaking country: a real-world case study of class imbalance learning. Appl Sci. 2020;10(18):6253.","journal-title":"Appl Sci"},{"issue":"47","key":"943_CR16","doi-asserted-by":"publisher","first-page":"35995","DOI":"10.1007\/s11042-020-09138-4","volume":"79","author":"A Banerjee","year":"2020","unstructured":"Banerjee A, Bhattacharjee M, Ghosh K, Chatterjee S. Synthetic minority oversampling in addressing imbalanced sarcasm detection in social media. Multimed Tools Appl. 2020;79(47):35995\u20136031.","journal-title":"Multimed Tools Appl"},{"key":"943_CR17","unstructured":"Glazkova A. A comparison of synthetic oversampling methods for multi-class text classification. arXiv preprint. 2020. arXiv:2008.04636."},{"issue":"2","key":"943_CR18","doi-asserted-by":"publisher","first-page":"226","DOI":"10.1007\/s12559-015-9319-y","volume":"7","author":"R Xu","year":"2015","unstructured":"Xu R, Chen T, Xia Y, Lu Q, Liu B, Wang X. Word embedding composition for data imbalances in sentiment and emotion classification. Cogn Comput. 2015;7(2):226\u201340.","journal-title":"Cogn Comput"},{"issue":"2","key":"943_CR19","doi-asserted-by":"publisher","first-page":"137","DOI":"10.1007\/s40012-018-0193-0","volume":"6","author":"S Saumya","year":"2018","unstructured":"Saumya S, Singh JP. Detection of spam reviews: a sentiment analysis approach. CSI Trans ICT. 2018;6(2):137\u201348.","journal-title":"CSI Trans ICT"},{"key":"943_CR20","doi-asserted-by":"publisher","first-page":"399","DOI":"10.1109\/CCWC54503.2022.9720806","volume-title":"2022 IEEE 12th annual computing and communication workshop and conference (CCWC)","author":"KM Hasib","year":"2022","unstructured":"Hasib KM, Rahman F, Hasnat R, Alam MGR. A machine learning and explainable AI approach for predicting secondary school student performance. In: Hasib KM, editor. 2022 IEEE 12th annual computing and communication workshop and conference (CCWC). Las Vegas: IEEE; 2022. p. 399\u2013405."},{"issue":"18","key":"943_CR21","doi-asserted-by":"publisher","first-page":"8438","DOI":"10.3390\/app11188438","volume":"11","author":"M Mujahid","year":"2021","unstructured":"Mujahid M, Lee E, Rustam F, Washington PB, Ullah S, Reshi AA, Ashraf I. Sentiment analysis and topic modeling on tweets about online education during COVID-19. Appl Sci. 2021;11(18):8438.","journal-title":"Appl Sci"},{"key":"943_CR22","doi-asserted-by":"publisher","first-page":"1353","DOI":"10.3390\/healthcare9101353","volume":"9","author":"J Liu","year":"2021","unstructured":"Liu J, Lu S, Lu C. Exploring and monitoring the reasons for hesitation with COVID-19 vaccine based on social-platform text and classification algorithms. Healthcare. 2021;9:1353.","journal-title":"Healthcare"},{"issue":"2","key":"943_CR23","doi-asserted-by":"publisher","first-page":"109","DOI":"10.21609\/jiki.v13i2.885","volume":"13","author":"R Ardianto","year":"2020","unstructured":"Ardianto R, Rivanie T, Alkhalifi Y, Nugraha FS, Gata W. Sentiment analysis on e-sports for education curriculum using naive bayes and support vector machine. Jurnal Ilmu Komputer dan Informasi. 2020;13(2):109\u201322.","journal-title":"Jurnal Ilmu Komputer dan Informasi"},{"key":"943_CR24","doi-asserted-by":"publisher","DOI":"10.1016\/j.cosrev.2021.100395","volume":"40","author":"T Balaji","year":"2021","unstructured":"Balaji T, Annavarapu CSR, Bablani A. Machine learning algorithms for social media analysis: a survey. Comput Sci Rev. 2021;40: 100395.","journal-title":"Comput Sci Rev"},{"issue":"1","key":"943_CR25","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1177\/0165551521991037","volume":"49","author":"B Parlak","year":"2023","unstructured":"Parlak B, Uysal AK. A novel filter feature selection method for text classification: extensive feature selector. J Inform Sci. 2023;49(1):59\u201378.","journal-title":"J Inform Sci"},{"issue":"6","key":"943_CR26","doi-asserted-by":"publisher","first-page":"727","DOI":"10.1177\/0165551520930897","volume":"47","author":"B Parlak","year":"2021","unstructured":"Parlak B, Uysal AK. The effects of globalisation techniques on feature selection for text classification. J Inform Sci. 2021;47(6):727\u201339.","journal-title":"J Inform Sci"},{"key":"943_CR27","doi-asserted-by":"publisher","DOI":"10.1109\/TCSS.2023.3263128","author":"KM Hasib","year":"2023","unstructured":"Hasib KM, Islam MR, Sakib S, Akbar MA, Razzak I, Alam MS. Depression detection from social networks data based on machine learning and deep learning techniques: An interrogative survey. IEEE Trans Comput Soc Syst. 2023. https:\/\/doi.org\/10.1109\/TCSS.2023.3263128.","journal-title":"IEEE Trans Comput Soc Syst"},{"key":"943_CR28","doi-asserted-by":"publisher","first-page":"108545","DOI":"10.1109\/ACCESS.2022.3213818","volume":"10","author":"KM Hasib","year":"2022","unstructured":"Hasib KM, Tanzim A, Shin J, Faruk KO, Al Mahmud J, Mridha MF. Bmnet-5: a novel approach of neural network to classify the genre of bengali music based on audio features. IEEE Access. 2022;10:108545\u201363.","journal-title":"IEEE Access"},{"key":"943_CR29","volume-title":"2021 International conference on information and communication technology for sustainable development (ICICT4SD)","author":"KM Hasib","year":"2021","unstructured":"Hasib KM, Habib MA, Towhid NA, Showrov MIH. A novel deep learning based sentiment analysis of twitter data for us airline service. In: Hasib KM, editor. 2021 International conference on information and communication technology for sustainable development (ICICT4SD). Dhaka: IEEE; 2021."},{"key":"943_CR30","unstructured":"Kaggle: ENDviolence Tweets. 2021. https:\/\/www.kaggle.com\/datasets\/shivamb\/real-or-fake-fake-jobposting-prediction\/metadata. Accessed  22 Feb 2024."},{"issue":"1","key":"943_CR31","first-page":"7","volume":"5","author":"S Vijayarani","year":"2015","unstructured":"Vijayarani S, Ilamathi MJ, Nithya M, et al. Preprocessing techniques for text mining-an overview. Int J Comput Sci Commun Netw. 2015;5(1):7\u201316.","journal-title":"Int J Comput Sci Commun Netw"},{"key":"943_CR32","first-page":"379","volume":"99","author":"S Scott","year":"1999","unstructured":"Scott S, Matwin S. Citeseer. Feature engineering for text classification. 1999;99:379\u201388.","journal-title":"Feature engineering for text classification"},{"issue":"1","key":"943_CR33","doi-asserted-by":"publisher","first-page":"43","DOI":"10.1007\/s13042-010-0001-0","volume":"1","author":"Y Zhang","year":"2010","unstructured":"Zhang Y, Jin R, Zhou Z-H. Understanding bag-of-words model: a statistical framework. Int J Mach Learn Cybern. 2010;1(1):43\u201352.","journal-title":"Int J Mach Learn Cybern"},{"issue":"1","key":"943_CR34","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1038\/srep30308","volume":"6","author":"Y Cong","year":"2016","unstructured":"Cong Y, Chan Y-B, Ragan MA. A novel alignment-free method for detection of lateral genetic transfer based on tf-idf. Sci Rep. 2016;6(1):1\u201313.","journal-title":"Sci Rep"},{"key":"943_CR35","doi-asserted-by":"publisher","first-page":"321","DOI":"10.1613\/jair.953","volume":"16","author":"NV Chawla","year":"2002","unstructured":"Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. Smote: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321\u201357.","journal-title":"J Artif Intell Res"},{"key":"943_CR36","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.knosys.2018.06.019","volume":"160","author":"Y Li","year":"2018","unstructured":"Li Y, Guo H, Zhang Q, Gu M, Yang J. Imbalanced text sentiment classification using universal and domain-specific knowledge. Knowl Based Syst. 2018;160:1\u201315.","journal-title":"Knowl Based Syst"},{"key":"943_CR37","first-page":"878","volume-title":"International conference on intelligent computing","author":"H Han","year":"2005","unstructured":"Han H, Wang W-Y, Mao B-H. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: Huang DS, editor. International conference on intelligent computing. Cham: Springer; 2005. p. 878\u201387."},{"issue":"1","key":"943_CR38","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1109\/TSMCB.2008.2002909","volume":"39","author":"Y Tang","year":"2008","unstructured":"Tang Y, Zhang Y.-Q, Chawla N.V, Krasser S. Svms modeling for highly imbalanced classification. IEEE Trans Syst Man Cybern Part B (Cybernetics). 2008;39(1):281\u20138.","journal-title":"IEEE Trans Syst Man Cybern Part B (Cybernetics)"},{"key":"943_CR39","first-page":"1322","volume-title":"2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence)","author":"H He","year":"2008","unstructured":"He H, Bai Y, Garcia EA, Li S. Adasyn: adaptive synthetic sampling approach for imbalanced learning. In: He H, editor. 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence). Hong Kong: IEEE; 2008. p. 1322\u20138."},{"key":"943_CR40","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1016\/j.ins.2018.06.056","volume":"465","author":"G Douzas","year":"2018","unstructured":"Douzas G, Bacao F, Last F. Improving imbalanced learning through a heuristic oversampling method based on k-means and smote. Inform Sci. 2018;465:1\u201320.","journal-title":"Inform Sci"},{"issue":"1","key":"943_CR41","doi-asserted-by":"publisher","first-page":"46","DOI":"10.11591\/ijeecs.v12.i1.pp46-50","volume":"12","author":"MA Fauzi","year":"2018","unstructured":"Fauzi MA. Random forest approach for sentiment analysis in Indonesian. Indonesian J Elect Eng Comput Sci. 2018;12(1):46\u201350.","journal-title":"Indonesian J Elect Eng Comput Sci"},{"issue":"2","key":"943_CR42","doi-asserted-by":"publisher","first-page":"149","DOI":"10.1007\/s10796-008-9131-2","volume":"12","author":"R Yuan","year":"2010","unstructured":"Yuan R, Li Z, Guan X, Xu L. An SVM-based machine learning method for accurate internet traffic classification. Inform Syst Front. 2010;12(2):149\u201356.","journal-title":"Inform Syst Front"},{"key":"943_CR43","doi-asserted-by":"publisher","DOI":"10.1016\/j.knosys.2019.06.032","volume":"187","author":"Y Chen","year":"2020","unstructured":"Chen Y, Hu X, Fan W, Shen L, Zhang Z, Liu X, Du J, Li H, Chen Y, Li H. Fast density peak clustering for large scale data based on KNN. Knowl Based Syst. 2020;187: 104824.","journal-title":"Knowl Based Syst"},{"issue":"5\u20136","key":"943_CR44","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1016\/S1532-0464(03)00034-0","volume":"35","author":"S Dreiseitl","year":"2002","unstructured":"Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 2002;35(5\u20136):352\u20139.","journal-title":"J Biomed Inform"},{"key":"943_CR45","first-page":"46","volume-title":"2017 International conference control electronics, renewable energy and communications (ICCREC)","author":"W Ramadhan","year":"2017","unstructured":"Ramadhan W, Novianty SA, Setianingsih SC. Sentiment analysis using multinomial logistic regression. In: Ramadhan W, editor. 2017 International conference control electronics, renewable energy and communications (ICCREC). Yogyakarta: IEEE; 2017. p. 46\u20139."},{"issue":"4","key":"943_CR46","doi-asserted-by":"publisher","first-page":"2094","DOI":"10.21275\/v5i4.NOV162954","volume":"5","author":"H Sharma","year":"2016","unstructured":"Sharma H, Kumar S. A survey on decision tree algorithms of classification in data mining. Int J Sci Res (IJSR). 2016;5(4):2094\u20137.","journal-title":"Int J Sci Res (IJSR)"},{"issue":"23","key":"943_CR47","doi-asserted-by":"publisher","first-page":"5077","DOI":"10.3390\/s19235077","volume":"19","author":"S Chen","year":"2019","unstructured":"Chen S, Shen B, Wang X, Yoo S-J. A strong machine learning classifier and decision stumps based hybrid adaboost classification algorithm for cognitive radios. Sensors. 2019;19(23):5077.","journal-title":"Sensors"},{"key":"943_CR48","doi-asserted-by":"publisher","first-page":"523","DOI":"10.7717\/peerj-cs.523","volume":"7","author":"A Alhudhaif","year":"2021","unstructured":"Alhudhaif A. A novel multi-class imbalanced eeg signals classification based on the adaptive synthetic sampling (adasyn) approach. PeerJ Comput Sci. 2021;7:523.","journal-title":"PeerJ Comput Sci"},{"issue":"24","key":"943_CR49","doi-asserted-by":"publisher","first-page":"9019","DOI":"10.3390\/app10249019","volume":"10","author":"A Rodr\u00edguez-Gonz\u00e1lez","year":"2020","unstructured":"Rodr\u00edguez-Gonz\u00e1lez A, Tu\u00f1as JM, Prieto Santamar\u00eda L, Fern\u00e1ndez Peces-Barba D, Menasalvas Ruiz E, Jaramillo A, Cotarelo M, Conejo Fern\u00e1ndez AJ, Arce A, Gil A. Identifying polarity in tweets from an imbalanced dataset about diseases and vaccines using a meta-model based on machine learning techniques. Appl Sci. 2020;10(24):9019.","journal-title":"Appl Sci"},{"issue":"2","key":"943_CR50","doi-asserted-by":"publisher","first-page":"595","DOI":"10.33395\/sinkron.v8i2.12214","volume":"8","author":"F.G Mahmud","year":"2023","unstructured":"Mahmud F.G, Hermanto T.I, Nugroho I.M. Implementation of k-nearest neighbor algorithm with smote for hotel reviews sentiment analysis. Sinkron. 2023;8(2):595\u2013602.","journal-title":"Sinkron"},{"issue":"2","key":"943_CR51","first-page":"279","volume":"8","author":"K Aditya","year":"2023","unstructured":"Aditya K, Wicaksono GW, Heryawan HAS, Aditya CSK. Sentiment analysis of the 2024 presidential candidates using smote and long short term memory. J Inform. 2023;8(2):279\u201386.","journal-title":"J Inform"},{"key":"943_CR52","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1016\/B978-0-443-22009-8.00004-5","volume-title":"Computational intelligence methods for sentiment analysis in natural language processing applications","author":"P Lavanya","year":"2024","unstructured":"Lavanya P, Sasikala E. Enhanced performance of drug review classification from social networks by improved adasyn training and natural language processing techniques. In: Hemanth DJ, editor. Computational intelligence methods for sentiment analysis in natural language processing applications. Amsterdam: Elsevier; 2024. p. 111\u201327."}],"container-title":["Journal of Big Data"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-00943-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s40537-024-00943-4\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s40537-024-00943-4.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,17]],"date-time":"2024-06-17T15:04:15Z","timestamp":1718636655000},"score":1,"resource":{"primary":{"URL":"https:\/\/journalofbigdata.springeropen.com\/articles\/10.1186\/s40537-024-00943-4"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,17]]},"references-count":52,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2024,12]]}},"alternative-id":["943"],"URL":"https:\/\/doi.org\/10.1186\/s40537-024-00943-4","relation":{},"ISSN":["2196-1115"],"issn-type":[{"value":"2196-1115","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,6,17]]},"assertion":[{"value":"27 February 2024","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"31 May 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"17 June 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare no competing interest.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"87"}}