{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T15:24:06Z","timestamp":1777562646672,"version":"3.51.4"},"reference-count":70,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T00:00:00Z","timestamp":1651190400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"ICT Division, Government of the People\u2019s Republic of Bangladesh"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Asian Low-Resour. Lang. Inf. Process."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>This article addresses the class imbalance issue in a low-resource language called Bengali. As a use case, we choose one of the most fundamental NLP tasks, i.e., text classification, where we utilize three benchmark text corpora: a fake-news dataset, a sentiment analysis dataset, and a song lyrics dataset. Each of them has a severe class imbalance. We attempt to tackle the problem by applying several strategies, including data augmentation with synthetic samples via text and embedding generation to increase the proportion of minority samples. Moreover, we apply ensembling of deep learning models by subsetting the majority samples. Additionally, we employ the focal loss function for class-imbalanced data classification. We also apply outlier detection, data resampling, and hidden feature extraction to improve the minority F1 score. All of our experiments focus entirely on textual content analysis and yield a minority F1 score of more than <jats:bold>90%<\/jats:bold> for each of the three tasks. 
It is an excellent outcome on such highly class-imbalanced datasets.<\/jats:p>","DOI":"10.1145\/3511601","type":"journal-article","created":{"date-parts":[[2022,2,3]],"date-time":"2022-02-03T17:54:50Z","timestamp":1643910890000},"page":"1-21","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":11,"title":["Breaking the Curse of Class Imbalance: Bangla Text Classification"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3089-5277","authenticated-orcid":false,"given":"Md.","family":"Rafi-Ur-Rashid","sequence":"first","affiliation":[{"name":"Bangladesh University of Engineering & Technology (BUET), Bangladesh, and United International University, Dhaka, Bangladesh"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1343-7493","authenticated-orcid":false,"given":"Mahim","family":"Mahbub","sequence":"additional","affiliation":[{"name":"Bangladesh University of Engineering & Technology (BUET), Bangladesh, and United International University, Dhaka, Bangladesh"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3219-9053","authenticated-orcid":false,"given":"Muhammad Abdullah","family":"Adnan","sequence":"additional","affiliation":[{"name":"Bangladesh University of Engineering & Technology (BUET), Bangladesh, and United International University, Dhaka, Bangladesh"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,4,29]]},"reference":[{"key":"e_1_3_1_2_1","doi-asserted-by":"crossref","first-page":"237","DOI":"10.1007\/978-3-319-14142-8_8","volume-title":"Data Mining","author":"Aggarwal Charu C.","year":"2015","unstructured":"Charu C. Aggarwal. 2015. Outlier analysis. In Data Mining. 
Springer, 237\u2013263."},{"key":"e_1_3_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/375663.375668"},{"key":"e_1_3_1_4_1","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-69155-8_9"},{"key":"e_1_3_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/3295662"},{"key":"e_1_3_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-3623"},{"key":"e_1_3_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.299"},{"key":"e_1_3_1_8_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/K16-1002"},{"key":"e_1_3_1_9_1","article-title":"A survey of predictive modelling under imbalanced distributions","volume":"1505","author":"Branco Paula","year":"2015","unstructured":"Paula Branco, L. Torgo, and Rita P. Ribeiro. 2015. A survey of predictive modelling under imbalanced distributions. ArXiv abs\/1505.01658 (2015).","journal-title":"ArXiv"},{"key":"e_1_3_1_10_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1018054314350"},{"key":"e_1_3_1_11_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00117832"},{"key":"e_1_3_1_12_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"e_1_3_1_13_1","article-title":"Large scale GAN training for high fidelity natural image synthesis","author":"Brock Andrew","year":"2018","unstructured":"Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018).","journal-title":"arXiv preprint arXiv:1809.11096"},{"key":"e_1_3_1_14_1","doi-asserted-by":"crossref","first-page":"124","DOI":"10.1007\/978-3-642-12127-2_13","volume-title":"Multiple Classifier Systems","author":"Brown Gavin","year":"2010","unstructured":"Gavin Brown and Ludmila I. Kuncheva. 2010. \u201cGood\u201d and \u201cbad\u201d diversity in majority vote ensembles. In Multiple Classifier Systems, Neamat El Gayar, Josef Kittler, and Fabio Roli (Eds.). 
Springer, Berlin, 124\u2013133."},{"key":"e_1_3_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/1005332.1005347"},{"key":"e_1_3_1_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"e_1_3_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/GRC.2006.1635905"},{"key":"e_1_3_1_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"e_1_3_1_19_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2008.08.010"},{"key":"e_1_3_1_20_1","first-page":"1401","volume-title":"Proceedings of the 16th International Joint Conference on Artificial Intelligence","author":"Freund Yoav","year":"1999","unstructured":"Yoav Freund and Robert E. Schapire. 1999. A short introduction to boosting. In Proceedings of the 16th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, 1401\u20131406."},{"key":"e_1_3_1_21_1","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007465528199"},{"key":"e_1_3_1_22_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2011.2161285"},{"key":"e_1_3_1_23_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCC.2011.2161285"},{"key":"e_1_3_1_24_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIIP47207.2019.8985885"},{"key":"e_1_3_1_25_1","unstructured":"Yoav Goldberg and Omer Levy. 2014. word2vec explained: Deriving Mikolov et\u00a0al.\u2019s negative-sampling word-embedding method. 
arXiv preprint arXiv:1402.3722."},{"key":"e_1_3_1_26_1","doi-asserted-by":"publisher","DOI":"10.1109\/UKRCON.2017.8100379"},{"key":"e_1_3_1_27_1","doi-asserted-by":"publisher","DOI":"10.1002\/9781118646106"},{"key":"e_1_3_1_28_1","doi-asserted-by":"publisher","DOI":"10.5555\/3382225.3382283"},{"key":"e_1_3_1_29_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_30_1","first-page":"2862","volume-title":"Proceedings of the 12th Language Resources and Evaluation Conference","author":"Hossain Md Zobaer","year":"2020","unstructured":"Md Zobaer Hossain, Md Ashraful Rahman, Md Saiful Islam, and Sudipta Kar. 2020. BanFakeNews: A dataset for detecting fake news in Bangla. In Proceedings of the 12th Language Resources and Evaluation Conference. European Language Resources Association, 2862\u20132871. https:\/\/www.aclweb.org\/anthology\/2020.lrec-1.349."},{"key":"e_1_3_1_31_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2016.580"},{"key":"e_1_3_1_32_1","unstructured":"Minhajul Islam. 2009. Research on Bangla language processing in Bangladesh: Progress and challenges. In 8th International Language and Development Conference. 23\u201325."},{"key":"e_1_3_1_33_1","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-2002-6504"},{"key":"e_1_3_1_34_1","doi-asserted-by":"publisher","DOI":"10.1186\/s40537-019-0192-5"},{"key":"e_1_3_1_35_1","doi-asserted-by":"publisher","DOI":"10.5555\/2556142"},{"key":"e_1_3_1_36_1","first-page":"3146","volume-title":"Advances in Neural Information Processing Systems 30","author":"Ke Guolin","year":"2017","unstructured":"Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Eds.). Curran Associates, Inc., 3146\u20133154. 
http:\/\/papers.nips.cc\/paper\/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf."},{"key":"e_1_3_1_37_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1181"},{"key":"e_1_3_1_38_1","unstructured":"S. Kotsiantis, D. Kanellopoulos, and P. Pintelas. 2006. Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering 30, 1 (2006), 25\u201336."},{"key":"e_1_3_1_39_1","first-page":"1097","volume-title":"Advances in Neural Information Processing Systems","author":"Krizhevsky Alex","year":"2012","unstructured":"Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097\u20131105."},{"key":"e_1_3_1_40_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-2050"},{"key":"e_1_3_1_41_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2018.8489624"},{"key":"e_1_3_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2010.03.005"},{"key":"e_1_3_1_43_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/W14-4320"},{"key":"e_1_3_1_44_1","doi-asserted-by":"publisher","DOI":"10.5555\/2390948.2390966"},{"key":"e_1_3_1_45_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICCV.2017.324"},{"key":"e_1_3_1_46_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2008.2007853"},{"key":"e_1_3_1_47_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2011.5947611"},{"key":"e_1_3_1_48_1","unstructured":"Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. 2019. Fake News Detection on Social Media using Geometric Deep Learning. arXiv:1902.06673 [cs.SI]."},{"key":"e_1_3_1_49_1","first-page":"291","volume-title":"Advances in Artificial Intelligence (AI\u201909)","author":"Nikulin Vladimir","year":"2009","unstructured":"Vladimir Nikulin, Geoffrey J. McLachlan, and Shu Kay Ng. 2009. Ensemble approach for the classification of imbalanced data. 
In Advances in Artificial Intelligence (AI\u201909), Ann Nicholson and Xiaodong Li (Eds.). Springer, Berlin, 291\u2013300."},{"key":"e_1_3_1_50_1","article-title":"Imbalance problems in object detection: A review","author":"Oksuz Kemal","year":"2020","unstructured":"Kemal Oksuz, Baris Can Cam, Sinan Kalkan, and Emre Akbas. 2020. Imbalance problems in object detection: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 10 (2020), 3388\u20133415.","journal-title":"IEEE Transactions on Pattern Analysis and Machine Intelligence"},{"key":"e_1_3_1_51_1","article-title":"Thumbs up? Sentiment classification using machine learning techniques","author":"Pang Bo","year":"2002","unstructured":"Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment classification using machine learning techniques. arXiv preprint cs\/0205070 (2002).","journal-title":"arXiv preprint cs\/0205070"},{"key":"e_1_3_1_52_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/D14-1162"},{"key":"e_1_3_1_53_1","first-page":"6638","volume-title":"Advances in Neural Information Processing Systems 31","author":"Prokhorenkova Liudmila","year":"2018","unstructured":"Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.). Curran Associates, Inc., 6638\u20136648. 
http:\/\/papers.nips.cc\/paper\/7898-catboost-unbiased-boosting-with-categorical-features.pdf."},{"key":"e_1_3_1_54_1","doi-asserted-by":"publisher","DOI":"10.1038\/tp.2015.182"},{"key":"e_1_3_1_55_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.wnut-1.8"},{"key":"e_1_3_1_56_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.650093"},{"key":"e_1_3_1_57_1","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCA.2009.2029559"},{"key":"e_1_3_1_58_1","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D17-1066"},{"key":"e_1_3_1_59_1","doi-asserted-by":"publisher","DOI":"10.1145\/3292500.3330935"},{"key":"e_1_3_1_60_1","doi-asserted-by":"publisher","DOI":"10.1145\/3137597.3137600"},{"key":"e_1_3_1_61_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.patcog.2014.11.014"},{"key":"e_1_3_1_62_1","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-1146"},{"key":"e_1_3_1_63_1","doi-asserted-by":"publisher","DOI":"10.1145\/3240508.3240526"},{"key":"e_1_3_1_64_1","doi-asserted-by":"publisher","DOI":"10.1145\/1597735.1597754"},{"key":"e_1_3_1_65_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2017.2788058"},{"key":"e_1_3_1_66_1","doi-asserted-by":"publisher","DOI":"10.1109\/IJCNN.2016.7727770"},{"key":"e_1_3_1_67_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451366"},{"key":"e_1_3_1_68_1","doi-asserted-by":"publisher","DOI":"10.1145\/3219819.3219903"},{"key":"e_1_3_1_69_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2018.8451454"},{"key":"e_1_3_1_70_1","doi-asserted-by":"publisher","DOI":"10.1145\/3289600.3291382"},{"key":"e_1_3_1_71_1","doi-asserted-by":"publisher","DOI":"10.1109\/TKDE.2006.17"}],"container-title":["ACM Transactions on Asian and Low-Resource Language Information 
Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511601","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3511601","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T17:51:04Z","timestamp":1750182664000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3511601"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,4,29]]},"references-count":70,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3511601"],"URL":"https:\/\/doi.org\/10.1145\/3511601","relation":{},"ISSN":["2375-4699","2375-4702"],"issn-type":[{"value":"2375-4699","type":"print"},{"value":"2375-4702","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,4,29]]},"assertion":[{"value":"2021-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-01-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-04-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}