{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,10]],"date-time":"2026-01-10T09:33:10Z","timestamp":1768037590751,"version":"3.49.0"},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,8,11]],"date-time":"2020-08-11T00:00:00Z","timestamp":1597104000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,8,11]],"date-time":"2020-08-11T00:00:00Z","timestamp":1597104000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Vis. Comput. Ind. Biomed. Art"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>An imbalanced dataset is commonly found in at least one class, which are typically exceeded by the other ones. A machine learning algorithm (classifier) trained with an imbalanced dataset predicts the majority class (frequently occurring) more than the other minority classes (rarely occurring). Training with an imbalanced dataset poses challenges for classifiers; however, applying suitable techniques for reducing class imbalance issues can enhance classifiers\u2019 performance. In this study, we consider an imbalanced dataset from an educational context. Initially, we examine all shortcomings regarding the classification of an imbalanced dataset. Then, we apply data-level algorithms for class balancing and compare the performance of classifiers. The performance of the classifiers is measured using the underlying information in their confusion matrices, such as accuracy, precision, recall, and F measure. The results show that classification with an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. 
The analysis confirms that undersampling and oversampling are effective for balancing datasets, but the latter dominates.<\/jats:p>","DOI":"10.1186\/s42492-020-00055-9","type":"journal-article","created":{"date-parts":[[2020,8,11]],"date-time":"2020-08-11T00:02:52Z","timestamp":1597104172000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":23,"title":["Conversion of adverse data corpus to shrewd output using sampling metrics"],"prefix":"10.1186","volume":"3","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7637-7870","authenticated-orcid":false,"given":"Shahzad","family":"Ashraf","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sehrish","family":"Saleem","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tauqeer","family":"Ahmed","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Zeeshan","family":"Aslam","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Durr","family":"Muhammad","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2020,8,11]]},"reference":[{"key":"55_CR1","doi-asserted-by":"publisher","first-page":"71013","DOI":"10.1109\/ACCESS.2019.2915611","volume":"7","author":"MAUH Tahir","year":"2019","unstructured":"Tahir MAUH, Asghar S, Manzoor A, Noor MA (2019) A classification model for class imbalance dataset using genetic programming. IEEE Access 7:71013\u201371037. https:\/\/doi.org\/10.1109\/ACCESS.2019.2915611","journal-title":"IEEE Access"},{"key":"55_CR2","doi-asserted-by":"publisher","unstructured":"Ashraf S, Gao MS, Chen ZM, Kamran Haider S, Raza Z (2017) Efficient node monitoring mechanism in WSN using contikimac protocol. Int J Adv Comput Sci Appl 8(11). 
https:\/\/doi.org\/10.14569\/IJACSA.2017.081152","DOI":"10.14569\/IJACSA.2017.081152"},{"key":"55_CR3","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1109\/ICBDSC.2019.8645608","volume-title":"Tracking student performance in introductory programming by means of machine learning","author":"I Khan","year":"2019","unstructured":"Khan I, Al Sadiri A, Ahmad AR, Jabeur N (2019) Tracking student performance in introductory programming by means of machine learning. Paper presented at the 2019 4th MEC international conference on big data and Smart City (ICBDSC), IEEE, Muscat, pp 1\u20136. https:\/\/doi.org\/10.1109\/ICBDSC.2019.8645608"},{"issue":"5","key":"55_CR4","doi-asserted-by":"publisher","first-page":"173","DOI":"10.18196\/jrc.1535","volume":"1","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Raza A, Aslam Z, Naeem H, Ahmed T (2020) Underwater resurrection routing synergy using astucious energy pods. J Robot Control JRC 1(5):173\u2013184. https:\/\/doi.org\/10.18196\/jrc.1535","journal-title":"J Robot Control JRC"},{"issue":"1","key":"55_CR5","doi-asserted-by":"publisher","first-page":"10","DOI":"10.1145\/1656274.1656278","volume":"11","author":"M Hall","year":"2009","unstructured":"Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. ACM SIGKDD Explor Newsl 11(1):10\u201318. https:\/\/doi.org\/10.1145\/1656274.1656278","journal-title":"ACM SIGKDD Explor Newsl"},{"issue":"2","key":"55_CR6","doi-asserted-by":"publisher","first-page":"557","DOI":"10.1016\/j.patcog.2006.01.009","volume":"40","author":"JG Xie","year":"2007","unstructured":"Xie JG, Qiu ZD (2007) The effect of imbalanced data sets on LDA: a theoretical and empirical analysis. Pattern Recogn 40(2):557\u2013562. https:\/\/doi.org\/10.1016\/j.patcog.2006.01.009","journal-title":"Pattern Recogn"},{"key":"55_CR7","unstructured":"Illustration of a Tomek link imbalanced learning. 
https:\/\/imbalanced-learn.readthedocs.io\/en\/stable\/auto_examples\/under-sampling\/plot_illustration_tomek_links.html. Accessed 16 Jun 2020"},{"issue":"2","key":"55_CR8","doi-asserted-by":"publisher","first-page":"1","DOI":"10.26438\/ijsrcse\/v8i2.19","volume":"8","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Aslam Z, Yahya A, Tahir A (2020) Underwater routing protocols: analysis of intrepid link selection mechanism, challenges and strategies. Int J Sci Res Comput Sci Eng 8(2):1\u20139. https:\/\/doi.org\/10.26438\/ijsrcse\/v8i2.19","journal-title":"Int J Sci Res Comput Sci Eng"},{"issue":"1","key":"55_CR9","doi-asserted-by":"publisher","first-page":"61","DOI":"10.2478\/cait-2013-0006","volume":"13","author":"D Kabakchieva","year":"2013","unstructured":"Kabakchieva D (2013) Predicting student performance by using data mining methods for classification. Cybern Inf Technol 13(1):61\u201372. https:\/\/doi.org\/10.2478\/cait-2013-0006","journal-title":"Cybern Inf Technol"},{"key":"55_CR10","doi-asserted-by":"publisher","first-page":"1075","DOI":"10.1007\/978-1-4419-1428-6_618","volume-title":"Encyclopedia of the sciences of learning","author":"O Scheuer","year":"2012","unstructured":"Scheuer O, McLaren BM (2012) Educational data mining. In: Seel NM (ed) Encyclopedia of the sciences of learning. Springer, Boston, pp 1075\u20131079. https:\/\/doi.org\/10.1007\/978-1-4419-1428-6_618"},{"issue":"5","key":"55_CR11","doi-asserted-by":"publisher","first-page":"162","DOI":"10.46501\/IJMTST060525","volume":"6","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Arfeen ZA, Khan MA, Ahmed T (2020) SLM-OJ: surrogate learning mechanism during outbreak juncture. Int J Mod Trends Sci Technol 6(5):162\u2013167. 
https:\/\/doi.org\/10.46501\/IJMTST060525","journal-title":"Int J Mod Trends Sci Technol"},{"key":"55_CR12","doi-asserted-by":"publisher","first-page":"401","DOI":"10.1109\/ICITISEE48480.2019.9003803","volume-title":"Hybrid resampling for imbalanced class handling on web phishing classification dataset","author":"Y Pristyanto","year":"2019","unstructured":"Pristyanto Y, Dahlan A (2019) Hybrid resampling for imbalanced class handling on web phishing classification dataset. Paper presented at the 2019 4th international conference on information technology, information systems and electrical engineering (ICITISEE), IEEE, Yogyakarta, pp 401\u2013406. https:\/\/doi.org\/10.1109\/ICITISEE48480.2019.9003803"},{"issue":"2","key":"55_CR13","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1016\/j.aci.2014.03.002","volume":"12","author":"S Sasikala","year":"2016","unstructured":"Sasikala S, Appavu Alias Balamurugan S, Geetha S (2016) Multi filtration feature selection (MFFS) to improve discriminatory ability in clinical data set. Appl Comput Inform 12(2):117\u2013127. https:\/\/doi.org\/10.1016\/j.aci.2014.03.002","journal-title":"Appl Comput Inform"},{"issue":"19","key":"55_CR14","doi-asserted-by":"publisher","first-page":"14","DOI":"10.5120\/ijca2019919607","volume":"177","author":"S Fatima","year":"2019","unstructured":"Fatima S, Mahgoub S (2019) Predicting student's performance in education using data mining techniques. Int J Comput Appl 177(19):14\u201320. https:\/\/doi.org\/10.5120\/ijca2019919607","journal-title":"Int J Comput Appl"},{"key":"55_CR15","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.neunet.2018.07.011","volume":"106","author":"M Buda","year":"2018","unstructured":"Buda M, Maki A, Mazurowski MA (2018) A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 106:249\u2013259. 
https:\/\/doi.org\/10.1016\/j.neunet.2018.07.011","journal-title":"Neural Netw"},{"key":"55_CR16","doi-asserted-by":"publisher","first-page":"3526539","DOI":"10.1155\/2019\/3526539","volume":"2019","author":"WH Xie","year":"2019","unstructured":"Xie WH, Liang GQ, Dong ZH, Tan BY, Zhang BS (2019) An improved oversampling algorithm based on the samples' selection strategy for classifying imbalanced data. Math Probl Eng 2019:3526539. https:\/\/doi.org\/10.1155\/2019\/3526539","journal-title":"Math Probl Eng"},{"key":"55_CR17","doi-asserted-by":"publisher","first-page":"9625974","DOI":"10.1155\/2020\/9625974","volume":"2020","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Gao MS, Zheng MC, Ahmed T, Raza A, Naeem H (2020) USPF: underwater shrewd packet flooding mechanism through surrogate holding time. Wirel Commun Mob Comput 2020:9625974. https:\/\/doi.org\/10.1155\/2020\/9625974","journal-title":"Wirel Commun Mob Comput"},{"key":"55_CR18","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1007\/978-3-642-41822-8_33","volume-title":"An empirical study of oversampling and undersampling for instance selection methods on imbalance datasets","author":"J Hernandez","year":"2013","unstructured":"Hernandez J, Carrasco-Ochoa JA, Mart\u00ednez-Trinidad JF (2013) An empirical study of oversampling and undersampling for instance selection methods on imbalance datasets. Paper presented at the 18th Iberoamerican Congress on Pattern Recognition, Springer, Berlin, pp 262\u2013269. https:\/\/doi.org\/10.1007\/978-3-642-41822-8_33"},{"key":"55_CR19","doi-asserted-by":"publisher","first-page":"81794","DOI":"10.1109\/ACCESS.2019.2923846","volume":"7","author":"Y Liu","year":"2019","unstructured":"Liu Y, Wang YZ, Ren XG, Zhou H, Diao XC (2019) A classification method based on feature selection for imbalanced data. IEEE Access 7:81794\u201381807. 
https:\/\/doi.org\/10.1109\/ACCESS.2019.2923846","journal-title":"IEEE Access"},{"issue":"1","key":"55_CR20","doi-asserted-by":"publisher","first-page":"22","DOI":"10.13005\/ojcst13.01.02","volume":"13","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Ahmed T, Saleem S, Aslam Z (2020) Diverging mysterious in green supply chain management. Orient J Comput Sci Technol 13(1):22\u201328. https:\/\/doi.org\/10.13005\/ojcst13.01.02","journal-title":"Orient J Comput Sci Technol"},{"key":"55_CR21","doi-asserted-by":"publisher","first-page":"28100","DOI":"10.1109\/ACCESS.2019.2901860","volume":"7","author":"A Arshad","year":"2019","unstructured":"Arshad A, Riaz S, Jiao LC (2019) Semi-supervised deep fuzzy C-mean clustering for imbalanced multi-class classification. IEEE Access 7:28100\u201328112. https:\/\/doi.org\/10.1109\/ACCESS.2019.2901860","journal-title":"IEEE Access"},{"issue":"3","key":"55_CR22","doi-asserted-by":"publisher","first-page":"234","DOI":"10.3934\/ElectrEng.2020.3.234","volume":"4","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Ahmad A, Yahya A, Ahmed T (2020) Underwater routing protocols: analysis of link selection challenges. AIMS Electron Electr Eng 4(3):234\u2013248. https:\/\/doi.org\/10.3934\/ElectrEng.2020.3.234","journal-title":"AIMS Electron Electr Eng"},{"key":"55_CR23","doi-asserted-by":"publisher","first-page":"500","DOI":"10.1016\/j.procs.2015.07.372","volume":"57","author":"P Kaur","year":"2015","unstructured":"Kaur P, Singh M, Josan GS (2015) Classification and prediction based data mining algorithms to predict slow learners in education sector. Procedia Comput Sci 57:500\u2013508. 
https:\/\/doi.org\/10.1016\/j.procs.2015.07.372","journal-title":"Procedia Comput Sci"},{"issue":"1","key":"55_CR24","doi-asserted-by":"publisher","first-page":"74","DOI":"10.3390\/smartcities3010005","volume":"3","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Ahmed T, Raza A, Naeem H (2020) Design of shrewd underwater routing synergy using porous energy shells. Smart Cities 3(1):74\u201392. https:\/\/doi.org\/10.3390\/smartcities3010005","journal-title":"Smart Cities"},{"key":"55_CR25","doi-asserted-by":"publisher","first-page":"918","DOI":"10.1109\/COMPSAC.2019.00140","volume-title":"Improving prediction accuracy for logistic regression on imbalanced datasets","author":"H Zhang","year":"2019","unstructured":"Zhang H, Li ZL, Shahriar H, Tao LX, Bhattacharya P, Qian Y (2019) Improving prediction accuracy for logistic regression on imbalanced datasets. Paper presented at the 2019 IEEE 43rd annual computer software and applications conference (COMPSAC), IEEE, Milwaukee, pp 918\u2013919. https:\/\/doi.org\/10.1109\/COMPSAC.2019.00140"},{"issue":"1","key":"55_CR26","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1145\/1007730.1007735","volume":"6","author":"GEAPA Batista","year":"2004","unstructured":"Batista GEAPA, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newsl 6(1):20\u201329. https:\/\/doi.org\/10.1145\/1007730.1007735","journal-title":"ACM SIGKDD Explor Newsl"},{"issue":"2","key":"55_CR27","doi-asserted-by":"publisher","first-page":"71","DOI":"10.46565\/jreas.2020.v05i02.006","volume":"5","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Saleem S, Chohan AH, Aslam Z, Raza A (2020) Challenging strategic trends in green supply chain management. Int J Res Eng Appl Sci JREAS 5(2):71\u201374. https:\/\/doi.org\/10.46565\/jreas.2020.v05i02.006","journal-title":"Int J Res Eng Appl Sci JREAS"},{"key":"55_CR28","unstructured":"Bayesian Statistics. 
Analytics Vidhya, Jun. 20, 2016. https:\/\/www.analyticsvidhya.com\/blog\/2016\/06\/bayesian-statistics-beginners-simple-english\/. Accessed 16 Jun 2020"},{"issue":"2","key":"55_CR29","first-page":"12","volume":"10","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Saleem S, Ahmed T (2020) Sagacious communication link selection mechanism for underwater wireless sensors network. Int J Wirel Microw Technol 10(2):12\u201325","journal-title":"Int J Wirel Microw Technol"},{"issue":"4","key":"55_CR30","first-page":"126","volume":"42","author":"JF Magee","year":"1964","unstructured":"Magee JF (1964) Decision trees for decision making. Harv Bus Rev 42(4):126\u2013138","journal-title":"Harv Bus Rev"},{"issue":"1","key":"55_CR31","doi-asserted-by":"publisher","first-page":"8","DOI":"10.17352\/tcsit.000012","volume":"5","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Ahmed T (2020) Dual-nature biometric recognition epitome. Trends Comput Sci Inf Technol 5(1):8\u201314. https:\/\/doi.org\/10.17352\/tcsit.000012","journal-title":"Trends Comput Sci Inf Technol"},{"key":"55_CR32","unstructured":"Accuracy, Precision, Recall & F1 Score: interpretation of performance measures. https:\/\/blog.exsilio.com\/all\/accuracy-precision-recall-f1-score-interpretation-of-performance-measures\/. Accessed 16 Jun 2020"},{"issue":"3","key":"55_CR33","first-page":"1","volume":"8","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Yahya A, Khan MA (2020) Culminate coverage for sensor network through bodacious-instance mechanism. Manag J Wirel Commun Netw 8(3):1\u20137","journal-title":"Manag J Wirel Commun Netw"},{"issue":"2","key":"55_CR34","first-page":"7","volume":"10","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Ahmed T (2020) Machine learning shrewd approach for an imbalanced dataset conversion samples. 
J Engineering Technol 10(2):7\u201325","journal-title":"J Engineering Technol"},{"issue":"6","key":"55_CR35","doi-asserted-by":"publisher","first-page":"1104","DOI":"10.1109\/TKDE.2019.2898861","volume":"32","author":"ERQ Fernandes","year":"2020","unstructured":"Fernandes ERQ, de Carvalho ACPLF, Yao X (2020) Ensemble of classifiers based on multiobjective genetic sampling for imbalanced data. IEEE Trans Knowl Data Eng 32(6):1104\u20131115. https:\/\/doi.org\/10.1109\/TKDE.2019.2898861","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"55_CR36","doi-asserted-by":"publisher","first-page":"104814","DOI":"10.1016\/j.knosys.2019.06.022","volume":"187","author":"BS Raghuwanshi","year":"2020","unstructured":"Raghuwanshi BS, Shukla S (2020) SMOTE based class-specific extreme learning machine for imbalanced learning. Knowl-Based Syst 187:104814. https:\/\/doi.org\/10.1016\/j.knosys.2019.06.022","journal-title":"Knowl-Based Syst"},{"key":"55_CR37","first-page":"421","volume":"8","author":"S Ashraf","year":"2020","unstructured":"Ashraf S, Muhammad D, Khan MA, Ahmed T (2020) Fuzzy based efficient cosmetology paradigm. 
Int J Multidiscip Curr Res 8:421\u2013425","journal-title":"Int J Multidiscip Curr Res"}],"container-title":["Visual Computing for Industry, Biomedicine, and Art"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42492-020-00055-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s42492-020-00055-9\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s42492-020-00055-9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,10]],"date-time":"2021-08-10T23:07:13Z","timestamp":1628636833000},"score":1,"resource":{"primary":{"URL":"https:\/\/vciba.springeropen.com\/articles\/10.1186\/s42492-020-00055-9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,11]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["55"],"URL":"https:\/\/doi.org\/10.1186\/s42492-020-00055-9","relation":{},"ISSN":["2524-4442"],"issn-type":[{"value":"2524-4442","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,11]]},"assertion":[{"value":"2 June 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 July 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 August 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"On behalf of all authors, the corresponding author states that there is no conflict of interest at all.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing 
interests"}}],"article-number":"19"}}