{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,17]],"date-time":"2026-05-17T05:00:17Z","timestamp":1778994017977,"version":"3.51.4"},"reference-count":30,"publisher":"Walter de Gruyter GmbH","issue":"1","license":[{"start":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T00:00:00Z","timestamp":1672531200000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,5,26]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>The rapid development of technology allows people to obtain a large amount of data, which contains important information and various noises. How to obtain useful knowledge from data is the most important thing at this stage of machine learning (ML). The problem of unbalanced classification is currently an important topic in the field of data mining and ML. At present, this problem has attracted more and more attention and is a relatively new challenge for academia and industry. The problem of unbalanced classification involves classifying data when there is insufficient data or severe category distribution deviations. Due to the inherent complexity of unbalanced data sets, more new algorithms and tools are needed to effectively convert a large amount of raw data into useful information and knowledge. Unbalanced data set is a special case of classification problem, in which the distribution between classes is uneven, and it is difficult to classify data accurately. This article mainly introduces the research on the processing method of computer algorithms based on the processing method of unbalanced data sets based on ML, aiming to provide some ideas and directions for the processing of computer algorithms based on unbalanced data sets based on ML. 
This article proposes a research strategy for processing unbalanced data sets based on ML, including data preprocessing, decision tree data classification algorithm, and C4.5 algorithm, which are used to conduct research experiments on processing methods for unbalanced data sets based on ML. The experimental results in this article show that the accuracy rate of the decision tree C4.5 algorithm based on ML is 94.80%, which can be better used for processing unbalanced data sets based on ML.<\/jats:p>","DOI":"10.1515\/comp-2022-0273","type":"journal-article","created":{"date-parts":[[2023,5,26]],"date-time":"2023-05-26T08:11:32Z","timestamp":1685088692000},"source":"Crossref","is-referenced-by-count":7,"title":["Machine learning-based processing of unbalanced data sets for computer algorithms"],"prefix":"10.1515","volume":"13","author":[{"given":"Qingwei","family":"Zhou","sequence":"first","affiliation":[{"name":"School of Information and Engineering, Sichuan Tourism University , Chengdu 610000, Sichuan , China"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yongjun","family":"Qi","sequence":"additional","affiliation":[{"name":"Faculty of Megadata and Computing, Guangdong Baiyun University , Guangzhou 510450 Guangdong , China"},{"name":"School of Information and Communication Technology, Mongolian University of Science and Technology, Bayanzurkh District , 13341 , Ulaanbaatar , Mongolia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hailin","family":"Tang","sequence":"additional","affiliation":[{"name":"Faculty of Megadata and Computing, Guangdong Baiyun University , Guangzhou 510450 Guangdong , China"},{"name":"School of Information and Communication Technology, Mongolian University of Science and Technology, Bayanzurkh District , 13341 , Ulaanbaatar , Mongolia"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Peng","family":"Wu","sequence":"additional","affiliation":[{"name":"School of Information and Engineering, 
Sichuan Tourism University , Chengdu 610000, Sichuan , China"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"374","published-online":{"date-parts":[[2023,5,26]]},"reference":[{"key":"2023090110141792812_j_comp-2022-0273_ref_001","doi-asserted-by":"crossref","unstructured":"A. Vollant, G. Balarac, and C. Corre, \u201cSubgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures,\u201d J. Turbul., vol. 18, no. 9, pp. 1\u201325, 2017.","DOI":"10.1080\/14685248.2017.1334907"},{"key":"2023090110141792812_j_comp-2022-0273_ref_002","doi-asserted-by":"crossref","unstructured":"T. Hunt, C. Song, R. Shokri, V. Shmatikov and E. Witchel, \u201cPrivacy-preserving machine learning as a service,\u201d Proc. Priv. Enhancing Technol., vol. 2018, no. 3, pp. 123\u2013142, 2018.","DOI":"10.1515\/popets-2018-0024"},{"key":"2023090110141792812_j_comp-2022-0273_ref_003","doi-asserted-by":"crossref","unstructured":"Y. Li, H. Li, F. C. Pickard, B. Narayanan, F. Sen, M. K. Y. Chan, et al. \u201cMachine learning force field parameters from Ab initio data,\u201d J. Chem. Theory Comput., vol. 13, no. 9. pp. 4492\u20134503, 2017.","DOI":"10.1021\/acs.jctc.7b00521"},{"key":"2023090110141792812_j_comp-2022-0273_ref_004","doi-asserted-by":"crossref","unstructured":"A. Karpatne, Z. Jiang, R. R. Vatsavai, S. Shekhar and V. Kumar, \u201cMonitoring land-cover changes: A machine-learning perspective,\u201d IEEE Geosci. Remote. Sens. Mag., vol. 4, no. 2. pp. 8\u201321, 2016.","DOI":"10.1109\/MGRS.2016.2528038"},{"key":"2023090110141792812_j_comp-2022-0273_ref_005","doi-asserted-by":"crossref","unstructured":"P. Plawiak, T. Sosnicki, M. Niedzwiecki, Z. Tabor, and K. Rzecki, \u201cHand body language gesture recognition based on signals from specialized glove and machine learning algorithms,\u201d IEEE Trans. Ind. Inform., vol. 12, no. 3. pp. 
1104\u20131113, 2016.","DOI":"10.1109\/TII.2016.2550528"},{"key":"2023090110141792812_j_comp-2022-0273_ref_006","doi-asserted-by":"crossref","unstructured":"W. Yuan, K. S. Chin, M. Hua, G. Dong, and C. Wang, \u201cShape classification of wear particles by image boundary analysis using machine learning algorithms,\u201d Mech. Syst. Signal. Process, vol. 72\u201373, pp. 346\u2013358, 2016.","DOI":"10.1016\/j.ymssp.2015.10.013"},{"key":"2023090110141792812_j_comp-2022-0273_ref_007","doi-asserted-by":"crossref","unstructured":"M. E. Dickson and G. L. W. Perry, \u201cIdentifying the controls on coastal cliff landslides using machine-learning approaches,\u201d Environ. Model. & Softw., vol. 76, no. Feb, pp. 117\u2013127, 2016.","DOI":"10.1016\/j.envsoft.2015.10.029"},{"key":"2023090110141792812_j_comp-2022-0273_ref_008","doi-asserted-by":"crossref","unstructured":"G. Wang, M. Kalra, and C. G. Orton, \u201cMachine learning will transform radiology significantly within the next 5 years,\u201d Med. Phys., vol. 44, no. 6. pp. 2041\u20132044, 2017.","DOI":"10.1002\/mp.12204"},{"key":"2023090110141792812_j_comp-2022-0273_ref_009","doi-asserted-by":"crossref","unstructured":"Y. Huang, C. L. Gutterman, P. Samadi, P. B. Cho, W. Samoud, C. Ware, et al., \u201cDynamic mitigation of EDFA power excursions with machine learning,\u201d Opt. Express, vol. 25, no. 3. pp. 2245\u20132258, 2017.","DOI":"10.1364\/OE.25.002245"},{"key":"2023090110141792812_j_comp-2022-0273_ref_010","doi-asserted-by":"crossref","unstructured":"T. Liu, Y. Yang, G. B. Huang, K. Y. Yong, and Z. Lin, \u201cDriver distraction detection using semi-supervised machine learning,\u201d IEEE Trans. Intell. Transportation Syst., vol. 17, no. 4. pp. 1108\u20131120, 2016.","DOI":"10.1109\/TITS.2015.2496157"},{"key":"2023090110141792812_j_comp-2022-0273_ref_011","doi-asserted-by":"crossref","unstructured":"E. E. Tripoliti, T. G. Papadopoulos, G. S. Karanasiou, K. K. Naka, and D. I. 
Fotiadis, \u201cHeart failure: Diagnosis, severity estimation and prediction of adverse events through machine learning techniques,\u201d Computat. Struct. Biotechnol. J., vol. 15, no. C. pp. 26\u201347, 2017.","DOI":"10.1016\/j.csbj.2016.11.001"},{"key":"2023090110141792812_j_comp-2022-0273_ref_012","doi-asserted-by":"crossref","unstructured":"J. A. Gonzalez, L. A. Cheah, A. M. Gomez, P. D. Green, and E. Holdsworth, \u201cDirect speech reconstruction from articulatory sensor data by machine learning,\u201d IEEE\/ACM Trans. Audio Speech Lang. Process., vol. 25, no. 12. pp. 2362\u20132374, 2017.","DOI":"10.1109\/TASLP.2017.2757263"},{"key":"2023090110141792812_j_comp-2022-0273_ref_013","doi-asserted-by":"crossref","unstructured":"E. Giacoumidis, A. Matin, J. Wei, N. J. Doran, L. P. Barry, and X. Wang, \u201cBlind nonlinearity equalization by machine-learning-based clustering for single- and multichannel coherent optical OFDM,\u201d J. Lightwave Technol., vol. 36, no. 3. pp. 721\u2013727, 2018.","DOI":"10.1109\/JLT.2017.2778883"},{"key":"2023090110141792812_j_comp-2022-0273_ref_014","doi-asserted-by":"crossref","unstructured":"A. Linden and P. R. Yarnold, \u201cCombining machine learning and matching techniques to improve causal inference in program evaluation,\u201d J. Eval. Clin. Pract., vol. 22, no. 6. pp. 864\u2013870, 2016.","DOI":"10.1111\/jep.12592"},{"key":"2023090110141792812_j_comp-2022-0273_ref_015","doi-asserted-by":"crossref","unstructured":"J. K. Park, B. K. Kwon, J. H. Park, and D. J. Kang, \u201cMachine learning-based imaging system for surface defect inspection,\u201d Int. J. Precis. Eng. Manuf.-Green Technol., vol. 3, no. 3. pp. 303\u2013310, 2016.","DOI":"10.1007\/s40684-016-0039-x"},{"key":"2023090110141792812_j_comp-2022-0273_ref_016","doi-asserted-by":"crossref","unstructured":"A. Kashyap, L. Han, R. Yus, J. Sleeman, T. Satyapanich, S. 
Gandhi, et al., \u201cRobust semantic text similarity using LSA, machine learning, and linguistic resources,\u201d Lang. Resour. Eval., vol. 50, no. 1. pp. 125\u2013161, 2016.","DOI":"10.1007\/s10579-015-9319-2"},{"key":"2023090110141792812_j_comp-2022-0273_ref_017","doi-asserted-by":"crossref","unstructured":"L. M. Eerikinen, J. Vanschoren, M. J. Rooijakkers, R. Vullings and R. M. Aarts, \u201cReduction of false arrhythmia alarms using signal selection and machine learning,\u201d Phys. Meas., vol. 37, no. 8. pp. 1204\u20131216, 2016.","DOI":"10.1088\/0967-3334\/37\/8\/1204"},{"key":"2023090110141792812_j_comp-2022-0273_ref_018","doi-asserted-by":"crossref","unstructured":"B. Long, K. Yu, and J. Qin, \u201cData augmentation for unbalanced face recognition training sets,\u201d Neurocomputing, vol. 235, no. APR.26. pp. 10\u201314, 2017.","DOI":"10.1016\/j.neucom.2016.12.013"},{"key":"2023090110141792812_j_comp-2022-0273_ref_019","doi-asserted-by":"crossref","unstructured":"D. Yu and X. Zi-Qiang, \u201cPrediction of damage to insulation joints based on SVM with unbalanced data sets,\u201d Int. J. Multimed. Ubiquitous Eng., vol. 11, no. 3. pp. 273\u2013282, 2016.","DOI":"10.14257\/ijmue.2016.11.3.26"},{"key":"2023090110141792812_j_comp-2022-0273_ref_020","doi-asserted-by":"crossref","unstructured":"A. Werner, G. Olaf, G. Asma, K. H. Folkert, K. Zardad and L. Berthold, \u201cEnsemble pruning for glaucoma detection in an unbalanced data set,\u201d Methods Inf. Med., vol. 55, no. 6. pp. 557\u2013563, 2016.","DOI":"10.3414\/ME16-01-0055"},{"key":"2023090110141792812_j_comp-2022-0273_ref_021","unstructured":"Z. Liang, X. Li, and W. Song, \u201cResearch on speech emotion recognition algorithm for unbalanced data set,\u201d J. Intell. Fuzzy Syst., vol. 5, pp. 1\u20136, 2020."},{"key":"2023090110141792812_j_comp-2022-0273_ref_022","doi-asserted-by":"crossref","unstructured":"L. S\u00e1nchez-Guerrero, J. F. Gonz\u00e1lez, B. A. Gonz\u00e1lez-Beltr\u00e1n, and S. B. 
Gonz\u00e1lez-Brambila, \u201cEvaluating predictive techniques in educational data mining: An unbalanced data set case of study,\u201d Res. Comput. Sci., vol. 148, no. 3. pp. 49\u201360, 2019.","DOI":"10.13053\/rcs-148-3-4"},{"key":"2023090110141792812_j_comp-2022-0273_ref_023","doi-asserted-by":"crossref","unstructured":"A. Den Reijer and A. Johansson, \u201cNowcasting Swedish GDP with a large and unbalanced data set,\u201d Empir. Econ., vol. 57, no. 4. pp. 1351\u20131373, 2019.","DOI":"10.1007\/s00181-018-1500-1"},{"key":"2023090110141792812_j_comp-2022-0273_ref_024","doi-asserted-by":"crossref","unstructured":"R. Jing-Shi, P. Hai-Wei, L. Peng-Yuan, G. Lin-Lin, H. Qi-Long, Z. Zhi-Qiang, et al., \u201cSymmetry theory based classification algorithm in brain computed tomography image database,\u201d J. Med. Imaging Health Inform., vol. 6, no. 1. pp. 22\u201333, 2016.","DOI":"10.1166\/jmihi.2016.1596"},{"key":"2023090110141792812_j_comp-2022-0273_ref_025","doi-asserted-by":"crossref","unstructured":"J. Cao, W. Huang, T. Zhao, J. Wang, and R. Wang, \u201cAn enhance excavation equipments classification algorithm based on acoustic spectrum dynamic feature,\u201d Multidimension. Syst. Signal. Process., vol. 28, no. 3. pp. 921\u2013943, 2017.","DOI":"10.1007\/s11045-015-0374-z"},{"key":"2023090110141792812_j_comp-2022-0273_ref_026","doi-asserted-by":"crossref","unstructured":"A. Palacios, L. Sanchez, I. Couso, and S. Destercke, \u201cAn extension of the FURIA classification algorithm to low quality data through fuzzy rankings and its application to the early diagnosis of dyslexia,\u201d Neurocomputing, vol. 176, no. Feb. 2, pp. 60\u201371, 2016.","DOI":"10.1016\/j.neucom.2014.11.088"},{"key":"2023090110141792812_j_comp-2022-0273_ref_027","doi-asserted-by":"crossref","unstructured":"C. G. Yan, X. D. Wang, X. N. Zuo, and Y. F. Zang, \u201cDPABI: Data processing & analysis for (Resting-State) brain imaging,\u201d Neuroinformatics, vol. 14, no. 3. pp. 
339\u2013351, 2016.","DOI":"10.1007\/s12021-016-9299-4"},{"key":"2023090110141792812_j_comp-2022-0273_ref_028","doi-asserted-by":"crossref","unstructured":"C. Zhu, H. Wang, X. Liu, S. Lei, L. T. Yang, and V. C. M. Leung, \u201cA novel sensory data processing framework to integrate sensor networks with mobile cloud,\u201d IEEE Syst. J., vol. 10, no. 3. pp. 1125\u20131136, 2016.","DOI":"10.1109\/JSYST.2014.2300535"},{"key":"2023090110141792812_j_comp-2022-0273_ref_029","doi-asserted-by":"crossref","unstructured":"R. Munro, R. Lang, D. Klaes, G. Poli, C. Retscher, R. Lindstrot, et al., \u201cThe GOME-2 instrument on the Metop series of satellites: Instrument design, calibration, and level 1 data processing - An overview,\u201d Atmos. Meas. Tech., vol. 9, no. 3. pp. 1279\u20131301, 2016.","DOI":"10.5194\/amt-9-1279-2016"},{"key":"2023090110141792812_j_comp-2022-0273_ref_030","doi-asserted-by":"crossref","unstructured":"N. Corbin, E. Breton, M. de Mathelin, and Vappou J. \u201cK-space data processing for magnetic resonance elastography (MRE).\u201d Magnetic Reson. Mater. Phys. Biol. Med., vol. 30, no. 2. pp. 
1\u201311, 2017.","DOI":"10.1007\/s10334-016-0594-8"}],"container-title":["Open Computer Science"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2022-0273\/xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2022-0273\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,9,1]],"date-time":"2023-09-01T10:19:29Z","timestamp":1693563569000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.degruyter.com\/document\/doi\/10.1515\/comp-2022-0273\/html"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,1]]},"references-count":30,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,5,25]]},"published-print":{"date-parts":[[2023,5,25]]}},"alternative-id":["10.1515\/comp-2022-0273"],"URL":"https:\/\/doi.org\/10.1515\/comp-2022-0273","relation":{},"ISSN":["2299-1093"],"issn-type":[{"value":"2299-1093","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,1]]},"article-number":"20220273"}}