{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T15:58:25Z","timestamp":1772121505334,"version":"3.50.1"},"reference-count":40,"publisher":"World Scientific Pub Co Pte Lt","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Info. Know. Mgmt."],"published-print":{"date-parts":[[2020,3]]},"abstract":"<jats:p> Machine learning (ML) is a branch of computer science that is rapidly gaining popularity within the healthcare arena due to its ability to explore large datasets to discover useful patterns that can be interpreted for decision-making and prediction. ML techniques are used for the analysis of clinical parameters and their combinations for prognosis, therapy planning and support, and patient management and wellbeing. In this research, we investigate a crucial problem associated with medical applications such as autism spectrum disorder (ASD) data imbalances in which cases are far more than just controls in the dataset. In autism diagnosis data, the number of possible instances is linked with one class, i.e. the no-ASD class is larger than the ASD class, and this may cause performance issues such as models favouring the majority class and undermining the minority class. This research experimentally measures the impact of the class imbalance issue on the performance of different classifiers on real autism datasets when various data imbalance approaches are utilised in the pre-processing phase. We employ oversampling techniques, such as the Synthetic Minority Oversampling Technique (SMOTE), and undersampling with different classifiers including Naive Bayes, RIPPER, C4.5 and Random Forest to measure the impact of these on the performance of the models derived in terms of area under the curve and other metrics. Results pinpoint that oversampling techniques are superior to undersampling techniques, at least for the toddlers\u2019 autism dataset that we consider, and suggest that further work should look at incorporating sampling techniques with feature selection to generate models that do not overfit the dataset. <\/jats:p>",
"DOI":"10.1142\/s0219649220400146","type":"journal-article","created":{"date-parts":[[2020,3,12]],"date-time":"2020-03-12T07:30:54Z","timestamp":1583998254000},"page":"2040014","source":"Crossref","is-referenced-by-count":19,"title":["Data Imbalance in Autism Pre-Diagnosis Classification Systems: An Experimental Study"],"prefix":"10.1142","volume":"19","author":[{"given":"Neda","family":"Abdelhamid","sequence":"first","affiliation":[{"name":"IT Programme, Auckland Institute of Studies, Auckland, New Zealand"}]},{"given":"Arun","family":"Padmavathy","sequence":"additional","affiliation":[{"name":"Digital Technologies, Manukau Institute of Technology, Auckland, New Zealand"}]},{"given":"David","family":"Peebles","sequence":"additional","affiliation":[{"name":"Department of Psychology, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, UK"}]},{"given":"Fadi","family":"Thabtah","sequence":"additional","affiliation":[{"name":"Digital Technologies, Manukau Institute of Technology, Auckland, New Zealand"}]},{"given":"Daymond","family":"Goulder-Horobin","sequence":"additional","affiliation":[{"name":"Digital Technologies, Manukau Institute of Technology, Auckland, New Zealand"}]}],
"member":"219","published-online":{"date-parts":[[2020,3,11]]},"reference":[{"key":"S0219649220400146BIB002","doi-asserted-by":"publisher","DOI":"10.1097\/DBP.0000000000000668"},{"key":"S0219649220400146BIB004","doi-asserted-by":"publisher","DOI":"10.1016\/j.jaac.2011.11.003"},{"key":"S0219649220400146BIB005","doi-asserted-by":"publisher","DOI":"10.1007\/s10803-007-0509-7"},{"key":"S0219649220400146BIB006","doi-asserted-by":"publisher","DOI":"10.1176\/appi.books.9780890425596"},{"key":"S0219649220400146BIB008","doi-asserted-by":"publisher","DOI":"10.1177\/1460458219888405"},{"key":"S0219649220400146BIB009","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"S0219649220400146BIB010","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"S0219649220400146BIB011","doi-asserted-by":"publisher","DOI":"10.1016\/B978-1-55860-377-6.50023-2"},{"key":"S0219649220400146BIB012","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"S0219649220400146BIB013","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1053964"},{"key":"S0219649220400146BIB014","first-page":"268","volume-title":"The Twenty-Seventh International Flairs Conference","author":"Dittman DJ","year":"2014"},{"key":"S0219649220400146BIB015","doi-asserted-by":"publisher","DOI":"10.1016\/j.neuroimage.2013.10.005"},{"key":"S0219649220400146BIB016","volume-title":"Pattern Classification and Scene Analysis","author":"Duda RO","year":"1973"},{"key":"S0219649220400146BIB017","doi-asserted-by":"publisher","DOI":"10.1109\/ICoCS.2015.7483267"},{"key":"S0219649220400146BIB018","doi-asserted-by":"publisher","DOI":"10.1111\/j.0824-7935.2004.t01-1-00228.x"},{"key":"S0219649220400146BIB019","doi-asserted-by":"publisher","DOI":"10.1007\/s40747-017-0037-9"},{"key":"S0219649220400146BIB020","first-page":"144","volume-title":"Fifteenth International Conference on Machine Learning","author":"Frank F","year":"1998"},
{"key":"S0219649220400146BIB022","doi-asserted-by":"publisher","DOI":"10.1214\/07-AOS537"},{"key":"S0219649220400146BIB023","volume-title":"Neural Networks: A Comprehensive Foundation","author":"Haykin S","year":"1999"},{"key":"S0219649220400146BIB024","first-page":"179","volume-title":"International Conference on Machine Learning","author":"Kubat M","year":"1997"},{"key":"S0219649220400146BIB025","doi-asserted-by":"publisher","DOI":"10.1016\/j.compbiomed.2010.03.005"},{"key":"S0219649220400146BIB026","doi-asserted-by":"publisher","DOI":"10.1038\/s41598-017-13945-5"},{"key":"S0219649220400146BIB027","doi-asserted-by":"publisher","DOI":"10.1016\/j.eurpsy.2013.02.005"},{"key":"S0219649220400146BIB028","unstructured":"Platt, JC  [1998] Sequential minimal optimization: A fast algorithm for training support vector machines.  Microsoft Research.  https:\/\/doi.org\/10.1.1.43.4376."},{"key":"S0219649220400146BIB029","volume-title":"C4.5: Programs for Machine Learning","author":"Quinlan JR","year":"1993"},{"key":"S0219649220400146BIB030","doi-asserted-by":"publisher","DOI":"10.7763\/IJMLC.2013.V3.307"},{"key":"S0219649220400146BIB031","doi-asserted-by":"publisher","DOI":"10.1017\/S0954579413000163"},{"key":"S0219649220400146BIB032","doi-asserted-by":"publisher","DOI":"10.23883\/IJRTER.2017.3168.0UWXM"},{"key":"S0219649220400146BIB034","doi-asserted-by":"publisher","DOI":"10.1145\/3107514.3107515"},{"key":"S0219649220400146BIB035","doi-asserted-by":"publisher","DOI":"10.1080\/17538157.2017.1399132"},{"key":"S0219649220400146BIB036","author":"Thabtah F","year":"2018","journal-title":"Health Informatics Journal"},{"key":"S0219649220400146BIB037","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmedinf.2018.06.009"},{"key":"S0219649220400146BIB038","author":"Thabtah F","year":"2019","journal-title":"Health Informatics Journal"},
{"key":"S0219649220400146BIB039","doi-asserted-by":"publisher","DOI":"10.1007\/s13755-019-0073-5"},{"key":"S0219649220400146BIB040","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2019.11.004"},{"key":"S0219649220400146BIB041","doi-asserted-by":"publisher","DOI":"10.1007\/s10916-019-1469-0"},{"key":"S0219649220400146BIB043","doi-asserted-by":"publisher","DOI":"10.1186\/s12938-018-0569-2"},{"issue":"4","key":"S0219649220400146BIB044","first-page":"811","volume":"25","author":"Willemsen-Swinkels SH","year":"2002","journal-title":"PlumX Metrics"},{"key":"S0219649220400146BIB045","doi-asserted-by":"publisher","DOI":"10.1142\/S0219622006002258"},{"issue":"5","key":"S0219649220400146BIB46","first-page":"1017","volume":"34","author":"Zhuoyuan Z","year":"2015","journal-title":"Computing and Informatics"}],"container-title":["Journal of Information & Knowledge Management"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219649220400146","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,5,13]],"date-time":"2020-05-13T08:27:42Z","timestamp":1589358462000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219649220400146"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3]]},"references-count":40,"journal-issue":{"issue":"01","published-print":{"date-parts":[[2020,3]]}},"alternative-id":["10.1142\/S0219649220400146"],"URL":"https:\/\/doi.org\/10.1142\/s0219649220400146","relation":{},"ISSN":["0219-6492","1793-6926"],"issn-type":[{"value":"0219-6492","type":"print"},{"value":"1793-6926","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3]]}}}