{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,7]],"date-time":"2026-01-07T23:38:57Z","timestamp":1767829137032,"version":"3.49.0"},"reference-count":29,"publisher":"Maximum Academic Press","license":[{"start":{"date-parts":[[2018,7,10]],"date-time":"2018-07-10T00:00:00Z","timestamp":1531180800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Automated machine classification will play a vital role in the machine learning and data mining. It is probable that each classifier will work well on some data sets and not so well in others, increasing the evaluation significance. The performance of the learning models will intensely rely on upon the characteristics of the data sets. The previous outcomes recommend that overlapping between classes and the presence of noise has the most grounded impact on the performance of learning algorithm. The class overlap problem is a critical problem in which data samples appear as valid instances of more than one class which may be responsible for the presence of noise in data sets.<\/jats:p>\n                  <jats:p>The objective of this paper is to comprehend better the data used as a part of machine learning problems so as to learn issues and to analyze the instances that are profoundly covered by utilizing new proposed overlap measures. The proposed overlap measures are Nearest Enemy Ratio, SubConcept Ratio, Likelihood Ratio and Soft Margin Ratio. To perform this experiment, we have created 438 binary classification data sets from real-world problems and computed the value of 12 data complexity metrics to find highly overlapped data sets. After that we apply measures to identify the overlapped instances and four noise filters to find the noisy instances. From results, we found that 60\u201380% overlapped instances are noisy instances in data sets by using four noise filters. We found that class overlap is a principal contributor to introduce class noise in data sets.<\/jats:p>","DOI":"10.1017\/s0269888918000115","type":"journal-article","created":{"date-parts":[[2018,7,10]],"date-time":"2018-07-10T05:58:56Z","timestamp":1531202336000},"source":"Crossref","is-referenced-by-count":18,"title":["Handling class overlapping to detect noisy instances in classification"],"prefix":"10.48130","volume":"33","author":[{"given":"Shivani","family":"Gupta","sequence":"first","affiliation":[]},{"given":"Atul","family":"Gupta","sequence":"additional","affiliation":[]}],"member":"27968","published-online":{"date-parts":[[2018,7,10]]},"reference":[{"key":"S0269888918000115_ref17","doi-asserted-by":"crossref","unstructured":"Kretzschmar R. , Karayiannis N. B. & Eggimann F. 2003. Handling class overlap with variance-controlled neural networks. In Proceedings of the International Joint Conference on Neural Networks, 2003, 1, 517\u2013522. IEEE.","DOI":"10.1109\/IJCNN.2003.1223400"},{"key":"S0269888918000115_ref26","doi-asserted-by":"crossref","unstructured":"Verbaeten S. & Van Assche A. 2003. Ensemble methods for noise elimination in classification problems. In 4th International Workshop on Multiple Classifier Systems (MCS 2003), LNCS 2709, 317\u2013325. Springer.","DOI":"10.1007\/3-540-44938-8_32"},{"key":"S0269888918000115_ref24","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(97)00035-4"},{"key":"S0269888918000115_ref19","doi-asserted-by":"crossref","unstructured":"Mollineda R. A. , Snchez J. S. & Sotoca J. M. 2005. Data characterization for effective prototype selection. In Iberian Conference on Pattern Recognition and Image Analysis, 27\u201334. Springer.","DOI":"10.1007\/11492542_4"},{"key":"S0269888918000115_ref13","doi-asserted-by":"crossref","first-page":"1263","DOI":"10.1109\/TKDE.2008.239","article-title":"Learning from imbalanced data","volume":"21","author":"He","year":"2009","journal-title":"IEEE Transactions on Knowledge and Data Engineering"},{"key":"S0269888918000115_ref10","doi-asserted-by":"publisher","DOI":"10.1023\/A:1007413511361"},{"key":"S0269888918000115_ref9","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8655(86)90066-8"},{"key":"S0269888918000115_ref8","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2012.2191953"},{"key":"S0269888918000115_ref4","doi-asserted-by":"publisher","DOI":"10.1109\/TEVC.2004.840153"},{"key":"S0269888918000115_ref1","first-page":"255","article-title":"Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework","volume":"17","author":"Alcal-Fdez","year":"2011","journal-title":"Journal of Multiple-Valued Logic and Soft Computing"},{"key":"S0269888918000115_ref12","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(99)00068-0"},{"key":"S0269888918000115_ref14","doi-asserted-by":"publisher","DOI":"10.1109\/34.824819"},{"key":"S0269888918000115_ref21","volume-title":"C4. 5: Programs for Machine Learning","author":"Quinlan","year":"2014"},{"key":"S0269888918000115_ref27","doi-asserted-by":"publisher","DOI":"10.1109\/TSMC.1972.4309137"},{"key":"S0269888918000115_ref500","unstructured":"Salvador, G. & Herrera, F. 2008. An extension on statistical comparisons of classifiers over multiple data setsi for all pairwise comparisons, Journal Machine Learning Research, 9, 2677\u20132694."},{"key":"S0269888918000115_ref20","unstructured":"Orriols-Puig A. , Macia N. & Ho T. K. 2010. Documentation for the Data Complexity Library in C++ 196, Universitat Ramon Llull, La Salle."},{"key":"S0269888918000115_ref23","doi-asserted-by":"publisher","DOI":"10.1007\/s10044-007-0061-2"},{"key":"S0269888918000115_ref28","doi-asserted-by":"publisher","DOI":"10.1007\/s10462-004-0751-8"},{"key":"S0269888918000115_ref2","doi-asserted-by":"publisher","DOI":"10.1016\/j.patrec.2006.01.006"},{"key":"S0269888918000115_ref11","unstructured":"Gamberger D. , Lavrac N. & Groselj C. 1999. Experiments with noise filtering in a medical domain. In 16th International Conference on Machine Learning (ICML99), 143\u2013151."},{"key":"S0269888918000115_ref6","doi-asserted-by":"publisher","DOI":"10.1007\/BF00994018"},{"key":"S0269888918000115_ref3","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84628-172-3"},{"key":"S0269888918000115_ref5","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1613\/jair.606","article-title":"Identifying mislabeled training data","volume":"11","author":"Brodley","year":"1999","journal-title":"Journal of Artificial Intelligence Research"},{"key":"S0269888918000115_ref25","first-page":"448","article-title":"An experiment with the edited nearest-neighbor rule","volume":"6","author":"Tomek","year":"1976","journal-title":"IEEE Transactions on Systems, Man, and Cybernetics"},{"key":"S0269888918000115_ref7","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1053964"},{"key":"S0269888918000115_ref22","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-8655(02)00225-8"},{"key":"S0269888918000115_ref15","doi-asserted-by":"publisher","DOI":"10.20965\/jaciii.2010.p0297"},{"key":"S0269888918000115_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2011.09.022"},{"key":"S0269888918000115_ref16","doi-asserted-by":"crossref","first-page":"3","DOI":"10.3233\/IDA-2005-9102","article-title":"Enhancing software quality estimation using ensemble-classifier based noise filtering","volume":"9","author":"Khoshgoftaar","year":"2005","journal-title":"Intelligent Data Analysis"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888918000115","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:42:12Z","timestamp":1767624132000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888918000115\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,7,10]]},"references-count":29,"alternative-id":["S0269888918000115"],"URL":"https:\/\/doi.org\/10.1017\/s0269888918000115","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2018,7,10]]},"article-number":"e8"}}