{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T04:38:28Z","timestamp":1777696708279,"version":"3.51.4"},"reference-count":65,"publisher":"SAGE Publications","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["IDA"],"published-print":{"date-parts":[[2024,11,15]]},"abstract":"<jats:p>PURPOSE: Crop diseases can cause significant reductions in yield, subsequently impacting a country\u2019s economy. The current research is concentrated on detecting diseases in three specific crops \u2013 tomatoes, soybeans, and mushrooms, using a real-time dataset collected for tomatoes and two publicly accessible datasets for the other crops. The primary emphasis is on employing datasets with exclusively categorical attributes, which poses a notable challenge to the research community. METHODS: After applying label encoding to the attributes, the datasets undergo four distinct preprocessing techniques to address missing values. Following this, the SMOTE-N technique is employed to tackle class imbalance. Subsequently, the pre-processed datasets are subjected to classification using three ensemble methods: bagging, boosting, and voting. To further refine the classification process, the metaheuristic Ant Lion Optimizer (ALO) is utilized for hyper-parameter tuning. RESULTS: This comprehensive approach results in the evaluation of twelve distinct models. The top two performers are then subjected to further validation using ten standard categorical datasets. The findings demonstrate that the hybrid model II-SN-OXGB, surpasses all other models as well as the current state-of-the-art in terms of classification accuracy across all thirteen categorical datasets. II utilizes the Random Forest classifier to iteratively impute missing feature values, employing a nearest features strategy. Meanwhile, SMOTE-N (SN) serves as an oversampling technique particularly for categorical attributes, again utilizing nearest neighbors. Optimized (using ALO) Xtreme Gradient Boosting OXGB, sequentially trains multiple decision trees, with each tree correcting errors from its predecessor. CONCLUSION: Consequently, the model II-SN-OXGB emerges as the optimal choice for addressing classification challenges in categorical datasets. Applying the II-SN-OXGB model to crop datasets can significantly enhance disease detection which in turn, enables the farmers to take timely and appropriate measures to prevent yield losses and mitigate the economic impact of crop diseases.<\/jats:p>","DOI":"10.3233\/ida-230651","type":"journal-article","created":{"date-parts":[[2024,4,2]],"date-time":"2024-04-02T13:58:02Z","timestamp":1712066282000},"page":"1697-1721","source":"Crossref","is-referenced-by-count":3,"title":["Processing and optimized learning for improved classification of categorical plant disease datasets"],"prefix":"10.1177","volume":"28","author":[{"given":"Ayushi","family":"Gupta","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Anuradha","family":"Chug","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Amit Prakash","family":"Singh","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","reference":[{"issue":"23","key":"10.3233\/IDA-230651_ref2","doi-asserted-by":"crossref","first-page":"e2022239118","DOI":"10.1073\/pnas.2022239118","article-title":"The persistent threat of emerging plant disease pandemics to global food security","volume":"118","author":"Ristaino","year":"2021","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"10.3233\/IDA-230651_ref3","doi-asserted-by":"crossref","first-page":"1093","DOI":"10.1016\/j.ins.2022.06.091","article-title":"Deconv-transformer (DecT): A histopathological image classification model for breast cancer based on color deconvolution and transformer architecture","volume":"608","author":"He","year":"2022","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-230651_ref4","doi-asserted-by":"crossref","first-page":"103516","DOI":"10.1016\/j.cose.2023.103516","article-title":"Dynamic multi-scale topological representation for enhancing network intrusion detection","volume":"135","author":"Zhong","year":"2023","journal-title":"Computers & Security"},{"key":"10.3233\/IDA-230651_ref5","doi-asserted-by":"crossref","first-page":"343","DOI":"10.1007\/s13042-020-01175-7","article-title":"Cross-domain sentiment aware word embeddings for review sentiment analysis","volume":"12","author":"Liu","year":"2021","journal-title":"International Journal of Machine Learning and Cybernetics"},{"key":"10.3233\/IDA-230651_ref6","doi-asserted-by":"crossref","first-page":"119110","DOI":"10.1016\/j.eswa.2022.119110","article-title":"Aliasing black box adversarial attack with joint self-attention distribution and confidence probability","volume":"214","author":"Liu","year":"2023","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/IDA-230651_ref7","doi-asserted-by":"crossref","first-page":"120519","DOI":"10.1016\/j.eswa.2023.120519","article-title":"Consistency-and dependence-guided knowledge distillation for object detection in remote sensing images","volume":"229","author":"Chen","year":"2023","journal-title":"Expert Systems with Applications"},{"issue":"4","key":"10.3233\/IDA-230651_ref8","first-page":"1","article-title":"Categorical data clustering using harmony search algorithm for healthcare datasets","volume":"13","author":"Sharma","year":"2022","journal-title":"International Journal of E-Health and Medical Communications (IJEHMC)"},{"issue":"4","key":"10.3233\/IDA-230651_ref9","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1108\/DTA-12-2020-0298","article-title":"A systematic review of machine learning-based missing value imputation techniques","volume":"55","author":"Thomas","year":"2021","journal-title":"Data Technologies and Applications"},{"key":"10.3233\/IDA-230651_ref10","doi-asserted-by":"crossref","first-page":"228","DOI":"10.1016\/j.inffus.2022.08.017","article-title":"A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research","volume":"89","author":"Santos","year":"2023","journal-title":"Information Fusion"},{"issue":"1","key":"10.3233\/IDA-230651_ref11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-022-00679-z","article-title":"Smoothing target encoding and class center-based firefly algorithm for handling missing values in categorical variable","volume":"10","author":"Nugroho","year":"2023","journal-title":"Journal of Big Data"},{"key":"10.3233\/IDA-230651_ref13","doi-asserted-by":"crossref","first-page":"70113","DOI":"10.1109\/ACCESS.2022.3187287","article-title":"The categorical data conundrum: Heuristics for classification problems \u2013 A case study on domestic fire injuries","volume":"10","author":"Reilly","year":"2022","journal-title":"IEEE Access"},{"issue":"4","key":"10.3233\/IDA-230651_ref14","doi-asserted-by":"crossref","first-page":"7","DOI":"10.5120\/ijca2017915495","article-title":"A comparative study of categorical variable encoding techniques for neural network classifiers","volume":"175","author":"Potdar","year":"2017","journal-title":"International Journal of Computer Applications"},{"issue":"4","key":"10.3233\/IDA-230651_ref15","doi-asserted-by":"crossref","first-page":"104","DOI":"10.3390\/computation8040104","article-title":"Classification of Categorical Data Based on the Chi-Square Dissimilarity and t-SNE","volume":"8","author":"Cardona","year":"2020","journal-title":"Computation"},{"issue":"15","key":"10.3233\/IDA-230651_ref16","doi-asserted-by":"crossref","first-page":"3312","DOI":"10.3390\/math11153312","article-title":"Optimizing a multi-layer perceptron based on an improved gray wolf algorithm to identify plant diseases","volume":"11","author":"Bi","year":"2023","journal-title":"Mathematics"},{"key":"10.3233\/IDA-230651_ref17","doi-asserted-by":"crossref","first-page":"101453","DOI":"10.1016\/j.jestch.2023.101453","article-title":"A binary chaotic horse herd optimization algorithm for feature selection","volume":"44","author":"Zaimo\u011flu","year":"2023","journal-title":"Engineering Science and Technology, an International Journal"},{"issue":"6","key":"10.3233\/IDA-230651_ref18","doi-asserted-by":"publisher","first-page":"3142","DOI":"10.1016\/j.eswa.2014.12.002","article-title":"Nearest neighbor classification of categorical data by attributes weighting","volume":"42","author":"Chen","year":"2015","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/IDA-230651_ref19","doi-asserted-by":"publisher","DOI":"10.1109\/DISCOVER50404.2020.9278060"},{"issue":"4","key":"10.3233\/IDA-230651_ref20","first-page":"71","article-title":"Plant disease detection for high dimensional imbalanced dataset using an enhanced decision tree approach","volume":"13","author":"Bhatia","year":"2020","journal-title":"Int J Future Gener Commun Netw"},{"key":"10.3233\/IDA-230651_ref21","doi-asserted-by":"crossref","unstructured":"A. Bhatia, A. Chug, A. Prakash\u00a0Singh and D. Singh, Investigate the Impact of Resampling Techniques on Imbalanced Datasets: A Case Study in Plant Disease Prediction, in: 2021 Thirteenth International Conference on Contemporary Computing (IC3-2021), 2021, pp.\u00a0278\u2013285.","DOI":"10.1145\/3474124.3474164"},{"key":"10.3233\/IDA-230651_ref22","doi-asserted-by":"crossref","unstructured":"K. Tutuncu, I. Cinar, R. Kursun and M. Koklu, Edible and poisonous mushrooms classification by machine learning algorithms, in: 2022 11th Mediterranean Conference on Embedded Computing (MECO), IEEE, 2022, pp.\u00a01\u20134.","DOI":"10.1109\/MECO55406.2022.9797212"},{"key":"10.3233\/IDA-230651_ref23","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: Synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"Journal of Artificial Intelligence Research"},{"key":"10.3233\/IDA-230651_ref24","doi-asserted-by":"publisher","DOI":"10.24432\/C5JG6Z"},{"key":"10.3233\/IDA-230651_ref25","doi-asserted-by":"publisher","DOI":"10.24432\/C5959T"},{"issue":"6","key":"10.3233\/IDA-230651_ref26","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1093\/bioinformatics\/17.6.520","article-title":"Missing value estimation methods for DNA microarrays","volume":"17","author":"Troyanskaya","year":"2001","journal-title":"Bioinformatics"},{"key":"10.3233\/IDA-230651_ref27","first-page":"1","article-title":"mice: Multivariate imputation by chained equations in R","volume":"45","author":"Van\u00a0Buuren","year":"2011","journal-title":"Journal of Statistical Software"},{"key":"10.3233\/IDA-230651_ref28","unstructured":"G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye and T.-Y. Liu, Lightgbm: A highly efficient gradient boosting decision tree, Advances in Neural Information Processing Systems 30 (2017)."},{"key":"10.3233\/IDA-230651_ref29","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Machine Learning"},{"issue":"1","key":"10.3233\/IDA-230651_ref30","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1016\/j.inffus.2004.04.008","article-title":"Classifier selection for majority voting","volume":"6","author":"Ruta","year":"2005","journal-title":"Information Fusion"},{"key":"10.3233\/IDA-230651_ref31","doi-asserted-by":"publisher","DOI":"10.1145\/2939672.2939785"},{"key":"10.3233\/IDA-230651_ref32","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1016\/j.advengsoft.2015.01.010","article-title":"The ant lion optimizer","volume":"83","author":"Mirjalili","year":"2015","journal-title":"Advances in Engineering Software"},{"key":"10.3233\/IDA-230651_ref33","doi-asserted-by":"publisher","DOI":"10.24432\/C5488X"},{"key":"10.3233\/IDA-230651_ref34","doi-asserted-by":"publisher","DOI":"10.24432\/C51P4M"},{"key":"10.3233\/IDA-230651_ref35","doi-asserted-by":"publisher","DOI":"10.24432\/C5HP4Z"},{"key":"10.3233\/IDA-230651_ref36","doi-asserted-by":"publisher","DOI":"10.24432\/C5JP48"},{"key":"10.3233\/IDA-230651_ref37","doi-asserted-by":"publisher","DOI":"10.24432\/C54598"},{"key":"10.3233\/IDA-230651_ref38","doi-asserted-by":"publisher","DOI":"10.24432\/C5P88W"},{"key":"10.3233\/IDA-230651_ref39","doi-asserted-by":"publisher","DOI":"10.24432\/C5WK5Q"},{"key":"10.3233\/IDA-230651_ref40","doi-asserted-by":"publisher","DOI":"10.24432\/C5P304"},{"key":"10.3233\/IDA-230651_ref41","doi-asserted-by":"publisher","DOI":"10.24432\/C5688J"},{"key":"10.3233\/IDA-230651_ref42","doi-asserted-by":"publisher","DOI":"10.24432\/C5C01P"},{"issue":"1","key":"10.3233\/IDA-230651_ref43","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/1007730.1007733","article-title":"Special issue on learning from imbalanced data sets","volume":"6","author":"Chawla","year":"2004","journal-title":"ACM SIGKDD Explorations Newsletter"},{"issue":"2","key":"10.3233\/IDA-230651_ref44","doi-asserted-by":"crossref","first-page":"1883","DOI":"10.4249\/scholarpedia.1883","article-title":"K-nearest neighbor","volume":"4","author":"Peterson","year":"2009","journal-title":"Scholarpedia"},{"key":"10.3233\/IDA-230651_ref45","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Machine Learning"},{"issue":"2","key":"10.3233\/IDA-230651_ref46","first-page":"3","article-title":"The optimality of naive Bayes","volume":"1","author":"Zhang","year":"2004","journal-title":"Aa"},{"issue":"3","key":"10.3233\/IDA-230651_ref47","doi-asserted-by":"crossref","first-page":"874","DOI":"10.2307\/2530946","article-title":"Classification and regression trees","volume":"40","author":"Gordon","year":"1984","journal-title":"Biometrics"},{"key":"10.3233\/IDA-230651_ref48","first-page":"2825","article-title":"Scikit-learn: Machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"Journal of Machine Learning Research"},{"key":"10.3233\/IDA-230651_ref49","doi-asserted-by":"crossref","first-page":"110140","DOI":"10.1016\/j.knosys.2022.110140","article-title":"Flexible learning tree augmented na\u00efve classifier and its application","volume":"260","author":"Ren","year":"2023","journal-title":"Knowledge-Based Systems"},{"issue":"1","key":"10.3233\/IDA-230651_ref51","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1007\/s13042-022-01562-2","article-title":"Fractional mega trend diffusion function-based feature extraction for plant disease prediction","volume":"14","author":"Bhatia","year":"2023","journal-title":"International Journal of Machine Learning and Cybernetics"},{"key":"10.3233\/IDA-230651_ref52","first-page":"468","article-title":"Classification of plant disease using SVM and deep learning","volume":"47","author":"Thaiyalnayaki","year":"2021","journal-title":"Materials Today: Proceedings"},{"issue":"1","key":"10.3233\/IDA-230651_ref53","first-page":"257","article-title":"Plant disease prediction using classification algorithms","volume":"10","author":"Morgan","year":"2021","journal-title":"IAES International Journal of Artificial Intelligence"},{"issue":"8","key":"10.3233\/IDA-230651_ref54","doi-asserted-by":"publisher","first-page":"5042","DOI":"10.1016\/j.asoc.2011.05.054","article-title":"Parameter tuning, feature selection and weight assignment of features for case-based reasoning by artificial immune system","volume":"11","author":"Lin","year":"2011","journal-title":"Applied Soft Computing"},{"key":"10.3233\/IDA-230651_ref56","unstructured":"T. Saw and W.M. Oo, Ranking-based feature selection with wrapper PSO search in high-dimensional data classification, IAENG International Journal of Computer Science 50(1) (2023)."},{"key":"10.3233\/IDA-230651_ref57","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1016\/j.ins.2023.01.041","article-title":"Incremental updating reduction for relation decision systems with dynamic conditional relation sets","volume":"625","author":"Su","year":"2023","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-230651_ref59","doi-asserted-by":"crossref","first-page":"119368","DOI":"10.1016\/j.ins.2023.119368","article-title":"A bi-variable precision rough set model and its application to attribute reduction","volume":"645","author":"Yu","year":"2023","journal-title":"Information Sciences"},{"issue":"8","key":"10.3233\/IDA-230651_ref60","doi-asserted-by":"crossref","first-page":"4852","DOI":"10.3390\/app13084852","article-title":"Complement-class harmonized na\u00efve bayes classifier","volume":"13","author":"Alenazi","year":"2023","journal-title":"Applied Sciences"},{"key":"10.3233\/IDA-230651_ref61","doi-asserted-by":"crossref","first-page":"119164","DOI":"10.1016\/j.eswa.2022.119164","article-title":"Semi-supervised learning with graph convolutional extreme learning machines","volume":"213","author":"Zhang","year":"2023","journal-title":"Expert Systems with Applications"},{"key":"10.3233\/IDA-230651_ref63","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1016\/j.ins.2023.03.013","article-title":"PN-GCN: Positive-negative graph convolution neural network in information system to classification","volume":"632","author":"Yu","year":"2023","journal-title":"Information Sciences"},{"key":"10.3233\/IDA-230651_ref64","unstructured":"A. Chaouki, J. Read and A. Bifet, Online Decision Tree Construction with Deep Reinforcement Learning, in: Sixteenth European Workshop on Reinforcement Learning, 2023."},{"key":"10.3233\/IDA-230651_ref65","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1016\/j.ijar.2023.01.001","article-title":"Graph neural networks induced by concept lattices for classification","volume":"154","author":"Shao","year":"2023","journal-title":"International Journal of Approximate Reasoning"},{"issue":"4864","key":"10.3233\/IDA-230651_ref68","first-page":"361","article-title":"A hybrid wrapper spider monkey optimization-simulated annealing model for optimal feature selection","volume":"2089","author":"Sahu","year":"2023","journal-title":"Int J Reconfigurable & Embedded Syst ISSN"},{"issue":"1","key":"10.3233\/IDA-230651_ref69","doi-asserted-by":"crossref","first-page":"55","DOI":"10.3390\/bdcc7010055","article-title":"Effect of missing data types and imputation methods on supervised classifiers: An evaluation study","volume":"7","author":"Gabr","year":"2023","journal-title":"Big Data and Cognitive Computing"},{"key":"10.3233\/IDA-230651_ref70","doi-asserted-by":"crossref","unstructured":"F.I. Kumiadi, A. Wulandari and S. Arifin, Feature Selection using Grey Wolf Optimization Algorithm on Light Gradient Boosting Machine, in: 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), IEEE, 2023, pp.\u00a0795\u2013799.","DOI":"10.1109\/ICCoSITE57641.2023.10127801"},{"issue":"1","key":"10.3233\/IDA-230651_ref71","doi-asserted-by":"crossref","first-page":"706","DOI":"10.1007\/s10489-022-03554-9","article-title":"Feature selection using binary monarch butterfly optimization","volume":"53","author":"Sun","year":"2023","journal-title":"Applied Intelligence"},{"key":"10.3233\/IDA-230651_ref72","doi-asserted-by":"crossref","first-page":"106520","DOI":"10.1016\/j.compbiomed.2022.106520","article-title":"A self-adaptive quantum equilibrium optimizer with artificial bee colony for feature selection","volume":"153","author":"Zhong","year":"2023","journal-title":"Computers in Biology and Medicine"},{"key":"10.3233\/IDA-230651_ref73","doi-asserted-by":"crossref","first-page":"105823","DOI":"10.1016\/j.engappai.2023.105823","article-title":"Feature construction using explanations of individual predictions","volume":"120","author":"Vouk","year":"2023","journal-title":"Engineering Applications of Artificial Intelligence"}],"container-title":["Intelligent Data Analysis"],"original-title":[],"link":[{"URL":"https:\/\/content.iospress.com\/download?id=10.3233\/IDA-230651","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T09:20:44Z","timestamp":1777454444000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.medra.org\/servlet\/aliasResolver?alias=iospress&doi=10.3233\/IDA-230651"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,11,15]]},"references-count":65,"journal-issue":{"issue":"6"},"URL":"https:\/\/doi.org\/10.3233\/ida-230651","relation":{},"ISSN":["1088-467X","1571-4128"],"issn-type":[{"value":"1088-467X","type":"print"},{"value":"1571-4128","type":"electronic"}],"subject":[],"published":{"date-parts":[[2024,11,15]]}}}