{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T10:03:14Z","timestamp":1769162594988,"version":"3.49.0"},"reference-count":45,"publisher":"Wiley","issue":"1","license":[{"start":{"date-parts":[[2021,1,28]],"date-time":"2021-01-28T00:00:00Z","timestamp":1611792000000},"content-version":"vor","delay-in-days":27,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["onlinelibrary.wiley.com"],"crossmark-restriction":true},"short-container-title":["Complexity"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>Virtual screening is the most critical process in drug discovery, and it relies on machine learning to facilitate the screening process. It enables the discovery of molecules that bind to a specific protein to form a drug. Despite its benefits, virtual screening generates enormous data and suffers from drawbacks such as high dimensions and imbalance. This paper tackles data imbalance and aims to improve virtual screening accuracy, especially for a minority dataset. For a dataset identified without considering the data\u2019s imbalanced nature, most classification methods tend to have high predictive accuracy for the majority category. However, the accuracy was significantly poor for the minority category. The paper proposes a <jats:italic>K<\/jats:italic>\u2010mean algorithm coupled with Synthetic Minority Oversampling Technique (SMOTE) to overcome the problem of imbalanced datasets. The proposed algorithm is named as KSMOTE. Using KSMOTE, minority data can be identified at high accuracy and can be detected at high precision. A large set of experiments were implemented on Apache Spark using numeric PaDEL and fingerprint descriptors. The proposed solution was compared to both no\u2010sampling method and SMOTE on the same datasets. Experimental results showed that the proposed solution outperformed other methods.<\/jats:p>","DOI":"10.1155\/2021\/6675279","type":"journal-article","created":{"date-parts":[[2021,1,29]],"date-time":"2021-01-29T03:20:10Z","timestamp":1611890410000},"update-policy":"https:\/\/doi.org\/10.1002\/crossmark_policy","source":"Crossref","is-referenced-by-count":16,"title":["Handling Imbalance Classification Virtual Screening Big Data Using Machine Learning Algorithms"],"prefix":"10.1155","volume":"2021","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3674-782X","authenticated-orcid":false,"given":"Sahar K.","family":"Hussin","sequence":"first","affiliation":[]},{"given":"Salah M.","family":"Abdelmageid","sequence":"additional","affiliation":[]},{"given":"Adel","family":"Alkhalil","sequence":"additional","affiliation":[]},{"given":"Yasser M.","family":"Omar","sequence":"additional","affiliation":[]},{"given":"Mahmoud I.","family":"Marie","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0281-9381","authenticated-orcid":false,"given":"Rabie A.","family":"Ramadan","sequence":"additional","affiliation":[]}],"member":"311","published-online":{"date-parts":[[2021,1,28]]},"reference":[{"key":"e_1_2_9_1_2","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-15-S16-S2"},{"key":"e_1_2_9_2_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pcbi.1000397"},{"key":"e_1_2_9_3_2","doi-asserted-by":"publisher","DOI":"10.1128\/MMBR.66.1.39-63.2002"},{"key":"e_1_2_9_4_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10822-006-9096-5"},{"key":"e_1_2_9_5_2","unstructured":"NIH PubChem 2020 https:\/\/pubchem.ncbi.nlm.nih.gov\/."},{"key":"e_1_2_9_6_2","doi-asserted-by":"publisher","DOI":"10.1016\/S0031-3203(02)00257-1"},{"key":"e_1_2_9_7_2","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-401"},{"key":"e_1_2_9_8_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.drudis.2010.10.003"},{"key":"e_1_2_9_9_2","doi-asserted-by":"publisher","DOI":"10.1007\/s005210170010"},{"key":"e_1_2_9_10_2","doi-asserted-by":"publisher","DOI":"10.1111\/j.0824-7935.2004.t01-1-00228.x"},{"key":"e_1_2_9_11_2","doi-asserted-by":"publisher","DOI":"10.1021\/ci4000536"},{"key":"e_1_2_9_12_2","doi-asserted-by":"publisher","DOI":"10.3233\/IDA-2002-6504"},{"key":"e_1_2_9_13_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.1199"},{"key":"e_1_2_9_14_2","first-page":"162","article-title":"Determination of minimum lethal time of commonly used mosquito larvicides","volume":"16","author":"Verma K.","year":"1984","journal-title":"The Journal of Communicable Diseases"},{"key":"e_1_2_9_15_2","volume-title":"Big Data Analysis for Bioinformatics and Biomedical Discoveries","author":"Ye S. Q.","year":"2015"},{"key":"e_1_2_9_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.ins.2014.03.043"},{"key":"e_1_2_9_17_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2009.02.048"},{"key":"e_1_2_9_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-89743-1_22"},{"key":"e_1_2_9_19_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.eswa.2011.12.043"},{"key":"e_1_2_9_20_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aca.2007.04.009"},{"key":"e_1_2_9_21_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10822-018-0116-z"},{"key":"e_1_2_9_22_2","doi-asserted-by":"publisher","DOI":"10.1021\/ci400737s"},{"key":"e_1_2_9_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-009-0198-y"},{"key":"e_1_2_9_24_2","doi-asserted-by":"publisher","DOI":"10.2174\/1389200219666181019094526"},{"key":"e_1_2_9_25_2","doi-asserted-by":"publisher","DOI":"10.1038\/nature03197"},{"key":"e_1_2_9_26_2","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0189538"},{"key":"e_1_2_9_27_2","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-1-21"},{"key":"e_1_2_9_28_2","doi-asserted-by":"publisher","DOI":"10.19101\/IJACR.2019.940150"},{"key":"e_1_2_9_29_2","doi-asserted-by":"publisher","DOI":"10.1080\/00401706.1993.10485033"},{"key":"e_1_2_9_30_2","doi-asserted-by":"publisher","DOI":"10.1613\/jair.953"},{"key":"e_1_2_9_31_2","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/btp589"},{"key":"e_1_2_9_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10822-008-9192-9"},{"key":"e_1_2_9_33_2","doi-asserted-by":"publisher","DOI":"10.3389\/fchem.2018.00362"},{"key":"e_1_2_9_34_2","doi-asserted-by":"publisher","DOI":"10.1186\/s12859-015-0610-4"},{"key":"e_1_2_9_35_2","unstructured":"Spark Apache spark is a unified analytics engine for big data processing 2020."},{"key":"e_1_2_9_36_2","unstructured":"P. AID440 Primary HTS assay for formylpeptide receptor (FPR) ligands and primary HTS counter-screen assay for formylpeptide-like-1 (FPRL1) ligands 2007 https:\/\/pubchem.ncbi.nlm.nih.gov\/bioassay\/440."},{"key":"e_1_2_9_37_2","unstructured":"P. 624202 AID qHTS assay to identify small molecule activators of BRCA1 expression 2012 https:\/\/pubchem.ncbi.nlm.nih.gov\/bioassay\/624202."},{"key":"e_1_2_9_38_2","unstructured":"P. 651820 AID qHTS assay for inhibitors of hepatitis C virus (HCV) 2012 https:\/\/pubchem.ncbi.nlm.nih.gov\/bioassay\/651820."},{"key":"e_1_2_9_39_2","unstructured":"Chem.libretexts.org Molecules and molecular compounds 2020 https:\/\/chem.libretexts.org\/Bookshelves\/General_Chemistry\/Map%3A_Chemistry_-_The_Central_Science_(Brown_et_al.)\/02._Atoms_Molecules_and_Ions\/2.6%3A_Molecules_and_Molecular_Compounds."},{"key":"e_1_2_9_40_2","doi-asserted-by":"crossref","unstructured":"WangX. LiuX. MatwinS. andJapkowiczN. Applying instance-weighted support vector machines to class imbalanced datasets Proceedings of the 2014 IEEE International Conference on Big Data (Big Data) December 2014 Washington DC USA IEEE 112\u2013118 https:\/\/doi.org\/10.1109\/BigData.2014.7004364 2-s2.0-84988264954.","DOI":"10.1109\/BigData.2014.7004364"},{"key":"e_1_2_9_41_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.molstruc.2018.07.080"},{"key":"e_1_2_9_42_2","doi-asserted-by":"crossref","unstructured":"PurnamiS. W.andTrapsilasiwiR. K. SMOTE-least square support vector machine for classification of multiclass imbalanced data Proceedings of the 9th International Conference on Machine Learning and Computing - ICMLC Febuary 2017 Singapore Singapore 107\u2013111 https:\/\/doi.org\/10.1145\/3055635.3056581 2-s2.0-85024379718.","DOI":"10.1145\/3055635.3056581"},{"key":"e_1_2_9_43_2","doi-asserted-by":"publisher","DOI":"10.1021\/ci100364a"},{"key":"e_1_2_9_44_2","doi-asserted-by":"publisher","DOI":"10.1007\/s10115-009-0198-y"},{"key":"e_1_2_9_45_2","unstructured":"TrapsilasiwiS. P. R. SMOTE-least square support vector machine for classification of multiclass imbalanced data Proceedings of the 9th International Conference on Machine Learning and Computing February 2017 Singapore ACM 107\u2013111."}],"container-title":["Complexity"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6675279.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/journals\/complexity\/2021\/6675279.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/pdf\/10.1155\/2021\/6675279","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,9]],"date-time":"2024-08-09T22:32:19Z","timestamp":1723242739000},"score":1,"resource":{"primary":{"URL":"https:\/\/onlinelibrary.wiley.com\/doi\/10.1155\/2021\/6675279"}},"subtitle":[],"editor":[{"given":"Abd E.I.-Baset","family":"Hassanien","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":45,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1155\/2021\/6675279"],"URL":"https:\/\/doi.org\/10.1155\/2021\/6675279","archive":["Portico"],"relation":{},"ISSN":["1076-2787","1099-0526"],"issn-type":[{"value":"1076-2787","type":"print"},{"value":"1099-0526","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]},"assertion":[{"value":"2020-11-28","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-02","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-01-28","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}],"article-number":"6675279"}}