{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T03:24:28Z","timestamp":1775273068541,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"4","license":[{"start":{"date-parts":[[2020,10,15]],"date-time":"2020-10-15T00:00:00Z","timestamp":1602720000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["21405068"],"award-info":[{"award-number":["21405068"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012226","name":"Fundamental Research Funds for the Central Universities","doi-asserted-by":"publisher","award":["lzujbky-2020-sp11"],"award-info":[{"award-number":["lzujbky-2020-sp11"]}],"id":[{"id":"10.13039\/501100012226","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,20]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>In order to extract useful information from a huge amount of biological data nowadays, simple and convenient tools are urgently needed for data analysis and modeling. In this paper, an automatic data mining tool, termed as ABCModeller (Automatic Binary Classification Modeller), with a user-friendly graphical interface was developed here, which includes automated functions as data preprocessing, significant feature extraction, classification modeling, model evaluation and prediction. 
In order to enhance the generalization ability of the final model, a consistent voting method was built here in this tool with the utilization of three popular machine-learning algorithms, as artificial neural network, support vector machine and random forest. Besides, Fibonacci search and orthogonal experimental design methods were also employed here to automatically select significant features in the data space and optimal hyperparameters of the three algorithms to achieve the best model. The reliability of this tool has been verified through multiple benchmark data sets. In addition, with the advantage of a user-friendly graphical interface of this tool, users without any programming skills can easily obtain reliable models directly from original data, which can reduce the complexity of modeling and data mining, and contribute to the development of related research including but not limited to biology. The executable file of this tool can be downloaded from http:\/\/lishuyan.lzu.edu.cn\/ABCModeller.rar.<\/jats:p>","DOI":"10.1093\/bib\/bbaa247","type":"journal-article","created":{"date-parts":[[2020,9,5]],"date-time":"2020-09-05T11:12:56Z","timestamp":1599304376000},"source":"Crossref","is-referenced-by-count":3,"title":["ABCModeller: an automatic data mining tool based on a consistent voting method with a user-friendly graphical interface"],"prefix":"10.1093","volume":"22","author":[{"given":"Pengyi","family":"Zhang","sequence":"first","affiliation":[{"name":"Lanzhou University"}]},{"given":"Jiangpeng","family":"Wu","sequence":"additional","affiliation":[{"name":"Lanzhou University"}]},{"given":"Honglin","family":"Zhai","sequence":"additional","affiliation":[{"name":"Lanzhou University"}]},{"given":"Shuyan","family":"Li","sequence":"additional","affiliation":[{"name":"Lanzhou University"}]}],"member":"286","published-online":{"date-parts":[[2020,10,15]]},"reference":[{"key":"2021072112192444700_ref1","author":"NCBI. 
GenBank"},{"key":"2021072112192444700_ref2","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1093\/bib\/bbk007","article-title":"Machine learning in bioinformatics","volume":"7","author":"Larra\u00f1aga","year":"2006","journal-title":"Brief Bioinform"},{"key":"2021072112192444700_ref3","doi-asserted-by":"crossref","first-page":"1058","DOI":"10.1093\/bib\/bbz049","article-title":"Consistent gene signature of schizophrenia identified by a novel feature selection strategy from comprehensive sets of transcriptomic data","volume":"21","author":"Yang","year":"2019","journal-title":"Brief Bioinform"},{"key":"2021072112192444700_ref4","doi-asserted-by":"crossref","first-page":"1054","DOI":"10.1111\/cns.13196","article-title":"Identification of the gene signature reflecting schizophrenia's etiology by constructing artificial intelligence-based method of enhanced reproducibility","volume":"25","author":"Yang","year":"2019","journal-title":"CNS Neurosci Ther"},{"key":"2021072112192444700_ref5","doi-asserted-by":"crossref","first-page":"1437","DOI":"10.1093\/bib\/bbz081","article-title":"Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning","volume":"21","author":"Hong","year":"2019","journal-title":"Brief Bioinform"},{"key":"2021072112192444700_ref6","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbz120","article-title":"Convolutional neural network-based annotation of bacterial type IV secretion system effectors with enhanced accuracy and reduced false discovery","author":"Hong","year":"2019","journal-title":"Brief Bioinform"},{"key":"2021072112192444700_ref7","doi-asserted-by":"publisher","DOI":"10.1093\/bib\/bbaa006","article-title":"GCdiscrimination: identification of gastric cancer based on a milliliter of blood","author":"Wu","year":"2020","journal-title":"Brief 
Bioinform"},{"key":"2021072112192444700_ref8","doi-asserted-by":"crossref","first-page":"12","DOI":"10.2196\/13476","article-title":"A machine learning method for identifying lung cancer based on routine blood indices: qualitative feasibility study","volume":"7","author":"Wu","year":"2019","journal-title":"JMIR Med Inform"},{"key":"2021072112192444700_ref9","doi-asserted-by":"crossref","first-page":"4561","DOI":"10.1021\/acs.jcim.9b00678","article-title":"ATBdiscrimination: an in silico tool for identification of active tuberculosis disease based on routine blood test and T-SPOT.TB detection results","volume":"59","author":"Wu","year":"2019","journal-title":"J Chem Inf Model"},{"key":"2021072112192444700_ref10","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning","author":"Vapnik","year":"1995"},{"key":"2021072112192444700_ref11","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman","year":"2001","journal-title":"Mach Learn"},{"key":"2021072112192444700_ref12","first-page":"801","article-title":"Arcing classifiers","volume":"26","author":"Breiman","year":"1998","journal-title":"Ann Stat"},{"key":"2021072112192444700_ref13","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/S0092-8240(05)80006-0","article-title":"A logical calculus of the ideas immanent in nervous activity","volume":"52","author":"McCulloch","year":"1990","journal-title":"Bull Math Biol"},{"key":"2021072112192444700_ref14","volume-title":"Algorithms for Hyper-Parameter Optimization","author":"Bergstra","year":"2011"},{"key":"2021072112192444700_ref15","volume-title":"Practical Bayesian Optimization of Machine Learning Algorithms","author":"Snoek","year":"2012"},{"key":"2021072112192444700_ref16","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1109\/JPROC.2015.2494218","article-title":"Taking the human out of the loop: a review 
of Bayesian optimization","volume":"104","author":"Shahriari","year":"2016","journal-title":"Proc IEEE"},{"key":"2021072112192444700_ref17","article-title":"Google vizier: a service for Black-box optimization","author":"Golovin","year":"2017","journal-title":"Kdd\u201917: proceedings of the 23rd Acm sigkdd international conference on knowledge discovery and data mining"},{"key":"2021072112192444700_ref18","doi-asserted-by":"crossref","DOI":"10.1090\/S0002-9939-1953-0055639-3","article-title":"Sequential minimax search for a maximum","author":"Kiefer","year":"1953"},{"key":"2021072112192444700_ref19","volume-title":"Taguchi Methods: A Hands-On Approach","author":"Peace","year":"1993"},{"key":"2021072112192444700_ref20","first-page":"2825","article-title":"Scikit-learn: machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J Mach Learn Res"},{"key":"2021072112192444700_ref21","first-page":"1","article-title":"Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning","volume":"18","author":"Lema\u00eetre","year":"2017","journal-title":"J Mach Learn Res"},{"key":"2021072112192444700_ref22","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1016\/j.jbi.2018.07.015","article-title":"Benchmarking relief-based feature selection methods for bioinformatics data mining","volume":"85","author":"Urbanowicz","year":"2018","journal-title":"J Biomed Inform"},{"key":"2021072112192444700_ref23","doi-asserted-by":"crossref","first-page":"321","DOI":"10.1613\/jair.953","article-title":"SMOTE: synthetic minority over-sampling technique","volume":"16","author":"Chawla","year":"2002","journal-title":"J Artif Intell Res"},{"key":"2021072112192444700_ref24","first-page":"878","author":"Han","year":"2005"},{"key":"2021072112192444700_ref25","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/1756-0381-2-5","article-title":"Spatially uniform relieff (SURF) for computationally-efficient filtering 
of gene-gene interactions","volume":"2","author":"Greene","year":"2009","journal-title":"BioData Mining"},{"key":"2021072112192444700_ref26","doi-asserted-by":"crossref","first-page":"e87357","DOI":"10.1371\/journal.pone.0087357","article-title":"Mutual information between discrete and continuous data sets","volume":"9","author":"Ross","year":"2014","journal-title":"PloS One"},{"issue":"6","key":"2021072112192444700_ref27","doi-asserted-by":"crossref","first-page":"066138","DOI":"10.1103\/PhysRevE.69.066138","article-title":"Estimating mutual information","volume":"69","author":"Kraskov","year":"2004","journal-title":"Physical Review E"},{"key":"2021072112192444700_ref28","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1177\/0272989X8900900307","article-title":"Analyzing a portion of the ROC curve","volume":"9","author":"McClish","year":"1989","journal-title":"Med Decis Making"},{"key":"2021072112192444700_ref29","doi-asserted-by":"crossref","DOI":"10.1145\/1961189.1961199","article-title":"LIBSVM: a library for support vector machines","volume":"2","author":"Chang","year":"2011","journal-title":"ACM Trans Intell Syst Technol"},{"key":"2021072112192444700_ref30","first-page":"975","article-title":"Probability estimates for multi-class classification by pairwise coupling","volume":"5","author":"Wu","year":"2004","journal-title":"J Mach Learn Res"},{"key":"2021072112192444700_ref31","volume-title":"UCI Machine Learning Repository","author":"Dua","year":"2017"},{"key":"2021072112192444700_ref32","doi-asserted-by":"crossref","first-page":"9193","DOI":"10.1073\/pnas.87.23.9193","article-title":"Multisurface method of pattern separation for medical diagnosis applied to breast cytology","volume":"87","author":"Wolberg","year":"1990","journal-title":"Proc Natl Acad Sci USA"},{"key":"2021072112192444700_ref33","doi-asserted-by":"crossref","first-page":"6745","DOI":"10.1073\/pnas.96.12.6745","article-title":"Broad patterns of gene expression revealed by clustering 
analysis of tumor and normal colon tissues probed by oligonucleotide arrays","volume":"96","author":"Alon","year":"1999","journal-title":"Proc Natl Acad Sci USA"},{"key":"2021072112192444700_ref34","doi-asserted-by":"crossref","first-page":"442","DOI":"10.1016\/0005-2795(75)90109-9","article-title":"Comparison of the predicted and observed secondary structure of T4 phage lysozyme","volume":"405","author":"Matthews","year":"1975","journal-title":"Biochim Biophys Acta"},{"key":"2021072112192444700_ref35","doi-asserted-by":"crossref","first-page":"504","DOI":"10.1093\/bib\/bbx138","article-title":"Machine learning approaches to decipher hormone and HER2 receptor status phenotypes in breast cancer","volume":"20","author":"Adabor","year":"2017","journal-title":"Brief Bioinform"},{"key":"2021072112192444700_ref36","volume-title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems","author":"Mn","year":"2015"},{"key":"2021072112192444700_ref37","first-page":"6765","article-title":"Hyperband: a novel bandit-based approach to hyperparameter optimization","volume":"18","author":"Li","year":"2017","journal-title":"J Mach Learn Res"}],"container-title":["Briefings in 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/4\/bbaa247\/39136431\/bbaa247.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/bib\/article-pdf\/22\/4\/bbaa247\/39136431\/bbaa247.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,7,21]],"date-time":"2021-07-21T12:23:30Z","timestamp":1626870210000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbaa247\/5924101"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,10,15]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2021,7,20]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbaa247","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"value":"1467-5463","type":"print"},{"value":"1477-4054","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,7]]},"published":{"date-parts":[[2020,10,15]]},"article-number":"bbaa247"}}