{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,29]],"date-time":"2026-03-29T15:23:04Z","timestamp":1774797784669,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2022,5,18]],"date-time":"2022-05-18T00:00:00Z","timestamp":1652832000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001809","name":"NSFC","doi-asserted-by":"publisher","award":["61832001"],"award-info":[{"award-number":["61832001"]}],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Beijing Academy of Artificial Intelligence"},{"name":"PKU-Baidu Fund","award":["2019BD006"],"award-info":[{"award-number":["2019BD006"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,6,27]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>The emergence of next-generation sequencing techniques opens up tremendous opportunities for researchers to uncover the basic mechanisms of disease at the molecular level. Recently, automatic machine learning (AutoML) frameworks have been employed for genomic and epigenomic data analysis. However, to analyze those high-dimensional data, existing AutoML frameworks suffer from the following issues: (i) they could not effectively filter out the redundant features from the original data, and (ii) they usually obey the rule of feature engineering first and algorithm hyper-parameter tuning later to build the machine learning pipeline, which could lead to sub-optimal outcomes. Thus, it is an urgent need to design a new AutoML framework for high-dimensional omics data analysis.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>We introduce a new method: AutoDC, a tailored AutoML framework, for different disease classification based on gene expression data. AutoDC designs two novel optimization strategies to improve the performance. One is that AutoDC designs a novel two-stage feature selection method to select the features with high gene contribution scores. The other is that AutoDC proposes a novel optimization method, based on a two-layer Multi-Armed Bandit framework, to jointly optimize the feature engineering, algorithm selection and algorithm hyper-parameter tuning. We apply our framework to two public gene expression datasets. Compared with three state-of-the-art AutoML frameworks, AutoDC could effectively classify diseases with higher predictive accuracy.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>The data and codes of AutoDC are available at https:\/\/github.com\/dingdian110\/AutoDC. The data underlying this article are available in the article and in its online supplementary material.<\/jats:p><\/jats:sec><jats:sec><jats:title>Supplementary information<\/jats:title><jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac334","type":"journal-article","created":{"date-parts":[[2022,5,18]],"date-time":"2022-05-18T12:47:56Z","timestamp":1652878076000},"page":"3415-3421","source":"Crossref","is-referenced-by-count":7,"title":["<scp>Auto<\/scp>DC: an automatic machine learning framework for disease classification"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3325-1642","authenticated-orcid":false,"given":"Yang","family":"Bai","sequence":"first","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University , Beijing, China"}]},{"given":"Yang","family":"Li","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University , Beijing, China"}]},{"given":"Yu","family":"Shen","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University , Beijing, China"}]},{"given":"Mingyu","family":"Yang","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University , Beijing, China"}]},{"given":"Wentao","family":"Zhang","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University , Beijing, China"}]},{"given":"Bin","family":"Cui","sequence":"additional","affiliation":[{"name":"Key Laboratory of High Confidence Software Technologies (MOE), School of CS, Peking University , Beijing, China"},{"name":"Institute of Computational Social Science, Peking University (Qingdao) , Qingdao, China"}]}],"member":"286","published-online":{"date-parts":[[2022,5,18]]},"reference":[{"key":"2023041407594865300_","first-page":"139","author":"Alaa","year":"2018"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/j.patrec.2019.02.018","article-title":"Improved SVD-based initialization for nonnegative matrix factorization using low-rank correction","volume":"122","author":"Atif","year":"2019","journal-title":"Pattern Recognit. Lett"},{"key":"2023041407594865300_","first-page":"471","author":"Binder","year":"2020"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1038\/s41586-020-1969-6","article-title":"Pan-cancer analysis of whole genomes","volume":"578","author":"Campbell","year":"2020","journal-title":"Nature"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"249","DOI":"10.3892\/ol.2021.12510","article-title":"Use of four genes in exosomes as biomarkers for the identification of lung adenocarcinoma and lung squamous cell carcinoma","volume":"21","author":"Cao","year":"2021","journal-title":"Oncol. Lett"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1038\/s41576-019-0122-6","article-title":"Deep learning: new computational modelling techniques for genomics","volume":"20","author":"Eraslan","year":"2019","journal-title":"Nat. Rev. Genet"},{"key":"2023041407594865300_","first-page":"113","author":"Feurer","year":"2019"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"245","DOI":"10.1016\/j.compbiolchem.2017.10.010","article-title":"A novel feature selection for RNA-seq analysis","volume":"71","author":"Han","year":"2017","journal-title":"Comput. Biol. Chem"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s40537-020-00305-w","article-title":"Survey on categorical data for neural networks","volume":"7","author":"Hancock","year":"2020","journal-title":"J. Big Data"},{"key":"2023041407594865300_","first-page":"826","article-title":"Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA","volume":"18","author":"Kotthoff","year":"2017","journal-title":"J. Mach. Learn. Res"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"R29","DOI":"10.1186\/gb-2014-15-2-r29","article-title":"voom: precision weights unlock linear model analysis tools for RNA-seq read counts","volume":"15","author":"Law","year":"2014","journal-title":"Genome Biol"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41398-018-0234-3","article-title":"Identification and replication of RNA-seq gene network modules associated with depression severity","volume":"8","author":"Le","year":"2018","journal-title":"Transl. Psychiatry"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"250","DOI":"10.1093\/bioinformatics\/btz470","article-title":"Scaling tree-based automated machine learning to biomedical big data with a feature set selector","volume":"36","author":"Le","year":"2020","journal-title":"Bioinformatics"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11432-018-1512-0","article-title":"Logistic regression algorithm to identify candidate disease genes based on reliable protein-protein interaction network","volume":"64","author":"Lei","year":"2021","journal-title":"Sci. China Inf. Sci"},{"key":"2023041407594865300_","first-page":"4763","author":"Li","year":"2020"},{"key":"2023041407594865300_","first-page":"3209","author":"Li","year":"2021"},{"key":"2023041407594865300_","first-page":"2167","author":"Li","year":"2021"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"1795","DOI":"10.1093\/bib\/bby051","article-title":"Review of applications of high-throughput sequencing in personalized medicine: barriers and facilitators of future progress in research and clinical application","volume":"20","author":"Lightbody","year":"2019","journal-title":"Brief. Bioinform"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12885-019-5965-x","article-title":"Distinct signatures of lung cancer types: aberrant mucin O-glycosylation and compromised immune response","volume":"19","author":"Lucchetta","year":"2019","journal-title":"BMC Cancer"},{"key":"2023041407594865300_","first-page":"66","author":"Olson","year":"2016"},{"key":"2023041407594865300_","first-page":"471","author":"Parmentier","year":"2019"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1038\/s41584-020-00538-2","article-title":"Machine learning in precision medicine: lessons to learn","volume":"17","author":"Plant","year":"2020","journal-title":"Nat. Rev. Rheumatol"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"107739","DOI":"10.1016\/j.biotechadv.2021.107739","article-title":"Using machine learning approaches for multi-omics data analysis: a review","volume":"49","author":"Reel","year":"2021","journal-title":"Biotechnol. Adv"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"11973","DOI":"10.2147\/CMAR.S279974","article-title":"Comprehensive characterization of stage IIIA non-small cell lung carcinoma","volume":"12","author":"Singh","year":"2020","journal-title":"Cancer Manag. Res"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1561\/2200000068","article-title":"Introduction to Multi-Armed Bandits","volume":"12","author":"Slivkins","year":"2019","journal-title":"Found. Trends Mach. Learn."},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"101822","DOI":"10.1016\/j.artmed.2020.101822","article-title":"Automated machine learning: review of the state-of-the-art and opportunities for healthcare","volume":"104","author":"Waring","year":"2020","journal-title":"Artif. Intell. Med"},{"key":"2023041407594865300_","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/s41019-020-00140-2","article-title":"Exploiting latent semantic subspaces to derive associations for specific pharmaceutical semantics","volume":"5","author":"Wawrzinek","year":"2020","journal-title":"Data Sci. Eng"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac334\/43845916\/btac334.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/13\/3415\/49883733\/btac334.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/13\/3415\/49883733\/btac334.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,21]],"date-time":"2023-11-21T19:57:34Z","timestamp":1700596654000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/13\/3415\/6588096"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,5,18]]},"references-count":27,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2022,6,27]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac334","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,7,1]]},"published":{"date-parts":[[2022,5,18]]}}}