{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,13]],"date-time":"2026-04-13T02:42:48Z","timestamp":1776048168441,"version":"3.50.1"},"reference-count":9,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2021,3,28]],"date-time":"2021-03-28T00:00:00Z","timestamp":1616889600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,10,11]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Summary<\/jats:title>\n                  <jats:p>Finding informative predictive features in high-dimensional biological case\u2013control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities. We believe the new features will enable the adoption of the algorithm for a diverse range of datasets.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>The software package for Python is available online at https:\/\/github.com\/roohy\/eps.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btab214","type":"journal-article","created":{"date-parts":[[2021,3,26]],"date-time":"2021-03-26T20:11:07Z","timestamp":1616789467000},"page":"3372-3373","source":"Crossref","is-referenced-by-count":2,"title":["EPS: automated feature selection in case\u2013control studies using extreme pseudo-sampling"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1065-7244","authenticated-orcid":false,"given":"Ruhollah","family":"Shemirani","sequence":"first","affiliation":[{"name":"Information Sciences Institute, University of Southern California, Marina del Rey , CA 90292, USA"}]},{"given":"Stephane","family":"Wenric","sequence":"additional","affiliation":[{"name":"Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai , New York, NY 10029, USA"}]},{"given":"Eimear","family":"Kenny","sequence":"additional","affiliation":[{"name":"Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai , New York, NY 10029, USA"}]},{"given":"Jos\u00e9 Luis","family":"Ambite","sequence":"additional","affiliation":[{"name":"Information Sciences Institute, University of Southern California, Marina del Rey , CA 90292, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,3,28]]},"reference":[{"key":"2023051608270513600_btab214-B1","doi-asserted-by":"crossref","first-page":"219","DOI":"10.1142\/9789813207813_0022","article-title":"A deep learning approach for cancer detection and relevant gene identification","author":"Danaee","year":"2017","journal-title":"Pacific Symposium on Biocomputing 2017"},{"key":"2023051608270513600_btab214-B2","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-15-S13-S4","article-title":"Feature selection and classifier performance on diverse biological datasets","volume":"15","author":"Hemphill","year":"2014","journal-title":"BMC Bioinformatics"},{"key":"2023051608270513600_btab214-B3","author":"Kingma","year":"2014"},{"key":"2023051608270513600_btab214-B4","author":"Kingma","year":"2013"},{"key":"2023051608270513600_btab214-B5","doi-asserted-by":"crossref","first-page":"e1001779","DOI":"10.1371\/journal.pmed.1001779","article-title":"UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age","volume":"12","author":"Sudlow","year":"2015","journal-title":"PLoS Med"},{"key":"2023051608270513600_btab214-B6","first-page":"132","article-title":"Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders","author":"Tan","year":"2014","journal-title":"Pacific Symposium on Biocomputing Co-Chairs"},{"key":"2023051608270513600_btab214-B7","doi-asserted-by":"crossref","first-page":"1113","DOI":"10.1038\/ng.2764","article-title":"The cancer genome atlas pan-cancer analysis project","volume":"45","author":"Weinstein","year":"2013","journal-title":"Nat. Genet"},{"key":"2023051608270513600_btab214-B8","doi-asserted-by":"crossref","first-page":"297","DOI":"10.3389\/fgene.2018.00297","article-title":"Using supervised learning methods for gene selection in RNA-seq case-control studies","volume":"9","author":"Wenric","year":"2018","journal-title":"Front. Genet"},{"key":"2023051608270513600_btab214-B9","article-title":"ctcRbase: the gene expression database of circulating tumor cells and microemboli","volume":"2020","author":"Zhao","year":"2020","journal-title":"J. Biol. Databases Curation"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btab214\/39309864\/btab214.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3372\/50338126\/btab214.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/19\/3372\/50338126\/btab214.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T08:40:58Z","timestamp":1684226458000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/19\/3372\/6198102"}},"subtitle":[],"editor":[{"given":"Pier","family":"Luigi Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,3,28]]},"references-count":9,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btab214","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,10,1]]},"published":{"date-parts":[[2021,3,28]]}}}