{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T09:24:47Z","timestamp":1777713887445,"version":"3.51.4"},"reference-count":61,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,3,20]],"date-time":"2020-03-20T00:00:00Z","timestamp":1584662400000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"},{"start":{"date-parts":[[2020,3,20]],"date-time":"2020-03-20T00:00:00Z","timestamp":1584662400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100004319","name":"Pfizer","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004319","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2020,12]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Background<\/jats:title><jats:p>The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, we report a comprehensive analysis spanning prediction tasks from ulcerative colitis, atopic dermatitis, diabetes, to many cancer subtypes for a total of 24 binary and multiclass prediction problems and 26 survival analysis tasks. We systematically investigate the influence of gene subsets, normalization methods and prediction algorithms. Crucially, we also explore the novel use of deep representation learning methods on large transcriptomics compendia, such as GTEx and TCGA, to boost the performance of state-of-the-art methods. The resources and findings in this work should serve as both an up-to-date reference on attainable performance, and as a benchmarking resource for further research.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Approaches that combine large numbers of genes outperformed single gene methods consistently and with a significant margin, but neither unsupervised nor semi-supervised representation learning techniques yielded consistent improvements in out-of-sample performance across datasets. Our findings suggest that using<jats:italic>l<\/jats:italic><jats:sub>2<\/jats:sub>-regularized regression methods applied to centered log-ratio transformed transcript abundances provide the best predictive analyses overall.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusions<\/jats:title><jats:p>Transcriptomics-based phenotype prediction benefits from proper normalization techniques and state-of-the-art regularized regression approaches. In our view, breakthrough performance is likely contingent on factors which are independent of normalization and general modeling techniques; these factors might include reduction of systematic errors in sequencing data, incorporation of other data types such as single-cell sequencing and proteomics, and improved use of prior knowledge.<\/jats:p><\/jats:sec>","DOI":"10.1186\/s12859-020-3427-8","type":"journal-article","created":{"date-parts":[[2020,3,20]],"date-time":"2020-03-20T13:03:00Z","timestamp":1584709380000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":70,"title":["Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data"],"prefix":"10.1186","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5998-8637","authenticated-orcid":false,"given":"Aaron M.","family":"Smith","sequence":"first","affiliation":[]},{"given":"Jonathan R.","family":"Walsh","sequence":"additional","affiliation":[]},{"given":"John","family":"Long","sequence":"additional","affiliation":[]},{"given":"Craig B.","family":"Davis","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Henstock","sequence":"additional","affiliation":[]},{"given":"Martin R.","family":"Hodge","sequence":"additional","affiliation":[]},{"given":"Mateusz","family":"Maciejewski","sequence":"additional","affiliation":[]},{"given":"Xinmeng Jasmine","family":"Mu","sequence":"additional","affiliation":[]},{"given":"Stephen","family":"Ra","sequence":"additional","affiliation":[]},{"given":"Shanrong","family":"Zhao","sequence":"additional","affiliation":[]},{"given":"Daniel","family":"Ziemek","sequence":"additional","affiliation":[]},{"given":"Charles K.","family":"Fisher","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,3,20]]},"reference":[{"key":"3427_CR1","doi-asserted-by":"publisher","first-page":"277","DOI":"10.1007\/978-1-4939-7493-1_14","volume":"1711","author":"NS Madhukar","year":"2018","unstructured":"Madhukar NS, Elemento O. Bioinformatics Approaches to Predict Drug Responses from Genomic Sequencing. Methods Mol Biol (Clifton, N.J.) 2018; 1711:277\u201396. https:\/\/doi.org\/10.1007\/978-1-4939-7493-1-14.","journal-title":"Methods Mol Biol (Clifton, N.J.)"},{"issue":"9","key":"3427_CR2","doi-asserted-by":"publisher","first-page":"888","DOI":"10.1038\/nbt.3000","volume":"32","author":"S Li","year":"2014","unstructured":"Li S, \u0141abaj PP, Zumbo P, Sykacek P, Shi W, Shi L, Phan J, Wu P-Y, Wang M, Wang C, Thierry-Mieg D, Thierry-Mieg J, Kreil DP, Mason CE. Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat Biotechnol. 2014; 32(9):888\u201395. https:\/\/doi.org\/10.1038\/nbt.3000.","journal-title":"Nat Biotechnol"},{"issue":"11","key":"3427_CR3","doi-asserted-by":"publisher","first-page":"1015","DOI":"10.1038\/nbt.2702","volume":"31","author":"PAC t\u0301 Hoen","year":"2013","unstructured":"t\u0301 Hoen PAC, et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat Biotechnol. 2013; 31(11):1015\u201322. https:\/\/doi.org\/10.1038\/nbt.2702.","journal-title":"Nat Biotechnol"},{"issue":"141","key":"3427_CR4","doi-asserted-by":"publisher","first-page":"20170387","DOI":"10.1098\/rsif.2017.0387","volume":"15","author":"Ching","year":"2018","unstructured":"Ching, et al. Opportunities and obstacles for deep learning in biology and medicine. J R Soc Interface. 2018; 15(141):20170387. https:\/\/doi.org\/10.1098\/rsif.2017.0387.","journal-title":"J R Soc Interface"},{"issue":"5","key":"3427_CR5","doi-asserted-by":"publisher","first-page":"1445","DOI":"10.1021\/acs.molpharmaceut.5b00982","volume":"13","author":"P Mamoshina","year":"2016","unstructured":"Mamoshina P, Vieira A, Putin E, Zhavoronkov A. Applications of Deep Learning in Biomedicine. Mol Pharm. 2016; 13(5):1445\u201354. https:\/\/doi.org\/10.1021\/acs.molpharmaceut.5b00982.","journal-title":"Mol Pharm"},{"issue":"1","key":"3427_CR6","doi-asserted-by":"publisher","first-page":"449","DOI":"10.1186\/1471-2105-12-449","volume":"12","author":"AC Frazee","year":"2011","unstructured":"Frazee AC, Langmead B, Leek JT. ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets. BMC Bioinformatics. 2011; 12(1):449. https:\/\/doi.org\/10.1186\/1471-2105-12-449.","journal-title":"BMC Bioinformatics"},{"key":"3427_CR7","doi-asserted-by":"publisher","first-page":"319","DOI":"10.1038\/nbt.3838","volume":"35","author":"L Collado-Torres","year":"2017","unstructured":"Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT. Reproducible RNA-seq analysis using recount2. Nat Biotechnol. 2017; 35:319\u201321. https:\/\/doi.org\/10.1038\/nbt.3838.","journal-title":"Nat Biotechnol"},{"issue":"1","key":"3427_CR8","doi-asserted-by":"publisher","first-page":"1366","DOI":"10.1038\/s41467-018-03751-6","volume":"9","author":"A Lachmann","year":"2018","unstructured":"Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma\u2019ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nat Commun. 2018; 9(1):1366. https:\/\/doi.org\/10.1038\/s41467-018-03751-6.","journal-title":"Nat Commun"},{"issue":"9","key":"3427_CR9","doi-asserted-by":"publisher","first-page":"54","DOI":"10.1093\/nar\/gky102","volume":"46","author":"SE Ellis","year":"2018","unstructured":"Ellis SE, Collado-Torres L, Jaffe A, Leek JT. Improving the value of public RNA-seq expression data by phenotype prediction. Nucleic Acids Res. 2018; 46(9):54. https:\/\/doi.org\/10.1093\/nar\/gky102.","journal-title":"Nucleic Acids Res"},{"issue":"16","key":"3427_CR10","doi-asserted-by":"publisher","first-page":"0","DOI":"10.1186\/s12859-016-1311-3","volume":"17","author":"M G\u00f6nen","year":"2016","unstructured":"G\u00f6nen M. Integrating gene set analysis and nonlinear predictive modeling of disease phenotypes using a Bayesian multitask formulation. BMC Bioinformatics. 2016; 17(16):0. https:\/\/doi.org\/10.1186\/s12859-016-1311-3.","journal-title":"BMC Bioinformatics"},{"issue":"43","key":"3427_CR11","doi-asserted-by":"publisher","first-page":"15545","DOI":"10.1073\/pnas.0506580102","volume":"102","author":"A Subramanian","year":"2005","unstructured":"Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci. 2005; 102(43):15545\u201350. https:\/\/doi.org\/10.1073\/pnas.0506580102.","journal-title":"Proc Natl Acad Sci"},{"issue":"1","key":"3427_CR12","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"M Ashburner","year":"2000","unstructured":"Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene Ontology: Tool for the unification of biology. Nat Genet. 2000; 25(1):25\u201329. https:\/\/doi.org\/10.1038\/75556.","journal-title":"Nat Genet"},{"issue":"1","key":"3427_CR13","doi-asserted-by":"publisher","first-page":"1237","DOI":"10.1038\/s41598-018-19635-0","volume":"8","author":"K Zarringhalam","year":"2018","unstructured":"Zarringhalam K, Degras D, Brockel C, Ziemek D. Robust phenotype prediction from gene expression data using differential shrinkage of co-regulated genes. Sci Rep. 2018; 8(1):1237. https:\/\/doi.org\/10.1038\/s41598-018-19635-0.","journal-title":"Sci Rep"},{"issue":"12","key":"3427_CR14","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1093\/bioinformatics\/btu272","volume":"30","author":"K Zarringhalam","year":"2014","unstructured":"Zarringhalam K, Enayetallah A, Reddy P, Ziemek D. Robust clinical outcome prediction based on Bayesian analysis of transcriptional profiles and prior causal networks. Bioinformatics. 2014; 30(12):69\u201377. https:\/\/doi.org\/10.1093\/bioinformatics\/btu272.","journal-title":"Bioinformatics"},{"issue":"1","key":"3427_CR15","doi-asserted-by":"publisher","first-page":"565","DOI":"10.1186\/s12859-017-1984-2","volume":"18","author":"T Kang","year":"2017","unstructured":"Kang T, Ding W, Zhang L, Ziemek D, Zarringhalam K. A biological network-based regularized artificial neural network model for robust phenotype prediction from gene expression data. BMC Bioinformatics. 2017; 18(1):565. https:\/\/doi.org\/10.1186\/s12859-017-1984-2.","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"3427_CR16","doi-asserted-by":"publisher","first-page":"110","DOI":"10.1016\/S1672-0229(06)60022-3","volume":"4","author":"Y-J Shen","year":"2006","unstructured":"Shen Y-J, Huang S-G. Improve Survival Prediction Using Principal Components of Gene Expression Data. Genom Proteomics Bioinforma. 2006; 4(2):110\u20139. https:\/\/doi.org\/10.1016\/S1672-0229(06)60022-3.","journal-title":"Genom Proteomics Bioinforma"},{"issue":"12","key":"3427_CR17","doi-asserted-by":"publisher","first-page":"1053","DOI":"10.1038\/s41592-018-0229-2","volume":"15","author":"R Lopez","year":"2018","unstructured":"Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018; 15(12):1053. https:\/\/doi.org\/10.1038\/s41592-018-0229-2.","journal-title":"Nat Methods"},{"key":"3427_CR18","doi-asserted-by":"publisher","unstructured":"Gr\u00f8nbech CH, Vording MF, Timshel PN, S\u00f8nderby CK, Pers TH, Winther O. scVAE: Variational auto-encoders for single-cell gene expression data. bioRxiv. 2018. https:\/\/doi.org\/10.1101\/318295.","DOI":"10.1101\/318295"},{"key":"3427_CR19","first-page":"80","volume":"23","author":"GP Way","year":"2018","unstructured":"Way GP, Greene CS. Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac Symp Biocomput Pac Symp Biocomput. 2018; 23:80\u201391.","journal-title":"Pac Symp Biocomput Pac Symp Biocomput"},{"key":"3427_CR20","unstructured":"Rampasek L, Hidru D, Smirnov P, Haibe-Kains B, Goldenberg A. Dr.VAE: Drug Response Variational Autoencoder. 2017. http:\/\/arxiv.org\/abs\/1706.08203."},{"issue":"8","key":"3427_CR21","doi-asserted-by":"publisher","first-page":"1798","DOI":"10.1109\/TPAMI.2013.50","volume":"35","author":"Y Bengio","year":"2013","unstructured":"Bengio Y, Courville A, Vincent P. Representation Learning: A Review and New Perspectives. IEEE Trans Pattern Anal Mach Intell. 2013; 35(8):1798\u2013828. https:\/\/doi.org\/10.1109\/TPAMI.2013.50.","journal-title":"IEEE Trans Pattern Anal Mach Intell"},{"key":"3427_CR22","doi-asserted-by":"publisher","first-page":"580","DOI":"10.1038\/ng.2653","volume":"45","author":"Lonsdale","year":"2013","unstructured":"Lonsdale, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013; 45:580\u20135. https:\/\/doi.org\/10.1038\/ng.2653.","journal-title":"Nat Genet"},{"issue":"2","key":"3427_CR23","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1016\/j.cell.2018.02.052","volume":"173","author":"Liu","year":"2018","unstructured":"Liu, et al. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell. 2018; 173(2):400\u201341611. https:\/\/doi.org\/10.1016\/j.cell.2018.02.052.","journal-title":"Cell"},{"issue":"Database issue","key":"3427_CR24","doi-asserted-by":"publisher","first-page":"991","DOI":"10.1093\/nar\/gks1193","volume":"41","author":"Barrett","year":"2013","unstructured":"Barrett, et al. NCBI GEO: Archive for functional genomics data sets\u2013update. Nucleic Acids Res. 2013; 41(Database issue):991\u20135. https:\/\/doi.org\/10.1093\/nar\/gks1193.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"3427_CR25","doi-asserted-by":"publisher","first-page":"281","DOI":"10.1007\/s12064-012-0162-3","volume":"131","author":"GP Wagner","year":"2012","unstructured":"Wagner GP, Kin K, Lynch VJ. Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples. Theory Biosci Theor Den Biowissenschaften. 2012; 131(4):281\u20135. https:\/\/doi.org\/10.1007\/s12064-012-0162-3.","journal-title":"Theory Biosci Theor Den Biowissenschaften"},{"issue":"1","key":"3427_CR26","doi-asserted-by":"publisher","first-page":"323","DOI":"10.1186\/1471-2105-12-323","volume":"12","author":"B Li","year":"2011","unstructured":"Li B, Dewey CN. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011; 12(1):323. https:\/\/doi.org\/10.1186\/1471-2105-12-323.","journal-title":"BMC Bioinformatics"},{"issue":"2","key":"3427_CR27","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1111\/j.2517-6161.1982.tb01195.x","volume":"44","author":"J Aitchison","year":"1982","unstructured":"Aitchison J. The Statistical Analysis of Compositional Data. J R Stat Soc Ser B (Methodological). 1982; 44(2):139\u201377.","journal-title":"J R Stat Soc Ser B (Methodological)"},{"issue":"3","key":"3427_CR28","doi-asserted-by":"publisher","first-page":"1004075","DOI":"10.1371\/journal.pcbi.1004075","volume":"11","author":"D Lovell","year":"2015","unstructured":"Lovell D, Pawlowsky-Glahn V, Egozcue JJ, Marguerat S, B\u00e4hler J. Proportionality: A valid alternative to correlation for relative data. PLoS Comput Biol. 2015; 11(3):1004075. https:\/\/doi.org\/10.1371\/journal.pcbi.1004075.","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"3427_CR29","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/2049-2618-2-15","volume":"2","author":"AD Fernandes","year":"2014","unstructured":"Fernandes AD, Reid JN, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: Characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. 2014; 2(1):15. https:\/\/doi.org\/10.1186\/2049-2618-2-15.","journal-title":"Microbiome"},{"issue":"19","key":"3427_CR30","doi-asserted-by":"publisher","first-page":"2519","DOI":"10.1093\/bioinformatics\/btt432","volume":"29","author":"K Chawla","year":"2013","unstructured":"Chawla K, Tripathi S, Thommesen L, L\u00e6greid A, Kuiper M. TFcheckpoint: A curated compendium of specific DNA-binding RNA polymerase II transcription factors. Bioinformatics (Oxford, England). 2013; 29(19):2519\u201320. https:\/\/doi.org\/10.1093\/bioinformatics\/btt432.","journal-title":"Bioinformatics (Oxford, England)"},{"issue":"1","key":"3427_CR31","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1214\/aoms\/1177730491","volume":"18","author":"HB Mann","year":"1947","unstructured":"Mann HB, Whitney DR. On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Ann Math Stat. 1947; 18(1):50\u201360. https:\/\/doi.org\/10.1214\/aoms\/1177730491.","journal-title":"Ann Math Stat"},{"key":"3427_CR32","doi-asserted-by":"crossref","unstructured":"Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis: Springer; 2001. https:\/\/www.springer.com\/gp\/book\/9781441929181.","DOI":"10.1007\/978-1-4757-3462-1"},{"issue":"D1","key":"3427_CR33","doi-asserted-by":"publisher","first-page":"649","DOI":"10.1093\/nar\/gkx1132","volume":"46","author":"Fabregat","year":"2018","unstructured":"Fabregat, et al. The Reactome Pathway Knowledgebase. Nucleic Acids Res. 2018; 46(D1):649\u201355. https:\/\/doi.org\/10.1093\/nar\/gkx1132.","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"3427_CR34","doi-asserted-by":"publisher","first-page":"523","DOI":"10.1093\/bioinformatics\/btt703","volume":"30","author":"A Kr\u00e4mer","year":"2014","unstructured":"Kr\u00e4mer A, Green J, Pollard J, Tugendreich S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics (Oxford, England). 2014; 30(4):523\u201330. https:\/\/doi.org\/10.1093\/bioinformatics\/btt703.","journal-title":"Bioinformatics (Oxford, England)"},{"key":"3427_CR35","doi-asserted-by":"publisher","first-page":"13427","DOI":"10.1038\/ncomms13427","volume":"7","author":"IV Ozerov","year":"2016","unstructured":"Ozerov IV, et al. In Silico Pathway Activation Network Decomposition Analysis (iPANDA) as a method for biomarker development. Nat Commun. 2016; 7:13427. https:\/\/doi.org\/10.1038\/ncomms13427.","journal-title":"Nat Commun"},{"issue":"12","key":"3427_CR36","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1186\/gm214","volume":"2","author":"Y Zhao","year":"2010","unstructured":"Zhao Y, Simon R. Gene expression deconvolution in clinical samples. Genome Med. 2010; 2(12):93. https:\/\/doi.org\/10.1186\/gm214.","journal-title":"Genome Med"},{"issue":"17","key":"3427_CR37","doi-asserted-by":"publisher","first-page":"2211","DOI":"10.1093\/bioinformatics\/btt351","volume":"29","author":"R Gaujoux","year":"2013","unstructured":"Gaujoux R, Seoighe C. CellMix: A comprehensive toolbox for gene expression deconvolution. Bioinformatics. 2013; 29(17):2211\u20132. https:\/\/doi.org\/10.1093\/bioinformatics\/btt351.","journal-title":"Bioinformatics"},{"issue":"4","key":"3427_CR38","doi-asserted-by":"publisher","first-page":"287","DOI":"10.1038\/nmeth.1439","volume":"7","author":"SS Shen-Orr","year":"2010","unstructured":"Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, Hastie T, Sarwal MM, Davis MM, Butte AJ. Cell type-specific gene expression differences in complex tissues. Nat Methods. 2010; 7(4):287\u20139. https:\/\/doi.org\/10.1038\/nmeth.1439.","journal-title":"Nat Methods"},{"key":"3427_CR39","doi-asserted-by":"publisher","unstructured":"Gupta A, Wang H, Ganapathiraju M. Learning structure in gene expression data using deep architectures, with an application to gene clustering. bioRxiv. 2015. https:\/\/doi.org\/10.1101\/031906.","DOI":"10.1101\/031906"},{"key":"3427_CR40","doi-asserted-by":"publisher","unstructured":"Dincer AB, Celik S, Hiranuma N, Lee S-I. DeepProfile: Deep learning of cancer molecular profiles for precision medicine. bioRxiv. 2018. https:\/\/doi.org\/10.1101\/278739.","DOI":"10.1101\/278739"},{"key":"3427_CR41","unstructured":"Way GP, Greene CS. Evaluating deep variational autoencoders trained on pan-cancer gene expression. 2017. http:\/\/arxiv.org\/abs\/1711.04828."},{"key":"3427_CR42","unstructured":"Supervised results table. https:\/\/figshare.com\/articles\/Supervised_results_table\/7817570. Accessed: 17 May 2019."},{"key":"3427_CR43","unstructured":"Feature importance for the recommended model. https:\/\/figshare.com\/articles\/Recommended_model_feature_importance_on_binary_predictive_tasks\/8980325. Accessed: 24 July 2019."},{"key":"3427_CR44","doi-asserted-by":"publisher","unstructured":"Fisher CK, Smith AM, Walsh JR. Who is this gene and what does it do? A toolkit for munging transcriptomics data in python. bioRxiv. 2018:299107. https:\/\/doi.org\/10.1101\/299107.","DOI":"10.1101\/299107"},{"issue":"5","key":"3427_CR45","doi-asserted-by":"publisher","first-page":"1218","DOI":"10.1016\/j.jaci.2015.03.003","volume":"135","author":"M Su\u00e1rez-Fari\u00f1as","year":"2015","unstructured":"Su\u00e1rez-Fari\u00f1as M, et al. RNA sequencing atopic dermatitis transcriptome profiling provides insights into novel disease mechanisms with potential therapeutic implications. J Allergy Clin Immunol. 2015; 135(5):1218\u201327. https:\/\/doi.org\/10.1016\/j.jaci.2015.03.003.","journal-title":"J Allergy Clin Immunol"},{"issue":"9","key":"3427_CR46","doi-asserted-by":"publisher","first-page":"2178","DOI":"10.1097\/MIB.0000000000000478","volume":"21","author":"BCE Peck","year":"2015","unstructured":"Peck BCE, et al. MicroRNAs Classify Different Disease Behavior Phenotypes of Crohn\u2019s Disease and May Have Prognostic Utility. Inflamm Bowel Dis. 2015; 21(9):2178\u201387. https:\/\/doi.org\/10.1097\/MIB.0000000000000478.","journal-title":"Inflamm Bowel Dis"},{"issue":"2","key":"3427_CR47","doi-asserted-by":"publisher","first-page":"477","DOI":"10.1053\/j.gastro.2015.10.041","volume":"150","author":"GW Tew","year":"2016","unstructured":"Tew GW, et al. Association Between Response to Etrolizumab and Expression of Integrin \u03b1E and Granzyme A in Colon Biopsies of Patients With Ulcerative Colitis. Gastroenterology. 2016; 150(2):477\u20134879. https:\/\/doi.org\/10.1053\/j.gastro.2015.10.041.","journal-title":"Gastroenterology"},{"issue":"6","key":"3427_CR48","doi-asserted-by":"publisher","first-page":"989","DOI":"10.1016\/j.immuni.2014.04.019","volume":"40","author":"P Di Meglio","year":"2014","unstructured":"Di Meglio P, Duarte JaH, Ahlfors H, Owens NDL, Li Y, Villanova F, Tosi I, Hirota K, Nestle FO, Mrowietz U, Gilchrist MJ, Stockinger B. Activation of the aryl hydrocarbon receptor dampens the severity of inflammatory skin conditions. Immunity. 2014; 40(6):989\u20131001. https:\/\/doi.org\/10.1016\/j.immuni.2014.04.019.","journal-title":"Immunity"},{"issue":"38","key":"3427_CR49","doi-asserted-by":"publisher","first-page":"13924","DOI":"10.1073\/pnas.1402665111","volume":"111","author":"Ja Fadista","year":"2014","unstructured":"Fadista Ja, et al. Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci USA. 2014; 111(38):13924\u20139. https:\/\/doi.org\/10.1073\/pnas.1402665111.","journal-title":"Proc Natl Acad Sci USA"},{"issue":"1","key":"3427_CR50","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1186\/s13073-015-0208-5","volume":"7","author":"WR Swindell","year":"2015","unstructured":"Swindell WR, Remmer HA, Sarkar MK, Xing X, Barnes DH, Wolterink L, Voorhees JJ, Nair RP, Johnston A, Elder JT, Gudjonsson JE. Proteogenomic analysis of psoriasis reveals discordant and concordant changes in mRNA and protein abundance. Genome Med. 2015; 7(1):86. https:\/\/doi.org\/10.1186\/s13073-015-0208-5.","journal-title":"Genome Med"},{"key":"3427_CR51","doi-asserted-by":"publisher","unstructured":"Arora R, Cotter A, Livescu K, Srebro N. Stochastic optimization for PCA and PLS. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton): 2012. p. 861\u20138. https:\/\/doi.org\/10.1109\/Allerton.2012.6483308.","DOI":"10.1109\/Allerton.2012.6483308"},{"issue":"Dec","key":"3427_CR52","first-page":"3371","volume":"11","author":"P Vincent","year":"2010","unstructured":"Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010; 11(Dec):3371\u2013408.","journal-title":"J Mach Learn Res"},{"key":"3427_CR53","unstructured":"Kingma DP, Welling M. Auto-Encoding Variational Bayes. 2013. http:\/\/arxiv.org\/abs\/1312.6114."},{"key":"3427_CR54","unstructured":"Klambauer G, Unterthiner T, Mayr A, Hochreiter S. Self-Normalizing Neural Networks. 2017. http:\/\/arxiv.org\/abs\/1706.02515."},{"key":"3427_CR55","doi-asserted-by":"crossref","unstructured":"Bowman SR, Vilnis L, Vinyals O, Dai AM, Jozefowicz R, Bengio S. Generating Sentences from a Continuous Space. 2015. http:\/\/arxiv.org\/abs\/1511.06349.","DOI":"10.18653\/v1\/K16-1002"},{"key":"3427_CR56","unstructured":"Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A. Automatic differentiation in PyTorch. In: Proceedings of Neural Information Processing Systems: 2017."},{"key":"3427_CR57","first-page":"2825","volume":"12","author":"F Pedregosa","year":"2011","unstructured":"Pedregosa F, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011; 12:2825\u201330.","journal-title":"J Mach Learn Res"},{"issue":"1","key":"3427_CR58","doi-asserted-by":"publisher","first-page":"602","DOI":"10.1080\/21642583.2014.956265","volume":"2","author":"K Fawagreh","year":"2014","unstructured":"Fawagreh K, Gaber MM, Elyan E. Random forests: from early developments to recent advancements. Syst Sci Control Eng. 2014; 2(1):602\u20139. https:\/\/doi.org\/10.1080\/21642583.2014.956265. Accessed 22 July 2019.","journal-title":"Syst Sci Control Eng"},{"issue":"359","key":"3427_CR59","doi-asserted-by":"publisher","first-page":"557","DOI":"10.1080\/01621459.1977.10480613","volume":"72","author":"Bradley Efron","year":"1977","unstructured":"Efron B. The Efficiency of Cox\u2019s Likelihood Function for Censored Data. J Am Stat Assoc. 1977; 72(359):557\u201365. https:\/\/doi.org\/10.1080\/01621459.1977.10480613.","journal-title":"Journal of the American Statistical Association"},{"key":"3427_CR60","unstructured":"Dataset Repository. https:\/\/figshare.com\/projects\/Deep_learning_of_representations_for_transcriptomics-based_phenotype_prediction\/60938. Accessed: 17 May 2019."},{"key":"3427_CR61","unstructured":"Code repository. https:\/\/github.com\/unlearnai\/representation_learning_for_transcriptomics. Accessed: 17 May 2019."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3427-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/article\/10.1186\/s12859-020-3427-8\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-020-3427-8.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,2]],"date-time":"2024-08-02T16:26:43Z","timestamp":1722616003000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-020-3427-8"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,3,20]]},"references-count":61,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2020,12]]}},"alternative-id":["3427"],"URL":"https:\/\/doi.org\/10.1186\/s12859-020-3427-8","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,3,20]]},"assertion":[{"value":"30 January 2020","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 February 2020","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 March 2020","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"All transcriptomics data used in this study is from the recount2 database [] and is freely available under the CC BY 4.0 license.","order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"A.M.S, J.R.W, and C.K.F are affiliated with Unlearn.AI, Inc., a company that creates software for clinical research, and hence may have competing financial interests. J.L., C.B.D., P.H., M.R.H., M.M., J.X.M., S.R., S.Z., and D.Z. are employees of Pfizer and may own stock or stock options in Pfizer.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"119"}}