{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,6]],"date-time":"2026-04-06T08:52:48Z","timestamp":1775465568521,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1010984","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T00:00:00Z","timestamp":1680739200000}}],"reference-count":35,"publisher":"Public Library of Science (PLoS)","issue":"3","license":[{"start":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T00:00:00Z","timestamp":1679875200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000051","name":"National Human Genome Research Institute","doi-asserted-by":"crossref","award":["R01 HG10067"],"award-info":[{"award-number":["R01 HG10067"]}],"id":[{"id":"10.13039\/100000051","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000936","name":"Gordon and Betty Moore Foundation","doi-asserted-by":"crossref","award":["GBMF 4552"],"award-info":[{"award-number":["GBMF 4552"]}],"id":[{"id":"10.13039\/100000936","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Those building predictive models from transcriptomic data are faced with two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1010984","type":"journal-article","created":{"date-parts":[[2023,3,27]],"date-time":"2023-03-27T17:31:37Z","timestamp":1679938297000},"page":"e1010984","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":12,"title":["The effect of non-linear signal in classification problems using gene expression"],"prefix":"10.1371","volume":"19","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2811-1031","authenticated-orcid":true,"given":"Benjamin J.","family":"Heil","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6207-0782","authenticated-orcid":true,"given":"Jake","family":"Crawford","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-8713-9213","authenticated-orcid":true,"given":"Casey S.","family":"Greene","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2023,3,27]]},"reference":[{"key":"pcbi.1010984.ref001","doi-asserted-by":"crossref","first-page":"1160","DOI":"10.1200\/JCO.2008.18.1370","article-title":"Supervised Risk Predictor of Breast Cancer Based on Intrinsic Subtypes.","volume":"27","author":"JS Parker","year":"2009","journal-title":"JCO"},{"key":"pcbi.1010984.ref002","doi-asserted-by":"crossref","first-page":"917","DOI":"10.1161\/CIRCULATIONAHA.116.022907","article-title":"Gene Expression Profiling for the Identification and Classification of Antibody-Mediated Heart Rejection","volume":"135","author":"A Loupy","year":"2017","journal-title":"Circulation"},{"key":"pcbi.1010984.ref003","doi-asserted-by":"crossref","first-page":"168","DOI":"10.1186\/s12859-021-04070-2","article-title":"Large-scale labeling and assessment of sex bias in publicly available expression data","volume":"22","author":"E Flynn","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010984.ref004","article-title":"Compute Trends Across Three Eras of Machine Learning.","author":"J Sevilla","year":"2022","journal-title":"arXiv. arXiv"},{"key":"pcbi.1010984.ref005","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-018-03751-6","article-title":"Massive mining of publicly available RNA-seq data from human and mouse.","volume":"9","author":"A Lachmann","year":"2018","journal-title":"Nat Commun."},{"key":"pcbi.1010984.ref006","doi-asserted-by":"crossref","DOI":"10.1093\/database\/baaa073","article-title":"A curated database reveals trends in single-cell transcriptomics.","volume":"2020","author":"V Svensson","year":"2020","journal-title":"Database"},{"key":"pcbi.1010984.ref007","doi-asserted-by":"crossref","DOI":"10.1038\/s41598-019-52937-5","article-title":"DeePathology: Deep Multi-Task Learning for Inferring Molecular Pathology from Cancer Transcriptome.","volume":"9","author":"B Azarkhalili","year":"2019","journal-title":"Sci Rep"},{"key":"pcbi.1010984.ref008","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giab064","article-title":"Bias-invariant RNA-sequencing metadata annotation.","volume":"10","author":"H Wartmann","year":"2021","journal-title":"GigaScience"},{"key":"pcbi.1010984.ref009","doi-asserted-by":"crossref","first-page":"e1009433","DOI":"10.1371\/journal.pcbi.1009433","article-title":"Improved prediction of smoking status via isoform-aware RNA-seq deep learning models.","volume":"17","author":"Z Wang","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1010984.ref010","doi-asserted-by":"crossref","first-page":"222","DOI":"10.1016\/j.semcdb.2011.12.004","article-title":"The evolution of gene expression and the transcriptome\u2013phenotype relationship.","volume":"23","author":"PW Harrison","year":"2012","journal-title":"Seminars in Cell & Developmental Biology."},{"key":"pcbi.1010984.ref011","doi-asserted-by":"crossref","first-page":"e0153295","DOI":"10.1371\/journal.pone.0153295","article-title":"Nonlinear Dynamics in Gene Regulation Promote Robustness and Evolvability of Gene Expression Levels.","volume":"11","author":"A Steinacher","year":"2016","journal-title":"PLoS ONE"},{"key":"pcbi.1010984.ref012","article-title":"ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions.","volume":"1","author":"J Tan","year":"2016","journal-title":"mSystems"},{"key":"pcbi.1010984.ref013","doi-asserted-by":"crossref","first-page":"99","DOI":"10.1016\/j.cmpb.2018.10.004","article-title":"A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data","volume":"166","author":"Y Xiao","year":"2018","journal-title":"Computer Methods and Programs in Biomedicine"},{"key":"pcbi.1010984.ref014","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-017-1984-2","article-title":"A biological network-based regularized artificial neural network model for robust phenotype prediction from gene expression data","volume":"18","author":"T Kang","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010984.ref015","doi-asserted-by":"crossref","DOI":"10.1186\/s12859-020-3427-8","article-title":"Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data","volume":"21","author":"AM Smith","year":"2020","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1010984.ref016","doi-asserted-by":"crossref","first-page":"12","DOI":"10.1016\/j.jclinepi.2019.02.004","article-title":"A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models","volume":"110","author":"E Christodoulou","year":"2019","journal-title":"Journal of Clinical Epidemiology"},{"key":"pcbi.1010984.ref017","article-title":"Different scaling of linear models and deep learning in UKBiobank brain images versus machine-learning datasets","volume":"11","author":"M-A Schulz","year":"2020","journal-title":"Nat Commun"},{"key":"pcbi.1010984.ref018","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/ng.2653","article-title":"The Genotype-Tissue Expression (GTEx) project.","volume":"45","author":"J Lonsdale","year":"2013","journal-title":"Nat Genet"},{"key":"pcbi.1010984.ref019","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-021-02533-6","article-title":"recount3: summaries and queries for large-scale RNA-seq expression and splicing","volume":"22","author":"C Wilks","year":"2021","journal-title":"Genome Biol"},{"key":"pcbi.1010984.ref020","first-page":"362","article-title":"Parameter tuning is a key part of dimensionality reduction via deep variational autoencoders for single cell RNA transcriptomics","volume":"24","author":"Q Hu","year":"2019","journal-title":"Pac Symp Biocomput"},{"key":"pcbi.1010984.ref021","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The Sequence Read Archive","volume":"39","author":"R Leinonen","year":"2010","journal-title":"Nucleic Acids Research"},{"key":"pcbi.1010984.ref022","article-title":"An efficient not-only-linear correlation coefficient based on machine learning.","author":"M Pividori","year":"2022","journal-title":"Cold Spring Harbor Laboratory;"},{"key":"pcbi.1010984.ref023","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"ME Ritchie","year":"2015","journal-title":"Nucleic Acids Research"},{"key":"pcbi.1010984.ref024","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1038\/s41576-021-00434-9","article-title":"Navigating the pitfalls of applying machine learning in genomics","volume":"23","author":"S Whalen","year":"2021","journal-title":"Nat Rev Genet"},{"key":"pcbi.1010984.ref025","doi-asserted-by":"crossref","first-page":"7300","DOI":"10.1128\/MCB.20.19.7300-7310.2000","article-title":"Phosphorylation of ETS transcription factor ER81 in a complex with its coactivators CREB-binding protein and p300","volume":"20","author":"S Papoutsopoulou","year":"2000","journal-title":"Mol Cell Biol"},{"key":"pcbi.1010984.ref026","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/1471-2164-10-22","article-title":"BioMart\u2014biological queries made easy","volume":"10","author":"D Smedley","year":"2009","journal-title":"BMC Genomics"},{"key":"pcbi.1010984.ref027","doi-asserted-by":"crossref","first-page":"D28","DOI":"10.1093\/nar\/gkq967","article-title":"The European Nucleotide Archive","volume":"39","author":"R Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1010984.ref028","first-page":"807","volume-title":"Proceedings of the 27th International Conference on International Conference on Machine Learning.","author":"V Nair","year":"2010"},{"key":"pcbi.1010984.ref029","article-title":"PyTorch: An Imperative Style, High-Performance Deep Learning Library.","author":"A Paszke","year":"2019","journal-title":"arXiv. arXiv"},{"key":"pcbi.1010984.ref030","article-title":"Adam: A Method for Stochastic Optimization.","author":"DP Kingma","year":"2017","journal-title":"arXiv. arXiv"},{"key":"pcbi.1010984.ref031","first-page":"1929","article-title":"Dropout: A Simple Way to Prevent Neural Networks from Overfitting","volume":"15","author":"N Srivastava","year":"2014","journal-title":"Journal of Machine Learning Research"},{"key":"pcbi.1010984.ref032","first-page":"448","article-title":"Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.","author":"S Ioffe","year":"2015","journal-title":"Proceedings of the 32nd International Conference on Machine Learning."},{"key":"pcbi.1010984.ref033","year":"2020","journal-title":"Neptune: Experiment management and collaboration tool"},{"key":"pcbi.1010984.ref034","doi-asserted-by":"crossref","first-page":"1132","DOI":"10.1038\/s41592-021-01256-7","article-title":"Reproducibility standards for machine learning in the life sciences","volume":"18","author":"BJ Heil","year":"2021","journal-title":"Nat Methods"},{"key":"pcbi.1010984.ref035","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2014a scalable bioinformatics workflow engine","volume":"28","author":"J Koster","year":"2012","journal-title":"Bioinformatics"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1010984","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T00:00:00Z","timestamp":1680739200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010984","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,4,6]],"date-time":"2023-04-06T18:01:34Z","timestamp":1680804094000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1010984"}},"subtitle":[],"editor":[{"given":"Jie","family":"Liu","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2023,3,27]]},"references-count":35,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2023,3,27]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1010984","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1010984","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,3,27]]}}}