{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,31]],"date-time":"2025-12-31T10:24:00Z","timestamp":1767176640895,"version":"build-2238731810"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009433","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,10,21]],"date-time":"2021-10-21T00:00:00Z","timestamp":1634774400000}}],"reference-count":29,"publisher":"Public Library of Science (PLoS)","issue":"10","license":[{"start":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T00:00:00Z","timestamp":1633910400000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000050","name":"National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["U01 HL089897"],"award-info":[{"award-number":["U01 HL089897"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["U01 HL089856"],"award-info":[{"award-number":["U01 HL089856"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01 HL124233"],"award-info":[{"award-number":["R01 HL124233"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000050","name":"National Heart, Lung, and Blood Institute","doi-asserted-by":"publisher","award":["R01 HL147326"],"award-info":[{"award-number":["R01 HL147326"]}],"id":[{"id":"10.13039\/100000050","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004325","name":"AstraZeneca","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004325","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100001003","name":"Boehringer 
Ingelheim","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100001003","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004328","name":"Genentech","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004328","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004330","name":"GlaxoSmithKline","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004330","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100004336","name":"Novartis","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100004336","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100009655","name":"Sunovion","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100009655","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. 
Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009433","type":"journal-article","created":{"date-parts":[[2021,10,11]],"date-time":"2021-10-11T17:21:16Z","timestamp":1633972876000},"page":"e1009433","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":9,"title":["Improved prediction of smoking status via isoform-aware RNA-seq deep learning models"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-0068-9042","authenticated-orcid":true,"given":"Zifeng","family":"Wang","sequence":"first","affiliation":[]},{"given":"Aria","family":"Masoomi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6469-9178","authenticated-orcid":true,"given":"Zhonghui","family":"Xu","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1638-8575","authenticated-orcid":true,"given":"Adel","family":"Boueiz","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6730-0062","authenticated-orcid":true,"given":"Sool","family":"Lee","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6418-699X","authenticated-orcid":true,"given":"Tingting","family":"Zhao","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4651-363X","authenticated-orcid":true,"given":"Russell","family":"Bowler","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4907-1657","authenticated-orcid":true,"given":"Michael","family":"Cho","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3641-3822","authenticated-orcid":true,"given":"Edwin 
K.","family":"Silverman","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1342-4334","authenticated-orcid":true,"given":"Craig","family":"Hersh","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8430-134X","authenticated-orcid":true,"given":"Jennifer","family":"Dy","sequence":"additional","affiliation":[]},{"given":"Peter J.","family":"Castaldi","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2021,10,11]]},"reference":[{"issue":"3","key":"pcbi.1009433.ref001","doi-asserted-by":"crossref","first-page":"J258","DOI":"10.1016\/j.jaut.2009.12.003","article-title":"Effects of tobacco smoke on immunity, inflammation and autoimmunity","volume":"34","author":"Y Arnson","year":"2010","journal-title":"Journal of Autoimmunity"},{"issue":"21","key":"pcbi.1009433.ref002","first-page":"4611","article-title":"A whole-blood transcriptome meta-analysis identifies gene expression signatures of cigarette smoking","volume":"25","author":"T Huan","year":"2016","journal-title":"Human Molecular Genetics"},{"issue":"1","key":"pcbi.1009433.ref003","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1186\/s12920-017-0295-9","article-title":"RNA sequencing identifies novel non-coding RNA and exon-specific effects associated with cigarette smoking","volume":"10","author":"MM Parker","year":"2017","journal-title":"BMC Medical Genomics"},{"issue":"1","key":"pcbi.1009433.ref004","doi-asserted-by":"crossref","first-page":"58","DOI":"10.1186\/1755-8794-5-58","article-title":"A whole blood gene expression-based signature for smoking status","volume":"5","author":"P Beineke","year":"2012","journal-title":"BMC Medical Genomics"},{"issue":"18","key":"pcbi.1009433.ref005","doi-asserted-by":"crossref","first-page":"10101","DOI":"10.1073\/pnas.97.18.10101","article-title":"Singular value decomposition for genome-wide expression data processing and modeling","volume":"97","author":"O 
Alter","year":"2000","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"25","key":"pcbi.1009433.ref006","doi-asserted-by":"crossref","first-page":"14863","DOI":"10.1073\/pnas.95.25.14863","article-title":"Cluster analysis and display of genome-wide expression patterns","volume":"95","author":"MB Eisen","year":"1998","journal-title":"Proceedings of the National Academy of Sciences of the United States of America"},{"issue":"15","key":"pcbi.1009433.ref007","doi-asserted-by":"crossref","first-page":"5594","DOI":"10.1073\/pnas.1118792109","article-title":"Making sense out of massive data by going beyond differential expression","volume":"109","author":"PR Schmid","year":"2012","journal-title":"Proceedings of the National Academy of Sciences"},{"issue":"27","key":"pcbi.1009433.ref008","doi-asserted-by":"crossref","first-page":"2817","DOI":"10.1056\/NEJMoa041588","article-title":"A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer","volume":"351","author":"S Paik","year":"2004","journal-title":"New England Journal of Medicine"},{"issue":"141","key":"pcbi.1009433.ref009","doi-asserted-by":"crossref","DOI":"10.1098\/rsif.2017.0387","article-title":"Opportunities and obstacles for deep learning in biology and medicine","volume":"15","author":"T Ching","year":"2018","journal-title":"Journal of the Royal Society, Interface"},{"issue":"11","key":"pcbi.1009433.ref010","doi-asserted-by":"crossref","first-page":"3367","DOI":"10.1016\/j.celrep.2019.11.017","article-title":"A Deep Learning Framework for Predicting Response to Therapy in Cancer","volume":"29","author":"T Sakellaropoulos","year":"2019","journal-title":"Cell reports"},{"issue":"1","key":"pcbi.1009433.ref011","doi-asserted-by":"crossref","first-page":"119","DOI":"10.1186\/s12859-020-3427-8","article-title":"Standard machine learning approaches outperform deep representation learning on phenotype prediction from 
transcriptomics data","volume":"21","author":"AM Smith","year":"2020","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"pcbi.1009433.ref012","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1186\/s13059-015-0694-1","article-title":"Comparison of RNA-seq and microarray-based models for clinical endpoint prediction","volume":"16","author":"W Zhang","year":"2015","journal-title":"Genome Biology"},{"issue":"7553","key":"pcbi.1009433.ref013","doi-asserted-by":"crossref","first-page":"436","DOI":"10.1038\/nature14539","article-title":"Deep learning","volume":"521","author":"Y LeCun","year":"2015","journal-title":"Nature"},{"issue":"1","key":"pcbi.1009433.ref014","doi-asserted-by":"crossref","first-page":"32","DOI":"10.3109\/15412550903499522","article-title":"Genetic epidemiology of COPD (COPDGene) study design","volume":"7","author":"EA Regan","year":"2010","journal-title":"COPD: Journal of Chronic Obstructive Pulmonary Disease"},{"issue":"1","key":"pcbi.1009433.ref015","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1186\/1471-2105-15-182","article-title":"Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads","volume":"15","author":"H Jiang","year":"2014","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"pcbi.1009433.ref016","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"A Dobin","year":"2013","journal-title":"Bioinformatics"},{"issue":"11","key":"pcbi.1009433.ref017","doi-asserted-by":"crossref","first-page":"1530","DOI":"10.1093\/bioinformatics\/bts196","article-title":"RNA-SeQC: RNA-seq metrics for quality control and process optimization","volume":"28","author":"DS DeLuca","year":"2012","journal-title":"Bioinformatics"},{"key":"pcbi.1009433.ref018","unstructured":"Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 
2014."},{"issue":"1","key":"pcbi.1009433.ref019","first-page":"1929","article-title":"Dropout: a simple way to prevent neural networks from overfitting","volume":"15","author":"N Srivastava","year":"2014","journal-title":"The Journal of Machine Learning Research"},{"key":"pcbi.1009433.ref020","unstructured":"Ancona M, Ceolini E, \u00d6ztireli C, Gross M. Towards better understanding of gradient-based attribution methods for Deep Neural Networks. In: International Conference on Learning Representations; 2018. Available from: https:\/\/openreview.net\/forum?id=Sy21R9JAW."},{"key":"pcbi.1009433.ref021","unstructured":"Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. 2013."},{"issue":"13","key":"pcbi.1009433.ref022","doi-asserted-by":"crossref","first-page":"1600","DOI":"10.1093\/bioinformatics\/btl140","article-title":"Improved scoring of functional groups from gene expression data by decorrelating GO graph structure","volume":"22","author":"A Alexa","year":"2006","journal-title":"Bioinformatics"},{"issue":"7221","key":"pcbi.1009433.ref023","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/nature07509","article-title":"Alternative isoform regulation in human tissue transcriptomes","volume":"456","author":"ET Wang","year":"2008","journal-title":"Nature"},{"issue":"2","key":"pcbi.1009433.ref024","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1093\/nar\/gkx1165","article-title":"Alternative start and termination sites of transcription drive most transcript isoform differences across human tissues","volume":"46","author":"A Reyes","year":"2018","journal-title":"Nucleic Acids Research"},{"issue":"1","key":"pcbi.1009433.ref025","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1038\/nrg.2015.3","article-title":"RNA mis-splicing in disease","volume":"17","author":"MM Scotti","year":"2016","journal-title":"Nature Reviews 
Genetics"},{"issue":"6285","key":"pcbi.1009433.ref026","doi-asserted-by":"crossref","first-page":"600","DOI":"10.1126\/science.aad9417","article-title":"RNA splicing is a primary link between genetic variation and disease","volume":"352","author":"YI Li","year":"2016","journal-title":"Science"},{"issue":"2-3","key":"pcbi.1009433.ref027","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1016\/j.ygeno.2016.01.004","article-title":"COPD subtypes identified by network-based clustering of blood gene expression","volume":"107","author":"Y Chang","year":"2016","journal-title":"Genomics"},{"issue":"11","key":"pcbi.1009433.ref028","doi-asserted-by":"crossref","first-page":"1108","DOI":"10.1038\/nmeth.2651","article-title":"Network-based stratification of tumor mutations","volume":"10","author":"M Hofree","year":"2013","journal-title":"Nature Methods"},{"issue":"12","key":"pcbi.1009433.ref029","doi-asserted-by":"crossref","first-page":"1236","DOI":"10.3390\/ijerph13121236","article-title":"Overview of Cotinine Cutoff Values for Smoking Status Classification","volume":"13","author":"S Kim","year":"2016","journal-title":"International Journal of Environmental Research and Public Health"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009433","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,10,21]],"date-time":"2021-10-21T00:00:00Z","timestamp":1634774400000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009433","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,21]],"date-time":"2021-10-21T14:27:08Z","timestamp":1634826428000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009433"}},"subtitle":[],"editor":[{"given":"Donna 
K.","family":"Slonim","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,10,11]]},"references-count":29,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2021,10,11]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009433","relation":{},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,11]]}}}