{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:23:47Z","timestamp":1772173427829,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1012859","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T00:00:00Z","timestamp":1743379200000}}],"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"3","license":[{"start":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T00:00:00Z","timestamp":1741305600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000265","name":"Medical Research Council","doi-asserted-by":"publisher","award":["MR\/R013926\/1"],"award-info":[{"award-number":["MR\/R013926\/1"]}],"id":[{"id":"10.13039\/501100000265","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100000265","name":"Medical Research Council","doi-asserted-by":"publisher","award":["MC_UU_00002\/4"],"award-info":[{"award-number":["MC_UU_00002\/4"]}],"id":[{"id":"10.13039\/501100000265","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100010269","name":"Wellcome Trust","doi-asserted-by":"publisher","award":["WT220788"],"award-info":[{"award-number":["WT220788"]}],"id":[{"id":"10.13039\/100010269","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100018956","name":"NIHR Cambridge Biomedical Research Centre","doi-asserted-by":"publisher","award":["BRC-1215-20014"],"award-info":[{"award-number":["BRC-1215-20014"]}],"id":[{"id":"10.13039\/501100018956","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012041","name":"Versus Arthritis","doi-asserted-by":"publisher","award":["22084"],"award-info":[{"award-number":["22084"]}],"id":[{"id":"10.13039\/501100012041","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001279","name":"Great Ormond Street Hospital Charity","doi-asserted-by":"publisher","award":["VS0518"],"award-info":[{"award-number":["VS0518"]}],"id":[{"id":"10.13039\/501100001279","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Olivia\u2019s Vision"},{"DOI":"10.13039\/501100012041","name":"Versus Arthritis","doi-asserted-by":"publisher","award":["20164"],"award-info":[{"award-number":["20164"]}],"id":[{"id":"10.13039\/501100012041","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012041","name":"Versus Arthritis","doi-asserted-by":"publisher","award":["21593"],"award-info":[{"award-number":["21593"]}],"id":[{"id":"10.13039\/501100012041","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100012041","name":"Versus Arthritis","doi-asserted-by":"publisher","award":["22203"],"award-info":[{"award-number":["22203"]}],"id":[{"id":"10.13039\/501100012041","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100018208","name":"Cure JM Foundation","doi-asserted-by":"publisher","award":["GOSH102019"],"award-info":[{"award-number":["GOSH102019"]}],"id":[{"id":"10.13039\/100018208","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Gene expression studies often use bulk RNA sequencing of mixed cell populations because single cell or sorted cell sequencing may be prohibitively expensive. However, mixed cell studies may miss expression patterns that are restricted to specific cell populations. Computational deconvolution can be used to estimate cell fractions from bulk expression data and infer average cell-type expression in a set of samples (e.g., cases or controls), but imputing sample-level cell-type expression is required for more detailed analyses, such as relating expression to quantitative traits, and is less commonly addressed. Here, we assessed the accuracy of imputing sample-level cell-type expression using a real dataset where mixed peripheral blood mononuclear cells (PBMC) and sorted (CD4, CD8, CD14, CD19) RNA sequencing data were generated from the same subjects (N=158), and pseudobulk datasets synthesised from eQTLgen single cell RNA-seq data. We compared three domain-specific methods, CIBERSORTx, bMIND and debCAM\/swCAM, and two cross-domain machine learning methods, multiple response LASSO and ridge, that had not been used for this task before. We also assessed the methods according to their ability to recover differential gene expression (DGE) results. LASSO\/ridge showed higher sensitivity but lower specificity for recovering DGE signals seen in observed data compared to deconvolution methods, although LASSO\/ridge had higher area under curves than deconvolution methods. Machine learning methods have the potential to outperform domain-specific methods when suitable training data are available.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1012859","type":"journal-article","created":{"date-parts":[[2025,3,7]],"date-time":"2025-03-07T13:20:41Z","timestamp":1741353641000},"page":"e1012859","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":1,"title":["Penalised regression improves imputation of cell-type specific expression using RNA-seq data from mixed cell populations compared to domain-specific methods"],"prefix":"10.1371","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-9267-7988","authenticated-orcid":true,"given":"Wei-Yu","family":"Lin","sequence":"first","affiliation":[]},{"given":"Melissa","family":"Kartawinata","sequence":"additional","affiliation":[]},{"given":"Bethany R.","family":"Jebson","sequence":"additional","affiliation":[]},{"given":"Restuadi","family":"Restuadi","sequence":"additional","affiliation":[]},{"given":"Hannah","family":"Peckham","sequence":"additional","affiliation":[]},{"given":"Anna","family":"Radziszewska","sequence":"additional","affiliation":[]},{"given":"Claire T.","family":"Deakin","sequence":"additional","affiliation":[]},{"given":"Coziana","family":"Ciurtin","sequence":"additional","affiliation":[]},{"name":"CLUSTER Consortium","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7495-1429","authenticated-orcid":true,"given":"Lucy R.","family":"Wedderburn","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9755-1703","authenticated-orcid":true,"given":"Chris","family":"Wallace","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2025,3,7]]},"reference":[{"issue":"10","key":"pcbi.1012859.ref001","doi-asserted-by":"crossref","first-page":"4170","DOI":"10.1172\/JCI59255","article-title":"Gene expression profiling of CD8+ T cells predicts prognosis in patients with Crohn disease and ulcerative colitis","volume":"121","author":"JC Lee","year":"2011","journal-title":"J Clin Invest"},{"issue":"5","key":"pcbi.1012859.ref002","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1038\/nm.2130","article-title":"A CD8+ T cell transcription signature predicts prognosis in autoimmune disease","volume":"16","author":"EF McKinney","year":"2010","journal-title":"Nat Med"},{"issue":"6","key":"pcbi.1012859.ref003","doi-asserted-by":"crossref","first-page":"1208","DOI":"10.1136\/ard.2009.108043","article-title":"Novel expression signatures identified by transcriptional analysis of separated leucocyte subsets in systemic lupus erythematosus and vasculitis","volume":"69","author":"PA Lyons","year":"2010","journal-title":"Ann Rheum Dis"},{"issue":"7562","key":"pcbi.1012859.ref004","doi-asserted-by":"crossref","first-page":"612","DOI":"10.1038\/nature14468","article-title":"T-cell exhaustion, co-stimulation and clinical outcome in autoimmunity and infection","volume":"523","author":"EF McKinney","year":"2015","journal-title":"Nature"},{"key":"pcbi.1012859.ref005","first-page":"213","volume-title":"In Silico Cell-Type Deconvolution Methods in Cancer Immunotherapy. In: Boegel S, editor. Bioinformatics for Cancer Immunotherapy. Methods in Molecular Biology. New York: Humana Press","author":"G Sturm","year":"2020"},{"key":"pcbi.1012859.ref006","first-page":"135","volume-title":"Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. In: Kidder, B, editor. Stem Cell Transcriptional Networks. Methods in Molecular Biology. New York: Humana Press","author":"CB Steen","year":"2020"},{"issue":"11","key":"pcbi.1012859.ref007","doi-asserted-by":"crossref","first-page":"1969","DOI":"10.1093\/bioinformatics\/bty019","article-title":"Computational deconvolution of transcriptomics data from mixed cell populations","volume":"34","author":"F Avila Cobos","year":"2018","journal-title":"Bioinformatics"},{"issue":"7","key":"pcbi.1012859.ref008","doi-asserted-by":"crossref","first-page":"1031","DOI":"10.1007\/s00262-018-2150-z","article-title":"Quantifying tumor-infiltrating immune cells from transcriptomics data","volume":"67","author":"F Finotello","year":"2018","journal-title":"Cancer Immunol Immunother"},{"issue":"4","key":"pcbi.1012859.ref009","doi-asserted-by":"crossref","first-page":"287","DOI":"10.1038\/nmeth.1439","article-title":"Cell type-specific gene expression differences in complex tissues","volume":"7","author":"SS Shen-Orr","year":"2010","journal-title":"Nat Methods"},{"issue":"5","key":"pcbi.1012859.ref010","doi-asserted-by":"crossref","first-page":"453","DOI":"10.1038\/nmeth.3337","article-title":"Robust enumeration of cell subsets from tissue expression profiles","volume":"12","author":"AM Newman","year":"2015","journal-title":"Nat Methods"},{"issue":"1","key":"pcbi.1012859.ref011","doi-asserted-by":"crossref","first-page":"174","DOI":"10.1186\/s13059-016-1028-7","article-title":"Comprehensive analyses of tumor immunity: implications for cancer immunotherapy","volume":"17","author":"B Li","year":"2016","journal-title":"Genome Biol"},{"key":"pcbi.1012859.ref012","doi-asserted-by":"crossref","first-page":"e26476","DOI":"10.7554\/eLife.26476","article-title":"Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. Valencia A, editor","volume":"6","author":"J Racle","year":"2017","journal-title":"Elife"},{"issue":"7","key":"pcbi.1012859.ref013","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1038\/s41587-019-0114-2","article-title":"Determining cell type abundance and expression from bulk tissues with digital cytometry","volume":"37","author":"AM Newman","year":"2019","journal-title":"Nat Biotechnol"},{"issue":"5","key":"pcbi.1012859.ref014","doi-asserted-by":"crossref","first-page":"e1006976","DOI":"10.1371\/journal.pcbi.1006976","article-title":"Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares","volume":"15","author":"Y Hao","year":"2019","journal-title":"PLoS Comput Biol"},{"issue":"1","key":"pcbi.1012859.ref015","doi-asserted-by":"crossref","first-page":"34","DOI":"10.1186\/s13073-019-0638-6","article-title":"Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data","volume":"11","author":"F Finotello","year":"2019","journal-title":"Genome Med"},{"issue":"12","key":"pcbi.1012859.ref016","doi-asserted-by":"crossref","first-page":"e1007510","DOI":"10.1371\/journal.pcbi.1007510","article-title":"CDSeq: A novel complete deconvolution method for dissecting heterogeneous samples using gene expression data","volume":"15","author":"K Kang","year":"2019","journal-title":"PLoS Comput Biol"},{"issue":"3","key":"pcbi.1012859.ref017","doi-asserted-by":"crossref","first-page":"782","DOI":"10.1093\/bioinformatics\/btz619","article-title":"Using multiple measurements of tissue to estimate subject- and cell-type-specific gene expression","volume":"36","author":"J Wang","year":"2020","journal-title":"Bioinformatics"},{"issue":"12","key":"pcbi.1012859.ref018","doi-asserted-by":"crossref","first-page":"3927","DOI":"10.1093\/bioinformatics\/btaa205","article-title":"debCAM: a bioconductor R package for fully unsupervised deconvolution of complex tissues","volume":"36","author":"L Chen","year":"2020","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1012859.ref019","doi-asserted-by":"crossref","first-page":"262","DOI":"10.1186\/s12859-021-04186-5","article-title":"CDSeqR: fast complete deconvolution for gene expression data from bulk tissues","volume":"22","author":"K Kang","year":"2021","journal-title":"BMC Bioinformatics"},{"issue":"10","key":"pcbi.1012859.ref020","doi-asserted-by":"crossref","first-page":"1807","DOI":"10.1101\/gr.268722.120","article-title":"Bayesian estimation of cell type-specific gene expression with prior derived from single-cell data","volume":"31","author":"J Wang","year":"2021","journal-title":"Genome Res"},{"issue":"1","key":"pcbi.1012859.ref021","first-page":"lqaa110","article-title":"Computational deconvolution to estimate cell type-specific gene expression from bulk data","volume":"3","author":"MK Jaakkola","year":"2021","journal-title":"NAR Genom Bioinform"},{"issue":"2","key":"pcbi.1012859.ref022","doi-asserted-by":"crossref","first-page":"lqab056","DOI":"10.1093\/nargab\/lqab056","article-title":"A computational method for direct imputation of cell type-specific expression profiles and cellular compositions from bulk-tissue RNA-Seq in brain disorders","volume":"3","author":"A Doostparast Torshizi","year":"2021","journal-title":"NAR Genom Bioinform"},{"issue":"5","key":"pcbi.1012859.ref023","doi-asserted-by":"crossref","first-page":"1403","DOI":"10.1093\/bioinformatics\/btab839","article-title":"swCAM: estimation of subtype-specific expressions in individual samples with unsupervised sample-wise deconvolution","volume":"38","author":"L Chen","year":"2022","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1012859.ref024","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1186\/s13059-021-02290-6","article-title":"A benchmark for RNA-seq deconvolution analysis under dynamic testing environments","volume":"22","author":"H Jin","year":"2021","journal-title":"Genome Biol"},{"issue":"14","key":"pcbi.1012859.ref025","doi-asserted-by":"crossref","first-page":"i436","DOI":"10.1093\/bioinformatics\/btz363","article-title":"Comprehensive evaluation of transcriptome-based cell-type quantification methods for immuno-oncology","volume":"35","author":"G Sturm","year":"2019","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1012859.ref026","doi-asserted-by":"crossref","first-page":"5650","DOI":"10.1038\/s41467-020-19015-1","article-title":"Benchmarking of cell type deconvolution pipelines for transcriptomics data","volume":"11","author":"F Avila Cobos","year":"2020","journal-title":"Nat Commun"},{"issue":"1","key":"pcbi.1012859.ref027","doi-asserted-by":"crossref","first-page":"3267","DOI":"10.1038\/s41467-022-30893-5","article-title":"Single-cell RNA-sequencing of peripheral blood mononuclear cells reveals widespread, context-specific gene expression regulation upon pathogenic exposure","volume":"13","author":"R Oelen","year":"2022","journal-title":"Nat Commun"},{"issue":"2","key":"pcbi.1012859.ref028","first-page":"390","article-title":"International League of Associations for Rheumatology classification of juvenile idiopathic arthritis: second revision, Edmonton, 2001","volume":"31","author":"RE Petty","year":"2004","journal-title":"J Rheumatol"},{"issue":"4","key":"pcbi.1012859.ref029","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/nbt.3820","article-title":"Nextflow enables reproducible computational workflows","volume":"35","author":"P Di Tommaso","year":"2017","journal-title":"Nat Biotechnol"},{"issue":"1","key":"pcbi.1012859.ref030","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","article-title":"STAR: ultrafast universal RNA-seq aligner","volume":"29","author":"A Dobin","year":"2013","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1012859.ref031","doi-asserted-by":"crossref","first-page":"419","DOI":"10.1186\/s12859-016-1284-2","article-title":"Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers","volume":"17","author":"C Girardot","year":"2016","journal-title":"BMC Bioinformatics"},{"issue":"7","key":"pcbi.1012859.ref032","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1093\/bioinformatics\/btt656","article-title":"featureCounts: an efficient general purpose program for assigning sequence reads to genomic features","volume":"30","author":"Y Liao","year":"2014","journal-title":"Bioinformatics"},{"issue":"3","key":"pcbi.1012859.ref033","doi-asserted-by":"crossref","first-page":"lqaa078","DOI":"10.1093\/nargab\/lqaa078","article-title":"ComBat-seq: batch effect adjustment for RNA-seq count data","volume":"2","author":"Y Zhang","year":"2020","journal-title":"NAR Genom Bioinform"},{"key":"pcbi.1012859.ref034","first-page":"1438","article-title":"From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline [version 2; peer review: 5 approved]","volume":"5","author":"Y Chen","year":"2016","journal-title":"F1000Res"},{"issue":"1","key":"pcbi.1012859.ref035","doi-asserted-by":"crossref","first-page":"139","DOI":"10.1093\/bioinformatics\/btp616","article-title":"edgeR: a Bioconductor package for differential expression analysis of digital gene expression data","volume":"26","author":"MD Robinson","year":"2010","journal-title":"Bioinformatics"},{"key":"pcbi.1012859.ref036","first-page":"243","volume-title":"Profiling Tumor Infiltrating Immune Cells with CIBERSORT. In: VonStechow L, editor. Cancer Systems Biology. Methods in Molecular Biology. New York: Humana Press","author":"B Chen","year":"2018"},{"key":"pcbi.1012859.ref037","article-title":"dynamicTreeCut: Methods for Detection of Clusters in Hierarchical Clustering Dendrograms","author":"P Langfelder","year":"2016"},{"issue":"1","key":"pcbi.1012859.ref038","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v033.i01","article-title":"Regularization Paths for Generalized Linear Models via Coordinate Descent","volume":"33","author":"J Friedman","year":"2010","journal-title":"J Stat Softw"},{"issue":"7","key":"pcbi.1012859.ref039","doi-asserted-by":"crossref","first-page":"e47","DOI":"10.1093\/nar\/gkv007","article-title":"limma powers differential expression analyses for RNA-sequencing and microarray studies","volume":"43","author":"ME Ritchie","year":"2015","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1012859.ref040","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1186\/1471-2105-12-77","article-title":"pROC: an open-source package for R and S+ to analyze and compare ROC curves","volume":"12","author":"X Robin","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"pcbi.1012859.ref041","volume-title":"R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2021","author":"R Core Team"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1012859","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T00:00:00Z","timestamp":1743379200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1012859","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,31]],"date-time":"2025-03-31T17:01:59Z","timestamp":1743440519000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1012859"}},"subtitle":[],"editor":[{"given":"Marc","family":"Robinson-Rechavi","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,3,7]]},"references-count":41,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,3,7]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1012859","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2023.09.11.556650","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,3,7]]}}}