{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T22:01:47Z","timestamp":1763071307291,"version":"3.41.2"},"reference-count":38,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T00:00:00Z","timestamp":1709510400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100003246","name":"Nederlandse Organisatie voor Wetenschappelijk Onderzoek","doi-asserted-by":"publisher","award":["024.004.012"],"award-info":[{"award-number":["024.004.012"]}],"id":[{"id":"10.13039\/501100003246","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>Most regulatory elements, especially enhancer sequences, are cell population-specific. One could even argue that a distinct set of regulatory elements is what defines a cell population. However, discovering which non-coding regions of the DNA are essential in which context, and as a result, which genes are expressed, is a difficult task. Some computational models tackle this problem by predicting gene expression directly from the genomic sequence. These models are currently limited to predicting bulk measurements and mainly make tissue-specific predictions. Here, we present a model that leverages single-cell RNA-sequencing data to predict gene expression. We show that cell population-specific models outperform tissue-specific models, especially when the expression profile of a cell population and the corresponding tissue are dissimilar. Further, we show that our model can prioritize GWAS variants and learn motifs of transcription factor binding sites. We envision that our model can be useful for delineating cell population-specific regulatory elements.<\/jats:p>","DOI":"10.3389\/fbinf.2024.1347276","type":"journal-article","created":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T04:52:28Z","timestamp":1709527948000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Predicting cell population-specific gene expression from genomic sequence"],"prefix":"10.3389","volume":"4","author":[{"given":"Lieke","family":"Michielsen","sequence":"first","affiliation":[]},{"given":"Marcel J. T.","family":"Reinders","sequence":"additional","affiliation":[]},{"given":"Ahmed","family":"Mahfouz","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2024,3,4]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"245","DOI":"10.1186\/s13059-022-02811-x","article-title":"The genetic and biochemical determinants of mRNA degradation rates in mammals","volume":"23","author":"Agarwal","year":"2022","journal-title":"Genome Biol."},{"key":"B2","doi-asserted-by":"publisher","first-page":"107663","DOI":"10.1016\/j.celrep.2020.107663","article-title":"Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks","volume":"31","author":"Agarwal","year":"2020","journal-title":"Cell Rep."},{"key":"B3","doi-asserted-by":"publisher","first-page":"1196","DOI":"10.1038\/s41592-021-01252-x","article-title":"Effective gene expression prediction from sequence by integrating long-range interactions","volume":"18","author":"Avsec","year":"","journal-title":"Nat. Methods"},{"key":"B4","doi-asserted-by":"publisher","first-page":"354","DOI":"10.1038\/s41588-021-00782-6","article-title":"Base-resolution models of transcription-factor binding reveal soft motif syntax","volume":"53","author":"Avsec","year":"","journal-title":"Nat. Genet."},{"key":"B5","doi-asserted-by":"publisher","first-page":"111","DOI":"10.1038\/s41586-021-03465-8","article-title":"Comparative cellular analysis of motor cortex in human, marmoset and mouse","volume":"598","author":"Bakken","year":"2021","journal-title":"Nature"},{"key":"B6","doi-asserted-by":"publisher","first-page":"86","DOI":"10.1186\/s13059-023-02933-w","article-title":"Consequences and opportunities arising due to sparser single-cell RNA-seq datasets","volume":"24","author":"Bouland","year":"2023","journal-title":"Genome Biol."},{"key":"B7","doi-asserted-by":"publisher","first-page":"D1005","DOI":"10.1093\/nar\/gky1120","article-title":"The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019","volume":"47","author":"Buniello","year":"2019","journal-title":"Nucleic Acids Res."},{"key":"B8","doi-asserted-by":"publisher","first-page":"D165","DOI":"10.1093\/nar\/gkab1113","article-title":"JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles","volume":"50","author":"Castro-Mondragon","year":"2022","journal-title":"Nucleic Acids Res."},{"key":"B9","first-page":"1","article-title":"Statistical comparisons of classifiers over multiple data sets","volume":"7","author":"Dem\u0161ar","year":"2006","journal-title":"J. Mach. Learn. Res."},{"key":"B10","doi-asserted-by":"publisher","first-page":"699","DOI":"10.1038\/s41586-020-2493-4","article-title":"Expanded encyclopaedias of DNA elements in the human and mouse genomes","volume":"583","author":"Moore","year":"2020","journal-title":"Nature"},{"key":"B11","doi-asserted-by":"publisher","first-page":"824","DOI":"10.1093\/schbul\/sby140","article-title":"Genome-wide association study detected novel susceptibility genes for schizophrenia and shared trans-populations\/diseases genetic effect","volume":"45","author":"Ikeda","year":"2019","journal-title":"Schizophr. Bull."},{"key":"B12","doi-asserted-by":"publisher","first-page":"630","DOI":"10.1038\/s41586-021-04262-z","article-title":"Decoding gene regulation in the fly brain","volume":"601","author":"Janssens","year":"2022","journal-title":"Nature"},{"key":"B13","doi-asserted-by":"publisher","first-page":"739","DOI":"10.1101\/gr.227819.117","article-title":"Sequential regulatory activity prediction across chromosomes with convolutional neural networks","volume":"28","author":"Kelley","year":"2018","journal-title":"Genome Res."},{"key":"B14","doi-asserted-by":"publisher","first-page":"990","DOI":"10.1101\/gr.200535.115","article-title":"Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks","volume":"26","author":"Kelley","year":"2016","journal-title":"Genome Res."},{"key":"B15","doi-asserted-by":"publisher","first-page":"1670","DOI":"10.1038\/s41588-019-0512-x","article-title":"Comparative genetic architectures of schizophrenia in East Asian and European populations","volume":"51","author":"Lam","year":"2019","journal-title":"Nat. Genet."},{"key":"B16","doi-asserted-by":"publisher","first-page":"650","DOI":"10.1016\/j.cell.2018.01.029","article-title":"The human transcription factors","volume":"172","author":"Lambert","year":"2018","journal-title":"Cell"},{"key":"B17","doi-asserted-by":"publisher","first-page":"1576","DOI":"10.1038\/ng.3973","article-title":"Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia","volume":"49","author":"Li","year":"2017","journal-title":"Nat. Genet."},{"key":"B18","doi-asserted-by":"publisher","DOI":"10.1101\/2022.01.06.474932","article-title":"Machine learning-assisted identification of factors contributing to the technical variability between bulk and single-cell RNA-seq experiments","author":"Lipnitskaya","year":"2022"},{"key":"B19","doi-asserted-by":"publisher","first-page":"50","DOI":"10.1214\/aoms\/1177730491","article-title":"On a test of whether one of two random variables is stochastically larger than the other","volume":"18","author":"Mann","year":"1947","journal-title":"aoms"},{"key":"B20","doi-asserted-by":"publisher","first-page":"1134","DOI":"10.1126\/science.aay0793","article-title":"Brain cell type-specific enhancer-promoter interactome maps and disease-risk association","volume":"366","author":"Nott","year":"2019","journal-title":"Science"},{"key":"B21","doi-asserted-by":"publisher","first-page":"381","DOI":"10.1038\/s41588-018-0059-2","article-title":"Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection","volume":"50","author":"Pardi\u00f1as","year":"2018","journal-title":"Nat. Genet."},{"key":"B22","first-page":"8024","article-title":"PyTorch: an imperative style, high-performance deep learning library","volume-title":"Advances in neural information processing systems 32","author":"Paszke","year":"2019"},{"key":"B23","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1038\/s41586-018-0590-4","article-title":"Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris","volume":"562","author":"Schaum","year":"2018","journal-title":"Nature"},{"key":"B24","doi-asserted-by":"publisher","first-page":"662254","DOI":"10.1101\/662254","article-title":"The murine transcriptome reveals global aging nodes with organ-specific phase and amplitude","author":"Schaum","year":"2019"},{"key":"B25","doi-asserted-by":"publisher","first-page":"421","DOI":"10.1038\/nature13595","article-title":"Biological insights from 108 schizophrenia-associated genetic loci","volume":"511","year":"2014","journal-title":"Nature"},{"key":"B26","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1093\/dnares\/dsn030","article-title":"Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of pluripotent and differentiating mouse embryonic stem cells","volume":"16","author":"Sharova","year":"2009","journal-title":"DNA Res."},{"key":"B27","doi-asserted-by":"publisher","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"B28","doi-asserted-by":"publisher","first-page":"2078","DOI":"10.1101\/gr.156919.113","article-title":"3\u2032 UTR-isoform choice has limited influence on the stability and translational efficiency of most mRNAs in mouse fibroblasts","volume":"23","author":"Spies","year":"2013","journal-title":"Genome Res."},{"key":"B29","doi-asserted-by":"publisher","first-page":"335","DOI":"10.1038\/nn.4216","article-title":"Adult mouse cortical cell taxonomy revealed by single cell transcriptomics","volume":"19","author":"Tasic","year":"2016","journal-title":"Nat. Neurosci."},{"key":"B30","doi-asserted-by":"publisher","first-page":"72","DOI":"10.1038\/s41586-018-0654-5","article-title":"Shared and distinct transcriptomic cell types across neocortical areas","volume":"563","author":"Tasic","year":"2018","journal-title":"Nature"},{"key":"B31","doi-asserted-by":"publisher","first-page":"252","DOI":"10.1038\/nrg2538","article-title":"A census of human transcription factors: function, expression and evolution","volume":"10","author":"Vaquerizas","year":"2009","journal-title":"Nat. Rev. Genet."},{"key":"B32","doi-asserted-by":"publisher","first-page":"e51503","DOI":"10.7554\/eLife.51503","article-title":"Deep learning models predict regulatory variants in pancreatic islets and refine type 2 diabetes association signals","volume":"9","author":"Wesolowska-Andersen","year":"2020","journal-title":"Elife"},{"key":"B33","doi-asserted-by":"publisher","first-page":"1276","DOI":"10.1038\/s41588-021-00921-z","article-title":"A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer\u2019s disease","volume":"53","author":"Wightman","year":"2021","journal-title":"Nat. Genet."},{"key":"B34","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol."},{"key":"B35","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1038\/s41398-020-01195-5","article-title":"Integrative analysis of genome-wide association studies identifies novel loci associated with neuropsychiatric disorders","volume":"11","author":"Yao","year":"2021","journal-title":"Transl. Psychiatry"},{"key":"B36","doi-asserted-by":"crossref","DOI":"10.1101\/2020.06.21.163956","article-title":"Predicting gene expression from DNA sequence using residual neural network","author":"Zhang","year":"2020"},{"key":"B37","doi-asserted-by":"publisher","first-page":"1171","DOI":"10.1038\/s41588-018-0160-6","article-title":"Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk","volume":"50","author":"Zhou","year":"2018","journal-title":"Nat. Genet."},{"key":"B38","doi-asserted-by":"publisher","first-page":"931","DOI":"10.1038\/nmeth.3547","article-title":"Predicting effects of noncoding variants with deep learning-based sequence model","volume":"12","author":"Zhou","year":"2015","journal-title":"Nat. Methods"}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1347276\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,4]],"date-time":"2024-03-04T04:52:35Z","timestamp":1709527955000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2024.1347276\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,3,4]]},"references-count":38,"alternative-id":["10.3389\/fbinf.2024.1347276"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2024.1347276","relation":{},"ISSN":["2673-7647"],"issn-type":[{"type":"electronic","value":"2673-7647"}],"subject":[],"published":{"date-parts":[[2024,3,4]]},"article-number":"1347276"}}