{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T19:15:33Z","timestamp":1775157333452,"version":"3.50.1"},"reference-count":40,"publisher":"Frontiers Media SA","license":[{"start":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T00:00:00Z","timestamp":1684108800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["frontiersin.org"],"crossmark-restriction":true},"short-container-title":["Front. Bioinform."],"abstract":"<jats:p>One important characteristic of single-cell RNA sequencing (scRNA-seq) data is its high sparsity, where the gene-cell count data matrix contains high proportion of zeros. The sparsity has motivated widespread discussions on dropouts and missing data, as well as imputation algorithms of scRNA-seq analysis. Here, we aim to investigate whether there exist genes that are more prone to be under-detected in scRNA-seq, and if yes, what commonalities those genes may share. From public data sources, we gathered paired bulk RNA-seq and scRNA-seq data from 53 human samples, which were generated in diverse biological contexts. We derived pseudo-bulk gene expression by averaging the scRNA-seq data across cells. Comparisons of the paired bulk and pseudo-bulk gene expression profiles revealed that there indeed exists a collection of genes that are frequently under-detected in scRNA-seq compared to bulk RNA-seq. This result was robust to randomization when unpaired bulk and pseudo-bulk gene expression profiles were compared. We performed motif search to the last 350\u00a0bp of the identified genes, and observed an enrichment of poly(T) motif. The poly(T) motif toward the tails of those genes may be able to form hairpin structures with the poly(A) tails of their mRNA transcripts, making it difficult for their mRNA transcripts to be captured during scRNA-seq library preparation, which is a mechanistic conjecture of why certain genes may be more prone to be under-detected in scRNA-seq.<\/jats:p>","DOI":"10.3389\/fbinf.2023.1120290","type":"journal-article","created":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T15:32:15Z","timestamp":1684164735000},"update-policy":"https:\/\/doi.org\/10.3389\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["Gene representation in scRNA-seq is correlated with common motifs at the 3\u2032 end of transcripts"],"prefix":"10.3389","volume":"3","author":[{"given":"Xinling","family":"Li","sequence":"first","affiliation":[]},{"given":"Greg","family":"Gibson","sequence":"additional","affiliation":[]},{"given":"Peng","family":"Qiu","sequence":"additional","affiliation":[]}],"member":"1965","published-online":{"date-parts":[[2023,5,15]]},"reference":[{"key":"B1","doi-asserted-by":"publisher","first-page":"2865","DOI":"10.1093\/bioinformatics\/bty1044","article-title":"M3Drop: Dropout-based feature selection for scRNASeq","volume":"35","author":"Andrews","year":"2019","journal-title":"Bioinformatics"},{"key":"B2","doi-asserted-by":"publisher","first-page":"W202","DOI":"10.1093\/nar\/gkp335","article-title":"Meme suite: Tools for motif discovery and searching","volume":"37","author":"Bailey","year":"2009","journal-title":"Nucleic Acids Res."},{"key":"B3","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1093\/bfgp\/elx035","article-title":"Experimental design for single-cell RNA sequencing","volume":"17","author":"Baran-Gale","year":"2018","journal-title":"Brief. Funct. Genomics"},{"key":"B4","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1038\/nbt.4314","article-title":"Dimensionality reduction for visualizing single-cell data using UMAP","volume":"37","author":"Becht","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"B5","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"Butler","year":"2018","journal-title":"Nat. Biotechnol."},{"key":"B6","doi-asserted-by":"publisher","first-page":"1540","DOI":"10.1164\/rccm.201904-0792oc","article-title":"Single-cell reconstruction of human basal cell diversity in normal and idiopathic pulmonary fibrosis lungs","volume":"202","author":"Carraro","year":"2020","journal-title":"Am. J. Respir. Crit. Care Med."},{"key":"B7","doi-asserted-by":"publisher","first-page":"317","DOI":"10.3389\/fgene.2019.00317","article-title":"Single-cell RNA-seq technologies and related computational data analysis","volume":"10","author":"Chen","year":"2019","journal-title":"Front. Genet."},{"key":"B8","doi-asserted-by":"publisher","first-page":"1903","DOI":"10.1038\/s41467-019-09670-4","article-title":"Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM","volume":"10","author":"Chen","year":"2019","journal-title":"Nat. Commun."},{"key":"B9","doi-asserted-by":"publisher","first-page":"416","DOI":"10.1093\/bib\/bbz166","article-title":"Scdc: Bulk gene expression deconvolution by multiple single-cell RNA sequencing references","volume":"22","author":"Dong","year":"2021","journal-title":"Brief. Bioinform"},{"key":"B10","volume-title":"The elements of statistical learning","author":"Friedman","year":"2001"},{"key":"B11","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1186\/s13059-019-1874-1","article-title":"Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression","volume":"20","author":"Hafemeister","year":"2019","journal-title":"Genome Biol."},{"key":"B12","doi-asserted-by":"publisher","first-page":"1353","DOI":"10.1101\/gr.234062.117","article-title":"Single-cell RNA-seq analysis identifies markers of resistance to targeted BRAF inhibitors in melanoma cell populations","volume":"28","author":"Ho","year":"2018","journal-title":"Genome Res."},{"key":"B13","doi-asserted-by":"publisher","first-page":"218","DOI":"10.1186\/s13059-020-02132-x","article-title":"A systematic evaluation of single-cell RNA-sequencing imputation methods","volume":"21","author":"Hou","year":"2020","journal-title":"Genome Biol."},{"key":"B14","doi-asserted-by":"publisher","first-page":"539","DOI":"10.1038\/s41592-018-0033-z","article-title":"Saver: Gene expression recovery for single-cell RNA sequencing","volume":"15","author":"Huang","year":"2018","journal-title":"Nat. Methods"},{"key":"B15","doi-asserted-by":"publisher","first-page":"e117","DOI":"10.1093\/nar\/gkw430","article-title":"Tscan: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis","volume":"44","author":"Ji","year":"2016","journal-title":"Nucleic Acids Res."},{"key":"B16","doi-asserted-by":"publisher","first-page":"740","DOI":"10.1038\/nmeth.2967","article-title":"Bayesian approach to single-cell differential expression analysis","volume":"11","author":"Kharchenko","year":"2014","journal-title":"Nat. Methods"},{"key":"B17","doi-asserted-by":"publisher","first-page":"196","DOI":"10.1186\/s13059-020-02096-y","article-title":"Demystifying \"drop-outs\" in single-cell UMI data","volume":"21","author":"Kim","year":"2020","journal-title":"Genome Biol."},{"key":"B18","doi-asserted-by":"publisher","first-page":"296","DOI":"10.1038\/nbt.3500","article-title":"Haplotypes drop by drop","volume":"34","author":"Kitzman","year":"2016","journal-title":"Nat. Biotechnol."},{"key":"B19","doi-asserted-by":"publisher","first-page":"1187","DOI":"10.1016\/j.cell.2015.04.044","article-title":"Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells","volume":"161","author":"Klein","year":"2015","journal-title":"Cell"},{"key":"B20","doi-asserted-by":"publisher","first-page":"5416","DOI":"10.1038\/s41467-019-13056-x","article-title":"The art of using t-SNE for single-cell transcriptomics","volume":"10","author":"Kobak","year":"2019","journal-title":"Nat. Commun."},{"key":"B21","doi-asserted-by":"publisher","first-page":"997","DOI":"10.1038\/s41467-018-03405-7","article-title":"An accurate and robust imputation method scImpute for single-cell RNA-seq data","volume":"9","author":"Li","year":"2018","journal-title":"Nat. Commun."},{"key":"B22","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1038\/s41467-020-20358-y","article-title":"Single-cell transcriptome profiling of the vaginal wall in women with severe anterior vaginal prolapse","volume":"12","author":"Li","year":"2021","journal-title":"Nat. Commun."},{"key":"B23","doi-asserted-by":"publisher","first-page":"101","DOI":"10.1038\/s41586-020-2734-6","article-title":"Reprogramming roadmap reveals route to human induced trophoblast stem cells","volume":"586","author":"Liu","year":"2020","journal-title":"Nature"},{"key":"B24","doi-asserted-by":"publisher","first-page":"550","DOI":"10.1186\/s13059-014-0550-8","article-title":"Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2","volume":"15","author":"Love","year":"2014","journal-title":"Genome Biol."},{"key":"B25","doi-asserted-by":"publisher","first-page":"75","DOI":"10.1186\/s13059-016-0947-7","article-title":"Pooling across cells to normalize single-cell RNA sequencing data with many zero counts","volume":"17","author":"Lun","year":"2016","journal-title":"Genome Biol."},{"key":"B26","doi-asserted-by":"publisher","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"Macosko","year":"2015","journal-title":"Cell"},{"key":"B27","doi-asserted-by":"publisher","first-page":"e107333","DOI":"10.15252\/embj.2020107333","article-title":"A single-cell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast","volume":"40","author":"Pal","year":"2021","journal-title":"EMBO J."},{"key":"B28","doi-asserted-by":"publisher","first-page":"367","DOI":"10.1126\/science.360.6387.367","article-title":"Chronicling embryos, cell by cell, gene by gene","volume":"360","author":"Pennisi","year":"2018","journal-title":"Science"},{"key":"B29","doi-asserted-by":"publisher","first-page":"1096","DOI":"10.1038\/nmeth.2639","article-title":"Smart-seq2 for sensitive full-length transcriptome profiling in single cells","volume":"10","author":"Picelli","year":"2013","journal-title":"Nat. Methods"},{"key":"B30","doi-asserted-by":"publisher","first-page":"1169","DOI":"10.1038\/s41467-020-14976-9","article-title":"Embracing the dropouts in single-cell RNA-seq analysis","volume":"11","author":"Qiu","year":"2020","journal-title":"Nat. Commun."},{"key":"B31","doi-asserted-by":"publisher","first-page":"770","DOI":"10.1038\/s41588-021-00873-4","article-title":"Separating measurement and expression models clarifies confusion in single-cell RNA sequencing analysis","volume":"53","author":"Sarkar","year":"2021","journal-title":"Nat. Genet."},{"key":"B32","first-page":"2022","article-title":"Machine learning-assisted identification of factors contributing to the technical variability between bulk and single-cell RNA-seq experiments","author":"Lipnitskaya","year":"2022"},{"key":"B33","doi-asserted-by":"publisher","first-page":"709","DOI":"10.1186\/1479-7364-5-6-709","article-title":"In-silico human genomics with GeneCards","volume":"5","author":"Stelzer","year":"2011","journal-title":"Hum. Genomics"},{"key":"B34","doi-asserted-by":"publisher","first-page":"147","DOI":"10.1038\/s41587-019-0379-5","article-title":"Droplet scRNA-seq is not zero-inflated","volume":"38","author":"Svensson","year":"2020","journal-title":"Nat. Biotechnol."},{"key":"B35","doi-asserted-by":"publisher","first-page":"388","DOI":"10.1186\/s12859-019-2977-0","article-title":"Rescue: Imputing dropout events in single-cell RNA-sequencing data","volume":"20","author":"Tracy","year":"2019","journal-title":"BMC Bioinforma."},{"key":"B36","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1016\/j.cell.2018.05.061","article-title":"Recovering gene interactions from single-cell data using data diffusion","volume":"174","author":"van Dijk","year":"2018","journal-title":"Cell"},{"key":"B37","doi-asserted-by":"publisher","first-page":"253","DOI":"10.1016\/j.gpb.2020.02.005","article-title":"Direct comparative analyses of 10X genomics Chromium and smart-seq2","volume":"19","author":"Wang","year":"2021","journal-title":"Genomics Proteomics Bioinforma."},{"key":"B38","doi-asserted-by":"publisher","first-page":"1334","DOI":"10.1038\/s41588-021-00911-1","article-title":"A single-cell and spatially resolved atlas of human breast cancers","volume":"53","author":"Wu","year":"2021","journal-title":"Nat. Genet."},{"key":"B39","doi-asserted-by":"publisher","first-page":"13097","DOI":"10.1093\/nar\/gkx1189","article-title":"Linnorm: Improved statistical analysis for single cell RNA-seq expression data","volume":"45","author":"Yip","year":"2017","journal-title":"Nucleic Acids Res."},{"key":"B40","doi-asserted-by":"publisher","first-page":"2209","DOI":"10.1038\/s41467-019-09990-5","article-title":"Complete deconvolution of cellular mixtures based on linearity of transcriptional signatures","volume":"10","author":"Zaitsev","year":"2019","journal-title":"Nat. Commun."}],"container-title":["Frontiers in Bioinformatics"],"original-title":[],"link":[{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2023.1120290\/full","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T15:32:17Z","timestamp":1684164737000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.frontiersin.org\/articles\/10.3389\/fbinf.2023.1120290\/full"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,5,15]]},"references-count":40,"alternative-id":["10.3389\/fbinf.2023.1120290"],"URL":"https:\/\/doi.org\/10.3389\/fbinf.2023.1120290","relation":{},"ISSN":["2673-7647"],"issn-type":[{"value":"2673-7647","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,5,15]]},"article-number":"1120290"}}