{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,13]],"date-time":"2026-03-13T03:44:14Z","timestamp":1773373454481,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009464","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,11,4]],"date-time":"2021-11-04T00:00:00Z","timestamp":1635984000000}}],"reference-count":38,"publisher":"Public Library of Science (PLoS)","issue":"10","license":[{"start":{"date-parts":[[2021,10,19]],"date-time":"2021-10-19T00:00:00Z","timestamp":1634601600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Gene selection in unannotated large single cell RNA sequencing (scRNA-seq) data is important and crucial step in the preliminary step of downstream analysis. The existing approaches are primarily based on high variation (highly variable genes) or significant high expression (highly expressed genes) failed to provide stable and predictive feature set due to technical noise present in the data. Here, we propose <jats:italic>RgCop<\/jats:italic>, a novel <jats:bold>r<\/jats:bold>e<jats:bold>g<\/jats:bold>ularized <jats:bold>cop<\/jats:bold>ula based method for gene selection from large single cell RNA-seq data. <jats:italic>RgCop<\/jats:italic> utilizes copula correlation (<jats:italic>Ccor<\/jats:italic>), a robust equitable dependence measure that captures multivariate dependency among a set of genes in single cell expression data. We formulate an objective function by adding <jats:italic>l<\/jats:italic><jats:sub>1<\/jats:sub> regularization term with <jats:italic>Ccor<\/jats:italic> to penalizes the redundant co-efficient of features\/genes, resulting non-redundant effective features\/genes set.
Results show a significant improvement in the clustering\/classification performance of real life scRNA-seq data over the other state-of-the-art. <jats:italic>RgCop<\/jats:italic> performs extremely well in capturing dependence among the features of noisy data due to the scale invariant property of copula, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features\/genes obtained from <jats:italic>RgCop<\/jats:italic> is able to annotate the unknown cells with high accuracy.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009464","type":"journal-article","created":{"date-parts":[[2021,10,21]],"date-time":"2021-10-21T03:27:51Z","timestamp":1634786871000},"page":"e1009464","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":19,"title":["RgCop-A regularized copula based method for gene selection in single-cell RNA-seq data"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6694-5344","authenticated-orcid":true,"given":"Snehalika","family":"Lall","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3371-8516","authenticated-orcid":true,"given":"Sumanta","family":"Ray","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6370-2083","authenticated-orcid":true,"given":"Sanghamitra","family":"Bandyopadhyay","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"340","published-online":{"date-parts":[[2021,10,19]]},"reference":[{"key":"pcbi.1009464.ref001","doi-asserted-by":"crossref","first-page":"14049","DOI":"10.1038\/ncomms14049","article-title":"Massively parallel digital transcriptional profiling of single
cells","volume":"8","author":"GX Zheng","year":"2017","journal-title":"Nature communications"},{"key":"pcbi.1009464.ref002","article-title":"sc-REnF: An Entropy Guided Robust Feature Selection for Single-Cell RNA-seq Data","author":"S Lall","year":"2021","journal-title":"bioRxiv"},{"key":"pcbi.1009464.ref003","article-title":"Generating realistic cell samples for gene selection in scRNA-seq data: A novel generative framework","author":"S Lall","year":"2021","journal-title":"bioRxiv"},{"issue":"5","key":"pcbi.1009464.ref004","doi-asserted-by":"crossref","first-page":"483","DOI":"10.1038\/nmeth.4236","article-title":"SC3: consensus clustering of single-cell RNA-seq data","volume":"14","author":"VY Kiselev","year":"2017","journal-title":"Nature methods"},{"issue":"13","key":"pcbi.1009464.ref005","doi-asserted-by":"crossref","first-page":"e117","DOI":"10.1093\/nar\/gkw430","article-title":"TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis","volume":"44","author":"Z Ji","year":"2016","journal-title":"Nucleic acids research"},{"issue":"6391","key":"pcbi.1009464.ref006","doi-asserted-by":"crossref","DOI":"10.1126\/science.aaq1723","article-title":"Cell type atlas and lineage tree of a whole complex animal by single-cell transcriptomics","volume":"360","author":"M Plass","year":"2018","journal-title":"Science"},{"issue":"6391","key":"pcbi.1009464.ref007","doi-asserted-by":"crossref","DOI":"10.1126\/science.aaq1736","article-title":"Cell type transcriptome atlas for the planarian Schmidtea mediterranea","volume":"360","author":"CT Fincher","year":"2018","journal-title":"Science"},{"key":"pcbi.1009464.ref008","article-title":"MarkerCapsule: Explainable Single Cell Typing using Capsule Networks","author":"S Ray","year":"2020","journal-title":"bioRxiv"},{"issue":"6","key":"pcbi.1009464.ref009","doi-asserted-by":"crossref","first-page":"e8746","DOI":"10.15252\/msb.20188746","article-title":"Current best practices in single-cell RNA-seq analysis: a 
tutorial","volume":"15","author":"MD Luecken","year":"2019","journal-title":"Molecular systems biology"},{"issue":"5","key":"pcbi.1009464.ref010","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1038\/nbt.4096","article-title":"Integrating single-cell transcriptomic data across different conditions, technologies, and species","volume":"36","author":"A Butler","year":"2018","journal-title":"Nature biotechnology"},{"key":"pcbi.1009464.ref011","article-title":"Generating realistic cell samples for gene selection in scRNA-seq data: A novel generative framework","author":"S Lall","year":"2021","journal-title":"bioRxiv"},{"key":"pcbi.1009464.ref012","doi-asserted-by":"crossref","first-page":"107697","DOI":"10.1016\/j.patcog.2020.107697","article-title":"Stable feature selection using copula based mutual information","volume":"112","author":"S Lall","year":"2021","journal-title":"Pattern Recognition"},{"issue":"1","key":"pcbi.1009464.ref013","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1186\/1471-2105-9-225","article-title":"A copula method for modeling directional dependence of genes","volume":"9","author":"JM Kim","year":"2008","journal-title":"BMC bioinformatics"},{"issue":"1","key":"pcbi.1009464.ref014","first-page":"1","article-title":"CODC: a Copula-based model to identify differential coexpression","volume":"6","author":"S Ray","year":"2020","journal-title":"NPJ systems biology and applications"},{"issue":"2","key":"pcbi.1009464.ref015","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1093\/bioinformatics\/btz599","article-title":"Gaussian mixture copulas for high-dimensional clustering and dependency-based subtyping","volume":"36","author":"SR Kasa","year":"2020","journal-title":"Bioinformatics"},{"issue":"22","key":"pcbi.1009464.ref016","doi-asserted-by":"crossref","first-page":"e179","DOI":"10.1093\/nar\/gkx828","article-title":"Linnorm: improved statistical analysis for single cell RNA-seq expression 
data","volume":"45","author":"SH Yip","year":"2017","journal-title":"Nucleic acids research"},{"issue":"1","key":"pcbi.1009464.ref017","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"FA Wolf","year":"2018","journal-title":"Genome biology"},{"issue":"1","key":"pcbi.1009464.ref018","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41598-019-41695-z","article-title":"From Louvain to Leiden: guaranteeing well-connected communities","volume":"9","author":"VA Traag","year":"2019","journal-title":"Scientific reports"},{"issue":"1","key":"pcbi.1009464.ref019","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1305-0","article-title":"Splatter: simulation of single-cell RNA sequencing data","volume":"18","author":"L Zappia","year":"2017","journal-title":"Genome biology"},{"issue":"1","key":"pcbi.1009464.ref020","doi-asserted-by":"crossref","first-page":"144","DOI":"10.1186\/s13059-016-1010-4","article-title":"GiniClust: detecting rare cell types from single-cell gene expression data with Gini index","volume":"17","author":"L Jiang","year":"2016","journal-title":"Genome biology"},{"issue":"5","key":"pcbi.1009464.ref021","doi-asserted-by":"crossref","first-page":"1202","DOI":"10.1016\/j.cell.2015.05.002","article-title":"Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets","volume":"161","author":"EZ Macosko","year":"2015","journal-title":"Cell"},{"issue":"6","key":"pcbi.1009464.ref022","doi-asserted-by":"crossref","first-page":"637","DOI":"10.1038\/nmeth.2930","article-title":"Validation of noise models for single-cell transcriptomics","volume":"11","author":"D Gr\u00fcn","year":"2014","journal-title":"Nature methods"},{"issue":"Nov","key":"pcbi.1009464.ref023","first-page":"1531","article-title":"Fast binary feature selection with conditional mutual 
information","volume":"5","author":"F Fleuret","year":"2004","journal-title":"Journal of Machine Learning Research"},{"issue":"22","key":"pcbi.1009464.ref024","doi-asserted-by":"crossref","first-page":"8520","DOI":"10.1016\/j.eswa.2015.07.007","article-title":"Feature selection using joint mutual information maximisation","volume":"42","author":"M Bennasar","year":"2015","journal-title":"Expert Systems with Applications"},{"key":"pcbi.1009464.ref025","first-page":"91","volume-title":"Workshops on Applications of Evolutionary Computation","author":"PE Meyer","year":"2006"},{"issue":"8","key":"pcbi.1009464.ref026","doi-asserted-by":"crossref","first-page":"1226","DOI":"10.1109\/TPAMI.2005.159","article-title":"Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy","volume":"27","author":"H Peng","year":"2005","journal-title":"IEEE Transactions on pattern analysis and machine intelligence"},{"issue":"D1","key":"pcbi.1009464.ref027","doi-asserted-by":"crossref","first-page":"D721","DOI":"10.1093\/nar\/gky900","article-title":"CellMarker: a manually curated resource of cell markers in human and mouse","volume":"47","author":"X Zhang","year":"2019","journal-title":"Nucleic acids research"},{"issue":"1","key":"pcbi.1009464.ref028","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1038\/s41467-018-07234-6","article-title":"Discovery of rare cells from voluminous single cell expression data","volume":"9","author":"A Jindal","year":"2018","journal-title":"Nature communications"},{"issue":"1","key":"pcbi.1009464.ref029","first-page":"1","article-title":"A benchmark of batch-effect correction methods for single-cell RNA sequencing data","volume":"21","author":"HTN Tran","year":"2020","journal-title":"Genome biology"},{"issue":"4","key":"pcbi.1009464.ref030","doi-asserted-by":"crossref","first-page":"1015","DOI":"10.1016\/j.cell.2018.07.028","article-title":"Molecular diversity and specializations among the cells of 
the adult mouse brain","volume":"174","author":"A Saunders","year":"2018","journal-title":"Cell"},{"issue":"9","key":"pcbi.1009464.ref031","doi-asserted-by":"crossref","first-page":"1131","DOI":"10.1038\/nsmb.2660","article-title":"Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells","volume":"20","author":"L Yan","year":"2013","journal-title":"Nature structural & molecular biology"},{"issue":"10","key":"pcbi.1009464.ref032","doi-asserted-by":"crossref","first-page":"1053","DOI":"10.1038\/nbt.2967","article-title":"Low-coverage single-cell mRNA sequencing reveals cellular heterogeneity and activated signaling pathways in developing cerebral cortex","volume":"32","author":"AA Pollen","year":"2014","journal-title":"Nature biotechnology"},{"issue":"4","key":"pcbi.1009464.ref033","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.cels.2016.09.002","article-title":"A single-cell transcriptome atlas of the human pancreas","volume":"3","author":"MJ Muraro","year":"2016","journal-title":"Cell systems"},{"key":"pcbi.1009464.ref034","volume-title":"An introduction to copulas","author":"RB Nelsen","year":"2007"},{"issue":"284","key":"pcbi.1009464.ref035","doi-asserted-by":"crossref","first-page":"814","DOI":"10.1080\/01621459.1958.10501481","article-title":"Ordinal measures of association","volume":"53","author":"WH Kruskal","year":"1958","journal-title":"Journal of the American Statistical Association"},{"key":"pcbi.1009464.ref036","unstructured":"Nelsen RB. Properties and applications of copulas: A brief survey. In: Proceedings of the First Brazilian Conference on Statistical Modeling in Insurance and Finance,(Dhaene, J., Kolev, N., Morettin, PA (Eds.)), University Press USP: Sao Paulo; 2003. p. 10\u201328."},{"key":"pcbi.1009464.ref037","first-page":"601","article-title":"Feature selection for high-dimensional genomic microarray data","volume":"vol. 
1","author":"E Xing","year":"2001","journal-title":"ICML"},{"issue":"Jan","key":"pcbi.1009464.ref038","first-page":"27","article-title":"Conditional likelihood maximisation: a unifying framework for information theoretic feature selection","volume":"13","author":"G Brown","year":"2012","journal-title":"Journal of machine learning research"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009464","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,11,4]],"date-time":"2021-11-04T00:00:00Z","timestamp":1635984000000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009464","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,11]],"date-time":"2023-11-11T13:15:08Z","timestamp":1699708508000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009464"}},"subtitle":[],"editor":[{"given":"Wei","family":"Li","sequence":"first","affiliation":[],"role":[{"role":"editor","vocabulary":"crossref"}]}],"short-title":[],"issued":{"date-parts":[[2021,10,19]]},"references-count":38,"journal-issue":{"issue":"10","published-online":{"date-parts":[[2021,10,19]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009464","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1009464","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,10,19]]}}}