{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:12:55Z","timestamp":1772172775226,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008569","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000}}],"reference-count":44,"publisher":"Public Library of Science (PLoS)","issue":"1","license":[{"start":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T00:00:00Z","timestamp":1609977600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>\n                    The analysis of single-cell genomics data presents several statistical challenges, and extensive efforts have been made to produce methods for the analysis of this data that impute missing values, address sampling issues and quantify and correct for noise. In spite of such efforts, no consensus on best practices has been established and all current approaches vary substantially based on the available data and empirical tests. The k-Nearest Neighbor Graph (kNN-G) is often used to infer the identities of, and relationships between, cells and is the basis of many widely used dimensionality-reduction and projection methods. The kNN-G has also been the basis for imputation methods using,\n                    <jats:italic>e.g<\/jats:italic>\n                    ., neighbor averaging and graph diffusion. However, due to the lack of an agreed-upon optimal objective function for choosing hyperparameters, these methods tend to oversmooth data, thereby resulting in a loss of information with regard to cell identity and the specific gene-to-gene patterns underlying regulatory mechanisms. 
In this paper, we investigate the tuning of kNN- and diffusion-based denoising methods with a novel non-stochastic method for optimally preserving biologically relevant informative variance in single-cell data. The framework, <jats:italic>Denoising Expression data with a Weighted Affinity Kernel and Self-Supervision<\/jats:italic> (DEW\u00c4KSS), uses a self-supervised technique to tune its parameters. We demonstrate that denoising with optimal parameters selected by our objective function (i) is robust to preprocessing methods using data from established benchmarks, (ii) disentangles cellular identity and maintains robust clusters over dimension-reduction methods, (iii) maintains variance along several expression dimensions, unlike previous heuristic-based methods that tend to oversmooth data variance, and (iv) rarely involves diffusion but rather uses a fixed weighted kNN graph for denoising. Together, these findings provide a new understanding of kNN- and diffusion-based denoising methods. 
Code and example data for DEW\u00c4KSS is available at\n                    <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/gitlab.com\/Xparx\/dewakss\/-\/tree\/Tjarnberg2020branch\" xlink:type=\"simple\">https:\/\/gitlab.com\/Xparx\/dewakss\/-\/tree\/Tjarnberg2020branch<\/jats:ext-link>\n                    .\n                  <\/jats:p>","DOI":"10.1371\/journal.pcbi.1008569","type":"journal-article","created":{"date-parts":[[2021,1,7]],"date-time":"2021-01-07T18:51:15Z","timestamp":1610045475000},"page":"e1008569","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":25,"title":["Optimal tuning of weighted kNN- and diffusion-based methods for denoising single cell genomics data"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-0064-1791","authenticated-orcid":true,"given":"Andreas","family":"Tj\u00e4rnberg","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4437-5416","authenticated-orcid":true,"given":"Omar","family":"Mahmood","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8769-2710","authenticated-orcid":true,"given":"Christopher A.","family":"Jackson","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9994-7354","authenticated-orcid":true,"given":"Giuseppe-Antonio","family":"Saldi","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1669-3211","authenticated-orcid":true,"given":"Kyunghyun","family":"Cho","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5930-5667","authenticated-orcid":true,"given":"Lionel A.","family":"Christiaen","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4354-7906","authenticated-orcid":true,"given":"Richard 
A.","family":"Bonneau","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2021,1,7]]},"reference":[{"issue":"4","key":"pcbi.1008569.ref001","doi-asserted-by":"crossref","first-page":"631","DOI":"10.1016\/j.molcel.2017.01.023","article-title":"Comparative Analysis of Single-Cell RNA Sequencing Methods","volume":"65","author":"C Ziegenhain","year":"2017","journal-title":"Molecular Cell"},{"key":"pcbi.1008569.ref002","article-title":"Challenges in unsupervised clustering of single-cell RNA-seq data","author":"VY Kiselev","year":"2019","journal-title":"Nature Reviews Genetics"},{"key":"pcbi.1008569.ref003","article-title":"Comprehensive integration of single cell data","author":"T Stuart","year":"2018","journal-title":"bioRxiv"},{"issue":"1","key":"pcbi.1008569.ref004","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"FA Wolf","year":"2018","journal-title":"Genome Biology"},{"issue":"3","key":"pcbi.1008569.ref005","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1016\/j.cell.2018.05.061","article-title":"Recovering Gene Interactions from Single-Cell Data Using Data Diffusion","volume":"174","author":"D van Dijk","year":"2018","journal-title":"Cell"},{"issue":"1","key":"pcbi.1008569.ref006","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1186\/s13059-017-1382-0","article-title":"SCANPY: large-scale single-cell gene expression data analysis","volume":"19","author":"FA Wolf","year":"2018","journal-title":"Genome Biology"},{"key":"pcbi.1008569.ref007","article-title":"Orchestrating Single-Cell Analysis with Bioconductor","author":"RA Amezquita","year":"2019","journal-title":"bioRxiv"},{"issue":"1","key":"pcbi.1008569.ref008","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1186\/s13059-016-0927-y","article-title":"Design and computational analysis of single-cell 
RNA-sequencing experiments","volume":"17","author":"R Bacher","year":"2016","journal-title":"Genome Biology"},{"issue":"4","key":"pcbi.1008569.ref009","doi-asserted-by":"crossref","first-page":"311","DOI":"10.1038\/s41592-019-0353-7","article-title":"Scalable analysis of cell-type composition from single-cell transcriptomics using deep recurrent learning","volume":"16","author":"Y Deng","year":"2019","journal-title":"Nature Methods"},{"issue":"1","key":"pcbi.1008569.ref010","doi-asserted-by":"crossref","first-page":"997","DOI":"10.1038\/s41467-018-03405-7","article-title":"An accurate and robust imputation method scImpute for single-cell RNA-seq data","volume":"9","author":"WV Li","year":"2018","journal-title":"Nature Communications"},{"issue":"1","key":"pcbi.1008569.ref011","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1186\/s12859-018-2226-y","article-title":"DrImpute: imputing dropout events in single cell RNA sequencing data","volume":"19","author":"W Gong","year":"2018","journal-title":"BMC Bioinformatics"},{"issue":"1","key":"pcbi.1008569.ref012","doi-asserted-by":"crossref","first-page":"390","DOI":"10.1038\/s41467-018-07931-2","article-title":"Single-cell RNA-seq denoising using a deep count autoencoder","volume":"10","author":"G Eraslan","year":"2019","journal-title":"Nature Communications"},{"key":"pcbi.1008569.ref013","article-title":"K-nearest neighbor smoothing for high-throughput single-cell RNA-Seq data","author":"F Wagner","year":"2017","journal-title":"bioRxiv"},{"issue":"3","key":"pcbi.1008569.ref014","doi-asserted-by":"crossref","first-page":"716","DOI":"10.1016\/j.cell.2018.05.061","article-title":"Recovering Gene Interactions from Single-Cell Data Using Data Diffusion","volume":"174","author":"D van Dijk","year":"2018","journal-title":"Cell"},{"key":"pcbi.1008569.ref015","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1038\/nmeth.2967","article-title":"Bayesian approach to single-cell differential expression 
analysis","volume":"11","author":"PV Kharchenko","year":"2014","journal-title":"Nature Methods"},{"key":"pcbi.1008569.ref016","article-title":"Droplet scRNA-seq is not zero-inflated","author":"V Svensson","year":"2019","journal-title":"bioRxiv"},{"key":"pcbi.1008569.ref017","doi-asserted-by":"crossref","DOI":"10.1038\/s41592-018-0229-2","article-title":"Deep generative modeling for single-cell transcriptomics","volume":"15","author":"R Lopez","year":"2018","journal-title":"Nature Methods"},{"key":"pcbi.1008569.ref018","doi-asserted-by":"crossref","unstructured":"McInnes L, Healy J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. ArXiv e-prints. 2018;.","DOI":"10.21105\/joss.00861"},{"key":"pcbi.1008569.ref019","author":"V Traag","year":"2018","journal-title":"From Louvain to Leiden: guaranteeing well-connected communities"},{"issue":"10","key":"pcbi.1008569.ref020","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"VD Blondel","year":"2008","journal-title":"Journal of Statistical Mechanics: Theory and Experiment"},{"issue":"6392","key":"pcbi.1008569.ref021","doi-asserted-by":"crossref","DOI":"10.1126\/science.aar3131","article-title":"Single-cell reconstruction of developmental trajectories during zebrafish embryogenesis","volume":"360","author":"JA Farrell","year":"2018","journal-title":"Science"},{"key":"pcbi.1008569.ref022","doi-asserted-by":"crossref","first-page":"845","DOI":"10.1038\/nmeth.3971","article-title":"Diffusion pseudotime robustly reconstructs lineage branching","volume":"13","author":"L Haghverdi","year":"2016","journal-title":"Nature Methods"},{"issue":"18","key":"pcbi.1008569.ref023","doi-asserted-by":"crossref","first-page":"2989","DOI":"10.1093\/bioinformatics\/btv325","article-title":"Diffusion maps for high-dimensional single-cell analysis of differentiation 
data","volume":"31","author":"L Haghverdi","year":"2015","journal-title":"Bioinformatics"},{"key":"pcbi.1008569.ref024","article-title":"A novel metric reveals previously unrecognized distortion in dimensionality reduction of scRNA-Seq data","author":"SM Cooley","year":"2019","journal-title":"bioRxiv"},{"key":"pcbi.1008569.ref025","article-title":"ENHANCE: Accurate denoising of single-cell RNA-Seq data","author":"F Wagner","year":"2019","journal-title":"bioRxiv"},{"key":"pcbi.1008569.ref026","article-title":"Noise2Self: Blind Denoising by Self-Supervision","author":"JD Batson","year":"2019","journal-title":"CoRR"},{"issue":"1","key":"pcbi.1008569.ref027","doi-asserted-by":"crossref","first-page":"211","DOI":"10.1186\/s13059-019-1837-6","article-title":"DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data","volume":"20","author":"C Arisdakessian","year":"2019","journal-title":"Genome Biology"},{"issue":"7","key":"pcbi.1008569.ref028","doi-asserted-by":"crossref","first-page":"539","DOI":"10.1038\/s41592-018-0033-z","article-title":"SAVER: gene expression recovery for single-cell RNA sequencing","volume":"15","author":"M Huang","year":"2018","journal-title":"Nature Methods"},{"issue":"6","key":"pcbi.1008569.ref029","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1038\/s41592-019-0425-8","article-title":"Benchmarking single cell RNA-sequencing analysis pipelines using mixture control experiments","volume":"16","author":"L Tian","year":"2019","journal-title":"Nature Methods"},{"key":"pcbi.1008569.ref030","doi-asserted-by":"crossref","first-page":"77","DOI":"10.1186\/s13059-016-0938-8","article-title":"CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq","volume":"17","author":"T Hashimshony","year":"2016","journal-title":"Genome Biology"},{"issue":"4","key":"pcbi.1008569.ref031","doi-asserted-by":"crossref","first-page":"385","DOI":"10.1016\/j.cels.2016.09.002","article-title":"A Single-Cell 
Transcriptome Atlas of the Human Pancreas","volume":"3","author":"M Muraro","year":"2016","journal-title":"Cell Systems"},{"key":"pcbi.1008569.ref032","author":"S Su","year":"2019","journal-title":"CellBench: Construct Benchmarks for Single Cell Analysis Methods"},{"issue":"1","key":"pcbi.1008569.ref033","doi-asserted-by":"crossref","first-page":"120","DOI":"10.1038\/s41593-017-0029-5","article-title":"Single-cell analysis of experience-dependent transcriptomic states in the mouse visual cortex","volume":"21","author":"S Hrvatin","year":"2018","journal-title":"Nature neuroscience"},{"issue":"7","key":"pcbi.1008569.ref034","doi-asserted-by":"crossref","first-page":"1663","DOI":"10.1016\/j.cell.2015.11.013","article-title":"Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors","volume":"163","author":"F Paul","year":"2015","journal-title":"Cell"},{"key":"pcbi.1008569.ref035","doi-asserted-by":"crossref","first-page":"e51254","DOI":"10.7554\/eLife.51254","article-title":"Gene regulatory network reconstruction using single-cell RNA sequencing of barcoded genotypes in diverse environments","volume":"9","author":"CA Jackson","year":"2020","journal-title":"eLife"},{"issue":"7719","key":"pcbi.1008569.ref036","doi-asserted-by":"crossref","first-page":"494","DOI":"10.1038\/s41586-018-0414-6","article-title":"RNA velocity of single cells","volume":"560","author":"G La Manno","year":"2018","journal-title":"Nature"},{"issue":"1","key":"pcbi.1008569.ref037","doi-asserted-by":"crossref","first-page":"196","DOI":"10.1186\/s13059-018-1575-1","article-title":"VIPER: variability-preserving imputation for accurate gene expression recovery in single-cell RNA sequencing studies","volume":"19","author":"M Chen","year":"2018","journal-title":"Genome Biology"},{"key":"pcbi.1008569.ref038","article-title":"Molecular Cross-Validation for Single-Cell RNA-seq","author":"J 
Batson","year":"2019","journal-title":"bioRxiv"},{"issue":"3","key":"pcbi.1008569.ref039","doi-asserted-by":"crossref","first-page":"740","DOI":"10.1016\/j.cell.2014.02.054","article-title":"Large-Scale Genetic Perturbations Reveal Regulatory Networks and an Abundance of Gene-Specific Repressors","volume":"157","author":"P Kemmeren","year":"2014","journal-title":"Cell"},{"issue":"29","key":"pcbi.1008569.ref040","doi-asserted-by":"crossref","first-page":"861","DOI":"10.21105\/joss.00861","article-title":"UMAP: Uniform Manifold Approximation and Projection","volume":"3","author":"L McInnes","year":"2018","journal-title":"The Journal of Open Source Software"},{"issue":"4","key":"pcbi.1008569.ref041","doi-asserted-by":"crossref","first-page":"607","DOI":"10.1214\/aoms\/1177729756","article-title":"Transformations Related to the Angular and the Square Root","volume":"21","author":"MF Freeman","year":"1950","journal-title":"Ann Math Statist"},{"issue":"2","key":"pcbi.1008569.ref042","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1109\/MCSE.2011.37","article-title":"The NumPy Array: A Structure for Efficient Numerical Computation","volume":"13","author":"S van der Walt","year":"2011","journal-title":"Computing in Science Engineering"},{"key":"pcbi.1008569.ref043","author":"L Tian","year":"2019","journal-title":"Single cell mixology: single cell RNA-seq benchmarking"},{"issue":"6226","key":"pcbi.1008569.ref044","doi-asserted-by":"crossref","first-page":"1138","DOI":"10.1126\/science.aaa1934","article-title":"Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq","volume":"347","author":"A Zeisel","year":"2015","journal-title":"Science"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008569","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000}}],"container-title":["PLOS Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008569","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T14:43:16Z","timestamp":1611153796000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008569"}},"subtitle":[],"editor":[{"given":"Qing","family":"Nie","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1,7]]},"references-count":44,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1,7]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008569","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.02.28.970202","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1,7]]}}}