{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,29]],"date-time":"2025-12-29T11:31:02Z","timestamp":1767007862525,"version":"3.41.2"},"reference-count":43,"publisher":"Oxford University Press (OUP)","issue":"1","license":[{"start":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T00:00:00Z","timestamp":1702339200000},"content-version":"vor","delay-in-days":20,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012190","name":"Ministry of Science and Higher Education of the Russian Federation","doi-asserted-by":"publisher","award":["075-15-2022-310"],"award-info":[{"award-number":["075-15-2022-310"]}],"id":[{"id":"10.13039\/501100012190","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2023,11,22]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Single-cell ATAC-seq (scATAC-seq) is a recently developed approach that provides means to investigate open chromatin at single cell level, to assess epigenetic regulation and transcription factors binding landscapes. The sparsity of the scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce computational load due to the large number of open regions. However, optimal strategies for both imputation and preprocessing have not been yet evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks, a combination of state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e. clustering, visualization and digital genomic footprinting, and attain optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of the dataset features (peaks). We conclude that the preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database with pre-computed transcription factor footprints based on imputed data with their activity scores in a specific cell type. SAPIEnS is published at: https:\/\/github.com\/lab-medvedeva\/SAPIEnS. SAPIEnS database is available at: https:\/\/sapiensdb.com<\/jats:p>","DOI":"10.1093\/bib\/bbad447","type":"journal-article","created":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T17:47:09Z","timestamp":1702403229000},"source":"Crossref","is-referenced-by-count":7,"title":["scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting"],"prefix":"10.1093","volume":"25","author":[{"ORCID":"https:\/\/orcid.org\/0009-0008-0778-1352","authenticated-orcid":false,"given":"Pavel","family":"Akhtyamov","sequence":"first","affiliation":[{"name":"Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University) , 9 Institutskiy per., 141701, Moscow Region , Russian Federation"},{"name":"The National Medical Research Center for Endocrinology , Dm. Ulyanova, 11, 117036, Moscow , Russian Federation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Layal","family":"Shaheen","sequence":"additional","affiliation":[{"name":"Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University) , 9 Institutskiy per., 141701, Moscow Region , Russian Federation"},{"name":"The National Medical Research Center for Endocrinology , Dm. Ulyanova, 11, 117036, Moscow , Russian Federation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Mikhail","family":"Raevskiy","sequence":"additional","affiliation":[{"name":"Department, \u00c9cole Polytechnique F\u00e9d\u00e9rale de Lausanne , Rte Cantonale, 1015, Lausanne, Vaud , Switzerland"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7605-410X","authenticated-orcid":false,"given":"Alexey","family":"Stupnikov","sequence":"additional","affiliation":[{"name":"Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University) , 9 Institutskiy per., 141701, Moscow Region , Russian Federation"},{"name":"The National Medical Research Center for Endocrinology , Dm. Ulyanova, 11, 117036, Moscow , Russian Federation"},{"name":"Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science , Leninsky prospect, 33, build. 2, 119071, Moscow , Russian Federation"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7587-1666","authenticated-orcid":false,"given":"Yulia A","family":"Medvedeva","sequence":"additional","affiliation":[{"name":"Department of Biomedical Physics, Moscow Institute of Physics and Technology (National Research University) , 9 Institutskiy per., 141701, Moscow Region , Russian Federation"},{"name":"The National Medical Research Center for Endocrinology , Dm. Ulyanova, 11, 117036, Moscow , Russian Federation"},{"name":"Institute of Bioengineering, Research Center of Biotechnology, Russian Academy of Science , Leninsky prospect, 33, build. 2, 119071, Moscow , Russian Federation"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2023,12,11]]},"reference":[{"issue":"1","key":"2023121211060106600_ref1","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1002\/0471142727.mb2129s109","article-title":"Atac-seq: a method for assaying chromatin accessibility genome-wide","volume":"109","author":"Buenrostro","year":"2015","journal-title":"Curr Protoc Mol Biol"},{"issue":"12","key":"2023121211060106600_ref2","doi-asserted-by":"crossref","first-page":"840","DOI":"10.1038\/nrg3306","article-title":"Chip-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions","volume":"13","author":"Furey","year":"2012","journal-title":"Nat Rev Genet"},{"issue":"4","key":"2023121211060106600_ref3","doi-asserted-by":"crossref","first-page":"283","DOI":"10.1038\/nmeth.1313","article-title":"Global mapping of protein-dna interactions in vivo by digital genomic footprinting","volume":"6","author":"Hesselberth","year":"2009","journal-title":"Nat Methods"},{"issue":"3","key":"2023121211060106600_ref4","doi-asserted-by":"crossref","first-page":"213","DOI":"10.1038\/nmeth.3768","article-title":"Genomic footprinting","volume":"13","author":"Vierstra","year":"2016","journal-title":"Nat Methods"},{"key":"2023121211060106600_ref5","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1007\/978-1-0716-1534-8_3","article-title":"Genomic footprinting analyses from DNase-seq data to construct gene regulatory networks","volume":"2328","author":"Moyano","year":"2021","journal-title":"Methods Mol Biol"},{"issue":"1","key":"2023121211060106600_ref6","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1186\/s13059-020-1929-3","article-title":"From reads to insight: a hitchhiker\u2019s guide to ATAC-seq data analysis","volume":"21","author":"Yan","year":"2020","journal-title":"Genome Biol"},{"key":"2023121211060106600_ref7","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-020-02132-x","article-title":"A systematic evaluation of single-cell rna-sequencing imputation methods","volume":"21","author":"Hou","year":"2020","journal-title":"Genome Biol"},{"issue":"1","key":"2023121211060106600_ref8","doi-asserted-by":"crossref","first-page":"6386","DOI":"10.1038\/s41467-021-26530-2","article-title":"Chromatin-accessibility estimation from single-cell atac-seq data with scopen","volume":"12","author":"Li","year":"2021","journal-title":"Nat Commun"},{"issue":"7","key":"2023121211060106600_ref9","doi-asserted-by":"crossref","first-page":"6229","DOI":"10.3390\/ijms24076229","article-title":"Epi-impute: single-cell rna-seq imputation via integration with single-cell atac-seq","volume":"24","author":"Raevskiy","year":"2023","journal-title":"Int J Mol Sci"},{"issue":"5","key":"2023121211060106600_ref10","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1038\/s41592-019-0367-1","article-title":"Cistopic: cis-regulatory topic modeling on single-cell atac-seq data","volume":"16","author":"Gonz\u00e1lez-Blas","year":"2019","journal-title":"Nat Methods"},{"key":"2023121211060106600_ref11","doi-asserted-by":"crossref","first-page":"1","DOI":"10.18637\/jss.v036.i11","article-title":"Feature selection with the boruta package","volume":"36","author":"Kursa","year":"2010","journal-title":"J Stat Softw"},{"issue":"5","key":"2023121211060106600_ref12","doi-asserted-by":"crossref","first-page":"858","DOI":"10.1016\/j.molcel.2018.06.044","article-title":"Cicero predicts cis-regulatory dna interactions from single-cell chromatin accessibility data","volume":"71","author":"Pliner","year":"2018","journal-title":"Mol Cell"},{"issue":"1","key":"2023121211060106600_ref13","doi-asserted-by":"crossref","first-page":"4576","DOI":"10.1038\/s41467-019-12630-7","article-title":"Scale method for single-cell atac-seq analysis via latent feature extraction","volume":"10","author":"Xiong","year":"2019","journal-title":"Nat Commun"},{"issue":"1","key":"2023121211060106600_ref14","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1854-5","article-title":"Assessment of computational methods for the analysis of single-cell atac-seq data","volume":"20","author":"Chen","year":"2019","journal-title":"Genome Biol"},{"issue":"01","key":"2023121211060106600_ref15","first-page":"2023","article-title":"Benchmarking algorithms for gene set scoring of single-cell atac-seq data","volume":"2023","author":"Wang","journal-title":"bioRxiv"},{"issue":"1","key":"2023121211060106600_ref16","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1038\/s41592-021-01336-8","article-title":"Benchmarking atlas-level data integration in single-cell genomics","volume":"19","author":"Luecken","year":"2022","journal-title":"Nat Methods"},{"issue":"1","key":"2023121211060106600_ref17","doi-asserted-by":"crossref","first-page":"bbab442","DOI":"10.1093\/bib\/bbab442","article-title":"Are dropout imputation methods for scRNA-seq effective for scATAC-seq data?","volume":"23","author":"Liu","year":"2021","journal-title":"Brief Bioinform"},{"issue":"3","key":"2023121211060106600_ref18","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1145\/331499.331504","article-title":"Data clustering: a review","volume":"31","author":"Jain","year":"1999","journal-title":"ACM Comput Surv"},{"issue":"6","key":"2023121211060106600_ref19","doi-asserted-by":"crossref","first-page":"583","DOI":"10.3233\/IDA-2007-11602","article-title":"An overview of clustering methods","volume":"11","author":"Omran","year":"2007","journal-title":"Intell Data Anal"},{"issue":"5","key":"2023121211060106600_ref20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/3347711","article-title":"How complex is your classification problem? A survey on measuring classification complexity","volume":"52","author":"Lorena","year":"2019","journal-title":"ACM Comput Surv"},{"key":"2023121211060106600_ref21","doi-asserted-by":"crossref","DOI":"10.12688\/f1000research.74846.1","article-title":"Hobotnica: exploring molecular signature quality","volume":"10","author":"Stupnikov","year":"2021","journal-title":"F1000Research"},{"issue":"5","key":"2023121211060106600_ref22","doi-asserted-by":"crossref","first-page":"493","DOI":"10.1108\/00220410410560573","article-title":"A statistical interpretation of term specificity and its application in retrieval","volume":"60","author":"Jones","year":"2004","journal-title":"J Doc"},{"issue":"6","key":"2023121211060106600_ref23","doi-asserted-by":"crossref","first-page":"1535","DOI":"10.1016\/j.cell.2018.03.074","article-title":"Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation","volume":"173","author":"Buenrostro","year":"2018","journal-title":"Cell"},{"issue":"1","key":"2023121211060106600_ref24","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s12859-021-03957-4","article-title":"Selecting single cell clustering parameter values using subsampling-based robustness metrics","volume":"22","author":"Patterson-Cross","year":"2021","journal-title":"BMC Bioinformatics"},{"key":"2023121211060106600_ref25","doi-asserted-by":"crossref","first-page":"3470","DOI":"10.1016\/j.csbj.2021.05.040","article-title":"Robustness of differential gene expression analysis of rna-seq","volume":"19","author":"Stupnikov","year":"2021","journal-title":"Comput Struct Biotechnol J"},{"issue":"18","key":"2023121211060106600_ref26","doi-asserted-by":"crossref","first-page":"2057","DOI":"10.1093\/bioinformatics\/btn365","article-title":"Apparently low reproducibility of true differential expression discoveries in microarray studies","volume":"24","author":"Zhang","year":"2008","journal-title":"Bioinformatics"},{"issue":"21","key":"2023121211060106600_ref27","doi-asserted-by":"crossref","first-page":"3345","DOI":"10.1093\/bioinformatics\/btw475","article-title":"Samexplorer: exploring reproducibility and robustness of rna-seq results based on sam files","volume":"32","author":"Stupnikov","year":"2016","journal-title":"Bioinformatics"},{"key":"2023121211060106600_ref28","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-019-1642-2","article-title":"Identification of transcription factor binding sites using atac-seq","volume":"20","author":"Li","year":"2019","journal-title":"Genome Biol"},{"issue":"D1","key":"2023121211060106600_ref29","doi-asserted-by":"crossref","first-page":"D252","DOI":"10.1093\/nar\/gkx1106","article-title":"Hocomoco: towards a complete collection of transcription factor binding models for human and mouse via large-scale chip-seq analysis","volume":"46","author":"Kulakovskiy","year":"2018","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"2023121211060106600_ref30","doi-asserted-by":"crossref","first-page":"4590","DOI":"10.1038\/s41467-018-07115-y","article-title":"Joint single-cell dna accessibility and protein epitope profiling reveals environmental regulation of epigenomic heterogeneity","volume":"9","author":"Chen","year":"2018","journal-title":"Nat Commun"},{"issue":"3","key":"2023121211060106600_ref31","doi-asserted-by":"crossref","first-page":"432","DOI":"10.1038\/s41593-018-0079-3","article-title":"Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation","volume":"21","author":"Preissl","year":"2018","journal-title":"Nat Neurosci"},{"key":"2023121211060106600_ref32","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.yjmcc.2021.09.002","article-title":"Delineating chromatin accessibility re-patterning at single cell level during early stage of direct cardiac reprogramming","volume":"162","author":"Wang","year":"2022","journal-title":"J Mol Cell Cardiol"},{"issue":"5","key":"2023121211060106600_ref33","doi-asserted-by":"crossref","first-page":"1309","DOI":"10.1016\/j.cell.2018.06.052","article-title":"A single-cell atlas of in vivo mammalian chromatin accessibility","volume":"174","author":"Cusanovich","year":"2018","journal-title":"Cell"},{"issue":"5","key":"2023121211060106600_ref34","doi-asserted-by":"crossref","first-page":"580","DOI":"10.1038\/s41591-018-0008-8","article-title":"Transcript-indexed atac-seq for precision immune profiling","volume":"24","author":"Satpathy","year":"2018","journal-title":"Nat Med"},{"issue":"7561","key":"2023121211060106600_ref35","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/nature14590","article-title":"Single-cell chromatin accessibility reveals principles of regulatory variation","volume":"523","author":"Buenrostro","year":"2015","journal-title":"Nature"},{"article-title":"10k human pbmcs, multiome v1.0, chromium x","year":"2021","author":"10X Genomics","key":"2023121211060106600_ref36"},{"key":"2023121211060106600_ref37","doi-asserted-by":"crossref","DOI":"10.21105\/joss.00861","article-title":"Umap: uniform manifold approximation and projection for dimension reduction","volume":"3","author":"McInnes","year":"2018","journal-title":"Journal of Open Source Software"},{"key":"2023121211060106600_ref38","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13059-017-1382-0","article-title":"Scanpy: large-scale single-cell gene expression data analysis","volume":"19","author":"Wolf","year":"2018","journal-title":"Genome Biol"},{"issue":"10","key":"2023121211060106600_ref39","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"Blondel","year":"2008","journal-title":"Journal of statistical mechanics: theory and experiment"},{"key":"2023121211060106600_ref40","first-page":"2825","article-title":"Scikit-learn: machine learning in python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"the Journal of machine Learning research"},{"author":"Homola","key":"2023121211060106600_ref41","article-title":"boruta_py"},{"issue":"1","key":"2023121211060106600_ref42","doi-asserted-by":"crossref","first-page":"34","DOI":"10.2174\/156652412798376125","article-title":"Hematopoietic stem cells: transcriptional regulation, ex vivo expansion and clinical application","volume":"12","author":"Aggarwal","year":"2012","journal-title":"Curr Mol Med"},{"issue":"4","key":"2023121211060106600_ref43","doi-asserted-by":"crossref","first-page":"382","DOI":"10.1038\/ni1314","article-title":"Early hematopoietic lineage restrictions directed by ikaros","volume":"7","author":"Yoshida","year":"2006","journal-title":"Nat Immunol"}],"container-title":["Briefings in Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad447\/54256320\/bbad447.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bib\/article-pdf\/25\/1\/bbad447\/54256320\/bbad447.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,12,12]],"date-time":"2023-12-12T17:47:30Z","timestamp":1702403250000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bib\/article\/doi\/10.1093\/bib\/bbad447\/7469348"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,22]]},"references-count":43,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2023,11,22]]}},"URL":"https:\/\/doi.org\/10.1093\/bib\/bbad447","relation":{},"ISSN":["1467-5463","1477-4054"],"issn-type":[{"type":"print","value":"1467-5463"},{"type":"electronic","value":"1477-4054"}],"subject":[],"published-other":{"date-parts":[[2024,1,1]]},"published":{"date-parts":[[2023,11,22]]},"article-number":"bbad447"}}