{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:33:55Z","timestamp":1772138035722,"version":"3.50.1"},"reference-count":9,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2021,1,20]],"date-time":"2021-01-20T00:00:00Z","timestamp":1611100800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["U01MH116492"],"award-info":[{"award-number":["U01MH116492"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000025","name":"National Institutes of Mental Health","doi-asserted-by":"crossref","award":["K01MH123896"],"award-info":[{"award-number":["K01MH123896"]}],"id":[{"id":"10.13039\/100000025","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","award":["R01GM134020"],"award-info":[{"award-number":["R01GM134020"]}],"id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,7,19]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>scATAC-seq is a powerful approach for characterizing cell-type-specific regulatory landscapes. However, it is difficult to benchmark the performance of various scATAC-seq analysis techniques (such as clustering and deconvolution) without having a priori a known set of gold-standard cell types. 
To simulate scATAC-seq experiments with known cell-type labels, we introduce an efficient and scalable scATAC-seq simulation method (SCAN-ATAC-Sim) that down-samples bulk ATAC-seq data (e.g. from representative cell lines or tissues). Our protocol uses a consistent but tunable signal-to-noise ratio across cell types in a scATAC-seq simulation for integrating bulk experiments with different levels of background noise, and it independently samples twice without replacement to account for the diploid genome. Because it uses an efficient weighted reservoir sampling algorithm and is highly parallelizable with OpenMP, our implementation in C++ allows millions of cells to be simulated in less than an hour on a laptop computer.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>SCAN-ATAC-Sim is available at scan-atac-sim.gersteinlab.org.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa1039","type":"journal-article","created":{"date-parts":[[2021,1,14]],"date-time":"2021-01-14T14:12:44Z","timestamp":1610633564000},"page":"1756-1758","source":"Crossref","is-referenced-by-count":16,"title":["SCAN-ATAC-Sim: a scalable and efficient method for simulating single-cell ATAC-seq data from bulk-tissue experiments"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5835-3840","authenticated-orcid":false,"given":"Zhanlin","family":"Chen","sequence":"first","affiliation":[{"name":"Department of Molecular Biophysics and Biochemistry, Yale University , New Haven, CT 06520, USA"},{"name":"Department of Computer Science, Yale University , New Haven, CT 06520, 
USA"}]},{"given":"Jing","family":"Zhang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, University of California , Irvine, CA 92617, USA"}]},{"given":"Jason","family":"Liu","sequence":"additional","affiliation":[{"name":"Department of Molecular Biophysics and Biochemistry, Yale University , New Haven, CT 06520, USA"}]},{"given":"Zixuan","family":"Zhang","sequence":"additional","affiliation":[{"name":"School of Electronic Engineering and Computer Science, Queen Mary University of London , London E1 4NS, UK"}]},{"given":"Jiangqi","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Electronic Engineering and Computer Science, Queen Mary University of London , London E1 4NS, UK"}]},{"given":"Donghoon","family":"Lee","sequence":"additional","affiliation":[{"name":"Department of Genetics and Genomic Sciences , New York, NY 10029, USA"},{"name":"Department of Psychiatry, Icahn School of Medicine at Mount Sinai , New York, NY 10029, USA"}]},{"given":"Min","family":"Xu","sequence":"additional","affiliation":[{"name":"Computational Biology Department, Carnegie Mellon University , Pittsburgh, PA 15213, USA"}]},{"given":"Mark","family":"Gerstein","sequence":"additional","affiliation":[{"name":"Department of Molecular Biophysics and Biochemistry, Yale University , New Haven, CT 06520, USA"},{"name":"Department of Computer Science, Yale University , New Haven, CT 06520, USA"}]}],"member":"286","published-online":{"date-parts":[[2021,1,20]]},"reference":[{"key":"2023051709461901800_btaa1039-B1","doi-asserted-by":"crossref","first-page":"397","DOI":"10.1038\/s41592-019-0367-1","article-title":"cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data","volume":"16","author":"Bravo Gonzalez-Blas","year":"2019","journal-title":"Nat. 
Methods"},{"key":"2023051709461901800_btaa1039-B2","doi-asserted-by":"crossref","first-page":"486","DOI":"10.1038\/nature14590","article-title":"Single-cell chromatin accessibility reveals principles of regulatory variation","volume":"523","author":"Buenrostro","year":"2015","journal-title":"Nature"},{"key":"2023051709461901800_btaa1039-B3","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1016\/j.ipl.2005.11.003","article-title":"Weighted random sampling with a reservoir","volume":"97","author":"Efraimidis","year":"2006","journal-title":"Inf. Process. Lett"},{"key":"2023051709461901800_btaa1039-B4","doi-asserted-by":"publisher","author":"Fang","year":"2019","DOI":"10.1101\/615179"},{"key":"2023051709461901800_btaa1039-B5","doi-asserted-by":"crossref","first-page":"470","DOI":"10.1038\/s41467-018-08205-7","article-title":"Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity","volume":"10","author":"Liu","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051709461901800_btaa1039-B6","doi-asserted-by":"crossref","first-page":"975","DOI":"10.1038\/nmeth.4401","article-title":"chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data","volume":"14","author":"Schep","year":"2017","journal-title":"Nat. Methods"},{"key":"2023051709461901800_btaa1039-B7","doi-asserted-by":"crossref","first-page":"4576","DOI":"10.1038\/s41467-019-12630-7","article-title":"SCALE method for single-cell ATAC-seq analysis via latent feature extraction","volume":"10","author":"Xiong","year":"2019","journal-title":"Nat. Commun"},{"key":"2023051709461901800_btaa1039-B8","doi-asserted-by":"crossref","first-page":"2410","DOI":"10.1038\/s41467-018-04629-3","article-title":"Unsupervised clustering and epigenetic classification of single cells","volume":"9","author":"Zamanighomi","year":"2018","journal-title":"Nat. 
Commun"},{"key":"2023051709461901800_btaa1039-B9","first-page":"726","article-title":"An integrative ENCODE resource for cancer genomics","volume":"11","author":"Zhang","year":"2020","journal-title":"Nat. Commun"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa1039\/38711858\/btaa1039.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/12\/1756\/50361325\/btaa1039.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/12\/1756\/50361325\/btaa1039.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T06:33:41Z","timestamp":1684305221000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/12\/1756\/6104822"}},"subtitle":[],"editor":[{"given":"Anthony","family":"Mathelier","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2021,1,20]]},"references-count":9,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2021,7,19]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa1039","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.05.29.123638","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2021,6,15]]},"published":{"date-parts":[[2021,1,20]]}}}