{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,25]],"date-time":"2026-03-25T02:33:16Z","timestamp":1774405996980,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"13","license":[{"start":{"date-parts":[[2018,6,27]],"date-time":"2018-06-27T00:00:00Z","timestamp":1530057600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"name":"EEE"},{"name":"JOK"},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","award":["U41HG007497"],"award-info":[{"award-number":["U41HG007497"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000011","name":"Howard Hughes Medical Institute","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000011","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Motivation<\/jats:title><jats:p>Current sequencing technologies are able to produce reads orders of magnitude longer than was ever possible before. Such long reads have sparked a new interest in de novo genome assembly, which removes reference biases inherent to re-sequencing approaches and allows for a direct characterization of complex genomic variants. However, even with the latest algorithmic advances, assembling a mammalian genome from long error-prone reads incurs a significant computational burden and does not preclude occasional misassemblies. Both problems could potentially be mitigated if assembly could commence for each chromosome separately.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>To address this, we show how single-cell template strand sequencing (Strand-seq) data can be leveraged for this purpose. We introduce a novel latent variable model and a corresponding Expectation Maximization algorithm, termed SaaRclust, and demonstrate its ability to reliably cluster long reads by chromosome. For each long read, this approach produces a posterior probability distribution over all chromosomes of origin and read directionalities. In this way, it allows one to assess the amount of uncertainty inherent to sparse Strand-seq data on the level of individual reads. Among the reads that our algorithm confidently assigns to a chromosome, we observed more than 99% correct assignments on a subset of Pacific Biosciences reads with 30.1\u00d7\u2009coverage. To our knowledge, SaaRclust is the first approach for the in silico separation of long reads by chromosome prior to assembly.<\/jats:p><\/jats:sec><jats:sec><jats:title>Availability and implementation<\/jats:title><jats:p>https:\/\/github.com\/daewoooo\/SaaRclust<\/jats:p><\/jats:sec>","DOI":"10.1093\/bioinformatics\/bty290","type":"journal-article","created":{"date-parts":[[2018,4,16]],"date-time":"2018-04-16T19:11:48Z","timestamp":1523905908000},"page":"i115-i123","source":"Crossref","is-referenced-by-count":29,"title":["Strand-seq enables reliable separation of long reads by chromosome via expectation maximization"],"prefix":"10.1093","volume":"34","author":[{"given":"Maryam","family":"Ghareghani","sequence":"first","affiliation":[{"name":"Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbr\u00fccken, 66123, Germany"},{"name":"Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbr\u00fccken, Germany"},{"name":"Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, Saarbr\u00fccken, Germany"}]},{"given":"David","family":"Porubsk\u00fd","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbr\u00fccken, 66123, Germany"},{"name":"Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbr\u00fccken, Germany"}]},{"given":"Ashley D","family":"Sanders","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany"}]},{"given":"Sascha","family":"Meiers","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany"}]},{"given":"Evan E","family":"Eichler","sequence":"additional","affiliation":[{"name":"Department of Genome Sciences, University of Washington, Seattle, WA, USA"},{"name":"Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA"}]},{"given":"Jan O","family":"Korbel","sequence":"additional","affiliation":[{"name":"European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Heidelberg, Germany"}]},{"given":"Tobias","family":"Marschall","sequence":"additional","affiliation":[{"name":"Center for Bioinformatics, Saarland University, Saarland Informatics Campus E2.1, Saarbr\u00fccken, 66123, Germany"},{"name":"Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, Saarbr\u00fccken, Germany"}]}],"member":"286","published-online":{"date-parts":[[2018,6,27]]},"reference":[{"key":"2023051604241192600_bty290-B1","doi-asserted-by":"crossref","first-page":"1119","DOI":"10.1038\/nbt.2727","article-title":"Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions","volume":"31","author":"Burton","year":"2013","journal-title":"Nat. Biotechnol"},{"key":"2023051604241192600_bty290-B2","author":"Chaisson","year":"2017"},{"key":"2023051604241192600_bty290-B3","doi-asserted-by":"crossref","first-page":"1050","DOI":"10.1038\/nmeth.4035","article-title":"Phased diploid genome assembly with single-molecule real-time sequencing","volume":"13","author":"Chin","year":"2016","journal-title":"Nat. Methods"},{"key":"2023051604241192600_bty290-B4","doi-asserted-by":"crossref","DOI":"10.7554\/eLife.30560","article-title":"Genome-wide mapping of sister chromatid exchange events in single yeast cells using Strand-seq","volume":"6","author":"Claussin","year":"2017","journal-title":"Elife"},{"key":"2023051604241192600_bty290-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1111\/j.2517-6161.1977.tb01600.x","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","journal-title":"J. R. Stat. Soc. Ser. B (Methodol.)"},{"key":"2023051604241192600_bty290-B6","doi-asserted-by":"crossref","first-page":"1107","DOI":"10.1038\/nmeth.2206","article-title":"DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution","volume":"9","author":"Falconer","year":"2012","journal-title":"Nat. Methods"},{"key":"2023051604241192600_bty290-B7","doi-asserted-by":"crossref","first-page":"aae0344","DOI":"10.1126\/science.aae0344","article-title":"Long-read sequence assembly of the gorilla genome","volume":"352","author":"Gordon","year":"2016","journal-title":"Science"},{"key":"2023051604241192600_bty290-B8","doi-asserted-by":"crossref","first-page":"82","DOI":"10.1186\/gm486","article-title":"BAIT: organizing genomes and mapping rearrangements in single cells","volume":"5","author":"Hills","year":"2013","journal-title":"Genome Med"},{"key":"2023051604241192600_bty290-B9","author":"Hills","year":"2018"},{"key":"2023051604241192600_bty290-B10","doi-asserted-by":"crossref","first-page":"64","DOI":"10.1016\/j.pbi.2017.02.002","article-title":"The impact of third generation genomic technologies on plant genome assembly","volume":"36","author":"Jiao","year":"2017","journal-title":"Curr. Opin. Plant Biol"},{"key":"2023051604241192600_bty290-B11","doi-asserted-by":"crossref","first-page":"778","DOI":"10.1101\/gr.213652.116","article-title":"Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data","volume":"27","author":"Jiao","year":"2017","journal-title":"Genome Res"},{"key":"2023051604241192600_bty290-B12","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1101\/gr.215087.116","article-title":"Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation","volume":"27","author":"Koren","year":"2017","journal-title":"Genome Res"},{"key":"2023051604241192600_bty290-B13","doi-asserted-by":"crossref","first-page":"2103","DOI":"10.1093\/bioinformatics\/btw152","article-title":"Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences","volume":"32","author":"Li","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051604241192600_bty290-B14","doi-asserted-by":"crossref","first-page":"E8396","DOI":"10.1073\/pnas.1604560113","article-title":"Assembly of long error-prone reads using de Bruijn graphs","volume":"113","author":"Lin","year":"2016","journal-title":"Proc. Natl. Acad. Sci"},{"key":"2023051604241192600_bty290-B15","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1007\/978-3-662-44753-6_5","volume-title":"International Workshop on Algorithms in Bioinformatics","author":"Myers","year":"2014"},{"key":"2023051604241192600_bty290-B16","doi-asserted-by":"crossref","first-page":"3021","DOI":"10.1093\/bioinformatics\/btw369","article-title":"Assemblytics: a web analytics tool for the detection of variants from an assembly","volume":"32","author":"Nattestad","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051604241192600_bty290-B17","doi-asserted-by":"crossref","first-page":"2737","DOI":"10.1093\/bioinformatics\/btx281","article-title":"Assembling draft genomes using contiBAIT","volume":"33","author":"O\u2019Neill","year":"2017","journal-title":"Bioinformatics"},{"key":"2023051604241192600_bty290-B18","doi-asserted-by":"crossref","first-page":"1565","DOI":"10.1101\/gr.209841.116","article-title":"Direct chromosome-length haplotyping by single-cell sequencing","volume":"26","author":"Porubsk\u00fd","year":"2016","journal-title":"Genome Res"},{"key":"2023051604241192600_bty290-B19","author":"Porubsk\u00fd","year":"2017"},{"key":"2023051604241192600_bty290-B20","doi-asserted-by":"crossref","first-page":"1575","DOI":"10.1101\/gr.201160.115","article-title":"Characterizing polymorphic inversions in human genomes by single-cell sequencing","volume":"26","author":"Sanders","year":"2016","journal-title":"Genome Res"},{"key":"2023051604241192600_bty290-B21","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/nrg3117","article-title":"Repetitive DNA and next-generation sequencing: computational challenges and solutions","volume":"13","author":"Treangen","year":"2012","journal-title":"Nat. Rev. Genet"},{"key":"2023051604241192600_bty290-B22","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1038\/s41467-017-02760-1","article-title":"BLM helicase suppresses recombination at G-quadruplex motifs in transcribed genes","volume":"9","author":"van Wietmarschen","year":"2018","journal-title":"Nat. Commun"},{"key":"2023051604241192600_bty290-B23","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1214\/aos\/1176346060","article-title":"On the convergence properties of the EM algorithm","volume":"11","author":"Wu","year":"1983","journal-title":"Ann. Stat"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i115\/50316001\/bioinformatics_34_13_i115.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/13\/i115\/50316001\/bioinformatics_34_13_i115.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T02:10:47Z","timestamp":1720231847000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/13\/i115\/5045731"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,6,27]]},"references-count":23,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2018,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bty290","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,7,1]]},"published":{"date-parts":[[2018,6,27]]}}}