{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T02:16:26Z","timestamp":1775873786590,"version":"3.50.1"},"reference-count":41,"publisher":"Oxford University Press (OUP)","issue":"14","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string\/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion\/deletion (INDEL) calling, can also be achieved with unitigs.<\/jats:p><jats:p>Results: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of information of the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method on 35-fold human resequencing data, we showed that in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. 
In the methodological aspects, we propose FMD-index for forward\u2013backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches and one-pass construction of unitigs from an FMD-index.<\/jats:p><jats:p>Availability: http:\/\/github.com\/lh3\/fermi<\/jats:p><jats:p>Contact: hengli@broadinstitute.org<\/jats:p>","DOI":"10.1093\/bioinformatics\/bts280","type":"journal-article","created":{"date-parts":[[2012,5,9]],"date-time":"2012-05-09T00:19:00Z","timestamp":1336522740000},"page":"1838-1844","source":"Crossref","is-referenced-by-count":355,"title":["Exploring single-sample SNP and INDEL calling with whole-genome <i>de novo<\/i> assembly"],"prefix":"10.1093","volume":"28","author":[{"given":"Heng","family":"Li","sequence":"first","affiliation":[{"name":"Medical Population Genetics Program, Broad Institute, 7 Cambridge Center, MA 02142, USA"}]}],"member":"286","published-online":{"date-parts":[[2012,5,7]]},"reference":[{"key":"2023012512433928700_B1","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1101\/gr.112326.110","article-title":"Dindel: accurate indel calls from short-read data","volume":"21","author":"Albers","year":"2010","journal-title":"Genome Res."},{"key":"2023012512433928700_B2","volume-title":"A block-sorting lossless data compression algorithm.","author":"Burrows","year":"1994"},{"key":"2023012512433928700_B3","doi-asserted-by":"crossref","first-page":"279","DOI":"10.1089\/cmb.2011.0201","article-title":"Computational techniques for human genome resequencing using mated gapped reads","volume":"19","author":"Carnevali","year":"2011","journal-title":"J. Comput. 
Biol."},{"key":"2023012512433928700_B4","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1101\/gr.079053.108","article-title":"De novo fragment assembly with short mate-paired reads: Does the read length matter?","volume":"19","author":"Chaisson","year":"2009","journal-title":"Genome Res."},{"key":"2023012512433928700_B5","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"Depristo","year":"2011","journal-title":"Nat. Genet."},{"key":"2023012512433928700_B6","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1126\/science.1181498","article-title":"Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays","volume":"327","author":"Drmanac","year":"2010","journal-title":"Science"},{"key":"2023012512433928700_B7","doi-asserted-by":"crossref","first-page":"194","DOI":"10.1109\/TIT.1975.1055349","article-title":"Universal codeword sets and representations of the integers","volume":"21","author":"Elias","year":"1975","journal-title":"IEEE Trans. Inf. 
Theory"},{"key":"2023012512433928700_B8","first-page":"390","article-title":"Opportunistic data structures with applications","volume-title":"FOCS","author":"Ferragina","year":"2000"},{"key":"2023012512433928700_B9","first-page":"697","article-title":"Lightweight data indexing and compression in external memory","volume-title":"LATIN","author":"Ferragina","year":"2010"},{"key":"2023012512433928700_B10","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1093\/nar\/7.2.529","article-title":"Computer programs for the assembly of DNA sequences","volume":"7","author":"Gingeras","year":"1979","journal-title":"Nucleic Acids Res."},{"key":"2023012512433928700_B11","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023012512433928700_B12","doi-asserted-by":"crossref","first-page":"R99","DOI":"10.1186\/gb-2010-11-10-r99","article-title":"Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA","volume":"11","author":"Homer","year":"2010","journal-title":"Genome Biol."},{"key":"2023012512433928700_B13","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1007\/s00453-006-1228-8","article-title":"A space and time efficient algorithm for constructing compressed suffix arrays","volume":"48","author":"Hon","year":"2007","journal-title":"Algorithmica"},{"key":"2023012512433928700_B14","doi-asserted-by":"crossref","first-page":"291","DOI":"10.1089\/cmb.1995.2.291","article-title":"A new algorithm for DNA sequence assembly","volume":"2","author":"Idury","year":"1995","journal-title":"J. Comput. 
Biol."},{"key":"2023012512433928700_B15","doi-asserted-by":"crossref","first-page":"295","DOI":"10.1093\/bioinformatics\/btq653","article-title":"HiTEC: accurate error correction in high-throughput sequencing data","volume":"27","author":"Ilie","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B16","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet."},{"key":"2023012512433928700_B17","first-page":"31","article-title":"High throughput short read alignment via bi-directional BWT","volume-title":"BIBM","author":"Lam","year":"2009"},{"key":"2023012512433928700_B18","doi-asserted-by":"crossref","first-page":"78","DOI":"10.1038\/nbt.2065","article-title":"Performance comparison of whole-genome sequencing platforms","volume":"30","author":"Lam","year":"2012","journal-title":"Nat. Biotechnol."},{"key":"2023012512433928700_B19","doi-asserted-by":"crossref","first-page":"e254","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol."},{"key":"2023012512433928700_B20","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B21","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B22","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1093\/bioinformatics\/btr076","article-title":"Improving SNP 
discovery by base alignment quality","volume":"27","author":"Li","year":"2011","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B23","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2010","journal-title":"Genome Res"},{"key":"2023012512433928700_B24","first-page":"121","article-title":"Storage and retrieval of individual genomes","volume-title":"RECOMB.","author":"M\u00e4kinen","year":"2009"},{"key":"2023012512433928700_B25","doi-asserted-by":"crossref","first-page":"2434","DOI":"10.1093\/bioinformatics\/btp403","article-title":"SNP-o-matic","volume":"25","author":"Manske","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B26","first-page":"289","article-title":"Computability of models for sequence assembly","volume-title":"WABI","author":"Medvedev","year":"2007"},{"key":"2023012512433928700_B27","doi-asserted-by":"crossref","first-page":"830","DOI":"10.1101\/gr.115907.110","article-title":"Natural genetic variation caused by small insertions and deletions in the human genome","volume":"21","author":"Mills","year":"2011","journal-title":"Genome Res."},{"key":"2023012512433928700_B28","doi-asserted-by":"crossref","first-page":"275","DOI":"10.1089\/cmb.1995.2.275","article-title":"Toward simplifying and accurately formulating fragment assembly","volume":"2","author":"Myers","year":"1995","journal-title":"J. Comput. Biol."},{"key":"2023012512433928700_B29","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"issue":"Suppl. 
2","key":"2023012512433928700_B30","doi-asserted-by":"crossref","first-page":"ii79","DOI":"10.1093\/bioinformatics\/bti1114","article-title":"The fragment assembly string graph","volume":"21","author":"Myers","year":"2005","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B31","doi-asserted-by":"crossref","first-page":"1471","DOI":"10.1109\/TC.2010.188","article-title":"Two efficient algorithms for linear time suffix array construction","volume":"60","author":"Nong","year":"2011","journal-title":"IEEE Trans. Comput."},{"key":"2023012512433928700_B32","doi-asserted-by":"crossref","first-page":"2024","DOI":"10.1101\/gr.080200.108","article-title":"Sequencing of natural strains of arabidopsis thaliana with short reads","volume":"18","author":"Ossowski","year":"2008","journal-title":"Genome Res."},{"key":"2023012512433928700_B33","doi-asserted-by":"crossref","first-page":"307","DOI":"10.1093\/nar\/12.1Part1.307","article-title":"SEQAID: a DNA sequence assembling program based on a mathematical model","volume":"12","author":"Peltola","year":"1984","journal-title":"Nucleic Acids Res."},{"key":"2023012512433928700_B34","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An eulerian path approach to DNA fragment assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl. Acad. Sci. 
USA"},{"key":"2023012512433928700_B35","doi-asserted-by":"crossref","first-page":"i367","DOI":"10.1093\/bioinformatics\/btq217","article-title":"Efficient construction of an assembly string graph using the FM-index","volume":"26","author":"Simpson","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512433928700_B36","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023012512433928700_B37","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res."},{"key":"2023012512433928700_B38","doi-asserted-by":"crossref","first-page":"63","DOI":"10.1007\/978-3-642-03784-9_7","article-title":"Compressed suffix arrays for massive data","volume-title":"String Processing and Information Retrieval","author":"Siren","year":"2009"},{"key":"2023012512433928700_B39","doi-asserted-by":"crossref","first-page":"2601","DOI":"10.1093\/nar\/6.7.2601","article-title":"A strategy of DNA sequencing employing computer programs","volume":"6","author":"Staden","year":"1979","journal-title":"Nucleic Acids Res."},{"key":"2023012512433928700_B40","doi-asserted-by":"crossref","first-page":"e8407","DOI":"10.1371\/journal.pone.0008407","article-title":"Pebble and rock band: heuristic resolution of repeats and scaffolding in the velvet short-read de novo assembler","volume":"4","author":"Zerbino","year":"2009","journal-title":"PLoS ONE"},{"key":"2023012512433928700_B41","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project 
Consortium","year":"2010","journal-title":"Nature"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/14\/1838\/48871878\/bioinformatics_28_14_1838.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/28\/14\/1838\/48871878\/bioinformatics_28_14_1838.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,24]],"date-time":"2024-04-24T10:18:08Z","timestamp":1713953888000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/28\/14\/1838\/218887"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5,7]]},"references-count":41,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2012,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bts280","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2012,7,15]]},"published":{"date-parts":[[2012,5,7]]}}}