{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,8]],"date-time":"2026-01-08T18:51:00Z","timestamp":1767898260580,"version":"3.49.0"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"8","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":3134,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.0\/uk\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,4,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms.<\/jats:p><jats:p>Results: Being applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number 2 033 311 detected regions of sequence variation. 
In 33 269 out of 460 373 detected regions of size &gt;1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with false positive rate 12%.<\/jats:p><jats:p>Availability: The open source code is available at: http:\/\/wgs-assembler.cvs.sourceforge.net\/wgs-assembler\/<\/jats:p><jats:p>Contact: \u00a0gdenisov@jcvi.org<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn074","type":"journal-article","created":{"date-parts":[[2008,3,6]],"date-time":"2008-03-06T01:25:31Z","timestamp":1204766731000},"page":"1035-1040","source":"Crossref","is-referenced-by-count":102,"title":["Consensus generation and variant detection by Celera Assembler"],"prefix":"10.1093","volume":"24","author":[{"given":"Gennady","family":"Denisov","sequence":"first","affiliation":[{"name":"J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]},{"given":"Brian","family":"Walenz","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]},{"given":"Aaron L.","family":"Halpern","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]},{"given":"Jason","family":"Miller","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]},{"given":"Nelson","family":"Axelrod","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]},{"given":"Samuel","family":"Levy","sequence":"additional","affiliation":[{"name":"J. Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]},{"given":"Granger","family":"Sutton","sequence":"additional","affiliation":[{"name":"J. 
Craig Venter Institute, 9704 Medical Center Drive, Rockville, MD 20850, USA"}]}],"member":"286","published-online":{"date-parts":[[2008,3,4]]},"reference":[{"key":"2023020210003074200_B1","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1038\/35035083","article-title":"An SNP map of the human genome generated by reduced representation shotgun sequencing","volume":"407","author":"Altshuler","year":"2000","journal-title":"Nature"},{"key":"2023020210003074200_B2","doi-asserted-by":"crossref","first-page":"1301","DOI":"10.1126\/science.1072104","article-title":"Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes","volume":"297","author":"Aparicio","year":"2002","journal-title":"Science"},{"key":"2023020210003074200_B3","doi-asserted-by":"crossref","first-page":"421","DOI":"10.1093\/bioinformatics\/btf881","article-title":"Redundancy based detection of sequence polymorphisms in expressed sequence tag data using autoSNP","volume":"19","author":"Barker","year":"2003","journal-title":"Bioinformatics"},{"key":"2023020210003074200_B4","first-page":"177","article-title":"ARACHNE: a whole-genome shotgun assembler","volume":"12","author":"Batzoglou","year":"2002","journal-title":"Genome Res."},{"key":"2023020210003074200_B5","doi-asserted-by":"crossref","first-page":"3404","DOI":"10.1093\/nar\/26.14.3404","article-title":"Automated detection of point mutations using fluorescent sequence trace subtraction","volume":"26","author":"Bonfield","year":"1998","journal-title":"Nucleic Acids Res."},{"key":"2023020210003074200_B6","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1371\/journal.pcbi.0010024","article-title":"Bioinformatics for whole-genome shotgun sequencing of microbial communities","volume":"1","author":"Chen","year":"2005","journal-title":"PLoS Comput. 
Biol."},{"key":"2023020210003074200_B7","first-page":"111","article-title":"Inference of haplotypes from PCR-amplified samples of diploid populations","volume":"7","author":"Clark","year":"1990","journal-title":"Mol. Biol. Evol."},{"key":"2023020210003074200_B8","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1038\/ng1001-229","article-title":"High-resolution haplotype structure in the human genome","volume":"29","author":"Daly","year":"2001","journal-title":"Nat. Genet."},{"key":"2023020210003074200_B9","article-title":"A system and method for improving the accuracy of DNA sequencing and error probability estimation through application of a mathematical model to the analysis of electropherograms","author":"Denisov","year":"2004","journal-title":"US Patent"},{"key":"2023020210003074200_B10","doi-asserted-by":"crossref","first-page":"11240","DOI":"10.1073\/pnas.0604351103","article-title":"A Sanger\/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes","volume":"103","author":"Goldberg","year":"2006","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023020210003074200_B11","article-title":"PHRAP documentation","author":"Green","year":"2005"},{"key":"2023020210003074200_B12","doi-asserted-by":"crossref","first-page":"721","DOI":"10.1101\/gr.2264004","article-title":"The Atlas genome assembly system","volume":"14","author":"Havlak","year":"2004","journal-title":"Genome Res."},{"key":"2023020210003074200_B13","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1126\/science.1105436","article-title":"Whole-genome patterns of common DNA variation in three human populations","volume":"307","author":"Hinds","year":"2005","journal-title":"Science"},{"key":"2023020210003074200_B14","doi-asserted-by":"crossref","first-page":"868","DOI":"10.1101\/gr.9.9.868","article-title":"CAP3: A DNA sequence assembly program","volume":"9","author":"Huang","year":"1999","journal-title":"Genome Res."},{"key":"2023020210003074200_B15","doi-asserted-by":"crossref","first-page":"2164","DOI":"10.1101\/gr.1390403","article-title":"PCAP: a whole-genome assembly program","volume":"13","author":"Huang","year":"2003","journal-title":"Genome Res."},{"key":"2023020210003074200_B16","doi-asserted-by":"crossref","first-page":"495","DOI":"10.1093\/bioinformatics\/btk006","article-title":"SEAN: SNP prediction and display program utilizing EST sequence clusters","volume":"22","author":"Huntley","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020210003074200_B17","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1186\/1471-2105-6-303","article-title":"Analysis of concordance of different haplotype block partitioning algorithms","volume":"6","author":"Indap","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210003074200_B18","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1038\/nature04226","article-title":"A haplotype map of the human genome","volume":"437","author":"International HapMap 
Consortium","year":"2005","journal-title":"Nature"},{"key":"2023020210003074200_B19","doi-asserted-by":"crossref","first-page":"1916","DOI":"10.1073\/pnas.0307971100","article-title":"Whole-genome shotgun assembly and comparison of human genome assemblies","volume":"101","author":"Istrail","year":"2004","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210003074200_B20","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1101\/gr.828403","article-title":"Whole-genome sequence assembly for mammalian genomes: Arachne 2","volume":"13","author":"Jaffe","year":"2003","journal-title":"Genome Res."},{"key":"2023020210003074200_B21","doi-asserted-by":"crossref","first-page":"7329","DOI":"10.1073\/pnas.0401648101","article-title":"The diploid genome sequence of Candida albicans","volume":"101","author":"Jones","year":"2004","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210003074200_B22","doi-asserted-by":"crossref","first-page":"1541","DOI":"10.1101\/gr.183201","article-title":"Assembly of the working draft of the human genome with GigAssembler","volume":"11","author":"Kent","year":"2001","journal-title":"Genome Res."},{"key":"2023020210003074200_B23","doi-asserted-by":"crossref","first-page":"1101","DOI":"10.1101\/gr.5894107","article-title":"Diploid genome reconstruction of Ciona intestinalis and comparative analysis with Ciona savignyi","volume":"17","author":"Kim","year":"2007","journal-title":"Genome Res."},{"key":"2023020210003074200_B24","doi-asserted-by":"crossref","first-page":"88","DOI":"10.1109\/TCBB.2007.1007","article-title":"Accuracy assessment of diploid consensus sequences","volume":"4","author":"Kim","year":"2007","journal-title":"IEEE\/ACM Trans. Comput. Biol. Bioinform."},{"key":"2023020210003074200_B25","doi-asserted-by":"crossref","first-page":"182","DOI":"10.1007\/3-540-44676-1_15","article-title":"SNPs problems, complexity, and algorithms","volume":"2161","author":"Lancia","year":"2001","journal-title":"Lect. Notes Comput. 
Sci."},{"key":"2023020210003074200_B26","doi-asserted-by":"crossref","first-page":"2113","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol."},{"key":"2023020210003074200_B27","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1093\/bib\/3.1.23","article-title":"Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem","volume":"3","author":"Lippert","year":"2002","journal-title":"Brief. Bioinform."},{"key":"2023020210003074200_B28","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/70570","article-title":"A general approach to single-nucleotide polymorphism discovery","volume":"23","author":"Marth","year":"1999","journal-title":"Nat. Gen."},{"key":"2023020210003074200_B29","doi-asserted-by":"crossref","DOI":"10.56021\/9780801857423","volume-title":"Mendelian Inheritance in Man","author":"McKusick","year":"1998"},{"key":"2023020210003074200_B30","doi-asserted-by":"crossref","first-page":"81","DOI":"10.1101\/gr.731003","article-title":"The phusion assembler","volume":"13","author":"Mullikin","year":"2003","journal-title":"Genome Res."},{"key":"2023020210003074200_B31","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023020210003074200_B32","doi-asserted-by":"crossref","first-page":"2745","DOI":"10.1093\/nar\/25.14.2745","article-title":"PolyPhred: automating the detection and genotyping of single nucleotide substitutions using fluorescence-based resequencing","volume":"25","author":"Nickerson","year":"1997","journal-title":"Nucleic Acids Res."},{"key":"2023020210003074200_B33","doi-asserted-by":"crossref","first-page":"9748","DOI":"10.1073\/pnas.171285098","article-title":"An Eulerian path approach to DNA fragment 
assembly","volume":"98","author":"Pevzner","year":"2001","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210003074200_B34","doi-asserted-by":"crossref","first-page":"489","DOI":"10.1126\/science.1059431","article-title":"Haplotype variation and linkage disequilibrium in 313 human genes","volume":"293","author":"Stephens","year":"2001","journal-title":"Science"},{"key":"2023020210003074200_B35","doi-asserted-by":"crossref","first-page":"9","DOI":"10.1089\/gst.1995.1.9","article-title":"TIGR Assembler: A new tool for assembling large shotgun sequencing projects","volume":"1","author":"Sutton","year":"1995","journal-title":"Genome Sci. Technol."},{"key":"2023020210003074200_B36","doi-asserted-by":"crossref","first-page":"805","DOI":"10.1038\/nrg1709","article-title":"Metagenomics: DNA sequencing of environmental samples","volume":"6","author":"Tringe","year":"2005","journal-title":"Nat. Rev. Genet."},{"key":"2023020210003074200_B37","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1126\/science.1058040","article-title":"The sequence of the human genome","volume":"291","author":"Venter","year":"2001","journal-title":"Science"},{"key":"2023020210003074200_B38","doi-asserted-by":"crossref","first-page":"66","DOI":"10.1126\/science.1093857","article-title":"Environmental genome shotgun sequencing of the Sargasso Sea","volume":"304","author":"Venter","year":"2004","journal-title":"Science"},{"key":"2023020210003074200_B39","doi-asserted-by":"crossref","first-page":"220","DOI":"10.1186\/1471-2105-6-220","article-title":"A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage","volume":"6","author":"Wang","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023020210003074200_B40","doi-asserted-by":"crossref","first-page":"824","DOI":"10.1101\/gr.165102","article-title":"RePS: a sequence assembler that masks exact repeats identified from the shotgun 
data","volume":"12","author":"Wang","year":"2002","journal-title":"Genome Res."},{"key":"2023020210003074200_B41","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1038\/nature01262","article-title":"Initial sequencing and comparative analysis of the mouse genome","volume":"420","author":"Waterston","year":"2002","journal-title":"Nature"},{"key":"2023020210003074200_B42","doi-asserted-by":"crossref","first-page":"e16","DOI":"10.1371\/journal.pbio.0050016","article-title":"The sorcerer II global ocean sampling expedition: expanding the universe of protein families","volume":"5","author":"Yooseph","year":"2007","journal-title":"PLoS Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/8\/1035\/49046060\/bioinformatics_24_8_1035.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/8\/1035\/49046060\/bioinformatics_24_8_1035.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,16]],"date-time":"2023-05-16T21:25:21Z","timestamp":1684272321000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/8\/1035\/212858"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,3,4]]},"references-count":42,"journal-issue":{"issue":"8","published-print":{"date-parts":[[2008,4,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn074","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,4,15]]},"published":{"date-parts":[[2008,3,4]]}}}