{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T12:23:49Z","timestamp":1769603029175,"version":"3.49.0"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"12","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,6,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Large insertions of novel sequence are an important type of structural variants. Previous studies used traditional de novo assemblers for assembling non-mapping high-throughput sequencing (HTS) or capillary reads and then tried to anchor them in the reference using paired read information.<\/jats:p>\n               <jats:p>Results: We present approaches for detecting insertion breakpoints and targeted assembly of large insertions from HTS paired data: BASIL and ANISE. On near identity repeats that are hard for assemblers, ANISE employs a repeat resolution step. This results in far better reconstructions than obtained by the compared methods. On simulated data, we found our insert assembler to be competitive with the de novo assemblers ABYSS and SGA while yielding already anchored inserted sequence as opposed to unanchored contigs as from ABYSS\/SGA. On real-world data, we detected novel sequence in a human individual and thoroughly validated the assembled sequence. 
ANISE was found to be superior to the competing tool MindTheGap on both simulated and real-world data.<\/jats:p>\n               <jats:p>Availability and implementation: ANISE and BASIL are available for download at http:\/\/www.seqan.de\/projects\/herbarium under a permissive open source license.<\/jats:p>\n               <jats:p>Contact: manuel.holtgrewe@fu-berlin.de or knut.reinert@fu-berlin.de<\/jats:p>\n               <jats:p>Supplementary information: Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv051","type":"journal-article","created":{"date-parts":[[2015,2,4]],"date-time":"2015-02-04T01:18:07Z","timestamp":1423012687000},"page":"1904-1912","source":"Crossref","is-referenced-by-count":20,"title":["Methods for the detection and assembly of novel sequence in high-throughput sequencing data"],"prefix":"10.1093","volume":"31","author":[{"given":"Manuel","family":"Holtgrewe","sequence":"first","affiliation":[{"name":"1 Department of Computer Science, Freie Universit\u00e4t Berlin and 2Max Planck Institute for Molecular Genetics, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Leon","family":"Kuchenbecker","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Freie Universit\u00e4t Berlin and 2Max Planck Institute for Molecular Genetics, Berlin, Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Knut","family":"Reinert","sequence":"additional","affiliation":[{"name":"1 Department of Computer Science, Freie Universit\u00e4t Berlin and 2Max Planck Institute for Molecular Genetics, Berlin, 
Germany"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,2,2]]},"reference":[{"key":"2023020115130995900_btv051-B1","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1038\/nrg2958","article-title":"Genome structural variation discovery and genotyping","volume":"12","author":"Alkan","year":"2011","journal-title":"Nat. Rev. Genet."},{"key":"2023020115130995900_btv051-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol."},{"key":"2023020115130995900_btv051-B3","doi-asserted-by":"crossref","first-page":"369","DOI":"10.1089\/cmb.1997.4.369","article-title":"ReAligner: a program for refining DNA sequence multi-alignments","volume":"4","author":"Anson","year":"1997","journal-title":"J. Comput. Biol."},{"key":"2023020115130995900_btv051-B4","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1007\/BF02945456","article-title":"The haplotyping problem: an overview of computational models and solutions","volume":"18","author":"Bonizzoni","year":"2003","journal-title":"J. Comput. Sci. 
Technol."},{"key":"2023020115130995900_btv051-B5","author":"Chevreux","year":"2005"},{"key":"2023020115130995900_btv051-B6","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B7","doi-asserted-by":"crossref","first-page":"1002384","DOI":"10.1371\/journal.pgen.1002384","article-title":"Repetitive elements may comprise over two-thirds of the human genome","volume":"7","author":"de Koning","year":"2011","journal-title":"PLoS Genet."},{"key":"2023020115130995900_btv051-B8","doi-asserted-by":"crossref","first-page":"23","DOI":"10.1016\/j.entcs.2011.06.003","article-title":"LEMON\u2014an open source C++ graph template library","volume":"264","author":"Dezs\u0151","year":"2011","journal-title":"Electr. Notes Theor. Comput. Sci."},{"key":"2023020115130995900_btv051-B9","doi-asserted-by":"crossref","first-page":"161","DOI":"10.2307\/1969503","article-title":"A decomposition theorem for partially ordered sets","volume":"51","author":"Dilworth","year":"1950","journal-title":"Ann. Math."},{"key":"2023020115130995900_btv051-B10","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-9-11","article-title":"SeqAn an efficient, generic C++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinf."},{"key":"2023020115130995900_btv051-B11","doi-asserted-by":"crossref","first-page":"1000074","DOI":"10.1371\/journal.pcbi.1000074","article-title":"Viral population estimation using pyrosequencing","volume":"4","author":"Eriksson","year":"2008","journal-title":"PLoS Comput. 
Biol."},{"key":"2023020115130995900_btv051-B12","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1093\/bioinformatics\/btq152","article-title":"Detection and characterization of novel sequence insertions using paired-end next-generation sequencing","volume":"26","author":"Hajirasouliha","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B13","doi-asserted-by":"crossref","first-page":"603","DOI":"10.1145\/585265.585267","article-title":"The greedy path-merging algorithm for contig scaffolding","volume":"49","author":"Huson","year":"2002","journal-title":"J. ACM (JACM)"},{"key":"2023020115130995900_btv051-B14","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De\u00a0novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet."},{"key":"2023020115130995900_btv051-B15","first-page":"176","article-title":"Separating repeats in DNA sequence assembly","author":"Kececioglu","year":"2001"},{"key":"2023020115130995900_btv051-B16","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature06862","article-title":"Mapping and sequencing of structural variation from eight human genomes","volume":"453","author":"Kidd","year":"2008","journal-title":"Nature"},{"key":"2023020115130995900_btv051-B17","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1038\/nmeth.1451","article-title":"Characterization of missing human genome sequences and copy-number polymorphic insertions","volume":"7","author":"Kidd","year":"2010","journal-title":"Nat. 
Methods"},{"key":"2023020115130995900_btv051-B18","doi-asserted-by":"crossref","first-page":"128","DOI":"10.1093\/nar\/gkt339","article-title":"Reprever: resolving low-copy duplicated sequences using template driven assembly","volume":"41","author":"Kim","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023020115130995900_btv051-B19","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows\u2013Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B20","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B21","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1101\/gr.132480.111","article-title":"SOAPindel: efficient identification of indels from short paired reads","volume":"23","author":"Li","year":"2013","journal-title":"Genome Res."},{"key":"2023020115130995900_btv051-B22","doi-asserted-by":"crossref","first-page":"13","DOI":"10.1038\/nmeth.1374","article-title":"Computational methods for discovering structural variation with next-generation sequencing","volume":"6","author":"Medvedev","year":"2009","journal-title":"Nat. 
Methods"},{"key":"2023020115130995900_btv051-B23","doi-asserted-by":"crossref","first-page":"2818","DOI":"10.1093\/bioinformatics\/btn548","article-title":"Aggressive assembly of pyrosequencing reads with mates","volume":"24","author":"Miller","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B24","doi-asserted-by":"crossref","first-page":"2196","DOI":"10.1126\/science.287.5461.2196","article-title":"A whole-genome assembly of Drosophila","volume":"287","author":"Myers","year":"2000","journal-title":"Science"},{"key":"2023020115130995900_btv051-B25","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/1471-2105-12-S6-S3","article-title":"Assembly of non-unique insertion content using next-generation sequencing","volume":"12","author":"Parrish","year":"2011","journal-title":"BMC Bioinf."},{"key":"2023020115130995900_btv051-B26","doi-asserted-by":"crossref","first-page":"1118","DOI":"10.1093\/bioinformatics\/btp131","article-title":"A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads","volume":"25","author":"Rausch","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B27","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1093\/bioinformatics\/bts378","article-title":"DELLY: structural variant discovery by integrated paired-end and split-read analysis","volume":"28","author":"Rausch","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B28","doi-asserted-by":"crossref","first-page":"3451","DOI":"10.1093\/bioinformatics\/btu545","article-title":"MindTheGap: integrated detection and assembly of short and long insertions","volume":"30","author":"Rizk","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B29","doi-asserted-by":"crossref","first-page":"549","DOI":"10.1101\/gr.126953.111","article-title":"Efficient de novo assembly of large genomes using compressed data 
structures","volume":"22","author":"Simpson","year":"2012","journal-title":"Genome Res."},{"key":"2023020115130995900_btv051-B30","doi-asserted-by":"crossref","first-page":"1117","DOI":"10.1101\/gr.089532.108","article-title":"ABySS: a parallel assembler for short read sequence data","volume":"19","author":"Simpson","year":"2009","journal-title":"Genome Res."},{"key":"2023020115130995900_btv051-B31","doi-asserted-by":"crossref","first-page":"641","DOI":"10.1126\/science.1197005","article-title":"Diversity of human copy number variation and multicopy genes","volume":"330","author":"Sudmant","year":"2010","journal-title":"Science"},{"key":"2023020115130995900_btv051-B32","doi-asserted-by":"crossref","first-page":"379","DOI":"10.1093\/bioinformatics\/18.3.379","article-title":"Separation of nearly identical repeats in shotgun assemblies using defined nucleotide positions, DNPs","volume":"18","author":"Tammi","year":"2002","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B33","doi-asserted-by":"crossref","first-page":"652","DOI":"10.1038\/nmeth.1628","article-title":"CREST maps somatic structural variation in cancer genomes with base-pair resolution","volume":"8","author":"Wang","year":"2011","journal-title":"Nat. 
Methods"},{"key":"2023020115130995900_btv051-B34","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1093\/bioinformatics\/bts505","article-title":"RazerS 3: faster, fully sensitive read mapping","volume":"28","author":"Weese","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020115130995900_btv051-B35","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/12\/1904\/49014141\/bioinformatics_31_12_1904.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/12\/1904\/49014141\/bioinformatics_31_12_1904.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:03:08Z","timestamp":1675296188000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/12\/1904\/213830"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,2,2]]},"references-count":35,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2015,6,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv051","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,6,15]]},"published":{"date-parts":[[2015,2,2]]}}}