{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,11,13]],"date-time":"2025-11-13T12:28:12Z","timestamp":1763036892338},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De\u00a0novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions.<\/jats:p>\n               <jats:p>Results: We have developed the program PopIns, which can discover and characterize non-reference insertions of 100\u2009bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach.<\/jats:p>\n               <jats:p>Availability and implementation: The source code of PopIns is available from http:\/\/github.com\/bkehr\/popins.<\/jats:p>\n               <jats:p>Contact: \u00a0birte.kehr@decode.is<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv273","type":"journal-article","created":{"date-parts":[[2015,4,30]],"date-time":"2015-04-30T00:32:23Z","timestamp":1430353943000},"page":"961-967","source":"Crossref","is-referenced-by-count":33,"title":["PopIns: population-scale detection of novel sequence insertions"],"prefix":"10.1093","volume":"32","author":[{"given":"Birte","family":"Kehr","sequence":"first","affiliation":[{"name":"1 deCODE genetics\/Amgen, Reykjav\u00edk, Iceland,"}]},{"given":"P\u00e1ll","family":"Melsted","sequence":"additional","affiliation":[{"name":"1 deCODE genetics\/Amgen, Reykjav\u00edk, Iceland,"},{"name":"2 Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjav\u00edk, Iceland and"}]},{"given":"Bjarni V.","family":"Halld\u00f3rsson","sequence":"additional","affiliation":[{"name":"1 deCODE genetics\/Amgen, Reykjav\u00edk, Iceland,"},{"name":"3 Institute of Biomedical and Neural Engineering, Reykjav\u00edk University, Reykjav\u00edk, Iceland"}]}],"member":"286","published-online":{"date-parts":[[2015,4,28]]},"reference":[{"key":"2023020111594308400_btv273-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"1000 Genomes Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B3","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1038\/nmeth.1527","article-title":"Limitations of next-generation genome sequence assembly","volume":"8","author":"Alkan","year":"2011","journal-title":"Nat. Methods"},{"key":"2023020111594308400_btv273-B4","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol."},{"key":"2023020111594308400_btv273-B5","doi-asserted-by":"crossref","first-page":"221","DOI":"10.1038\/ejhg.2013.118","article-title":"The genome of the Netherlands: design, and project goals","volume":"22","author":"Boomsma","year":"2014","journal-title":"Eur. J. Hum. Genet."},{"key":"2023020111594308400_btv273-B6","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1038\/nature13907","article-title":"Resolving the complexity of the human genome using single-molecule sequencing","volume":"517","author":"Chaisson","year":"2014","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B7","doi-asserted-by":"crossref","first-page":"310","DOI":"10.1101\/gr.162883.113","article-title":"TIGRA: a targeted iterative graph routing assembler for breakpoint assembly","volume":"24","author":"Chen","year":"2014","journal-title":"Genome Res."},{"key":"2023020111594308400_btv273-B8","doi-asserted-by":"crossref","first-page":"704","DOI":"10.1038\/nature08516","article-title":"Origins and functional impact of copy number variation in the human genome","volume":"464","author":"Conrad","year":"2010","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B9","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet."},{"key":"2023020111594308400_btv273-B10","doi-asserted-by":"crossref","first-page":"11","DOI":"10.1186\/1471-2105-9-11","article-title":"SeqAn an efficient, generic C++ library for sequence analysis","volume":"9","author":"D\u00f6ring","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023020111594308400_btv273-B11","doi-asserted-by":"crossref","first-page":"e47768","DOI":"10.1371\/journal.pone.0047768","article-title":"Mind the gap: upgrading genomes with pacific biosciences RS long-read sequencing technology","volume":"7","author":"English","year":"2012","journal-title":"PloS One"},{"key":"2023020111594308400_btv273-B12","doi-asserted-by":"crossref","first-page":"351","DOI":"10.1007\/BF02603120","article-title":"Progressive sequence alignment as a prerequisite to correct phylogenetic trees","volume":"25","author":"Feng","year":"1987","journal-title":"J. Mol. Evol."},{"key":"2023020111594308400_btv273-B13","article-title":"Haplotype-based variant detection from short-read sequencing","author":"Garrison","year":"2012","journal-title":"arXiv preprint arXiv:1207.3907 [q-bio.GN]"},{"key":"2023020111594308400_btv273-B14","doi-asserted-by":"crossref","first-page":"789","DOI":"10.1038\/nature02168","article-title":"The international HapMap project","volume":"426","author":"Gibbs","year":"2003","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B15","doi-asserted-by":"crossref","first-page":"1513","DOI":"10.1073\/pnas.1017351108","article-title":"High-quality draft assemblies of mammalian genomes from massively parallel sequence data","volume":"108","author":"Gnerre","year":"2011","journal-title":"Proc. Natl. Acad. Sci."},{"key":"2023020111594308400_btv273-B16","doi-asserted-by":"crossref","first-page":"435","DOI":"10.1038\/ng.3247","article-title":"Large-scale whole-genome sequencing of the icelandic population","volume":"47","author":"Gudbjartsson","year":"2015","journal-title":"Nat. Genet."},{"key":"2023020111594308400_btv273-B17","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1093\/bioinformatics\/btq152","article-title":"Detection and characterization of novel sequence insertions using paired-end next-generation sequencing","volume":"26","author":"Hajirasouliha","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020111594308400_btv273-B18","author":"Holtgrewe","year":"2010"},{"key":"2023020111594308400_btv273-B19","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1093\/bioinformatics\/btv051","article-title":"Methods for the detection and assembly of novel sequence in high-throughput sequencing data","volume":"31","author":"Holtgrewe","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020111594308400_btv273-B20","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s00251-007-0262-2","article-title":"Variation analysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project","volume":"60","author":"Horton","year":"2008","journal-title":"Immunogenetics"},{"key":"2023020111594308400_btv273-B21","doi-asserted-by":"crossref","first-page":"860","DOI":"10.1038\/35057062","article-title":"Initial sequencing and analysis of the human genome","volume":"409","author":"International Human Genome Sequencing Consortium","year":"2001","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B22","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De\u00a0novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet."},{"key":"2023020111594308400_btv273-B23","doi-asserted-by":"crossref","first-page":"S15","DOI":"10.1186\/1471-2105-12-S9-S15","article-title":"STELLAR: fast and exact local alignments","volume":"12","author":"Kehr","year":"2011","journal-title":"BMC Bioinformatics"},{"key":"2023020111594308400_btv273-B24","doi-asserted-by":"crossref","first-page":"e128","DOI":"10.1093\/nar\/gkt339","article-title":"Reprever: resolving low-copy duplicated sequences using template driven assembly","volume":"41","author":"Kim","year":"2013","journal-title":"Nucleic Acids Res."},{"key":"2023020111594308400_btv273-B25","doi-asserted-by":"crossref","first-page":"265","DOI":"10.1101\/gr.097261.109","article-title":"De\u00a0novo assembly of human genomes with massively parallel short read sequencing","volume":"20","author":"Li","year":"2010","journal-title":"Genome Res."},{"key":"2023020111594308400_btv273-B26","doi-asserted-by":"crossref","first-page":"2875","DOI":"10.1093\/bioinformatics\/bts566","article-title":"Clever: clique-enumerating variant finder","volume":"28","author":"Marschall","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020111594308400_btv273-B27","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The genome analysis toolkit: a map reduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res."},{"key":"2023020111594308400_btv273-B28","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1016\/j.ygeno.2010.03.001","article-title":"Assembly algorithms for next-generation sequencing data","volume":"95","author":"Miller","year":"2010","journal-title":"Genomics"},{"key":"2023020111594308400_btv273-B29","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature09708","article-title":"Mapping copy number variation by population-scale genome sequencing","volume":"470","author":"Mills","year":"2011","journal-title":"Nature"},{"key":"2023020111594308400_btv273-B30","doi-asserted-by":"crossref","first-page":"S8","DOI":"10.1186\/1471-2164-14-S1-S8","article-title":"Genome reassembly with high-throughput sequencing data","volume":"14","author":"Parrish","year":"2013","journal-title":"BMC Genomics"},{"key":"2023020111594308400_btv273-B31","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1089\/cmb.2006.13.296","article-title":"Efficient q-gram filters for finding all epsilon-matches over a given length","volume":"13","author":"Rasmussen","year":"2006","journal-title":"J. Comput. Biol."},{"key":"2023020111594308400_btv273-B32","doi-asserted-by":"crossref","first-page":"i333","DOI":"10.1093\/bioinformatics\/bts378","article-title":"Delly: structural variant discovery by integrated paired-end and split-read analysis","volume":"28","author":"Rausch","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020111594308400_btv273-B33","doi-asserted-by":"crossref","first-page":"3451","DOI":"10.1093\/bioinformatics\/btu545","article-title":"MindTheGap: integrated detection and assembly of short and long insertions","volume":"30","author":"Rizk","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020111594308400_btv273-B34","doi-asserted-by":"crossref","first-page":"437","DOI":"10.1146\/annurev-med-100708-204735","article-title":"Structural variation in the human genome and its role in disease","volume":"61","author":"Stankiewicz","year":"2010","journal-title":"Annu. Rev. Med."},{"key":"2023020111594308400_btv273-B35","doi-asserted-by":"crossref","first-page":"1304","DOI":"10.1126\/science.1058040","article-title":"The sequence of the human genome","volume":"291","author":"Venter","year":"2001","journal-title":"Science"},{"key":"2023020111594308400_btv273-B36","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1089\/cmb.1994.1.337","article-title":"On the complexity of multiple sequence alignment","volume":"1","author":"Wang","year":"1994","journal-title":"J. Comput. Biol."},{"key":"2023020111594308400_btv273-B37","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res."},{"key":"2023020111594308400_btv273-B38","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1126\/science.1216830","article-title":"Integrating genomes","volume":"336","author":"Zerbino","year":"2012","journal-title":"Science"},{"key":"2023020111594308400_btv273-B39","doi-asserted-by":"crossref","first-page":"2669","DOI":"10.1093\/bioinformatics\/btt476","article-title":"The MaSuRCA genome assembler","volume":"29","author":"Zimin","year":"2013","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/7\/961\/49018371\/bioinformatics_32_7_961.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/7\/961\/49018371\/bioinformatics_32_7_961.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:24:17Z","timestamp":1675290257000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/7\/961\/2240308"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,4,28]]},"references-count":39,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2016,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv273","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,4,1]]},"published":{"date-parts":[[2015,4,28]]}}}