{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,27]],"date-time":"2025-10-27T10:50:32Z","timestamp":1761562232449,"version":"3.37.3"},"reference-count":34,"publisher":"Oxford University Press (OUP)","issue":"14","license":[{"start":{"date-parts":[[2017,7,12]],"date-time":"2017-07-12T00:00:00Z","timestamp":1499817600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"funder":[{"DOI":"10.13039\/100004410","name":"European Molecular Biology Organization","doi-asserted-by":"publisher","award":["EMBO-IG 2521"],"award-info":[{"award-number":["EMBO-IG 2521"]}],"id":[{"id":"10.13039\/100004410","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100004410","name":"Scientific and Technological Research Council of Turkey","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100004410","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,7,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Despite recent advances in algorithms design to characterize structural variation using high-throughput short read sequencing (HTS) data, characterization of novel sequence insertions longer than the average read length remains a challenging task. This is mainly due to both computational difficulties and the complexities imposed by genomic repeats in generating reliable assemblies to accurately detect both the sequence content and the exact location of such insertions. Additionally, de novo genome assembly algorithms typically require a very high depth of coverage, which may be a limiting factor for most genome studies. Therefore, characterization of novel sequence insertions is not a routine part of most sequencing projects.<\/jats:p>\n                  <jats:p>There are only a handful of algorithms that are specifically developed for novel sequence insertion discovery that can bypass the need for the whole genome de novo assembly. Still, most such algorithms rely on high depth of coverage, and to our knowledge there is only one method (PopIns) that can use multi-sample data to \u201ccollectively\u201d obtain a very high coverage dataset to accurately find insertions common in a given population.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Result<\/jats:title>\n                  <jats:p>Here, we present Pamir, a new algorithm to efficiently and accurately discover and genotype novel sequence insertions using either single or multiple genome sequencing datasets. Pamir is able to detect breakpoint locations of the insertions and calculate their zygosity (i.e. heterozygous versus homozygous) by analyzing multiple sequence signatures, matching one-end-anchored sequences to small-scale de novo assemblies of unmapped reads, and conducting strand-aware local assembly. We test the efficacy of Pamir on both simulated and real data, and demonstrate its potential use in accurate and routine identification of novel sequence insertions in genome projects.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Pamir is available at https:\/\/github.com\/vpc-ccg\/pamir.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx254","type":"journal-article","created":{"date-parts":[[2017,4,24]],"date-time":"2017-04-24T07:39:48Z","timestamp":1493019588000},"page":"i161-i169","source":"Crossref","is-referenced-by-count":29,"title":["Discovery and genotyping of novel sequence insertions in many sequenced individuals"],"prefix":"10.1093","volume":"33","author":[{"given":"P\u0131nar","family":"Kavak","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Bo\u011fazi\u00e7i University, Istanbul, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yen-Yi","family":"Lin","sequence":"additional","affiliation":[{"name":"School of Computing Science, Simon Fraser University, Burnaby, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Ibrahim","family":"Numanagi\u0107","sequence":"additional","affiliation":[{"name":"School of Computing Science, Simon Fraser University, Burnaby, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hossein","family":"Asghari","sequence":"additional","affiliation":[{"name":"School of Computing Science, Simon Fraser University, Burnaby, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tunga","family":"G\u00fcng\u00f6r","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Bo\u011fazi\u00e7i University, Istanbul, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Bilkent University, Ankara, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Faraz","family":"Hach","sequence":"additional","affiliation":[{"name":"School of Computing Science, Simon Fraser University, Burnaby, Canada"},{"name":"Vancouver Prostate Centre, Vancouver, Canada"},{"name":"Department of Urologic Sciences, University of British Columbia, Vancouver, Canada"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2017,7,12]]},"reference":[{"key":"2023051506464823100_btx254-B1","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1038\/nrg2958","article-title":"Genome structural variation discovery and genotyping","volume":"12","author":"Alkan","year":"2011","journal-title":"Nat. Rev. Genet"},{"key":"2023051506464823100_btx254-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023051506464823100_btx254-B3","doi-asserted-by":"crossref","first-page":"1005","DOI":"10.1101\/gr.187101","article-title":"Segmental duplications: organization and impact within the current human genome project assembly","volume":"11","author":"Bailey","year":"2001","journal-title":"Genome Res"},{"key":"2023051506464823100_btx254-B4","doi-asserted-by":"crossref","first-page":"e72.","DOI":"10.1093\/nar\/gks001","article-title":"Summarizing and correcting the gc content bias in high-throughput sequencing","volume":"40","author":"Benjamini","year":"2012","journal-title":"Nucl. Acids Res"},{"key":"2023051506464823100_btx254-B5","doi-asserted-by":"crossref","first-page":"627","DOI":"10.1038\/nrg3933","article-title":"Genetic variation and the de novo assembly of human genomes","volume":"16","author":"Chaisson","year":"2015","journal-title":"Nat. Rev. Genet"},{"key":"2023051506464823100_btx254-B6","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1038\/nature13907","article-title":"Resolving the complexity of the human genome using single-molecule sequencing","volume":"517","author":"Chaisson","year":"2015","journal-title":"Nature"},{"key":"2023051506464823100_btx254-B7","doi-asserted-by":"crossref","first-page":"13.","DOI":"10.1186\/s13059-015-0587-3","article-title":"Extending reference assembly models","volume":"16","author":"Church","year":"2015","journal-title":"Genome Biol"},{"key":"2023051506464823100_btx254-B8","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and vcftools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B9","doi-asserted-by":"crossref","first-page":"2243","DOI":"10.1093\/bioinformatics\/btw139","article-title":"On genomic repeats and reproducibility","volume":"32","author":"Firtina","year":"2016","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B10","doi-asserted-by":"crossref","first-page":"576","DOI":"10.1038\/nmeth0810-576","article-title":"mrsFAST: a cache-oblivious algorithm for short-read mapping","volume":"7","author":"Hach","year":"2010","journal-title":"Nat. Methods"},{"issue":"Web Server issue","key":"2023051506464823100_btx254-B11","doi-asserted-by":"crossref","first-page":"W494","DOI":"10.1093\/nar\/gku370","article-title":"mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications","volume":"42","author":"Hach","year":"2014","journal-title":"Nucl. Acids Res"},{"key":"2023051506464823100_btx254-B12","doi-asserted-by":"crossref","first-page":"1277","DOI":"10.1093\/bioinformatics\/btq152","article-title":"Detection and characterization of novel sequence insertions using paired-end next-generation sequencing","volume":"26","author":"Hajirasouliha","year":"2010","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B13","doi-asserted-by":"crossref","first-page":"1904","DOI":"10.1093\/bioinformatics\/btv051","article-title":"Methods for the detection and assembly of novel sequence in high-throughput sequencing data","volume":"31","author":"Holtgrewe","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B14","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: a next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B15","article-title":"Discovery and genotyping of structural variation from long-read haploid genome sequence data","author":"Huddleston","year":"2016","journal-title":"Genome Res"},{"key":"2023051506464823100_btx254-B16","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet"},{"key":"2023051506464823100_btx254-B17","doi-asserted-by":"crossref","first-page":"256","DOI":"10.1016\/S0022-0000(74)80044-9","article-title":"Approximation algorithms for combinatorial problems","volume":"9","author":"Johnson","year":"1974","journal-title":"J. Comput. Syst. Sci"},{"key":"2023051506464823100_btx254-B18","doi-asserted-by":"crossref","first-page":"558","DOI":"10.1145\/368996.369025","article-title":"Topological sorting of large networks","volume":"5","author":"Kahn","year":"1962","journal-title":"Commun. ACM"},{"key":"2023051506464823100_btx254-B19","doi-asserted-by":"crossref","first-page":"961","DOI":"10.1093\/bioinformatics\/btv273","article-title":"PopIns: population-scale detection of novel sequence insertions","volume":"32","author":"Kehr","year":"2015","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B20","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature06862","article-title":"Mapping and sequencing of structural variation from eight human genomes","volume":"453","author":"Kidd","year":"2008","journal-title":"Nature"},{"key":"2023051506464823100_btx254-B21","doi-asserted-by":"crossref","first-page":"365","DOI":"10.1038\/nmeth.1451","article-title":"Characterization of missing human genome sequences and copy-number polymorphic insertions","volume":"7","author":"Kidd","year":"2010","journal-title":"Nat. Methods"},{"key":"2023051506464823100_btx254-B22","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1016\/j.cell.2010.10.027","article-title":"A human genome structural variation sequencing resource reveals insights into mutational mechanisms","volume":"143","author":"Kidd","year":"2010","journal-title":"Cell"},{"issue":"11 Suppl","key":"2023051506464823100_btx254-B24","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1038\/nmeth.1374","article-title":"Computational methods for discovering structural variation with next-generation sequencing","volume":"6","author":"Medvedev","year":"2009","journal-title":"Nat. Methods"},{"key":"2023051506464823100_btx254-B25","doi-asserted-by":"crossref","first-page":"59","DOI":"10.1038\/nature09708","article-title":"Mapping copy number variation by population-scale genome sequencing","volume":"470","author":"Mills","year":"2011","journal-title":"Nature"},{"key":"2023051506464823100_btx254-B26","doi-asserted-by":"crossref","first-page":"3451","DOI":"10.1093\/bioinformatics\/btu545","article-title":"MindTheGap: integrated detection and assembly of short and long insertions","volume":"30","author":"Rizk","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051506464823100_btx254-B27","doi-asserted-by":"crossref","first-page":"R51.","DOI":"10.1186\/gb-2013-14-5-r51","article-title":"Characterizing and measuring bias in sequence data","volume":"14","author":"Ross","year":"2013","journal-title":"Genome Biol"},{"key":"2023051506464823100_btx254-B28","doi-asserted-by":"crossref","first-page":"407","DOI":"10.1146\/annurev.genom.7.080505.115618","article-title":"Structural variation of the human genome","volume":"7","author":"Sharp","year":"2006","journal-title":"Annu Rev. Genom. Hum. Genet"},{"key":"2023051506464823100_btx254-B29","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023051506464823100_btx254-B30","doi-asserted-by":"crossref","first-page":"2066","DOI":"10.1101\/gr.180893.114","article-title":"Single haplotype assembly of the human genome from a hydatidiform mole","volume":"24","author":"Steinberg","year":"2014","journal-title":"Genome Res"},{"key":"2023051506464823100_btx254-B31","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"The 1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023051506464823100_btx254-B35","article-title":"Computational pan-genomics: status, promises and challenges","author":"The Computational Pan-Genomics Consortium","year":"2017","journal-title":"Brief. Bioinform"},{"key":"2023051506464823100_btx254-B32","first-page":"e126.","article-title":"A genome-wide approach for detecting novel insertion-deletion variants of mid-range size","volume":"44","author":"Xia","year":"2016","journal-title":"Nucl. Acids Res"},{"key":"2023051506464823100_btx254-B33","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1101\/gr.074492.107","article-title":"Velvet: algorithms for de novo short read assembly using de Bruijn graphs","volume":"18","author":"Zerbino","year":"2008","journal-title":"Genome Res"},{"key":"2023051506464823100_btx254-B34","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i161\/50314670\/bioinformatics_33_14_i161.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/14\/i161\/50314670\/bioinformatics_33_14_i161.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,15]],"date-time":"2023-05-15T06:47:11Z","timestamp":1684133231000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/14\/i161\/3953969"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2017,7,12]]},"references-count":34,"journal-issue":{"issue":"14","published-print":{"date-parts":[[2017,7,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx254","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2017,7,15]]},"published":{"date-parts":[[2017,7,12]]}}}