{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,28]],"date-time":"2026-01-28T13:55:18Z","timestamp":1769608518827,"version":"3.49.0"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"18","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,9,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The number of reported genetic variants is rapidly growing, empowered by ever faster accumulation of next-generation sequencing data. A major issue is comparability. Standards that address the combined problem of inaccurately predicted breakpoints and repeat-induced ambiguities are missing. This decisively lowers the quality of \u2018consensus\u2019 callsets and hampers the removal of duplicate entries in variant databases, which can have deleterious effects in downstream analyses.<\/jats:p>\n               <jats:p>Results: We introduce a sound framework for comparison of deletions that captures both tool-induced inaccuracies and repeat-induced ambiguities. We present a maximum matching algorithm that outputs virtual duplicates among two sets of predictions\/annotations. We demonstrate that our approach is clearly superior over ad hoc criteria, like overlap, and that it can reduce the redundancy among callsets substantially. 
We also identify large amounts of duplicate entries in the Database of Genomic Variants, which points out the immediate relevance of our approach.<\/jats:p>\n               <jats:p>Availability and implementation: Implementation is open source and available from https:\/\/bitbucket.org\/readdi\/readdi<\/jats:p>\n               <jats:p>Contact: \u00a0roland.wittler@uni-bielefeld.de or t.marschall@mpi-inf.mpg.de<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btv304","type":"journal-article","created":{"date-parts":[[2015,5,16]],"date-time":"2015-05-16T00:35:24Z","timestamp":1431736524000},"page":"2947-2954","source":"Crossref","is-referenced-by-count":18,"title":["Repeat- and error-aware comparison of deletions"],"prefix":"10.1093","volume":"31","author":[{"given":"Roland","family":"Wittler","sequence":"first","affiliation":[{"name":"1 Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, 2Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbr\u00fccken, Germany, 3Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and 4Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland"}]},{"given":"Tobias","family":"Marschall","sequence":"additional","affiliation":[{"name":"1 Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, 2Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbr\u00fccken, Germany, 3Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and 4Helsinki Institute for Information Technology 
(HIIT), Department of Computer Science, University of Helsinki, Finland"}]},{"given":"Alexander","family":"Sch\u00f6nhuth","sequence":"additional","affiliation":[{"name":"1 Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, 2Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbr\u00fccken, Germany, 3Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and 4Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland"}]},{"given":"Veli","family":"M\u00e4kinen","sequence":"additional","affiliation":[{"name":"1 Genome Informatics, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Germany, 2Center for Bioinformatics, Saarland University and Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbr\u00fccken, Germany, 3Centrum Wiskunde & Informatica (CWI), Life Sciences Group, Amsterdam, The Netherlands and 4Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Finland"}]}],"member":"286","published-online":{"date-parts":[[2015,5,15]]},"reference":[{"key":"2023020202234997100_btv304-B1","doi-asserted-by":"crossref","first-page":"363","DOI":"10.1038\/nrg2958","article-title":"Genome structural variation discovery and genotyping","volume":"12","author":"Alkan","year":"2011","journal-title":"Nat. Rev. 
Genet."},{"key":"2023020202234997100_btv304-B2","doi-asserted-by":"crossref","first-page":"e62803","DOI":"10.1371\/journal.pone.0062803","article-title":"Equivalent indels\u2013ambiguous functional classes and redundancy in databases","volume":"8","author":"Assmus","year":"2013","journal-title":"PLoS ONE"},{"key":"2023020202234997100_btv304-B3","doi-asserted-by":"crossref","first-page":"677","DOI":"10.1038\/nmeth.1363","article-title":"BreakDancer: an algorithm for high-resolution mapping of genomic structural variation","volume":"6","author":"Chen","year":"2009","journal-title":"Nat. Methods"},{"key":"2023020202234997100_btv304-B4","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020202234997100_btv304-B5","doi-asserted-by":"crossref","first-page":"2224","DOI":"10.1101\/gr.126599.111","article-title":"Assemblathon 1: A competitive assessment of de novo short read assembly methods","volume":"21","author":"Earl","year":"2011","journal-title":"Genome Res."},{"key":"2023020202234997100_btv304-B6","first-page":"77","article-title":"An algebraic dynamic programming approach to the analysis of recombinant DNA sequences","volume-title":"Workshop on Algorithmic Aspects of Advanced Programming Languages (WAAAPL)","author":"Giegerich","year":"1999"},{"key":"2023020202234997100_btv304-B7","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1016\/j.jcss.2004.03.004","article-title":"Linear time algorithms for finding and representing all the tandem repeats in a string","volume":"69","author":"Gusfield","year":"2004","journal-title":"J. Comput. Syst. 
Sci."},{"key":"2023020202234997100_btv304-B8","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1093\/nar\/30.1.38","article-title":"The ensembl genome database project","volume":"30","author":"Hubbard","year":"2002","journal-title":"Nucleic Acids Res."},{"key":"2023020202234997100_btv304-B9","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1093\/bioinformatics\/btq027","article-title":"Microindel detection in short-read sequence data","volume":"26","author":"Krawitz","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020202234997100_btv304-B10","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/nbt.2134","article-title":"Detecting and annotating genetic variations using the hugeseq pipeline","volume":"30","author":"Lam","year":"2012","journal-title":"Nat. Biotechnol."},{"key":"2023020202234997100_btv304-B11","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1016\/0196-6774(89)90010-2","article-title":"Fast parallel and serial approximate string matching","volume":"10","author":"Landau","year":"1989","journal-title":"J. 
Algorithms"},{"key":"2023020202234997100_btv304-B12","doi-asserted-by":"crossref","first-page":"e254","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"Levy","year":"2007","journal-title":"PLoS Biol."},{"key":"2023020202234997100_btv304-B13","doi-asserted-by":"crossref","first-page":"298","DOI":"10.1101\/gr.6725608","article-title":"Uncertainty in homology inferences: assessing and improving genomic sequence alignment","volume":"18","author":"Lunter","year":"2008","journal-title":"Genome Res."},{"key":"2023020202234997100_btv304-B14","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1186\/1471-2105-14-S15-S13","article-title":"Haploid to diploid alignment for variation calling assessment","volume":"14","author":"M\u00e4kinen","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020202234997100_btv304-B15","doi-asserted-by":"crossref","first-page":"762","DOI":"10.1101\/gr.143677.112","article-title":"Breakpoint profiling of 64 cancer genomes reveals numerous complex rearrangements spawned by homology-independent mechanisms","volume":"23","author":"Malhotra","year":"2013","journal-title":"Genome Res."},{"key":"2023020202234997100_btv304-B16","doi-asserted-by":"crossref","first-page":"2875","DOI":"10.1093\/bioinformatics\/bts566","article-title":"CLEVER: clique-enumerating variant finder","volume":"28","author":"Marschall","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020202234997100_btv304-B17","doi-asserted-by":"crossref","first-page":"3143","DOI":"10.1093\/bioinformatics\/btt556","article-title":"MATE-CLEVER: Mendelian-inheritance-aware discovery and genotyping of midsize and long indels","volume":"29","author":"Marschall","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020202234997100_btv304-B18","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1038\/nmeth.1374","article-title":"Computational methods for discovering structural 
variation with next-generation sequencing","volume":"6","author":"Medvedev","year":"2009","journal-title":"Nat. Methods"},{"key":"2023020202234997100_btv304-B19","doi-asserted-by":"crossref","first-page":"e1002821","DOI":"10.1371\/journal.pcbi.1002821","article-title":"Structural variation and medical genomics","volume":"8","author":"Raphael","year":"2012","journal-title":"PLoS Comput. Biol."},{"key":"2023020202234997100_btv304-B20","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1038\/ng.3036","article-title":"Integrating mapping-, assembly-and haplotype-based approaches for calling variants in clinical sequencing applications","volume":"46","author":"Rimmer","year":"2014","journal-title":"Nat. Genet."},{"key":"2023020202234997100_btv304-B21","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"Sherry","year":"2001","journal-title":"Nucleic Acids Res."},{"key":"2023020202234997100_btv304-B22","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1186\/1471-2164-14-253","article-title":"Massively-parallel sequencing of genes on a single chromosome: a comparison of solution hybrid selection and flow sorting","volume":"14","author":"Teer","year":"2013","journal-title":"BMC Genomics"},{"key":"2023020202234997100_btv304-B23","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"The 1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2023020202234997100_btv304-B24","doi-asserted-by":"crossref","first-page":"818","DOI":"10.1038\/ng.3021","article-title":"Whole-genome sequence variation, population structure and demographic history of the Dutch population","volume":"46","author":"The Genome of the Netherlands Consortium","year":"2014","journal-title":"Nat. 
Genet."},{"key":"2023020202234997100_btv304-B25","first-page":"557","article-title":"Repetitive DNA and next-generation sequencing: computational challenges and solutions","volume":"13","author":"Treangen","year":"2012","journal-title":"Nat. Rev. Genet."},{"key":"2023020202234997100_btv304-B26","doi-asserted-by":"crossref","first-page":"187","DOI":"10.1093\/bioinformatics\/btu591","article-title":"Consensus genotyper for exome sequencing (CGES): improving the quality of exome variant genotypes","volume":"31","author":"Trubetskoy","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020202234997100_btv304-B27","doi-asserted-by":"crossref","first-page":"405","DOI":"10.1093\/bfgp\/elq025","article-title":"Detecting structural variations in the human genome using next generation sequencing","volume":"9","author":"Xi","year":"2010","journal-title":"Brief Funct. Genomics"},{"key":"2023020202234997100_btv304-B28","doi-asserted-by":"crossref","first-page":"2865","DOI":"10.1093\/bioinformatics\/btp394","article-title":"Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads","volume":"25","author":"Ye","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020202234997100_btv304-B29","doi-asserted-by":"crossref","first-page":"205","DOI":"10.1159\/000095916","article-title":"Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome","volume":"115","author":"Zhang","year":"2006","journal-title":"Cytogenet. 
Genome Res."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/18\/2947\/49035299\/bioinformatics_31_18_2947.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/18\/2947\/49035299\/bioinformatics_31_18_2947.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T03:48:03Z","timestamp":1675309683000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/18\/2947\/240690"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,5,15]]},"references-count":29,"journal-issue":{"issue":"18","published-print":{"date-parts":[[2015,9,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btv304","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,9,15]]},"published":{"date-parts":[[2015,5,15]]}}}