{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,4]],"date-time":"2026-04-04T17:40:48Z","timestamp":1775324448964,"version":"3.50.1"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Results: Here, we present a comprehensive analysis of the reproducibility of computational characterization of genomic variants using high-throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, where we only altered the order of reads in the input (i.e. FASTQ file). Reshuffling caused the reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter\/gather approach for read mapping\u2014without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle the ambiguous read mappings accurately when random locations are selected. In addition, we observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we trace to the variant filtration step. 
We conclude that, algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results.<\/jats:p>\n               <jats:p>Availability and Implementation: Code, scripts and the generated VCF files are available at DOI:10.5281\/zenodo.32611.<\/jats:p>\n               <jats:p>Contact: \u00a0calkan@cs.bilkent.edu.tr<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw139","type":"journal-article","created":{"date-parts":[[2016,3,13]],"date-time":"2016-03-13T01:18:47Z","timestamp":1457831927000},"page":"2243-2247","source":"Crossref","is-referenced-by-count":42,"title":["On genomic repeats and reproducibility"],"prefix":"10.1093","volume":"32","author":[{"given":"Can","family":"Firtina","sequence":"first","affiliation":[{"name":"Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[{"name":"Department of Computer Engineering, Bilkent University, Ankara 06800, Turkey"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2016,3,11]]},"reference":[{"key":"2023020112470605000_btw139-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/ng.437","article-title":"Personalized copy number and segmental duplication maps using next-generation sequencing","volume":"41","author":"Alkan","year":"2009","journal-title":"Nat. 
Genet"},{"key":"2023020112470605000_btw139-B2","doi-asserted-by":"crossref","first-page":"1665","DOI":"10.1101\/gr.092841.109","article-title":"The ClinSeq project: piloting large-scale genome sequencing for research in genomic medicine","volume":"19","author":"Biesecker","year":"2009","journal-title":"Genome Res"},{"key":"2023020112470605000_btw139-B3","first-page":"ID 456479.","article-title":"A comparison of variant calling pipelines using genome in a bottle as a reference","author":"Cornish","year":"2015","journal-title":"Biomed. Res. Int"},{"key":"2023020112470605000_btw139-B4","doi-asserted-by":"crossref","first-page":"491","DOI":"10.1038\/ng.806","article-title":"A framework for variation discovery and genotyping using next-generation DNA sequencing data","volume":"43","author":"DePristo","year":"2011","journal-title":"Nat. Genet"},{"key":"2023020112470605000_btw139-B5","author":"Garrison","year":"2012"},{"key":"2023020112470605000_btw139-B6","doi-asserted-by":"crossref","first-page":"296","DOI":"10.1038\/ng.3200","article-title":"Large multiallelic copy number variations in humans","volume":"47","author":"Handsaker","year":"2015","journal-title":"Nat. 
Genet"},{"key":"2023020112470605000_btw139-B7","doi-asserted-by":"crossref","first-page":"1270","DOI":"10.1101\/gr.088633.108","article-title":"Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes","volume":"19","author":"Hormozdiari","year":"2009","journal-title":"Genome Res"},{"key":"2023020112470605000_btw139-B8","doi-asserted-by":"crossref","first-page":"i350","DOI":"10.1093\/bioinformatics\/btq216","article-title":"Next-generation VariationHunter: combinatorial algorithms for transposon insertion discovery","volume":"26","author":"Hormozdiari","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020112470605000_btw139-B9","doi-asserted-by":"crossref","first-page":"2203","DOI":"10.1101\/gr.120501.111","article-title":"Simultaneous structural variation discovery among multiple paired-end sequenced genomes","volume":"21","author":"Hormozdiari","year":"2011","journal-title":"Genome Res"},{"key":"2023020112470605000_btw139-B10","doi-asserted-by":"crossref","first-page":"e0138259","DOI":"10.1371\/journal.pone.0138259","article-title":"Robustness of massively parallel sequencing platforms","volume":"10","author":"Kavak","year":"2015","journal-title":"PLoS One"},{"key":"2023020112470605000_btw139-B11","doi-asserted-by":"crossref","first-page":"996","DOI":"10.1101\/gr.229102","article-title":"The human genome browser at UCSC","volume":"12","author":"Kent","year":"2002","journal-title":"Genome Res"},{"key":"2023020112470605000_btw139-B12","doi-asserted-by":"crossref","first-page":"R25","DOI":"10.1186\/gb-2009-10-3-r25","article-title":"Ultrafast and memory-efficient alignment of short DNA sequences to the human genome","volume":"10","author":"Langmead","year":"2009","journal-title":"Genome Biol"},{"key":"2023020112470605000_btw139-B13","doi-asserted-by":"crossref","first-page":"R84","DOI":"10.1186\/gb-2014-15-6-r84","article-title":"LUMPY: a probabilistic framework for structural variant 
discovery","volume":"15","author":"Layer","year":"2014","journal-title":"Genome Biol"},{"key":"2023020112470605000_btw139-B14","author":"Li","year":"2013"},{"key":"2023020112470605000_btw139-B15","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020112470605000_btw139-B16","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: a flexible suite of utilities for comparing genomic features","volume":"26","author":"Quinlan","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020112470605000_btw139-B17","doi-asserted-by":"crossref","first-page":"i333","DOI":"10.1093\/bioinformatics\/bts378","article-title":"DELLY: structural variant discovery by integrated paired-end and split-read analysis","volume":"28","author":"Rausch","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020112470605000_btw139-B18","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1038\/ng.3036","article-title":"Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications","volume":"46","author":"Rimmer","year":"2014","journal-title":"Nat. Genet"},{"key":"2023020112470605000_btw139-B19","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"The 1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2023020112470605000_btw139-B20","doi-asserted-by":"crossref","first-page":"36","DOI":"10.1038\/nrg3117","article-title":"Repetitive DNA and next-generation sequencing: computational challenges and solutions","volume":"13","author":"Treangen","year":"2012","journal-title":"Nat. Rev. 
Genet"},{"key":"2023020112470605000_btw139-B21","first-page":"11.10.1","article-title":"From FASTQ data to high confidence variant calls: the genome analysis toolkit best practices pipeline","volume":"11","author":"Van der Auwera","year":"2013","journal-title":"Curr. Protoc. Bioinformatics"},{"key":"2023020112470605000_btw139-B22","doi-asserted-by":"crossref","first-page":"2592","DOI":"10.1093\/bioinformatics\/bts505","article-title":"RazerS 3: faster, fully sensitive read mapping","volume":"28","author":"Weese","year":"2012","journal-title":"Bioinformatics"},{"key":"2023020112470605000_btw139-B23","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/15\/2243\/49020296\/bioinformatics_32_15_2243.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/15\/2243\/49020296\/bioinformatics_32_15_2243.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,1]],"date-time":"2023-02-01T22:50:00Z","timestamp":1675291800000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/15\/2243\/1743552"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,3,11]]},"references-count":23,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2016,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw139","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"publis
hed-other":{"date-parts":[[2016,8,1]]},"published":{"date-parts":[[2016,3,11]]}}}