{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T06:15:50Z","timestamp":1772172950470,"version":"3.50.1"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009269","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,3,11]],"date-time":"2022-03-11T00:00:00Z","timestamp":1646956800000}}],"reference-count":58,"publisher":"Public Library of Science (PLoS)","issue":"2","license":[{"start":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T00:00:00Z","timestamp":1645056000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"turku university foundation"},{"name":"state research funding from the turku university hospital"},{"DOI":"10.13039\/501100000781","name":"european research council","doi-asserted-by":"publisher","award":["677943"],"award-info":[{"award-number":["677943"]}],"id":[{"id":"10.13039\/501100000781","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100002341","name":"academy of finland","doi-asserted-by":"publisher","award":["296801, 310561, 314443, 329278, 335434, 335611"],"award-info":[{"award-number":["296801, 310561, 314443, 329278, 335434, 335611"]}],"id":[{"id":"10.13039\/501100002341","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006306","name":"sigrid jus\u00e9liuksen s\u00e4\u00e4ti\u00f6","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100006306","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100019391","name":"University of Turku Graduate School","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100019391","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100013840","name":"Biocenter Finland","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100013840","id-type":"DOI","asserted-by":"crossref"}]},{"name":"ELIXIR Finland"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Insertions and deletions (indels) in human genomes are associated with a wide range of phenotypes, including various clinical disorders. High-throughput, next generation sequencing (NGS) technologies enable the detection of short genetic variants, such as single nucleotide variants (SNVs) and indels. However, the variant calling accuracy for indels remains considerably lower than for SNVs. Here we present a comparative study of the performance of variant calling tools for indel calling, evaluated with a wide repertoire of NGS datasets. While there is no single optimal tool to suit all circumstances, our results demonstrate that the choice of variant calling tool greatly impacts the precision and recall of indel calling. Furthermore, to reliably detect indels, it is essential to choose NGS technologies that offer a long read length and high coverage coupled with specific variant calling tools.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009269","type":"journal-article","created":{"date-parts":[[2022,2,17]],"date-time":"2022-02-17T13:50:31Z","timestamp":1645105831000},"page":"e1009269","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":20,"title":["Tool evaluation for the detection of variably sized indels from next generation whole genome and targeted sequencing data"],"prefix":"10.1371","volume":"18","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-1275-4710","authenticated-orcid":true,"given":"Ning","family":"Wang","sequence":"first","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1468-1631","authenticated-orcid":true,"given":"Vladislav","family":"Lysenkov","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0353-6826","authenticated-orcid":true,"given":"Katri","family":"Orte","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0562-8054","authenticated-orcid":true,"given":"Veli","family":"Kairisto","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4142-3256","authenticated-orcid":true,"given":"Juhani","family":"Aakko","sequence":"additional","affiliation":[]},{"given":"Sofia","family":"Khan","sequence":"additional","affiliation":[]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5648-4532","authenticated-orcid":true,"given":"Laura L.","family":"Elo","sequence":"additional","affiliation":[]}],"member":"340","published-online":{"date-parts":[[2022,2,17]]},"reference":[{"key":"pcbi.1009269.ref001","doi-asserted-by":"crossref","first-page":"5463","DOI":"10.1073\/pnas.74.12.5463","article-title":"DNA sequencing with chain-terminating inhibitors","volume":"74","author":"F Sanger","year":"1977","journal-title":"Proc Natl Acad Sci"},{"key":"pcbi.1009269.ref002","article-title":"Performance comparison of benchtop high-throughput sequencing platforms","author":"NJ Loman","year":"2012","journal-title":"Nat Biotechnol"},{"key":"pcbi.1009269.ref003","article-title":"Trends in next-generation sequencing and a new era for whole genome sequencing","author":"ST Park","year":"2016","journal-title":"International Neurourology Journal"},{"key":"pcbi.1009269.ref004","article-title":"Copy number signatures and mutational processes in ovarian carcinoma","author":"G Macintyre","year":"2018","journal-title":"Nat Genet"},{"key":"pcbi.1009269.ref005","article-title":"Erratum: Sequence data and association statistics from 12,940 type 2 diabetes cases and controls","author":"J Flannick","year":"2018","journal-title":"Scientific data"},{"key":"pcbi.1009269.ref006","article-title":"Whole genome sequencing of 91 multiplex schizophrenia families reveals increased burden of rare, exonic copy number variation in schizophrenia probands and genetic heterogeneity","author":"FF Khan","year":"2018","journal-title":"Schizophr Res"},{"key":"pcbi.1009269.ref007","doi-asserted-by":"crossref","first-page":"4298","DOI":"10.1093\/nar\/gks043","article-title":"Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis","volume":"40","author":"Y Li","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009269.ref008","doi-asserted-by":"crossref","DOI":"10.1093\/hmg\/ddq400","article-title":"Small insertions and deletions (INDELs) in human genomes","volume":"19","author":"JM Mullaney","year":"2010","journal-title":"Hum Mol Genet"},{"key":"pcbi.1009269.ref009","article-title":"Structural variation detection using next-generation sequencing data: A comparative technical review","author":"P Guan","year":"2016","journal-title":"Methods"},{"key":"pcbi.1009269.ref010","article-title":"Genetic analysis of indel markers in three loci associated with Parkinson\u2019s disease","author":"Z Huo","year":"2017","journal-title":"PLoS One"},{"key":"pcbi.1009269.ref011","article-title":"Paired-end mapping reveals extensive structural variation in the human genome","author":"JO Korbel","year":"2007","journal-title":"Science"},{"key":"pcbi.1009269.ref012","doi-asserted-by":"crossref","first-page":"305","DOI":"10.1093\/bfgp\/elv014","article-title":"A decade of structural variants: Description, history and methods to detect structural variation","volume":"14","author":"G Escaram\u00eds","year":"2015","journal-title":"Brief Funct Genomics"},{"key":"pcbi.1009269.ref013","doi-asserted-by":"crossref","first-page":"2865","DOI":"10.1093\/bioinformatics\/btp394","article-title":"Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads","volume":"25","author":"K Ye","year":"2009","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref014","article-title":"Mapping copy number variation by population-scale genome sequencing","author":"RE Mills","year":"2011","journal-title":"Nature"},{"key":"pcbi.1009269.ref015","article-title":"A practical comparison of De Novo genome assembly software tools for next-generation sequencing technologies","volume":"6","author":"W Zhang","year":"2011","journal-title":"PLoS One"},{"key":"pcbi.1009269.ref016","article-title":"Scaling accurate genetic variant discovery to tens of thousands of samples","volume":"201178","author":"R Poplin","year":"2017","journal-title":"bioRxiv"},{"key":"pcbi.1009269.ref017","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1038\/ng.3036","article-title":"Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications","volume":"46","author":"A Rimmer","year":"2014","journal-title":"Nat Genet"},{"key":"pcbi.1009269.ref018","article-title":"Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM","volume":"1303","author":"H. Li","year":"2013","journal-title":"arXiv"},{"key":"pcbi.1009269.ref019","doi-asserted-by":"crossref","first-page":"2283","DOI":"10.1093\/bioinformatics\/btp373","article-title":"VarScan: Variant detection in massively parallel sequencing of individual and pooled samples","volume":"25","author":"DC Koboldt","year":"2009","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref020","article-title":"Detection of structural DNA variation from next generation sequencing data: A review of informatic approaches","author":"HJ Abel","year":"2013","journal-title":"Cancer Genetics"},{"key":"pcbi.1009269.ref021","doi-asserted-by":"crossref","DOI":"10.1093\/bioinformatics\/bts378","article-title":"DELLY: Structural variant discovery by integrated paired-end and split-read analysis","volume":"28","author":"T Rausch","year":"2012","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref022","doi-asserted-by":"crossref","first-page":"591","DOI":"10.1038\/s41592-018-0051-x","article-title":"Strelka2: fast and accurate calling of germline and somatic variants","volume":"15","author":"S Kim","year":"2018","journal-title":"Nat Methods"},{"key":"pcbi.1009269.ref023","doi-asserted-by":"crossref","first-page":"983","DOI":"10.1038\/nbt.4235","article-title":"A universal snp and small-indel variant caller using deep neural networks","author":"R Poplin","year":"2018","journal-title":"Nature Biotechnology"},{"key":"pcbi.1009269.ref024","article-title":"Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data","author":"S Sandmann","year":"2017","journal-title":"Sci Rep"},{"key":"pcbi.1009269.ref025","article-title":"Comparison of three variant callers for human whole genome sequencing","author":"A Supernat","year":"2018","journal-title":"Sci Rep"},{"key":"pcbi.1009269.ref026","doi-asserted-by":"crossref","DOI":"10.1038\/s41598-020-77218-4","article-title":"Accuracy and efficiency of germline variant calling pipelines for human genome data","volume":"10","author":"S Zhao","year":"2020","journal-title":"Sci Rep"},{"key":"pcbi.1009269.ref027","article-title":"Benchmarking variant callers in next-generation and third-generation sequencing analysis","volume":"22","author":"S Pei","year":"2021","journal-title":"Brief Bioinform"},{"key":"pcbi.1009269.ref028","doi-asserted-by":"crossref","DOI":"10.1186\/s13059-019-1720-5","article-title":"Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing","volume":"20","author":"S Kosugi","year":"2019","journal-title":"Genome Biol"},{"key":"pcbi.1009269.ref029","doi-asserted-by":"crossref","DOI":"10.1038\/s41467-019-11146-4","article-title":"Comprehensive evaluation and characterisation of short read general-purpose structural variant calling software","volume":"10","author":"DL Cameron","year":"2019","journal-title":"Nat Commun"},{"key":"pcbi.1009269.ref030","article-title":"Disease-targeted sequencing: A cornerstone in the clinic","author":"HL Rehm","year":"2013","journal-title":"Nature Reviews Genetics"},{"key":"pcbi.1009269.ref031","doi-asserted-by":"crossref","first-page":"2113","DOI":"10.1371\/journal.pbio.0050254","article-title":"The diploid genome sequence of an individual human","volume":"5","author":"S Levy","year":"2007","journal-title":"PLoS Biol"},{"key":"pcbi.1009269.ref032","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"JM Zook","year":"2014","journal-title":"Nat Biotechnol"},{"key":"pcbi.1009269.ref033","doi-asserted-by":"crossref","first-page":"608","DOI":"10.1038\/nature13907","article-title":"Resolving the complexity of the human genome using single-molecule sequencing","volume":"517","author":"MJP Chaisson","year":"2015","journal-title":"Nature"},{"key":"pcbi.1009269.ref034","doi-asserted-by":"crossref","first-page":"3694","DOI":"10.1093\/bioinformatics\/btv440","article-title":"FermiKit: Assembly-based variant calling for Illumina resequencing data","volume":"31","author":"H. Li","year":"2015","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref035","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"H Li","year":"2009","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref036","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"P Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref037","article-title":"Mining sequential patterns by pattern-growth: The prefixspan approach","author":"J Pei","year":"2004","journal-title":"IEEE Trans Knowl Data Eng"},{"key":"pcbi.1009269.ref038","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1101\/gr.210500.116","article-title":"A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree","volume":"27","author":"MA Eberle","year":"2017","journal-title":"Genome Res"},{"key":"pcbi.1009269.ref039","article-title":"DbVar and DGVa: Public archives for genomic structural variation","volume":"41","author":"I Lappalainen","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009269.ref040","doi-asserted-by":"crossref","first-page":"308","DOI":"10.1093\/nar\/29.1.308","article-title":"dbSNP: the NCBI database of genetic variation","volume":"29","author":"ST Sherry","year":"2001","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009269.ref041","article-title":"Jointly aligning a group of DNA reads improves accuracy of identifying large deletions","volume":"46","author":"AMS Shrestha","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1009269.ref042","article-title":"SMaSH: A benchmarking toolkit for human genome variant calling","author":"A Talwalkar","year":"2014","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref043","article-title":"ART: A next-generation sequencing read simulator","author":"W Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref044","article-title":"Extensive sequencing of seven human genomes to characterize benchmark reference materials","volume":"3","author":"JM Zook","year":"2016","journal-title":"Sci Data"},{"key":"pcbi.1009269.ref045","doi-asserted-by":"crossref","first-page":"2379","DOI":"10.1056\/NEJMoa1311347","article-title":"Somatic mutations of calreticulin in myeloproliferative neoplasms","volume":"369","author":"T Klampfl","year":"2013","journal-title":"N Engl J Med"},{"key":"pcbi.1009269.ref046","doi-asserted-by":"crossref","first-page":"230","DOI":"10.1111\/j.1365-2141.2008.07328.x","article-title":"Rapid and sensitive screening for CEBPA mutations in acute myeloid leukaemia","volume":"143","author":"T Benthaus","year":"2008","journal-title":"Br J Haematol"},{"key":"pcbi.1009269.ref047","doi-asserted-by":"crossref","first-page":"555","DOI":"10.1038\/s41587-019-0054-x","article-title":"Best practices for benchmarking germline small-variant calls in human genomes","volume":"37","author":"P Krusche","year":"2019","journal-title":"Nat Biotechnol"},{"key":"pcbi.1009269.ref048","doi-asserted-by":"crossref","first-page":"841","DOI":"10.1093\/bioinformatics\/btq033","article-title":"BEDTools: A flexible suite of utilities for comparing genomic features","volume":"26","author":"AR Quinlan","year":"2010","journal-title":"Bioinformatics"},{"key":"pcbi.1009269.ref049","doi-asserted-by":"crossref","DOI":"10.1371\/journal.pcbi.1004572","article-title":"Wham: Identifying Structural Variants of Biological Consequence","volume":"11","author":"ZN Kronenberg","year":"2015","journal-title":"PLoS Comput Biol"},{"key":"pcbi.1009269.ref050","article-title":"Structural variation in the human genome","author":"L Feuk","year":"2006","journal-title":"Nature Reviews Genetics"},{"key":"pcbi.1009269.ref051","first-page":"171","article-title":"Structural variation in the sequencing era","author":"SS Ho","year":"2020","journal-title":"Nature Reviews Genetics"},{"key":"pcbi.1009269.ref052","article-title":"Structural variant calling: The long and the short of it","author":"M Mahmoud","year":"2019","journal-title":"Genome Biology"},{"key":"pcbi.1009269.ref053","article-title":"Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes","author":"AJ Scott","year":"2021","journal-title":"Genome Res"},{"key":"pcbi.1009269.ref054","doi-asserted-by":"crossref","DOI":"10.1038\/srep43169","article-title":"Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data","volume":"7","author":"S Sandmann","year":"2017","journal-title":"Sci Rep"},{"key":"pcbi.1009269.ref055","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1038\/s41592-018-0054-7","article-title":"A synthetic-diploid benchmark for accurate variant-calling evaluation","volume":"15","author":"H Li","year":"2018","journal-title":"Nat Methods"},{"key":"pcbi.1009269.ref056","doi-asserted-by":"crossref","DOI":"10.1186\/1756-0500-7-864","article-title":"Comparison of insertion\/deletion calling algorithms on human next-generation sequencing data","volume":"7","author":"DH Ghoneim","year":"2014","journal-title":"BMC Res Notes"},{"key":"pcbi.1009269.ref057","article-title":"Evaluating the performance of tools used to call minority variants from whole genome short-read data","author":"K Said Mohammed","year":"2018","journal-title":"Wellcome Open Res"},{"key":"pcbi.1009269.ref058","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1186\/s13073-014-0089-z","article-title":"Reducing INDEL calling errors in whole genome and exome sequencing data","volume":"6","author":"H Fang","year":"2014","journal-title":"Genome Med"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009269","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,3,11]],"date-time":"2022-03-11T00:00:00Z","timestamp":1646956800000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009269","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,17]],"date-time":"2023-11-17T12:48:01Z","timestamp":1700225281000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009269"}},"subtitle":[],"editor":[{"given":"Sergei L.","family":"Kosakovsky Pond","sequence":"first","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,2,17]]},"references-count":58,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2022,2,17]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009269","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2021.07.15.452444","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,2,17]]}}}