{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,18]],"date-time":"2026-03-18T23:15:52Z","timestamp":1773875752410,"version":"3.50.1"},"reference-count":33,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2017,11,24]],"date-time":"2017-11-24T00:00:00Z","timestamp":1511481600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["T32GM105490"],"award-info":[{"award-number":["T32GM105490"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100006778","name":"Georgia Institute of Technology","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100006778","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2018,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>The standard protocol for detecting variation in DNA is to map millions of short sequence reads to a known reference and find loci that differ. While this approach works well, it cannot be applied where the sample contains dense variants or is too distant from known references. De novo assembly or hybrid methods can recover genomic variation, but the cost of computation is often much higher. We developed a novel k-mer algorithm and software implementation, Kestrel, capable of characterizing densely packed SNPs and large indels without mapping, assembly or de Bruijn graphs.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>When applied to mosaic penicillin binding protein (PBP) genes in Streptococcus pneumoniae, we found near perfect concordance with assembled contigs at a fraction of the CPU time. Multilocus sequence typing (MLST) with this approach was able to bypass de novo assemblies. Kestrel has a very low false-positive rate when applied to the whole genome, and while Kestrel identified many variants missed by other methods, limitations of a purely k-mer based approach affect overall sensitivity.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code and documentation for a Java implementation of Kestrel can be found at https:\/\/github.com\/paudano\/kestrel. All test code for this publication is located at https:\/\/github.com\/paudano\/kescases.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btx753","type":"journal-article","created":{"date-parts":[[2017,11,23]],"date-time":"2017-11-23T23:12:27Z","timestamp":1511478747000},"page":"1659-1665","source":"Crossref","is-referenced-by-count":23,"title":["Mapping-free variant calling using haplotype reconstruction from k-mer frequencies"],"prefix":"10.1093","volume":"34","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5187-0415","authenticated-orcid":false,"given":"Peter A","family":"Audano","sequence":"first","affiliation":[{"name":"School of Biology, Georgia Institute of Technology, Atlanta, GA, USA"}]},{"given":"Shashidhar","family":"Ravishankar","sequence":"additional","affiliation":[{"name":"School of Biology, Georgia Institute of Technology, Atlanta, GA, USA"}]},{"given":"Fredrik O","family":"Vannberg","sequence":"additional","affiliation":[{"name":"School of Biology, Georgia Institute of Technology, Atlanta, GA, USA"}]}],"member":"286","published-online":{"date-parts":[[2017,11,24]]},"reference":[{"key":"2023012713434712600_btx753-B1","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"key":"2023012713434712600_btx753-B2","doi-asserted-by":"crossref","first-page":"2070","DOI":"10.1093\/bioinformatics\/btu152","article-title":"KAnalyze: a fast versatile pipelined K-mer toolkit","volume":"30","author":"Audano","year":"2014","journal-title":"Bioinformatics"},{"key":"2023012713434712600_btx753-B3","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. Biol"},{"key":"2023012713434712600_btx753-B4","doi-asserted-by":"crossref","first-page":"888","DOI":"10.1038\/nbt0816-888d","article-title":"Near-optimal probabilistic RNA-seq quantification","volume":"34","author":"Bray","year":"2016","journal-title":"Nat. Biotechnol"},{"key":"2023012713434712600_btx753-B5","first-page":"023754","article-title":"Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines","author":"Cleary","year":"2015","journal-title":"bioRxiv"},{"key":"2023012713434712600_btx753-B6","doi-asserted-by":"crossref","first-page":"909","DOI":"10.1038\/nbt0704-909","article-title":"What is dynamic programming?","volume":"22","author":"Eddy","year":"2004","journal-title":"Nat. Biotechnol"},{"key":"2023012713434712600_btx753-B7","doi-asserted-by":"crossref","first-page":"e81760","DOI":"10.1371\/journal.pone.0081760","article-title":"When whole-genome alignments just won\u2019t work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes","volume":"8","author":"Gardner","year":"2013","journal-title":"PLoS One"},{"key":"2023012713434712600_btx753-B8","doi-asserted-by":"crossref","first-page":"107","DOI":"10.4172\/2157-7145.1000107","article-title":"Scalable SNP Analyses of 100+ Bacterial or Viral Genomes","volume":"1","author":"Gardner","year":"2010","journal-title":"J. Forensic Res"},{"key":"2023012713434712600_btx753-B9","doi-asserted-by":"crossref","first-page":"593","DOI":"10.1093\/bioinformatics\/btr708","article-title":"ART: A next-generation sequencing read simulator","volume":"28","author":"Huang","year":"2012","journal-title":"Bioinformatics"},{"key":"2023012713434712600_btx753-B11","doi-asserted-by":"crossref","first-page":"226","DOI":"10.1038\/ng.1028","article-title":"De novo assembly and genotyping of variants using colored de Bruijn graphs","volume":"44","author":"Iqbal","year":"2012","journal-title":"Nat. Genet"},{"key":"2023012713434712600_btx753-B12","doi-asserted-by":"crossref","first-page":"595","DOI":"10.1186\/1471-2105-11-595","article-title":"BIGSdb: Scalable analysis of bacterial genome variation at the population level","volume":"11","author":"Jolley","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2023012713434712600_btx753-B13","doi-asserted-by":"crossref","first-page":"e1002824","DOI":"10.1371\/journal.ppat.1002824","article-title":"Routine use of microbial whole genome sequencing in diagnostic and public health microbiology","volume":"8","author":"K\u00f6ser","year":"2012","journal-title":"PLoS Pathogens"},{"key":"2023012713434712600_btx753-B14","doi-asserted-by":"crossref","first-page":"1993","DOI":"10.1111\/j.1365-2958.1991.tb00821.x","article-title":"Interspecies recombinational events during the evolution of altered pbp 2x genes in penicillin-resistant clinical isolates of Streptococcus pneumoniae","volume":"5","author":"Laible","year":"1991","journal-title":"Mol. Microbiol"},{"key":"2023012713434712600_btx753-B15","doi-asserted-by":"crossref","first-page":"1355","DOI":"10.1128\/JCM.06094-11","article-title":"Multilocus sequence typing of total genome sequenced bacteria","volume":"50","author":"Larsen","year":"2012","journal-title":"J. Clin. Microbiol"},{"key":"2023012713434712600_btx753-B16","doi-asserted-by":"crossref","first-page":"3294","DOI":"10.1128\/JB.00363-12","article-title":"Complete genome sequence of Streptococcus pneumoniae Strain ST556, a multidrug-resistant isolate from an otitis media patient","volume":"194","author":"Li","year":"2012","journal-title":"J. Bacteriol"},{"key":"2023012713434712600_btx753-B17","doi-asserted-by":"crossref","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","article-title":"Fast and accurate short read alignment with Burrows-Wheeler transform","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012713434712600_btx753-B18","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1093\/bioinformatics\/btp698","article-title":"Fast and accurate long-read alignment with Burrows-Wheeler transform","volume":"26","author":"Li","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012713434712600_btx753-B19","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence Alignment\/Map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012713434712600_btx753-B20","doi-asserted-by":"crossref","first-page":"271","DOI":"10.1186\/1479-7364-4-4-271","article-title":"State of the art de novo assembly of human genomes from massively parallel sequencing data","volume":"4","author":"Li","year":"2010","journal-title":"Hum. Genomics"},{"key":"2023012713434712600_btx753-B21","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1128\/mBio.00756-16","article-title":"Penicillin-binding protein transpeptidase signatures for tracking and predicting \u03b2-lactam resistance levels in Streptococcus pneumoniae","volume":"7","author":"Li","year":"2016","journal-title":"mBio"},{"key":"2023012713434712600_btx753-B22","doi-asserted-by":"crossref","first-page":"3831","DOI":"10.1002\/j.1460-2075.1992.tb05475.x","article-title":"Relatedness of penicillin-binding protein 1a genes from different clones of penicillin-resistant Streptococcus pneumoniae isolated in South Africa and Spain","volume":"11","author":"Martin","year":"1992","journal-title":"EMBO J"},{"key":"2023012713434712600_btx753-B23","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023012713434712600_btx753-B24","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/35012500","article-title":"Lateral gene transfer and the nature of bacterial innovation","volume":"405","author":"Ochman","year":"2000","journal-title":"Nature"},{"key":"2023012713434712600_btx753-B25","doi-asserted-by":"crossref","first-page":"1","DOI":"10.3389\/fgene.2015.00235","article-title":"Best practices for evaluating single nucleotide variant calling methods for microbial genomics","volume":"6","author":"Olson","year":"2015","journal-title":"Front. Genet"},{"key":"2023012713434712600_btx753-B26","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/nbt.2862","article-title":"Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms","volume":"32","author":"Patro","year":"2014","journal-title":"Nat. Biotechnol"},{"key":"2023012713434712600_btx753-B27","doi-asserted-by":"crossref","first-page":"1397","DOI":"10.1001\/jamainternmed.2013.7734","article-title":"Rapid bacterial whole-genome sequencing to enhance diagnostic and public health microbiology","volume":"173","author":"Reuter","year":"2013","journal-title":"JAMA Int. Med"},{"key":"2023012713434712600_btx753-B28","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1038\/ng.3036","article-title":"Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications","volume":"46","author":"Rimmer","year":"2014","journal-title":"Nat. Genet"},{"key":"2023012713434712600_btx753-B29","doi-asserted-by":"crossref","first-page":"3363","DOI":"10.1093\/bioinformatics\/bth408","article-title":"Reducing storage requirements for biological sequence comparison","volume":"20","author":"Roberts","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012713434712600_btx753-B30","first-page":"119","article-title":"Large-scale genomic sequencing of extraintestinal pathogenic Escherichia coli strains","volume-title":"Genome Res","author":"Salipante","year":"2015"},{"key":"2023012713434712600_btx753-B31","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1186\/gb-2011-12-8-125","article-title":"The real cost of sequencing: higher than you think!","volume":"12","author":"Sboner","year":"2011","journal-title":"Genome Biol"},{"key":"2023012713434712600_btx753-B32","doi-asserted-by":"crossref","first-page":"195","DOI":"10.1016\/0022-2836(81)90087-5","article-title":"Identification of common molecular subsequences","volume":"147","author":"Smith","year":"1981","journal-title":"J. Mol. Biol"},{"key":"2023012713434712600_btx753-B33","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1371\/journal.pone.0157718","article-title":"A bacterial analysis platform: an integrated system for analysing bacterial whole genome sequencing data for clinical diagnostics and surveillance","volume":"11","author":"Thomsen","year":"2016","journal-title":"PLoS One"},{"key":"2023012713434712600_btx753-B34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/gb-2014-15-3-r46","article-title":"Kraken: ultrafast metagenomic sequence classification using exact alignments","volume":"15","author":"Wood","year":"2014","journal-title":"Genome Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/10\/1659\/48935716\/bioinformatics_34_10_1659.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/34\/10\/1659\/48935716\/bioinformatics_34_10_1659.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,27]],"date-time":"2023-01-27T09:23:40Z","timestamp":1674811420000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/34\/10\/1659\/4657072"}},"subtitle":[],"editor":[{"given":"Bonnie","family":"Berger","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2017,11,24]]},"references-count":33,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2018,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btx753","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/153619","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2018,5,15]]},"published":{"date-parts":[[2017,11,24]]}}}