{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:41Z","timestamp":1772138081495,"version":"3.50.1"},"reference-count":27,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2016,12,23]],"date-time":"2016-12-23T00:00:00Z","timestamp":1482451200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/about_us\/legal\/notices"}],"funder":[{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["DBI-1356529"],"award-info":[{"award-number":["DBI-1356529"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["CCF-1439057"],"award-info":[{"award-number":["CCF-1439057"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["IIS-1453527"],"award-info":[{"award-number":["IIS-1453527"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000001","name":"NSF","doi-asserted-by":"publisher","award":["IIS-1421908"],"award-info":[{"award-number":["IIS-1421908"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2017,5,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Small variant calling is an important component of many analyses, and, in many instances, it is important to determine the set of variants which appear in multiple callsets. Variant matching is complicated by variants that have multiple equivalent representations. Normalization and decomposition algorithms have been proposed, but are not robust to different representation of complex variants. Variant matching is also usually done to maximize the number of matches, as opposed to other optimization criteria.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We present the VarMatch algorithm for the variant matching problem. Our algorithm is based on a theoretical result which allows us to partition the input into smaller subproblems without sacrificing accuracy. VarMatch is robust to different representation of complex variants and is particularly effective in low complexity regions or those dense in variants. VarMatch is able to detect more matches than either the normalization or decomposition algorithms on tested datasets. It also implements different optimization criteria, such as edit distance, that can improve robustness to different variant representations. Finally, the VarMatch software provides summary statistics, annotations and visualizations that are useful for understanding callers\u2019 performance.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and Implementation<\/jats:title>\n                    <jats:p>VarMatch is freely available at: https:\/\/github.com\/medvedevgroup\/varmatch<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btw797","type":"journal-article","created":{"date-parts":[[2016,12,13]],"date-time":"2016-12-13T15:05:18Z","timestamp":1481641518000},"page":"1301-1308","source":"Crossref","is-referenced-by-count":16,"title":["VarMatch: robust matching of small variant datasets using flexible scoring schemes"],"prefix":"10.1093","volume":"33","author":[{"given":"Chen","family":"Sun","sequence":"first","affiliation":[{"name":"Department of Computer Science and Engineering, The Pennsylvania State University, State College, PA, USA"}]},{"given":"Paul","family":"Medvedev","sequence":"additional","affiliation":[{"name":"Department of Computer Science and Engineering, The Pennsylvania State University, State College, PA, USA"},{"name":"Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA, USA"},{"name":"Genome Sciences Institute at the Huck, The Pennsylvania State University, State College, PA, USA"}]}],"member":"286","published-online":{"date-parts":[[2016,12,30]]},"reference":[{"key":"2023020205014985000_btw797-B1","doi-asserted-by":"crossref","first-page":"1061","DOI":"10.1038\/nature09534","article-title":"A map of human genome variation from population-scale sequencing","volume":"467","author":"1000 Genomes Project Consortium","year":"2010","journal-title":"Nature"},{"key":"2023020205014985000_btw797-B2","doi-asserted-by":"crossref","first-page":"56","DOI":"10.1038\/nature11632","article-title":"An integrated map of genetic variation from 1,092 human genomes","volume":"491","author":"1000 Genomes Project Consortium","year":"2012","journal-title":"Nature"},{"key":"2023020205014985000_btw797-B3","doi-asserted-by":"crossref","first-page":"e62803","DOI":"10.1371\/journal.pone.0062803","article-title":"Equivalent indels\u2013ambiguous functional classes and redundancy in databases","volume":"8","author":"Assmus","year":"2013","journal-title":"PloS One"},{"key":"2023020205014985000_btw797-B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1471-2164-15-948","article-title":"Evaluation of variant identification methods for whole genome sequencing data in dairy cattle","volume":"15","author":"Baes","year":"2014","journal-title":"BMC Genomics"},{"key":"2023020205014985000_btw797-B5","doi-asserted-by":"crossref","first-page":"1707","DOI":"10.1093\/bioinformatics\/btu067","article-title":"Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals","volume":"30","author":"Cheng","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B6","first-page":"023754","article-title":"Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines","author":"Cleary","year":"2015","journal-title":"BioRxiv"},{"key":"2023020205014985000_btw797-B7","doi-asserted-by":"crossref","DOI":"10.1155\/2015\/456479","article-title":"A comparison of variant calling pipelines using genome in a bottle as a reference","volume":"2015","author":"Cornish","year":"2015","journal-title":"BioMed Res. Int"},{"key":"2023020205014985000_btw797-B8","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B9","doi-asserted-by":"crossref","first-page":"736","DOI":"10.1101\/gr.185892.114","article-title":"Accurate typing of short tandem repeats from genome-wide sequencing data and its applications","volume":"25","author":"Fungtammasan","year":"2015","journal-title":"Genome Res"},{"key":"2023020205014985000_btw797-B10","author":"Garrison","year":"2012"},{"key":"2023020205014985000_btw797-B11","doi-asserted-by":"crossref","first-page":"e1000327","DOI":"10.1371\/journal.pgen.1000327","article-title":"A microhomology-mediated break-induced replication model for the origin of human copy number variation","volume":"5","author":"Hastings","year":"2009","journal-title":"PLoS Genet"},{"key":"2023020205014985000_btw797-B12","doi-asserted-by":"crossref","DOI":"10.1038\/ncomms7275","article-title":"An analytical framework for optimizing variant discovery from personal genomes","volume":"6","author":"Highnam","year":"2015","journal-title":"Nat. Commun"},{"key":"2023020205014985000_btw797-B13","article-title":"Systematic comparison of variant calling pipelines using gold standard personal exome variants","volume":"5","author":"Hwang","year":"2015","journal-title":"Sci. Reports"},{"key":"2023020205014985000_btw797-B14","doi-asserted-by":"crossref","first-page":"2283","DOI":"10.1093\/bioinformatics\/btp373","article-title":"VarScan: variant detection in massively parallel sequencing of individual and pooled samples","volume":"25","author":"Koboldt","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B15","doi-asserted-by":"crossref","first-page":"722","DOI":"10.1093\/bioinformatics\/btq027","article-title":"Microindel detection in short-read sequence data","volume":"26","author":"Krawitz","year":"2010","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B16","doi-asserted-by":"crossref","first-page":"2841","DOI":"10.1093\/bioinformatics\/btu356","article-title":"Towards better understanding of artifacts in variant calling from high-coverage samples","volume":"30","author":"Li","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B17","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B18","doi-asserted-by":"crossref","first-page":"S13","DOI":"10.1186\/1471-2105-14-S15-S13","article-title":"Haploid to diploid alignment for variation calling assessment","volume":"14(suppl. 15)","author":"M\u00e4kinen","year":"2013","journal-title":"BMC Bioinformatics"},{"key":"2023020205014985000_btw797-B19","doi-asserted-by":"crossref","first-page":"S15","DOI":"10.1186\/1471-2164-15-S6-S15","article-title":"Recombination-aware alignment of diploid individuals","volume":"15(suppl. 6)","author":"M\u00e4kinen","year":"2014","journal-title":"BMC Genomics"},{"key":"2023020205014985000_btw797-B20","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation dna sequencing data","volume":"20","author":"McKenna","year":"2010","journal-title":"Genome Res"},{"key":"2023020205014985000_btw797-B21","first-page":"btt314","article-title":"Isaac: ultra-fast whole-genome secondary analysis on illumina sequencing platforms","author":"Raczy","year":"2013","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B22","doi-asserted-by":"crossref","first-page":"912","DOI":"10.1038\/ng.3036","article-title":"Integrating mapping-, assembly-and haplotype-based approaches for calling variants in clinical sequencing applications","volume":"46","author":"Rimmer","year":"2014","journal-title":"Nat. Genet"},{"key":"2023020205014985000_btw797-B23","doi-asserted-by":"crossref","first-page":"2787","DOI":"10.1093\/bioinformatics\/btu345","article-title":"Smash: a benchmarking toolkit for human genome variant calling","volume":"30","author":"Talwalkar","year":"2014","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B24","first-page":"btv112","article-title":"Unified representation of genetic variants","author":"Tan","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B25","doi-asserted-by":"crossref","first-page":"e132","DOI":"10.1093\/nar\/gkr599","article-title":"SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data","volume":"39","author":"Wei","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023020205014985000_btw797-B26","doi-asserted-by":"crossref","first-page":"2947","DOI":"10.1093\/bioinformatics\/btv304","article-title":"Repeat-and error-aware comparison of deletions","volume":"31","author":"Wittler","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020205014985000_btw797-B27","doi-asserted-by":"crossref","first-page":"246","DOI":"10.1038\/nbt.2835","article-title":"Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls","volume":"32","author":"Zook","year":"2014","journal-title":"Nat. Biotechnol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/9\/1301\/49038452\/bioinformatics_33_9_1301.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/33\/9\/1301\/49038452\/bioinformatics_33_9_1301.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,2,2]],"date-time":"2023-02-02T00:05:38Z","timestamp":1675296338000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/33\/9\/1301\/2736365"}},"subtitle":[],"editor":[{"given":"Inanc","family":"Birol","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2016,12,30]]},"references-count":27,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2017,5,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw797","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/062943","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2017,5,1]]},"published":{"date-parts":[[2016,12,30]]}}}