{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,19]],"date-time":"2026-06-19T07:50:38Z","timestamp":1781855438317,"version":"3.54.5"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1009123","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T00:00:00Z","timestamp":1657843200000}}],"reference-count":49,"publisher":"Public Library of Science (PLoS)","issue":"5","license":[{"start":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T00:00:00Z","timestamp":1653955200000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Since its introduction in 2011 the variant call format (VCF) has been widely adopted for processing DNA and RNA variants in practically all population studies\u2014as well as in somatic and germline mutation studies. The VCF format can represent single nucleotide variants, multi-nucleotide variants, insertions and deletions, and simple structural variants called and anchored against a reference genome.<\/jats:p>\n<jats:p>Here we present a spectrum of over 125 useful, complimentary free and open source software tools and libraries, we wrote and made available through the multiple <jats:monospace>vcflib<\/jats:monospace>, <jats:monospace>bio-vcf<\/jats:monospace>, <jats:monospace>cyvcf2<\/jats:monospace>, <jats:monospace>hts-nim<\/jats:monospace> and <jats:monospace>slivar<\/jats:monospace> projects. These tools are applied for comparison, filtering, normalisation, smoothing and annotation of VCF, as well as output of statistics, visualisation, and transformations of files variants. These tools run everyday in critical biomedical pipelines and countless shell scripts. 
Our tools are part of the wider bioinformatics ecosystem and we highlight best practices.<\/jats:p>\n<jats:p>We shortly discuss the design of VCF, lessons learnt, and how we can address more complex variation through pangenome graph formats, variation that can not easily be represented by the VCF format.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1009123","type":"journal-article","created":{"date-parts":[[2022,5,31]],"date-time":"2022-05-31T17:42:26Z","timestamp":1654018946000},"page":"e1009123","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":239,"title":["A spectrum of free software tools for processing the VCF variant call format: vcflib, bio-vcf, cyvcf2, hts-nim and slivar"],"prefix":"10.1371","volume":"18","author":[{"given":"Erik","family":"Garrison","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7627-9808","authenticated-orcid":true,"given":"Zev N.","family":"Kronenberg","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5448-1653","authenticated-orcid":true,"given":"Eric T.","family":"Dawson","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Brent S.","family":"Pedersen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8021-9162","authenticated-orcid":true,"given":"Pjotr","family":"Prins","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"340","published-online":{"date-parts":[[2022,5,31]]},"reference":[{"issue":"15","key":"pcbi.1009123.ref001","doi-asserted-by":"crossref","first-page":"2156","DOI":"10.1093\/bioinformatics\/btr330","article-title":"The variant call format and VCFtools","volume":"27","author":"P 
Danecek","year":"2011","journal-title":"Bioinformatics"},{"key":"pcbi.1009123.ref002","unstructured":"HTS-Specs: specifications of SAM\/BAM and related high-throughput sequencing file formats; 2011 (accessed April 2021). https:\/\/samtools.github.io\/hts-specs\/. GitHub Repository."},{"issue":"9","key":"pcbi.1009123.ref003","doi-asserted-by":"crossref","first-page":"1297","DOI":"10.1101\/gr.107524.110","article-title":"The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data","volume":"20","author":"A McKenna","year":"2010","journal-title":"Genome Res"},{"key":"pcbi.1009123.ref004","article-title":"Haplotype-Based Variant Detection from Short-Read Sequencing","author":"E Garrison","year":"2012","journal-title":"ARXIV"},{"issue":"5","key":"pcbi.1009123.ref005","doi-asserted-by":"crossref","first-page":"718","DOI":"10.1093\/bioinformatics\/btq671","article-title":"Tabix: fast retrieval of sequence features from generic TAB-delimited files","volume":"27","author":"H Li","year":"2011","journal-title":"Bioinformatics"},{"issue":"2","key":"pcbi.1009123.ref006","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giab008","article-title":"Twelve years of SAMtools and BCFtools","volume":"10","author":"P Danecek","year":"2021","journal-title":"Gigascience"},{"issue":"13","key":"pcbi.1009123.ref007","doi-asserted-by":"crossref","first-page":"4091","DOI":"10.1093\/bioinformatics\/btaa290","article-title":"genozip: a fast and efficient compression tool for VCF files","volume":"36","author":"D Lan","year":"2020","journal-title":"Bioinformatics"},{"key":"pcbi.1009123.ref008","unstructured":"Prins P, Strozzi F, Tarasov A, de Ligt J, Githinji G, others. 
Small tools MANIFESTO for Bioinformatics; 2014."},{"issue":"12","key":"pcbi.1009123.ref009","doi-asserted-by":"crossref","first-page":"1867","DOI":"10.1093\/bioinformatics\/btx057","article-title":"cyvcf2: fast, flexible variant analysis with Python","volume":"33","author":"BS Pedersen","year":"2017","journal-title":"Bioinformatics"},{"issue":"19","key":"pcbi.1009123.ref010","doi-asserted-by":"crossref","first-page":"3387","DOI":"10.1093\/bioinformatics\/bty358","article-title":"hts-nim: scripting high-performance genomic analyses","volume":"34","author":"BS Pedersen","year":"2018","journal-title":"Bioinformatics"},{"issue":"7571","key":"pcbi.1009123.ref011","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"A Auton","year":"2015","journal-title":"Nature"},{"key":"pcbi.1009123.ref012","article-title":"Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials","author":"JM Zook","year":"2018","journal-title":"bioRxiv"},{"issue":"13","key":"pcbi.1009123.ref013","doi-asserted-by":"crossref","first-page":"2202","DOI":"10.1093\/bioinformatics\/btv112","article-title":"Unified representation of genetic variants","volume":"31","author":"A Tan","year":"2015","journal-title":"Bioinformatics"},{"issue":"2","key":"pcbi.1009123.ref014","doi-asserted-by":"crossref","DOI":"10.1093\/gigascience\/giab007","article-title":"HTSlib: C library for reading\/writing high-throughput sequencing data","volume":"10","author":"JK Bonfield","year":"2021","journal-title":"Gigascience"},{"key":"pcbi.1009123.ref015","unstructured":"Lan D. 
The Variant Call Format Dual Coordinate Extension (DVCF) Specification; 2021."},{"key":"pcbi.1009123.ref016","article-title":"Sparse Project VCF: efficient encoding of population genotype matrices","author":"MF Lin","year":"2020","journal-title":"bioRxiv"},{"key":"pcbi.1009123.ref017","unstructured":"vcflib for working with VCF files; 2021 (accessed Feb 2021). https:\/\/github.com\/vcflib\/vcflib. GitHub Repository."},{"issue":"7","key":"pcbi.1009123.ref018","doi-asserted-by":"crossref","first-page":"1157","DOI":"10.1046\/j.1365-294X.2002.01512.x","article-title":"A Bayesian approach to inferring population structure from dominant markers","volume":"11","author":"KE Holsinger","year":"2002","journal-title":"Mol Ecol"},{"issue":"9","key":"pcbi.1009123.ref019","doi-asserted-by":"crossref","first-page":"639","DOI":"10.1038\/nrg2611","article-title":"Genetics in geographically structured populations: defining, estimating and interpreting F(ST)","volume":"10","author":"KE Holsinger","year":"2009","journal-title":"Nat Rev Genet"},{"issue":"3","key":"pcbi.1009123.ref020","first-page":"855","article-title":"Estimation of gene flow from F-statistics","volume":"47","author":"CC Cockerham","year":"1993","journal-title":"Evolution"},{"issue":"10","key":"pcbi.1009123.ref021","doi-asserted-by":"crossref","first-page":"5269","DOI":"10.1073\/pnas.76.10.5269","article-title":"Mathematical model for studying genetic variation in terms of restriction endonucleases","volume":"76","author":"M Nei","year":"1979","journal-title":"Proc Natl Acad Sci U S A"},{"issue":"7164","key":"pcbi.1009123.ref022","doi-asserted-by":"crossref","first-page":"913","DOI":"10.1038\/nature06250","article-title":"Genome-wide detection and characterization of positive selection in human populations","volume":"449","author":"PC Sabeti","year":"2007","journal-title":"Nature"},{"key":"pcbi.1009123.ref023","volume-title":"In a Nutshell Series","author":"JEF 
Friedl","year":"1997"},{"key":"pcbi.1009123.ref024","unstructured":"bio-vcf: smart VCF parser; 2021 (accessed Feb 2021). https:\/\/github.com\/vcflib\/bio-vcf. GitHub Repository."},{"issue":"1","key":"pcbi.1009123.ref025","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1038\/s41525-021-00227-3","article-title":"Effective variant filtering and expected candidate variant yield in studies of rare human disease","volume":"6","author":"BS Pedersen","year":"2021","journal-title":"NPJ Genom Med"},{"issue":"11","key":"pcbi.1009123.ref026","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"PJ Cock","year":"2009","journal-title":"Bioinformatics"},{"issue":"10","key":"pcbi.1009123.ref027","doi-asserted-by":"crossref","first-page":"1611","DOI":"10.1101\/gr.361602","article-title":"The Bioperl toolkit: Perl modules for the life sciences","volume":"12","author":"JE Stajich","year":"2002","journal-title":"Genome Res"},{"issue":"20","key":"pcbi.1009123.ref028","doi-asserted-by":"crossref","first-page":"2617","DOI":"10.1093\/bioinformatics\/btq475","article-title":"BioRuby: bioinformatics software for the Ruby programming language","volume":"26","author":"N Goto","year":"2010","journal-title":"Bioinformatics"},{"issue":"1","key":"pcbi.1009123.ref029","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1111\/1755-0998.12549","article-title":"VCFR: a package to manipulate and visualize variant call format data in R","volume":"17","author":"BJ Knaus","year":"2017","journal-title":"Molecular Ecology Resources"},{"issue":"1","key":"pcbi.1009123.ref030","doi-asserted-by":"crossref","first-page":"118","DOI":"10.1186\/s13059-016-0973-5","article-title":"Vcfanno: fast, flexible annotation of genetic variants","volume":"17","author":"BS Pedersen","year":"2016","journal-title":"Genome 
Biol"},{"issue":"7","key":"pcbi.1009123.ref031","doi-asserted-by":"crossref","first-page":"649","DOI":"10.1089\/cmb.2017.0251","article-title":"Superbubbles, Ultrabubbles, and Cacti","volume":"25","author":"B Paten","year":"2018","journal-title":"Journal of Computational Biology"},{"issue":"5","key":"pcbi.1009123.ref032","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1101\/gr.214155.116","article-title":"Genome Graphs and the Evolution of Genome Inference","volume":"27","author":"B Paten","year":"2017","journal-title":"Genome Research"},{"issue":"9","key":"pcbi.1009123.ref033","doi-asserted-by":"crossref","first-page":"875","DOI":"10.1038\/nbt.4227","article-title":"Variation Graph Toolkit Improves Read Mapping by Representing Genetic Variation in the Reference","volume":"36","author":"E Garrison","year":"2018","journal-title":"Nature Biotechnology"},{"key":"pcbi.1009123.ref034","unstructured":"Graphical Fragment Assembly (GFA) Format Specification; 2015 (accessed Jan 2021). https:\/\/github.com\/GFA-spec\/GFA-spec. GitHub Repository."},{"key":"pcbi.1009123.ref035","unstructured":"vgtools for Working with Genome Variation Graphs; 2014 (accessed Jan 2021). https:\/\/github.com\/vgteam\/. GitHub Repository."},{"key":"pcbi.1009123.ref036","unstructured":"Pangenome Tools; 2020 (accessed Jan 2021). https:\/\/github.com\/pangenome\/. GitHub Repository."},{"key":"pcbi.1009123.ref037","unstructured":"Pangenome Tools; 2020 (accessed Jan 2021). https:\/\/pangenome.github.io\/. GitHub Repository."},{"key":"pcbi.1009123.ref038","unstructured":"pggb: pangenome graph builder; 2020 (accessed Jan 2021). https:\/\/github.com\/pangenome\/pggb. GitHub Repository."},{"key":"pcbi.1009123.ref039","article-title":"ODGI: understanding pangenome graphs","author":"A Guarracino","year":"2021","journal-title":"bioRxiv"},{"key":"pcbi.1009123.ref040","unstructured":"GFF-Spec: Generic Feature Format Version 3 (GFF3); 2016 (accessed April 2021). GFF3 Specification. 
GitHub Repository."},{"key":"pcbi.1009123.ref041","first-page":"160018","volume":"3","author":"MD Wilkinson","journal-title":"The FAIR Guiding Principles for scientific data management and stewardship"},{"issue":"18","key":"pcbi.1009123.ref042","doi-asserted-by":"crossref","first-page":"2096","DOI":"10.1093\/bioinformatics\/btn397","article-title":"BioJava: an open-source framework for bioinformatics","volume":"24","author":"RC Holland","year":"2008","journal-title":"Bioinformatics"},{"issue":"7","key":"pcbi.1009123.ref043","doi-asserted-by":"crossref","first-page":"686","DOI":"10.1038\/nbt.3240","article-title":"Toward effective software solutions for big biology","volume":"33","author":"P Prins","year":"2015","journal-title":"Nat Biotechnol"},{"key":"pcbi.1009123.ref044","article-title":"Bioconda: A sustainable and comprehensive software distribution for the life sciences","author":"B Gr\u00fcning","year":"2017","journal-title":"bioRxiv"},{"key":"pcbi.1009123.ref045","unstructured":"Debian Linux Software Distribution; 1993 (accessed April 2021). https:\/\/debian.org\/. Online Webpage."},{"key":"pcbi.1009123.ref046","unstructured":"Bavier E, Court\u00e8s L, Garlick P, Prins P, Wurmus R. Guix-HPC Activity Report 2017\u20132018. Inria Bordeaux Sud-Ouest; Max Delbr\u00fcck Center for Molecular Medicine; Cray, Inc.; Tourbillion Technology; 2019. Available from: https:\/\/hal.inria.fr\/hal-02056461."},{"key":"pcbi.1009123.ref047","unstructured":"Prins P. Creating a reproducible workflow with CWL; 2019. Online. https:\/\/hpc.guix.info\/blog\/2019\/01\/creating-a-reproducible-workflow-with-cwl\/."},{"key":"pcbi.1009123.ref048","unstructured":"Amstutz P and Crusoe MR and Tijani\u0107 N and Chapman B and Chilton J and Heuer M and Kartashov A and Kern J and Leehr D and M\u00e9nager H and Nedeljkovich M and Scales M and Soiland-Reyes S and Stojanovic L. Common Workflow Language, v1.0. Figshare. 
2016;."},{"key":"pcbi.1009123.ref049","doi-asserted-by":"crossref","first-page":"723","DOI":"10.1007\/978-1-4939-9074-0_24","article-title":"Scalable Workflows and Reproducible Data Analysis for Genomics","volume":"1910","author":"F Strozzi","year":"2019","journal-title":"Methods Mol Biol"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1009123","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T00:00:00Z","timestamp":1657843200000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009123","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,7,15]],"date-time":"2022-07-15T17:43:21Z","timestamp":1657907001000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1009123"}},"subtitle":[],"editor":[{"given":"Dina","family":"Schneidman-Duhovny","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2022,5,31]]},"references-count":49,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2022,5,31]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1009123","relation":{"new_version":[{"id-type":"doi","id":"10.1371\/journal.pcbi.1009123","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,31]]}}}