{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,17]],"date-time":"2026-04-17T23:11:44Z","timestamp":1776467504073,"version":"3.51.2"},"reference-count":15,"publisher":"Oxford University Press (OUP)","issue":"17","license":[{"start":{"date-parts":[[2022,7,8]],"date-time":"2022-07-08T00:00:00Z","timestamp":1657238400000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100012007","name":"Rockefeller University","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100012007","id-type":"DOI","asserted-by":"publisher"}]},{"name":"DataPLANT","award":["442077441"],"award-info":[{"award-number":["442077441"]}]},{"name":"German National Research Data Initiative","award":["NFDI 7\/1"],"award-info":[{"award-number":["NFDI 7\/1"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2022,9,2]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>With the current pace at which reference genomes are being produced, the availability of tools that can reliably and efficiently generate genome assembly summary statistics has become critical. Additionally, with the emergence of new algorithms and data types, tools that can improve the quality of existing assemblies through automated and manual curation are required.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We sought to address both these needs by developing gfastats, as part of the Vertebrate Genomes Project (VGP) effort to generate high-quality reference genomes at scale. Gfastats is a standalone tool to compute assembly summary statistics and manipulate assembly sequences in FASTA, FASTQ or GFA [.gz] format. Gfastats stores assembly sequences internally in a GFA-like format. This feature allows gfastats to seamlessly convert FAST* to and from GFA [.gz] files. Gfastats can also build an assembly graph that can in turn be used to manipulate the underlying sequences following instructions provided by the user, while simultaneously generating key metrics for the new sequences.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Gfastats is implemented in C++. Precompiled releases (Linux, MacOS, Windows) and commented source code for gfastats are available under MIT licence at https:\/\/github.com\/vgl-hub\/gfastats. Examples of how to run gfastats are provided in the GitHub. Gfastats is also available in Bioconda, in Galaxy (https:\/\/assembly.usegalaxy.eu) and as a MultiQC module (https:\/\/github.com\/ewels\/MultiQC). An automated test workflow is available to ensure consistency of software updates.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Supplementary information<\/jats:title>\n                    <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btac460","type":"journal-article","created":{"date-parts":[[2022,7,8]],"date-time":"2022-07-08T00:47:27Z","timestamp":1657241247000},"page":"4214-4216","source":"Crossref","is-referenced-by-count":970,"title":["Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs"],"prefix":"10.1093","volume":"38","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7554-5991","authenticated-orcid":false,"given":"Giulio","family":"Formenti","sequence":"first","affiliation":[{"name":"The Vertebrate Genome Laboratory, The Rockefeller University , New York, NY 10065, USA"}]},{"given":"Linelle","family":"Abueg","sequence":"additional","affiliation":[{"name":"The Vertebrate Genome Laboratory, The Rockefeller University , New York, NY 10065, USA"}]},{"given":"Angelo","family":"Brajuka","sequence":"additional","affiliation":[{"name":"The Vertebrate Genome Laboratory, The Rockefeller University , New York, NY 10065, USA"}]},{"given":"Nadolina","family":"Brajuka","sequence":"additional","affiliation":[{"name":"The Vertebrate Genome Laboratory, The Rockefeller University , New York, NY 10065, USA"}]},{"given":"Crist\u00f3bal","family":"Gallardo-Alba","sequence":"additional","affiliation":[{"name":"Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg , Freiburg 79110, Germany"}]},{"given":"Alice","family":"Giani","sequence":"additional","affiliation":[{"name":"Helen and Robert Appel Alzheimer Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine , New York, NY 10021, USA"}]},{"given":"Olivier","family":"Fedrigo","sequence":"additional","affiliation":[{"name":"The Vertebrate Genome Laboratory, The Rockefeller University , New York, NY 10065, USA"}]},{"given":"Erich D","family":"Jarvis","sequence":"additional","affiliation":[{"name":"The Vertebrate Genome Laboratory, The Rockefeller University , New York, NY 10065, USA"},{"name":"Howard Hughes Medical Institute, Chevy Chase , Maryland 20815, USA"}]}],"member":"286","published-online":{"date-parts":[[2022,7,7]]},"reference":[{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1038\/s41592-020-01056-5","article-title":"Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm","volume":"18","author":"Cheng","year":"2021","journal-title":"Nat. Methods"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","DOI":"10.1038\/s41587-022-01261-x","article-title":"Haplotype-resolved assembly of diploid genomes without parental data","volume-title":"Nat. Biotechnol","author":"Cheng","year":"2022"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"1767","DOI":"10.1093\/nar\/gkp1137","article-title":"The sanger FASTQ file format for sequences with quality scores, and the solexa\/illumina FASTQ variants","volume":"38","author":"Cock","year":"2010","journal-title":"Nucleic Acids Res"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"1083","DOI":"10.21105\/joss.01083","article-title":"GFAKluge: a C++ library and command line utilities for the graphical fragment assembly formats","volume":"4","author":"Dawson","year":"2019","journal-title":"J. Open Source Softw"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"3047","DOI":"10.1093\/bioinformatics\/btw354","article-title":"MultiQC: summarize analysis results for multiple tools and samples in a single report","volume":"32","author":"Ewels","year":"2016","journal-title":"Bioinformatics"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"1072","DOI":"10.1093\/bioinformatics\/btt086","article-title":"QUAST: quality assessment tool for genome assemblies","volume":"29","author":"Gurevich","year":"2013","journal-title":"Bioinformatics"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"giaa153","DOI":"10.1093\/gigascience\/giaa153","article-title":"Significantly improving the quality of genome assemblies through curation","volume":"10","author":"Howe","year":"2021","journal-title":"Gigascience"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","DOI":"10.1038\/s41586-022-05325-5","article-title":"Automated assembly of high-quality diploid human reference genomes","volume-title":"bioRxiv","author":"Jarvis","year":"2022"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"4325","DOI":"10.1073\/pnas.1720115115","article-title":"Earth BioGenome project: sequencing life for the future of life","volume":"115","author":"Lewin","year":"2018","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"1435","DOI":"10.1126\/science.2983426","article-title":"Rapid and sensitive protein similarity searches","volume":"227","author":"Lipman","year":"1985","journal-title":"Science"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"665","DOI":"10.1101\/gr.214155.116","article-title":"Genome graphs and the evolution of genome inference","volume":"27","author":"Paten","year":"2017","journal-title":"Genome Res"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","DOI":"10.1101\/2022.06.24.497523","article-title":"Verkko: telomere-to-telomere assembly of diploid chromosomes","volume-title":"bioRxiv","author":"Rautiainen","year":"2022"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1038\/s41586-021-03451-0","article-title":"Towards complete and error-free genome assemblies of all vertebrate species","volume":"592","author":"Rhie","year":"2021","journal-title":"Nature"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"e0163962","DOI":"10.1371\/journal.pone.0163962","article-title":"SeqKit: a cross-platform and ultrafast toolkit for FASTA\/Q file manipulation","volume":"11","author":"Shen","year":"2016","journal-title":"PLoS One"},{"key":"2023041408370351500_","doi-asserted-by":"crossref","first-page":"3350","DOI":"10.1093\/bioinformatics\/btv383","article-title":"Bandage: interactive visualization of de novo genome assemblies","volume":"31","author":"Wick","year":"2015","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btac460\/45019532\/btac460.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/17\/4214\/49889683\/btac460.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/38\/17\/4214\/49889683\/btac460.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,24]],"date-time":"2023-11-24T04:55:49Z","timestamp":1700801749000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/38\/17\/4214\/6633308"}},"subtitle":[],"editor":[{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2022,7,7]]},"references-count":15,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2022,9,2]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btac460","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.03.24.485682","asserted-by":"object"}]},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2022,9,1]]},"published":{"date-parts":[[2022,7,7]]}}}