{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:11:02Z","timestamp":1772165462235,"version":"3.50.1"},"reference-count":21,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2021,11,1]],"date-time":"2021-11-01T00:00:00Z","timestamp":1635724800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2021,11,1]],"date-time":"2021-11-01T00:00:00Z","timestamp":1635724800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2021,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>The FASTA file format, used to store polymeric sequence data, has become a bioinformatics file standard used for decades. The relatively large files require additional files, beyond the scope of the original format, to identify sequences and to provide random access. Multiple compressors have been developed to archive FASTA files back and forth, but these lack direct access to targeted content or metadata of the archive. Moreover, these solutions are not directly backwards compatible to FASTA files, resulting in limited software integration.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>We designed a linux based toolkit that virtualises the content of DNA, RNA and protein FASTA archives into the filesystem by using filesystem in userspace. This guarantees in-sync virtualised metadata files and offers fast random-access decompression using bit encodings plus Zstandard (zstd). The toolkit, FASTAFS, can track all its system-wide running instances, allows file integrity verification and can provide, instantly, scriptable access to sequence files and is easy to use and deploy. The file compression ratios were comparable but not superior to other state of the art archival tools, despite the innovative random access feature implemented in FASTAFS.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>FASTAFS is a user-friendly and easy to deploy backwards compatible generic purpose solution to store and access compressed FASTA files, since it offers file system access to FASTA files as well as in-sync metadata files through file virtualisation. Using virtual filesystems as in-between layer offers format conversion without the need to rewrite code into different programming languages while preserving compatibility.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-021-04455-3","type":"journal-article","created":{"date-parts":[[2021,11,1]],"date-time":"2021-11-01T04:02:57Z","timestamp":1635739377000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":6,"title":["FASTAFS: file system virtualisation of random access compressed FASTA files"],"prefix":"10.1186","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-2166-0676","authenticated-orcid":false,"given":"Youri","family":"Hoogstrate","sequence":"first","affiliation":[]},{"given":"Guido W.","family":"Jenster","sequence":"additional","affiliation":[]},{"given":"Harmen J. G.","family":"van de Werken","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2021,11,1]]},"reference":[{"issue":"4","key":"4455_CR1","doi-asserted-by":"publisher","first-page":"576","DOI":"10.1016\/j.molcel.2010.05.004","volume":"38","author":"S Heinz","year":"2010","unstructured":"Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities. Mol Cell. 2010;38(4):576\u201389. https:\/\/doi.org\/10.1016\/j.molcel.2010.05.004.","journal-title":"Mol. Cell"},{"issue":"24","key":"4455_CR2","doi-asserted-by":"publisher","first-page":"3211","DOI":"10.1093\/bioinformatics\/bts611","volume":"28","author":"E Kopylova","year":"2012","unstructured":"Kopylova E, No\u00e9 L, Touzet H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics. 2012;28(24):3211\u20137. https:\/\/doi.org\/10.1093\/bioinformatics\/bts611.","journal-title":"Bioinformatics"},{"issue":"1","key":"4455_CR3","doi-asserted-by":"publisher","first-page":"15","DOI":"10.1093\/bioinformatics\/bts635","volume":"29","author":"A Dobin","year":"2012","unstructured":"Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2012;29(1):15\u201321. https:\/\/doi.org\/10.1093\/bioinformatics\/bts635.","journal-title":"Bioinformatics"},{"issue":"10","key":"4455_CR4","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1093\/nar\/gkt214","volume":"41","author":"Y Liao","year":"2013","unstructured":"Liao Y, Smyth GK, Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 2013;41(10):108\u2013108. https:\/\/doi.org\/10.1093\/nar\/gkt214.","journal-title":"Nucleic Acids Res"},{"issue":"1","key":"4455_CR5","doi-asserted-by":"publisher","first-page":"66","DOI":"10.1186\/s13059-016-0924-1","volume":"17","author":"R Buels","year":"2016","unstructured":"Buels R, Yao E, Diesh CM, Hayes RD, Munoz-Torres M, Helt G, Goodstein DM, Elsik CG, Lewis SE, Stein L, Holmes IH. Jbrowse: a dynamic web platform for genome visualization and analysis. Genome Biol. 2016;17(1):66. https:\/\/doi.org\/10.1186\/s13059-016-0924-1.","journal-title":"Genome Biol"},{"issue":"3","key":"4455_CR6","doi-asserted-by":"publisher","first-page":"568","DOI":"10.1101\/gr.129684.111","volume":"22","author":"DC Koboldt","year":"2012","unstructured":"Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, Miller CA, Mardis ER, Ding L, Wilson RK. Varscan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22(3):568\u201376. https:\/\/doi.org\/10.1101\/gr.129684.111.","journal-title":"Genome Res"},{"issue":"10","key":"4455_CR7","doi-asserted-by":"publisher","first-page":"0163962","DOI":"10.1371\/journal.pone.0163962","volume":"11","author":"W Shen","year":"2016","unstructured":"Shen W, Le S, Li Y, Hu F. Seqkit: a cross-platform and ultrafast toolkit for fasta\/q file manipulation. PloS One. 2016;11(10):0163962\u20130163962. https:\/\/doi.org\/10.1371\/journal.pone.0163962.","journal-title":"PloS One"},{"issue":"9","key":"4455_CR8","doi-asserted-by":"publisher","first-page":"1297","DOI":"10.1101\/gr.107524.110","volume":"20","author":"A McKenna","year":"2010","unstructured":"McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis toolkit: a mapreduce framework for analyzing next-generation dna sequencing data. Genome Res. 2010;20(9):1297\u2013303. https:\/\/doi.org\/10.1101\/gr.107524.110.","journal-title":"Genome Res"},{"key":"4455_CR9","unstructured":"Picard toolkit. Broad Institute (2019). https:\/\/broadinstitute.github.io\/picard"},{"issue":"19","key":"4455_CR10","doi-asserted-by":"publisher","first-page":"3826","DOI":"10.1093\/bioinformatics\/btz144","volume":"35","author":"K Kryukov","year":"2019","unstructured":"Kryukov K, Ueda MT, Nakagawa S, Imanishi T. Nucleotide archival format (NAF) enables efficient lossless reference-free compression of DNA sequences. Bioinformatics. 2019;35(19):3826\u20138. https:\/\/doi.org\/10.1093\/bioinformatics\/btz144.","journal-title":"Bioinformatics"},{"issue":"1","key":"4455_CR11","doi-asserted-by":"publisher","first-page":"117","DOI":"10.1093\/bioinformatics\/btt594","volume":"30","author":"AJ Pinho","year":"2013","unstructured":"Pinho AJ, Pratas D. MFCompress: a compression tool for FASTA and multi-FASTA data. Bioinformatics. 2013;30(1):117\u20138. https:\/\/doi.org\/10.1093\/bioinformatics\/btt594.","journal-title":"Bioinformatics"},{"issue":"8","key":"4455_CR12","doi-asserted-by":"publisher","first-page":"350","DOI":"10.6026\/97320630005350","volume":"5","author":"P Rajarajeswari","year":"2011","unstructured":"Rajarajeswari P, Apparao A. Dnabit compress-genome compression algorithm. Bioinformation. 2011;5(8):350\u201360. https:\/\/doi.org\/10.6026\/97320630005350.","journal-title":"Bioinformation"},{"issue":"15","key":"4455_CR13","doi-asserted-by":"publisher","first-page":"2213","DOI":"10.1093\/bioinformatics\/btu208","volume":"30","author":"L Roguski","year":"2014","unstructured":"Roguski L, Deorowicz S. DSRC 2-industry-oriented compression of FASTQ files. Bioinformatics. 2014;30(15):2213\u20135. https:\/\/doi.org\/10.1093\/bioinformatics\/btu208.","journal-title":"Bioinformatics"},{"key":"4455_CR14","unstructured":"Samtools Organisation: CRAM format specification (version 3.0: 2fcaab6). https:\/\/samtools.github.io\/hts-specs\/CRAMv3.pdf (2019)"},{"key":"4455_CR15","unstructured":"The SAM\/BAM format specification working group: sequence alignment\/map format specification (version 1.6: f2a6b99). 2019. https:\/\/samtools.github.io\/hts-specs\/SAMv1.pdf."},{"key":"4455_CR16","unstructured":"European Bioinformatics Institute: CRAM reference registry. https:\/\/www.ebi.ac.uk\/ena\/cram (2019)"},{"issue":"D1","key":"4455_CR17","doi-asserted-by":"publisher","first-page":"590","DOI":"10.1093\/nar\/gks1219","volume":"41","author":"C Quast","year":"2013","unstructured":"Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Gl\u00f6ckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013;41(D1):590\u20136. https:\/\/doi.org\/10.1093\/nar\/gks1219.","journal-title":"Nucleic Acids Res"},{"key":"4455_CR18","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1093\/nar\/gky092","volume":"8","author":"R Apweiler","year":"2004","unstructured":"Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ, Natale DA, ODonovan C, Redaschi N, Yeh LSL. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2004;8:9. https:\/\/doi.org\/10.1093\/nar\/gky092.","journal-title":"Nucleic Acids Res."},{"issue":"D1","key":"4455_CR19","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1093\/nar\/gky1141","volume":"47","author":"A Kozomara","year":"2018","unstructured":"Kozomara A, Birgaoanu M, Griffiths-Jones S. mirbase: from microrna sequences to function. Nucleic Acids Res. 2018;47(D1):155\u201362. https:\/\/doi.org\/10.1093\/nar\/gky1141.","journal-title":"Nucleic Acids Res."},{"issue":"20","key":"4455_CR20","doi-asserted-by":"publisher","first-page":"3600","DOI":"10.1093\/bioinformatics\/bty350","volume":"34","author":"J K\u00f6ster","year":"2018","unstructured":"K\u00f6ster J, Rahmann S. Snakemake\u2014a scalable bioinformatics workflow engine. Bioinformatics. 2018;34(20):3600\u20133600. https:\/\/doi.org\/10.1093\/bioinformatics\/bty350.","journal-title":"Bioinformatics"},{"issue":"4","key":"4455_CR21","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1038\/nbt.3820","volume":"35","author":"P Di Tommaso","year":"2017","unstructured":"Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35(4):316\u20139. https:\/\/doi.org\/10.1038\/nbt.3820.","journal-title":"Nat Biotechnol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04455-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-021-04455-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-021-04455-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,12,1]],"date-time":"2021-12-01T06:05:48Z","timestamp":1638338748000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-021-04455-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,1]]},"references-count":21,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,12]]}},"alternative-id":["4455"],"URL":"https:\/\/doi.org\/10.1186\/s12859-021-04455-3","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.11.11.377689","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,1]]},"assertion":[{"value":"9 August 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"20 October 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 November 2021","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 December 2021","order":4,"name":"change_date","label":"Change Date","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"Update","order":5,"name":"change_type","label":"Change Type","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"The given name and family name of Dr. Harmen J. G. van de Werken were erroneously transposed in the citation of the original publication. The article has been updated to rectify the errors.","order":6,"name":"change_details","label":"Change Details","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not Applicable","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not Applicable","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}},{"value":"All sequences used were obtained from free public resources: <b>CM002240:<\/b>\n                      \n                      ; <b>hg19.fa:<\/b>\n                      \n                      ; <b>GRCh38.p12:<\/b>\n                      \n                      ; <b>GRCh38.p13:<\/b>\n                      \n                      ; <b>GRCh38.primary<\/b>_<b>asm:<\/b>\n                      \n                      ; <b>NC<\/b>_<b>001422:<\/b>\n                      \n                      ; <b>NC<\/b>_<b>045512.2:<\/b>\n                      \n                      ; <b>miRBase 22.1 hairpin:<\/b>\n                      \n                      ; <b>sacCer3-mature-tRNAs:<\/b>\n                      \n                      ; <b>SILVA<\/b>_<b>138<\/b>_<b>SSURef:<\/b>\n                      \n                      ; <b>silva-bac-16s-1d90:<\/b>\n                      \n                      ; <b>silva-euk-28s-id98:<\/b>\n                      \n                      ; <b>uniprot<\/b>_<b>sprot:<\/b>\n                      \n                      ;","order":5,"name":"Ethics","group":{"name":"EthicsHeading","label":"Sequences"}}],"article-number":"535"}}