{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,22]],"date-time":"2025-02-22T00:38:57Z","timestamp":1740184737887,"version":"3.37.3"},"reference-count":29,"publisher":"Oxford University Press (OUP)","issue":"9","license":[{"start":{"date-parts":[[2020,11,3]],"date-time":"2020-11-03T00:00:00Z","timestamp":1604361600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"New Hampshire-INBRE through an Institutional Development Award","award":["P20GM103506"],"award-info":[{"award-number":["P20GM103506"]}]},{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences of the NIH","doi-asserted-by":"crossref","id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2021,6,9]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Reproducibility is of central importance to the scientific process. The difficulty of consistently replicating and verifying experimental results is magnified in the era of big data, in which bioinformatics analysis often involves complex multi-application pipelines operating on terabytes of data. These processes result in thousands of possible permutations of data preparation steps, software versions and command-line arguments. Existing reproducibility frameworks are cumbersome and involve redesigning computational methods. To address these issues, we developed RepeatFS, a file system that records, replicates and verifies informatics workflows with no alteration to the original methods. 
RepeatFS also provides several other features to help promote analytical transparency and reproducibility, including provenance visualization and task automation.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We used RepeatFS to successfully visualize and replicate a variety of bioinformatics tasks consisting of over a million operations with no alteration to the original methods. RepeatFS correctly identified all software inconsistencies that resulted in replication differences.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>RepeatFS is implemented in Python 3. Its source code and documentation are available at https:\/\/github.com\/ToniWestbrook\/repeatfs.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Supplementary information<\/jats:title>\n                  <jats:p>Supplementary data are available at Bioinformatics online.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaa950","type":"journal-article","created":{"date-parts":[[2020,10,30]],"date-time":"2020-10-30T12:11:24Z","timestamp":1604059884000},"page":"1292-1296","source":"Crossref","is-referenced-by-count":2,"title":["RepeatFS: a file system providing reproducibility through provenance and automation"],"prefix":"10.1093","volume":"37","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-5398-6916","authenticated-orcid":false,"given":"Anthony","family":"Westbrook","sequence":"first","affiliation":[{"name":"Department of Computer Science"},{"name":"Hubbard Center for Genome Studies"}]},{"given":"Elizabeth","family":"Varki","sequence":"additional","affiliation":[{"name":"Department of Computer Science"}]},{"given":"W Kelley","family":"Thomas","sequence":"additional","affiliation":[{"name":"Hubbard Center for Genome 
Studies"},{"name":"Department of Molecular Cellular and Biomedical Sciences, University of New Hampshire , Durham, NH 03824, USA"}]}],"member":"286","published-online":{"date-parts":[[2020,11,24]]},"reference":[{"key":"2023051706055208100_btaa950-B1","doi-asserted-by":"crossref","first-page":"W537","DOI":"10.1093\/nar\/gky379","article-title":"The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update","volume":"46","author":"Afgan","year":"2018","journal-title":"Nucleic Acids Res"},{"key":"2023051706055208100_btaa950-B2","doi-asserted-by":"crossref","first-page":"403","DOI":"10.1016\/S0022-2836(05)80360-2","article-title":"Basic local alignment search tool","volume":"215","author":"Altschul","year":"1990","journal-title":"J. Mol. Biol"},{"year":"2020","key":"2023051706055208100_btaa950-B3"},{"key":"2023051706055208100_btaa950-B4","doi-asserted-by":"crossref","first-page":"452","DOI":"10.1038\/533452a","article-title":"1,500 scientists lift the lid on reproducibility","volume":"533","author":"Baker","year":"2016","journal-title":"Nature"},{"key":"2023051706055208100_btaa950-B5","doi-asserted-by":"crossref","first-page":"455","DOI":"10.1089\/cmb.2012.0021","article-title":"SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing","volume":"19","author":"Bankevich","year":"2012","journal-title":"J. Comput. 
Biol"},{"key":"2023051706055208100_btaa950-B6","doi-asserted-by":"crossref","first-page":"2114","DOI":"10.1093\/bioinformatics\/btu170","article-title":"Trimmomatic: a flexible trimmer for Illumina sequence data","volume":"30","author":"Bolger","year":"2014","journal-title":"Bioinformatics"},{"volume-title":"et al.","key":"2023051706055208100_btaa950-B7"},{"key":"2023051706055208100_btaa950-B8","doi-asserted-by":"crossref","first-page":"1422","DOI":"10.1093\/bioinformatics\/btp163","article-title":"Biopython: freely available Python tools for computational molecular biology and bioinformatics","volume":"25","author":"Cock","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051706055208100_btaa950-B9","doi-asserted-by":"crossref","first-page":"963","DOI":"10.1093\/jamia\/ocy028","article-title":"Does health informatics have a replication crisis?","volume":"25","author":"Coiera","year":"2018","journal-title":"J. Am. Med. Inform. Assoc"},{"key":"2023051706055208100_btaa950-B10","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1080\/14737159.2017.1282822","article-title":"Genomics pipelines and data integration: challenges and opportunities in the research setting","volume":"17","author":"Davis-Turak","year":"2017","journal-title":"Expert Rev. Mol. 
Diagn"},{"year":"2020","key":"2023051706055208100_btaa950-B11"},{"author":"Felsenstein","key":"2023051706055208100_btaa950-B12"},{"key":"2023051706055208100_btaa950-B13","doi-asserted-by":"crossref","first-page":"e80278","DOI":"10.1371\/journal.pone.0080278","article-title":"Quantifying reproducibility in computational biology: the case of the tuberculosis drugome","volume":"8","author":"Garijo","year":"2013","journal-title":"PLoS One"},{"key":"2023051706055208100_btaa950-B14","doi-asserted-by":"crossref","first-page":"2778","DOI":"10.1093\/bioinformatics\/btq524","article-title":"Ruffus: a lightweight Python library for computational pipelines","volume":"26","author":"Goodstadt","year":"2010","journal-title":"Bioinformatics"},{"year":"2014","author":"Gordon","key":"2023051706055208100_btaa950-B15"},{"key":"2023051706055208100_btaa950-B16","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1186\/s12859-017-1747-0","article-title":"Investigating reproducibility and tracking provenance \u2013 a genomic workflow case study","volume":"18","author":"Kanwal","year":"2017","journal-title":"BMC Bioinformatics"},{"key":"2023051706055208100_btaa950-B17","doi-asserted-by":"crossref","first-page":"3059","DOI":"10.1093\/nar\/gkf436","article-title":"MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform","volume":"30","author":"Katoh","year":"2002","journal-title":"Nucleic Acids Res"},{"key":"2023051706055208100_btaa950-B18","doi-asserted-by":"crossref","first-page":"giy077","DOI":"10.1093\/gigascience\/giy077","article-title":"Experimenting with reproducibility: a case study of robustness in bioinformatics","volume":"7","author":"Kim","year":"2018","journal-title":"GigaScience"},{"key":"2023051706055208100_btaa950-B19","doi-asserted-by":"crossref","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","article-title":"Snakemake\u2014a scalable bioinformatics workflow 
engine","volume":"28","author":"K\u00f6ster","year":"2012","journal-title":"Bioinformatics"},{"key":"2023051706055208100_btaa950-B20","doi-asserted-by":"crossref","first-page":"D19","DOI":"10.1093\/nar\/gkq1019","article-title":"The sequence read archive","volume":"39","author":"Leinonen","year":"2011","journal-title":"Nucleic Acids Res"},{"key":"2023051706055208100_btaa950-B21","doi-asserted-by":"crossref","first-page":"52","DOI":"10.1186\/s12918-016-0288-x","article-title":"Where next for the reproducibility agenda in computational biology?","volume":"10","author":"Lewis","year":"2016","journal-title":"BMC Syst. Biol"},{"year":"2011","author":"Li","key":"2023051706055208100_btaa950-B22"},{"key":"2023051706055208100_btaa950-B23","doi-asserted-by":"crossref","first-page":"2078","DOI":"10.1093\/bioinformatics\/btp352","article-title":"The Sequence alignment\/map format and SAMtools","volume":"25","author":"Li","year":"2009","journal-title":"Bioinformatics"},{"key":"2023051706055208100_btaa950-B24","doi-asserted-by":"crossref","first-page":"D590","DOI":"10.1093\/nar\/gks1219","article-title":"The SILVA ribosomal RNA gene database project: improved data processing and web-based tools","volume":"41","author":"Quast","year":"2012","journal-title":"Nucleic Acids Res"},{"key":"2023051706055208100_btaa950-B25","first-page":"1525","article-title":"Bpipe: a tool for running and managing bioinformatics pipelines","volume":"28","author":"Sadedin","year":"2012","journal-title":"Bioinf. Oxf. 
Engl"},{"key":"2023051706055208100_btaa950-B26","doi-asserted-by":"crossref","first-page":"2068","DOI":"10.1093\/bioinformatics\/btu153","article-title":"Prokka: rapid prokaryotic genome annotation","volume":"30","author":"Seemann","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051706055208100_btaa950-B27","doi-asserted-by":"crossref","first-page":"1312","DOI":"10.1093\/bioinformatics\/btu033","article-title":"RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies","volume":"30","author":"Stamatakis","year":"2014","journal-title":"Bioinformatics"},{"key":"2023051706055208100_btaa950-B28","doi-asserted-by":"crossref","first-page":"4673","DOI":"10.1093\/nar\/22.22.4673","article-title":"CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice","volume":"22","author":"Thompson","year":"1994","journal-title":"Nucleic Acids Res"},{"key":"2023051706055208100_btaa950-B29","doi-asserted-by":"crossref","first-page":"e1006843","DOI":"10.1371\/journal.pcbi.1006843","article-title":"Script of Scripts: a pragmatic workflow system for daily computational research","volume":"15","author":"Wang","year":"2019","journal-title":"PLoS Comput. 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaa950\/34470741\/btaa950.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/9\/1292\/50359570\/btaa950.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/37\/9\/1292\/50359570\/btaa950.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,5,17]],"date-time":"2023-05-17T06:12:25Z","timestamp":1684303945000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/37\/9\/1292\/5952659"}},"subtitle":[],"editor":[{"given":"Wren","family":"Jonathan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2020,11,24]]},"references-count":29,"journal-issue":{"issue":"9","published-print":{"date-parts":[[2021,6,9]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaa950","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2021,5,1]]},"published":{"date-parts":[[2020,11,24]]}}}