{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,30]],"date-time":"2025-07-30T11:42:10Z","timestamp":1753875730843,"version":"3.41.2"},"reference-count":11,"publisher":"Oxford University Press (OUP)","issue":"6","license":[{"start":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T00:00:00Z","timestamp":1716508800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Microbial sequencing data from clinical samples is often contaminated with human sequences, which have to be removed prior to sharing. Existing methods for human read removal, however, are applicable only after the target dataset has been retrieved in its entirety, putting the recipient at least temporarily in control of a potentially identifiable genetic dataset with potential implications under regulatory frameworks such as the GDPR. In some instances, the ability to carry out stream-based host depletion as part of the data transfer process may be preferable.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We present SWGTS, a client\u2013server application for the transfer and stream-based host depletion of sequencing reads. 
SWGTS enforces a robust upper bound on the maximum amount of human genetic data from any one client held in memory at any point in time by storing all incoming sequencing data in a limited-size, client-specific intermediate processing buffer, and by throttling the rate of incoming data if it exceeds the speed of host depletion carried out on the SWGTS server in the background. SWGTS exposes an HTTP\u2013REST interface, is implemented using docker-compose, Redis and traefik, and requires less than 8\u2009Gb of RAM for deployment. We demonstrate high filtering accuracy of SWGTS; incoming data transfer rates of up to 1.65 megabases per second in a conservative configuration; and mitigation of re-identification risks by the ability to limit the number of SNPs present on a popular population-scale genotyping array covered by reads in the SWGTS buffer to a low user-defined number, such as 10 or 100.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>SWGTS is available on GitHub: https:\/\/github.com\/AlBi-HHU\/swgts (https:\/\/doi.org\/10.5281\/zenodo.10891052). The repository also contains a Jupyter notebook that can be used to reproduce all the benchmarks used in this article. 
All datasets used for benchmarking are publicly available.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae332","type":"journal-article","created":{"date-parts":[[2024,5,24]],"date-time":"2024-05-24T21:35:09Z","timestamp":1716586509000},"source":"Crossref","is-referenced-by-count":0,"title":["SWGTS\u2014a platform for stream-based host DNA depletion"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-6039-377X","authenticated-orcid":false,"given":"Philipp","family":"Spohr","sequence":"first","affiliation":[{"name":"Algorithmic Bioinformatics, Heinrich Heine University D\u00fcsseldorf , D\u00fcsseldorf, 40225, Germany"},{"name":"Center for Digital Medicine , D\u00fcsseldorf, 40225, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1910-6131","authenticated-orcid":false,"given":"Max","family":"Ried","sequence":"additional","affiliation":[{"name":"Algorithmic Bioinformatics, Heinrich Heine University D\u00fcsseldorf , D\u00fcsseldorf, 40225, Germany"},{"name":"Center for Digital Medicine , D\u00fcsseldorf, 40225, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0000-3581-0677","authenticated-orcid":false,"given":"Laura","family":"K\u00fchle","sequence":"additional","affiliation":[{"name":"Algorithmic Bioinformatics, Heinrich Heine University D\u00fcsseldorf , D\u00fcsseldorf, 40225, Germany"},{"name":"Center for Digital Medicine , D\u00fcsseldorf, 40225, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6394-4581","authenticated-orcid":false,"given":"Alexander","family":"Dilthey","sequence":"additional","affiliation":[{"name":"Center for Digital Medicine , D\u00fcsseldorf, 40225, Germany"},{"name":"Institute of Medical Microbiology and Hospital Hygiene, University Hospital D\u00fcsseldorf, Heinrich Heine University D\u00fcsseldorf , D\u00fcsseldorf, 40225, 
Germany"}]}],"member":"286","published-online":{"date-parts":[[2024,5,24]]},"reference":[{"key":"2024061207154531300_btae332-B1","doi-asserted-by":"crossref","first-page":"68","DOI":"10.1038\/nature15393","article-title":"A global reference for human genetic variation","volume":"526","author":"1000 Genomes Project Consortium","year":"2015","journal-title":"Nature"},{"key":"2024061207154531300_btae332-B2","first-page":"mgen000393","article-title":"Evaluation of methods for detecting human reads in microbial sequencing datasets","volume":"6","author":"Bush","year":"2020","journal-title":"Microb Genom"},{"key":"2024061207154531300_btae332-B3","doi-asserted-by":"crossref","first-page":"btad728","DOI":"10.1093\/bioinformatics\/btad728","article-title":"Hostile: accurate decontamination of microbial host sequences","volume":"39","author":"Constantinides","year":"2023","journal-title":"Bioinformatics"},{"key":"2024061207154531300_btae332-B4","doi-asserted-by":"crossref","first-page":"3291","DOI":"10.1093\/bioinformatics\/btac311","article-title":"ReadItAndKeep: rapid decontamination of SARS-CoV-2 sequencing reads","volume":"38","author":"Hunt","year":"2022","journal-title":"Bioinformatics"},{"key":"2024061207154531300_btae332-B5","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1126\/science.1095019","article-title":"Genomic research and human subject privacy","volume":"305","author":"Lin","year":"2004","journal-title":"Science"},{"key":"2024061207154531300_btae332-B6","doi-asserted-by":"crossref","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","article-title":"Minimap2: pairwise alignment for nucleotide sequences","volume":"34","author":"Li","year":"2018","journal-title":"Bioinformatics"},{"key":"2024061207154531300_btae332-B7","doi-asserted-by":"crossref","first-page":"315","DOI":"10.1007\/s00439-009-0771-1","article-title":"SNPs for a universal individual identification panel","volume":"127","author":"Pakstis","year":"2010","journal-title":"Hum 
Genet"},{"key":"2024061207154531300_btae332-B8","doi-asserted-by":"crossref","first-page":"e48316","DOI":"10.15252\/embr.201948316","article-title":"Re-identifiability of genomic data and the GDPR: assessing the re-identifiability of genomic data in light of the EU General Data Protection Regulation","volume":"20","author":"Shabani","year":"2019","journal-title":"EMBO Rep"},{"key":"2024061207154531300_btae332-B9","doi-asserted-by":"crossref","first-page":"1079","DOI":"10.1038\/s41564-023-01381-3","article-title":"Reconstruction of the personal information from human genome reads in gut metagenome sequencing data","volume":"8","author":"Tomofuji","year":"2023","journal-title":"Nat Microbiol"},{"key":"2024061207154531300_btae332-B10","doi-asserted-by":"crossref","first-page":"1039","DOI":"10.1093\/cid\/ciab588","article-title":"Characterization of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection clusters based on integrated genomic surveillance, outbreak analysis and contact tracing in an urban setting","volume":"74","author":"Walker","year":"2022","journal-title":"Clin Infect Dis"},{"key":"2024061207154531300_btae332-B11","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","article-title":"Improved metagenomic analysis with Kraken 2","volume":"20","author":"Wood","year":"2019","journal-title":"Genome 
Biol"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae332\/57890135\/btae332.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/6\/btae332\/58200787\/btae332.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/6\/btae332\/58200787\/btae332.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,12]],"date-time":"2024-06-12T07:52:35Z","timestamp":1718178755000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae332\/7681885"}},"subtitle":[],"editor":[{"given":"Can","family":"Alkan","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,5,24]]},"references-count":11,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,6,3]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae332","relation":{},"ISSN":["1367-4811"],"issn-type":[{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2024,6]]},"published":{"date-parts":[[2024,5,24]]},"article-number":"btae332"}}