{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,10]],"date-time":"2026-04-10T02:47:10Z","timestamp":1775789230301,"version":"3.50.1"},"reference-count":24,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T00:00:00Z","timestamp":1696204800000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T00:00:00Z","timestamp":1696204800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100008730","name":"Kreftforeningen","doi-asserted-by":"publisher","award":["190179 and 198048"],"award-info":[{"award-number":["190179 and 198048"]}],"id":[{"id":"10.13039\/100008730","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100005366","name":"University of Oslo","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100005366","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Background<\/jats:title>\n                    <jats:p>Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools that we identified, as designed specifically to perform host contamination sequence removal, were either outdated, not maintained, or complicated to use. 
Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Conclusions<\/jats:title>\n                    <jats:p>\n                      To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. 
HoCoRT is available as a Bioconda package, and the source code can be accessed at\n                      <jats:ext-link xmlns:xlink=\"http:\/\/www.w3.org\/1999\/xlink\" ext-link-type=\"uri\" xlink:href=\"https:\/\/github.com\/ignasrum\/hocort\">https:\/\/github.com\/ignasrum\/hocort<\/jats:ext-link>\n                      along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).\n                    <\/jats:p>\n                  <\/jats:sec>","DOI":"10.1186\/s12859-023-05492-w","type":"journal-article","created":{"date-parts":[[2023,10,2]],"date-time":"2023-10-02T07:02:05Z","timestamp":1696230125000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":33,"title":["HoCoRT: host contamination removal tool"],"prefix":"10.1186","volume":"24","author":[{"given":"Ignas","family":"Rumbavicius","sequence":"first","affiliation":[]},{"given":"Trine B.","family":"Rounge","sequence":"additional","affiliation":[]},{"given":"Torbj\u00f8rn","family":"Rognes","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2023,10,2]]},"reference":[{"issue":"1","key":"5492_CR1","doi-asserted-by":"publisher","first-page":"178","DOI":"10.1093\/bib\/bbz155","volume":"22","author":"R Bharti","year":"2021","unstructured":"Bharti R, Grimm DG. Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform. 2021;22(1):178\u201393. https:\/\/doi.org\/10.1093\/bib\/bbz155.","journal-title":"Brief Bioinform"},{"key":"5492_CR2","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1186\/s12859-020-03585-4","volume":"21","author":"S Kieser","year":"2020","unstructured":"Kieser S, Brown J, Zdobnov EM, Trajkovski M, McCue LA. ATLAS: a Snakemake workflow for assembly, annotation, and genomic binning of metagenome sequence data. BMC Bioinformatics. 2020;21:257. 
https:\/\/doi.org\/10.1186\/s12859-020-03585-4.","journal-title":"BMC Bioinformatics"},{"key":"5492_CR3","doi-asserted-by":"publisher","first-page":"46","DOI":"10.1186\/s40168-019-0658-x","volume":"7","author":"EL Clarke","year":"2019","unstructured":"Clarke EL, Taylor LJ, Zhao C, Connell A, Lee J, Bushman FD, Bittinger K. Sunbeam: an extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome. 2019;7:46. https:\/\/doi.org\/10.1186\/s40168-019-0658-x.","journal-title":"Microbiome"},{"key":"5492_CR4","unstructured":"Bushnell B. BBMap short read aligner, and other bioinformatic tools. https:\/\/sourceforge.net\/projects\/bbmap\/. Accessed 1 May 2022."},{"key":"5492_CR5","unstructured":"Joint Genome Institute BBTools. https:\/\/jgi.doe.gov\/data-and-tools\/software-tools\/bbtools\/. Accessed 30 March 2023."},{"issue":"14","key":"5492_CR6","doi-asserted-by":"publisher","first-page":"1754","DOI":"10.1093\/bioinformatics\/btp324","volume":"25","author":"H Li","year":"2009","unstructured":"Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754\u201360. https:\/\/doi.org\/10.1093\/bioinformatics\/btp324.","journal-title":"Bioinformatics"},{"issue":"4","key":"5492_CR7","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1038\/nmeth.1923","volume":"9","author":"B Langmead","year":"2012","unstructured":"Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357\u20139. https:\/\/doi.org\/10.1038\/nmeth.1923.","journal-title":"Nat Methods"},{"issue":"3","key":"5492_CR8","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0017288","volume":"6","author":"R Schmieder","year":"2011","unstructured":"Schmieder R, Edwards R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS ONE. 2011;6(3): e17288. 
https:\/\/doi.org\/10.1371\/journal.pone.0017288.","journal-title":"PLoS ONE"},{"issue":"13","key":"5492_CR9","doi-asserted-by":"publisher","first-page":"2318","DOI":"10.1093\/bioinformatics\/bty963","volume":"35","author":"MD Czajkowski","year":"2019","unstructured":"Czajkowski MD, Vance DP, Frese SA, Casaburi G. GenCoF: a graphical user interface to rapidly remove human genome contaminants from metagenomic datasets. Bioinformatics. 2019;35(13):2318\u20139. https:\/\/doi.org\/10.1093\/bioinformatics\/bty963.","journal-title":"Bioinformatics"},{"key":"5492_CR10","doi-asserted-by":"publisher","unstructured":"Gr\u00fcning B, Dale R, Sj\u00f6din A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, K\u00f6ster J, The Bioconda Team. Bioconda: Sustainable and comprehensive software distribution for the life sciences. Nature Methods. 2018;15(7):475\u20136. https:\/\/doi.org\/10.1038\/s41592-018-0046-7","DOI":"10.1038\/s41592-018-0046-7"},{"issue":"23","key":"5492_CR11","doi-asserted-by":"publisher","first-page":"3402","DOI":"10.1093\/bioinformatics\/btu558","volume":"30","author":"J Chu","year":"2014","unstructured":"Chu J, Sadeghi S, Raymond A, Jackman SD, Nip KM, Mar R, Mohamadi H, Butterfield YS, Robertson AG, Birol I. BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics. 2014;30(23):3402\u20134. https:\/\/doi.org\/10.1093\/bioinformatics\/btu558.","journal-title":"Bioinformatics"},{"key":"5492_CR12","doi-asserted-by":"publisher","first-page":"314","DOI":"10.1109\/IPDPS.2019.00041","volume":"2019","author":"M Vasimuddin","year":"2019","unstructured":"Vasimuddin M, Misra S, Li H, Aluru S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. IEEE Int Parallel Distrib Process Symp (IPDPS). 2019;2019:314\u201324. 
https:\/\/doi.org\/10.1109\/IPDPS.2019.00041.","journal-title":"IEEE Int Parallel Distrib Process Symp (IPDPS)"},{"issue":"8","key":"5492_CR13","doi-asserted-by":"publisher","first-page":"907","DOI":"10.1038\/s41587-019-0201-4","volume":"37","author":"D Kim","year":"2019","unstructured":"Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 2019;37(8):907\u201315. https:\/\/doi.org\/10.1038\/s41587-019-0201-4.","journal-title":"Nat Biotechnol"},{"issue":"1","key":"5492_CR14","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1186\/s13059-019-1891-0","volume":"20","author":"DE Wood","year":"2019","unstructured":"Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20(1):257. https:\/\/doi.org\/10.1186\/s13059-019-1891-0.","journal-title":"Genome Biol"},{"issue":"18","key":"5492_CR15","doi-asserted-by":"publisher","first-page":"3094","DOI":"10.1093\/bioinformatics\/bty191","volume":"34","author":"H Li","year":"2018","unstructured":"Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34(18):3094\u2013100. https:\/\/doi.org\/10.1093\/bioinformatics\/bty191.","journal-title":"Bioinformatics"},{"key":"5492_CR16","doi-asserted-by":"publisher","unstructured":"Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup. The sequence alignment\/map format and SAMtools. Bioinformatics. 2009;25(16):2078\u201379. https:\/\/doi.org\/10.1093\/bioinformatics\/btp352.","DOI":"10.1093\/bioinformatics\/btp352"},{"issue":"D1","key":"5492_CR17","doi-asserted-by":"publisher","first-page":"D161","DOI":"10.1093\/nar\/gkab1135","volume":"50","author":"EW Sayers","year":"2022","unstructured":"Sayers EW, Cavanaugh M, Clark K, Pruitt KD, Schoch CL, Sherry ST, Karsch-Mizrachi I. GenBank. Nucleic Acids Res. 2022;50(D1):D161\u20134. 
https:\/\/doi.org\/10.1093\/nar\/gkab1135.","journal-title":"Nucleic Acids Res"},{"issue":"3","key":"5492_CR18","doi-asserted-by":"publisher","first-page":"521","DOI":"10.1093\/bioinformatics\/bty630","volume":"35","author":"H Gourl\u00e9","year":"2019","unstructured":"Gourl\u00e9 H, Karlsson-Lindsj\u00f6 O, Hayer J, Bongcam-Rudloff E. Simulating Illumina metagenomic data with InSilicoSeq. Bioinformatics. 2019;35(3):521\u20132. https:\/\/doi.org\/10.1093\/bioinformatics\/bty630.","journal-title":"Bioinformatics"},{"issue":"4","key":"5492_CR19","doi-asserted-by":"publisher","first-page":"gix010","DOI":"10.1093\/gigascience\/gix010","volume":"6","author":"C Yang","year":"2017","unstructured":"Yang C, Chu J, Warren RL, Birol I. NanoSim: nanopore sequence read simulator based on statistical characterization. GigaScience. 2017;6(4):gix010. https:\/\/doi.org\/10.1093\/gigascience\/gix010.","journal-title":"GigaScience"},{"issue":"3","key":"5492_CR20","doi-asserted-by":"publisher","first-page":"lqab071","DOI":"10.1093\/nargab\/lqab071","volume":"3","author":"E Rachtman","year":"2021","unstructured":"Rachtman E, Bafna V, Mirarab S. CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genom Bioinf. 2021;3(3):lqab071. https:\/\/doi.org\/10.1093\/nargab\/lqab071.","journal-title":"NAR Genom Bioinf"},{"key":"5492_CR21","doi-asserted-by":"publisher","first-page":"236","DOI":"10.1186\/s12864-015-1419-2","volume":"16","author":"R Ounit","year":"2015","unstructured":"Ounit R, Wanamaker S, Close TJ, Lonardi S. CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics. 2015;16:236. https:\/\/doi.org\/10.1186\/s12864-015-1419-2.","journal-title":"BMC Genomics"},{"issue":"19","key":"5492_CR22","doi-asserted-by":"publisher","first-page":"2520","DOI":"10.1093\/bioinformatics\/bts480","volume":"28","author":"J Koster","year":"2012","unstructured":"Koster J, Rahmann S. 
Snakemake\u2014a scalable bioinformatics workflow engine. Bioinformatics. 2012;28(19):2520\u20132. https:\/\/doi.org\/10.1093\/bioinformatics\/bts480.","journal-title":"Bioinformatics"},{"issue":"17","key":"5492_CR23","doi-asserted-by":"publisher","first-page":"3389","DOI":"10.1093\/nar\/25.17.3389","volume":"25","author":"SF Altschul","year":"1997","unstructured":"Altschul SF, Madden TL, Sch\u00e4ffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389\u2013402. https:\/\/doi.org\/10.1093\/nar\/25.17.3389.","journal-title":"Nucleic Acids Res"},{"key":"5492_CR24","doi-asserted-by":"crossref","unstructured":"Rumbavicius I. Tool to remove specific organisms from microbiome sequencing data - Host Contamination Removal Tool (HoCoRT). Master thesis, Department of Informatics, University of Oslo, Norway. 2022. http:\/\/urn.nb.no\/URN:NBN:no-98212.","DOI":"10.1101\/2022.11.18.517030"}],"container-title":["BMC 
Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05492-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s12859-023-05492-w\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s12859-023-05492-w.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,11,21]],"date-time":"2023-11-21T23:56:05Z","timestamp":1700610965000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/s12859-023-05492-w"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,10,2]]},"references-count":24,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,12]]}},"alternative-id":["5492"],"URL":"https:\/\/doi.org\/10.1186\/s12859-023-05492-w","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2022.11.18.517030","asserted-by":"object"}]},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,10,2]]},"assertion":[{"value":"25 May 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"21 September 2023","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"2 October 2023","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"Not applicable.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Ethics approval and consent to participate"}},{"value":"Not 
applicable.","order":3,"name":"Ethics","group":{"name":"EthicsHeading","label":"Consent for publication"}},{"value":"The authors declare that they have no competing interests.","order":4,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"371"}}