{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:35Z","timestamp":1772138075651,"version":"3.50.1"},"reference-count":11,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,6,26]],"date-time":"2025-06-26T00:00:00Z","timestamp":1750896000000},"content-version":"vor","delay-in-days":1,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"European Commission through the European Research Council","award":["101078461"],"award-info":[{"award-number":["101078461"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>Mining homologous biosynthetic gene clusters (BGCs) typically involves searching colocalised genes against large genomic databases. However, the high degree of genomic redundancy in these databases often propagates into the resulting hit sets, complicating downstream analyses and visualization. To address this challenge, we present CAGEcleaner, a Python-based pipeline with auxiliary bash scripts designed to reduce redundancy in gene cluster hit sets by dereplicating the genomes that host these hits. CAGEcleaner integrates seamlessly with widely used gene cluster mining tools, such as cblaster and CAGECAT, enabling efficient filtering and streamlining BGC discovery workflows.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Source code and documentation is hosted at GitHub (https:\/\/github.com\/LucoDevro\/CAGEcleaner) and Zenodo (https:\/\/doi.org\/10.5281\/zenodo.14726119) under an MIT license. 
For accessibility, CAGEcleaner is installable from Bioconda (https:\/\/anaconda.org\/bioconda\/cagecleaner) and PyPi (https:\/\/pypi.org\/project\/cagecleaner\/), and is also available as a Docker image from DockerHub (https:\/\/hub.docker.com\/r\/lucodevro\/cagecleaner).<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf373","type":"journal-article","created":{"date-parts":[[2025,6,24]],"date-time":"2025-06-24T07:36:49Z","timestamp":1750750609000},"source":"Crossref","is-referenced-by-count":1,"title":["CAGEcleaner: reducing genomic redundancy in gene cluster mining"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-6221-9348","authenticated-orcid":false,"given":"Lucas","family":"De Vrieze","sequence":"first","affiliation":[{"name":"Department of Biology, KU Leuven , Heverlee, 3001,","place":["Belgium"]},{"name":"Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology , Heverlee, 3001,","place":["Belgium"]}]},{"given":"Miguel","family":"Biltjes","sequence":"additional","affiliation":[{"name":"Department of Biology, KU Leuven , Heverlee, 3001,","place":["Belgium"]},{"name":"Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology , Heverlee, 3001,","place":["Belgium"]}]},{"given":"Sofya","family":"Lukashevich","sequence":"additional","affiliation":[{"name":"Department of Biology, KU Leuven , Heverlee, 3001,","place":["Belgium"]},{"name":"Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology , Heverlee, 3001,","place":["Belgium"]}]},{"given":"Kodai","family":"Tsurumi","sequence":"additional","affiliation":[{"name":"Department of Biology, KU Leuven , Heverlee, 3001,","place":["Belgium"]},{"name":"Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology , Heverlee, 
3001,","place":["Belgium"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4366-3675","authenticated-orcid":false,"given":"Joleen","family":"Masschelein","sequence":"additional","affiliation":[{"name":"Department of Biology, KU Leuven , Heverlee, 3001,","place":["Belgium"]},{"name":"Laboratory for Biomolecular Discovery & Engineering, VIB-KU Leuven Center for Microbiology , Heverlee, 3001,","place":["Belgium"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,25]]},"reference":[{"key":"2025071516411930000_btaf373-B1","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkaf334","article-title":"antiSMASH 8.0: extended gene cluster detection capabilities and analyses of chemistry, enzymology, and regulation","author":"Blin","year":"2025","journal-title":"Nucleic Acids Res"},{"key":"2025071516411930000_btaf373-B2","doi-asserted-by":"publisher","author":"Cediel-Becerra","year":"2025","DOI":"10.1101\/2025.02.24.639861"},{"key":"2025071516411930000_btaf373-B3","doi-asserted-by":"publisher","first-page":"vbab016","DOI":"10.1093\/bioadv\/vbab016","article-title":"cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters","volume":"1","author":"Gilchrist","year":"2021","journal-title":"Bioinf Adv"},{"key":"2025071516411930000_btaf373-B4","doi-asserted-by":"publisher","first-page":"2473","DOI":"10.1093\/bioinformatics\/btab007","article-title":"clinker & clustermap.js: automatic generation of gene cluster comparison figures","volume":"37","author":"Gilchrist","year":"2021","journal-title":"Bioinformatics"},{"key":"2025071516411930000_btaf373-B5","doi-asserted-by":"publisher","first-page":"1218","DOI":"10.1093\/molbev\/mst025","article-title":"Detecting sequence homology at the gene cluster level with MultiGeneBlast","volume":"30","author":"Medema","year":"2013","journal-title":"Mol Biol 
Evol"},{"key":"2025071516411930000_btaf373-B6","doi-asserted-by":"publisher","first-page":"732","DOI":"10.1038\/s41597-024-03571-y","article-title":"Exploring and retrieving sequence and metadata for species across the tree of life with NCBI datasets","volume":"11","author":"O'Leary","year":"2024","journal-title":"Sci Data"},{"key":"2025071516411930000_btaf373-B7","doi-asserted-by":"publisher","author":"Salamzade","year":"2023","DOI":"10.1101\/2023.09.27.559801"},{"key":"2025071516411930000_btaf373-B8","doi-asserted-by":"publisher","DOI":"10.1093\/nar\/gkaf045","article-title":"Zol and fai: large-scale targeted detection and evolutionary investigation of gene clusters","volume":"53","author":"Salamzade","year":"2025","journal-title":"Nucleic Acids Res"},{"key":"2025071516411930000_btaf373-B9","doi-asserted-by":"publisher","first-page":"1661","DOI":"10.1038\/s41592-023-02018-3","article-title":"Fast and robust metagenomic sequence comparison through sparse chaining with Skani","volume":"20","author":"Shaw","year":"2023","journal-title":"Nat Methods"},{"key":"2025071516411930000_btaf373-B10","doi-asserted-by":"publisher","first-page":"181","DOI":"10.1186\/s12859-023-05311-2","article-title":"CAGECAT: the CompArative GEne cluster analysis toolbox for rapid search and visualisation of homologous gene clusters","volume":"24","author":"van den Belt","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"2025071516411930000_btaf373-B6796071","doi-asserted-by":"publisher","first-page":"D678","DOI":"10.1093\/nar\/gkae1115","article-title":"MIBiG 4.0: advancing biosynthetic gene cluster curation through global 
collaboration","volume":"53","author":"Zdouc","year":"2025"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf373\/63582559\/btaf373.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf373\/63582559\/btaf373.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf373\/63582559\/btaf373.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,15]],"date-time":"2025-07-15T20:41:25Z","timestamp":1752612085000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf373\/8173959"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6,25]]},"references-count":11,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf373","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2025.02.19.639057","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,6,25]]},"article-number":"btaf373"}}