{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,1]],"date-time":"2026-04-01T19:01:44Z","timestamp":1775070104198,"version":"3.50.1"},"reference-count":56,"publisher":"Oxford University Press (OUP)","issue":"12","license":[{"start":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T00:00:00Z","timestamp":1763683200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"National Institutes of Health, National Institute of General Medical Sciences","award":["R35GM146987"],"award-info":[{"award-number":["R35GM146987"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,12,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Motivation<\/jats:title>\n                    <jats:p>Natural products are often produced by a set of biosynthetic enzymes that are encoded by genes clustered together in the producer\u2019s genome, referred to as a biosynthetic gene cluster (BGC). The ability to compare and cluster BGCs is essential for several applications, including predicting which bacteria will make a known product and assessing the potential diversity of natural products produced by a set of bacteria. There are multiple methods for comparing and clustering BGCs based on their similarity, but there has been a lack of investigation into how strongly BGC similarity relates to product structural similarity and how these methods perform relative to each other.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Results<\/jats:title>\n                    <jats:p>Using publicly available databases, we developed a benchmark dataset to assess how well different BGC similarity metrics correlate with the structural similarity of their products and how well these methods cluster BGCs. We found that all methods showed moderate correlation between BGC and structural similarity, with correlations improving for more similar BGCs and varying significantly by BGC biosynthetic class. Analysis of outliers revealed some outliers were due to mistakes or omissions in public datasets, while others represented deviation between BGC similarity and product structural similarity. All methods generally performed better on clustering metrics, with BiG-SCAPE performing the best after errors in the public datasets had been corrected.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>Scripts and data required to reproduce the results are available at https:\/\/github.com\/aswalker-lab\/BGC-clustering-benchmark and processed similarity, clusters, and scaffolds are also available at https:\/\/huggingface.co\/datasets\/allie-walker\/BGC-clustering-benchmark. Code is also available at Zenodo: 10.5281\/zenodo.17373546<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf636","type":"journal-article","created":{"date-parts":[[2025,11,21]],"date-time":"2025-11-21T13:19:58Z","timestamp":1763731198000},"source":"Crossref","is-referenced-by-count":1,"title":["Benchmarking methods for measuring biosynthetic gene cluster similarity and determination of gene cluster families"],"prefix":"10.1093","volume":"41","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-9528-5288","authenticated-orcid":false,"given":"Abiodun S","family":"Oyedele","sequence":"first","affiliation":[{"name":"Department of Chemistry, Vanderbilt University , Nashville, TN 37240,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5666-7232","authenticated-orcid":false,"given":"Allison S","family":"Walker","sequence":"additional","affiliation":[{"name":"Department of Chemistry, Vanderbilt University , Nashville, TN 37240,","place":["United States"]},{"name":"Department of Biological Sciences, Vanderbilt University , Nashville, TN 37235,","place":["United States"]},{"name":"Department of Pathology, Microbiology, and Immunology, Vanderbilt University Medical Center , Nashville, TN 37232,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,11,21]]},"reference":[{"key":"2025121319013999100_btaf636-B1","doi-asserted-by":"publisher","author":"Adduri","year":"2025","DOI":"10.1101\/2025.01.13.632878,"},{"key":"2025121319013999100_btaf636-B2","doi-asserted-by":"crossref","first-page":"3984","DOI":"10.1093\/bioinformatics\/btac420","article-title":"Improving candidate biosynthetic gene clusters in fungi through reinforcement learning","volume":"38","author":"Almeida","year":"2022","journal-title":"Bioinformatics"},{"key":"2025121319013999100_btaf636-B3","doi-asserted-by":"crossref","first-page":"1543","DOI":"10.1039\/D4NP00009A","article-title":"Advances, opportunities, and challenges in methods for interrogating the structure activity relationships of natural products","volume":"41","author":"Ancajas","year":"2024","journal-title":"Nat Prod Rep"},{"key":"2025121319013999100_btaf636-B4","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1038\/s41573-020-00114-z","article-title":"Natural products in drug discovery: advances and opportunities","volume":"20","author":"Atanasov","year":"2021","journal-title":"Nat Rev Drug Discov"},{"key":"2025121319013999100_btaf636-B5","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1186\/s13321-015-0069-3","article-title":"Why is tanimoto index an appropriate choice for fingerprint-based similarity calculations?","volume":"7","author":"Bajusz","year":"2015","journal-title":"J Cheminform"},{"key":"2025121319013999100_btaf636-B6","doi-asserted-by":"crossref","first-page":"2887","DOI":"10.1021\/jm9602928","article-title":"The properties of known drugs. 1. Molecular frameworks","volume":"39","author":"Bemis","year":"1996","journal-title":"J Med Chem"},{"key":"2025121319013999100_btaf636-B7","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1016\/j.tig.2024.11.013","article-title":"Genomic language models: opportunities and challenges","volume":"41","author":"Benegas","year":"2025","journal-title":"Trends Genet"},{"key":"2025121319013999100_btaf636-B8","doi-asserted-by":"crossref","first-page":"W204","DOI":"10.1093\/nar\/gkt449","article-title":"antiSMASH 2.0\u2014a versatile platform for genome mining of secondary metabolite producers","volume":"41","author":"Blin","year":"2013","journal-title":"Nucleic Acids Res"},{"key":"2025121319013999100_btaf636-B9","doi-asserted-by":"crossref","first-page":"W46","DOI":"10.1093\/nar\/gkad344","article-title":"antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation","volume":"51","author":"Blin","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025121319013999100_btaf636-B10","doi-asserted-by":"crossref","first-page":"747","DOI":"10.1021\/ci9803381","article-title":"Unsupervised data base clustering based on daylight\u2019s fingerprint and tanimoto similarity: a fast and automated way to cluster small and large data sets","volume":"39","author":"Butina","year":"1999","journal-title":"J Chem Inf Comput Sci"},{"key":"2025121319013999100_btaf636-B11","volume-title":"Nucleic Acids Res","author":"Cediel-Becerra","year":"2025"},{"key":"2025121319013999100_btaf636-B12","doi-asserted-by":"crossref","first-page":"2478","DOI":"10.1073\/pnas.1218073110","article-title":"Discovery of indolotryptoline antiproliferative agents by homology-guided metagenomic screening","volume":"110","author":"Chang","year":"2013","journal-title":"Proc Natl Acad Sci USA"},{"key":"2025121319013999100_btaf636-B13","doi-asserted-by":"crossref","first-page":"17906","DOI":"10.1021\/ja408683p","article-title":"Discovery and synthetic refactoring of tryptophan dimer gene clusters from the environment","volume":"135","author":"Chang","year":"2013","journal-title":"J Am Chem Soc"},{"key":"2025121319013999100_btaf636-B14","doi-asserted-by":"crossref","first-page":"6044","DOI":"10.1021\/jacs.5b01968","article-title":"Targeted metagenomics: finding rare tryptophan dimer natural products in the environment","volume":"137","author":"Chang","year":"2015","journal-title":"J Am Chem Soc"},{"key":"2025121319013999100_btaf636-B15","doi-asserted-by":"crossref","first-page":"3202","DOI":"10.1093\/bioinformatics\/btx400","article-title":"SANDPUMA: ensemble predictions of nonribosomal peptide chemistry reveal biosynthetic diversity across actinobacteria","volume":"33","author":"Chevrette","year":"2017","journal-title":"Bioinformatics"},{"key":"2025121319013999100_btaf636-B16","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1007\/s10295-018-2085-6","article-title":"Emerging evolutionary paradigms in antibiotic discovery","volume":"46","author":"Chevrette","year":"2019","journal-title":"J Ind Microbiol Biotechnol"},{"key":"2025121319013999100_btaf636-B17","doi-asserted-by":"crossref","first-page":"346","DOI":"10.1038\/s42256-025-01007-9","article-title":"Transformers and genome language models","volume":"7","author":"Consens","year":"2025","journal-title":"Nat Mach Intell"},{"key":"2025121319013999100_btaf636-B18","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1099\/00221287-146-1-147","article-title":"Two new tailoring enzymes, a glycosyltransferase and an oxygenase, involved in biosynthesis of the angucycline antibiotic urdamycin A in Streptomyces fradiae Tu2717","volume":"146 ( Pt 1)","author":"Faust","year":"2000","journal-title":"Microbiology (Reading)"},{"key":"2025121319013999100_btaf636-B19","doi-asserted-by":"crossref","first-page":"520","DOI":"10.1016\/j.mib.2009.07.002","article-title":"Antibiotics from microbes: converging to kill","volume":"12","author":"Fischbach","year":"2009","journal-title":"Curr Opin Microbiol"},{"key":"2025121319013999100_btaf636-B20","doi-asserted-by":"crossref","first-page":"726","DOI":"10.1038\/s41564-022-01110-2","article-title":"Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes","volume":"7","author":"Gavriilidou","year":"2022","journal-title":"Nat Microbiol"},{"key":"2025121319013999100_btaf636-B21","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1016\/j.ijfoodmicro.2017.12.028","article-title":"Description of an orthologous cluster of ochratoxin a biosynthetic genes in Aspergillus and Penicillium species. A comparative analysis","volume":"268","author":"Gil-Serna","year":"2018","journal-title":"Int J Food Microbiol"},{"key":"2025121319013999100_btaf636-B22","doi-asserted-by":"crossref","first-page":"78","DOI":"10.3390\/md14040078","article-title":"Next generation sequencing of actinobacteria for the discovery of novel natural products","volume":"14","author":"Gomez-Escribano","year":"2016","journal-title":"Mar Drugs"},{"key":"2025121319013999100_btaf636-B23","doi-asserted-by":"crossref","first-page":"nwaf028","DOI":"10.1093\/nsr\/nwaf028","article-title":"Foundation models in bioinformatics","volume":"12","author":"Guo","year":"2025","journal-title":"Natl Sci Rev"},{"key":"2025121319013999100_btaf636-B24","doi-asserted-by":"crossref","first-page":"813","DOI":"10.1038\/s41589-019-0313-7","article-title":"Automated structure prediction of trans-acyltransferase polyketide synthase products","volume":"15","author":"Helfrich","year":"2019","journal-title":"Nat Chem Biol"},{"key":"2025121319013999100_btaf636-B25","doi-asserted-by":"crossref","first-page":"2783","DOI":"10.1021\/acssynbio.8b00392","article-title":"Biosynthesis of novel statins by combining heterologous genes from Xylaria and Aspergillus","volume":"7","author":"Itoh","year":"2018","journal-title":"ACS Synth Biol"},{"key":"2025121319013999100_btaf636-B26","doi-asserted-by":"crossref","first-page":"102214","DOI":"10.1016\/j.cbpa.2022.102214","article-title":"Convergent and divergent biosynthetic strategies towards phosphonic acid natural products","volume":"71","author":"Ju","year":"2022","journal-title":"Curr Opin Chem Biol"},{"key":"2025121319013999100_btaf636-B27","doi-asserted-by":"publisher","author":"Kang","year":"2025","DOI":"10.1101\/2025.04.29.651206,"},{"key":"2025121319013999100_btaf636-B28","doi-asserted-by":"crossref","first-page":"giaa154","DOI":"10.1093\/gigascience\/giaa154","article-title":"BiG-SLiCE: a highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters","volume":"10","author":"Kautsar","year":"2021","journal-title":"Gigascience"},{"key":"2025121319013999100_btaf636-B29","doi-asserted-by":"crossref","first-page":"430","DOI":"10.1016\/j.isci.2019.11.037","article-title":"Divergent biosynthesis of C-nucleoside minimycin and indigoidine in bacteria","volume":"22","author":"Kong","year":"2019","journal-title":"iScience"},{"key":"2025121319013999100_btaf636-B30","doi-asserted-by":"crossref","first-page":"gkaf305","DOI":"10.1093\/nar\/gkaf305","article-title":"Deciphering the biosynthetic potential of microbial genomes using a BGC language processing neural network model","volume":"53","author":"Lai","year":"2025","journal-title":"Nucleic Acids Res"},{"key":"2025121319013999100_btaf636-B31","doi-asserted-by":"crossref","first-page":"e202100484","DOI":"10.1002\/cbic.202100484","article-title":"Predictive engineering of class I terpene synthases using experimental and computational approaches","volume":"23","author":"Leferink","year":"2022","journal-title":"Chembiochem"},{"key":"2025121319013999100_btaf636-B32","doi-asserted-by":"publisher","author":"Liu","year":"2025","DOI":"10.1101\/2025.05.31.656985,"},{"key":"2025121319013999100_btaf636-B33","doi-asserted-by":"crossref","first-page":"e1004016","DOI":"10.1371\/journal.pcbi.1004016","article-title":"A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis","volume":"10","author":"Medema","year":"2014","journal-title":"PLoS Comput Biol"},{"key":"2025121319013999100_btaf636-B34","doi-asserted-by":"crossref","first-page":"1218","DOI":"10.1093\/molbev\/mst025","article-title":"Detecting sequence homology at the gene cluster level with MultiGeneBlast","volume":"30","author":"Medema","year":"2013","journal-title":"Mol Biol Evol"},{"key":"2025121319013999100_btaf636-B35","doi-asserted-by":"crossref","first-page":"1216","DOI":"10.1021\/acs.jnatprod.0c01291","article-title":"Frankobactin metallophores produced by nitrogen-fixing Frankia actinobacteria function in toxic metal sequestration","volume":"84","author":"Mohr","year":"2021","journal-title":"J Nat Prod"},{"key":"2025121319013999100_btaf636-B36","doi-asserted-by":"crossref","first-page":"6147","DOI":"10.1039\/D1OB00600B","article-title":"Chlorinated metabolites from Streptomyces sp. highlight the role of biosynthetic mosaics and superclusters in the evolution of chemical diversity","volume":"19","author":"Morshed","year":"2021","journal-title":"Org Biomol Chem"},{"key":"2025121319013999100_btaf636-B37","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1038\/s41589-019-0400-9","article-title":"A computational framework to explore large-scale biosynthetic diversity","volume":"16","author":"Navarro-Mu\u00f1oz","year":"2020","journal-title":"Nat Chem Biol"},{"key":"2025121319013999100_btaf636-B38","doi-asserted-by":"crossref","first-page":"770","DOI":"10.1021\/acs.jnatprod.9b01285","article-title":"Natural products as sources of new drugs over the nearly four decades from 01\/1981 to 09\/2019","volume":"83","author":"Newman","year":"2020","journal-title":"J Nat Prod"},{"key":"2025121319013999100_btaf636-B39","doi-asserted-by":"crossref","first-page":"eado9336","DOI":"10.1126\/science.ado9336","article-title":"Sequence modeling and design from molecular to genome scale with evo","volume":"386","author":"Nguyen","year":"2024","journal-title":"Science"},{"key":"2025121319013999100_btaf636-B40","doi-asserted-by":"crossref","first-page":"5478","DOI":"10.1093\/nar\/gkae314","article-title":"BGCFlow: systematic pangenome workflow for the analysis of biosynthetic gene clusters across large genomic datasets","volume":"52","author":"Nuhamunada","year":"2024","journal-title":"Nucleic Acids Res"},{"key":"2025121319013999100_btaf636-B41","doi-asserted-by":"crossref","first-page":"2385","DOI":"10.1002\/cbic.201500317","article-title":"Identification and characterization of the streptazone E biosynthetic gene cluster in streptomyces sp. MSC090213JE08","volume":"16","author":"Ohno","year":"2015","journal-title":"Chembiochem"},{"key":"2025121319013999100_btaf636-B42","doi-asserted-by":"crossref","first-page":"1049","DOI":"10.1038\/s41589-019-0343-1","article-title":"Fosmidomycin biosynthesis diverges from related phosphonate natural products","volume":"15","author":"Parkinson","year":"2019","journal-title":"Nat Chem Biol"},{"key":"2025121319013999100_btaf636-B43","doi-asserted-by":"crossref","first-page":"943","DOI":"10.1016\/S1074-5521(00)00044-2","article-title":"Cloning, sequencing and analysis of the enterocin biosynthesis gene cluster from the marine isolate \u2018Streptomyces maritimus\u2019: evidence for the derailment of an aromatic polyketide synthase","volume":"7","author":"Piel","year":"2000","journal-title":"Chem Biol"},{"key":"2025121319013999100_btaf636-B44","doi-asserted-by":"crossref","first-page":"e1011162","DOI":"10.1371\/journal.pcbi.1011162","article-title":"Deep self-supervised learning for biosynthetic gene cluster detection and product classification","volume":"19","author":"Rios-Martinez","year":"2023","journal-title":"PLoS Comput Biol"},{"key":"2025121319013999100_btaf636-B45","first-page":"mgen000988","article-title":"Evolutionary investigations of the biosynthetic diversity in the skin microbiome using lsaBGC","volume":"9","author":"Salamzade","year":"2023","journal-title":"Microb Genom"},{"key":"2025121319013999100_btaf636-B46","volume-title":"Angew Chem Int Ed","author":"Sokolova","year":"2025"},{"key":"2025121319013999100_btaf636-B47","doi-asserted-by":"crossref","first-page":"D603","DOI":"10.1093\/nar\/gkac1049","article-title":"MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters","volume":"51","author":"Terlouw","year":"2023","journal-title":"Nucleic Acids Res"},{"key":"2025121319013999100_btaf636-B48","doi-asserted-by":"publisher","author":"Terlouw","year":"2025","DOI":"10.1101\/2025.01.08.631717,"},{"key":"2025121319013999100_btaf636-B49","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1186\/s12859-023-05311-2","article-title":"CAGECAT: the CompArative GEne cluster analysis toolbox for rapid search and visualisation of homologous gene clusters","volume":"24","author":"van den Belt","year":"2023","journal-title":"BMC Bioinformatics"},{"key":"2025121319013999100_btaf636-B50","doi-asserted-by":"crossref","first-page":"1824","DOI":"10.1021\/acscentsci.9b00806","article-title":"The natural products atlas: an open access knowledge base for microbial natural products discovery","volume":"5","author":"van Santen","year":"2019","journal-title":"ACS Cent Sci"},{"key":"2025121319013999100_btaf636-B51","doi-asserted-by":"crossref","first-page":"91","DOI":"10.1016\/S0378-1097(03)00861-9","article-title":"Expression of the landomycin biosynthetic gene cluster in a PKS mutant of Streptomyces fradiae is dependent on the coexpression of a putative transcriptional activator gene","volume":"230","author":"von Mulert","year":"2004","journal-title":"FEMS Microbiol Lett"},{"key":"2025121319013999100_btaf636-B52","doi-asserted-by":"crossref","first-page":"335","DOI":"10.1016\/S0022-2836(03)00232-8","article-title":"Computational approach for prediction of domain organization and substrate specificity of modular polyketide synthases","volume":"328","author":"Yadav","year":"2003","journal-title":"J Mol Biol"},{"key":"2025121319013999100_btaf636-B53","doi-asserted-by":"crossref","first-page":"194","DOI":"10.3389\/fmicb.2017.00194","article-title":"Identification by genome mining of a type I polyketide gene cluster from Streptomyces argillaceus involved in the biosynthesis of pyridine and piperidine alkaloids argimycins P","volume":"8","author":"Ye","year":"2017","journal-title":"Front Microbiol"},{"key":"2025121319013999100_btaf636-B54","doi-asserted-by":"crossref","first-page":"159","DOI":"10.1007\/s10295-012-1207-9","article-title":"An indigoidine biosynthetic gene cluster from Streptomyces chromofuscus ATCC 49982 contains an unusual IndB homologue","volume":"40","author":"Yu","year":"2013","journal-title":"J Ind Microbiol Biotechnol"},{"key":"2025121319013999100_btaf636-B55","doi-asserted-by":"crossref","first-page":"D678","DOI":"10.1093\/nar\/gkae1115","article-title":"MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration","volume":"53","author":"Zdouc","year":"2025","journal-title":"Nucleic Acids Res"},{"key":"2025121319013999100_btaf636-B56","doi-asserted-by":"publisher","author":"Zhou","year":"2025","DOI":"10.1101\/2025.01.30.635558,"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf636\/65450454\/btaf636.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/12\/btaf636\/65450454\/btaf636.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/12\/btaf636\/65450454\/btaf636.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,12,14]],"date-time":"2025-12-14T00:02:06Z","timestamp":1765670526000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf636\/8339750"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,11,21]]},"references-count":56,"journal-issue":{"issue":"12","published-print":{"date-parts":[[2025,12,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf636","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,12]]},"published":{"date-parts":[[2025,11,21]]},"article-number":"btaf636"}}