{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,27]],"date-time":"2026-02-27T04:28:55Z","timestamp":1772166535052,"version":"3.50.1"},"reference-count":43,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:00:00Z","timestamp":1756425600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T00:00:00Z","timestamp":1756425600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"European Research Council","award":["948770"],"award-info":[{"award-number":["948770"]}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Cheminform"],"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>Natural products provide a rich source of bioactive molecules for a variety of applications. Molecular fingerprints are the tool of choice for systematic large-scale studies of their structures. However, current molecular fingerprints insufficiently represent characteristic features of natural products inherently, decreasing the interpretability of natural product-specific predictions. Here, we show that a natural product-specific molecular fingerprint based on a relatively small set of selected biosynthetic building blocks provides more interpretable predictions of biosynthetic distance and natural product classification. Our fingerprint Biosynfoni outperforms MACCS, Morgan, and Daylight-like fingerprints in biosynthetic distance estimation, using 39 substructure keys. Moreover, Biosynfoni\u2019s design, compactness, and concrete substructure definition allow easy visualisation of the detected substructures and their respective biosynthetic pathway origins. Through Biosynfoni, users can gain more insights from predictions and better examine the importance of features within machine learning models. Our results show that a short fingerprint consisting of biologically significant building blocks performs on par with top-performing molecular fingerprints for natural product classification while improving prediction explainability.<\/jats:p>","DOI":"10.1186\/s13321-025-01081-6","type":"journal-article","created":{"date-parts":[[2025,8,29]],"date-time":"2025-08-29T10:02:38Z","timestamp":1756461758000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Biosynfoni: a biosynthesis-informed and interpretable lightweight molecular fingerprint"],"prefix":"10.1186","volume":"17","author":[{"given":"Lucina-May","family":"Nollen","sequence":"first","affiliation":[]},{"given":"David","family":"Meijer","sequence":"additional","affiliation":[]},{"given":"Maria","family":"Sorokina","sequence":"additional","affiliation":[]},{"given":"Justin J. J.","family":"van der Hooft","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2025,8,29]]},"reference":[{"key":"1081_CR1","doi-asserted-by":"publisher","unstructured":"Ertl P, Schuffenhauer A (2008) Cheminformatics analysis of natural products: Lessons from nature inspiring the design of new drugs. In: Petersen F, Amstutz R (eds) Natural Compounds as Drugs, vol\u00a066. Birkh\u00e4user Basel, Basel, p 217\u2013235, https:\/\/doi.org\/10.1007\/978-3-7643-8595-8_4","DOI":"10.1007\/978-3-7643-8595-8_4"},{"key":"1081_CR2","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1708560114","author":"MA Skinnider","year":"2017","unstructured":"Skinnider MA, Magarvey NA (2017) Statistical reanalysis of natural products reveals increasing chemical diversity. Proc Natl Acad Sci. https:\/\/doi.org\/10.1073\/pnas.1708560114","journal-title":"Proc Natl Acad Sci"},{"key":"1081_CR3","doi-asserted-by":"publisher","DOI":"10.1515\/psr-2018-0100","author":"BM Abegaz","year":"2019","unstructured":"Abegaz BM, Kinfe HH (2019) Secondary metabolites, their structural diversity, bioactivity, and ecological functions: an overview. Phys Sci Rev. https:\/\/doi.org\/10.1515\/psr-2018-0100","journal-title":"Phys Sci Rev"},{"issue":"9","key":"1081_CR4","doi-asserted-by":"publisher","first-page":"1295","DOI":"10.1039\/C9NP00027E","volume":"36","author":"M Adamek","year":"2019","unstructured":"Adamek M, Alanjary M, Ziemert N (2019) Applied evolution: phylogeny-based approaches in natural products research. Nat Prod Rep 36(9):1295\u20131312. https:\/\/doi.org\/10.1039\/C9NP00027E","journal-title":"Nat Prod Rep"},{"key":"1081_CR5","doi-asserted-by":"publisher","unstructured":"Springob K, Kutchan TM (2009) Introduction to the Different Classes of Natural Products. In: Osbourn AE, Lanzotti V (eds) Plant-derived Natural Products. Springer US, New York, NY, p 3\u201350, https:\/\/doi.org\/10.1007\/978-0-387-85498-4_1","DOI":"10.1007\/978-0-387-85498-4_1"},{"issue":"12","key":"1081_CR6","doi-asserted-by":"publisher","DOI":"10.1016\/j.heliyon.2021.e08436","volume":"7","author":"Y Kusumawati","year":"2021","unstructured":"Kusumawati Y, Hutama AS, Wellia DV et al (2021) Natural resources for dye-sensitized solar cells. Heliyon 7(12):e08436. https:\/\/doi.org\/10.1016\/j.heliyon.2021.e08436","journal-title":"Heliyon"},{"issue":"1","key":"1081_CR7","first-page":"2522","volume":"2","author":"SM Lundberg","year":"2020","unstructured":"Lundberg SM, Erion G, Chen H et al (2020) From local explanations to global understanding with explainable ai for trees. Nature Mach Intell 2(1):2522\u20135839","journal-title":"Nature Mach Intell"},{"key":"1081_CR8","doi-asserted-by":"publisher","first-page":"232","DOI":"10.1016\/j.copbio.2021.02.001","volume":"69","author":"SC Heard","year":"2021","unstructured":"Heard SC, Wu G, Winter JM (2021) Antifungal natural products. Curr Opin Biotechnol 69:232\u2013241. https:\/\/doi.org\/10.1016\/j.copbio.2021.02.001","journal-title":"Curr Opin Biotechnol"},{"issue":"8","key":"1081_CR9","doi-asserted-by":"publisher","first-page":"689","DOI":"10.1016\/j.tips.2016.05.001","volume":"37","author":"MG Moloney","year":"2016","unstructured":"Moloney MG (2016) Natural Products as a Source for Novel Antibiotics. Trends Pharmacol Sci 37(8):689\u2013701. https:\/\/doi.org\/10.1016\/j.tips.2016.05.001","journal-title":"Trends Pharmacol Sci"},{"issue":"2","key":"1081_CR10","doi-asserted-by":"publisher","first-page":"204","DOI":"10.1016\/j.drudis.2015.01.009","volume":"21","author":"E Patridge","year":"2016","unstructured":"Patridge E, Gareiss P, Kinch MS et al (2016) An analysis of FDA-approved drugs: natural products and their derivatives. Drug Discovery Today 21(2):204\u2013207. https:\/\/doi.org\/10.1016\/j.drudis.2015.01.009","journal-title":"Drug Discovery Today"},{"issue":"1","key":"1081_CR11","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1021\/acs.jcim.7b00425","volume":"58","author":"D Probst","year":"2018","unstructured":"Probst D, Reymond JL (2018) Smilesdrawer: parsing and drawing smiles-encoded molecular structures using client-side javascript. J Chem Inf Model 58(1):1\u20137","journal-title":"J Chem Inf Model"},{"key":"1081_CR12","doi-asserted-by":"publisher","DOI":"10.3389\/fphar.2022.975079","volume":"13","author":"J Ribeiro-Filho","year":"2022","unstructured":"Ribeiro-Filho J, Teles YCF, Igoli JO et al (2022) Editorial: new trends in natural product research for inflammatory and infectious diseases. Front Pharmacol 13:975079. https:\/\/doi.org\/10.3389\/fphar.2022.975079","journal-title":"Front Pharmacol"},{"key":"1081_CR13","doi-asserted-by":"publisher","DOI":"10.4081\/ija.2021.1851","author":"K Godlewska","year":"2021","unstructured":"Godlewska K, Ronga D, Michalak I (2021) Plant extracts-importance in sustainable agriculture. Ital J Agron. https:\/\/doi.org\/10.4081\/ija.2021.1851","journal-title":"Ital J Agron"},{"key":"1081_CR14","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-3024-1","volume-title":"Natural Bioactive Products in Sustainable Agriculture","author":"J Singh","year":"2020","unstructured":"Singh J, Yadav AN (2020) Natural Bioactive Products in Sustainable Agriculture. Springer Singapore, Singapore. https:\/\/doi.org\/10.1007\/978-981-15-3024-1"},{"issue":"D1","key":"1081_CR15","doi-asserted-by":"publisher","first-page":"D634","DOI":"10.1093\/nar\/gkae1063","volume":"53","author":"V Chandrasekhar","year":"2025","unstructured":"Chandrasekhar V, Rajan K, Kanakam S et al (2025) COCONUT 2.0: a comprehensive overhaul and curation of the collection of open natural products database. Nucleic Acids Res 53(D1):D634\u2013D643. https:\/\/doi.org\/10.1093\/nar\/gkae1063","journal-title":"Nucleic Acids Res"},{"issue":"2\u20133","key":"1081_CR16","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1007\/s10295-015-1723-5","volume":"43","author":"L Katz","year":"2016","unstructured":"Katz L, Baltz RH (2016) Natural product discovery: past, present, and future. J Ind Microbiol Biotechnol 43(2\u20133):155\u2013176. https:\/\/doi.org\/10.1007\/s10295-015-1723-5","journal-title":"J Ind Microbiol Biotechnol"},{"issue":"6","key":"1081_CR17","doi-asserted-by":"publisher","first-page":"1204","DOI":"10.1016\/j.chempr.2020.05.002","volume":"6","author":"L Pattanaik","year":"2020","unstructured":"Pattanaik L, Coley CW (2020) Molecular Representation: going Long on Fingerprints. Chem 6(6):1204\u20131207. https:\/\/doi.org\/10.1016\/j.chempr.2020.05.002","journal-title":"Chem"},{"key":"1081_CR18","doi-asserted-by":"publisher","DOI":"10.1002\/wcms.1603","author":"DS Wigh","year":"2022","unstructured":"Wigh DS, Goodman JM, Lapkin AA (2022) A review of molecular representation in the age of machine learning. WIREs Comput Mol Sci. https:\/\/doi.org\/10.1002\/wcms.1603","journal-title":"WIREs Comput Mol Sci"},{"issue":"15","key":"1081_CR19","doi-asserted-by":"publisher","first-page":"7298","DOI":"10.1073\/pnas.1818877116","volume":"116","author":"N Hadadi","year":"2019","unstructured":"Hadadi N, MohammadiPeyhani H, Miskovic L et al (2019) Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc Natl Acad Sci 116(15):7298\u20137307. https:\/\/doi.org\/10.1073\/pnas.1818877116","journal-title":"Proc Natl Acad Sci"},{"issue":"1","key":"1081_CR20","doi-asserted-by":"publisher","first-page":"157","DOI":"10.1021\/acssynbio.9b00447","volume":"9","author":"M Koch","year":"2020","unstructured":"Koch M, Duigou T, Faulon JL (2020) Reinforcement Learning for Bioretrosynthesis. ACS Synth Biol 9(1):157\u2013168. https:\/\/doi.org\/10.1021\/acssynbio.9b00447","journal-title":"ACS Synth Biol"},{"key":"1081_CR21","doi-asserted-by":"publisher","unstructured":"Landrum G, Tosco P, Kelley B, et\u00a0al (2023) rdkit\/rdkit: 2023_03_3 (Q1 2023) Release. https:\/\/doi.org\/10.5281\/ZENODO.8254217","DOI":"10.5281\/ZENODO.8254217"},{"issue":"11","key":"1081_CR22","doi-asserted-by":"publisher","first-page":"2795","DOI":"10.1021\/acs.jnatprod.1c00399","volume":"84","author":"HW Kim","year":"2021","unstructured":"Kim HW, Wang M, Leber CA et al (2021) NPClassifier: a deep neural network-based structural classification tool for natural products. J Nat Prod 84(11):2795\u20132807. https:\/\/doi.org\/10.1021\/acs.jnatprod.1c00399","journal-title":"J Nat Prod"},{"issue":"1","key":"1081_CR23","doi-asserted-by":"publisher","first-page":"82","DOI":"10.1186\/s13321-021-00559-3","volume":"13","author":"A Capecchi","year":"2021","unstructured":"Capecchi A, Reymond JL (2021) Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning. J Cheminform 13(1):82. https:\/\/doi.org\/10.1186\/s13321-021-00559-3","journal-title":"J Cheminform"},{"issue":"1","key":"1081_CR24","doi-asserted-by":"publisher","first-page":"1760","DOI":"10.1038\/s41467-021-22022-5","volume":"12","author":"J Hafner","year":"2021","unstructured":"Hafner J, Payne J, MohammadiPeyhani H et al (2021) A computational workflow for the expansion of heterologous biosynthetic pathways to natural product derivatives. Nat Commun 12(1):1760. https:\/\/doi.org\/10.1038\/s41467-021-22022-5","journal-title":"Nat Commun"},{"key":"1081_CR25","doi-asserted-by":"publisher","unstructured":"Kotera M (2018) Physicochemical Property Labels as Molecular Descriptors for Improved Analysis of Compound-Protein and Compound-Compound Networks. In: Brown J (ed) Computational Chemogenomics, vol 1825. Springer New York, New York, NY, p 211\u2013225, https:\/\/doi.org\/10.1007\/978-1-4939-8639-2_6, series Title: Methods in Molecular Biology","DOI":"10.1007\/978-1-4939-8639-2_6"},{"issue":"9","key":"1081_CR26","doi-asserted-by":"publisher","first-page":"639","DOI":"10.1038\/nchembio.1884","volume":"11","author":"MH Medema","year":"2015","unstructured":"Medema MH, Fischbach MA (2015) Computational approaches to natural product discovery. Nat Chem Biol 11(9):639\u2013648. https:\/\/doi.org\/10.1038\/nchembio.1884","journal-title":"Nat Chem Biol"},{"issue":"7","key":"1081_CR27","doi-asserted-by":"publisher","first-page":"445","DOI":"10.1038\/nchembio.580","volume":"7","author":"H Yim","year":"2011","unstructured":"Yim H, Haselbeck R, Niu W et al (2011) Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat Chem Biol 7(7):445\u2013452. https:\/\/doi.org\/10.1038\/nchembio.580","journal-title":"Nat Chem Biol"},{"key":"1081_CR28","doi-asserted-by":"publisher","DOI":"10.1002\/9780470742761","volume-title":"Medicinal Natural Products: a Biosynthetic Approach","author":"PM Dewick","year":"2009","unstructured":"Dewick PM (2009) Medicinal Natural Products: a Biosynthetic Approach, 1st edn. Wiley, Hoboken. https:\/\/doi.org\/10.1002\/9780470742761","edition":"1"},{"key":"1081_CR29","unstructured":"Walsh C, Tang Y (2017) Natural product biosynthesis: chemical logic and enzymatic machinery. Royal Society of Chemistry, London, oCLC: ocn966394346"},{"issue":"11","key":"1081_CR30","first-page":"2579","volume":"9","author":"LV Maaten","year":"2008","unstructured":"Maaten LV, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11):2579\u20132605","journal-title":"J Mach Learn Res"},{"key":"1081_CR31","doi-asserted-by":"publisher","unstructured":"McInnes L, Healy J, Melville J (2018) UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. https:\/\/doi.org\/10.48550\/ARXIV.1802.03426, publisher: arXiv Version Number: 3","DOI":"10.48550\/ARXIV.1802.03426"},{"key":"1081_CR32","volume-title":"Python 3 Reference Manual","author":"G Van Rossum","year":"2009","unstructured":"Van Rossum G, Drake FL (2009) Python 3 Reference Manual. CreateSpace, Scotts Valley"},{"issue":"1","key":"1081_CR33","doi-asserted-by":"publisher","first-page":"2","DOI":"10.1186\/s13321-020-00478-9","volume":"13","author":"M Sorokina","year":"2021","unstructured":"Sorokina M, Merseburger P, Rajan K et al (2021) COCONUT online: collection of Open Natural Products database. J Cheminform 13(1):2. https:\/\/doi.org\/10.1186\/s13321-020-00478-9","journal-title":"J Cheminform"},{"issue":"D1","key":"1081_CR34","doi-asserted-by":"publisher","first-page":"D1214","DOI":"10.1093\/nar\/gkv1031","volume":"44","author":"J Hastings","year":"2016","unstructured":"Hastings J, Owen G, Dekker A et al (2016) ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res 44(D1):D1214\u2013D1219. https:\/\/doi.org\/10.1093\/nar\/gkv1031","journal-title":"Nucleic Acids Res"},{"issue":"4","key":"1081_CR35","doi-asserted-by":"publisher","first-page":"4","DOI":"10.1145\/2757001.2757003","volume":"1","author":"MA Musen","year":"2015","unstructured":"Musen MA (2015) The prot\u00e9g\u00e9 project: a look back and a look forward. AI Matters 1(4):4\u201312. https:\/\/doi.org\/10.1145\/2757001.2757003","journal-title":"AI Matters"},{"issue":"D1","key":"1081_CR36","doi-asserted-by":"publisher","first-page":"D445","DOI":"10.1093\/nar\/gkz862","volume":"48","author":"R Caspi","year":"2020","unstructured":"Caspi R, Billington R, Keseler IM et al (2020) The MetaCyc database of metabolic pathways and enzymes\u2014a 2019 update. Nucleic Acids Res 48(D1):D445\u2013D453. https:\/\/doi.org\/10.1093\/nar\/gkz862","journal-title":"Nucleic Acids Res"},{"key":"1081_CR37","doi-asserted-by":"publisher","unstructured":"Sorokina M, Steinbeck C (2019) NaPLeS: NP-likeness Scorer Database, language: en. https:\/\/doi.org\/10.5281\/ZENODO.2652372","DOI":"10.5281\/ZENODO.2652372"},{"key":"1081_CR38","doi-asserted-by":"publisher","DOI":"10.3389\/fchem.2021.650569","volume":"9","author":"Q Xu","year":"2021","unstructured":"Xu Q, Deng H, Li X et al (2021) Application of Amino Acids in the Structural Modification of Natural Products: a Review. Front Chem 9:650569. https:\/\/doi.org\/10.3389\/fchem.2021.650569","journal-title":"Front Chem"},{"issue":"2","key":"1081_CR39","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1021\/c160017a018","volume":"5","author":"HL Morgan","year":"1965","unstructured":"Morgan HL (1965) The Generation of a Unique Machine Description for Chemical Structures\u2014a Technique Developed at Chemical Abstracts Service. J Chem Doc 5(2):107\u2013113. https:\/\/doi.org\/10.1021\/c160017a018","journal-title":"J Chem Doc"},{"issue":"6","key":"1081_CR40","doi-asserted-by":"publisher","first-page":"1273","DOI":"10.1021\/ci010132r","volume":"42","author":"JL Durant","year":"2002","unstructured":"Durant JL, Leland BA, Henry DR et al (2002) Reoptimization of MDL Keys for Use in Drug Discovery. J Chem Inf Comput Sci 42(6):1273\u20131280. https:\/\/doi.org\/10.1021\/ci010132r","journal-title":"J Chem Inf Comput Sci"},{"issue":"1","key":"1081_CR41","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/s13321-015-0069-3","volume":"7","author":"D Bajusz","year":"2015","unstructured":"Bajusz D, R\u00e1cz A, H\u00e9berger K (2015) Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations? J Cheminform 7(1):20. https:\/\/doi.org\/10.1186\/s13321-015-0069-3","journal-title":"J Cheminform"},{"key":"1081_CR42","unstructured":"Tanimoto TT (1958) An elementary mathematical theory of classification and prediction. International Business Machines Corporation New York, New York, section: 10 pages 28 cm"},{"issue":"7","key":"1081_CR43","doi-asserted-by":"publisher","DOI":"10.1002\/cmtd.202200005","volume":"2","author":"M Cihan Sorkun","year":"2022","unstructured":"Cihan Sorkun M, Mullaj D, Koelman JMVA et al (2022) ChemPlot, a Python Library for Chemical Space Visualization**. Chemi Methods 2(7):e202200005. https:\/\/doi.org\/10.1002\/cmtd.202200005","journal-title":"Chemi Methods"}],"container-title":["Journal of Cheminformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01081-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/s13321-025-01081-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/s13321-025-01081-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,9,10]],"date-time":"2025-09-10T05:19:06Z","timestamp":1757481546000},"score":1,"resource":{"primary":{"URL":"https:\/\/jcheminf.biomedcentral.com\/articles\/10.1186\/s13321-025-01081-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,8,29]]},"references-count":43,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2025,12]]}},"alternative-id":["1081"],"URL":"https:\/\/doi.org\/10.1186\/s13321-025-01081-6","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2025-cwq74","asserted-by":"object"}]},"ISSN":["1758-2946"],"issn-type":[{"value":"1758-2946","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,8,29]]},"assertion":[{"value":"10 February 2025","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"11 August 2025","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"29 August 2025","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"J.J.J.vdH. is currently member of the Scientific Advisory Board of NAICONS Srl., Milano, Italy, and consults for Corteva Agriscience, Indianapolis, IN, USA. The other authors declare to have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Competing interests"}}],"article-number":"136"}}