{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,19]],"date-time":"2026-03-19T06:52:23Z","timestamp":1773903143085,"version":"3.50.1"},"reference-count":26,"publisher":"MDPI AG","issue":"2","license":[{"start":{"date-parts":[[2023,1,29]],"date-time":"2023-01-29T00:00:00Z","timestamp":1674950400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"FCT (Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia) \/MCTES (Minist\u00e9rio da Ci\u00eancia, Tecnologia e Ensino Superior)","doi-asserted-by":"publisher","award":["UIDB\/50006\/2020"],"award-info":[{"award-number":["UIDB\/50006\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"FCT (Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia) \/MCTES (Minist\u00e9rio da Ci\u00eancia, Tecnologia e Ensino Superior)","doi-asserted-by":"publisher","award":["UIDP\/50006\/2020"],"award-info":[{"award-number":["UIDP\/50006\/2020"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"FCT (Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia) \/MCTES (Minist\u00e9rio da Ci\u00eancia, Tecnologia e Ensino Superior)","doi-asserted-by":"publisher","award":["101016216"],"award-info":[{"award-number":["101016216"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"name":"European Union\u2019s Horizon 2020 Research and Innovation Programme","award":["UIDB\/50006\/2020"],"award-info":[{"award-number":["UIDB\/50006\/2020"]}]},{"name":"European Union\u2019s Horizon 2020 Research and Innovation Programme","award":["UIDP\/50006\/2020"],"award-info":[{"award-number":["UIDP\/50006\/2020"]}]},{"name":"European Union\u2019s Horizon 2020 Research and Innovation Programme","award":["101016216"],"award-info":[{"award-number":["101016216"]}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Pharmaceuticals"],"abstract":"<jats:p>Terpenes are a widespread class of natural products with significant chemical and biological diversity, and many of these molecules have already made their way into medicines. In this work, we employ a data science-based approach to identify, compile, and characterize the diversity of terpenes currently known in a systematic way, in a total of 59,833 molecules. We also employed several methods for the purpose of classifying terpene subclasses using their physicochemical descriptors. Light gradient boosting machine, k-nearest neighbours, random forests, Gaussian na\u00efve Bayes and Multilayer perceptron were tested, with the best-performing algorithms yielding accuracy, F1 score, precision and other metrics all over 0.9, thus showing the capabilities of these approaches for the classification of terpene subclasses. These results can be important for the field of phytochemistry and pharmacognosy, as they allow the prediction of the subclass of novel terpene molecules, even when biosynthetic studies are not available.<\/jats:p>","DOI":"10.3390\/ph16020202","type":"journal-article","created":{"date-parts":[[2023,1,30]],"date-time":"2023-01-30T03:31:11Z","timestamp":1675049471000},"page":"202","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":17,"title":["The Chemical Space of Terpenes: Insights from Data Science and AI"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8962-8985","authenticated-orcid":false,"given":"Morteza","family":"Hosseini","sequence":"first","affiliation":[{"name":"REQUIMTE\/LAQV, Laborat\u00f3rio de Farmacognosia, Departamento de Qu\u00edmica, Faculdade de Farm\u00e1cia, Universidade do Porto, R. Jorge Viterbo Ferreira, 4050-313 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0384-7592","authenticated-orcid":false,"given":"David M.","family":"Pereira","sequence":"additional","affiliation":[{"name":"REQUIMTE\/LAQV, Laborat\u00f3rio de Farmacognosia, Departamento de Qu\u00edmica, Faculdade de Farm\u00e1cia, Universidade do Porto, R. Jorge Viterbo Ferreira, 4050-313 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2023,1,29]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"200","DOI":"10.1038\/s41573-020-00114-z","article-title":"Natural products in drug discovery: Advances and opportunities","volume":"20","author":"Atanasov","year":"2021","journal-title":"Nat. Rev. Drug Discov."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Dewick, P.M. (2002). Medicinal Natural Products: A Biosynthetic Approach, John Wiley & Sons.","DOI":"10.1002\/0470846275"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"2","DOI":"10.1186\/s13321-020-00478-9","article-title":"COCONUT online: Collection of Open Natural Products database","volume":"13","author":"Sorokina","year":"2021","journal-title":"J. Cheminformatics"},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"243","DOI":"10.1517\/13543776.2014.870154","article-title":"Terpenes and derivatives as a new perspective for pain treatment: A patent review","volume":"24","author":"Serafini","year":"2014","journal-title":"Expert Opin. Ther. Pat."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"3667","DOI":"10.1021\/acs.jcim.9b00443","article-title":"Exploring Chemical and Biological Space of Terpenoids","volume":"59","author":"Zeng","year":"2019","journal-title":"J. Chem. Inf. Model."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1186\/s13321-016-0174-y","article-title":"ClassyFire: Automated chemical classification with a comprehensive, computable taxonomy","volume":"8","author":"Eisner","year":"2016","journal-title":"J. Cheminformatics"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"282","DOI":"10.1007\/s13181-013-0307-x","article-title":"Case series: Inhaled coral vapor--toxicity in a tank","volume":"9","author":"Sud","year":"2013","journal-title":"J. Med. Toxicol."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"55","DOI":"10.1186\/s13321-019-0378-z","article-title":"NaPLeS: A natural products likeness scorer\u2014Web application and database","volume":"11","author":"Sorokina","year":"2019","journal-title":"J. Cheminformatics"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"89","DOI":"10.1016\/j.addr.2016.05.007","article-title":"BDDCS, the Rule of 5 and drugability","volume":"101","author":"Benet","year":"2016","journal-title":"Adv. Drug Deliv. Rev."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"475","DOI":"10.1038\/nrd4609","article-title":"An analysis of the attrition of drug candidates from four major pharmaceutical companies","volume":"14","author":"Waring","year":"2015","journal-title":"Nat. Rev. Drug Discov."},{"key":"ref_11","unstructured":"Rosenberg, A., and Hirschberg, J. (2007, January 6). V-measure: A conditional entropy-based external cluster evaluation measure. Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic."},{"key":"ref_12","doi-asserted-by":"crossref","first-page":"846","DOI":"10.1080\/01621459.1971.10482356","article-title":"Objective criteria for the evaluation of clustering methods","volume":"66","author":"Rand","year":"1971","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"53","DOI":"10.1016\/0377-0427(87)90125-7","article-title":"Silhouettes: A graphical aid to the interpretation and validation of cluster analysis","volume":"20","author":"Rousseeuw","year":"1987","journal-title":"J. Comput. Appl. Math."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"906","DOI":"10.1080\/07391102.2015.1060161","article-title":"Identification of new candidate drugs for lung cancer using chemical\u2013chemical interactions, chemical\u2013protein interactions and a K-means clustering algorithm","volume":"34","author":"Lu","year":"2016","journal-title":"J. Biomol. Struct. Dyn."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"104856","DOI":"10.1016\/j.compbiomed.2021.104856","article-title":"Molecular descriptor analysis of approved drugs using unsupervised learning for drug repurposing","volume":"138","author":"Madugula","year":"2021","journal-title":"Comput. Biol. Med."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"995","DOI":"10.1111\/cbdd.13672","article-title":"Common cancer biomarkers of breast and ovarian types identified through artificial intelligence","volume":"96","author":"Pawar","year":"2020","journal-title":"Chem. Biol. Drug Des."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"411","DOI":"10.1016\/S0893-6080(00)00026-5","article-title":"Independent component analysis: Algorithms and applications","volume":"13","author":"Oja","year":"2000","journal-title":"Neural Netw."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"1299","DOI":"10.1162\/089976698300017467","article-title":"Nonlinear component analysis as a kernel eigenvalue problem","volume":"10","author":"Smola","year":"1998","journal-title":"Neural Comput."},{"key":"ref_19","first-page":"2579","article-title":"Visualizing data using t-SNE","volume":"9","author":"Hinton","year":"2008","journal-title":"J. Mach. Learn. Res."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"McInnes, L., Healy, J., and Melville, J. (2018). Umap: Uniform manifold approximation and projection for dimension reduction. arXiv.","DOI":"10.21105\/joss.00861"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"303","DOI":"10.1038\/nbt0308-303","article-title":"What is principal component analysis?","volume":"26","year":"2008","journal-title":"Nat. Biotechnol."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"38","DOI":"10.1038\/nbt.4314","article-title":"Dimensionality reduction for visualizing single-cell data using UMAP","volume":"37","author":"Becht","year":"2019","journal-title":"Nat. Biotechnol."},{"key":"ref_23","first-page":"3146","article-title":"Lightgbm: A highly efficient gradient boosting decision tree","volume":"30","author":"Ke","year":"2017","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"238","DOI":"10.2307\/1403797","article-title":"Discriminatory analysis. Nonparametric discrimination: Consistency properties","volume":"57","author":"Fix","year":"1989","journal-title":"Int. Stat. Rev. \/Rev. Int. De Stat."},{"key":"ref_25","unstructured":"Ho, T.K. (1995, January 14\u201316). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1007\/BF00058655","article-title":"Bagging predictors","volume":"24","author":"Breiman","year":"1996","journal-title":"Mach. Learn."}],"container-title":["Pharmaceuticals"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8247\/16\/2\/202\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,10]],"date-time":"2025-10-10T18:19:00Z","timestamp":1760120340000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8247\/16\/2\/202"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,29]]},"references-count":26,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2023,2]]}},"alternative-id":["ph16020202"],"URL":"https:\/\/doi.org\/10.3390\/ph16020202","relation":{},"ISSN":["1424-8247"],"issn-type":[{"value":"1424-8247","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,29]]}}}