{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,30]],"date-time":"2026-04-30T16:01:58Z","timestamp":1777564918042,"version":"3.51.4"},"reference-count":28,"publisher":"MDPI AG","issue":"3","license":[{"start":{"date-parts":[[2025,1,25]],"date-time":"2025-01-25T00:00:00Z","timestamp":1737763200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["UIDB\/50016\/2020."],"award-info":[{"award-number":["UIDB\/50016\/2020."]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Molecules"],"abstract":"<jats:p>Camellia japonica is a plant species with high cultural and biological relevance. Besides being used as an ornamental plant species, C. japonica has relevant biological properties. Due to hybridization, thousands of cultivars are known, and their accurate identification is mandatory. Infrared spectroscopy is currently recognized as an accurate and rapid technique for species and\/or subspecies identifications, including in plants. However, selecting proper analysis tools (spectra pre-processing, feature selection, and chemometric models) highly impacts the accuracy of such identifications. This study tests the impact of two distinct machine learning-based approaches for discriminating C. japonica cultivars using near-infrared (NIR) and Fourier transform infrared (FTIR) spectroscopies. Leaves infrared spectra (NIR\u2014obtained in a previous study; FTIR\u2014obtained herein) of 15 different C. japonica cultivars (38 plants) were modeled and analyzed via different machine learning-based approaches (Approach 1 and Approach 2), each combining a feature selection method plus a classifier application. Regarding Approach 1, NIR spectroscopy emerged as the most effective technique for predicting C. japonica cultivars, achieving 81.3% correct cultivar assignments. However, Approach 2 obtained the best results with FTIR spectroscopy data, achieving a perfect 100.0% accuracy in cultivar assignments. When comparing both approaches, Approach 2 also improved the results for NIR data, increasing the correct cultivar predictions by nearly 13%. The results obtained in this study highlight the importance of chemometric tools in analyzing infrared data. The choice of a specific data analysis approach significantly affects the accuracy of the technique. Moreover, the same approach can have varying impacts on different techniques. Therefore, it is not feasible to establish a universal data analysis approach, even for very similar datasets from comparable analytical techniques.<\/jats:p>","DOI":"10.3390\/molecules30030546","type":"journal-article","created":{"date-parts":[[2025,1,27]],"date-time":"2025-01-27T09:42:23Z","timestamp":1737970943000},"page":"546","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":3,"title":["Machine Learning-Based Spectral Analyses for Camellia japonica Cultivar Identification"],"prefix":"10.3390","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5381-6615","authenticated-orcid":false,"given":"Pedro Miguel","family":"Rodrigues","sequence":"first","affiliation":[{"name":"CBQF\u2014Centro de Biotecnologia e Qu\u00edmica Fina\u2014Laborat\u00f3rio Associado, Escola Superior de Biotecnologia, Universidade Cat\u00f3lica Portuguesa, Rua de Diogo Botelho 1327, 4169-005 Porto, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7206-8771","authenticated-orcid":false,"given":"Clara","family":"Sousa","sequence":"additional","affiliation":[{"name":"CBQF\u2014Centro de Biotecnologia e Qu\u00edmica Fina\u2014Laborat\u00f3rio Associado, Escola Superior de Biotecnologia, Universidade Cat\u00f3lica Portuguesa, Rua de Diogo Botelho 1327, 4169-005 Porto, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2025,1,25]]},"reference":[{"key":"ref_1","first-page":"1053","article-title":"Antioxidant capacity of Camellia japonica cultivars assessed by near- and mid-infrared spectroscopy","volume":"249","author":"Teixeira","year":"2018","journal-title":"Planta"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Wang, Y., Zhuang, H., Shen, Y., Wang, Y., and Wang, Z. (2021). The Dataset of Camellia Cultivars Names in the World. Biodivers. Data J., 9.","DOI":"10.3897\/BDJ.9.e61646"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"95","DOI":"10.1007\/s00606-007-0601-7","article-title":"Camellia japonica L. genotypes identified by an artificial neural network based on phyllometric and fractal parameters","volume":"270","author":"Mugnai","year":"2007","journal-title":"Plant Syst. Evol."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"791","DOI":"10.1007\/s10577-015-9500-x","article-title":"Next-generation sequencing reveals differentially amplified tandem repeats as a major genome component of Northern Europe\u2019s oldest Camellia japonica","volume":"23","author":"Heitkam","year":"2015","journal-title":"Chromosome Res."},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"28","DOI":"10.1016\/j.compag.2019.02.025","article-title":"Discrimination of Camellia japonica cultivars and chemometric models: An interlaboratory study","volume":"159","author":"Sousa","year":"2019","journal-title":"Comput. Electron. Agric."},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"353","DOI":"10.4103\/0973-1296.137378","article-title":"A rapid identification of four medicinal chrysanthemum varieties with near infrared spectroscopy","volume":"10","author":"Han","year":"2014","journal-title":"Pharmacogn. Mag."},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1345","DOI":"10.1007\/s10096-014-2078-y","article-title":"Discrimination of the Acinetobacter calcoaceticus-Acinetobacter baumannii complex species by Fourier transform infrared spectroscopy","volume":"33","author":"Sousa","year":"2014","journal-title":"Eur. J. Clin. Microbiol. Infect. Dis."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"102","DOI":"10.1016\/j.mimet.2013.02.008","article-title":"Serotype discrimination of encapsulated Streptococcus pneumoniae strains by Fourier-transform infrared spectroscopy and chemometrics","volume":"93","author":"Vaz","year":"2013","journal-title":"J. Microbiol. Methods"},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"128204","DOI":"10.1016\/j.neucom.2024.128204","article-title":"Interpretability of deep neural networks: A review of methods, classification and hardware","volume":"601","author":"Antamis","year":"2024","journal-title":"Neurocomputing"},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/0003-2670(86)80028-9","article-title":"Partial least-squares regression: A tutorial","volume":"185","author":"Geladi","year":"1986","journal-title":"Anal. Chim. Acta"},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"4126","DOI":"10.1021\/ac980506o","article-title":"Variable Selection in Discriminant Partial Least-Squares Analysis","volume":"70","author":"Alsberg","year":"1998","journal-title":"Anal. Chem."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"N\u00e6s, T., Isaksson, T., Fearn, T., and Davies, T. (2017). A User-Friendly Guide to Multivariate Calibration and Classification, IM Publications Open.","DOI":"10.1255\/978-1-906715-25-0"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1627","DOI":"10.1021\/ac60214a047","article-title":"Smoothing and Differentiation of Data by Simplified Least Squares Procedures","volume":"36","author":"Savitzky","year":"1964","journal-title":"Anal. Chem."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"15","DOI":"10.1016\/j.compag.2016.05.014","article-title":"Exploratory study on vineyards soil mapping by visible\/near-infrared spectroscopy of grapevine leaves","volume":"127","author":"Lopo","year":"2016","journal-title":"Comput. Electron. Agric."},{"key":"ref_15","first-page":"2825","article-title":"Scikit-learn: Machine learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"2407","DOI":"10.33395\/sinkron.v7i4.11792","article-title":"Stratified K-fold cross validation optimization on machine learning for prediction","volume":"7","author":"Widodo","year":"2022","journal-title":"Sinkron"},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"92065","DOI":"10.1039\/C6RA16769A","article-title":"Discrimination of clinically relevant Candida species by Fourier-transform infrared spectroscopy with attenuated total reflectance (FTIR-ATR)","volume":"6","author":"Silva","year":"2016","journal-title":"RSC Adv."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"3953","DOI":"10.1002\/jsfa.8918","article-title":"Citrus species and hybrids depicted by near- and mid-infrared spectroscopy","volume":"98","author":"Moreira","year":"2018","journal-title":"J. Sci. Food Agric."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Guti\u00e9rrez, S., Tardaguila, J., Fern\u00e1ndez-Novales, J., and Diago, M.P. (2016). Data Mining and NIR Spectroscopy in Viticulture: Applications for Plant Phenotyping under Field Conditions. Sensors, 16.","DOI":"10.3390\/s16020236"},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"69","DOI":"10.1016\/j.talanta.2017.12.030","article-title":"Varietal discrimination of hop pellets by near and mid infrared spectroscopy","volume":"180","author":"Machado","year":"2018","journal-title":"Talanta"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"183","DOI":"10.1016\/j.isprsjprs.2018.03.013","article-title":"Connecting infrared spectra with plant traits to identify species","volume":"139","author":"Buitrago","year":"2018","journal-title":"ISPRS J. Photogramm. Remote Sens."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Li, Y., Via, B.K., Han, F., Li, Y., and Pei, Z. (2023). Comparison of various chemometric methods on visible and near-infrared spectral analysis for wood density prediction among different tree species and geographical origins. Front. Plant Sci., 14.","DOI":"10.3389\/fpls.2023.1121287"},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"5671","DOI":"10.1002\/jsfa.9828","article-title":"Comprehensive comparison of multiple quantitative near-infrared spectroscopy models for Aspergillus flavus contamination detection in peanut","volume":"99","author":"Li","year":"2019","journal-title":"J. Sci. Food Agric."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"104835","DOI":"10.1016\/j.infrared.2023.104835","article-title":"Near-Infrared spectroscopy combined with machine learning methods for distinguishment of the storage years of rice","volume":"133","author":"Huang","year":"2023","journal-title":"Infrared Phys. Technol."},{"key":"ref_25","unstructured":"Lange, J., Komissarov, L., Lang, R., Enkelmann, D.D., and Anelli, A. (2024). Automatic solid form classification in pharmaceutical drug development. arXiv."},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"2004","DOI":"10.1021\/acs.jcim.0c00020","article-title":"Rapid Identification of X-ray Diffraction Patterns Based on Very Limited Data by Interpretable Convolutional Neural Networks","volume":"60","author":"Wang","year":"2020","journal-title":"J. Chem. Inf. Model."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Schuetzke, J., Benedix, A., Mikut, R., and Reischl, M. (2020, January 26\u201327). Siamese Networks for 1D Signal Identification. Proceedings of the 30 Workshop Computational Intelligence, Berlin, Germany.","DOI":"10.58895\/ksp\/1000124139-2"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"G\u00fcnd\u00fcz, H.A., Binder, M., To, X.Y., Mreches, R., Bischl, B., McHardy, A.C., M\u00fcnch, P.C., and Rezaei, M. (2023). A self-supervised deep learning method for data-efficient training in genomics. Commun. Biol., 6.","DOI":"10.1038\/s42003-023-05310-2"}],"container-title":["Molecules"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1420-3049\/30\/3\/546\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T10:35:59Z","timestamp":1759919759000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1420-3049\/30\/3\/546"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,1,25]]},"references-count":28,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2025,2]]}},"alternative-id":["molecules30030546"],"URL":"https:\/\/doi.org\/10.3390\/molecules30030546","relation":{},"ISSN":["1420-3049"],"issn-type":[{"value":"1420-3049","type":"electronic"}],"subject":[],"published":{"date-parts":[[2025,1,25]]}}}