{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,11]],"date-time":"2026-04-11T14:48:13Z","timestamp":1775918893683,"version":"3.50.1"},"reference-count":52,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The major difficulties relating to mathematical modelling of spectroscopic data are inconsistencies in spectral reproducibility and the black box nature of the modelling techniques. For the analysis of biological samples the first problem is due to biological, experimental and machine variability which can lead to sample size differences and unavoidable baseline shifts. Consequently, there is often a requirement for mathematical correction(s) to be made to the raw data if the best possible model is to be formed. The second problem prevents interpretation of the results since the variables that most contribute to the analysis are not easily revealed; as a result, the opportunity to obtain new knowledge from such data is lost.<\/jats:p><jats:p>Methods: We used genetic algorithms (GAs) to select spectral pre-processing steps for Fourier transform infrared (FT-IR) spectroscopic data. We demonstrate a novel approach for the selection of important discriminatory variables by GA from FT-IR spectra for multi-class identification by discriminant function analysis (DFA).<\/jats:p><jats:p>Results: The GA selects sensible pre-processing steps from a total of \u223c10<jats:sup>10<\/jats:sup> possible mathematical transformations. Application of these algorithms results in a 16% reduction in the model error when compared against the raw data model. 
GA-DFA recovers six variables from the full set of 882 spectral variables against which a satisfactory DFA model can be formed; thus inferences can be made as to the biochemical differences that are reflected by these spectral bands.<\/jats:p><jats:p>Availability: Supplementary information, datasets and scripts are available from the corresponding author.<\/jats:p><jats:p>Contact: roy.goodacre@manchester.ac.uk<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti102","type":"journal-article","created":{"date-parts":[[2004,10,29]],"date-time":"2004-10-29T00:51:45Z","timestamp":1099011105000},"page":"860-868","source":"Crossref","is-referenced-by-count":149,"title":["Genetic algorithm optimization for pre-processing and variable selection of spectroscopic data"],"prefix":"10.1093","volume":"21","author":[{"given":"Roger M.","family":"Jarvis","sequence":"first","affiliation":[{"name":"Department of Chemistry, UMIST PO Box 88, Sackville St, Manchester M60 1QD, UK"}]},{"given":"Royston","family":"Goodacre","sequence":"additional","affiliation":[{"name":"Department of Chemistry, UMIST PO Box 88, Sackville St, Manchester M60 1QD, UK"}]}],"member":"286","published-online":{"date-parts":[[2004,10,28]]},"reference":[{"key":"2023013107420378100_B1","doi-asserted-by":"crossref","unstructured":"Allen, D.M. (1971) Mean square error of prediction as a criterion for selecting variables. Technometrics, 13, 469\u2013475","DOI":"10.2307\/1267161"},{"key":"2023013107420378100_B2","doi-asserted-by":"crossref","unstructured":"Arnold, S.A., Crowley, J., Vaidyanathan, S., Matheson, L., Mohan, P., Hall, J.W., Harvey, L.M., McNeil, B. (2000) At-line monitoring of a submerged filamentous bacterial cultivation using near-infrared spectroscopy. Enzyme Microb. Technol., 27, 691\u2013697","DOI":"10.1016\/S0141-0229(00)00271-4"},{"key":"2023013107420378100_B3","unstructured":"B\u00e4ck, T., Fogel, D.B., Michalewicz, Z. 
(1997) Handbook of Evolutionary Computation. IOP Publishing\/Oxford University Press, Oxford"},{"key":"2023013107420378100_B4","doi-asserted-by":"crossref","unstructured":"Blackstock, W.P. and Weir, M.P. (1999) Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol., 17, 121\u2013127","DOI":"10.1016\/S0167-7799(98)01245-1"},{"key":"2023013107420378100_B5","doi-asserted-by":"crossref","unstructured":"Broadhurst, D., Goodacre, R., Jones, A., Rowland, J.J., Kell, D.B. (1997) Genetic algorithms as a method for variable selection in multiple linear regression and partial least squares regression, with applications to pyrolysis mass spectrometry. Anal. Chim. Acta, 71\u201386","DOI":"10.1016\/S0003-2670(97)00065-2"},{"key":"2023013107420378100_B6","unstructured":"Burge, C.B. (2001) Chipping away at the transcriptome. Nat. Genet., 27, 232\u2013234"},{"key":"2023013107420378100_B7","doi-asserted-by":"crossref","unstructured":"Chipperfield, A.J. and Fleming, P.J. (1995) The MATLAB Genetic Algorithm Toolbox. In IEE Colloquium Applied Control Techniques Using MATLAB, pp. 10\/11\u201310\/14","DOI":"10.1049\/ic:19950061"},{"key":"2023013107420378100_B8","unstructured":"Chipperfield, A.J., Fleming, P.J., Fonseca, C.M. (1994) Genetic Algorithm Tools for Control Systems Engineering. In Proceedings of Adaptive Computing in Engineering Design and Control. Plymouth Engineering Design Centre, pp. 128\u2013133"},{"key":"2023013107420378100_B9","unstructured":"Chipperfield, A.J., Fleming, P.J., Pohlheim, H. (1994) A Genetic Algorithm Toolbox for MATLAB. In Proceedings of the International Conference on Systems Engineering, Coventry, UK, pp. 200\u2013207"},{"key":"2023013107420378100_B10","unstructured":"Chuzhanova, N.A., Jones, A.J., Margetts, S. (1998) Feature selection for genetic sequence classification. Bioinformatics, 14, 139\u2013143"},{"key":"2023013107420378100_B11","unstructured":"Degen, I.A. 
(1997) Tables of Characteristic Group Frequencies for the Interpretation of Infrared and RAMAN Spectra. Acolyte Publications, Harrow, UK"},{"key":"2023013107420378100_B12","unstructured":"Dixon, W. (1975) Biomedical Computer Programs. University of California Press, Los Angeles"},{"key":"2023013107420378100_B13","doi-asserted-by":"crossref","unstructured":"Ellis, D.I., Broadhurst, D., Kell, D.B., Rowland, J.J., Goodacre, R. (2002) Rapid and quantitative detection of the microbial spoilage of meat by Fourier transform infrared spectroscopy and machine learning. Appl. Environ. Microbiol., 68, 2822\u20132828","DOI":"10.1128\/AEM.68.6.2822-2828.2002"},{"key":"2023013107420378100_B14","doi-asserted-by":"crossref","unstructured":"Fiehn, O. (2002) Metabolomics \u2013 the link between genotypes and phenotypes. Plant Mol. Biol., 48, 155\u2013171","DOI":"10.1007\/978-94-010-0448-0_11"},{"key":"2023013107420378100_B15","doi-asserted-by":"crossref","unstructured":"Fiehn, O., Kopka, J., D\u00f6rmann, P., Altmann, T., Trethewey, R.N., Willmitzer, L. (2000) Metabolite profiling for plant functional genomics. Nat. Biotechnol., 18, 1157\u20131161","DOI":"10.1038\/81137"},{"key":"2023013107420378100_B16","unstructured":"Goicoechea, H.C. and Olivieri, A.C. (2003) A new family of genetic algorithms for wavelength interval selection in multivariate analytical spectroscopy. J. Chemometr., 17, 338\u2013345"},{"key":"2023013107420378100_B17","unstructured":"Goldberg, D.E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA"},{"key":"2023013107420378100_B18","doi-asserted-by":"crossref","unstructured":"Goodacre, R., Neal, M.J., Kell, D.B. (1994) Rapid identification using pyrolysis mass spectrometry and artificial neural networks of Propionibacterium acnes isolated from dogs. J. Appl. Bacteriol., 76, 
124\u2013134","DOI":"10.1111\/j.1365-2672.1994.tb01607.x"},{"key":"2023013107420378100_B19","doi-asserted-by":"crossref","unstructured":"Goodacre, R., Timmins, \u00c9.M., Burton, R., Kaderbhai, N., Woodward, A., Kell, D.B., Rooney, P.J. (1998) Rapid identification of urinary tract infection bacteria using hyperspectral, whole organism fingerprinting and artificial neural networks. Microbiology, 144, 1157\u20131170","DOI":"10.1099\/00221287-144-5-1157"},{"key":"2023013107420378100_B20","doi-asserted-by":"crossref","unstructured":"Goodacre, R., Timmins, E.M., Rooney, P.J., Rowland, J.J., Kell, D.B. (1996) Rapid identification of Streptococcus and Enterococcus species using diffuse reflectance\u2013absorbance Fourier transform infrared spectroscopy and artificial neural networks. FEMS Microbiol. Lett., 140, 233\u2013239","DOI":"10.1111\/j.1574-6968.1996.tb08342.x"},{"key":"2023013107420378100_B21","doi-asserted-by":"crossref","unstructured":"Goodacre, R., Vaidyanathan, S., Dunn, W.B., Harrigan, G.G., Kell, D.B. (2004) Metabolomics by numbers \u2013 acquiring and understanding global metabolite data. Trends Biotechnol., 22, 245\u2013252","DOI":"10.1016\/j.tibtech.2004.03.007"},{"key":"2023013107420378100_B22","unstructured":"Holland, J.H. (1992) Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA"},{"key":"2023013107420378100_B23","doi-asserted-by":"crossref","unstructured":"Jarvis, R.M. and Goodacre, R. (2004) Ultra-violet resonance Raman spectroscopy for the rapid discrimination of urinary tract infection bacteria. FEMS Microbiol. Lett., 232, 127\u2013132","DOI":"10.1016\/S0378-1097(04)00040-0"},{"key":"2023013107420378100_B24","unstructured":"Jarvis, R.M. and Goodacre, R. (2004) Rapid discrimination of bacteria using surface enhanced Raman spectroscopy. Anal. Chem., 76, 40\u201347"},{"key":"2023013107420378100_B25","doi-asserted-by":"crossref","unstructured":"Johnson, H.E., Broadhurst, D., Goodacre, R., Smith, A.R. (2003) Metabolic fingerprinting of salt-stressed tomatoes. 
Phytochemistry, 62, 919\u2013928","DOI":"10.1016\/S0031-9422(02)00722-7"},{"key":"2023013107420378100_B26","doi-asserted-by":"crossref","unstructured":"Johnson, H.E., Broadhurst, D., Kell, D.B., Theodorou, M.K., Merry, R.J., Griffith, G.W. (2004) High-throughput metabolic fingerprinting of legume silage fermentations via Fourier transform infrared spectroscopy and chemometrics. Appl. Environ. Microbiol., 70, 1583\u20131592","DOI":"10.1128\/AEM.70.3.1583-1592.2004"},{"key":"2023013107420378100_B27","doi-asserted-by":"crossref","unstructured":"Kassama, Y., Rooney, P.J., Goodacre, R. (2002) Fluorescent amplified fragment length polymorphism probabilistic database for identification of bacterial isolates from urinary tract infections. J. Clin. Microbiol., 40, 2795\u20132800","DOI":"10.1128\/JCM.40.8.2795-2800.2002"},{"key":"2023013107420378100_B28","doi-asserted-by":"crossref","unstructured":"Kell, D.B. and Oliver, S.G. (2004) Here is the evidence, now what is the hypothesis? The complementary roles of inductive and hypothesis-driven science in the post-genomic era. Bioessays, 26, 99\u2013105","DOI":"10.1002\/bies.10385"},{"key":"2023013107420378100_B29","doi-asserted-by":"crossref","unstructured":"Kinoshita, E., Ozawa, Y., Aishima, T. (1998) Differentiation of soy sauce types by HPLC profile pattern recognition \u2013 isolation of novel isoflavones. In Flavonoids in the Living System. Plenum Press, New York, pp. 117\u2013129","DOI":"10.1007\/978-1-4615-5335-9_9"},{"key":"2023013107420378100_B30","doi-asserted-by":"crossref","unstructured":"Konstam, A.H. (1993) Linear discriminant analysis using genetic algorithms. In Proceedings of the 1993 ACM\/SIGAPP Symposium on Applied Computing: States of the Art and Practice, Indianapolis, IN. ACM Press, pp. 152\u2013156","DOI":"10.1145\/162754.162848"},{"key":"2023013107420378100_B31","doi-asserted-by":"crossref","unstructured":"Konstam, A.H. (1994) N-Group classification using genetic algorithms. 
In Proceedings of the 1994 ACM Symposium on Applied Computing, Phoenix, AZ. ACM Press, pp. 212\u2013216","DOI":"10.1145\/326619.326725"},{"key":"2023013107420378100_B32","doi-asserted-by":"crossref","unstructured":"Langdon, W. and Poli, R. (2002) Foundations of Genetic Programming. Springer-Verlag, Berlin","DOI":"10.1007\/978-3-662-04726-2"},{"key":"2023013107420378100_B33","doi-asserted-by":"crossref","unstructured":"Lewis, P. (1998) A genetic algorithm for maximum-likelihood phylogeny inference using nucleotide sequence data. Mol. Biol. Evol., 15, 277\u2013283","DOI":"10.1093\/oxfordjournals.molbev.a025924"},{"key":"2023013107420378100_B34","doi-asserted-by":"crossref","unstructured":"Li, L., Weinberg, C.R., Darden, T.A., Pederson, L.G. (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA\/KNN method. Bioinformatics, 17, 1131\u20131142","DOI":"10.1093\/bioinformatics\/17.12.1131"},{"key":"2023013107420378100_B35","unstructured":"Lopez-Diez, E.C. and Goodacre, R. (2004) Characterization of microorganisms using UV resonance Raman spectroscopy and chemometrics. Anal. Chem., 76, 585\u2013591"},{"key":"2023013107420378100_B36","doi-asserted-by":"crossref","unstructured":"MacFie, H., Gutteridge, C., Norris, J. (1978) Use of canonical variates in differentiation of bacteria by pyrolysis gas-liquid chromatography. J. Gen. Microbiol., 104, 67\u201374","DOI":"10.1099\/00221287-104-1-67"},{"key":"2023013107420378100_B37","unstructured":"Manly, B.F.J. (1994) Multivariate Statistical Methods: A Primer, 2nd edn. Chapman & Hall\/CRC, New York"},{"key":"2023013107420378100_B38","doi-asserted-by":"crossref","unstructured":"Maquelin, K., Choo-Smith, L.P., van Vreeswijk, T., Endtz, H.P., Smith, B., Bennett, R., Bruining, H.A., Puppels, G.J. (2000) Raman spectroscopic method for identification of clinically relevant microorganisms growing on solid culture medium. Anal. Chem., 72, 
12\u201319","DOI":"10.1021\/ac991011h"},{"key":"2023013107420378100_B39","unstructured":"Martens, H. and Naes, T. (1989) Multivariate Calibration. Wiley, Chichester, UK"},{"key":"2023013107420378100_B40","doi-asserted-by":"crossref","unstructured":"McGovern, A.C., Broadhurst, D., Taylor, J., Kaderbhai, N., Winson, M.K., Small, D.A., Rowland, J.J., Kell, D.B., Goodacre, R. (2002) Monitoring of complex industrial bioprocesses for metabolite concentrations using modern spectroscopies and machine learning: application to gibberellic acid production. Biotechnol. Bioeng., 78, 527\u2013538","DOI":"10.1002\/bit.10226"},{"key":"2023013107420378100_B41","unstructured":"Mitchell, M. (1995) An Introduction to Genetic Algorithms. MIT Press, Boston, MA"},{"key":"2023013107420378100_B42","doi-asserted-by":"crossref","unstructured":"Naumann, D. (2001) FT-infrared and FT-Raman spectroscopy in biomedical research. Appl. Spectrosc. Rev., 36, 239\u2013298","DOI":"10.1081\/ASR-100106157"},{"key":"2023013107420378100_B43","doi-asserted-by":"crossref","unstructured":"Naumann, D., Helm, D., Labischinski, H. (1991) Microbiological characterizations by FT-IR spectroscopy. Nature, 351, 81\u201382","DOI":"10.1038\/351081a0"},{"key":"2023013107420378100_B44","doi-asserted-by":"crossref","unstructured":"Notredame, C., Holm, L., Higgins, D. (1998) COFFEE: an objective function for multiple sequence alignments. Bioinformatics, 14, 407\u2013422","DOI":"10.1093\/bioinformatics\/14.5.407"},{"key":"2023013107420378100_B45","unstructured":"Ooi, C.H. and Tan, P. (2003) Genetic algorithms applied to multi-class prediction for the analysis of gene expression data. Bioinformatics, 19, 37\u201344"},{"key":"2023013107420378100_B46","doi-asserted-by":"crossref","unstructured":"Podgorelec, V. and Kokol, P. (2000) Fighting program bloat with the fractal complexity measure. 
Lecture Notes in Computer Science, Genetic Programming Proceedings, 1802, 326\u2013337","DOI":"10.1007\/978-3-540-46239-2_25"},{"key":"2023013107420378100_B47","doi-asserted-by":"crossref","unstructured":"Tapp, H.S., Defernez, M., Kemsley, E.K. (2003) FTIR spectroscopy and multivariate analysis can distinguish the geographic origin of extra virgin olive oils. J. Agric. Food Chem., 51, 6110\u20136115","DOI":"10.1021\/jf030232s"},{"key":"2023013107420378100_B48","doi-asserted-by":"crossref","unstructured":"Timmins, E.M., Quain, D.E., Goodacre, R. (1998) Differentiation of brewing yeast strains by pyrolysis mass spectrometry and Fourier transform infrared spectroscopy. Yeast, 14, 885\u2013893","DOI":"10.1002\/(SICI)1097-0061(199807)14:10<885::AID-YEA286>3.0.CO;2-G"},{"key":"2023013107420378100_B49","unstructured":"Vaidyanathan, S., Kell, D.B., Goodacre, R. (2002) Rapid, high-throughput microbial characterization by metabolite and protein profiling of whole cells using soft-ionization mass spectrometry. Abstr. Pap. Am. Chem. Soc., 224, 011-BIOT"},{"key":"2023013107420378100_B50","doi-asserted-by":"crossref","unstructured":"Vaidyanathan, S., Kell, D.B., Goodacre, R. (2002) Flow-injection electrospray ionization mass spectrometry of crude cell extracts for high-throughput bacterial identification. J. Am. Soc. Mass Spectrom., 13, 118\u2013128","DOI":"10.1016\/S1044-0305(01)00339-7"},{"key":"2023013107420378100_B51","doi-asserted-by":"crossref","unstructured":"Vaidyanathan, S., Macaloney, G., Harvey, L.M., McNeil, B. (2001) Assessment of the structure and predictive ability of models developed for monitoring key analytes in a submerged fungal bioprocess using near-infrared spectroscopy. Appl. Spectrosc., 55, 444\u2013453","DOI":"10.1366\/0003702011951957"},{"key":"2023013107420378100_B52","unstructured":"Weckwerth, W. (2003) Metabolomics in systems biology. Ann. Rev. 
Plant Biol., 54, 669\u2013689"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/860\/48972294\/bioinformatics_21_7_860.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/860\/48972294\/bioinformatics_21_7_860.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,19]],"date-time":"2024-12-19T06:08:27Z","timestamp":1734588507000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/7\/860\/268896"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,28]]},"references-count":52,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2005,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti102","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,4,1]]},"published":{"date-parts":[[2004,10,28]]}}}