{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,4]],"date-time":"2026-05-04T10:14:14Z","timestamp":1777889654967,"version":"3.51.4"},"reference-count":41,"publisher":"MDPI AG","issue":"11","license":[{"start":{"date-parts":[[2021,11,18]],"date-time":"2021-11-18T00:00:00Z","timestamp":1637193600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100000780","name":"European Union","doi-asserted-by":"publisher","award":["731077"],"award-info":[{"award-number":["731077"]}],"id":[{"id":"10.13039\/501100000780","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["PTDC\/BAA-MOL\/28675\/2017"],"award-info":[{"award-number":["PTDC\/BAA-MOL\/28675\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["CEECIND\/02246\/2017"],"award-info":[{"award-number":["CEECIND\/02246\/2017"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001871","name":"Funda\u00e7\u00e3o para a Ci\u00eancia e Tecnologia","doi-asserted-by":"publisher","award":["LISBOA-01-0145-FEDER-022125"],"award-info":[{"award-number":["LISBOA-01-0145-FEDER-022125"]}],"id":[{"id":"10.13039\/501100001871","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Metabolites"],"abstract":"<jats:p>Metabolomics aims to perform a comprehensive identification and quantification of the small molecules present in a biological system. 
Due to metabolite diversity in concentration, structure, and chemical characteristics, the use of high-resolution methodologies, such as mass spectrometry (MS) or nuclear magnetic resonance (NMR), is required. In metabolomics data analysis, suitable data pre-processing and pre-treatment procedures are fundamental, with subsequent steps aiming at highlighting the significant biological variation between samples over background noise. Traditional data analysis focuses primarily on the comparison of the features\u2019 intensity values. However, intensity data are highly variable between experimental batches, instruments, and pre-processing methods or parameters. The aim of this work was to develop a new pre-treatment method for MS-based metabolomics data, in the context of sample profiling and discrimination, considering only the occurrence of spectral features, encoding feature presence as 1 and absence as 0. This \u201cBinary Simplification\u201d encoding (BinSim) was used to transform several benchmark datasets before the application of clustering and classification methods. The performance of these methods after the BinSim pre-treatment was consistently as good as and often better than after different combinations of traditional, intensity-based pre-treatments. 
Binary Simplification is, therefore, a viable pre-treatment procedure that effectively simplifies metabolomics data-analysis pipelines.<\/jats:p>","DOI":"10.3390\/metabo11110788","type":"journal-article","created":{"date-parts":[[2021,11,19]],"date-time":"2021-11-19T02:43:09Z","timestamp":1637289789000},"page":"788","update-policy":"https:\/\/doi.org\/10.3390\/mdpi_crossmark_policy","source":"Crossref","is-referenced-by-count":12,"title":["Binary Simplification as an Effective Tool in Metabolomics Data Analysis"],"prefix":"10.3390","volume":"11","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-4081-6544","authenticated-orcid":false,"given":"Francisco","family":"Traquete","sequence":"first","affiliation":[{"name":"Laborat\u00f3rio de FTICR e Espectrometria de Massa Estrutural, MARE-Marine and Environmental Sciences Centre, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4000-4068","authenticated-orcid":false,"given":"Jo\u00e3o","family":"Luz","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de FTICR e Espectrometria de Massa Estrutural, MARE-Marine and Environmental Sciences Centre, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]},{"given":"Carlos","family":"Cordeiro","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de FTICR e Espectrometria de Massa Estrutural, MARE-Marine and Environmental Sciences Centre, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3080-9682","authenticated-orcid":false,"given":"Marta","family":"Sousa Silva","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de FTICR e Espectrometria de Massa Estrutural, MARE-Marine and Environmental Sciences Centre, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, 
Portugal"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9625-8115","authenticated-orcid":false,"given":"Ant\u00f3nio E. N.","family":"Ferreira","sequence":"additional","affiliation":[{"name":"Laborat\u00f3rio de FTICR e Espectrometria de Massa Estrutural, MARE-Marine and Environmental Sciences Centre, Faculdade de Ci\u00eancias, Universidade de Lisboa, 1749-016 Lisboa, Portugal"}]}],"member":"1968","published-online":{"date-parts":[[2021,11,18]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","first-page":"e201301009","DOI":"10.5936\/csbj.201301009","article-title":"Statistical methods for the analysis of high-throughput metabolomics data","volume":"4","author":"Bartel","year":"2013","journal-title":"Comput. Struct. Biotechnol. J."},{"key":"ref_2","doi-asserted-by":"crossref","first-page":"30.2.1","DOI":"10.1002\/0471142727.mb3002s98","article-title":"Targeted metabolomics","volume":"98","author":"Roberts","year":"2012","journal-title":"Curr. Protoc. Mol. Biol."},{"key":"ref_3","first-page":"92","article-title":"Multivariate analysis in metabolomics","volume":"1","author":"Worley","year":"2013","journal-title":"Curr. Metab."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"van den Berg, R.A., Hoefsloot, H.C.J., Westerhuis, J.A., Smilde, A.K., and van der Werf, M.J. (2006). Centering, scaling, and transformations: Improving the biological information content of metabolomics data. BMC Genom., 7.","DOI":"10.1186\/1471-2164-7-142"},{"key":"ref_5","doi-asserted-by":"crossref","first-page":"10","DOI":"10.1016\/j.aca.2015.02.012","article-title":"A tutorial review: Metabolomics and partial least squares-discriminant analysis\u2014A marriage of convenience or a shotgun wedding","volume":"879","author":"Gromski","year":"2015","journal-title":"Anal. Chim. 
Acta"},{"key":"ref_6","doi-asserted-by":"crossref","first-page":"775","DOI":"10.3390\/metabo2040775","article-title":"A guideline to univariate statistical analysis for LC\/MS-based untargeted metabolomics-derived data","volume":"2","author":"Vinaixa","year":"2012","journal-title":"Metabolites"},{"key":"ref_7","doi-asserted-by":"crossref","first-page":"1897","DOI":"10.1007\/s13361-016-1469-y","article-title":"Untargeted metabolomics strategies-challenges and Emerging directions","volume":"27","author":"Codreanu","year":"2016","journal-title":"J. Am. Soc. Mass Spectrom."},{"key":"ref_8","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1007\/978-3-319-47656-8_6","article-title":"Preprocessing and pretreatment of metabolomics data for statistical analysis","volume":"965","author":"Karaman","year":"2017","journal-title":"Adv. Exp. Med. Biol."},{"key":"ref_9","first-page":"498","article-title":"Analysis of metabolomic data: Tools, current strategies and future challenges for omics data integration","volume":"18","author":"Cambiaghi","year":"2017","journal-title":"Brief. Bioinform."},{"key":"ref_10","doi-asserted-by":"crossref","first-page":"23","DOI":"10.3389\/fbioe.2015.00023","article-title":"Analytical methods in untargeted metabolomics: State of the art in 2015","volume":"3","author":"Alonso","year":"2015","journal-title":"Front. Bioeng. Biotechnol."},{"key":"ref_11","doi-asserted-by":"crossref","first-page":"318","DOI":"10.1016\/j.chroma.2007.04.021","article-title":"Data processing for mass spectrometry-based metabolomics","volume":"1158","author":"Katajamaa","year":"2007","journal-title":"J. Chromatogr. A"},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Villas-Boas, S.G., and Roessner, U. (2007). Data analysis. Metabolome Analysis: An Introduction, Wiley. 
Chapter 5.","DOI":"10.1002\/0470105518"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"96","DOI":"10.2174\/157489312799304431","article-title":"Bioinformatics tools for mass spectroscopy-based metabolomic data processing and analysis","volume":"7","author":"Sugimoto","year":"2012","journal-title":"Curr. Bioinform."},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"663","DOI":"10.1038\/s41598-017-19120-0","article-title":"Missing value imputation approach for mass spectrometry-based Metabolomics data","volume":"8","author":"Wei","year":"2018","journal-title":"Sci. Rep."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1007\/s11306-016-1030-9","article-title":"Non-targeted UHPLC-MS metabolomic data processing methods: A comparative investigation of normalisation, missing value imputation, transformation and scaling","volume":"12","author":"Engel","year":"2016","journal-title":"Metabolomics"},{"key":"ref_16","doi-asserted-by":"crossref","first-page":"333","DOI":"10.1007\/978-1-4939-1258-2_22","article-title":"Statistical analysis and modeling of mass spectrometry-based metabolomics data","volume":"1198","author":"Xi","year":"2014","journal-title":"Methods Mol. Biol."},{"key":"ref_17","doi-asserted-by":"crossref","first-page":"10918","DOI":"10.1038\/s41598-020-67939-x","article-title":"Inter-laboratory reproducibility of an untargeted metabolomics GC\u2013MS assay for analysis of human plasma","volume":"10","author":"Lin","year":"2020","journal-title":"Sci. 
Rep."},{"key":"ref_18","doi-asserted-by":"crossref","first-page":"112","DOI":"10.1093\/bioinformatics\/btr597","article-title":"Missforest-non-parametric missing value imputation for mixed-type data","volume":"28","author":"Stekhoven","year":"2012","journal-title":"Bioinformatics"},{"key":"ref_19","doi-asserted-by":"crossref","first-page":"4281","DOI":"10.1021\/ac051632c","article-title":"Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in 1H NMR metabonomics","volume":"78","author":"Dieterle","year":"2006","journal-title":"Anal. Chem."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"33","DOI":"10.2307\/1217208","article-title":"The comparison of dendrograms by objective methods","volume":"11","author":"Sokal","year":"1962","journal-title":"Taxon"},{"key":"ref_21","first-page":"440","article-title":"Stability of two hierarchical grouping techniques case 1: Sensitivity to data errors","volume":"69","author":"Baker","year":"1974","journal-title":"J. Am. Stat. Assoc."},{"key":"ref_22","doi-asserted-by":"crossref","first-page":"3526","DOI":"10.1039\/C8AN00599K","article-title":"Partial least squares-discriminant analysis (PLS-DA) for classification of high-dimensional (HD) data: A review of contemporary practice strategies and knowledge gaps","volume":"143","author":"Lee","year":"2018","journal-title":"Analyst"},{"key":"ref_23","first-page":"431","article-title":"Understanding variable importances in forests of randomized trees","volume":"26","author":"Louppe","year":"2013","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"ref_24","doi-asserted-by":"crossref","first-page":"62","DOI":"10.1016\/j.chemolab.2012.07.010","article-title":"A review of variable selection methods in partial least squares regression","volume":"118","author":"Mehmood","year":"2012","journal-title":"Chemom. Intell. Lab. 
Syst."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Kokla, M., Virtanen, J., Kolehmainen, M., Paananen, J., and Hanhineva, K. (2019). Random forest-based imputation outperforms other methods for imputing LC-MS metabolomics data: A comparative study. BMC Bioinform., 20.","DOI":"10.1186\/s12859-019-3110-0"},{"key":"ref_26","doi-asserted-by":"crossref","first-page":"W388","DOI":"10.1093\/nar\/gkab382","article-title":"metaboanalyst 5.0: Narrowing the gap between raw spectra and functional insights","volume":"49","author":"Pang","year":"2021","journal-title":"Nucleic. Acids Res."},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"1641","DOI":"10.1214\/12-EJS724","article-title":"Non-metric partial least squares","volume":"6","author":"Russolillo","year":"2012","journal-title":"Electron. J. Stat."},{"key":"ref_28","unstructured":"Maia, M., Figueiredo, A., Silva, M.S., and Ferreira, A. (2020). Grapevine untargeted metabolomics to uncover potential biomarkers of fungal\/oomycetes-associated diseases. Dataset."},{"key":"ref_29","doi-asserted-by":"crossref","first-page":"15688","DOI":"10.1038\/s41598-020-72781-2","article-title":"Integrating metabolomics and targeted gene expression to uncover potential biomarkers of fungal\/oomycetes\u2014Associated disease susceptibility in grapevine","volume":"10","author":"Maia","year":"2020","journal-title":"Sci. Rep."},{"key":"ref_30","unstructured":"Ferreira, A.E.N., and Traquete, F. (2021). Metabolinks: A Python package for high-resolution-MS metabolomics data analysis. Datasets."},{"key":"ref_31","unstructured":"Luz, J. (2021). Metabolomic Effects of Single Gene Deletions in Saccharomyces Cerevisiae. [Master\u2019s Thesis, Faculdade de Ci\u00eancias da Universidade de Lisboa]."},{"key":"ref_32","unstructured":"Sousa Silva, M., Luz, J., Pend\u00e3o, A.S., and Cordeiro, C. (2021). Magnetic Resonance Mass Spectrometry (MRMS) Discriminates Yeast Mutants through Metabolomics and Analysis, Bruker. 
Application Note."},{"key":"ref_33","unstructured":"Luz, J., Pend\u00e3o, A.S., Silva, M.S., and Cordeiro, C. (2021). FT-ICR-MS based untargeted metabolomics for the discrimination of yeast mutants. Dataset."},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1316","DOI":"10.1021\/acs.jproteome.8b00926","article-title":"Preoperative metabolic signatures of prostate cancer recurrence following radical prostatectomy","volume":"18","author":"Clendinen","year":"2019","journal-title":"J. Proteome Res."},{"key":"ref_35","first-page":"547","article-title":"Etude de la distribution florale dans une portion des Alpes et du Jura","volume":"37","author":"Jaccard","year":"1901","journal-title":"Bull. Soc. Vaud. Sci. Nat."},{"key":"ref_36","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1098\/rsta.1900.0019","article-title":"On the Association of Attributes in Statistics: With illustrations from the material of the childhood society, &c","volume":"194","author":"Yule","year":"1900","journal-title":"Philos. Trans. R. Soc. Lond. Ser. A"},{"key":"ref_37","doi-asserted-by":"crossref","first-page":"147","DOI":"10.1002\/j.1538-7305.1950.tb00463.x","article-title":"Error detecting and error correcting codes","volume":"29","author":"Hamming","year":"1950","journal-title":"Bell Syst. Tech. J."},{"key":"ref_38","doi-asserted-by":"crossref","first-page":"3718","DOI":"10.1093\/bioinformatics\/btv428","article-title":"Dendextend: An R package for visualizing, adjusting and comparing trees of hierarchical clustering","volume":"31","author":"Galili","year":"2015","journal-title":"Bioinformatics"},{"key":"ref_39","doi-asserted-by":"crossref","first-page":"297","DOI":"10.1093\/bib\/bbn058","article-title":"A roadmap of clustering algorithms: Finding a match for a biomedical application","volume":"10","author":"Andreopoulos","year":"2009","journal-title":"Brief. 
Bioinform."},{"key":"ref_40","first-page":"2825","article-title":"Scikit-learn: Machine Learning in Python","volume":"12","author":"Pedregosa","year":"2011","journal-title":"J. Mach. Learn. Res."},{"key":"ref_41","unstructured":"McKinney, W. (July, January 28). Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, Austin, TX, USA."}],"container-title":["Metabolites"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2218-1989\/11\/11\/788\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,10,11]],"date-time":"2025-10-11T07:32:13Z","timestamp":1760167933000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2218-1989\/11\/11\/788"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,11,18]]},"references-count":41,"journal-issue":{"issue":"11","published-online":{"date-parts":[[2021,11]]}},"alternative-id":["metabo11110788"],"URL":"https:\/\/doi.org\/10.3390\/metabo11110788","relation":{},"ISSN":["2218-1989"],"issn-type":[{"value":"2218-1989","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,11,18]]}}}