{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,26]],"date-time":"2026-02-26T20:34:29Z","timestamp":1772138069852,"version":"3.50.1"},"reference-count":42,"publisher":"Oxford University Press (OUP)","issue":"10","license":[{"start":{"date-parts":[[2024,9,30]],"date-time":"2024-09-30T00:00:00Z","timestamp":1727654400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:sec>\n                    <jats:title>Summary<\/jats:title>\n                    <jats:p>Computational metabolomics workflows have revolutionized the untargeted metabolomics field. However, the organization and prioritization of metabolite features remains a laborious process. Organizing metabolomics data is often done through mass fragmentation-based spectral similarity grouping, resulting in feature sets that also represent an intuitive and scientifically meaningful first stage of analysis in untargeted metabolomics. Exploiting such feature sets, feature-set testing has emerged as an approach that is widely used in genomics and targeted metabolomics pathway enrichment analyses. It allows for formally combining groupings with statistical testing into more meaningful pathway enrichment conclusions. Here, we present msFeaST (mass spectral Feature Set Testing), a feature-set testing and visualization workflow for LC-MS\/MS untargeted metabolomics data. Feature-set testing involves statistically assessing differential abundance patterns for groups of features across experimental conditions. We developed msFeaST to make use of spectral similarity-based feature groupings generated using k-medoids clustering, where the resulting clusters serve as a proxy for grouping structurally similar features with potential biosynthesis pathway relationships. Spectral clustering done in this way allows for feature group-wise statistical testing using the globaltest package, which provides high power to detect small concordant effects via joint modeling and reduced multiplicity adjustment penalties. Hence, msFeaST provides interactive integration of the semi-quantitative experimental information with mass-spectral structural similarity information, enhancing the prioritization of features and feature sets during exploratory data analysis.<\/jats:p>\n                  <\/jats:sec>\n                  <jats:sec>\n                    <jats:title>Availability and implementation<\/jats:title>\n                    <jats:p>The msFeaST workflow is freely available through https:\/\/github.com\/kevinmildau\/msFeaST and built to work on MacOS and Linux systems.<\/jats:p>\n                  <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae584","type":"journal-article","created":{"date-parts":[[2024,9,26]],"date-time":"2024-09-26T07:22:09Z","timestamp":1727335329000},"source":"Crossref","is-referenced-by-count":5,"title":["Combined LC-MS\/MS feature grouping, statistical prioritization, and interactive networking in msFeaST"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-7776-0890","authenticated-orcid":false,"given":"Kevin","family":"Mildau","sequence":"first","affiliation":[{"name":"Bioinformatics Group, Department of Plant Sciences, Wageningen University & Research , Radix Building, Droevendaalsesteeg 1 , Wageningen, 6708PB,","place":["the Netherlands"]},{"name":"Department of Analytical Chemistry, University of Vienna , Vienna 1090,","place":["Austria"]},{"name":"Doctoral School in Chemistry (DOSCHEM), University of Vienna , Vienna 1090,","place":["Austria"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1729-9785","authenticated-orcid":false,"given":"Christoph","family":"B\u00fcschl","sequence":"additional","affiliation":[{"name":"Department of Agrobiotechnology, Institute of Bioanalytics and Agro-Metabolomics, University of Natural Resources and Life Sciences, Konrad-Lorenz-Stra\u00dfe , Lower Austria 3430,","place":["Austria"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1964-2455","authenticated-orcid":false,"given":"J\u00fcrgen","family":"Zanghellini","sequence":"additional","affiliation":[{"name":"Department of Analytical Chemistry, University of Vienna , Vienna 1090,","place":["Austria"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9340-5511","authenticated-orcid":false,"given":"Justin J J","family":"van der Hooft","sequence":"additional","affiliation":[{"name":"Bioinformatics Group, Department of Plant Sciences, Wageningen University & Research , Radix Building, Droevendaalsesteeg 1 , Wageningen, 6708PB,","place":["the Netherlands"]},{"name":"Department of Biochemistry, University of Johannesburg , Johannesburg, Gauteng Province 2006,","place":["South Africa"]}]}],"member":"286","published-online":{"date-parts":[[2024,9,30]]},"reference":[{"key":"2024101317312719600_btae584-B1","doi-asserted-by":"publisher","first-page":"1967","DOI":"10.1039\/d1np00023c","article-title":"Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches","volume":"38","author":"Beniddir","year":"2021","journal-title":"Nat Prod Rep"},{"key":"2024101317312719600_btae584-B2","doi-asserted-by":"publisher","first-page":"799","DOI":"10.1038\/s41596-019-0264-1","article-title":"Using microbiomeanalyst for comprehensive statistical, functional, and meta-analysis of microbiome data","volume":"15","author":"Chong","year":"2020","journal-title":"Nat Protoc"},{"key":"2024101317312719600_btae584-B3","author":"de Jonge","year":"2024"},{"key":"2024101317312719600_btae584-B4","doi-asserted-by":"publisher","first-page":"103","DOI":"10.1007\/s11306-022-01963-y","article-title":"Good practices and recommendations for using and benchmarking computational metabolomics metabolite annotation tools","volume":"18","author":"de Jonge","year":"2022","journal-title":"Metabolomics"},{"key":"2024101317312719600_btae584-B5","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/s13321-016-0174-y","article-title":"Classyfire: automated chemical classification with a comprehensive, computable taxonomy","volume":"8","author":"Djoumbou Feunang","year":"2016","journal-title":"J Cheminform"},{"key":"2024101317312719600_btae584-B6","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1177\/1473871611425872","article-title":"Visualizing explicit and implicit relations of complex information spaces","volume":"11","author":"Dork","year":"2011","journal-title":"Inf Vis"},{"key":"2024101317312719600_btae584-B7","doi-asserted-by":"publisher","first-page":"e12693","DOI":"10.1371\/journal.pone.0012693","article-title":"Self-contained gene-set analysis of expression data: an evaluation of existing and novel methods","volume":"5","author":"Fridley","year":"2010","journal-title":"PLoS One"},{"key":"2024101317312719600_btae584-B8","doi-asserted-by":"crossref","first-page":"967","DOI":"10.1111\/rssa.12276","article-title":"Beyond subjective and objective in statistics","volume":"180","author":"Gelman","year":"2017","journal-title":"J R Stat Soc Ser A Stat Soc"},{"key":"2024101317312719600_btae584-B9","first-page":"93","volume-title":"Bioinformatics","author":"Goeman"},{"key":"2024101317312719600_btae584-B10","doi-asserted-by":"publisher","first-page":"980","DOI":"10.1093\/bioinformatics\/btm051","article-title":"Analyzing gene expression data in terms of gene sets: methodological issues","volume":"23","author":"Goeman","year":"2007","journal-title":"Bioinformatics"},{"key":"2024101317312719600_btae584-B11","doi-asserted-by":"publisher","first-page":"93","DOI":"10.1093\/bioinformatics\/btg382","article-title":"A global test for groups of genes: testing association with a clinical outcome","volume":"20","author":"Goeman","year":"2004","journal-title":"Bioinformatics"},{"key":"2024101317312719600_btae584-B12","doi-asserted-by":"publisher","first-page":"87","DOI":"10.1016\/j.visinf.2022.04.003","article-title":"New guidance for using t-sne: alternative defaults, hyperparameter selection automation, and comparative evaluation","volume":"6","author":"Gove","year":"2022","journal-title":"Vis Inf"},{"key":"2024101317312719600_btae584-B13","doi-asserted-by":"publisher","first-page":"2411","DOI":"10.21105\/joss.02411","article-title":"matchms \u2013 processing and similarity evaluation of mass spectrometry data","volume":"5","author":"Huber","year":"2020","journal-title":"JOSS"},{"key":"2024101317312719600_btae584-B14","doi-asserted-by":"publisher","first-page":"e1008724","DOI":"10.1371\/journal.pcbi.1008724","article-title":"Spec2vec: improved mass spectral similarity scoring through learning of structural relationships","volume":"17","author":"Huber","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2024101317312719600_btae584-B15","doi-asserted-by":"publisher","first-page":"84","DOI":"10.1186\/s13321-021-00558-4","article-title":"Ms2deepscore: a novel deep learning similarity measure to compare tandem mass spectra","volume":"13","author":"Huber","year":"2021","journal-title":"J Cheminform"},{"key":"2024101317312719600_btae584-B16","doi-asserted-by":"publisher","first-page":"651","DOI":"10.1016\/j.patrec.2009.09.011","article-title":"Data clustering: 50 years beyond k-means","volume":"31","author":"Jain","year":"2010","journal-title":"Pattern Recognit Lett"},{"key":"2024101317312719600_btae584-B17","doi-asserted-by":"publisher","author":"Khatib","year":"2024","DOI":"10.1101\/2024.02.09.579616"},{"key":"2024101317312719600_btae584-B18","doi-asserted-by":"crossref","first-page":"2795","DOI":"10.1021\/acs.jnatprod.1c00399","article-title":"Npclassifier: a deep neural network-based structural classification tool for natural products","volume":"84","author":"Kim","year":"2021","journal-title":"J Nat Prod"},{"key":"2024101317312719600_btae584-B19","first-page":"e1012403","volume-title":"PLoS Comput Biol","author":"Lause","year":"2024"},{"key":"2024101317312719600_btae584-B20","first-page":"2579","article-title":"Visualizing data using t-sne","volume":"9","author":"Maaten","year":"2008","journal-title":"J Mach Learn Res"},{"key":"2024101317312719600_btae584-B21","doi-asserted-by":"publisher","first-page":"42","DOI":"10.1186\/s40246-019-0226-2","article-title":"Size matters: how sample size affects the reproducibility and specificity of gene set analysis","volume":"13","author":"Maleki","year":"2019","journal-title":"Hum Genomics"},{"key":"2024101317312719600_btae584-B22","doi-asserted-by":"publisher","first-page":"654","DOI":"10.3389\/fgene.2020.00654","article-title":"Gene set analysis: challenges, opportunities, and future research","volume":"11","author":"Maleki","year":"2020","journal-title":"Front Genet"},{"key":"2024101317312719600_btae584-B23","doi-asserted-by":"publisher","first-page":"103","DOI":"10.3390\/metabo11020103","article-title":"Ranking metabolite sets by their activity levels","volume":"11","author":"McLuskey","year":"2021","journal-title":"Metabolites"},{"key":"2024101317312719600_btae584-B24","doi-asserted-by":"publisher","first-page":"5798","DOI":"10.1021\/acs.analchem.3c04444","article-title":"Tailored mass spectral data exploration using the specxplore interactive dashboard","volume":"96","author":"Mildau","year":"2024","journal-title":"Anal Chem"},{"key":"2024101317312719600_btae584-B25","doi-asserted-by":"publisher","first-page":"905","DOI":"10.1038\/s41592-020-0933-6","article-title":"Feature-based molecular networking in the gnps analysis environment","volume":"17","author":"Nothias","year":"2020","journal-title":"Nat Methods"},{"key":"2024101317312719600_btae584-B26","doi-asserted-by":"publisher","first-page":"13900","DOI":"10.1021\/acs.analchem.8b03099","article-title":"Metgem software for the generation of molecular networks based on the t-sne algorithm","volume":"90","author":"Olivon","year":"2018","journal-title":"Anal Chem"},{"key":"2024101317312719600_btae584-B27","doi-asserted-by":"crossref","DOI":"10.1038\/s41596-024-01046-3","article-title":"Statistical analysis of feature-based molecular networking results from non-targeted metabolomics data","author":"Pakkir Shah","year":"2024","journal-title":"Nat Protoc"},{"key":"2024101317312719600_btae584-B28","doi-asserted-by":"publisher","first-page":"W388","DOI":"10.1093\/nar\/gkab382","article-title":"Metaboanalyst 5.0: narrowing the gap between raw spectra and functional insights","volume":"49","author":"Pang","year":"2021","journal-title":"Nucleic Acids Res"},{"key":"2024101317312719600_btae584-B29","year":"2015"},{"key":"2024101317312719600_btae584-B30","doi-asserted-by":"publisher","first-page":"37","DOI":"10.1007\/s11306-018-1335-y","article-title":"From correlation to causation: analysis of metabolomics data using systems biology approaches","volume":"14","author":"Rosato","year":"2018","journal-title":"Metabolomics"},{"key":"2024101317312719600_btae584-B31","doi-asserted-by":"publisher","first-page":"447","DOI":"10.1038\/s41587-023-01690-2","article-title":"Integrative analysis of multimodal mass spectrometry data in mzmine 3","volume":"41","author":"Schmid","year":"2023","journal-title":"Nat Biotechnol"},{"key":"2024101317312719600_btae584-B32","doi-asserted-by":"publisher","first-page":"4183","DOI":"10.21105\/joss.04183","article-title":"Fast k-medoids clustering in rust and python","volume":"7","author":"Schubert","year":"2022","journal-title":"JOSS"},{"key":"2024101317312719600_btae584-B33","doi-asserted-by":"publisher","first-page":"101804","DOI":"10.1016\/j.is.2021.101804","article-title":"Fast and eager k-medoids clustering: O (k) runtime improvement of the pam, clara, and clarans algorithms","volume":"101","author":"Schubert","year":"2021","journal-title":"Inf Syst"},{"key":"2024101317312719600_btae584-B34","doi-asserted-by":"crossref","first-page":"702","DOI":"10.1177\/1745691616658637","article-title":"Increasing transparency through a multiverse analysis","volume":"11","author":"Steegen","year":"2016","journal-title":"Perspect Psychol Sci"},{"key":"2024101317312719600_btae584-B35","doi-asserted-by":"publisher","DOI":"10.7554\/elife.52157","article-title":"Open exploration","volume":"9","author":"Thompson","year":"2020","journal-title":"Elife"},{"key":"2024101317312719600_btae584-B36","doi-asserted-by":"publisher","first-page":"411","DOI":"10.1111\/1467-9868.00293","article-title":"Estimating the number of clusters in a data set via the gap statistic","volume":"63","author":"Tibshirani","year":"2001","journal-title":"J R Stat Soc Ser B Stat Methodol"},{"key":"2024101317312719600_btae584-B37","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1186\/1471-2105-6-225","article-title":"Pathway level analysis of gene expression using singular value decomposition","volume":"6","author":"Tomfohr","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2024101317312719600_btae584-B38","doi-asserted-by":"publisher","first-page":"23","DOI":"10.2307\/2682991","article-title":"We need both exploratory and confirmatory","volume":"34","author":"Tukey","year":"1980","journal-title":"Am Stat"},{"key":"2024101317312719600_btae584-B39","year":"2024"},{"key":"2024101317312719600_btae584-B40","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.1203689109","article-title":"Mass spectral molecular networking of living microbial colonies","volume":"109","author":"Watrous","year":"2012","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024101317312719600_btae584-B41","doi-asserted-by":"publisher","first-page":"1832","DOI":"10.3389\/fpsyg.2016.01832","article-title":"Degrees of freedom in planning, running, analyzing, and reporting psychological studies: a checklist to avoid p-hacking","volume":"7","author":"Wicherts","year":"2016","journal-title":"Front Psychol"},{"key":"2024101317312719600_btae584-B42","doi-asserted-by":"publisher","first-page":"704","DOI":"10.1021\/acs.analchem.8b05112","article-title":"Accelerating metabolite identification in natural product research: toward an ideal combination of liquid chromatography\u2014high-resolution tandem mass spectrometry and nmr profiling, in silico databases, and chemometrics","volume":"91","author":"Wolfender","year":"2018","journal-title":"Anal Chem"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btae584\/59460036\/btae584.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/10\/btae584\/59738883\/btae584.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/10\/btae584\/59738883\/btae584.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,13]],"date-time":"2024-10-13T13:31:49Z","timestamp":1728826309000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btae584\/7796532"}},"subtitle":[],"editor":[{"given":"Pier Luigi","family":"Martelli","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2024,9,30]]},"references-count":42,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2024,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae584","relation":{"has-preprint":[{"id-type":"doi","id":"10.26434\/chemrxiv-2024-h7sm8","asserted-by":"object"}]},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,10]]},"published":{"date-parts":[[2024,9,30]]},"article-number":"btae584"}}