{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,5]],"date-time":"2026-04-05T20:02:10Z","timestamp":1775419330572,"version":"3.50.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"7","license":[{"start":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T00:00:00Z","timestamp":1750118400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000057","name":"National Institute of General Medical Sciences","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000057","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["R35GM148219"],"award-info":[{"award-number":["R35GM148219"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"NIH","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2025,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>A major challenge in metabolomics is annotation: assigning molecular structures to mass spectral fragmentation patterns. Despite recent advances in molecule-to-spectra and in spectra-to-molecular fingerprint (FP) prediction, annotation rates remain low.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We introduce in this article a novel tool (JESTR) for annotation. Unlike prior approaches that \u201cexplicitly\u201d construct molecular FPs or spectra, JESTR leverages the insight that molecules and their corresponding spectra are views of the same data and effectively embeds their representations in a joint space. Candidate structures are ranked based on cosine similarity between the embeddings of query spectrum and each candidate. We evaluate JESTR against mol-to-spec, spec-to-FP, and spec-mol matching annotation tools on four datasets. On average, for rank@[1\u201320], JESTR outperforms other tools by 55.5%\u2013302.6%. We further demonstrate the strong value of regularization with candidate molecules during training, boosting rank@1 performance by 5.72% across all datasets and enhancing the model\u2019s ability to discern between target and candidate molecules. When comparing JESTR\u2019s performance against that of publicly available pretrained models of SIRIUS and CFM-ID on appropriate subsets of MassSpecGym dataset, JESTR outperforms these tools by 31% and 238%, respectively. Through JESTR, we offer a novel promising avenue toward accurate annotation, therefore unlocking valuable insights into the metabolome.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and implementation<\/jats:title>\n                  <jats:p>Code and dataset available at https:\/\/github.com\/HassounLab\/JESTR1\/.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btaf354","type":"journal-article","created":{"date-parts":[[2025,6,27]],"date-time":"2025-06-27T08:06:48Z","timestamp":1751011608000},"source":"Crossref","is-referenced-by-count":5,"title":["JESTR: Joint Embedding Space Technique for Ranking candidate molecules for the annotation of untargeted metabolomics data"],"prefix":"10.1093","volume":"41","author":[{"given":"Apurva","family":"Kalia","sequence":"first","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155,","place":["United States"]}]},{"given":"Yan","family":"Zhou Chen","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155,","place":["United States"]}]},{"given":"Dilip","family":"Krishnan","sequence":"additional","affiliation":[{"name":"Google DeepMind , Mountain View, CA 94043,","place":["United States"]}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9477-2199","authenticated-orcid":false,"given":"Soha","family":"Hassoun","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Tufts University , Medford, MA 02155,","place":["United States"]},{"name":"Department of Chemical and Biological Engineering, Tufts University , Medford, MA 02155,","place":["United States"]}]}],"member":"286","published-online":{"date-parts":[[2025,6,17]]},"reference":[{"key":"2025070713414038800_btaf354-B1","doi-asserted-by":"publisher","author":"Bushuiev","year":"2025","DOI":"10.1038\/s41587-025-02663-3"},{"key":"2025070713414038800_btaf354-B2","first-page":"110010","article-title":"MassSpecGym: a benchmark for the discovery and identification of molecules","volume":"37","author":"Bushuiev","year":"2024","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025070713414038800_btaf354-B3","doi-asserted-by":"publisher","author":"Butler","year":"2023","DOI":"10.26434\/chemrxiv-2023-vsmpx-v3"},{"key":"2025070713414038800_btaf354-B4","doi-asserted-by":"crossref","first-page":"16871","DOI":"10.1021\/acs.analchem.4c03724","article-title":"CMSSP: a contrastive mass spectra-structure pretraining model for metabolite identification","volume":"96","author":"Chen","year":"2024","journal-title":"Anal Chem"},{"key":"2025070713414038800_btaf354-B5","first-page":"1597","volume-title":"International Conference on Machine Learning","author":"Chen","year":"2020"},{"key":"2025070713414038800_btaf354-B6","first-page":"539","volume-title":"2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR\u201905)","author":"Chopra","year":"2005"},{"key":"2025070713414038800_btaf354-B7","doi-asserted-by":"crossref","first-page":"12799","DOI":"10.1021\/acs.analchem.9b02354","article-title":"Integrated probabilistic annotation: a Bayesian-based annotation method for metabolomic profiles integrating biochemical connections, isotope patterns, and adduct relationships","volume":"91","author":"Del Carratore","year":"2019","journal-title":"Anal Chem"},{"key":"2025070713414038800_btaf354-B8","doi-asserted-by":"crossref","first-page":"btad455","DOI":"10.1093\/bioinformatics\/btad455","article-title":"ipaPy2: integrated probabilistic annotation (IPA) 2.0\u2014an improved Bayesian-based method for the annotation of LC\u2013MS\/MS untargeted metabolomics data","volume":"39","author":"Del Carratore","year":"2023","journal-title":"Bioinformatics"},{"key":"2025070713414038800_btaf354-B9","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/s41592-019-0344-8","article-title":"SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information","volume":"16","author":"D\u00fchrkop","year":"2019","journal-title":"Nat Methods"},{"key":"2025070713414038800_btaf354-B10","doi-asserted-by":"crossref","first-page":"462","DOI":"10.1038\/s41587-020-0740-8","article-title":"Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra","volume":"39","author":"D\u00fchrkop","year":"2021","journal-title":"Nat Biotechnol"},{"key":"2025070713414038800_btaf354-B11","doi-asserted-by":"publisher","author":"Faizan-Khan","year":"2025","DOI":"10.1101\/2025.02.07.637102"},{"key":"2025070713414038800_btaf354-B12","doi-asserted-by":"crossref","first-page":"965","DOI":"10.1038\/s42256-023-00708-3","article-title":"Annotating metabolite mass spectra with domain-inspired chemical formula transformers","volume":"5","author":"Goldman","year":"2023","journal-title":"Nat Mach Intell"},{"key":"2025070713414038800_btaf354-B13","doi-asserted-by":"crossref","first-page":"2333","DOI":"10.1093\/bioinformatics\/bts437","article-title":"Metabolite identification and molecular fingerprint prediction through machine learning","volume":"28","author":"Heinonen","year":"2012","journal-title":"Bioinformatics"},{"key":"2025070713414038800_btaf354-B14","doi-asserted-by":"crossref","first-page":"183","DOI":"10.3390\/metabo10050183","article-title":"Pathway-activity likelihood analysis and metabolite annotation for untargeted metabolomics using probabilistic modeling","volume":"10","author":"Hosseini","year":"2020","journal-title":"Metabolites"},{"key":"2025070713414038800_btaf354-B15","doi-asserted-by":"crossref","first-page":"e1008724","DOI":"10.1371\/journal.pcbi.1008724","article-title":"Spec2vec: improved mass spectral similarity scoring through learning of structural relationships","volume":"17","author":"Huber","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2025070713414038800_btaf354-B16","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1021\/acs.jcim.7b00616","article-title":"Mol2vec: unsupervised machine learning approach with chemical intuition","volume":"58","author":"Jaeger","year":"2018","journal-title":"J Chem Inf Model"},{"key":"2025070713414038800_btaf354-B17","first-page":"18661","article-title":"Supervised contrastive learning","volume":"33","author":"Khosla","year":"2020","journal-title":"Adv Neural Inf Process Syst"},{"key":"2025070713414038800_btaf354-B18","doi-asserted-by":"crossref","first-page":"D1102","DOI":"10.1093\/nar\/gky1033","article-title":"PubChem 2019 update: improved access to chemical data","volume":"47","author":"Kim","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"2025070713414038800_btaf354-B19","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1002\/mas.21535","article-title":"Identification of small molecules using accurate mass MS\/MS search","volume":"37","author":"Kind","year":"2018","journal-title":"Mass Spectrom Rev"},{"key":"2025070713414038800_btaf354-B20","author":"Kipf","year":"2016"},{"key":"2025070713414038800_btaf354-B21","doi-asserted-by":"crossref","first-page":"e1003123","DOI":"10.1371\/journal.pcbi.1003123","article-title":"Predicting network activity from high throughput metabolomics","volume":"9","author":"Li","year":"2013","journal-title":"PLoS Comput Biol"},{"key":"2025070713414038800_btaf354-B22","doi-asserted-by":"crossref","first-page":"btae490","DOI":"10.1093\/bioinformatics\/btae490","article-title":"An ensemble spectral prediction (ESP) model for metabolite annotation","volume":"40","author":"Li","year":"2024","journal-title":"Bioinformatics"},{"key":"2025070713414038800_btaf354-B23","doi-asserted-by":"crossref","first-page":"132","DOI":"10.1038\/s42004-023-00932-3","article-title":"An end-to-end deep learning framework for translating mass spectra to de-novo molecules","volume":"6","author":"Litsa","year":"2023","journal-title":"Commun Chem"},{"key":"2025070713414038800_btaf354-B24","doi-asserted-by":"publisher","first-page":"3213","DOI":"10.1021\/acs.analchem.4c01565","article-title":"Molecular structure discovery for untargeted metabolomics using biotransformation rules and global molecular networking","volume":"97","author":"Martin","year":"2025","journal-title":"Anal Chem"},{"key":"2025070713414038800_btaf354-B25","first-page":"8748","volume-title":"International Conference on Machine Learning","author":"Radford","year":"2021"},{"key":"2025070713414038800_btaf354-B26","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s13321-016-0115-9","article-title":"MetFrag relaunched: incorporating strategies beyond in silico fragmentation","volume":"8","author":"Ruttkies","year":"2016","journal-title":"J Cheminform"},{"key":"2025070713414038800_btaf354-B27","doi-asserted-by":"crossref","first-page":"865","DOI":"10.1038\/s41592-022-01486-3","article-title":"MSNovelist: de novo structure generation from mass spectra","volume":"19","author":"Stravs","year":"2022","journal-title":"Nat Methods"},{"key":"2025070713414038800_btaf354-B28","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1016\/j.inffus.2021.11.005","article-title":"A comprehensive survey on regularization strategies in machine learning","volume":"80","author":"Tian","year":"2022","journal-title":"Inf Fusion"},{"key":"2025070713414038800_btaf354-B29","first-page":"776","volume-title":"Proceedings of the 16th European Conference on Computer Vision\u2013ECCV 2020, Glasgow, UK, August 23\u201328, 2020, Part XI 16","author":"Tian","year":"2020"},{"key":"2025070713414038800_btaf354-B30","doi-asserted-by":"crossref","first-page":"11692","DOI":"10.1021\/acs.analchem.1c01465","article-title":"CFM-ID 4.0: more accurate ESI-MS\/MS spectral prediction and compound identification","volume":"93","author":"Wang","year":"2021","journal-title":"Anal Chem"},{"key":"2025070713414038800_btaf354-B31","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1038\/nbt.3597","article-title":"Sharing and community curation of mass spectrometry data with global natural products social molecular networking","volume":"34","author":"Wang","year":"2016","journal-title":"Nat Biotechnol"},{"key":"2025070713414038800_btaf354-B32","doi-asserted-by":"crossref","first-page":"700","DOI":"10.1021\/acscentsci.9b00085","article-title":"Rapid prediction of electron\u2013ionization mass spectrometry using neural networks","volume":"5","author":"Wei","year":"2019","journal-title":"ACS Cent Sci"},{"key":"2025070713414038800_btaf354-B33","doi-asserted-by":"crossref","first-page":"148","DOI":"10.1186\/1471-2105-11-148","article-title":"In silico fragmentation for computer assisted identification of metabolite mass spectra","volume":"11","author":"Wolf","year":"2010","journal-title":"BMC Bioinformatics"},{"key":"2025070713414038800_btaf354-B34","doi-asserted-by":"publisher","first-page":"404","DOI":"10.1038\/s42256-024-00816-8","article-title":"Tandem mass spectrum prediction for small molecules using graph transformers","volume":"6","author":"Young","year":"2024","journal-title":"Nat Mach Intell"},{"key":"2025070713414038800_btaf354-B35","doi-asserted-by":"crossref","first-page":"16599","DOI":"10.1021\/acs.analchem.4c02426","article-title":"MSBERT: embedding tandem mass spectra into chemically rational space by mask learning and contrastive learning","volume":"96","author":"Zhang","year":"2024","journal-title":"Anal Chem"},{"key":"2025070713414038800_btaf354-B36","author":"Zhu","year":"2020"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/advance-article-pdf\/doi\/10.1093\/bioinformatics\/btaf354\/63508085\/btaf354.pdf","content-type":"application\/pdf","content-version":"am","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf354\/63508085\/btaf354.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/41\/7\/btaf354\/63508085\/btaf354.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,7,7]],"date-time":"2025-07-07T17:41:52Z","timestamp":1751910112000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/doi\/10.1093\/bioinformatics\/btaf354\/8165426"}},"subtitle":[],"editor":[{"given":"Macha","family":"Nikolski","sequence":"additional","affiliation":[]}],"short-title":[],"issued":{"date-parts":[[2025,6,17]]},"references-count":36,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2025,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btaf354","relation":{},"ISSN":["1367-4811"],"issn-type":[{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2025,7]]},"published":{"date-parts":[[2025,6,17]]},"article-number":"btaf354"}}