{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,26]],"date-time":"2026-06-26T07:45:19Z","timestamp":1782459919908,"version":"3.54.5"},"update-to":[{"DOI":"10.1371\/journal.pcbi.1008724","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T00:00:00Z","timestamp":1614297600000}}],"reference-count":41,"publisher":"Public Library of Science (PLoS)","issue":"2","license":[{"start":{"date-parts":[[2021,2,16]],"date-time":"2021-02-16T00:00:00Z","timestamp":1613433600000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["www.ploscompbiol.org"],"crossmark-restriction":false},"short-container-title":["PLoS Comput Biol"],"abstract":"<jats:p>Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS\/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm\u2014Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS\/MS libraries including spectra for nearly 13,000 unique molecules, we show how Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is computationally more scalable allowing structural analogue searches in large databases within seconds.<\/jats:p>","DOI":"10.1371\/journal.pcbi.1008724","type":"journal-article","created":{"date-parts":[[2021,2,16]],"date-time":"2021-02-16T17:30:31Z","timestamp":1613496631000},"page":"e1008724","update-policy":"https:\/\/doi.org\/10.1371\/journal.pcbi.corrections_policy","source":"Crossref","is-referenced-by-count":202,"title":["Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships"],"prefix":"10.1371","volume":"17","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3535-9406","authenticated-orcid":true,"given":"Florian","family":"Huber","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7635-9533","authenticated-orcid":true,"given":"Lars","family":"Ridder","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5821-2060","authenticated-orcid":true,"given":"Stefan","family":"Verhoeven","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-7064-4069","authenticated-orcid":true,"given":"Jurriaan H.","family":"Spaaks","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0989-929X","authenticated-orcid":true,"given":"Faruk","family":"Diblen","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3578-4477","authenticated-orcid":true,"given":"Simon","family":"Rogers","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9340-5511","authenticated-orcid":true,"given":"Justin J. J.","family":"van der Hooft","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"340","published-online":{"date-parts":[[2021,2,16]]},"reference":[{"key":"pcbi.1008724.ref001","doi-asserted-by":"crossref","first-page":"263","DOI":"10.1038\/nrm3314","article-title":"Metabolomics: the apogee of the omics trilogy","volume":"13","author":"GJ Patti","year":"2012","journal-title":"Nat Rev Mol Cell Biol"},{"key":"pcbi.1008724.ref002","doi-asserted-by":"crossref","first-page":"646","DOI":"10.1126\/science.356.6338.646","article-title":"Big data, big picture: Metabolomics meets systems biology","volume":"356","author":"M May","year":"2017","journal-title":"Science"},{"key":"pcbi.1008724.ref003","doi-asserted-by":"crossref","first-page":"166","DOI":"10.1038\/ng.308","article-title":"System-wide molecular evidence for phenotypic buffering in Arabidopsis","volume":"41","author":"J Fu","year":"2009","journal-title":"Nat Genet"},{"key":"pcbi.1008724.ref004","doi-asserted-by":"crossref","first-page":"106","DOI":"10.1007\/s11306-017-1242-7","article-title":"Navigating freely-available software tools for metabolomics analysis","volume":"13","author":"R Spicer","year":"2017","journal-title":"Metabolomics"},{"key":"pcbi.1008724.ref005","first-page":"8","article-title":"Software Tools and Approaches for Compound Identification of LC-MS\/MS Data in Metabolomics","author":"I Bla\u017eenovi\u0107","year":"2018","journal-title":"Metabolites"},{"key":"pcbi.1008724.ref006","doi-asserted-by":"crossref","first-page":"703","DOI":"10.1002\/jms.1777","article-title":"MassBank: a public repository for sharing mass spectral data for life sciences","volume":"45","author":"H Horai","year":"2010","journal-title":"J Mass Spectrom"},{"key":"pcbi.1008724.ref007","doi-asserted-by":"crossref","first-page":"3156","DOI":"10.1021\/acs.analchem.7b04424","article-title":"METLIN: A Technology Platform for Identifying Knowns and Unknowns","volume":"90","author":"C Guijas","year":"2018","journal-title":"Anal Chem"},{"key":"pcbi.1008724.ref008","doi-asserted-by":"crossref","first-page":"828","DOI":"10.1038\/nbt.3597","article-title":"Sharing and community curation of mass spectrometry data with GNPS","volume":"34","author":"M Wang","year":"2016","journal-title":"Nat Biotechnol"},{"key":"pcbi.1008724.ref009","doi-asserted-by":"crossref","first-page":"12580","DOI":"10.1073\/pnas.1509788112","article-title":"Searching molecular structure databases with tandem mass spectra using CSI:FingerID","volume":"112","author":"K D\u00fchrkop","year":"2015","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1008724.ref010","doi-asserted-by":"crossref","first-page":"13738","DOI":"10.1073\/pnas.1608041113","article-title":"Topic modeling for untargeted substructure exploration in metabolomics","volume":"113","author":"Hooft JJJ van der","year":"2016","journal-title":"Proc Natl Acad Sci"},{"key":"pcbi.1008724.ref011","doi-asserted-by":"crossref","first-page":"E1743","DOI":"10.1073\/pnas.1203689109","article-title":"Mass spectral molecular networking of living microbial colonies","volume":"109","author":"J Watrous","year":"2012","journal-title":"Proc Natl Acad Sci U S A"},{"key":"pcbi.1008724.ref012","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/j.aca.2004.04.014","article-title":"Spectral similarity versus structural similarity: mass spectrometry","volume":"516","author":"W Demuth","year":"2004","journal-title":"Anal Chim Acta"},{"key":"pcbi.1008724.ref013","doi-asserted-by":"crossref","first-page":"2692","DOI":"10.1007\/s13361-017-1797-6","article-title":"Similarity of High-Resolution Tandem Mass Spectrometry Spectra of Structurally Related Micropollutants and Transformation Products","volume":"28","author":"JE Scholl\u00e9e","year":"2017","journal-title":"J Am Soc Mass Spectrom"},{"key":"pcbi.1008724.ref014","doi-asserted-by":"crossref","first-page":"3474","DOI":"10.1021\/acs.analchem.6b04512","article-title":"iMet: A Network-Based Computational Tool To Assist in the Annotation of Metabolites from Tandem Mass Spectra","volume":"89","author":"A Aguilar-Mogas","year":"2017","journal-title":"Anal Chem"},{"key":"pcbi.1008724.ref015","doi-asserted-by":"crossref","first-page":"2097","DOI":"10.1021\/es5002105","article-title":"Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence","volume":"48","author":"EL Schymanski","year":"2014","journal-title":"Environ Sci Technol"},{"key":"pcbi.1008724.ref016","first-page":"3","article-title":"Automatic Compound Annotation from Mass Spectrometry Data Using MAGMa","author":"L Ridder","year":"2014","journal-title":"Mass Spectrom"},{"key":"pcbi.1008724.ref017","doi-asserted-by":"crossref","first-page":"299","DOI":"10.1038\/s41592-019-0344-8","article-title":"SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information","volume":"16","author":"K D\u00fchrkop","year":"2019","journal-title":"Nat Methods"},{"key":"pcbi.1008724.ref018","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1186\/s13321-016-0115-9","article-title":"MetFrag relaunched: incorporating strategies beyond in silico fragmentation","volume":"8","author":"C Ruttkies","year":"2016","journal-title":"J Cheminformatics"},{"key":"pcbi.1008724.ref019","doi-asserted-by":"crossref","first-page":"i28","DOI":"10.1093\/bioinformatics\/btw246","article-title":"Fast metabolite identification with Input Output Kernel Regression","volume":"32","author":"C Brouard","year":"2016","journal-title":"Bioinformatics"},{"key":"pcbi.1008724.ref020","doi-asserted-by":"crossref","first-page":"5629","DOI":"10.1021\/acs.analchem.8b05405","article-title":"Deep MS\/MS-Aided Structural-Similarity Scoring for Unknown Metabolite Identification","volume":"91","author":"H Ji","year":"2019","journal-title":"Anal Chem"},{"key":"pcbi.1008724.ref021","doi-asserted-by":"crossref","first-page":"1516","DOI":"10.1038\/s41467-019-09550-x","article-title":"Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics","volume":"10","author":"X Shen","year":"2019","journal-title":"Nat Commun"},{"key":"pcbi.1008724.ref022","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","author":"T Mikolov","year":"2013","journal-title":"Advances in neural information processing systems"},{"key":"pcbi.1008724.ref023","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1145\/2529989","article-title":"Linear-Time Approximation for Maximum Weight Matching","volume":"61","author":"R Duan","year":"2014","journal-title":"J ACM"},{"key":"pcbi.1008724.ref024","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1038\/s41586-020-2649-2","article-title":"Array programming with NumPy","volume":"585","author":"CR Harris","year":"2020","journal-title":"Nature"},{"key":"pcbi.1008724.ref025","first-page":"1","article-title":"Numba: a LLVM-based Python JIT compiler. Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC. Austin","author":"SK Lam","year":"2015","journal-title":"Texas: Association for Computing Machinery"},{"key":"pcbi.1008724.ref026","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1016\/j.physrep.2016.09.002","article-title":"Community detection in networks: A user guide","volume":"659","author":"S Fortunato","year":"2016","journal-title":"Phys Rep"},{"key":"pcbi.1008724.ref027","doi-asserted-by":"crossref","first-page":"P10008","DOI":"10.1088\/1742-5468\/2008\/10\/P10008","article-title":"Fast unfolding of communities in large networks","volume":"2008","author":"VD Blondel","year":"2008","journal-title":"J Stat Mech Theory Exp"},{"key":"pcbi.1008724.ref028","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1186\/s13321-016-0116-8","article-title":"Fragmentation trees reloaded","volume":"8","author":"S B\u00f6cker","year":"2016","journal-title":"J Cheminformatics"},{"key":"pcbi.1008724.ref029","doi-asserted-by":"crossref","first-page":"3268","DOI":"10.1021\/acs.jproteome.9b00216","article-title":"Alphabet Projection of Spectra","volume":"18","author":"PA Kreitzberg","year":"2019","journal-title":"J Proteome Res"},{"key":"pcbi.1008724.ref030","doi-asserted-by":"crossref","first-page":"14476","DOI":"10.1021\/acs.analchem.0c02521","article-title":"Retrieving and Utilizing Hypothetical Neutral Losses from Tandem Mass Spectra for Spectral Similarity Analysis and Unknown Metabolite Annotation","volume":"92","author":"S Xing","year":"2020","journal-title":"Anal Chem"},{"key":"pcbi.1008724.ref031","doi-asserted-by":"crossref","first-page":"2411","DOI":"10.21105\/joss.02411","article-title":"matchms\u2014processing and similarity evaluation of mass spectrometry data","volume":"5","author":"F Huber","year":"2020","journal-title":"J Open Source Softw"},{"key":"pcbi.1008724.ref032","volume-title":"spec2vec","author":"F Huber","year":"2020"},{"key":"pcbi.1008724.ref033","doi-asserted-by":"crossref","first-page":"905","DOI":"10.1038\/s41592-020-0933-6","article-title":"Feature-based molecular networking in the GNPS analysis environment","volume":"17","author":"L-F Nothias","year":"2020","journal-title":"Nat Methods"},{"key":"pcbi.1008724.ref034","doi-asserted-by":"crossref","first-page":"144","DOI":"10.3390\/metabo9070144","article-title":"MolNetEnhancer: Enhanced Molecular Networks by Integrating Metabolome Mining and Annotation Tools","volume":"9","author":"M Ernst","year":"2019","journal-title":"Metabolites"},{"key":"pcbi.1008724.ref035","doi-asserted-by":"crossref","first-page":"D1102","DOI":"10.1093\/nar\/gky1033","article-title":"PubChem 2019 update: improved access to chemical data","volume":"47","author":"S Kim","year":"2019","journal-title":"Nucleic Acids Res"},{"key":"pcbi.1008724.ref036","volume-title":"mcs07\/PubChemPy: PubChemPy v1.0.4","author":"Swain Matt","year":"2017"},{"key":"pcbi.1008724.ref037","first-page":"45","volume-title":"Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks","author":"R \u0158eh\u016f\u0159ek","year":"2010"},{"key":"pcbi.1008724.ref038","doi-asserted-by":"crossref","first-page":"20","DOI":"10.1186\/s13321-015-0069-3","article-title":"Why is Tanimoto index an appropriate choice for fingerprint-based similarity calculations?","volume":"7","author":"D Bajusz","year":"2015","journal-title":"J Cheminformatics"},{"key":"pcbi.1008724.ref039","author":"G Landrum","journal-title":"RDKit: Open-source cheminformatics"},{"key":"pcbi.1008724.ref040","author":"Phillip Cloud","year":"2020"},{"key":"pcbi.1008724.ref041","first-page":"261","article-title":"SciPy 1.0: fundamental algorithms for scientific computing in Python","volume":"17","author":"P Virtanen","journal-title":"Nat Methods. 2020"}],"updated-by":[{"DOI":"10.1371\/journal.pcbi.1008724","type":"new_version","label":"New version","source":"publisher","updated":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T00:00:00Z","timestamp":1614297600000}}],"container-title":["PLOS Computational Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008724","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,2,26]],"date-time":"2021-02-26T13:50:41Z","timestamp":1614347441000},"score":1,"resource":{"primary":{"URL":"https:\/\/dx.plos.org\/10.1371\/journal.pcbi.1008724"}},"subtitle":[],"editor":[{"given":"Lars Juhl","family":"Jensen","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"editor"}]}],"short-title":[],"issued":{"date-parts":[[2021,2,16]]},"references-count":41,"journal-issue":{"issue":"2","published-online":{"date-parts":[[2021,2,16]]}},"URL":"https:\/\/doi.org\/10.1371\/journal.pcbi.1008724","relation":{"has-preprint":[{"id-type":"doi","id":"10.1101\/2020.08.11.245928","asserted-by":"object"}]},"ISSN":["1553-7358"],"issn-type":[{"value":"1553-7358","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,2,16]]}}}