{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,6]],"date-time":"2026-06-06T16:15:15Z","timestamp":1780762515086,"version":"3.54.1"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["DBI-2011271"],"award-info":[{"award-number":["DBI-2011271"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","award":["1R01AI143254"],"award-info":[{"award-number":["1R01AI143254"]}],"id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]},{"name":"University Precision Health Initiative"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>Tandem mass spectrometry (MS\/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used for peptide identification from MS\/MS spectra; both, however, face challenges due to experimental variations between replicate spectra and similar fragmentation patterns among distinct peptides. We present SpecEncoder, a deep metric learning approach that addresses these challenges by transforming MS\/MS spectra into robust and sensitive embedding vectors in a latent space. 
The SpecEncoder model can also embed predicted MS\/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%\u20132% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%\u201315% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%\u201312% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides when compared to deep-learning enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder\u2019s potential to enhance peptide identification for proteomic data analyses.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Availability and Implementation<\/jats:title>\n                  <jats:p>The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https:\/\/github.com\/lkytal\/SpecEncoder. 
Contact: hatang@iu.edu.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae220","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:27:39Z","timestamp":1719566859000},"page":"i257-i265","source":"Crossref","is-referenced-by-count":8,"title":["SpecEncoder: deep metric learning for accurate peptide identification in proteomics"],"prefix":"10.1093","volume":"40","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-3404-2802","authenticated-orcid":false,"given":"Kaiyuan","family":"Liu","sequence":"first","affiliation":[{"name":"Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University , IN 47408, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Chenghua","family":"Tao","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University , IN 47408, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-3707-3185","authenticated-orcid":false,"given":"Yuzhen","family":"Ye","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University , IN 47408, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Haixu","family":"Tang","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University , IN 47408, United States"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809033975900_btae220-B1","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1038\/nature01511","article-title":"Mass spectrometry-based 
proteomics","volume":"422","author":"Aebersold","year":"2003","journal-title":"Nature"},{"key":"2024062809033975900_btae220-B2","author":"Bai"},{"key":"2024062809033975900_btae220-B3","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1016\/j.cels.2017.05.009","article-title":"An optimized shotgun strategy for the rapid generation of comprehensive human proteomes","volume":"4","author":"Bekker-Jensen","year":"2017","journal-title":"Cell Syst"},{"key":"2024062809033975900_btae220-B4","doi-asserted-by":"crossref","first-page":"675","DOI":"10.1038\/s41592-022-01496-1","article-title":"A learned embedding for efficient joint analysis of millions of mass spectra","volume":"19","author":"Bittremieux","year":"2022","journal-title":"Nat Methods"},{"key":"2024062809033975900_btae220-B5","doi-asserted-by":"crossref","first-page":"D204","DOI":"10.1093\/nar\/gku989","article-title":"Uniprot: a hub for protein information","volume":"43","author":"Consortium U","year":"2015","journal-title":"Nucleic Acids Research"},{"key":"2024062809033975900_btae220-B6","doi-asserted-by":"crossref","first-page":"1794","DOI":"10.1021\/pr101065j","article-title":"Andromeda: a peptide search engine integrated into the maxquant environment","volume":"10","author":"Cox","year":"2011","journal-title":"J Proteome Res"},{"key":"2024062809033975900_btae220-B7","doi-asserted-by":"crossref","first-page":"2310","DOI":"10.1002\/rcm.1198","article-title":"A method for reducing the time required to match protein sequences with tandem mass spectra","volume":"17","author":"Craig","year":"2003","journal-title":"Rapid Commun Mass Spectrom"},{"key":"2024062809033975900_btae220-B8","doi-asserted-by":"crossref","first-page":"4051","DOI":"10.1021\/acs.jproteome.8b00485","article-title":"Expanding the use of spectral libraries in proteomics","volume":"17","author":"Deutsch","year":"2018","journal-title":"J Proteome 
Res"},{"key":"2024062809033975900_btae220-B9","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J Am Soc Mass Spectrom"},{"key":"2024062809033975900_btae220-B10","doi-asserted-by":"crossref","first-page":"R111.009522","DOI":"10.1074\/mcp.R111.009522","article-title":"A face in the crowd: recognizing peptides through database search","volume":"10","author":"Eng","year":"2011","journal-title":"Mol Cell Proteomics"},{"key":"2024062809033975900_btae220-B11","doi-asserted-by":"crossref","first-page":"587","DOI":"10.1038\/nmeth.1609","article-title":"Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra","volume":"8","author":"Frank","year":"2011","journal-title":"Nat Methods"},{"key":"2024062809033975900_btae220-B12","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1021\/pr0499491","article-title":"Open mass spectrometry search algorithm","volume":"3","author":"Geer","year":"2004","journal-title":"J Proteome Res"},{"key":"2024062809033975900_btae220-B13","doi-asserted-by":"crossref","first-page":"7888","DOI":"10.1021\/acs.analchem.3c00260","article-title":"Contrastive learning-based embedder for the representation of tandem mass spectra","volume":"95","author":"Guo","year":"2023","journal-title":"Anal Chem"},{"key":"2024062809033975900_btae220-B14","first-page":"161","article-title":"Mascot: multiple alignment system for protein sequences based on three-way dynamic programming","volume":"9","author":"Hirosawa","year":"1993","journal-title":"Comput Appl Biosci"},{"key":"2024062809033975900_btae220-B15","doi-asserted-by":"crossref","first-page":"e1008724","DOI":"10.1371\/journal.pcbi.1008724","article-title":"Spec2vec: improved mass spectral similarity scoring through learning of structural 
relationships","volume":"17","author":"Huber","year":"2021","journal-title":"PLoS Comput Biol"},{"key":"2024062809033975900_btae220-B16","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1038\/nmeth1113","article-title":"Semi-supervised learning for peptide identification from shotgun proteomics datasets","volume":"4","author":"K\u00e4ll","year":"2007","journal-title":"Nat Methods"},{"key":"2024062809033975900_btae220-B17","doi-asserted-by":"crossref","first-page":"2534","DOI":"10.1093\/bioinformatics\/btn323","article-title":"Proteowizard: open source software for rapid proteomics tools development","volume":"24","author":"Kessner","year":"2008","journal-title":"Bioinformatics"},{"key":"2024062809033975900_btae220-B18","doi-asserted-by":"crossref","first-page":"5277","DOI":"10.1038\/ncomms6277","article-title":"Ms-gf+ makes progress towards a universal database search tool for proteomics","volume":"5","author":"Kim","year":"2014","journal-title":"Nat Commun"},{"key":"2024062809033975900_btae220-B19","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1038\/nmeth.4256","article-title":"Msfragger: ultrafast and comprehensive peptide identification in mass spectrometry\u2013based proteomics","volume":"14","author":"Kong","year":"2017","journal-title":"Nat Methods"},{"key":"2024062809033975900_btae220-B20","author":"Lam","year":"2006"},{"key":"2024062809033975900_btae220-B21","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1002\/pmic.200600625","article-title":"Development and validation of a spectral library searching method for peptide identification from ms\/ms","volume":"7","author":"Lam","year":"2007","journal-title":"Proteomics"},{"key":"2024062809033975900_btae220-B22","doi-asserted-by":"crossref","first-page":"873","DOI":"10.1038\/nmeth.1254","article-title":"Building consensus spectral libraries for peptide identification in proteomics","volume":"5","author":"Lam","year":"2008","journal-title":"Nat 
Methods"},{"key":"2024062809033975900_btae220-B23","doi-asserted-by":"crossref","first-page":"4275","DOI":"10.1021\/acs.analchem.9b04867","article-title":"Full-spectrum prediction of peptides tandem mass spectra using deep neural network","volume":"92","author":"Liu","year":"2020","journal-title":"Anal Chem"},{"key":"2024062809033975900_btae220-B24","author":"Liu","year":"2020"},{"key":"2024062809033975900_btae220-B25","doi-asserted-by":"crossref","first-page":"7974","DOI":"10.1038\/s41467-023-43010-x","article-title":"Accurate de novo peptide sequencing using fully convolutional neural networks","volume":"14","author":"Liu","year":"2023","journal-title":"Nat Commun"},{"key":"2024062809033975900_btae220-B26","doi-asserted-by":"crossref","first-page":"2749","DOI":"10.1021\/pr401169d","article-title":"Functional annotation of proteome encoded by human chromosome 22","volume":"13","author":"Pinto","year":"2014","journal-title":"J Proteome Res"},{"key":"2024062809033975900_btae220-B27","doi-asserted-by":"crossref","first-page":"104070","DOI":"10.1016\/j.jprot.2020.104070","article-title":"Deep learning embedder method and tool for mass spectra similarity search","volume":"232","author":"Qin","year":"2021","journal-title":"J Proteomics"},{"key":"2024062809033975900_btae220-B28","doi-asserted-by":"crossref","first-page":"223","DOI":"10.1038\/nbt.2839","article-title":"Proteomexchange provides globally coordinated proteomics data submission and dissemination","volume":"32","author":"Vizca\u00edno","year":"2014","journal-title":"Nat Biotechnol"},{"key":"2024062809033975900_btae220-B29","doi-asserted-by":"crossref","first-page":"926","DOI":"10.1109\/LSP.2018.2822810","article-title":"Additive margin softmax for face verification","volume":"25","author":"Wang","year":"2018","journal-title":"IEEE Signal Process Lett"},{"key":"2024062809033975900_btae220-B30","doi-asserted-by":"crossref","first-page":"412","DOI":"10.1016\/j.cels.2018.08.004","article-title":"Assembling the 
community-scale discoverable human proteome","volume":"7","author":"Wang","year":"2018","journal-title":"Cell Syst"},{"key":"2024062809033975900_btae220-B31","doi-asserted-by":"crossref","first-page":"582","DOI":"10.1038\/nature13319","article-title":"Mass-spectrometry-based draft of the human proteome","volume":"509","author":"Wilhelm","year":"2014","journal-title":"Nature"},{"key":"2024062809033975900_btae220-B32","doi-asserted-by":"crossref","first-page":"4539","DOI":"10.1038\/s41467-023-40129-9","article-title":"Msbooster: improving peptide identification rates using deep learning-based features","volume":"14","author":"Yang","year":"2023","journal-title":"Nat Commun"},{"key":"2024062809033975900_btae220-B33","doi-asserted-by":"crossref","first-page":"2280","DOI":"10.1007\/s13361-017-1748-2","article-title":"Extending a tandem mass spectral library to include ms 2 spectra of fragment ions produced in-source and ms n spectra","volume":"28","author":"Yang","year":"2017","journal-title":"J Am Soc Mass Spectrom"},{"key":"2024062809033975900_btae220-B34","doi-asserted-by":"crossref","first-page":"3557","DOI":"10.1021\/ac980122y","article-title":"Method to compare collision-induced dissociation spectra of peptides: potential for library searching and subtractive analysis","volume":"70","author":"Yates","year":"1998","journal-title":"Anal Chem"},{"key":"2024062809033975900_btae220-B35","doi-asserted-by":"crossref","first-page":"2343","DOI":"10.1021\/cr3003533","article-title":"Protein analysis by shotgun\/bottom-up proteomics","volume":"113","author":"Zhang","year":"2013","journal-title":"Chem Rev"},{"key":"2024062809033975900_btae220-B36","doi-asserted-by":"crossref","first-page":"259","DOI":"10.1038\/nmeth.4153","article-title":"Building proteometools based on a complete synthetic human proteome","volume":"14","author":"Zolg","year":"2017","journal-title":"Nat 
Methods"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i257\/58354825\/btae220.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i257\/58354825\/btae220.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:27:59Z","timestamp":1719566879000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i257\/7700866"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":36,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae220","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}