{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T08:10:21Z","timestamp":1774944621944,"version":"3.50.1"},"reference-count":35,"publisher":"Oxford University Press (OUP)","issue":"Supplement_1","license":[{"start":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T00:00:00Z","timestamp":1719532800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["2245300"],"award-info":[{"award-number":["2245300"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2024,6,28]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:sec>\n                  <jats:title>Motivation<\/jats:title>\n                  <jats:p>One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools.<\/jats:p>\n               <\/jats:sec>\n               <jats:sec>\n                  <jats:title>Results<\/jats:title>\n                  <jats:p>To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.<\/jats:p>\n               <\/jats:sec>","DOI":"10.1093\/bioinformatics\/btae218","type":"journal-article","created":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:32:35Z","timestamp":1719567155000},"page":"i410-i417","source":"Crossref","is-referenced-by-count":7,"title":["A learned score function improves the power of mass spectrometry database search"],"prefix":"10.1093","volume":"40","author":[{"given":"Varun","family":"Ananth","sequence":"first","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195,\u00a0USA"}]},{"given":"Justin","family":"Sanders","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195,\u00a0USA"}]},{"given":"Melih","family":"Yilmaz","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195,\u00a0USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2261-3150","authenticated-orcid":false,"given":"Bo","family":"Wen","sequence":"additional","affiliation":[{"name":"Department of Genome Sciences, University of Washington, Seattle, WA 98195,\u00a0USA"}]},{"given":"Sewoong","family":"Oh","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195,\u00a0USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-7283-4715","authenticated-orcid":false,"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[{"name":"Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195,\u00a0USA"},{"name":"Department of Genome Sciences, University of Washington, Seattle, WA 98195,\u00a0USA"}]}],"member":"286","published-online":{"date-parts":[[2024,6,28]]},"reference":[{"key":"2024062809064519900_btae218-B1","first-page":"327","author":"Bai","year":"2016"},{"key":"2024062809064519900_btae218-B2","doi-asserted-by":"crossref","first-page":"1794","DOI":"10.1021\/pr101065j","article-title":"Andromeda: a peptide search engine integrated into the MaxQuant environment","volume":"10","author":"Cox","year":"2011","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B3","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1093\/bioinformatics\/bth092","article-title":"Tandem: matching proteins with tandem mass spectra","volume":"20","author":"Craig","year":"2004","journal-title":"Bioinformatics"},{"key":"2024062809064519900_btae218-B4","doi-asserted-by":"crossref","first-page":"3871","DOI":"10.1021\/pr101196n","article-title":"Faster SEQUEST searching for peptide identification from tandem mass spectra","volume":"10","author":"Diament","year":"2011","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B5","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nmeth1019","article-title":"Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry","volume":"4","author":"Elias","year":"2007","journal-title":"Nat Methods"},{"key":"2024062809064519900_btae218-B6","author":"Eloff","year":"2023"},{"key":"2024062809064519900_btae218-B7","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J Am Soc Mass Spectrom"},{"key":"2024062809064519900_btae218-B8","doi-asserted-by":"crossref","first-page":"22","DOI":"10.1002\/pmic.201200439","article-title":"Comet: an open source tandem mass spectrometry sequence database search tool","volume":"13","author":"Eng","year":"2012","journal-title":"Proteomics"},{"key":"2024062809064519900_btae218-B9","author":"Ge","year":"2022"},{"key":"2024062809064519900_btae218-B10","doi-asserted-by":"crossref","first-page":"2765","DOI":"10.1074\/mcp.O113.036681","article-title":"The mzTab data exchange format: communicating mass-spectrometry-based proteomics and metabolomics experimental results to a wider audience","volume":"13","author":"Griss","year":"2014","journal-title":"Mol Cell Proteomics"},{"key":"2024062809064519900_btae218-B11","author":"Jin","year":"2023"},{"key":"2024062809064519900_btae218-B12","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1038\/nmeth1113","article-title":"A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets","volume":"4","author":"K\u00e4ll","year":"2007","journal-title":"Nat Methods"},{"key":"2024062809064519900_btae218-B13","doi-asserted-by":"crossref","first-page":"2478","DOI":"10.1074\/mcp.TIR119.001656","article-title":"Uncovering thousands of new peptides with sequence-mask-search hybrid de novo peptide sequencing framework","volume":"18","author":"Karunratanakul","year":"2019","journal-title":"Mol Cell Proteomics"},{"key":"2024062809064519900_btae218-B14","doi-asserted-by":"crossref","first-page":"1147","DOI":"10.1021\/pr5010983","article-title":"On the importance of well calibrated scores for identifying shotgun proteomics spectra","volume":"14","author":"Keich","year":"2015","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B15","doi-asserted-by":"crossref","first-page":"2534","DOI":"10.1093\/bioinformatics\/btn323","article-title":"Proteowizard: open source software for rapid proteomics tools development","volume":"24","author":"Kessner","year":"2008","journal-title":"Bioinformatics"},{"key":"2024062809064519900_btae218-B16","doi-asserted-by":"crossref","first-page":"2106","DOI":"10.1021\/pr8011107","article-title":"Statistical calibration of the Sequest XCorr function","volume":"8","author":"Klammer","year":"2009","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B17","doi-asserted-by":"crossref","first-page":"151","DOI":"10.1038\/s41467-023-44323-7","article-title":"Deep learning-driven fragment ion series classification enables highly precise and sensitive de novo peptide sequencing","volume":"15","author":"Klaproth-Andrade","year":"2024","journal-title":"Nat Commun"},{"key":"2024062809064519900_btae218-B18","doi-asserted-by":"crossref","first-page":"3652","DOI":"10.1021\/acs.jproteome.3c00486","article-title":"Sage: an open-source tool for fast proteomics searching and quantification at scale","volume":"22","author":"Lazear","year":"2023","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B19","doi-asserted-by":"crossref","first-page":"e1011892","DOI":"10.1371\/journal.pcbi.1011892","article-title":"Bidirectional de novo peptide sequencing using a transformer model","volume":"20","author":"Lee","year":"2024","journal-title":"PLoS Comput Biol"},{"key":"2024062809064519900_btae218-B20","doi-asserted-by":"crossref","first-page":"2412","DOI":"10.1021\/acs.jproteome.2c00282","article-title":"Improving peptide-level mass spectrometry analysis via double competition","volume":"21","author":"Lin","year":"2022","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B21","doi-asserted-by":"publisher","first-page":"e2300084","DOI":"10.1002\/pmic.202300084","article-title":"Target-decoy false discovery rate estimation using crema","volume":"24","author":"Lin","year":"2024","journal-title":"Proteomics"},{"key":"2024062809064519900_btae218-B22","doi-asserted-by":"crossref","first-page":"7974","DOI":"10.1038\/s41467-023-43010-x","article-title":"Accurate de novo peptide sequencing using fully convolutional neural networks","volume":"14","author":"Liu","year":"2023","journal-title":"Nat Commun"},{"key":"2024062809064519900_btae218-B23","doi-asserted-by":"crossref","first-page":"1250","DOI":"10.1038\/s42256-023-00738-x","article-title":"Mitigating the missing fragmentation problem in de novo peptide sequencing with a two stage graph-based deep learning model","volume":"5","author":"Mao","year":"2023","journal-title":"Nat Mach Intell"},{"key":"2024062809064519900_btae218-B24","doi-asserted-by":"crossref","first-page":"2092","DOI":"10.1016\/j.jprot.2010.08.009","article-title":"A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics","volume":"73","author":"Nesvizhskii","year":"2010","journal-title":"J Proteomics"},{"key":"2024062809064519900_btae218-B25","doi-asserted-by":"crossref","first-page":"3022","DOI":"10.1021\/pr800127y","article-title":"Rapid and accurate peptide identification from tandem mass spectra","volume":"7","author":"Park","year":"2008","journal-title":"J Proteome Res"},{"key":"2024062809064519900_btae218-B26","doi-asserted-by":"publisher","first-page":"100538","DOI":"10.1016\/j.mcpro.2023.100538","article-title":"MSFragger-labile: A flexible method to improve labile PTM analysis in proteomics","volume":"22","author":"Polasky","year":"2023","journal-title":"Mol Cell Proteomics"},{"key":"2024062809064519900_btae218-B27","doi-asserted-by":"crossref","first-page":"420","DOI":"10.1038\/s42256-021-00304-3","article-title":"Computationally instrument-resolution-independent de novo peptide sequencing for high-resolution devices","volume":"3","author":"Qiao","year":"2021","journal-title":"Nat Mach Intell"},{"key":"2024062809064519900_btae218-B28","doi-asserted-by":"crossref","first-page":"8247","DOI":"10.1073\/pnas.1705691114","article-title":"De novo peptide sequencing by deep learning","volume":"114","author":"Tran","year":"2017","journal-title":"Proc Natl Acad Sci USA"},{"key":"2024062809064519900_btae218-B29","doi-asserted-by":"crossref","first-page":"126","DOI":"10.1038\/s41597-022-01216-6","article-title":"A comprehensive LFQ benchmark dataset on modern day acquisition strategies in proteomics","volume":"9","author":"Van Puyvelde","year":"2022","journal-title":"Sci Data"},{"key":"2024062809064519900_btae218-B30","first-page":"6000","article-title":"Attention is all you need","volume":"30","author":"Vaswani","year":"2017","journal-title":"Adv Neural Inf Process Syst"},{"key":"2024062809064519900_btae218-B31","doi-asserted-by":"crossref","first-page":"vbad057","DOI":"10.1093\/bioadv\/vbad057","article-title":"PGPointNovo: an efficient neural network-based tool for parallel de novo peptide sequencing","volume":"3","author":"Xu","year":"2023","journal-title":"Bioinform Adv"},{"key":"2024062809064519900_btae218-B32","doi-asserted-by":"crossref","first-page":"i183","DOI":"10.1093\/bioinformatics\/btz366","article-title":"pNovo 3: precise de novo peptide sequencing using a learning-to-rank framework","volume":"35","author":"Yang","year":"2019","journal-title":"Bioinformatics"},{"key":"2024062809064519900_btae218-B33","doi-asserted-by":"crossref","first-page":"bbae021","DOI":"10.1093\/bib\/bbae021","article-title":"Introducing \u03c0-HelixNovo for practical large-scale de novo peptide sequencing","volume":"25","author":"Yang","year":"2024","journal-title":"Briefings in Bioinformatics"},{"key":"2024062809064519900_btae218-B34","first-page":"25514","author":"Yilmaz","year":"2022"},{"key":"2024062809064519900_btae218-B35","doi-asserted-by":"crossref","first-page":"1850","DOI":"10.1074\/mcp.TIR118.000783","article-title":"ProteomeTools: systematic characterization of 21 post-translational protein modifications by liquid chromatography tandem mass spectrometry (LC-MS\/MS) using synthetic peptides","volume":"17","author":"Zolg","year":"2018","journal-title":"Mol Cell Proteomics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i410\/58354735\/btae218.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/40\/Supplement_1\/i410\/58354735\/btae218.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,28]],"date-time":"2024-06-28T09:32:54Z","timestamp":1719567174000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/40\/Supplement_1\/i410\/7700854"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,28]]},"references-count":35,"journal-issue":{"issue":"Supplement_1","published-print":{"date-parts":[[2024,6,28]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btae218","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"value":"1367-4803","type":"print"},{"value":"1367-4811","type":"electronic"}],"subject":[],"published-other":{"date-parts":[[2024,7]]},"published":{"date-parts":[[2024,6,28]]}}}