{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,21]],"date-time":"2026-03-21T05:02:33Z","timestamp":1774069353379,"version":"3.50.1"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"16","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2008,8,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution.<\/jats:p><jats:p>Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. 
The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.<\/jats:p><jats:p>Availability: C++ code implementing the method as well as supplementary information is available at http:\/\/noble.gs.washington.edu\/proj\/qvality<\/jats:p><jats:p>Contact: \u00a0noble@gs.washington.edu<\/jats:p>","DOI":"10.1093\/bioinformatics\/btn294","type":"journal-article","created":{"date-parts":[[2008,8,9]],"date-time":"2008-08-09T13:08:02Z","timestamp":1218287282000},"page":"i42-i48","source":"Crossref","is-referenced-by-count":203,"title":["Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry"],"prefix":"10.1093","volume":"24","author":[{"given":"Lukas","family":"K\u00e4ll","sequence":"first","affiliation":[{"name":"1 Department of Genome Sciences, University of Washington, Seattle, WA, 2Lewis-Sigler Institute, Princeton University, Princeton, NJ and 3Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA"}]},{"given":"John D.","family":"Storey","sequence":"additional","affiliation":[{"name":"1 Department of Genome Sciences, University of Washington, Seattle, WA, 2Lewis-Sigler Institute, Princeton University, Princeton, NJ and 3Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA"},{"name":"1 Department of Genome Sciences, University of Washington, Seattle, WA, 2Lewis-Sigler Institute, Princeton University, Princeton, NJ and 3Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA"}]},{"given":"William Stafford","family":"Noble","sequence":"additional","affiliation":[{"name":"1 Department of Genome Sciences, University of Washington, Seattle, WA, 2Lewis-Sigler Institute, Princeton University, Princeton, NJ and 3Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA"},{"name":"1 Department of Genome 
Sciences, University of Washington, Seattle, WA, 2Lewis-Sigler Institute, Princeton University, Princeton, NJ and 3Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA"}]}],"member":"286","published-online":{"date-parts":[[2008,8,9]]},"reference":[{"key":"2023020210502306300_B1","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1021\/pr0255654","article-title":"A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS\/MS spectra and sequest scores","volume":"2","author":"Anderson","year":"2003","journal-title":"J. Proteome Res"},{"key":"2023020210502306300_B2","doi-asserted-by":"crossref","first-page":"123","DOI":"10.1093\/biomet\/69.1.123","article-title":"Penalized maximum likelihood estimation in logistic regression and discrimination","volume":"69","author":"Anderson","year":"1982","journal-title":"Biometrika"},{"key":"2023020210502306300_B3","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. B"},{"key":"2023020210502306300_B4","doi-asserted-by":"crossref","first-page":"1393","DOI":"10.1021\/ac0617013","article-title":"Lookup peaks: a hybrid de novo sequencing and database search for protein identification by tandem mass spectrometry","volume":"79","author":"Bern","year":"2007","journal-title":"Anal. Chem"},{"key":"2023020210502306300_B5","doi-asserted-by":"crossref","first-page":"254","DOI":"10.1021\/pr070542g","article-title":"Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics","volume":"7","author":"Choi","year":"2008","journal-title":"J. 
Proteome Res"},{"key":"2023020210502306300_B6","doi-asserted-by":"crossref","first-page":"286","DOI":"10.1021\/pr7006818","article-title":"Statistical validation of peptide identifications in large-scale proteomics using target-decoy database search strategy and flexible mixture modeling","volume":"7","author":"Choi","year":"2008","journal-title":"J. Proteome Res"},{"key":"2023020210502306300_B7","doi-asserted-by":"crossref","first-page":"1454","DOI":"10.1002\/pmic.200300485","article-title":"OLAV: towards high-throughput tandem mass spectrometry data identification","volume":"3","author":"Colinge","year":"2003","journal-title":"Proteomics"},{"key":"2023020210502306300_B8","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1093\/bioinformatics\/bth092","article-title":"Tandem: matching proteins with tandem mass spectra","volume":"20","author":"Craig","year":"2004","journal-title":"Bioinformatics"},{"key":"2023020210502306300_B9","doi-asserted-by":"crossref","first-page":"1151","DOI":"10.1198\/016214501753382129","article-title":"Empirical bayes analysis of a microarray experiment","volume":"96","author":"Efron","year":"2001","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020210502306300_B10","doi-asserted-by":"crossref","first-page":"214","DOI":"10.1038\/nbt930","article-title":"Intensity-based protein identification by machine learning from a library of tandem mass spectra","volume":"22","author":"Elias","year":"2004","journal-title":"Nat. Biotechnol"},{"key":"2023020210502306300_B11","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. 
Mass Spectrom"},{"key":"2023020210502306300_B12","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1093\/bioinformatics\/btm267","article-title":"Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum\/peptide sequence false match frequencies","volume":"23","author":"Feng","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020210502306300_B13","volume-title":"Nonparametric Regression and Generalized Linear Models: a Roughness Penalty Approach.","author":"Green"},{"key":"2023020210502306300_B14","doi-asserted-by":"crossref","first-page":"169","DOI":"10.1080\/10618600.1992.10477012","article-title":"Cross-validating non-gaussian data","volume":"1","author":"Gu","year":"1992","journal-title":"J. Comput. Graph. Stat"},{"key":"2023020210502306300_B15","doi-asserted-by":"crossref","first-page":"1758","DOI":"10.1021\/pr0605320","article-title":"Estimating the statistical significance of peptide identifications from shotgun proteomics experiments","volume":"6","author":"Higgs","year":"2007","journal-title":"J. Proteome Res"},{"key":"2023020210502306300_B16","doi-asserted-by":"crossref","first-page":"923","DOI":"10.1038\/nmeth1113","article-title":"A semi-supervised machine learning technique for peptide identification from shotgun proteomics datasets","volume":"4","author":"K\u00e4ll","year":"2007","journal-title":"Nat. Methods"},{"key":"2023020210502306300_B17","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1021\/pr700600n","article-title":"Assigning significance to peptides identified by tandem mass spectrometry using decoy databases","volume":"7","author":"K\u00e4ll","year":"2008","journal-title":"J. 
Proteome Res"},{"key":"2023020210502306300_B18","doi-asserted-by":"crossref","first-page":"5383","DOI":"10.1021\/ac025747h","article-title":"Empirical statistical model to estimate the accuracy of peptide identification made by MS\/MS and database search","volume":"74","author":"Keller","year":"2002","journal-title":"Anal. Chem"},{"key":"2023020210502306300_B19","doi-asserted-by":"crossref","first-page":"695","DOI":"10.1021\/pr050315j","article-title":"Effects of modified digestion schemes on the identification of proteins from complex mixtures","volume":"5","author":"Klammer","year":"2006","journal-title":"J. Proteome Res"},{"key":"2023020210502306300_B20","first-page":"175","article-title":"Peptide charge state determination for low-resolution tandem mass spectra","volume-title":"Proceedings of the IEEE Computational Systems Bioinformatics Conference (CSB\u201905).","author":"Klammer","year":"2005"},{"key":"2023020210502306300_B21","doi-asserted-by":"crossref","first-page":"378","DOI":"10.1016\/S1044-0305(02)00352-5","article-title":"Qscore: an algorithm for evaluating sequest database search results","volume":"13","author":"Moore","year":"2002","journal-title":"J. Am. Soc. Mass Spectrom"},{"key":"2023020210502306300_B22","doi-asserted-by":"crossref","first-page":"4646","DOI":"10.1021\/ac0341261","article-title":"A statistical model for identifying proteins by tandem mass spectrometry","volume":"75","author":"Nesvizhskii","year":"2003","journal-title":"Anal. Chem"},{"key":"2023020210502306300_B23","first-page":"3551","article-title":"Probability-based protein identification by searching sequence databases using mass spectrometry data","volume-title":"Electrophoresis.","author":"Perkins","year":"1999"},{"key":"2023020210502306300_B24","doi-asserted-by":"crossref","first-page":"608","article-title":"Statistical discoveries and effect-size estimation","volume":"84","author":"Soric","year":"1989","journal-title":"J. Am. Stat. 
Assoc"},{"key":"2023020210502306300_B25","doi-asserted-by":"crossref","first-page":"479","DOI":"10.1111\/1467-9868.00346","article-title":"A direct approach to false discovery rates","volume":"64","author":"Storey","year":"2002","journal-title":"J. R. Stat. Soc"},{"key":"2023020210502306300_B26","doi-asserted-by":"crossref","first-page":"9440","DOI":"10.1073\/pnas.1530509100","article-title":"Statistical significance for genome-wide studies","volume":"100","author":"Storey","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020210502306300_B27","doi-asserted-by":"crossref","first-page":"1380","DOI":"10.1371\/journal.pbio.0030267","article-title":"Multiple locus linkage analysis of genomewide expression in yeast","volume":"3","author":"Storey","year":"2005","journal-title":"PLoS Biol"},{"key":"2023020210502306300_B28","doi-asserted-by":"crossref","first-page":"4626","DOI":"10.1021\/ac050102d","article-title":"InsPecT: identification of posttranslationally modified peptides from tandem mass spectra","volume":"77","author":"Tanner","year":"2005","journal-title":"Anal. Chem"},{"key":"2023020210502306300_B29","doi-asserted-by":"crossref","first-page":"133","DOI":"10.1111\/j.2517-6161.1983.tb01239.x","article-title":"Bayesian \u201cConfidence Intervals\u201d for the cross-validated smoothing Spline","volume":"45","author":"Wahba","year":"1983","journal-title":"J. R. Stat. Soc. B (Methodological)"},{"key":"2023020210502306300_B30","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/85686","article-title":"Large-scale analysis of the yeast proteome by multidimensional protein identification technology","volume":"19","author":"Washburn","year":"2001","journal-title":"Nat. 
Biotechnol"},{"key":"2023020210502306300_B31","doi-asserted-by":"crossref","first-page":"29","DOI":"10.1186\/1471-2105-9-29","article-title":"A nonparametric model for quality control of database search results in shotgun proteomics","volume":"9","author":"Zhang","year":"2008","journal-title":"BMC Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/16\/i42\/49053715\/bioinformatics_24_16_i42.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/24\/16\/i42\/49053715\/bioinformatics_24_16_i42.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,31]],"date-time":"2025-01-31T11:20:32Z","timestamp":1738322432000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/24\/16\/i42\/201250"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,8,9]]},"references-count":31,"journal-issue":{"issue":"16","published-print":{"date-parts":[[2008,8,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btn294","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2008,8,15]]},"published":{"date-parts":[[2008,8,9]]}}}