{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,8]],"date-time":"2026-02-08T08:27:14Z","timestamp":1770539234392,"version":"3.49.0"},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S16","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2012,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Automated database search engines are one of the fundamental engines of high-throughput proteomics enabling daily identifications of hundreds of thousands of peptides and proteins from tandem mass (MS\/MS) spectrometry data. Nevertheless, this automation also makes it humanly impossible to manually validate the vast lists of resulting identifications from such high-throughput searches. This challenge is usually addressed by using a Target-Decoy Approach (TDA) to impose an empirical False Discovery Rate (FDR) at a pre-determined threshold <jats:italic>x<\/jats:italic>% with the expectation that at most <jats:italic>x<\/jats:italic>% of the returned identifications would be false positives. But despite the fundamental importance of FDR estimates in ensuring the utility of large lists of identifications, there is surprisingly little consensus on exactly how TDA should be applied to minimize the chances of biased FDR estimates. In fact, since less rigorous TDA\/FDR estimates tend to result in more identifications (at higher 'true' FDR), there is often little incentive to enforce strict TDA\/FDR procedures in studies where the major metric of success is the size of the list of identifications and there are no follow up studies imposing hard cost constraints on the number of reported false positives.<\/jats:p><jats:p>Here we address the problem of the accuracy of TDA estimates of empirical FDR. 
Using MS\/MS spectra from samples where we were able to define a <jats:italic>factual<\/jats:italic> FDR estimator of 'true' FDR we evaluate several popular variants of the TDA procedure in a variety of database search contexts. We show that the fraction of false identifications can sometimes be over 10<jats:italic>\u00d7<\/jats:italic> higher than reported and may be unavoidably high for certain types of searches. In addition, we further report that the two-pass search strategy seems the most promising database search strategy.<\/jats:p><jats:p>While unavoidably constrained by the particulars of any specific evaluation dataset, our observations support a series of recommendations towards maximizing the number of resulting identifications while controlling database searches with robust and reproducible TDA estimation of empirical FDR.<\/jats:p>","DOI":"10.1186\/1471-2105-13-s16-s2","type":"journal-article","created":{"date-parts":[[2012,11,5]],"date-time":"2012-11-05T11:15:25Z","timestamp":1352114125000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":130,"title":["False discovery rates in spectral identification"],"prefix":"10.1186","volume":"13","author":[{"given":"Kyowon","family":"Jeong","sequence":"first","affiliation":[]},{"given":"Sangtae","family":"Kim","sequence":"additional","affiliation":[]},{"given":"Nuno","family":"Bandeira","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2012,11,5]]},"reference":[{"key":"5422_CR1","doi-asserted-by":"publisher","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","volume":"5","author":"J Eng","year":"1994","unstructured":"Eng J, McCormack A, Yates J: An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994, 5: 976-89. 
10.1016\/1044-0305(94)80016-2.","journal-title":"J Am Soc Mass Spectrom"},{"key":"5422_CR2","doi-asserted-by":"publisher","first-page":"3551","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2","volume":"20","author":"D Perkins","year":"1999","unstructured":"Perkins D, Pappin D, Creasy D, Cottrell J: Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999, 20: 3551-67. 10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2.","journal-title":"Electrophoresis"},{"issue":"9","key":"5422_CR3","doi-asserted-by":"publisher","first-page":"1466","DOI":"10.1093\/bioinformatics\/bth092","volume":"20","author":"R Craig","year":"2004","unstructured":"Craig R, Beavis RC: TANDEM: matching proteins with tandem mass spectra. Bioinformatics. 2004, 20 (9): 1466-7. 10.1093\/bioinformatics\/bth092.","journal-title":"Bioinformatics"},{"issue":"5","key":"5422_CR4","doi-asserted-by":"publisher","first-page":"958","DOI":"10.1021\/pr0499491","volume":"3","author":"LY Geer","year":"2004","unstructured":"Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH: Open mass spectrometry search algorithm. J Proteome Res. 2004, 3 (5): 958-64. 10.1021\/pr0499491.","journal-title":"J Proteome Res"},{"issue":"14","key":"5422_CR5","doi-asserted-by":"publisher","first-page":"4626","DOI":"10.1021\/ac050102d","volume":"77","author":"S Tanner","year":"2005","unstructured":"Tanner S, Shu H, Frank A, Wang LC, Zandi E, Mumby M, Pevzner PA, Bafna V: InsPecT: identification of posttranslationally modified peptides from tandem mass spectra. Anal Chem. 2005, 77 (14): 4626-39. 
10.1021\/ac050102d.","journal-title":"Anal Chem"},{"issue":"12","key":"5422_CR6","doi-asserted-by":"publisher","first-page":"2840","DOI":"10.1074\/mcp.M110.003731","volume":"9","author":"S Kim","year":"2010","unstructured":"Kim S, Mischerikow N, Bandeira N, Navarro JD, Wich L, Mohammed S, Heck AJR, Pevzner PA: The Generating Function of CID, ETD, and CID\/ETD Pairs of Tandem Mass Spectra: Applications to Database Search. Mol Cell Proteomics. 2010, 9 (12): 2840-52. 10.1074\/mcp.M110.003731.","journal-title":"Mol Cell Proteomics"},{"issue":"11","key":"5422_CR7","doi-asserted-by":"publisher","first-page":"2092","DOI":"10.1016\/j.jprot.2010.08.009","volume":"73","author":"AI Nesvizhskii","year":"2010","unstructured":"Nesvizhskii AI: A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics. 2010, 73 (11): 2092-123. 10.1016\/j.jprot.2010.08.009.","journal-title":"J Proteomics"},{"issue":"3","key":"5422_CR8","doi-asserted-by":"publisher","first-page":"207","DOI":"10.1038\/nmeth1019","volume":"4","author":"JE Elias","year":"2007","unstructured":"Elias JE, Gygi SP: Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods. 2007, 4 (3): 207-14. 10.1038\/nmeth1019.","journal-title":"Nat Methods"},{"key":"5422_CR9","doi-asserted-by":"publisher","first-page":"96","DOI":"10.1021\/pr070244j","volume":"7","author":"J Klimek","year":"2008","unstructured":"Klimek J, Eddes JS, Hohmann L, Jackson J, Peterson A, Letarte S, Gafken PR, Katz JE, Mallick P, Lee H, Schmidt A, Ossola R, Eng JK, Aebersold R, Martin DB: The standard protein mix database: a diverse data set to assist in the production of improved Peptide and protein identification software tools. J Proteome Res. 2008, 7: 96-103. 
10.1021\/pr070244j.","journal-title":"J Proteome Res"},{"key":"5422_CR10","doi-asserted-by":"crossref","unstructured":"Paulovich AG, Billheimer D, Ham AL, Vega-Montoto L, Rudnick PA, Tabb DL, Wang P, Blackman RK, Bunk DM, Cardasis HL, Clauser KR, Kinsinger CR, Schilling B, Tegeler TJ, Variyath AM, Wang M, Whiteaker JR, Zimmerman LJ, Fenyo D, Carr SA, Fisher SJ, Gibson BW, Mesri M, Neubert TA, Regnier FE, Rodriguez H, Spiegelman C, Stein SE, Tempst P, Liebler DC: Interlaboratory study characterizing a yeast performance standard for benchmarking LC-MS platform performance. Molecular & Cellular Proteomics. 242-254. 2","DOI":"10.1074\/mcp.M900222-MCP200"},{"issue":"12","key":"5422_CR11","doi-asserted-by":"publisher","first-page":"1336","DOI":"10.1038\/nbt1208-1336","volume":"26","author":"N Bandeira","year":"2008","unstructured":"Bandeira N, Pham V, Pevzner P, Arnott D, Lill JR: Automated de novo protein sequencing of monoclonal antibodies. Nat Biotechnol. 2008, 26 (12): 1336-8. 10.1038\/nbt1208-1336.","journal-title":"Nat Biotechnol"},{"key":"5422_CR12","doi-asserted-by":"crossref","unstructured":"Granholm V, Noble WS, K\u00e4ll L: On using samples of known protein content to assess the statistical calibration of scores assigned to peptide-spectrum matches in shotgun proteomics. J Proteome Res. 2671-2678. 5","DOI":"10.1021\/pr1012619"},{"key":"5422_CR13","doi-asserted-by":"crossref","unstructured":"Fisher RA: On the interpretation of \u03c7\u00b2 from contingency tables, and the calculation of P. Journal of the Royal Statistical Society. 87-94.","DOI":"10.2307\/2340521"},{"key":"5422_CR14","doi-asserted-by":"publisher","first-page":"29","DOI":"10.1021\/pr700600n","volume":"7","author":"L K\u00e4ll","year":"2008","unstructured":"K\u00e4ll L, Storey JD, Maccoss MJ, Noble WS: Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res. 2008, 7: 29-34. 
10.1021\/pr700600n.","journal-title":"J Proteome Res"},{"key":"5422_CR15","doi-asserted-by":"publisher","first-page":"47","DOI":"10.1021\/pr700747q","volume":"7","author":"H Choi","year":"2008","unstructured":"Choi H, Nesvizhskii AI: False discovery rates and related statistical concepts in mass spectrometry-based proteomics. J Proteome Res. 2008, 7: 47-50. 10.1021\/pr700747q.","journal-title":"J Proteome Res"},{"key":"5422_CR16","doi-asserted-by":"crossref","unstructured":"Storey J: A direct approach to false discovery rates. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 64 (3): 479-498.","DOI":"10.1111\/1467-9868.00346"},{"key":"5422_CR17","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1007\/978-1-60761-444-9_5","volume":"604","author":"JE Elias","year":"2010","unstructured":"Elias JE, Gygi SP: Target-decoy search strategy for mass spectrometry-based proteomics. Methods Mol Biol. 2010, 604: 55-71. 10.1007\/978-1-60761-444-9_5.","journal-title":"Methods Mol Biol"},{"issue":"Suppl 1","key":"5422_CR18","doi-asserted-by":"publisher","first-page":"i49","DOI":"10.1093\/bioinformatics\/bth947","volume":"20","author":"M Bern","year":"2004","unstructured":"Bern M, Goldberg D, McDonald WH, Yates JR: Automatic quality assessment of peptide tandem mass spectra. Bioinformatics. 2004, 20 (Suppl 1): i49-54. 10.1093\/bioinformatics\/bth947.","journal-title":"Bioinformatics"},{"issue":"12","key":"5422_CR19","doi-asserted-by":"publisher","first-page":"3241","DOI":"10.1021\/pr0603248","volume":"5","author":"S Na","year":"2006","unstructured":"Na S, Paek E: Quality assessment of tandem mass spectra based on cumulative intensity normalization. J Proteome Res. 2006, 5 (12): 3241-8. 
10.1021\/pr0603248.","journal-title":"J Proteome Res"},{"key":"5422_CR20","doi-asserted-by":"publisher","first-page":"113","DOI":"10.1021\/pr070361e","volume":"7","author":"AM Frank","year":"2008","unstructured":"Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP, Smith RD, Pevzner PA: Clustering millions of tandem mass spectra. J Proteome Res. 2008, 7: 113-22. 10.1021\/pr070361e.","journal-title":"J Proteome Res"},{"issue":"47","key":"5422_CR21","doi-asserted-by":"publisher","first-page":"18132","DOI":"10.1073\/pnas.0800788105","volume":"105","author":"M Mann","year":"2008","unstructured":"Mann M, Kelleher NL: Precision proteomics: the case for high resolution and high mass accuracy. Proc Natl Acad Sci USA. 2008, 105 (47): 18132-8. 10.1073\/pnas.0800788105.","journal-title":"Proc Natl Acad Sci USA"},{"key":"5422_CR22","doi-asserted-by":"publisher","first-page":"5383","DOI":"10.1021\/ac025747h","volume":"74","author":"A Keller","year":"2002","unstructured":"Keller A, Nesvizhskii A, Kolker E, Aebersold R: Empirical statistical model to estimate the accuracy of peptide identifications made by MS\/MS and Database Search. Anal Chem. 2002, 74: 5383-92. 10.1021\/ac025747h.","journal-title":"Anal Chem"},{"issue":"8","key":"5422_CR23","doi-asserted-by":"publisher","first-page":"3354","DOI":"10.1021\/pr8001244","volume":"7","author":"S Kim","year":"2008","unstructured":"Kim S, Gupta N, Pevzner P: Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J Proteome Res. 2008, 7 (8): 3354-3363. 10.1021\/pr8001244.","journal-title":"J Proteome Res"},{"issue":"20","key":"5422_CR24","doi-asserted-by":"publisher","first-page":"2310","DOI":"10.1002\/rcm.1198","volume":"17","author":"R Craig","year":"2003","unstructured":"Craig R, Beavis RC: A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun Mass Spectrom. 2003, 17 (20): 2310-6. 
10.1002\/rcm.1198.","journal-title":"Rapid Commun Mass Spectrom"},{"issue":"4","key":"5422_CR25","doi-asserted-by":"publisher","first-page":"2123","DOI":"10.1021\/pr101143m","volume":"10","author":"M Bern","year":"2011","unstructured":"Bern M, Kil Y: Comment on \"Unbiased Statistical Analysis for Multi-Stage Proteomic Search Strategies\". J Proteome Res. 2011, 10 (4): 2123-2127. 10.1021\/pr101143m.","journal-title":"J Proteome Res"},{"issue":"9","key":"5422_CR26","doi-asserted-by":"publisher","first-page":"4328","DOI":"10.1021\/pr900349r","volume":"8","author":"M Bern","year":"2009","unstructured":"Bern M, Phinney BS, Goldberg D: Reanalysis of Tyrannosaurus rex mass spectra. J Proteome Res. 2009, 8 (9): 4328-32. 10.1021\/pr900349r.","journal-title":"J Proteome Res"},{"issue":"2","key":"5422_CR27","doi-asserted-by":"publisher","first-page":"700","DOI":"10.1021\/pr900256v","volume":"9","author":"LJ Everett","year":"2010","unstructured":"Everett LJ, Bierl C, Master SR: Unbiased statistical analysis for multi-stage proteomic search strategies. J Proteome Res. 2010, 9 (2): 700-707. 10.1021\/pr900256v.","journal-title":"J Proteome Res"},{"key":"5422_CR28","doi-asserted-by":"publisher","first-page":"605","DOI":"10.1021\/pr900947u","volume":"9","author":"H Lam","year":"2010","unstructured":"Lam H, Deutsch EW, Aebersold R: Artificial decoy spectral libraries for false discovery rate estimation in spectral library searching in proteomics. J Proteome Res. 2010, 9: 605-610. 
10.1021\/pr900947u.","journal-title":"J Proteome Res"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-13-S16-S2.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,26]],"date-time":"2023-06-26T15:04:29Z","timestamp":1687791869000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-13-S16-S2"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11]]},"references-count":28,"journal-issue":{"issue":"S16","published-print":{"date-parts":[[2012,11]]}},"alternative-id":["5422"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-13-s16-s2","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,11]]},"assertion":[{"value":"5 November 2012","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S2"}}