{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,21]],"date-time":"2026-05-21T03:57:31Z","timestamp":1779335851521,"version":"3.51.4"},"reference-count":31,"publisher":"Oxford University Press (OUP)","issue":"17","funder":[{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2016,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation : There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed.<\/jats:p><jats:p>Results : We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases.<\/jats:p><jats:p>Availability and Implementation : The source code, implemented in C\u2009++\u2009on a linux system, is available for download at ftp:\/\/ftp.ncbi.nlm.nih.gov\/pub\/qmbp\/qmbp_ms\/RAId\/RAId_Linux_64Bit<\/jats:p><jats:p>Contact: \u00a0yyu@ncbi.nlm.nih.gov<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btw225","type":"journal-article","created":{"date-parts":[[2016,4,30]],"date-time":"2016-04-30T01:35:22Z","timestamp":1461980122000},"page":"2642-2649","source":"Crossref","is-referenced-by-count":8,"title":["Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution"],"prefix":"10.1093","volume":"32","author":[{"given":"Gelio","family":"Alves","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi-Kuo","family":"Yu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2016,6,29]]},"reference":[{"key":"2023020112585296700_btw225-B1","doi-asserted-by":"crossref","first-page":"6538","DOI":"10.1016\/j.physa.2008.08.024","article-title":"Statistical characterization of a 1D random potential problem \u2013 with applications in score statistics of MS-based peptide sequencing","volume":"387","author":"Alves","year":"2008","journal-title":"Physica A"},{"key":"2023020112585296700_btw225-B2","doi-asserted-by":"crossref","first-page":"699","DOI":"10.1093\/bioinformatics\/btu717","article-title":"Mass spectrometry-based protein identification with accurate statistical significance assignment","volume":"31","author":"Alves","year":"2015","journal-title":"Bioinformatics"},{"key":"2023020112585296700_btw225-B3","doi-asserted-by":"crossref","first-page":"26.","DOI":"10.1186\/1745-6150-2-26","article-title":"Calibrating E-values for MS2 database search methods","volume":"2","author":"Alves","year":"2007","journal-title":"Biol. Direct"},{"key":"2023020112585296700_btw225-B4","doi-asserted-by":"crossref","first-page":"25.","DOI":"10.1186\/1745-6150-2-25","article-title":"RAId_DbS: peptide identification using database searches with realistic statistics","volume":"2","author":"Alves","year":"2007","journal-title":"Biol. Direct"},{"key":"2023020112585296700_btw225-B5","doi-asserted-by":"crossref","first-page":"3102","DOI":"10.1021\/pr700798h","article-title":"Enhancing peptide identification confidence by combining search methods","volume":"7","author":"Alves","year":"2008","journal-title":"J. Proteome Res"},{"key":"2023020112585296700_btw225-B6","doi-asserted-by":"crossref","first-page":"505.","DOI":"10.1186\/1471-2164-9-505","article-title":"RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration","volume":"9","author":"Alves","year":"2008","journal-title":"BMC Genomics"},{"key":"2023020112585296700_btw225-B7","doi-asserted-by":"crossref","first-page":"e15438.","DOI":"10.1371\/journal.pone.0015438","article-title":"RAId_aPS: MS\/MS analysis with multiple scoring functions and spectrum-specific statistics","volume":"5","author":"Alves","year":"2010","journal-title":"PLoS ONE"},{"key":"2023020112585296700_btw225-B8","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020112585296700_btw225-B9","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nmeth1019","article-title":"Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry","volume":"4","author":"Elias","year":"2007","journal-title":"Nat. Methods"},{"key":"2023020112585296700_btw225-B10","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. Mass Spectrom"},{"key":"2023020112585296700_btw225-B11","doi-asserted-by":"crossref","first-page":"4598","DOI":"10.1021\/pr800420s","article-title":"A fast SEQUEST cross correlation algorithm","volume":"7","author":"Eng","year":"2008","journal-title":"J. Proteome Res"},{"key":"2023020112585296700_btw225-B12","doi-asserted-by":"crossref","first-page":"768","DOI":"10.1021\/ac0258709","article-title":"A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes","volume":"75","author":"Fenyo","year":"2003","journal-title":"Anal. Chem"},{"key":"2023020112585296700_btw225-B13","doi-asserted-by":"crossref","first-page":"958","DOI":"10.1021\/pr0499491","article-title":"Open mass spectrometry search algorithm","volume":"3","author":"Geer","year":"2004","journal-title":"J. Proteome Res"},{"key":"2023020112585296700_btw225-B14","doi-asserted-by":"crossref","DOI":"10.7312\/gumb92958","volume-title":"Statistics of Extremes","author":"Gumbel","year":"1958"},{"key":"2023020112585296700_btw225-B15","doi-asserted-by":"crossref","first-page":"1111","DOI":"10.1007\/s13361-011-0139-3","article-title":"Target-decoy approach and false discovery rate: when things may go wrong","volume":"22","author":"Gupta","year":"2011","journal-title":"J. Am. Soc. Mass Spectrom"},{"key":"2023020112585296700_btw225-B16","doi-asserted-by":"crossref","first-page":"1225","DOI":"10.1093\/bioinformatics\/btn120","article-title":"A note on the false discovery rate and inconsistent comparisons between experiments","volume":"24","author":"Higdon","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020112585296700_btw225-B17","doi-asserted-by":"crossref","first-page":"3354","DOI":"10.1021\/pr8001244","article-title":"Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases","volume":"7","author":"Kim","year":"2008","journal-title":"J. Proteome Res"},{"key":"2023020112585296700_btw225-B18","doi-asserted-by":"crossref","first-page":"98","DOI":"10.1080\/00031305.1989.10475627","article-title":"Correlation coefficient goodness-of-fit test for the extreme-value distribution","volume":"43","author":"Kinnison","year":"1989","journal-title":"Am. Stat"},{"key":"2023020112585296700_btw225-B19","doi-asserted-by":"crossref","first-page":"2106","DOI":"10.1021\/pr8011107","article-title":"Statistical calibration of the SEQUEST XCorr function","volume":"8","author":"Klammer","year":"2009","journal-title":"J. Proteome Res"},{"key":"2023020112585296700_btw225-B20","doi-asserted-by":"crossref","DOI":"10.1142\/p191","volume-title":"Extreme Value Distributions","author":"Kotz","year":"2000"},{"key":"2023020112585296700_btw225-B21","doi-asserted-by":"crossref","first-page":"2830","DOI":"10.1093\/bioinformatics\/btl379","article-title":"General framework for developing and evaluating database scoring algorithms using the TANDEM search engine","volume":"22","author":"MacLean","year":"2006","journal-title":"Bioinformatics"},{"key":"2023020112585296700_btw225-B22","first-page":"285","article-title":"Optimization of proteomic sample preparation procedures for comprehensive protein characterization of pathogenic systems","volume":"19","author":"Mottaz-Brewer","year":"2008","journal-title":"J. Biomol. Tech"},{"key":"2023020112585296700_btw225-B23","first-page":"211","article-title":"Rapid assessment of extremal statistics for gapped local alignment","author":"Olsen","year":"1999","journal-title":"Proc. Int. Conf. Intell. Syst. Mol. Biol"},{"key":"2023020112585296700_btw225-B24","doi-asserted-by":"crossref","first-page":"8880","DOI":"10.1073\/pnas.88.20.8880","article-title":"Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins","volume":"88","author":"Robinson","year":"1991","journal-title":"Proc. Natl. Acad. Sci. USA"},{"key":"2023020112585296700_btw225-B25","doi-asserted-by":"crossref","DOI":"10.1371\/annotation\/03110e8b-3e10-4334-9ff7-969c85ad25d8","article-title":"Comparative omics-driven genome annotation refinement: application across Yersiniae","volume":"7","author":"Schrimpe-Rutledge","year":"2012","journal-title":"PLoS ONE"},{"key":"2023020112585296700_btw225-B26","doi-asserted-by":"crossref","first-page":"1652","DOI":"10.1093\/bioinformatics\/btn232","article-title":"On E-values for tandem MS scoring schemes","volume":"24","author":"Segal","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020112585296700_btw225-B27","first-page":"608","article-title":"Statistical \u201cdiscoveries\u201d and effect-size estimation","volume":"84","author":"Sori\u0107","year":"1989","journal-title":"J. Am. Stat. Assoc"},{"key":"2023020112585296700_btw225-B28","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.1093\/bioinformatics\/btr089","article-title":"Assigning spectrum-specific P -values to protein identifications by mass spectrometry","volume":"27","author":"Spirin","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020112585296700_btw225-B29","doi-asserted-by":"crossref","first-page":"249","DOI":"10.1089\/10665270152530845","article-title":"Statistical significance of probabilistic sequence alignment and related local hidden Markov models","volume":"8","author":"Yu","year":"2001","journal-title":"J. Comput. Biol"},{"key":"2023020112585296700_btw225-B30","doi-asserted-by":"crossref","first-page":"3","DOI":"10.1007\/3-540-45692-9_1","volume-title":"Biological Evolution and Statistical Physics, Volume 585 of Lecture Notes in Physics","author":"Yu","year":"2002"},{"key":"2023020112585296700_btw225-B31","doi-asserted-by":"crossref","first-page":"5966","DOI":"10.1093\/nar\/gkl731","article-title":"Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches","volume":"34","author":"Yu","year":"2006","journal-title":"Nucleic Acids Res"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/17\/2642\/49021142\/bioinformatics_32_17_2642.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/32\/17\/2642\/49021142\/bioinformatics_32_17_2642.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,6,16]],"date-time":"2024-06-16T03:47:47Z","timestamp":1718509667000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/32\/17\/2642\/2450727"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,6,29]]},"references-count":31,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2016,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btw225","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2016,9,1]]},"published":{"date-parts":[[2016,6,29]]}}}