{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,16]],"date-time":"2026-01-16T12:49:34Z","timestamp":1768567774135,"version":"3.49.0"},"reference-count":39,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging.<\/jats:p><jats:p>Results: We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sori\u0107 formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested.<\/jats:p><jats:p>Availability and implementation: The source code, implemented in C++ on a linux system, is available for download at ftp:\/\/ftp.ncbi.nlm.nih.gov\/pub\/qmbp\/qmbp_ms\/RAId\/RAId_Linux_64Bit.<\/jats:p><jats:p>Contact: \u00a0yyu@ncbi.nlm.nih.gov<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btu717","type":"journal-article","created":{"date-parts":[[2014,11,1]],"date-time":"2014-11-01T07:11:58Z","timestamp":1414825918000},"page":"699-706","source":"Crossref","is-referenced-by-count":21,"title":["Mass spectrometry-based protein identification with accurate statistical significance assignment"],"prefix":"10.1093","volume":"31","author":[{"given":"Gelio","family":"Alves","sequence":"first","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Yi-Kuo","family":"Yu","sequence":"additional","affiliation":[{"name":"National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2014,10,31]]},"reference":[{"key":"2023020116164398100_btu717-B1","doi-asserted-by":"crossref","first-page":"6538","DOI":"10.1016\/j.physa.2008.08.024","article-title":"Statistical characterization of a 1D random potential problem\u2014with applications in score statistics of MS-based peptide sequencing","volume":"387","author":"Alves","year":"2008","journal-title":"Physica A"},{"key":"2023020116164398100_btu717-B2","doi-asserted-by":"crossref","first-page":"e22647","DOI":"10.1371\/journal.pone.0022647","article-title":"Combining independent, weighted P-values: achieving computational stability by a systematic expansion with controllable accuracy","volume":"6","author":"Alves","year":"2011","journal-title":"PLoS ONE"},{"key":"2023020116164398100_btu717-B3","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1186\/1745-6150-2-25","article-title":"RAId_DbS: peptide identification using database searches with realistic statistics","volume":"2","author":"Alves","year":"2007","journal-title":"Biol. Direct"},{"key":"2023020116164398100_btu717-B4","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1186\/1745-6150-3-27","article-title":"Detection of co-eluted peptides using database search methods","volume":"3","author":"Alves","year":"2008","journal-title":"Biol. Direct"},{"key":"2023020116164398100_btu717-B5","doi-asserted-by":"crossref","first-page":"3102","DOI":"10.1021\/pr700798h","article-title":"Enhancing peptide identification confidence by combining search methods","volume":"7","author":"Alves","year":"2008","journal-title":"J. Proteome Res."},{"key":"2023020116164398100_btu717-B6","doi-asserted-by":"crossref","first-page":"e15438","DOI":"10.1371\/journal.pone.0015438","article-title":"RAId_aPS: MS\/MS analysis with multiple scoring functions and spectrum-specific statistics","volume":"5","author":"Alves","year":"2010","journal-title":"PLoS One"},{"key":"2023020116164398100_btu717-B7","volume-title":"Elements of the Theory of Markov Processes and Their Applications","author":"Bahrucha-Reid","year":"1960"},{"key":"2023020116164398100_btu717-B8","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1111\/j.2517-6161.1995.tb02031.x","article-title":"Controlling the false discovery rate: a practical and powerful approach to multiple testing","volume":"57","author":"Benjamini","year":"1995","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020116164398100_btu717-B9","doi-asserted-by":"crossref","first-page":"3871","DOI":"10.1021\/pr101196n","article-title":"Faster SEQUEST searching for peptide identification from tandem mass spectra","volume":"10","author":"Diament","year":"2011","journal-title":"J. Proteome Res."},{"key":"2023020116164398100_btu717-B10","doi-asserted-by":"crossref","first-page":"207","DOI":"10.1038\/nmeth1019","article-title":"Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry","volume":"4","author":"Elias","year":"2007","journal-title":"Nat. Methods"},{"key":"2023020116164398100_btu717-B11","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. Mass Spectrom."},{"key":"2023020116164398100_btu717-B12","doi-asserted-by":"crossref","first-page":"2210","DOI":"10.1093\/bioinformatics\/btm267","article-title":"Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum\/peptide sequence false match frequencies","volume":"23","author":"Feng","year":"2007","journal-title":"Bioinformatics"},{"key":"2023020116164398100_btu717-B13","doi-asserted-by":"crossref","first-page":"768","DOI":"10.1021\/ac0258709","article-title":"A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes","volume":"75","author":"Fenyo","year":"2003","journal-title":"Anal. Chem."},{"key":"2023020116164398100_btu717-B14","doi-asserted-by":"crossref","first-page":"189","DOI":"10.1007\/978-1-60761-842-3_11","article-title":"Mass spectrometric protein identification using the global proteome machine","volume":"673","author":"Fenyo","year":"2010","journal-title":"Methods Mol. Biol."},{"key":"2023020116164398100_btu717-B15","volume-title":"Statistical Methods for Research Workers","author":"Fisher","year":"1932"},{"key":"2023020116164398100_btu717-B16","doi-asserted-by":"crossref","first-page":"264","DOI":"10.1111\/j.2517-6161.1955.tb00201.x","article-title":"On the weighted combination of significance tests","volume":"17","author":"Good","year":"1955","journal-title":"J. R. Stat. Soc. Ser. B (Methodological)"},{"key":"2023020116164398100_btu717-B17","doi-asserted-by":"crossref","first-page":"1111","DOI":"10.1007\/s13361-011-0139-3","article-title":"Target-decoy approach and false discovery rate: when things may go wrong","volume":"22","author":"Gupta","year":"2011","journal-title":"J. Am. Soc. Mass Spectrom."},{"key":"2023020116164398100_btu717-B18","doi-asserted-by":"crossref","first-page":"586","DOI":"10.1093\/bib\/bbs004","article-title":"Protein inference: a review","volume":"13","author":"Huang","year":"2012","journal-title":"Brief. Bioinform."},{"key":"2023020116164398100_btu717-B19","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/1471-2105-13-S16-S4","article-title":"Computational approaches to protein inference in shotgun proteomics","volume":"13","author":"Li","year":"2012","journal-title":"BMC Bioinformatics."},{"key":"2023020116164398100_btu717-B20","doi-asserted-by":"crossref","first-page":"625","DOI":"10.1080\/03610928308828483","article-title":"On linear combinations of independent exponential variables","volume":"12","author":"Mathai","year":"1983","journal-title":"Commun. Stat. Theory Methods"},{"key":"2023020116164398100_btu717-B21","doi-asserted-by":"crossref","first-page":"e12","DOI":"10.1371\/journal.pcbi.0040012","article-title":"Computational methods for protein identification from mass spectrometry data","volume":"4","author":"McHugh","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023020116164398100_btu717-B22","doi-asserted-by":"crossref","first-page":"4646","DOI":"10.1021\/ac0341261","article-title":"A statistical model for identifying proteins by tandem mass spectrometry","volume":"75","author":"Nesvizhskii","year":"2003","journal-title":"Anal. Chem."},{"key":"2023020116164398100_btu717-B23","doi-asserted-by":"crossref","first-page":"e1002296","DOI":"10.1371\/journal.pcbi.1002296","article-title":"Computational and statistical analysis of protein mass spectrometry data","volume":"8","author":"Noble","year":"2012","journal-title":"PLoS Comput. Biol."},{"key":"2023020116164398100_btu717-B24","doi-asserted-by":"crossref","first-page":"3022","DOI":"10.1021\/pr800127y","article-title":"Rapid and accurate peptide identification from tandem mass spectra","volume":"7","author":"Park","year":"2008","journal-title":"J. Proteome Res."},{"key":"2023020116164398100_btu717-B25","doi-asserted-by":"crossref","first-page":"527","DOI":"10.1074\/mcp.T600049-MCP200","article-title":"EBP, a program for protein identification using multiple tandem mass spectrometry datasets","volume":"6","author":"Price","year":"2007","journal-title":"Mol. Cell Proteomics"},{"key":"2023020116164398100_btu717-B26","doi-asserted-by":"crossref","first-page":"8880","DOI":"10.1073\/pnas.88.20.8880","article-title":"Distribution of glutamine and asparagine residues and their near neighbors in peptides and proteins","volume":"88","author":"Robinson","year":"1991","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023020116164398100_btu717-B27","doi-asserted-by":"crossref","first-page":"1265","DOI":"10.1002\/pmic.200900437","article-title":"Scaffold: a bioinformatic tool for validating MS\/MS-based proteomic studies","volume":"10","author":"Searle","year":"2010","journal-title":"Proteomics"},{"key":"2023020116164398100_btu717-B28","doi-asserted-by":"crossref","first-page":"1652","DOI":"10.1093\/bioinformatics\/btn232","article-title":"On E-values for tandem MS scoring schemes","volume":"24","author":"Segal","year":"2008","journal-title":"Bioinformatics"},{"key":"2023020116164398100_btu717-B29","doi-asserted-by":"crossref","first-page":"3","DOI":"10.4310\/SII.2012.v5.n1.a2","article-title":"A review of statistical methods for protein identification using tandem mass spectrometry","volume":"5","author":"Serang","year":"2012","journal-title":"Stat Interface"},{"key":"2023020116164398100_btu717-B30","doi-asserted-by":"crossref","first-page":"807","DOI":"10.1074\/mcp.O112.022863","article-title":"A non-parametric cutout index for robust evaluation of identified proteins","volume":"12","author":"Serang","year":"2013","journal-title":"Mol. Cell Proteomics"},{"key":"2023020116164398100_btu717-B31","doi-asserted-by":"crossref","DOI":"10.1074\/mcp.M111.007690","article-title":"iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates","volume":"10","author":"Shteynberg","year":"2011","journal-title":"Mol. Cell Proteomics"},{"key":"2023020116164398100_btu717-B32","first-page":"608","article-title":"Statistical \u201cdiscoveries\u201d and effect-size estimation","volume":"84","author":"Sori\u0107","year":"1989","journal-title":"J. Am. Stat. Assoc."},{"key":"2023020116164398100_btu717-B33","doi-asserted-by":"crossref","first-page":"1128","DOI":"10.1093\/bioinformatics\/btr089","article-title":"Assigning spectrum-specific P-values to protein identifications by mass spectrometry","volume":"27","author":"Spirin","year":"2011","journal-title":"Bioinformatics"},{"key":"2023020116164398100_btu717-B34","doi-asserted-by":"crossref","first-page":"146","DOI":"10.1021\/ac801664q","article-title":"Decoy methods for assessing false positives and false discovery rates in shotgun proteomics","volume":"81","author":"Wang","year":"2009","journal-title":"Anal. Chem."},{"key":"2023020116164398100_btu717-B35","doi-asserted-by":"crossref","first-page":"1368","DOI":"10.1111\/j.1420-9101.2005.00917.x","article-title":"Combining probability from independent tests: the weighted Z-method is superior to Fisher\u2019s approach","volume":"18","author":"Whitlock","year":"2005","journal-title":"J. Evol. Biol."},{"key":"2023020116164398100_btu717-B36","doi-asserted-by":"crossref","first-page":"1002","DOI":"10.1021\/pr049920x","article-title":"DBParser: web-based software for shotgun proteomic data analyses","volume":"3","author":"Yang","year":"2004","journal-title":"J. Proteome Res."},{"key":"2023020116164398100_btu717-B37","doi-asserted-by":"crossref","first-page":"5966","DOI":"10.1093\/nar\/gkl731","article-title":"Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches","volume":"34","author":"Yu","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023020116164398100_btu717-B38","doi-asserted-by":"crossref","first-page":"2482","DOI":"10.1021\/ac991363o","article-title":"ProFound: an expert system for protein identification using mass spectrometric peptide mapping information","volume":"72","author":"Zhang","year":"2000","journal-title":"Anal. Chem."},{"key":"2023020116164398100_btu717-B39","doi-asserted-by":"crossref","first-page":"2343","DOI":"10.1021\/cr3003533","article-title":"Protein analysis by shotgun\/bottom-up proteomics","volume":"113","author":"Zhang","year":"2013","journal-title":"Chem. Rev."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/5\/699\/49011106\/bioinformatics_31_5_699.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/31\/5\/699\/49011106\/bioinformatics_31_5_699.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,5,6]],"date-time":"2025-05-06T03:04:52Z","timestamp":1746500692000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/31\/5\/699\/318043"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,10,31]]},"references-count":39,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2015,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btu717","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,3,1]]},"published":{"date-parts":[[2014,10,31]]}}}