{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,30]],"date-time":"2026-01-30T06:13:17Z","timestamp":1769753597892,"version":"3.49.0"},"reference-count":26,"publisher":"Oxford University Press (OUP)","issue":"3","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2007,2,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Tandem mass-spectrometry of trypsin digests, followed by database searching, is one of the most popular approaches in high-throughput proteomics studies. Peptides are considered identified if they pass certain scoring thresholds. To avoid false positive protein identification, \u22652 unique peptides identified within a single protein are generally recommended. Still, in a typical high-throughput experiment, hundreds of proteins are identified only by a single peptide. We introduce here a method for distinguishing between true and false identifications among single-hit proteins. The approach is based on randomized database searching and usage of logistic regression models with cross-validation. This approach is implemented to analyze three bacterial samples enabling recovery 68\u201398% of the correct single-hit proteins with an error rate of &lt;2%. This results in a 22\u201365% increase in number of identified proteins. 
Identifying true single-hit proteins will lead to discovering many crucial regulators, biomarkers and other low abundance proteins.<\/jats:p><jats:p>Contact: ekolker@biatech.org<\/jats:p><jats:p>Supplementary information: Supplementary Data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btl595","type":"journal-article","created":{"date-parts":[[2006,11,23]],"date-time":"2006-11-23T01:24:28Z","timestamp":1164245068000},"page":"277-280","source":"Crossref","is-referenced-by-count":48,"title":["A predictive model for identifying proteins by a single peptide match"],"prefix":"10.1093","volume":"23","author":[{"given":"Roger","family":"Higdon","sequence":"first","affiliation":[{"name":"The BIATECH Institute, Bothell, WA 98011, USA"}]},{"given":"Eugene","family":"Kolker","sequence":"additional","affiliation":[{"name":"The BIATECH Institute, Bothell, WA 98011, USA"},{"name":"Division of Biomedical and Health Informatics, University of Washington, Seattle, WA 98195, USA"}]}],"member":"286","published-online":{"date-parts":[[2006,11,22]]},"reference":[{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"198","DOI":"10.1038\/nature01511","article-title":"Mass spectrometry-based proteomics","volume":"422","author":"Aebersold","year":"2003","journal-title":"Nature"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","DOI":"10.1038\/nbt1240","article-title":"A probability-based approach for high-throughput protein phosphorylation analysis and site localization","author":"Beausoleil","year":"2006","journal-title":"Nat. Biotechnol"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"787","DOI":"10.1074\/mcp.E600005-MCP200","article-title":"Reporting protein identification data: the next generation of guidelines","volume":"5","author":"Bradshaw","year":"2006","journal-title":"Mol. 
Cell Proteom."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"1082","DOI":"10.1021\/pr049946o","article-title":"Potential for false positive identifications from large databases through tandem mass spectrometry","volume":"3","author":"Cargile","year":"2004","journal-title":"J. Proteome. Res."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1074\/mcp.T400006-MCP200","article-title":"The need for guidelines in publication of peptide and protein identification data: Working Group on Publication Guidelines for Peptide and Protein Identification Data","volume":"3","author":"Carr","year":"2004","journal-title":"Mol. Cell Proteom."},{"key":"2023041109272173000_","volume-title":"Of URFs and ORFs: A Primer on How to Analyze Derived Amino Acid Sequences","author":"Doolittle","year":"1986"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"3120","DOI":"10.1002\/pmic.200401140","article-title":"Global detection and characterization of hypothetical proteins in Shewanella oneidensis MR-1 using LC-MS based proteomics","volume":"5","author":"Elias","year":"2005","journal-title":"Proteomics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"667","DOI":"10.1038\/nmeth785","article-title":"Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations","volume":"2","author":"Elias","year":"2005","journal-title":"Nat. Methods"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"976","DOI":"10.1016\/1044-0305(94)80016-2","article-title":"An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database","volume":"5","author":"Eng","year":"1994","journal-title":"J. Am. Soc. 
Mass Spectr."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","DOI":"10.1002\/9780470316771","volume-title":"Statistical Intervals: A Guide for Practitioners","author":"Hahn","year":"1991"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"364","DOI":"10.1089\/omi.2005.9.364","article-title":"Randomized databases for tandem mass spectrometry peptide and protein identification","volume":"9","author":"Higdon","year":"2005","journal-title":"Omics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"357","DOI":"10.1089\/omi.2004.8.357","article-title":"LIP index for peptide classification using MS\/MS and SEQUEST search via logistic regression","volume":"8","author":"Higdon","year":"2004","journal-title":"Omics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"5383","DOI":"10.1021\/ac025747h","article-title":"Empirical statistical model to estimate the accuracy of peptide identifications made by MS\/MS and database search","volume":"74","author":"Keller","year":"2002","journal-title":"Anal. Chem."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/j.tim.2006.03.005","article-title":"Protein identification and expression analysis using mass spectrometry","volume":"14","author":"Kolker","year":"2006","journal-title":"Trends Microbiol."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"2099","DOI":"10.1073\/pnas.0409111102","article-title":"Global profiling of Shewanella oneidensis MR-1: expression of hypothetical genes and improved functional annotations","volume":"102","author":"Kolker","year":"2005","journal-title":"Proc. Natl Acad. Sci. 
USA"},{"key":"2023041109272173000_","volume-title":"Generalized Linear Models","author":"McCullagh","year":"1999"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"4646","DOI":"10.1021\/ac0341261","article-title":"A statistical model for identifying proteins by tandem mass spectrometry","volume":"75","author":"Nesvizhskii","year":"2003","journal-title":"Anal. Chem."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"3226","DOI":"10.1002\/pmic.200500358","article-title":"Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database","volume":"5","author":"Omenn","year":"2005","journal-title":"Proteomics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"4436","DOI":"10.1002\/pmic.200600453","article-title":"HUPO Publications Committee Meeting: 21 April 2006, San Francisco, CA","volume":"6","author":"Orchard","year":"2006","journal-title":"Proteomics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1021\/pr015518w","article-title":"Biomarker discovery in urine by proteomics","volume":"1","author":"Pang","year":"2002","journal-title":"J. 
Proteome Res."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"3551","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2","article-title":"Probability-based protein identification by searching sequence databases using mass spectrometry data","volume":"20","author":"Perkins","year":"1999","journal-title":"Electrophoresis"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"79","DOI":"10.1089\/153623104773547507","article-title":"Standard mixtures for proteome studies","volume":"8","author":"Purvine","year":"2004","journal-title":"Omics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"572","DOI":"10.1002\/pmic.200400942","article-title":"Comparative proteome analyses of human plasma following in vivo lipopolysaccharide administration using multidimensional separations coupled with tandem mass spectrometry","volume":"5","author":"Qian","year":"2005","journal-title":"Proteomics"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511812651","volume-title":"Pattern Recognition and Neural Networks","author":"Ripley","year":"1996"},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"181","DOI":"10.1128\/jb.164.1.181-186.1985","article-title":"Intracellular localization of phospholipid transfer activity in Rhodopseudomonas sphaeroides and a possible role in membrane biogenesis","volume":"164","author":"Tai","year":"1985","journal-title":"J. Bacteriol."},{"key":"2023041109272173000_","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/85686","article-title":"Large-scale analysis of the yeast proteome by multidimensional protein identification technology","volume":"19","author":"Washburn","year":"2001","journal-title":"Nat. 
Biotechnol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/3\/277\/49829220\/bioinformatics_23_3_277.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/23\/3\/277\/49829220\/bioinformatics_23_3_277.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,2,8]],"date-time":"2024-02-08T12:35:20Z","timestamp":1707395720000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/23\/3\/277\/235691"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,11,22]]},"references-count":26,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2007,2,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btl595","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2007,2,1]]},"published":{"date-parts":[[2006,11,22]]}}}