{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,7]],"date-time":"2026-04-07T19:42:45Z","timestamp":1775590965301,"version":"3.50.1"},"reference-count":37,"publisher":"Oxford University Press (OUP)","issue":"13","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,7,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: The standard approach to identifying peptides based on accurate mass and elution time (AMT) compares profiles obtained from a high resolution mass spectrometer to a database of peptides previously identified from tandem mass spectrometry (MS\/MS) studies. It would be advantageous, with respect to both accuracy and cost, to only search for those peptides that are detectable by MS (proteotypic).<\/jats:p><jats:p>Results: We present a support vector machine (SVM) model that uses a simple descriptor space based on 35 properties of amino acid content, charge, hydrophilicity and polarity for the quantitative prediction of proteotypic peptides. Using three independently derived AMT databases (Shewanella oneidensis, Salmonella typhimurium, Yersinia pestis) for training and validation within and across species, the SVM resulted in an average accuracy measure of \u223c0.83 with an SD of &lt;0.038. 
Furthermore, we demonstrate that these results are achievable with a small set of 13 variables and can achieve high proteome coverage.<\/jats:p><jats:p>Availability: \u00a0http:\/\/omics.pnl.gov\/software\/STEPP.php<\/jats:p><jats:p>Contact: \u00a0bj@pnl.gov<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq251","type":"journal-article","created":{"date-parts":[[2010,6,17]],"date-time":"2010-06-17T12:26:31Z","timestamp":1276777591000},"page":"1677-1683","source":"Crossref","is-referenced-by-count":39,"title":["A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics"],"prefix":"10.1093","volume":"26","author":[{"given":"Bobbie-Jo M.","family":"Webb-Robertson","sequence":"first","affiliation":[{"name":"1 Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]},{"given":"William R.","family":"Cannon","sequence":"additional","affiliation":[{"name":"1 Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]},{"given":"Christopher S.","family":"Oehmen","sequence":"additional","affiliation":[{"name":"1 Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]},{"given":"Anuj R.","family":"Shah","sequence":"additional","affiliation":[{"name":"1 Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]},{"given":"Vidhya","family":"Gurumoorthi","sequence":"additional","affiliation":[{"name":"1 
Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]},{"given":"Mary S.","family":"Lipton","sequence":"additional","affiliation":[{"name":"1 Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]},{"given":"Katrina M.","family":"Waters","sequence":"additional","affiliation":[{"name":"1 Computational Biology & Bioinformatics, 2 Scientific Data Management, 3 Applied Computer Science and 4 Biological Separations and Mass Spectrometry, Pacific Northwest National Laboratory"}]}],"member":"286","published-online":{"date-parts":[[2010,6,16]]},"reference":[{"key":"2023012507564677900_B1","doi-asserted-by":"crossref","first-page":"1450","DOI":"10.1074\/mcp.M600139-MCP200","article-title":"Analysis of the Salmonella typhimurium proteome through environmental response toward infectious conditions","volume":"5","author":"Adkins","year":"2006","journal-title":"Mol. Cell Proteomics"},{"key":"2023012507564677900_B2","first-page":"409","article-title":"Advancement in protein inference from shotgun proteomics using peptide detectability","author":"Alves","year":"2007","journal-title":"Pac. Symp. Biocomput."},{"key":"2023012507564677900_B3","doi-asserted-by":"crossref","first-page":"137","DOI":"10.1021\/pr0255654","article-title":"A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS\/MS spectra and SEQUEST scores","volume":"2","author":"Anderson","year":"2003","journal-title":"J. 
Proteome Res."},{"key":"2023012507564677900_B4","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1186\/1477-5956-4-1","article-title":"Estimating probabilities of peptide database identifications to LC-FTICR-MS observations","volume":"4","author":"Anderson","year":"2006","journal-title":"Proteome Sci."},{"key":"2023012507564677900_B5","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780198538493.001.0001","volume-title":"Neural Networks for Pattern Recognition.","author":"Bishop","year":"1995"},{"key":"2023012507564677900_B6","doi-asserted-by":"crossref","first-page":"1844","DOI":"10.1002\/rcm.1992","article-title":"The use of proteotypic peptide libraries for protein identification","volume":"19","author":"Craig","year":"2005","journal-title":"Rapid Commun. Mass Spectrom."},{"key":"2023012507564677900_B7","doi-asserted-by":"crossref","DOI":"10.1017\/CBO9780511801389","volume-title":"An Introduction to Support Vector Machines and Other Kernel-based Learning Methods.","author":"Cristianini","year":"2000"},{"key":"2023012507564677900_B8","first-page":"563","article-title":"MudPIT: multidimensional protein identification technology","volume":"43","author":"Delahunty","year":"2007","journal-title":"BioTechniques"},{"key":"2023012507564677900_B9","doi-asserted-by":"crossref","first-page":"D655","DOI":"10.1093\/nar\/gkj040","article-title":"The PeptideAtlas project","volume":"34","author":"Desiere","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012507564677900_B10","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1016\/0022-2836(84)90309-7","article-title":"Analysis of membrane and surface protein sequences with the hydrophobic moment plot","volume":"179","author":"Eisenberg","year":"1984","journal-title":"J. Mol. 
Biol."},{"key":"2023012507564677900_B11","doi-asserted-by":"crossref","first-page":"862","DOI":"10.1126\/science.185.4154.862","article-title":"Amino acid difference formula to help explain protein evolution","volume":"185","author":"Grantham","year":"1974","journal-title":"Science"},{"key":"2023012507564677900_B12","doi-asserted-by":"crossref","first-page":"389","DOI":"10.1023\/A:1012487302797","article-title":"Gene selection for cancer classification using support vector machines","volume":"46","author":"Guyon","year":"2002","journal-title":"Mach. Learn."},{"key":"2023012507564677900_B13","doi-asserted-by":"crossref","first-page":"3008","DOI":"10.1021\/pr060179y","article-title":"Biomarker candidate identification in Yersinia pestis using organism-wide semiquantitative proteomics","volume":"5","author":"Hixson","year":"2006","journal-title":"J. Proteome Res."},{"key":"2023012507564677900_B14","doi-asserted-by":"crossref","first-page":"3824","DOI":"10.1073\/pnas.78.6.3824","article-title":"Prediction of protein antigenic determinants from amino acid sequences","volume":"78","author":"Hopp","year":"1981","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012507564677900_B15","doi-asserted-by":"crossref","first-page":"5800","DOI":"10.1021\/ac0480949","article-title":"Statistical characterization of the charge state and residue dependence of low-energy CID peptide dissociation patterns","volume":"77","author":"Huang","year":"2005","journal-title":"Anal. 
Chem."},{"key":"2023012507564677900_B16","doi-asserted-by":"crossref","first-page":"D659","DOI":"10.1093\/nar\/gkj138","article-title":"PRIDE: a public repository of protein and peptide identifications for the proteomics community","volume":"34","author":"Jones","year":"2006","journal-title":"Nucleic Acids Res."},{"key":"2023012507564677900_B17","doi-asserted-by":"crossref","first-page":"1783","DOI":"10.1002\/pmic.200500500","article-title":"PRISM: a data management system for high-throughput proteomics","volume":"6","author":"Kiebel","year":"2006","journal-title":"Proteomics"},{"key":"2023012507564677900_B18","doi-asserted-by":"crossref","first-page":"577","DOI":"10.1038\/nrm1683","article-title":"Scoring proteomes with proteotypic peptide probes","volume":"6","author":"Kuster","year":"2005","journal-title":"Nat. Rev. Mol. Cell Biol."},{"key":"2023012507564677900_B19","doi-asserted-by":"crossref","first-page":"105","DOI":"10.1016\/0022-2836(82)90515-0","article-title":"A simple method for displaying the hydropathic character of a protein","volume":"157","author":"Kyte","year":"1982","journal-title":"J. Mol. Biol."},{"key":"2023012507564677900_B20","doi-asserted-by":"crossref","first-page":"655","DOI":"10.1002\/pmic.200600625","article-title":"Development and validation of a spectral library searching method for peptide identification from MS\/MS","volume":"7","author":"Lam","year":"2007","journal-title":"Proteomics"},{"key":"2023012507564677900_B21","doi-asserted-by":"crossref","first-page":"113","DOI":"10.1002\/0471973165.ch9","article-title":"AMT tag approach to proteomic characterization of Deinococcus radiodurans and Shewanella oneidensis","volume":"49","author":"Lipton","year":"2006","journal-title":"Methods Biochem. 
Anal."},{"key":"2023012507564677900_B22","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1038\/nbt1270","article-title":"Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation","volume":"25","author":"Lu","year":"2007","journal-title":"Nat. Biotechnol."},{"key":"2023012507564677900_B23","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/nbt1275","article-title":"Computational prediction of proteotypic peptides for quantitative proteomics","volume":"25","author":"Mallick","year":"2007","journal-title":"Nat. Biotechnol."},{"key":"2023012507564677900_B24","doi-asserted-by":"crossref","first-page":"2685","DOI":"10.1021\/pr070146y","article-title":"A platform for accurate mass and time analyses of mass spectrometry data","volume":"6","author":"May","year":"2007","journal-title":"J. Proteome Res."},{"key":"2023012507564677900_B25","doi-asserted-by":"crossref","first-page":"4646","DOI":"10.1021\/ac0341261","article-title":"A statistical model for identifying proteins by tandem mass spectrometry","volume":"75","author":"Nesvizhskii","year":"2003","journal-title":"Anal. Chem."},{"key":"2023012507564677900_B26","doi-asserted-by":"crossref","first-page":"401","DOI":"10.1089\/10665270252935539","article-title":"Learning gene functional classifications from multiple data types","volume":"9","author":"Pavlidis","year":"2002","journal-title":"J. Comput. Biol."},{"key":"2023012507564677900_B27","doi-asserted-by":"crossref","first-page":"5026","DOI":"10.1021\/ac060143p","article-title":"Improved peptide elution time prediction for reversed-phase liquid chromatography-MS by incorporating peptide sequence information","volume":"78","author":"Petritis","year":"2006","journal-title":"Anal. 
Chem."},{"key":"2023012507564677900_B28","doi-asserted-by":"crossref","first-page":"621","DOI":"10.1016\/0022-2836(88)90642-0","article-title":"Hydrophobicity of the peptide C=OH-N hydrogen-bonded group","volume":"201","author":"Roseman","year":"1988","journal-title":"J. Mol. Biol."},{"key":"2023012507564677900_B29","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1089\/15362310252780843","article-title":"The use of accurate mass tags for high-throughput microbial proteomics","volume":"6","author":"Smith","year":"2002","journal-title":"Omics"},{"key":"2023012507564677900_B30","doi-asserted-by":"crossref","first-page":"513","DOI":"10.1002\/1615-9861(200205)2:5<513::AID-PROT513>3.0.CO;2-W","article-title":"An accurate mass tag strategy for quantitative and high-throughput proteome measurements","volume":"2","author":"Smith","year":"2002","journal-title":"Proteomics"},{"key":"2023012507564677900_B31","doi-asserted-by":"crossref","first-page":"e481","DOI":"10.1093\/bioinformatics\/btl237","article-title":"A computational approach toward label-free protein quantification using predicted peptide detectability","volume":"22","author":"Tang","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012507564677900_B32","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4757-2440-0","volume-title":"The Nature of Statistical Learning Theory.","author":"Vapnik","year":"1995"},{"key":"2023012507564677900_B33","doi-asserted-by":"crossref","first-page":"242","DOI":"10.1038\/85686","article-title":"Large-scale analysis of the yeast proteome by multidimensional protein identification technology","volume":"19","author":"Washburn","year":"2001","journal-title":"Nat. Biotechnol."},{"key":"2023012507564677900_B34","doi-asserted-by":"crossref","first-page":"304","DOI":"10.1093\/bib\/bbm023","article-title":"Current trends in computational inference from mass spectrometry-based proteomics","volume":"8","author":"Webb-Robertson","year":"2007","journal-title":"Brief. 
Bioinform."},{"key":"2023012507564677900_B35","doi-asserted-by":"crossref","first-page":"1426","DOI":"10.1021\/ac00104a020","article-title":"Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database","volume":"67","author":"Yates","year":"1995","journal-title":"Anal. Chem."},{"key":"2023012507564677900_B36","doi-asserted-by":"crossref","first-page":"3557","DOI":"10.1021\/ac980122y","article-title":"Method to compare collision-induced dissociation spectra of peptides: potential for library searching and subtractive analysis","volume":"70","author":"Yates","year":"1998","journal-title":"Anal. Chem."},{"key":"2023012507564677900_B37","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1016\/0022-5193(68)90069-6","article-title":"The characterization of amino acid sequences in proteins by statistical methods","volume":"21","author":"Zimmerman","year":"1968","journal-title":"J. Theor. Biol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/13\/1677\/48851681\/bioinformatics_26_13_1677.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/13\/1677\/48851681\/bioinformatics_26_13_1677.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,27]],"date-time":"2024-03-27T17:24:40Z","timestamp":1711560280000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/13\/1677\/201443"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,6,16]]},"references-count":37,"journal-issue":{"issue":"13","published-print":{"date-parts":[[2010,7,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq251","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-481
1","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,7,1]]},"published":{"date-parts":[[2010,6,16]]}}}