{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T23:38:46Z","timestamp":1725579526076},"reference-count":40,"publisher":"World Scientific Pub Co Pte Lt","issue":"01","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J. Bioinform. Comput. Biol."],"published-print":{"date-parts":[[2008,2]]},"abstract":"<jats:p> Tandem mass spectrometry (MS\/MS) combined with protein database searching has been widely used in protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms based on random forest and support vector machine to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model which used only search engine scores, at the same sensitivity score of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The performance of the best model based on the support vector machine algorithm is at 0.8 AUC, 0.78 accuracy, and 0.7 specificity, suggesting a reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs. 
<\/jats:p>","DOI":"10.1142\/s0219720008003345","type":"journal-article","created":{"date-parts":[[2008,3,6]],"date-time":"2008-03-06T05:43:22Z","timestamp":1204782202000},"page":"223-240","source":"Crossref","is-referenced-by-count":9,"title":["FEATURE SELECTION IN VALIDATING MASS SPECTROMETRY DATABASE SEARCH RESULTS"],"prefix":"10.1142","volume":"06","author":[{"given":"JIANWEN","family":"FANG","sequence":"first","affiliation":[{"name":"Bioinformatics Core Facility & Information and Telecommunication Technology Center, University of Kansas, 2099 Constant Dr., Lawrence, Kansas 66047, USA"}]},{"given":"YINGHUA","family":"DONG","sequence":"additional","affiliation":[{"name":"Bioinformatics Core Facility, University of Kansas, 2099 Constant Dr., Lawrence, Kansas 66047, USA"}]},{"given":"TODD D.","family":"WILLIAMS","sequence":"additional","affiliation":[{"name":"Analytical Proteomics Laboratory, University of Kansas, 2099 Constant Dr., Lawrence, Kansas 66047, USA"}]},{"given":"GERALD H.","family":"LUSHINGTON","sequence":"additional","affiliation":[{"name":"Molecular Graphics and Modeling Laboratory & Bioinformatics Core Facility, University of Kansas, 2099 Constant Dr., Lawrence, Kansas 66047, 
USA"}]}],"member":"219","published-online":{"date-parts":[[2011,11,21]]},"reference":[{"key":"rf1","doi-asserted-by":"publisher","DOI":"10.1016\/1044-0305(94)80016-2"},{"key":"rf2","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2"},{"key":"rf3","doi-asserted-by":"publisher","DOI":"10.1021\/ac9810516"},{"key":"rf4","doi-asserted-by":"publisher","DOI":"10.1093\/bioinformatics\/bth092"},{"key":"rf5","doi-asserted-by":"publisher","DOI":"10.1002\/pmic.200402091"},{"key":"rf6","doi-asserted-by":"publisher","DOI":"10.1016\/j.chroma.2005.04.059"},{"key":"rf7","doi-asserted-by":"publisher","DOI":"10.1016\/S1044-0305(02)00352-5"},{"key":"rf8","doi-asserted-by":"publisher","DOI":"10.1021\/ac025747h"},{"key":"rf9","doi-asserted-by":"publisher","DOI":"10.1021\/ac035229m"},{"key":"rf10","doi-asserted-by":"publisher","DOI":"10.1016\/j.jasms.2004.02.011"},{"key":"rf11","doi-asserted-by":"publisher","DOI":"10.1021\/pr0255654"},{"key":"rf12","doi-asserted-by":"publisher","DOI":"10.1074\/mcp.M500233-MCP200"},{"key":"rf13","doi-asserted-by":"publisher","DOI":"10.1038\/nbt930"},{"key":"rf14","doi-asserted-by":"publisher","DOI":"10.1002\/pmic.200300656"},{"key":"rf16","doi-asserted-by":"publisher","DOI":"10.1074\/mcp.M400120-MCP200"},{"key":"rf17","doi-asserted-by":"publisher","DOI":"10.1002\/rcm.1992"},{"key":"rf18","doi-asserted-by":"publisher","DOI":"10.1021\/ac060279n"},{"key":"rf19","doi-asserted-by":"publisher","DOI":"10.1021\/ac034616t"},{"key":"rf20","doi-asserted-by":"publisher","DOI":"10.1021\/ja9542193"},{"key":"rf21","doi-asserted-by":"publisher","DOI":"10.1021\/ac0351163"},{"key":"rf22","doi-asserted-by":"publisher","DOI":"10.1021\/ac026122m"},{"key":"rf23","doi-asserted-by":"publisher","DOI":"10.1038\/nbt1275"},{"key":"rf24","first-page":"1157","volume":"3","author":"Guyon I.","journal-title":"J. Mach. Learn. 
Res."},{"key":"rf25","doi-asserted-by":"publisher","DOI":"10.1023\/A:1012487302797"},{"key":"rf26","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-7-3"},{"key":"rf27","doi-asserted-by":"publisher","DOI":"10.1016\/0005-2795(79)90498-7"},{"key":"rf28","doi-asserted-by":"publisher","DOI":"10.1074\/jbc.M010402200"},{"key":"rf29","doi-asserted-by":"publisher","DOI":"10.1016\/0022-2836(84)90309-7"},{"key":"rf30","doi-asserted-by":"publisher","DOI":"10.1093\/protein\/1.4.289"},{"key":"rf31","doi-asserted-by":"publisher","DOI":"10.1111\/j.1399-3011.1988.tb01261.x"},{"key":"rf32","doi-asserted-by":"publisher","DOI":"10.1021\/bi00405a042"},{"key":"rf33","volume-title":"Statistical Learning Theory","author":"Vapnik V.","year":"1998"},{"key":"rf34","doi-asserted-by":"publisher","DOI":"10.1023\/A:1010933404324"},{"key":"rf36","doi-asserted-by":"publisher","DOI":"10.1021\/ci034006u"},{"key":"rf37","first-page":"1229","volume":"3","author":"Bi J.","journal-title":"J. Mach. Learn. Res."},{"key":"rf38","doi-asserted-by":"publisher","DOI":"10.1021\/ci0203848"},{"key":"rf39","doi-asserted-by":"publisher","DOI":"10.1109\/TNN.2003.811356"},{"key":"rf40","doi-asserted-by":"publisher","DOI":"10.1177\/1087057105284334"},{"key":"rf41","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4899-4493-1"},{"key":"rf42","doi-asserted-by":"publisher","DOI":"10.1002\/1096-9888(200012)35:12<1399::AID-JMS86>3.0.CO;2-R"}],"container-title":["Journal of Bioinformatics and Computational 
Biology"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.worldscientific.com\/doi\/pdf\/10.1142\/S0219720008003345","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,8,6]],"date-time":"2019-08-06T22:41:24Z","timestamp":1565131284000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S0219720008003345"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,2]]},"references-count":40,"journal-issue":{"issue":"01","published-online":{"date-parts":[[2011,11,21]]},"published-print":{"date-parts":[[2008,2]]}},"alternative-id":["10.1142\/S0219720008003345"],"URL":"https:\/\/doi.org\/10.1142\/s0219720008003345","relation":{},"ISSN":["0219-7200","1757-6334"],"issn-type":[{"value":"0219-7200","type":"print"},{"value":"1757-6334","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,2]]}}}