{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,12]],"date-time":"2026-03-12T00:17:05Z","timestamp":1773274625033,"version":"3.50.1"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"24","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,12,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Enrichment tests are used in high-throughput experimentation to measure the association between gene or protein expression and membership in groups or pathways. The Fisher's exact test is commonly used. We specifically examined the associations produced by the Fisher test between protein identification by mass spectrometry discovery proteomics, and their Gene Ontology (GO) term assignments in a large yeast dataset. We found that direct application of the Fisher test is misleading in proteomics due to the bias in mass spectrometry to preferentially identify proteins based on their biochemical properties. False inference about associations can be made if this bias is not corrected. Our method adjusts Fisher tests for these biases and produces associations more directly attributable to protein expression rather than experimental bias.<\/jats:p>\n               <jats:p>Results: Using logistic regression, we modeled the association between protein identification and GO term assignments while adjusting for identification bias in mass spectrometry. The model accounts for five biochemical properties of peptides: (i) hydrophobicity, (ii) molecular weight, (iii) transfer energy, (iv) beta turn frequency and (v) isoelectric point. The model was fit on 181 060 peptides from 2678 proteins identified in 24 yeast proteomics datasets with a 1% false discovery rate. 
In analyzing the association between protein identification and their GO term assignments, we found that 25% (134 out of 544) of Fisher tests that showed significant association (q-value \u22640.05) were non-significant after adjustment using our model. Simulations generating yeast protein sets enriched for identification propensity show that unadjusted enrichment tests were biased while our approach worked well.<\/jats:p>\n               <jats:p>Contact: \u00a0eugene.kolker@seattlechildrens.org<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq541","type":"journal-article","created":{"date-parts":[[2010,11,11]],"date-time":"2010-11-11T01:18:02Z","timestamp":1289438282000},"page":"3007-3011","source":"Crossref","is-referenced-by-count":6,"title":["The necessity of adjusting tests of protein category enrichment in discovery proteomics"],"prefix":"10.1093","volume":"26","author":[{"given":"Brenton","family":"Louie","sequence":"first","affiliation":[{"name":"1 Bioinformatics and High-throughput Analysis Laboratory, 2High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA 98101, 3Predictive Analytics, Seattle Children's Hospital, Seattle, WA 98145 and 4Division of Biomedical and Health Informatics, Department of Medical Education and Biomedical Informatics, University of Washington Medical School, Seattle, WA 98195, USA"}]},
{"given":"Roger","family":"Higdon","sequence":"additional","affiliation":[{"name":"1 Bioinformatics and High-throughput Analysis Laboratory, 2High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA 98101, 3Predictive Analytics, Seattle Children's Hospital, Seattle, WA 98145 and 4Division of Biomedical and Health Informatics, Department of Medical Education and Biomedical Informatics, University of Washington Medical School, Seattle, WA 98195, USA"}]},
{"given":"Eugene","family":"Kolker","sequence":"additional","affiliation":[{"name":"1 Bioinformatics and High-throughput Analysis Laboratory, 2High-throughput Analysis Core, Seattle Children's Research Institute, Seattle, WA 98101, 3Predictive Analytics, Seattle Children's Hospital, Seattle, WA 98145 and 4Division of Biomedical and Health Informatics, Department of Medical Education and Biomedical Informatics, University of Washington Medical School, Seattle, WA 98195, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,11,9]]},"reference":[{"key":"2023012508030621200_B1","doi-asserted-by":"crossref","first-page":"1600","DOI":"10.1093\/bioinformatics\/btl140","article-title":"Improved scoring of functional groups from gene expression data by decorrelating GO graph structure","volume":"22","author":"Alexa","year":"2006","journal-title":"Bioinformatics"},{"key":"2023012508030621200_B2","doi-asserted-by":"crossref","first-page":"529","DOI":"10.1186\/1471-2105-9-529","article-title":"The APEX quantitative proteomics tool: generating protein quantitation estimates from LC-MS\/MS proteomics 
results","volume":"9","author":"Braisted","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508030621200_B3","doi-asserted-by":"crossref","first-page":"429","DOI":"10.1016\/j.tibtech.2005.05.011","article-title":"Pathways to the analysis of microarray data","volume":"23","author":"Curtis","year":"2005","journal-title":"Trends Biotechnol."},{"key":"2023012508030621200_B4","doi-asserted-by":"crossref","first-page":"257","DOI":"10.1093\/bioinformatics\/btl567","article-title":"Using GOstats to test gene lists for GO term association","volume":"23","author":"Falcon","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508030621200_B5","doi-asserted-by":"crossref","first-page":"190","DOI":"10.1038\/nbt.1524","article-title":"Prediction of high-responding peptides for targeted protein assays by mass spectrometry","volume":"27","author":"Fusaro","year":"2009","journal-title":"Nat. Biotechnol."},{"key":"2023012508030621200_B6","doi-asserted-by":"crossref","first-page":"737","DOI":"10.1038\/nature02046","article-title":"Global analysis of protein expression in yeast","volume":"425","author":"Ghaemmaghami","year":"2003","journal-title":"Nature"},{"key":"2023012508030621200_B7","doi-asserted-by":"crossref","first-page":"2369","DOI":"10.1002\/pmic.200900619","article-title":"Estimating false discovery rates for peptide and protein identification using randomized databases","volume":"10","author":"Hather","year":"2010","journal-title":"Proteomics"},{"key":"2023012508030621200_B8","doi-asserted-by":"crossref","first-page":"277","DOI":"10.1093\/bioinformatics\/btl595","article-title":"A predictive model for identifying proteins by a single peptide match","volume":"23","author":"Higdon","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508030621200_B9","doi-asserted-by":"crossref","first-page":"1225","DOI":"10.1093\/bioinformatics\/btn120","article-title":"A note on the false discovery rate and inconsistent comparisons between 
experiments","volume":"24","author":"Higdon","year":"2008","journal-title":"Bioinformatics"},{"key":"2023012508030621200_B10","doi-asserted-by":"crossref","first-page":"309","DOI":"10.1089\/omi.2010.0034","article-title":"Meta-analysis for protein identification: a case study on yeast data","volume":"14","author":"Higdon","year":"2010","journal-title":"OMICS"},{"key":"2023012508030621200_B11","doi-asserted-by":"crossref","first-page":"D577","DOI":"10.1093\/nar\/gkm909","article-title":"Gene Ontology annotations at SGD: new data sources and annotation methods","volume":"36","author":"Hong","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023012508030621200_B12","doi-asserted-by":"crossref","first-page":"374","DOI":"10.1093\/nar\/28.1.374","article-title":"AAindex: amino acid index database","volume":"28","author":"Kawashima","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023012508030621200_B13","doi-asserted-by":"crossref","first-page":"229","DOI":"10.1016\/j.tim.2006.03.005","article-title":"Protein identification and expression analysis using mass spectrometry","volume":"14","author":"Kolker","year":"2006","journal-title":"Trends Microbiol."},{"key":"2023012508030621200_B14","doi-asserted-by":"crossref","first-page":"e7546","DOI":"10.1371\/journal.pone.0007546","article-title":"A statistical model of protein sequence similarity and function similarity reveals overly-specific function predictions","volume":"4","author":"Louie","year":"2009","journal-title":"PLoS ONE"},{"key":"2023012508030621200_B15","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1038\/nbt1275","article-title":"Computational prediction of proteotypic peptides for quantitative proteomics","volume":"25","author":"Mallick","year":"2007","journal-title":"Nat. 
Biotechnol."},{"key":"2023012508030621200_B16","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4899-3242-6","volume-title":"Generalized Linear Models","author":"McCullagh","year":"1989","edition":"2"},{"key":"2023012508030621200_B17","doi-asserted-by":"crossref","first-page":"837","DOI":"10.1038\/35015709","article-title":"Proteomics to study genes and genomes","volume":"405","author":"Pandey","year":"2000","journal-title":"Nature"},{"key":"2023012508030621200_B18","doi-asserted-by":"crossref","first-page":"41","DOI":"10.1093\/biomet\/70.1.41","article-title":"The central role of the propensity score in observational studies for causal effects","volume":"70","author":"Rosenbaum","year":"1983","journal-title":"Biometrika"},{"key":"2023012508030621200_B19","doi-asserted-by":"crossref","first-page":"e48","DOI":"10.1371\/journal.pbio.1000048","article-title":"Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes","volume":"7","author":"Schrimpf","year":"2009","journal-title":"PLoS Biol."},{"key":"2023012508030621200_B20","doi-asserted-by":"crossref","first-page":"9440","DOI":"10.1073\/pnas.1530509100","article-title":"Statistical significance for genomewide studies","volume":"100","author":"Storey","year":"2003","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508030621200_B21","doi-asserted-by":"crossref","first-page":"15545","DOI":"10.1073\/pnas.0506580102","article-title":"Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles","volume":"102","author":"Subramanian","year":"2005","journal-title":"Proc. Natl Acad. Sci. 
USA"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/24\/3007\/48853442\/bioinformatics_26_24_3007.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/24\/3007\/48853442\/bioinformatics_26_24_3007.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:04:39Z","timestamp":1674633879000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/24\/3007\/287341"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,11,9]]},"references-count":21,"journal-issue":{"issue":"24","published-print":{"date-parts":[[2010,12,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq541","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,12,15]]},"published":{"date-parts":[[2010,11,9]]}}}