{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T22:48:40Z","timestamp":1761518920616},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"10","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2009,5,15]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: Genome-wide association (GWA) studies may identify multiple variants that are associated with a disease or trait. To narrow down candidates for further validation, quantitatively assessing how identified genes relate to a phenotype of interest is important.<\/jats:p><jats:p>Results: We describe an approach to characterize genes or biological concepts (phenotypes, pathways, diseases, etc.) by ontology fingerprint\u2014the set of Gene Ontology (GO) terms that are overrepresented among the PubMed abstracts discussing the gene or biological concept together with the enrichment p-value of these terms generated from a hypergeometric enrichment test. We then quantify the relevance of genes to the trait from a GWA study by calculating similarity scores between their ontology fingerprints using enrichment p-values. We validate this approach by correctly identifying corresponding genes for biological pathways with a 90% average area under the ROC curve (AUC). We applied this approach to rank genes identified through a GWA study that are associated with the lipid concentrations in plasma as well as to prioritize genes within linkage disequilibrium (LD) block. We found that the genes with highest scores were: ABCA1, lipoprotein lipase (LPL) and cholesterol ester transfer protein, plasma for high-density lipoprotein; low-density lipoprotein receptor, APOE and APOB for low-density lipoprotein; and LPL, APOA1 and APOB for triglyceride. 
In addition, we identified genes relevant to lipid metabolism from the literature even in cases where such knowledge was not reflected in current annotation of these genes. These results demonstrate that ontology fingerprints can be used effectively to prioritize genes from GWA studies for experimental validation.<\/jats:p><jats:p>Contact: \u00a0zhengw@musc.edu<\/jats:p><jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btp158","type":"journal-article","created":{"date-parts":[[2009,4,7]],"date-time":"2009-04-07T00:13:22Z","timestamp":1239063202000},"page":"1314-1320","source":"Crossref","is-referenced-by-count":20,"title":["Evaluation of genome-wide association study results through development of ontology fingerprints"],"prefix":"10.1093","volume":"25","author":[{"given":"Lam C.","family":"Tsoi","sequence":"first","affiliation":[{"name":"1 Bioinformatics Graduate Program, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, 2Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, 3Division of Endocrinology, Metabolism, and Medical Genetics, Department of Medicine, Medical University of South Carolina, 4Research Service, Ralph H. 
Johnson Department of Veterans Affairs Medical Center, Charleston and 5Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, Charleston, SC, USA"}]},{"given":"Michael","family":"Boehnke","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Graduate Program, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, 2Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, 3Division of Endocrinology, Metabolism, and Medical Genetics, Department of Medicine, Medical University of South Carolina, 4Research Service, Ralph H. Johnson Department of Veterans Affairs Medical Center, Charleston and 5Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, Charleston, SC, USA"}]},{"given":"Richard L.","family":"Klein","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Graduate Program, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, 2Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, 3Division of Endocrinology, Metabolism, and Medical Genetics, Department of Medicine, Medical University of South Carolina, 4Research Service, Ralph H. 
Johnson Department of Veterans Affairs Medical Center, Charleston and 5Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, Charleston, SC, USA"},{"name":"1 Bioinformatics Graduate Program, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, 2Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, 3Division of Endocrinology, Metabolism, and Medical Genetics, Department of Medicine, Medical University of South Carolina, 4Research Service, Ralph H. Johnson Department of Veterans Affairs Medical Center, Charleston and 5Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, Charleston, SC, USA"}]},{"given":"W. Jim","family":"Zheng","sequence":"additional","affiliation":[{"name":"1 Bioinformatics Graduate Program, Department of Biostatistics, Bioinformatics and Epidemiology, Medical University of South Carolina, Charleston, SC, 2Department of Biostatistics and Center for Statistical Genetics, School of Public Health, University of Michigan, Ann Arbor, MI, 3Division of Endocrinology, Metabolism, and Medical Genetics, Department of Medicine, Medical University of South Carolina, 4Research Service, Ralph H. 
Johnson Department of Veterans Affairs Medical Center, Charleston and 5Department of Biostatistics, Bioinformatics & Epidemiology, Medical University of South Carolina, Charleston, SC, USA"}]}],"member":"286","published-online":{"date-parts":[[2009,4,5]]},"reference":[{"key":"2023013110285597700_B1","doi-asserted-by":"crossref","DOI":"10.1002\/0471249688","volume-title":"Categorical Data Analysis.","author":"Agresti","year":"2002"},{"key":"2023013110285597700_B2","first-page":"54","article-title":"Intex: a syntactic role driven protein-protein interaction extractor for bio-medical text","volume-title":"Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontology and Database: Mining Biological Semantics.","author":"Ahmed","year":"2005"},{"key":"2023013110285597700_B3","doi-asserted-by":"crossref","first-page":"51","DOI":"10.1186\/1471-2105-6-51","article-title":"CoPub mapper: mining MEDLINE based on search term co-publication","volume":"6","author":"Alako","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023013110285597700_B4","first-page":"381","article-title":"CBioC: beyond a prototype for collaborative annotation of molecular interactions from the literature","volume":"6","author":"Baral","year":"2007","journal-title":"Comput. Syst. Bioinform. 
Conf."},{"key":"2023013110285597700_B5","doi-asserted-by":"crossref","first-page":"E20","DOI":"10.1371\/journal.pbio.0000020","article-title":"Candidate gene association study in type 2 diabetes indicates a role for genes involved in beta-cell function as well as insulin action","volume":"1","author":"Barroso","year":"2003","journal-title":"PLoS Biol."},{"key":"2023013110285597700_B6","doi-asserted-by":"crossref","first-page":"W399","DOI":"10.1093\/nar\/gkn296","article-title":"PolySearch: a web-based text mining system for extracting relationships between human diseases, genes, mutations, drugs and metabolites","volume":"36","author":"Cheng","year":"2008","journal-title":"Nucleic Acids Res."},{"key":"2023013110285597700_B7","doi-asserted-by":"crossref","first-page":"15490","DOI":"10.1073\/pnas.0702759104","article-title":"Targeting thyroid hormone receptor-beta agonists to the liver reduces cholesterol and triglycerides and improves the therapeutic index","volume":"104","author":"Erion","year":"2007","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023013110285597700_B8","doi-asserted-by":"crossref","first-page":"W21","DOI":"10.1093\/nar\/gkm298","article-title":"iHOP web services","volume":"35","author":"Fernandez","year":"2007","journal-title":"Nucleic Acids Res."},{"key":"2023013110285597700_B9","doi-asserted-by":"crossref","first-page":"336","DOI":"10.1161\/01.ATV.14.3.336","article-title":"Regulation of plasma HDL cholesterol and subfraction distribution by genetic and environmental factors. Associations between the TaqI B RFLP in the CETP gene and smoking and obesity","volume":"14","author":"Freeman","year":"1994","journal-title":"Arterioscler. Thromb."},{"issue":"Suppl. 
2","key":"2023013110285597700_B10","doi-asserted-by":"crossref","first-page":"S110","DOI":"10.1093\/bioinformatics\/18.suppl_2.S110","article-title":"A similarity-based method for genome-wide prediction of disease-relevant human genes","volume":"18","author":"Freudenberg","year":"2002","journal-title":"Bioinformatics"},{"key":"2023013110285597700_B11","doi-asserted-by":"crossref","first-page":"11553","DOI":"10.1074\/jbc.M512554200","article-title":"The lipoprotein lipase inhibitor ANGPTL3 is negatively regulated by thyroid hormone","volume":"281","author":"Fugier","year":"2006","journal-title":"J. Biol. Chem."},{"issue":"Suppl. 2","key":"2023013110285597700_B12","doi-asserted-by":"crossref","first-page":"ii252","DOI":"10.1093\/bioinformatics\/bti1142","article-title":"Implementing the iHOP concept for navigation of biomedical literature","volume":"21","author":"Hoffmann","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110285597700_B13","doi-asserted-by":"crossref","first-page":"2049","DOI":"10.1093\/bioinformatics\/bti268","article-title":"Co-occurrence based meta-analysis of scientific texts: retrieving biological relationships between genes","volume":"21","author":"Jelier","year":"2005","journal-title":"Bioinformatics"},{"key":"2023013110285597700_B14","doi-asserted-by":"crossref","first-page":"14","DOI":"10.1186\/1471-2105-8-14","article-title":"Text-derived concept profiles support assessment of DNA microarray data for acute myeloid leukemia and for androgen receptor stimulation","volume":"8","author":"Jelier","year":"2007","journal-title":"BMC Bioinformatics"},{"key":"2023013110285597700_B15","doi-asserted-by":"crossref","first-page":"354","DOI":"10.1016\/j.ijmedinf.2007.07.004","article-title":"Literature-based concept profiles for gene annotation: the issue of weighting","volume":"77","author":"Jelier","year":"2008","journal-title":"Int. J. Med. 
Inf."},{"key":"2023013110285597700_B16","doi-asserted-by":"crossref","first-page":"R96","DOI":"10.1186\/gb-2008-9-6-r96","article-title":"Anni 2.0: a multipurpose text-mining tool for the life sciences","volume":"9","author":"Jelier","year":"2008","journal-title":"Genome Biol."},{"key":"2023013110285597700_B17","doi-asserted-by":"crossref","first-page":"27","DOI":"10.1093\/nar\/28.1.27","article-title":"KEGG: kyoto encyclopedia of genes and genomes","volume":"28","author":"Kanehisa","year":"2000","journal-title":"Nucleic Acids Res."},{"key":"2023013110285597700_B18","doi-asserted-by":"crossref","first-page":"1240","DOI":"10.1056\/NEJMoa0706728","article-title":"Polymorphisms associated with cholesterol and risk of cardiovascular events","volume":"358","author":"Kathiresan","year":"2008","journal-title":"N. Engl. J. Med."},{"key":"2023013110285597700_B19","doi-asserted-by":"crossref","first-page":"1222","DOI":"10.1101\/gr.985203","article-title":"eVOC: a controlled vocabulary for unifying gene expression data","volume":"13","author":"Kelso","year":"2003","journal-title":"Genome Res."},{"key":"2023013110285597700_B20","doi-asserted-by":"crossref","first-page":"12491","DOI":"10.1073\/pnas.211291398","article-title":"Megalin-dependent cubilin-mediated endocytosis is a major pathway for the apical uptake of transferrin in polarized epithelia","volume":"98","author":"Kozyraki","year":"2001","journal-title":"Proc. Natl Acad. Sci USA"},{"key":"2023013110285597700_B21","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1056\/NEJM199801083380203","article-title":"The role of a common variant of the cholesteryl ester transfer protein gene in the progression of coronary atherosclerosis. The regression growth evaluation statin study group","volume":"338","author":"Kuivenhoven","year":"1998","journal-title":"N. Engl. J. 
Med."},{"key":"2023013110285597700_B22","doi-asserted-by":"crossref","first-page":"86","DOI":"10.1194\/jlr.M600094-JLR200","article-title":"High density lipoprotein subfractions: isolation, composition, and their duplicitous role in oxidation","volume":"48","author":"McPherson","year":"2007","journal-title":"J. Lipid Res."},{"key":"2023013110285597700_B23","doi-asserted-by":"crossref","first-page":"155","DOI":"10.1016\/S0165-022X(02)00183-5","article-title":"Efficient and accurate experimental design for enzyme kinetics: Bayesian studies reveal a systematic approach","volume":"55","author":"Murphy","year":"2003","journal-title":"J. Biochem. Biophys. Methods"},{"key":"2023013110285597700_B24","doi-asserted-by":"crossref","first-page":"1335","DOI":"10.1001\/jama.299.11.1335","article-title":"How to interpret a genome-wide association study","volume":"299","author":"Pearson","year":"2008","journal-title":"JAMA"},{"key":"2023013110285597700_B25","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/ng895","article-title":"Association of genes to genetically inherited diseases using data mining","volume":"31","author":"Perez-Iratxeta","year":"2002","journal-title":"Nat. Genet."},{"key":"2023013110285597700_B26","doi-asserted-by":"crossref","first-page":"27533","DOI":"10.1074\/jbc.M503139200","article-title":"Thyroid hormone regulates the hypotriglyceridemic gene APOA5","volume":"280","author":"Prieur","year":"2005","journal-title":"J. Biol. Chem."},{"key":"2023013110285597700_B27","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1002\/gepi.20237","article-title":"Improving power in genome-wide association studies: weights tip the scale","volume":"31","author":"Roeder","year":"2007","journal-title":"Genet. 
Epidemiol."},{"key":"2023013110285597700_B28","doi-asserted-by":"crossref","first-page":"185","DOI":"10.1037\/0033-2909.85.1.185","article-title":"Combining results of independent studies","volume":"85","author":"Rosenthal","year":"1978","journal-title":"Psychol. Bull."},{"key":"2023013110285597700_B29","doi-asserted-by":"crossref","first-page":"821","DOI":"10.1089\/106652703322756104","article-title":"Mining the biomedical literature in the genomic era: an overview","volume":"10","author":"Shatkay","year":"2003","journal-title":"J. Comput. Biol."},{"key":"2023013110285597700_B30","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1038\/nbt1346","article-title":"The OBO foundry: coordinated evolution of ontologies to support biomedical data integration","volume":"25","author":"Smith","year":"2007","journal-title":"Nat. Biotechnol."},{"key":"2023013110285597700_B31","first-page":"D440","article-title":"The Gene Ontology project in 2008.","volume-title":"Nucleic Acids Res","author":"The Gene Ontology Consortium","year":"2008"},{"key":"2023013110285597700_B32","doi-asserted-by":"crossref","first-page":"337","DOI":"10.1086\/432962","article-title":"Recent development in genomewide association scans: a workshop summary and review","volume":"77","author":"Thomas","year":"2005","journal-title":"Am. J. Hum. 
Genet."},{"key":"2023013110285597700_B33","doi-asserted-by":"crossref","first-page":"1544","DOI":"10.1093\/nar\/gki296","article-title":"Integration of text- and data-mining using ontologies successfully selects disease gene candidates","volume":"33","author":"Tiffin","year":"2005","journal-title":"Nucleic Acids Res."},{"key":"2023013110285597700_B34","doi-asserted-by":"crossref","first-page":"R75","DOI":"10.1186\/gb-2003-4-11-r75","article-title":"POCUS: mining genomic sequence annotation to predict disease genes","volume":"4","author":"Turner","year":"2003","journal-title":"Genome Biol."},{"key":"2023013110285597700_B35","first-page":"51","article-title":"The gene ontology as a source of lexical semantic knowledge for a biological natural language processing application","volume-title":"Proceedings of the SIGIR'03 Workshop on Text Analysis and Search for Bioinformatics.","author":"Verspoor","year":"2003"},{"key":"2023013110285597700_B36","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1038\/ng.76","article-title":"Newly identified loci that influence lipid concentrations and risk of coronary artery disease","volume":"40","author":"Willer","year":"2008","journal-title":"Nat. Genet."},{"key":"2023013110285597700_B37","doi-asserted-by":"crossref","first-page":"1606","DOI":"10.1172\/JCI119323","article-title":"A common substitution (Asn291Ser) in lipoprotein lipase is associated with increased risk of ischemic heart disease","volume":"99","author":"Wittrup","year":"1999","journal-title":"J. Clin. Invest."},{"key":"2023013110285597700_B38","doi-asserted-by":"crossref","first-page":"170","DOI":"10.1002\/gepi.0042","article-title":"Truncated product method for combining p-values","volume":"22","author":"Zaykin","year":"2002","journal-title":"Genet. 
Epidemiol."}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/10\/1314\/48987716\/bioinformatics_25_10_1314.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/25\/10\/1314\/48987716\/bioinformatics_25_10_1314.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,3,12]],"date-time":"2024-03-12T12:07:21Z","timestamp":1710245241000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/25\/10\/1314\/270704"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,4,5]]},"references-count":38,"journal-issue":{"issue":"10","published-print":{"date-parts":[[2009,5,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btp158","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2009,5,15]]},"published":{"date-parts":[[2009,4,5]]}}}