{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,2]],"date-time":"2026-04-02T02:35:18Z","timestamp":1775097318173,"version":"3.50.1"},"reference-count":32,"publisher":"Oxford University Press (OUP)","issue":"19","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: The Gene Ontology (GO) is a controlled vocabulary designed to represent the biological concepts pertaining to gene products. This study investigates the methods for identifying informative subsets of GO terms in an automatic and objective fashion. This task in turn requires addressing the following issues: how to represent the semantic context of GO terms, what metrics are suitable for measuring the semantic differences between terms, how to identify an informative subset that retains as much as possible of the original semantic information of GO.<\/jats:p>\n               <jats:p>Results: We represented the semantic context of a GO term using the word-usage-profile associated with the term, which enables one to measure the semantic differences between terms based on the differences in their semantic contexts. We further employed the information bottleneck methods to automatically identify subsets of GO terms that retain as much as possible of the semantic information in an annotation database. The automatically retrieved informative subsets align well with an expert-picked GO slim subset, cover important concepts and proteins, and enhance literature-based GO annotation.<\/jats:p>\n               <jats:p>Availability: \u00a0http:\/\/carcweb.musc.edu\/TextminingProjects\/<\/jats:p>\n               <jats:p>Contact: \u00a0xinghua@pitt.edu<\/jats:p>\n               <jats:p>Supplementary information: \u00a0Supplementary data are available at Bioinformatics online.<\/jats:p>","DOI":"10.1093\/bioinformatics\/btq449","type":"journal-article","created":{"date-parts":[[2010,8,12]],"date-time":"2010-08-12T00:32:12Z","timestamp":1281573132000},"page":"2445-2451","source":"Crossref","is-referenced-by-count":19,"title":["Identifying informative subsets of the Gene Ontology with information bottleneck methods"],"prefix":"10.1093","volume":"26","author":[{"given":"Bo","family":"Jin","sequence":"first","affiliation":[{"name":"Department of Biochemistry and Molecular Biology, Medical University of South Carolina, 174 Ashley Ave, Charleston, SC 29425; Department of Biomedical Informatics, University of Pittsburgh, UPMC Cancer Pavilion, Suite 301, 5150 Centre Avenue, Pittsburgh, PA 15232, USA"}]},{"given":"Xinghua","family":"Lu","sequence":"additional","affiliation":[{"name":"Department of Biochemistry and Molecular Biology, Medical University of South Carolina, 174 Ashley Ave, Charleston, SC 29425; Department of Biomedical Informatics, University of Pittsburgh, UPMC Cancer Pavilion, Suite 301, 5150 Centre Avenue, Pittsburgh, PA 15232, USA"}]}],"member":"286","published-online":{"date-parts":[[2010,8,11]]},"reference":[{"key":"2023012508170229700_B1","doi-asserted-by":"crossref","first-page":"25","DOI":"10.1038\/75556","article-title":"Gene ontology: tool for the unification of biology. The Gene Ontology Consortium","volume":"25","author":"Ashburner","year":"2000","journal-title":"Nat. Genet."},{"key":"2023012508170229700_B2","first-page":"5","article-title":"The Gene Ontology Annotation (GOA) Database\u2013an integrated resource of GO annotations to the UniProt Knowledgebase","volume":"4","author":"Camon","year":"2004","journal-title":"In Silico Biol."},{"issue":"Suppl. 1","key":"2023012508170229700_B3","doi-asserted-by":"crossref","first-page":"S17","DOI":"10.1186\/1471-2105-6-S1-S17","article-title":"An evaluation of GO annotation retrieval for BioCreAtIvE and GOA","volume":"6","author":"Camon","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012508170229700_B4","doi-asserted-by":"crossref","first-page":"4","DOI":"10.1186\/1747-5333-1-4","article-title":"The TREC 2004 genomics track categorization task: classifying full text biomedical documents","volume":"1","author":"Cohen","year":"2006","journal-title":"J. Biomed. Discov. Collab."},{"key":"2023012508170229700_B5","doi-asserted-by":"crossref","first-page":"e20","DOI":"10.1371\/journal.pcbi.0040020","article-title":"Getting started in text mining","volume":"4","author":"Cohen","year":"2008","journal-title":"PLoS Comput. Biol."},{"key":"2023012508170229700_B6","doi-asserted-by":"crossref","first-page":"i63","DOI":"10.1093\/bioinformatics\/btp193","article-title":"From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations","volume":"25","author":"Du","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012508170229700_B7","first-page":"465","article-title":"Agnostic classification of Markovian sequences","volume":"10","author":"El-Yaniv","year":"1997","journal-title":"Adv. Neural Inf. Process. Syst."},{"key":"2023012508170229700_B8","doi-asserted-by":"crossref","first-page":"R183","DOI":"10.1186\/gb-2007-8-9-r183","article-title":"DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene list","volume":"8","author":"Huang","year":"2007","journal-title":"Genome Biol."},{"key":"2023012508170229700_B9","doi-asserted-by":"crossref","first-page":"44","DOI":"10.1038\/nprot.2008.211","article-title":"Systematic and integrative analysis of large gene lists using DAVID Bioinformatics Resources","volume":"4","author":"Huang","year":"2009","journal-title":"Nat. Protoc."},{"key":"2023012508170229700_B10","article-title":"Semantic similarity based on corpus statistics and lexical taxonomy","volume-title":"Proceedings on International Conference on Research in Computational Linguistics","author":"Jiang","year":"1998"},{"key":"2023012508170229700_B11","doi-asserted-by":"crossref","first-page":"525","DOI":"10.1186\/1471-2105-9-525","article-title":"Multi-label literature classification based on the Gene Ontology graph","volume":"9","author":"Jin","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508170229700_B12","doi-asserted-by":"crossref","first-page":"83","DOI":"10.1002\/nav.3800020109","article-title":"The Hungarian Method for the assignment problem","volume":"2","author":"Kuhn","year":"1955","journal-title":"Naval Res. Logist. Quart."},{"key":"2023012508170229700_B13","doi-asserted-by":"crossref","first-page":"253","DOI":"10.1002\/nav.3800030404","article-title":"Variants of the Hungarian method for assignment problems","volume":"3","author":"Kuhn","year":"1956","journal-title":"Naval Res. Logist. Quart."},{"key":"2023012508170229700_B14","first-page":"296","article-title":"An information-theoretic definition of similarity","volume-title":"Proceedings of the 15th International Conference on Machine Learning.","author":"Lin","year":"1998"},{"key":"2023012508170229700_B15","doi-asserted-by":"crossref","first-page":"145","DOI":"10.1109\/18.61115","article-title":"Divergence measures based on the Shannon entropy","volume":"37","author":"Lin","year":"1991","journal-title":"IEEE Trans. Inf. Theory"},{"key":"2023012508170229700_B16","doi-asserted-by":"crossref","first-page":"281","DOI":"10.1055\/s-0038-1634945","article-title":"The Unified Medical Language System","volume":"32","author":"Lindberg","year":"1993","journal-title":"Methods Inf. Med."},{"key":"2023012508170229700_B17","doi-asserted-by":"crossref","first-page":"1275","DOI":"10.1093\/bioinformatics\/btg153","article-title":"Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation","volume":"19","author":"Lord","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012508170229700_B18","doi-asserted-by":"crossref","first-page":"122","DOI":"10.1186\/1756-0500-2-122","article-title":"GOGrapher: a Python library for GO graph representation and analysis","volume":"2","author":"Muller","year":"2009","journal-title":"BMC Res. Notes"},{"key":"2023012508170229700_B19","doi-asserted-by":"crossref","first-page":"32","DOI":"10.1137\/0105003","article-title":"Algorithms for the Assignment and Transportation Problems","volume":"5","author":"Munkres","year":"1957","journal-title":"J. Soc. Indust. Appl. Math."},{"key":"2023012508170229700_B20","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/eb046814","article-title":"An algorithm for suffix stripping","volume":"14","author":"Porter","year":"1980","journal-title":"Program"},{"key":"2023012508170229700_B21","first-page":"448","article-title":"Using information content to evaluate semantic similarity in a taxonomy","volume-title":"Proceedings of the 14th International Joint Conference on Artificial Intelligence","author":"Resnik","year":"1995"},{"key":"2023012508170229700_B22","doi-asserted-by":"crossref","first-page":"i79","DOI":"10.1093\/bioinformatics\/btq203","article-title":"Assessing the functional coherence of gene sets with metrics based on the Gene Ontology graph","volume":"26","author":"Richards","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012508170229700_B23","doi-asserted-by":"crossref","first-page":"302","DOI":"10.1186\/1471-2105-7-302","article-title":"A new measure for functional similarity of gene products based on Gene Ontology","volume":"7","author":"Schlicker","year":"2006","journal-title":"BMC Bioinformatics"},{"key":"2023012508170229700_B24","doi-asserted-by":"crossref","first-page":"468","DOI":"10.1186\/1471-2105-9-468","article-title":"A relation based measure of semantic similarity for Gene Ontology annotations","volume":"9","author":"Sheehan","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012508170229700_B25","doi-asserted-by":"crossref","first-page":"18297","DOI":"10.1073\/pnas.0507432102","article-title":"Information-based clustering","volume":"102","author":"Slonim","year":"2005","journal-title":"Proc. Natl Acad. Sci. USA"},{"key":"2023012508170229700_B26","first-page":"929","article-title":"Agglomerative multivariate information bottleneck","volume-title":"Advances in Neural Information Processing Systems (NIPS-14), Cambridge, Mass.","author":"Slonim","year":"2002"},{"key":"2023012508170229700_B27","doi-asserted-by":"crossref","first-page":"208","DOI":"10.1145\/345508.345578","article-title":"Document clustering using word clusters via the information bottleneck method","volume-title":"Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.","author":"Slonim","year":"2000"},{"key":"2023012508170229700_B28","doi-asserted-by":"crossref","first-page":"1251","DOI":"10.1038\/nbt1346","article-title":"The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration","volume":"25","author":"Smith","year":"2007","journal-title":"Nat. Biotechnol."},{"key":"2023012508170229700_B29","doi-asserted-by":"crossref","first-page":"i529","DOI":"10.1093\/bioinformatics\/btm195","article-title":"Information theory applied to the sparse gene ontology annotation network to predict novel gene function","volume":"23","author":"Tao","year":"2007","journal-title":"Bioinformatics"},{"key":"2023012508170229700_B30","first-page":"368","article-title":"The information bottleneck method","author":"Tishby","year":"1999","journal-title":"Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing"},{"key":"2023012508170229700_B31","volume-title":"Statistical Learning Theory.","author":"Vapnik","year":"1998"},{"key":"2023012508170229700_B32","doi-asserted-by":"crossref","first-page":"1274","DOI":"10.1093\/bioinformatics\/btm087","article-title":"A new method to measure the semantic similarity of GO terms","volume":"23","author":"Wang","year":"2007","journal-title":"Bioinformatics"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/19\/2445\/48855404\/bioinformatics_26_19_2445.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/26\/19\/2445\/48855404\/bioinformatics_26_19_2445.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T08:18:05Z","timestamp":1674634685000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/26\/19\/2445\/229545"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2010,8,11]]},"references-count":32,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2010,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btq449","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2010,10,1]]},"published":{"date-parts":[[2010,8,11]]}}}