{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,11]],"date-time":"2026-01-11T01:50:24Z","timestamp":1768096224836,"version":"3.49.0"},"reference-count":23,"publisher":"Oxford University Press (OUP)","issue":"7","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,4,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: With the rapid advancement of biomedical science and the development of high-throughput analysis methods, the extraction of various types of information from biomedical text has become critical. Since automatic functional annotations of genes are quite useful for interpreting large amounts of high-throughput data efficiently, the demand for automatic extraction of information related to gene functions from text has been increasing.<\/jats:p><jats:p>Results: We have developed a method for automatically extracting the biological process functions of genes\/protein\/families based on Gene Ontology (GO) from text using a shallow parser and sentence structure analysis techniques. When the gene\/protein\/family names and their functions are described in ACTOR (doer of action) and OBJECT (receiver of action) relationships, the corresponding GO-IDs are assigned to the genes\/proteins\/families. The gene\/protein\/family names are recognized using the gene\/protein\/family name dictionaries developed by our group. To achieve wide recognition of the gene\/protein\/family functions, we semi-automatically gather functional terms based on GO using co-occurrence, collocation similarities and rule-based techniques. A preliminary experiment demonstrated that our method has an estimated recall of 54\u201364% with a precision of 91\u201394% for actually described functions in abstracts. When applied to the PUBMED, it extracted over 190 000 gene\u2013GO relationships and 150 000 family\u2013GO relationships for major eukaryotes.<\/jats:p><jats:p>Availability: The extracted gene functions are available at http:\/\/prime.ontology.ims.u-tokyo.ac.jp<\/jats:p><jats:p>Contact: \u00a0akoike@hgc.jp<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti084","type":"journal-article","created":{"date-parts":[[2004,10,28]],"date-time":"2004-10-28T00:23:01Z","timestamp":1098922981000},"page":"1227-1236","source":"Crossref","is-referenced-by-count":46,"title":["Automatic extraction of gene\/protein biological functions from biomedical text"],"prefix":"10.1093","volume":"21","author":[{"given":"Asako","family":"Koike","sequence":"first","affiliation":[]},{"given":"Yoshiki","family":"Niwa","sequence":"additional","affiliation":[]},{"given":"Toshihisa","family":"Takagi","sequence":"additional","affiliation":[]}],"member":"286","published-online":{"date-parts":[[2004,10,27]]},"reference":[{"key":"2023013107281988700_B1","doi-asserted-by":"crossref","unstructured":"Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M., Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. 2000Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet.2525\u201329","DOI":"10.1038\/75556"},{"key":"2023013107281988700_B2","unstructured":"Blaschke, C. and Valencia, A. 2002Automatic ontology construction from the literature. Genome Inform.13201\u2013213"},{"key":"2023013107281988700_B3","doi-asserted-by":"crossref","unstructured":"Camon, E., Magrane, M., Barrell, D., Binns, D., Fleischmann, W., Kersey, P., Mulder, N., Oinn, T., Maslen, J., Cox, A., Apweiler, R. 2003The gene ontology annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro. Genome Res.13662\u2013672","DOI":"10.1101\/gr.461403"},{"key":"2023013107281988700_B4","unstructured":"Chiang, J. and -H. and Yu, H.-C. 2003MeKE: discovering the functions of gene products from biomedical literature via sentence alignment. Bioinformatics191417\u20131422"},{"key":"2023013107281988700_B5","doi-asserted-by":"crossref","unstructured":"Collier, N., Nobata, C., Tsujii, J. 2000Comparison between Tagged Corpora for the Named Entity Task. Proceedings of the 18th International Conference on Computational Linguistics , Germany Saarbrucker, pp. 201\u2013207","DOI":"10.3115\/1117729.1117733"},{"key":"2023013107281988700_B6","doi-asserted-by":"crossref","unstructured":"Friedman, C., Kra, P., Yu, H., Krauthammer, M., Rzhetsky, A. 2001GENIES: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics17(Suppl. 1),S74\u2013S82","DOI":"10.1093\/bioinformatics\/17.suppl_1.S74"},{"key":"2023013107281988700_B7","unstructured":"Fukuda, K., Tsunoda, T., Tamura, A., Takagi, T. 1998Toward information extraction: identifying protein names from biological papers. Proceedings of the Pacific Symposium on Biocomputing , pp. 705\u2013716"},{"key":"2023013107281988700_B8","doi-asserted-by":"crossref","unstructured":"Humphrey, K., Demetriou, G., Gaizauskas, R. 2000Two applications of information extraction to biological science journal articles. Proceedings of the Pacific Symposium on BiocomputingEnzyme Interact. Protein Struct. , Hawaii, USA , pp. , pp. 505\u2013516","DOI":"10.1142\/9789814447331_0048"},{"key":"2023013107281988700_B9","doi-asserted-by":"crossref","unstructured":"Jacquemin, C. and Royaute, J. 1994Retrieving terms and their variants in a lexicalised unification-based framework. Proceedings of SIGIR , pp. 132\u2013141","DOI":"10.1007\/978-1-4471-2099-5_14"},{"key":"2023013107281988700_B10","doi-asserted-by":"crossref","unstructured":"Kim, J.-J. and Park, J.C. 2004Annotation of gene products in the literature with gene ontology terms using syntactic dependencies. Lect. Notes Artifi. Intell. (in press)","DOI":"10.1007\/978-3-540-30211-7_84"},{"key":"2023013107281988700_B11","unstructured":"Koike, A. and Takagi, T. Proceedings of HLT\/NAACL BioLINK Workshop2004, pp. 9\u201316"},{"key":"2023013107281988700_B12","doi-asserted-by":"crossref","unstructured":"Koike, A., Kobayashi, Y., Takagi, T. 2003Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource. Genome Res.131231\u20131243","DOI":"10.1101\/gr.835903"},{"key":"2023013107281988700_B13","unstructured":"Krallinger, M. and Padron, M.M. 2004Prediction of GO annotation by combining entity specific sentence sliding window profiles. Proceedings of BioCreAtIvE , Spain Granada"},{"key":"2023013107281988700_B14","unstructured":"Krymolowski, Y., Alex, B., Leidner, J.L. 2004BioCreative Task 2.1: The Edinburgh\u2013Stanford System. Proceedings of BioCreAtIvE"},{"key":"2023013107281988700_B15","doi-asserted-by":"crossref","unstructured":"Nenadic, G., Rice, S., Spasic, I., Ananiadou, S., Stapley, B. 2003Selecting text features for gene name classification: from documents to terms. Proceedings of the ACL Workshop on Natural Language Processing in Biomedicine , Japan Sapporo, pp. 121\u2013128","DOI":"10.3115\/1118958.1118974"},{"key":"2023013107281988700_B16","doi-asserted-by":"crossref","unstructured":"Raychaudhuri, S., Chang, J., Sutphin, P., Altman, R. 2002Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature. Genome Res.12203\u2013214","DOI":"10.1101\/gr.199701"},{"key":"2023013107281988700_B17","unstructured":"Rindflesch, T.C., Tanabe, L., Weinstein, J.N., Hunter, L. 2000EDGAR: extraction of drugs, genes and relations from the biomedical literature. Proceedings of Pacific Symposium on Bioinformatics , USA Hawaii, pp. 514\u2013525"},{"key":"2023013107281988700_B18","unstructured":"Salton, G., Wong, A., Yang, C.S. 1975A vector space model for automatic indexing. Commun. ACM18613\u2013620"},{"key":"2023013107281988700_B19","doi-asserted-by":"crossref","unstructured":"Schug, J., Diskin, S., Mazzarelli, J., Brunk, B.P., Stoeckert, C.J., Jr. 2002Predicting gene ontology functions from ProDom and CDD protein domains. Genome Res.12648\u2013655","DOI":"10.1101\/gr.222902"},{"key":"2023013107281988700_B20","doi-asserted-by":"crossref","unstructured":"Singhal, A., Buckley, C., Mitra, M. 1996Pivoted document length normalization. Proceedings of ACM SIGIR'96 , Zurich, Switzerland , pp. 21\u201329","DOI":"10.1145\/243199.243206"},{"key":"2023013107281988700_B21","doi-asserted-by":"crossref","unstructured":"Tanabe, L. and Wilbur, W.J. 2002Tagging gene and protein names in biomedical text. Bioinformatics181124\u20131132","DOI":"10.3115\/1118149.1118151"},{"key":"2023013107281988700_B22","doi-asserted-by":"crossref","unstructured":"Yakushiji, A., Tateishi, Y., Miyano, Y., Tsujii, J. 2001Event extraction from biological papers using a full parser. Proceedings of Pacific Symposium on Bioinformatics , USA Hawaii, pp. 408\u2013419","DOI":"10.1142\/9789814447362_0040"},{"key":"2023013107281988700_B23","doi-asserted-by":"crossref","unstructured":"Xie, H., Wasserman, A., Levine, Z., Novik, A., Grebinskiy, V., Shoshan, A., Mintz, L. 2002Large-scale protein annotation through gene ontology. Genome Res.12785\u2013794","DOI":"10.1101\/gr.86902"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/1227\/48966947\/bioinformatics_21_7_1227.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/7\/1227\/48966947\/bioinformatics_21_7_1227.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,19]],"date-time":"2024-12-19T05:32:21Z","timestamp":1734586341000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/7\/1227\/268830"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,27]]},"references-count":23,"journal-issue":{"issue":"7","published-print":{"date-parts":[[2005,4,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti084","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2005,4,1]]},"published":{"date-parts":[[2004,10,27]]}}}