{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,26]],"date-time":"2025-10-26T22:46:42Z","timestamp":1761518802397,"version":"3.32.0"},"reference-count":21,"publisher":"Oxford University Press (OUP)","issue":"15","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2005,8,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Motivation: An enormous number of protein\u2013protein interaction relationships are buried in millions of research articles published over the years, and the number is growing. Rediscovering them automatically is a challenging bioinformatics task. Solutions to this problem also reach far beyond bioinformatics.<\/jats:p><jats:p>Results: We study a new approach that involves automatically discovering English expression patterns, optimizing them and using them to extract protein\u2013protein interactions. In a sister paper, we described how to generate English expression patterns related to protein\u2013protein interactions, and this approach alone has already achieved precision and recall rates significantly higher than those of other automatic systems. This paper continues to present our theory, focusing on how to improve the patterns. A minimum description length (MDL)-based pattern-optimization algorithm is designed to reduce and merge patterns. 
This has significantly increased generalization power, and hence the recall and precision rates, as confirmed by our experiments.<\/jats:p><jats:p>Availability: \u00a0http:\/\/spies.cs.tsinghua.edu.cn<\/jats:p><jats:p>Contact: \u00a0zxy-dcs@tsinghua.edu.cn<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti493","type":"journal-article","created":{"date-parts":[[2005,5,13]],"date-time":"2005-05-13T00:24:40Z","timestamp":1115943880000},"page":"3294-3300","source":"Crossref","is-referenced-by-count":66,"title":["Discovering patterns to extract protein\u2013protein interactions from the literature: Part II"],"prefix":"10.1093","volume":"21","author":[{"given":"Yu","family":"Hao","sequence":"first","affiliation":[]},{"given":"Xiaoyan","family":"Zhu","sequence":"additional","affiliation":[]},{"given":"Minlie","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Ming","family":"Li","sequence":"additional","affiliation":[]}],"member":"286","published-online":{"date-parts":[[2005,5,12]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Bader, G.D., et al. 2001BIND\u2014the biomolecular interaction network database. Nucleic Acids Res. \u00a029 \u00a0242\u2013245","key":"2023051612004055000_B1","DOI":"10.1093\/nar\/29.1.242"},{"unstructured":"Brill, E. 1995Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics \u00a021 \u00a0543\u2013565","key":"2023051612004055000_B2"},{"doi-asserted-by":"crossref","unstructured":"Friedman, C., et al. 2001Genies: a natural-language processing system for the extraction of molecular pathways from journal articles. Bioinformatics \u00a017 \u00a0S74\u2013S82","key":"2023051612004055000_B3","DOI":"10.1093\/bioinformatics\/17.suppl_1.S74"},{"doi-asserted-by":"crossref","unstructured":"Hirschman, L., et al. 2002Accomplishments and challenges in literature data mining for biology. 
Bioinformatics \u00a018 \u00a01553\u20131561","key":"2023051612004055000_B4","DOI":"10.1093\/bioinformatics\/18.12.1553"},{"doi-asserted-by":"crossref","unstructured":"Huang, M.L., et al. 2004Discovering patterns to extract protein\u2013protein interactions from full texts. Bioinformatics \u00a020 \u00a03604\u20133612","key":"2023051612004055000_B5","DOI":"10.1093\/bioinformatics\/bth451"},{"doi-asserted-by":"crossref","unstructured":"Leroy, G. and Chen, H. 2002Filling preposition-based templates to capture information from medical abstracts. Pacific Symposium on BiocomputingHawaii, USA Vol. 7, pp. 350\u2013361","key":"2023051612004055000_B6","DOI":"10.1142\/9789812799623_0033"},{"unstructured":"Li, M. and Vitanyi, P. An introduction to Kolmogorov complexity and its applications \u00a01997 Springer-Verlag","key":"2023051612004055000_B7"},{"doi-asserted-by":"crossref","unstructured":"Marcotte, E.M., et al. 2001Mining literature for protein\u2013protein interactions. Bioinformatics \u00a017 \u00a0359\u2013363","key":"2023051612004055000_B8","DOI":"10.1093\/bioinformatics\/17.4.359"},{"unstructured":"Ng, S.K. and Wong, M. 1999Toward routine automatic pathway discovery from on-line scientific text abstracts. Proceedings of 10th International Workshop on Genome InformaticsTokyo , pp. 104\u2013112","key":"2023051612004055000_B9"},{"doi-asserted-by":"crossref","unstructured":"Ono, T., et al. 2001Automated extraction of information on protein\u2013protein interactions from the biological literature. Bioinformatics \u00a017 \u00a0155\u2013161","key":"2023051612004055000_B10","DOI":"10.1093\/bioinformatics\/17.2.155"},{"doi-asserted-by":"crossref","unstructured":"Park, J.C., Kim, H.S., Kim, J.J. 2001Bidirectional incremental parsing for automatic pathway identification with combinatory categorial grammar. Proceedings of the Pacific Symposium on BiocomputingHawaii, USA , pp. 
396\u2013407","key":"2023051612004055000_B11","DOI":"10.1142\/9789814447362_0039"},{"doi-asserted-by":"crossref","unstructured":"Pustejovsky, J., Castano, J., Zhang, J., Kotecki, M., Cochran, B. 2002Robust relational parsing over biomedical literature: extracting inhibit relations. Proceedings of the 7th Pacific Symposium on Biocomputing 2002Hawaii, USA , pp. 362\u2013373","key":"2023051612004055000_B12","DOI":"10.1142\/9789812799623_0034"},{"unstructured":"Ray, S. and Craven, M. 2001Representing sentence structure in hidden markov models for information extraction. Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001)Morgan Kaufmann , pp. 1273\u20131279","key":"2023051612004055000_B13"},{"doi-asserted-by":"crossref","unstructured":"Rissanen, J. 1978Modelling by shortest data description. Automatica \u00a014 \u00a0465\u2013471","key":"2023051612004055000_B14","DOI":"10.1016\/0005-1098(78)90005-5"},{"doi-asserted-by":"crossref","unstructured":"Salwinski, L., et al. 2004The database of interacting proteins: 2004 update. Nucleic Acids Res. \u00a032 \u00a0D449\u2013D451","key":"2023051612004055000_B15","DOI":"10.1093\/nar\/gkh086"},{"doi-asserted-by":"crossref","unstructured":"Temkin, J.M. and Gilder, M.R. 2003Extraction of protein interaction information from unstructured text using a context-free grammar. Bioinformatics \u00a019 \u00a02046\u20132053","key":"2023051612004055000_B16","DOI":"10.1093\/bioinformatics\/btg279"},{"doi-asserted-by":"crossref","unstructured":"Thomas, J., Milward, D., Ouzounis, C., Pulman, S., Carroll, M. 2000Automatic extraction of protein interactions from scientific abstracts. Proceedings of the Pacific Symposium on BiocomputingHawaii, USA , pp. 541\u2013551","key":"2023051612004055000_B17","DOI":"10.1142\/9789814447331_0051"},{"doi-asserted-by":"crossref","unstructured":"Vit\u00e1nyi, P. and Li, M. 2000Minimum description length induction, Bayesianism and Kolmogorov Complexity. 
IEEE transactions on information theory \u00a047 \u00a0446\u2013464","key":"2023051612004055000_B18","DOI":"10.1109\/18.825807"},{"doi-asserted-by":"crossref","unstructured":"Wong, L. 2001A protein interaction extraction system. Proceedings of Pacific Symposium on Biocomputing 2001Hawaii , pp. 520\u2013530","key":"2023051612004055000_B19","DOI":"10.1142\/9789814447362_0050"},{"doi-asserted-by":"crossref","unstructured":"Yakushiji, A., Tateisi, Y., Miyao, Y., Tsujii, J. 2001Event extraction from biomedical papers using a full parser. Proceedings of the sixth Pacific Symposium on Biocomputing 2001Hawaii, USA , pp. 408\u2013419","key":"2023051612004055000_B20","DOI":"10.1142\/9789814447362_0040"},{"unstructured":"Yao, D., Wang, J., Lu, Y., Noble, N., Sun, H., Zhu, X., Lin, N., Payan, D.G., Li, M., Qu, K., et al. 2004PathwayFinder: paving the way towards automatic pathway extraction. Bioinformatics 2004: Proceedings of the 2nd Asia-Pacific Bioinformatics Conference (APBC) , pp. 53\u201362, volume 29 of 
CRPIT","key":"2023051612004055000_B21"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/15\/3294\/50340680\/bioinformatics_21_15_3294.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/21\/15\/3294\/50340680\/bioinformatics_21_15_3294.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,1]],"date-time":"2025-01-01T07:03:20Z","timestamp":1735715000000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/21\/15\/3294\/195397"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5,12]]},"references-count":21,"journal-issue":{"issue":"15","published-print":{"date-parts":[[2005,8,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti493","relation":{},"ISSN":["1367-4803","1367-4811"],"issn-type":[{"type":"print","value":"1367-4803"},{"type":"electronic","value":"1367-4811"}],"subject":[],"published-other":{"date-parts":[[2005,8]]},"published":{"date-parts":[[2005,5,12]]}}}