{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,23]],"date-time":"2026-01-23T05:22:02Z","timestamp":1769145722537,"version":"3.49.0"},"reference-count":36,"publisher":"Oxford University Press (OUP)","issue":"19","license":[{"start":{"date-parts":[[2016,10,2]],"date-time":"2016-10-02T00:00:00Z","timestamp":1475366400000},"content-version":"vor","delay-in-days":1885,"URL":"http:\/\/creativecommons.org\/licenses\/by-nc\/2.5"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,10,1]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is searching the literature. However, bioprocesses are difficult to capture because they may occur in text in a variety of textual expressions. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, which current techniques, relying solely on specialized lexicons, struggle to find.<\/jats:p>\n               <jats:p>Results: This article presents a range of methods for finding bioprocess terms and events. To facilitate the study, we built a gold standard corpus in which terms and events related to angiogenesis, a key biological process of the growth of new blood vessels, were annotated. Statistics of the annotated corpus revealed that over 36% of the text expressions that referred to angiogenesis appeared as events. The proposed methods respectively employed domain-specific vocabularies, a manually annotated corpus and unstructured domain-specific documents. 
Evaluation results showed that, while a supervised machine-learning model yielded the best precision, recall and F1 scores, the other methods achieved reasonable performance and less cost to develop.<\/jats:p>\n               <jats:p>Availability: The angiogenesis vocabularies, gold standard corpus, annotation guidelines and software described in this article are available at http:\/\/text0.mib.man.ac.uk\/~mbassxw2\/angiogenesis\/<\/jats:p>\n               <jats:p>Contact: \u00a0xinglong.wang@gmail.com<\/jats:p>","DOI":"10.1093\/bioinformatics\/btr460","type":"journal-article","created":{"date-parts":[[2011,8,6]],"date-time":"2011-08-06T03:40:04Z","timestamp":1312602004000},"page":"2730-2737","source":"Crossref","is-referenced-by-count":13,"title":["Automatic extraction of angiogenesis bioprocess from text"],"prefix":"10.1093","volume":"27","author":[{"given":"Xinglong","family":"Wang","sequence":"first","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"},{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]},{"given":"Iain","family":"McKendrick","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]},{"given":"Ian","family":"Barrett","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]},{"given":"Ian","family":"Dix","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, 
University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]},{"given":"Tim","family":"French","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]},{"given":"Jun'ichi","family":"Tsujii","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]},{"given":"Sophia","family":"Ananiadou","sequence":"additional","affiliation":[{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"},{"name":"1 National Centre for Text Mining, 2School of Computer Science, University of Manchester, Manchester, 3AstraZeneca, Alderley Park, UK and 4Microsoft Research Asia, Beijing, China"}]}],"member":"286","published-online":{"date-parts":[[2011,8,5]]},"reference":[{"key":"2023012512004979500_B1","first-page":"556","article-title":"Assisted curation: does text mining really help?","volume":"13","author":"Alex","year":"2008","journal-title":"Pac. Symp. 
Biocomput."},{"key":"2023012512004979500_B2","first-page":"33","article-title":"Scalable training of L1-regularized log-linear models","volume-title":"Proceedings of the ICML.","author":"Andrew","year":"2007"},{"key":"2023012512004979500_B3","doi-asserted-by":"crossref","first-page":"381","DOI":"10.1016\/j.tibtech.2010.04.005","article-title":"Event extraction for systems biology by text mining the literature","volume":"28","author":"Ananiadou","year":"2010","journal-title":"Trends Biotechnol."},{"key":"2023012512004979500_B4","doi-asserted-by":"crossref","DOI":"10.3115\/1572340.1572343","article-title":"Extracting complex biological events with rich graph-based feature sets","volume-title":"Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task.","author":"Bj\u00f6rne","year":"2009"},{"key":"2023012512004979500_B5","doi-asserted-by":"crossref","first-page":"310","DOI":"10.3115\/981863.981904","article-title":"An empirical study of smoothing techniques for language modeling","volume-title":"Proceedings of the 34th Annual Meeting of the Association of Computational Linguistics.","author":"Chen","year":"1996"},{"key":"2023012512004979500_B6","doi-asserted-by":"crossref","DOI":"10.1002\/0471200611","volume-title":"Elements of Information Theory.","author":"Cover","year":"1991"},{"key":"2023012512004979500_B7","doi-asserted-by":"crossref","first-page":"433","DOI":"10.1016\/0306-4573(93)90039-G","article-title":"Generating and evaluating domain-oriented multi-word terms from texts","volume":"29","author":"Damerau","year":"1993","journal-title":"Informat. Process. 
Manag."},{"key":"2023012512004979500_B8","volume-title":"Nature Special Issue on Angiogenesis","author":"DeWitt","year":"2005"},{"key":"2023012512004979500_B9","first-page":"668","article-title":"Domain-specific keyphrase extraction","volume-title":"Proceedings of International Joint Conference on Artificial Intelligence (IJCAI).","author":"Frank","year":"1999"},{"key":"2023012512004979500_B10","doi-asserted-by":"crossref","first-page":"117","DOI":"10.1007\/s007999900023","article-title":"Automatic recognition of multi-word terms","volume":"3","author":"Frantzi","year":"2000","journal-title":"Int. J. Digit. Libr."},{"issue":"Suppl. 1","key":"2023012512004979500_B11","doi-asserted-by":"crossref","first-page":"S14","DOI":"10.1186\/1471-2105-6-S1-S14","article-title":"ProMiner: rule-based protein and gene entity recognition","volume":"6","author":"Hanisch","year":"2005","journal-title":"BMC Bioinformatics"},{"issue":"Suppl. 1","key":"2023012512004979500_B12","doi-asserted-by":"crossref","first-page":"S1","DOI":"10.1186\/1471-2105-6-S1-S1","article-title":"Overview of the BioCreative: critical assessment of information extraction for biology","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012512004979500_B13","doi-asserted-by":"crossref","first-page":"589","DOI":"10.1016\/j.molcel.2006.02.012","article-title":"Biomedical language processing: what's beyond PubMed","volume":"21","author":"Hunter","year":"2006","journal-title":"Mol. 
Cell"},{"key":"2023012512004979500_B14","doi-asserted-by":"crossref","first-page":"1116","DOI":"10.1093\/bioinformatics\/btg047","article-title":"Learning rule-based models of biological process from gene expression time profiles using gene ontology","volume":"19","author":"Hvidsten","year":"2003","journal-title":"Bioinformatics"},{"key":"2023012512004979500_B15","doi-asserted-by":"crossref","first-page":"400","DOI":"10.1109\/TASSP.1987.1165125","article-title":"Estimation of probabilities from sparse data for language model component of a speech recogniser","volume":"35","author":"Katz","year":"1987","journal-title":"IEEE Trans. Acoust. Speech Signal Process."},{"key":"2023012512004979500_B16","article-title":"GENIA ontology","volume-title":"Technical Report TR-NLP-UT-2006-2.","author":"Kim","year":"2006"},{"key":"2023012512004979500_B17","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-9-10","article-title":"Corpus annotation for mining biomedical events from literature","volume":"9","author":"Kim","year":"2008","journal-title":"BMC Bioinformatics"},{"key":"2023012512004979500_B18","doi-asserted-by":"crossref","DOI":"10.3115\/1572340.1572342","article-title":"Overview of BioNLP'09 shared task on event extraction","volume-title":"Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task.","author":"Kim","year":"2009"},{"key":"2023012512004979500_B19","doi-asserted-by":"crossref","first-page":"1227","DOI":"10.1093\/bioinformatics\/bti084","article-title":"Automatic extraction of gene\/protein biological functions from biomedical text","volume":"21","author":"Koike","year":"2004","journal-title":"Bioinformatics"},{"key":"2023012512004979500_B20","doi-asserted-by":"crossref","first-page":"711","DOI":"10.1038\/nrd1470","article-title":"Can the pharmaceutical industry reduce attrition rate?","volume":"3","author":"Kola","year":"2004","journal-title":"Nat. Rev. Drug Discov."},{"issue":"Suppl. 
2","key":"2023012512004979500_B21","doi-asserted-by":"crossref","first-page":"S4","DOI":"10.1186\/gb-2008-9-s2-s4","article-title":"Overview of the protein-protein interaction extraction task of BioCreative II","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"issue":"Suppl. 2","key":"2023012512004979500_B22","first-page":"S1","article-title":"Evaluation of text mining systems for biology: overview of the second BioCreative community challenge","volume":"9","author":"Krallinger","year":"2008","journal-title":"Genome Biol."},{"key":"2023012512004979500_B23","first-page":"282","article-title":"Conditional random fields: Probabilistic models for segmenting and labeling sequence data","volume-title":"Proceedings of the 18th International Conference on Machine Learning.","author":"Lafferty","year":"2001"},{"key":"2023012512004979500_B24","doi-asserted-by":"crossref","first-page":"131","DOI":"10.1142\/S0219720010004586","article-title":"Event extraction with complex event classification using rich features","volume":"8","author":"Miwa","year":"2010","journal-title":"J. Bioinformatics Comput. Biol."},{"key":"2023012512004979500_B25","doi-asserted-by":"crossref","first-page":"35","DOI":"10.1162\/coli.2008.34.1.35","article-title":"Feature forest models for probabilistic HPSG parsing","volume":"34","author":"Miyao","year":"2008","journal-title":"Comput. Linguist."},{"key":"2023012512004979500_B26","doi-asserted-by":"crossref","first-page":"394","DOI":"10.1093\/bioinformatics\/btn631","article-title":"Evaluating contributions of natural language parsers to protein-protein interaction extraction","volume":"25","author":"Miyao","year":"2009","journal-title":"Bioinformatics"},{"key":"2023012512004979500_B27","doi-asserted-by":"crossref","first-page":"161","DOI":"10.1162\/coli.2007.33.2.161","article-title":"Dependency-based construction of semantic space models","volume":"33","author":"Pad\u00f3","year":"2007","journal-title":"Comput. 
Linguist."},{"key":"2023012512004979500_B28","first-page":"82","article-title":"Text chunking using transformation-based learning","volume-title":"Proceedings of the 3rd ACL Workshop on Very Large Corpora","author":"Ramshaw","year":"1995"},{"key":"2023012512004979500_B29","article-title":"BioLexicon: a lexical resource for the biology domain","volume-title":"Proceedings of the Third International Symposium on Semantic Mining in Biomedicine (SMBM).","author":"Sasaki","year":"2008"},{"issue":"Suppl. 11","key":"2023012512004979500_B30","doi-asserted-by":"crossref","first-page":"S5","DOI":"10.1186\/1471-2105-9-S11-S5","article-title":"How to make the most of NE dictionaries in statistical NER","volume":"9","author":"Sasaki","year":"2008","journal-title":"BMC Bioinformatics"},{"issue":"Suppl. 2","key":"2023012512004979500_B31","article-title":"Overview of BioCreative II gene mention recognition","volume":"9","author":"Smith","year":"2008","journal-title":"Genome Biol."},{"key":"2023012512004979500_B32","doi-asserted-by":"crossref","DOI":"10.3115\/1119282.1119287","article-title":"A language model approach to keyphrase extraction","volume-title":"Proceedings of the ACL Workshop on Multiword Expressions.","author":"Tomokiyo","year":"2003"},{"key":"2023012512004979500_B33","doi-asserted-by":"crossref","first-page":"477","DOI":"10.3115\/1687878.1687946","article-title":"Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty","volume-title":"Proceedings of ACL-IJCNLP.","author":"Tsuruoka","year":"2009"},{"key":"2023012512004979500_B34","article-title":"Learning to extract keyphrases from text","volume-title":"Technical Report ERB-1057.","author":"Turney","year":"1999"},{"key":"2023012512004979500_B35","doi-asserted-by":"crossref","first-page":"661","DOI":"10.1093\/bioinformatics\/btq002","article-title":"Disambiguating the species of biomedical named entities using natural language 
parsers","volume":"26","author":"Wang","year":"2010","journal-title":"Bioinformatics"},{"key":"2023012512004979500_B36","first-page":"7","article-title":"BioCreative II Gene Mention Task","volume-title":"Proceeding of the BioCreative II Workshop.","author":"Wilbur","year":"2007"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/19\/2730\/48870123\/bioinformatics_27_19_2730.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/27\/19\/2730\/48870123\/bioinformatics_27_19_2730.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,25]],"date-time":"2023-01-25T13:55:59Z","timestamp":1674654959000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/27\/19\/2730\/231424"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,8,5]]},"references-count":36,"journal-issue":{"issue":"19","published-print":{"date-parts":[[2011,10,1]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/btr460","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],"subject":[],"published-other":{"date-parts":[[2011,10,1]]},"published":{"date-parts":[[2011,8,5]]}}}