{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,2,12]],"date-time":"2026-02-12T15:37:16Z","timestamp":1770910636115,"version":"3.50.1"},"reference-count":50,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2004,10,19]],"date-time":"2004-10-19T00:00:00Z","timestamp":1098144000000},"content-version":"tdm","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"},{"start":{"date-parts":[[2004,10,19]],"date-time":"2004-10-19T00:00:00Z","timestamp":1098144000000},"content-version":"vor","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/2.0"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"abstract":"<jats:title>Abstract<\/jats:title><jats:sec>\n                        <jats:title>Background<\/jats:title>\n                        <jats:p>The exploitation of information extraction (IE), a technology aiming to provide instances of structured representations from free-form text, has been rapidly growing within the molecular biology (MB) research community to keep track of the latest results reported in literature. IE systems have traditionally used shallow syntactic patterns for matching facts in sentences but such approaches appear inadequate to achieve high accuracy in MB event extraction due to complex sentence structure. A consensus in the IE community is emerging on the necessity for exploiting deeper knowledge structures such as through the relations between a verb and its arguments shown by predicate-argument structure (PAS). PAS is of interest as structures typically correspond to events of interest and their participating entities. For this to be realized within IE a key knowledge component is the definition of PAS frames. PAS frames for non-technical domains such as newswire are already being constructed in several projects such as PropBank, VerbNet, and FrameNet. Knowledge from PAS should enable more accurate applications in several areas where sentence understanding is required like machine translation and text summarization. In this article, we explore the need to adapt PAS for the MB domain and specify PAS frames to support IE, as well as outlining the major issues that require consideration in their construction.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Results<\/jats:title>\n                        <jats:p>We introduce <jats:bold>PASBio<\/jats:bold> by extending a model based on PropBank to the MB domain. The hypothesis we explore is that PAS holds the key for understanding relationships describing the roles of genes and gene products in mediating their biological functions. We chose predicates describing gene expression, molecular interactions and signal transduction events with the aim of covering a number of research areas in MB. Analysis was performed on sentences containing a set of verbal predicates from MEDLINE and full text journals. Results confirm the necessity to analyze PAS specifically for MB domain.<\/jats:p>\n                     <\/jats:sec><jats:sec>\n                        <jats:title>Conclusions<\/jats:title>\n                        <jats:p>At present <jats:bold>PASBio<\/jats:bold> contains the analyzed PAS of over 30 verbs, publicly available on the Internet for use in advanced applications. In the future we aim to expand the knowledge base to cover more verbs and the nominal form of each predicate.<\/jats:p>\n                     <\/jats:sec>","DOI":"10.1186\/1471-2105-5-155","type":"journal-article","created":{"date-parts":[[2004,10,29]],"date-time":"2004-10-29T13:25:43Z","timestamp":1099056343000},"update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":48,"title":["PASBio: predicate-argument structures for event extraction in molecular biology"],"prefix":"10.1186","volume":"5","author":[{"given":"Tuangthong","family":"Wattarujeekrit","sequence":"first","affiliation":[]},{"given":"Parantu K","family":"Shah","sequence":"additional","affiliation":[]},{"given":"Nigel","family":"Collier","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2004,10,19]]},"reference":[{"key":"271_CR1","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/nar\/28.1.235","volume":"28","author":"HM Berman","year":"2000","unstructured":"Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank.\n                           Nucleic Acids Research 2000, 28: 235\u2013242. 10.1093\/nar\/28.1.235","journal-title":"Nucleic Acids Research"},{"key":"271_CR2","doi-asserted-by":"publisher","first-page":"264","DOI":"10.1093\/nar\/30.1.264","volume":"30","author":"L Lo Conte","year":"2002","unstructured":"Lo Conte L, Brenner SE, Hubbard TJP, Chothia C, Murzin A: SCOP database in 2002: refinements accommodate structural genomics.\n                           Nucleic Acids Research 2002, 30: 264\u2013267. 10.1093\/nar\/30.1.264","journal-title":"Nucleic Acids Research"},{"key":"271_CR3","doi-asserted-by":"publisher","first-page":"242","DOI":"10.1093\/nar\/29.1.242","volume":"29","author":"GD Bader","year":"2001","unstructured":"Bader GD, Donaldson I, Wolting C, Ouellette BF, Pawson T, Hogue CW: BIND-The Biomolecular Interaction Network Database.\n                           Nucleic Acids Research 2001, 29: 242\u2013245. 10.1093\/nar\/29.1.242","journal-title":"Nucleic Acids Research"},{"key":"271_CR4","doi-asserted-by":"publisher","first-page":"302","DOI":"10.1093\/nar\/28.1.302","volume":"28","author":"A Bairoch","year":"2000","unstructured":"Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.\n                           Nucleic Acids Research 2000, 28: 302\u2013303. 10.1093\/nar\/28.1.302","journal-title":"Nucleic Acids Research"},{"key":"271_CR5","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1016\/S0014-5793(01)03293-8","volume":"513","author":"A Zanzoni","year":"2002","unstructured":"Zanzoni A, Montecchi-Palazzi L, Quondam M, Ausiello G, Helmer-Citterich M, Cesareni G: MINT: a Molecular INTeraction database.\n                           FEBS Lett 2002, 513: 135\u2013140. 10.1016\/S0014-5793(01)03293-8","journal-title":"FEBS Lett"},{"key":"271_CR6","doi-asserted-by":"crossref","first-page":"316","DOI":"10.1038\/ng895","volume":"31","author":"C Perez-Iratxeta","year":"2002","unstructured":"Perez-Iratxeta C, Bork P, Andrade MA: Association of genes to genetically inherited diseases using data mining.\n                           Nature Genetics 2002, 31: 316\u2013319.","journal-title":"Nature Genetics"},{"key":"271_CR7","doi-asserted-by":"publisher","first-page":"239","DOI":"10.1075\/term.7.2.07col","volume":"7","author":"N Collier","year":"2002","unstructured":"Collier N, Nobata C, Tsujii J: Automatic Acquisition and Classification of Terminology using a Tagged Corpus in the Molecular Biology Domain.\n                           Terminology 2002, 7: 239\u2013257.","journal-title":"Terminology"},{"key":"271_CR8","first-page":"707","volume-title":"Pac Sym Biocomput","author":"K Fukuda","year":"1998","unstructured":"Fukuda K, Tsunoda T, Tamura A, Takagi T: Towards information extraction: Identifying protein names from biological papers.\n                           Pac Sym Biocomput 1998, 707\u2013718."},{"key":"271_CR9","doi-asserted-by":"publisher","first-page":"1124","DOI":"10.1093\/bioinformatics\/18.8.1124","volume":"18","author":"L Tanabe","year":"2002","unstructured":"Tanabe L, Wilbur WJ: Tagging gene and protein names in biomedical text.\n                           Bioinformatics 2002, 18: 1124\u20131132. 10.1093\/bioinformatics\/18.8.1124","journal-title":"Bioinformatics"},{"key":"271_CR10","doi-asserted-by":"publisher","first-page":"43","DOI":"10.3115\/1567594.1567602","volume-title":"Joint Workshop on Natural Language Processing in Biomedicine and its applications","author":"E Alphonse","year":"2004","unstructured":"Alphonse E, Aubin Sophie., Bessieres P, Bisson G, Hamon T, Lagarrigue S, Nazarenko A, Manine A, Nedellec C, Vetah M, Poibeau T, Weissenbacher D: Event-based Information Extraction for the biomedical domain: the Caderge project. In Joint Workshop on Natural Language Processing in Biomedicine and its applications. Geneva, Switzerland; 2004:43\u201349."},{"key":"271_CR11","first-page":"60","volume-title":"Proc Int Conf Intell Syst Mol Bio","author":"C Blaschke","year":"1999","unstructured":"Blaschke C, Andrade MA, Ouzounis C, Valencia A: Automatic extraction of biological information from scientific text: Protein-protein interactions. In Proc Int Conf Intell Syst Mol Bio. Heidelberg; 1999:60\u201367."},{"key":"271_CR12","doi-asserted-by":"publisher","first-page":"11","DOI":"10.1186\/1471-2105-4-11","volume":"4","author":"I Donaldson","year":"2003","unstructured":"Donaldson I, Martin J, de Bruijn B, Wolting C, Lay V, Tuekam B, Zhang S, Baskin B, Bader GD, Michalickova K, Pawson T, Hogue CW: PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine.\n                           BMC Bioinformatics 2003, 4: 11\u201311. 10.1186\/1471-2105-4-11","journal-title":"BMC Bioinformatics"},{"key":"271_CR13","doi-asserted-by":"publisher","first-page":"359","DOI":"10.1093\/bioinformatics\/17.4.359","volume":"17","author":"E Marcotte","year":"2001","unstructured":"Marcotte E, Xenarios I, Eisenberg D: Mining literature for protein-protein interactions.\n                           Bioinformatics 2001, 17: 359\u2013363. 10.1093\/bioinformatics\/17.4.359","journal-title":"Bioinformatics"},{"key":"271_CR14","doi-asserted-by":"publisher","first-page":"1699","DOI":"10.1093\/bioinformatics\/btg207","volume":"19","author":"S Novichkova","year":"2003","unstructured":"Novichkova S, Egorov S, Daraselia N: MedScan, a natural language processing engine for MEDLINE abstracts.\n                           Bioinformatics 2003, 19: 1699\u20131706. 10.1093\/bioinformatics\/btg207","journal-title":"Bioinformatics"},{"key":"271_CR15","doi-asserted-by":"publisher","first-page":"155","DOI":"10.1093\/bioinformatics\/17.2.155","volume":"17","author":"T Ono","year":"2001","unstructured":"Ono T, Hishigaki H, Tanigami A, Takagi T: Automated extraction of information on protein-protein interactions from the biological literature.\n                           Bioinformatics 2001, 17: 155\u2013161. 10.1093\/bioinformatics\/17.2.155","journal-title":"Bioinformatics"},{"key":"271_CR16","first-page":"362","volume-title":"Pacific Symposium on Biocomputing","author":"J Pustejovsky","year":"2002","unstructured":"Pustejovsky J, Castano J, Zhang J, Kotecki M, Cochran B: Robust Relational Parsing Over Biomedical Literature: Extracting Inhibit Relations.\n                           Pacific Symposium on Biocomputing 2002, 362\u2013373."},{"key":"271_CR17","doi-asserted-by":"publisher","first-page":"188","DOI":"10.3115\/974147.974173","volume-title":"6th Conference on Applied Natural Language Processing (ANLP-NAACL'2000)","author":"TC Rindflesch","year":"2000","unstructured":"Rindflesch TC, Rajan JV, Hunter L: Extracting Molecular Binding Relationships from Biomedical Text. In 6th Conference on Applied Natural Language Processing (ANLP-NAACL'2000). WA; 2000:188\u2013195."},{"key":"271_CR18","first-page":"62","volume-title":"Genome Inform","author":"T Sekimizu","year":"1998","unstructured":"Sekimizu T, Park HS, Tsujii J: Identifying the interaction between genes and gene products based on frequently seen verbs in MEDLINE abstracts.\n                           Genome Inform 1998, 62\u201371."},{"key":"271_CR19","volume-title":"Mathematical Structures of Language","author":"Z Harris","year":"1968","unstructured":"Harris Z: Mathematical Structures of Language. In Mathematical Structures of Language. New York, Wiley-Interscience; 1968."},{"key":"271_CR20","volume-title":"Workshop on Adaptive Text Extraction and Mining at the 7th International Conference on Artificial Intelligence","author":"R Grishman","year":"2001","unstructured":"Grishman R: Adaptive Information Extraction and Sublanguage Analysis. In Workshop on Adaptive Text Extraction and Mining at the 7th International Conference on Artificial Intelligence. Seattle, USA; 2001."},{"key":"271_CR21","first-page":"86","volume-title":"36th Annual Meeting of the ACL and the 17th International Conference on Computational Linguistics (COLING-ACL 1998)","author":"CF Baker","year":"1998","unstructured":"Baker CF, Fillmore CJ, Lowe JB: The Berkeley FrameNet project. In 36th Annual Meeting of the ACL and the 17th International Conference on Computational Linguistics (COLING-ACL 1998). Montreal; 1998:86\u201390."},{"key":"271_CR22","first-page":"1989","volume-title":"3rd International Conference on Language Resources and Evaluation (LREC-2002)","author":"P Kingsbury","year":"2002","unstructured":"Kingsbury P, Palmer M: From Treebank to PropBank. In 3rd International Conference on Language Resources and Evaluation (LREC-2002). Las Palmas; 2002:1989\u20131993."},{"key":"271_CR23","volume-title":"Human Language Technology Conference","author":"P Kingsbury","year":"2002","unstructured":"Kingsbury P, Palmer M, Marcus M: Adding Semantic Annotation to the Penn TreeBank. In Human Language Technology Conference. San Diego, CA, USA; 2002."},{"key":"271_CR24","first-page":"691","volume-title":"17th National Conference on Artificial Intelligence (AAAI-2000)","author":"K Kipper","year":"2000","unstructured":"Kipper K, Dang HT, Palmer M: Class based construction of a verb lexicon. In 17th National Conference on Artificial Intelligence (AAAI-2000). Austin, TX; 2000:691\u2013696."},{"key":"271_CR25","volume-title":"8th International Conference on Medical Librarianship","author":"SJ Nelson","year":"2000","unstructured":"Nelson SJ, Schopen M, Schulman J, Arluk N: An Interlingual Database of MeSH Translations. In 8th International Conference on Medical Librarianship. London, UK; 2000."},{"key":"271_CR26","unstructured":"Gene Ontology[http:\/\/www.geneontology.org\/]"},{"key":"271_CR27","unstructured":"GENIA Project[http:\/\/www-tsujii.is.s.u-tokyo.ac.jp\/GENIA\/]"},{"key":"271_CR28","unstructured":"PASBio Project[http:\/\/research.nii.ac.jp\/~collier\/projects\/PASBio\/]"},{"key":"271_CR29","volume-title":"Natural Language Generation in the Context of Machine Translation","author":"J Hajic","year":"2004","unstructured":"Hajic J, Cmejrek M, Dorr B, Ding Y, Eisner J, Gildea D, Koo T, Parton K, Penn G, Redev D, Rambow O: Natural Language Generation in the Context of Machine Translation. The Center for Language and Speech Processing, The Johns Hopkins University; 2004."},{"key":"271_CR30","first-page":"40","volume-title":"Association for Machine Translation in the Americas 2000","author":"C Han","year":"2000","unstructured":"Han C, Lavoie B, Palmer M, Rambow O, Kittredge R, Korelsky T, Kim N, Kim M: Handling Structural Divergences and Recovering Deropped Arguments in a Korean\/English Machine Translation System. In Association for Machine Translation in the Americas 2000. New York; 2000:40\u201353."},{"key":"271_CR31","unstructured":"DARPA In the Sixth Message Understanding Conference (MUC-7). Fairfax, VA, USA, Morgan Kaufmann; 1998."},{"key":"271_CR32","first-page":"348","volume-title":"English Verb Classes and Alternations: A Preliminary Investigation","author":"B Levin","year":"1993","unstructured":"Levin B: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press; 1993:348."},{"key":"271_CR33","doi-asserted-by":"crossref","first-page":"383","DOI":"10.7551\/mitpress\/3007.003.0015","volume-title":"Finite State Devices for Natural Language Processsing","author":"JR Hobbs","year":"1997","unstructured":"Hobbs JR, Appelt D, Israel D, Bear J, Kameyama M, Stickel M, Tyson M: Fastus: A cascade finite-state transducer for extracting information from natural-language text. In Finite State Devices for Natural Language Processsing. Edited by: Roche E and Schabes Y. MIT Press; 1997:383\u2013406."},{"key":"271_CR34","first-page":"1044","volume-title":"13th National Conference on Artificial Intelligence (AAAI-96)","author":"E Riloff","year":"1996","unstructured":"Riloff E: Automatically generating extraction patterns from untagged text. In 13th National Conference on Artificial Intelligence (AAAI-96). The AAAI Press\/MIT; 1996:1044\u20131049."},{"key":"271_CR35","first-page":"343","volume-title":"41st Annual Meeting of the Association for Computational Linguistics","author":"R Yangarber","year":"2003","unstructured":"Yangarber R: Counter-Training in Discovery of Semantic Patterns. In 41st Annual Meeting of the Association for Computational Linguistics. Tokyo; 2003:343\u2013350."},{"key":"271_CR36","unstructured":"MEDLINE Database[http:\/\/www.ncbi.nlm.nih.gov\/PubMed\/]"},{"key":"271_CR37","unstructured":"The EMBO Journal[http:\/\/www.nature.com\/emboj\/]"},{"key":"271_CR38","unstructured":"Proceedings of the National Academy of Sciences of the United States of America[http:\/\/www.pnas.org\/]"},{"key":"271_CR39","unstructured":"Nucleic Acids Research Articles[http:\/\/nar.oupjournals.org\/]"},{"key":"271_CR40","unstructured":"Journal of Virology[http:\/\/jvi.asm.org\/]"},{"key":"271_CR41","volume-title":"ARPA Human Language Technology Workshop","author":"M Marcus","year":"1994","unstructured":"Marcus M: The Penn Treebank: A revised corpus design for extracting predicate-argument structure. In ARPA Human Language Technology Workshop. Princeton, NJ; 1994."},{"key":"271_CR42","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"The Gene Ontology Consortium","year":"2000","unstructured":"Consortium The Gene Ontology: Gene ontology: Tool for the unification of biology.\n                           Nature Genetics 2000, 25: 25\u201329. 10.1038\/75556","journal-title":"Nature Genetics"},{"key":"271_CR43","doi-asserted-by":"publisher","first-page":"235","DOI":"10.1093\/ijl\/3.4.235","volume":"3","author":"GA Miller","year":"1990","unstructured":"Miller GA: WordNet: An on-line lexical database.\n                           International Journal of Lexicography 1990, 3: 235\u2013312.","journal-title":"International Journal of Lexicography"},{"key":"271_CR44","volume-title":"7th Euralex International Congress","author":"A Meyers","year":"1996","unstructured":"Meyers A, Macleod C, Grishman R: Standardization of the Complement Adjunct Distinction. In 7th Euralex International Congress. Goteborg; 1996."},{"key":"271_CR45","doi-asserted-by":"crossref","first-page":"272","DOI":"10.7551\/mitpress\/6754.001.0001","volume-title":"The Theory and Practice of Discourse Parsing and Summarization","author":"D Marcu","year":"2000","unstructured":"Marcu D: The Theory and Practice of Discourse Parsing and Summarization. MIT Press; 2000:272."},{"key":"271_CR46","first-page":"8","volume-title":"41th Annual Meeting of the Association for Computational Linguistics","author":"M Surdeanu","year":"2003","unstructured":"Surdeanu M, Harabagiu S, Williams J, Aarseth P: Using Predicate-Argument Structures for Information Extraction. In 41th Annual Meeting of the Association for Computational Linguistics. Tokyo; 2003:8\u201315."},{"key":"271_CR47","volume-title":"Workshop on the 1st International Joint Conference on Natural Language Processing (IJCNLP-04)","author":"Y Tateisi","year":"2004","unstructured":"Tateisi Y, Ohta T, Tsujii J: Annotation of Predicate-argument Structure on Molecular Biology Text. In Workshop on the 1st International Joint Conference on Natural Language Processing (IJCNLP-04). China; 2004."},{"key":"271_CR48","doi-asserted-by":"publisher","first-page":"29","DOI":"10.3115\/1567594.1567600","volume-title":"Joint Workshop on Natural Language Processing in Biomedicine and its Applications","author":"Y Mizuta","year":"2004","unstructured":"Mizuta Y, Collier N: Zone Indentification in Biology Articles as a Basis for Information Extraction. In Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Geneva, Switzerland; 2004:29\u201335."},{"key":"271_CR49","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/1471-2105-4-20","volume":"4","author":"PK Shah","year":"2003","unstructured":"Shah PK, Perez-Iratxeta C, Bork P, Andrade MA: Information extraction from full text scientific articles: where are the keywords?\n                           BMC Bioinformatics 2003, 4: 20\u201320. 10.1186\/1471-2105-4-20","journal-title":"BMC Bioinformatics"},{"key":"271_CR50","doi-asserted-by":"publisher","first-page":"64","DOI":"10.3115\/974557.974568","volume-title":"5th Conference on Applied Natural Language Processing (ANLP'97)","author":"P Tapanainen","year":"1997","unstructured":"Tapanainen P, Jarvinen T: A non-projective dependency parser. In 5th Conference on Applied Natural Language Processing (ANLP'97). Washington, D.C.; 1997:64\u201371."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1186\/1471-2105-5-155\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-5-155.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,10,7]],"date-time":"2024-10-07T12:19:20Z","timestamp":1728303560000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-5-155"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2004,10,19]]},"references-count":50,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2004,12]]}},"alternative-id":["271"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-5-155","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2004,10,19]]},"assertion":[{"value":"10 March 2004","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2004","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"19 October 2004","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"155"}}