{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,18]],"date-time":"2024-07-18T16:31:10Z","timestamp":1721320270889},"reference-count":42,"publisher":"Springer Science and Business Media LLC","issue":"S3","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2008,4]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Functions) as defined in ENTREZ-Gene based on a MEDLINE record. Inputs for this task include both a gene and a pointer to a MEDLINE reference. In the suggested approach we merge two independent sentence extraction strategies. The first proposed strategy (LASt) uses argumentative features, inspired by discourse-analysis models. The second extraction scheme (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence; thus providing a full ranking of all possible candidate GeneRiFs. A combination of the two approaches is proposed, which also aims at reducing the size of the selected segment by filtering out non-content bearing rhetorical phrases.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Based on the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy is already competitive (52.78%). When used in a combined approach, the extraction task clearly shows improvement, achieving a Dice score of over 57% (+10%).<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusions<\/jats:title>\n            <jats:p>Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear complementary for functional annotation in proteomics.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-9-s3-s9","type":"journal-article","created":{"date-parts":[[2008,4,11]],"date-time":"2008-04-11T18:15:05Z","timestamp":1207937705000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":9,"title":["Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction"],"prefix":"10.1186","volume":"9","author":[{"given":"Julien","family":"Gobeill","sequence":"first","affiliation":[]},{"given":"Imad","family":"Tbahriti","sequence":"additional","affiliation":[]},{"given":"Fr\u00e9d\u00e9ric","family":"Ehrler","sequence":"additional","affiliation":[]},{"given":"Ana\u00efs","family":"Mottaz","sequence":"additional","affiliation":[]},{"given":"Anne-Lise","family":"Veuthey","sequence":"additional","affiliation":[]},{"given":"Patrick","family":"Ruch","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2008,4,11]]},"reference":[{"key":"2591_CR1","volume-title":"MUC-7 Named-Entity task Definition","author":"N Chinchor","year":"1997","unstructured":"Chinchor N: MUC-7 Named-Entity task Definition. 1997."},{"issue":"6","key":"2591_CR2","doi-asserted-by":"publisher","first-page":"413","DOI":"10.1016\/j.ijmedinf.2005.06.008","volume":"75","author":"N Collier","year":"2006","unstructured":"Collier N, Nazarenko A, Baud R, Ruch P: Recent advances in natural language processing for biomedical applications. Int J Med Inform 2006, 75(6):413\u2013417.","journal-title":"Int J Med Inform"},{"key":"2591_CR3","unstructured":"Ehrler F, Gobeill J, Tbahriti I, Ruch P: GeneTeam Site Report for BioCreative II: Customizing a Simple Toolkit for Text Mining in Molecular Biology. Proceedings of BioCreative II"},{"key":"2591_CR4","first-page":"121","volume-title":"Summarizing Text Documents","author":"J Goldstein","year":"1999","unstructured":"Goldstein J, Kantrowitz M, Mittal V, Carbonell J: Summarizing Text Documents. 1999, 121\u2013128."},{"issue":"Suppl 1","key":"2591_CR5","doi-asserted-by":"publisher","first-page":"S23","DOI":"10.1186\/1471-2105-6-S1-S23","volume":"6","author":"F Ehrler","year":"2005","unstructured":"Ehrler F, Geissb\u00fchler A, Jimeno A, Ruch P: Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot. BMC Bioinformatics 2005, 6(Suppl 1):S23.","journal-title":"BMC Bioinformatics"},{"key":"2591_CR6","first-page":"270","volume-title":"ACL","author":"M Strube","year":"1996","unstructured":"Strube M, Hahn U: Functional Centering. ACL 1996, 270\u2013277."},{"key":"2591_CR7","doi-asserted-by":"publisher","first-page":"171","DOI":"10.1016\/0306-4573(90)90014-S","volume":"26","author":"C Paice","year":"1990","unstructured":"Paice C: Constructing Literature Abstracts by Computer: Techniques and Prospects. Inform Proc Manag 1990, 26: 171\u201386.","journal-title":"Inform Proc Manag"},{"key":"2591_CR8","first-page":"68","volume-title":"SIGIR","author":"J Kupiec","year":"1995","unstructured":"Kupiec J, Pedersen J, Chen F: A Trainable Document Summarizer. SIGIR 1995, 68\u201373."},{"key":"2591_CR9","first-page":"155","volume-title":"Advances in Automatic Text Summarization","author":"S Teufel","year":"1999","unstructured":"Teufel S, Moens M: Argumentative Classification of Extracted Sentences as a First Step Towards Flexible Abstracting. Advances in Automatic Text Summarization 1999, 155\u2013171."},{"issue":"3","key":"2591_CR10","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1371\/journal.pcbi.0010034","volume":"1","author":"P Bourne","year":"2005","unstructured":"Bourne P: Will a biological database be different from a biological journal? PLoS Comput Biol 2005, 1(3):179\u201381.","journal-title":"PLoS Comput Biol"},{"key":"2591_CR11","volume-title":"Proteome Research: new frontiers in functional genomics","author":"A Bairoch","year":"1997","unstructured":"Bairoch A: Proteome Research: new frontiers in functional genomics. Protein databases - Springer; 1997."},{"key":"2591_CR12","first-page":"60","volume-title":"ISMB","author":"C Blaschke","year":"1999","unstructured":"Blaschke C, Andrade M, Ouzounis C, Valencia A: Automatic Extraction of Biological Information from Scientific Text: Protein-Protein Interactions. ISMB 1999, 60\u201367."},{"issue":"Suppl 1","key":"2591_CR13","doi-asserted-by":"publisher","first-page":"S16","DOI":"10.1186\/1471-2105-6-S1-S16","volume":"6","author":"C Blaschke","year":"2005","unstructured":"Blaschke C, Leon E, Krallinger M, Valencia A: Evaluation of BioCreAtIvE assessment of task 2. BMC Bioinformatics 2005, 6(Suppl 1):S16.","journal-title":"BMC Bioinformatics"},{"key":"2591_CR14","volume-title":"TREC, NIST","author":"W Hersh","year":"2007","unstructured":"Hersh W, Cohen A, Rekapalli H, Roberts P: TREC 2006 Genomics Track Overview. TREC, NIST 2007."},{"key":"2591_CR15","doi-asserted-by":"publisher","first-page":"20","DOI":"10.1186\/1471-2105-4-20","volume":"4","author":"P Shah","year":"2003","unstructured":"Shah P, Perez-Iratxeta C, Bork P, Andrade M: Information extraction from full text scientific articles: Where are the keywords? BMC Bioinformatics 2003, 4():20.","journal-title":"BMC Bioinformatics"},{"key":"2591_CR16","volume-title":"SMBM Proceedings","author":"J Hakenberg","year":"2005","unstructured":"Hakenberg J, Rutsch J, Leser U: Tuning text classification for hereditary diseases with section weighting. SMBM Proceedings 2005."},{"key":"2591_CR17","volume-title":"COLING Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA\/BioNLP)","author":"Y Mizuta","year":"2004","unstructured":"Mizuta Y, Collier N: Zone Identification in Biology Articles as a Basis for Information Extraction. COLING Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA\/BioNLP) 2004."},{"key":"2591_CR18","volume-title":"SMBM Proceedings","author":"F Lisacek","year":"2005","unstructured":"Lisacek F, Chichester C, Kaplan A, Sandor A: Discovering Paradigm Shift Patterns in Biomedical Abstracts: Application to Neurodegenerative Diseases. SMBM Proceedings 2005."},{"issue":"6","key":"2591_CR19","doi-asserted-by":"publisher","first-page":"488","DOI":"10.1016\/j.ijmedinf.2005.06.007","volume":"75","author":"I Tbahriti","year":"2006","unstructured":"Tbahriti I, Chichester C, Lisacek F, Ruch P: Using argumentation to retrieve articles with similar citations: an inquiry into improving related articles search in the MEDLINE digital library. Int J Med Inf 2006, 75(6):488\u2013495.","journal-title":"Int J Med Inf"},{"key":"2591_CR20","volume-title":"ACL","author":"P Ruch","year":"2006","unstructured":"Ruch P, Tbahriti I, Gobeill J, Aronson A: Argumentative Feedback: A Linguistically-Motivated Term Expansion for Information Retrieval. ACL 2006."},{"key":"2591_CR21","first-page":"14","volume-title":"TREC-2003","author":"W Hersh","year":"2004","unstructured":"Hersh W, Bhupatiraju B: TREC Genomics Track Overview. TREC-2003 2004, 14\u201323."},{"key":"2591_CR22","first-page":"246","volume-title":"ISMB","author":"D Lewis","year":"1995","unstructured":"Lewis D: Evaluating and Optimizing Autonomous Text Classification Systems. ISMB 1995, 246\u2013254."},{"key":"2591_CR23","doi-asserted-by":"crossref","first-page":"289","DOI":"10.1145\/243199.243276","volume-title":"SIGIR","author":"L Larkey","year":"1996","unstructured":"Larkey L, Croft W: Combining Classifiers in Text Categorization. SIGIR 1996, 289\u2013297."},{"key":"2591_CR24","first-page":"16","volume-title":"ANLP","author":"J Reynar","year":"1997","unstructured":"Reynar J, Ratnaparkhi A: A Maximum Entropy Approach to Identifying Sentence Boundaries. ANLP 1997, 16\u201319."},{"key":"2591_CR25","first-page":"111","volume-title":"CoNLL-2000","author":"P Ruch","year":"2000","unstructured":"Ruch P, Baud R, Bouillon P, Robert G: Minimal Commitment and Full Lexical Disambiguation: Balancing Rules and Hidden Markov Models. CoNLL-2000 2000, 111\u2013116."},{"key":"2591_CR26","first-page":"433","volume-title":"Proceedings Corpus Linguistics","author":"C Orasan","year":"2001","unstructured":"Orasan C: Patterns in Scientific Abstracts. Proceedings Corpus Linguistics 2001, 433\u2013445."},{"key":"2591_CR27","volume-title":"Genre Analysis: English in Academic and Research Settings","author":"J Swales","year":"1990","unstructured":"Swales J: Genre Analysis: English in Academic and Research Settings. Cambridge University Press; 1990."},{"key":"2591_CR28","first-page":"223","volume-title":"AAAI","author":"P Langley","year":"1992","unstructured":"Langley P, Iba W, Thompson K: An Analysis of Bayesian Classifiers. AAAI 1992, 223\u2013228."},{"key":"2591_CR29","first-page":"67","volume-title":"412-420","author":"Y Yang","year":"1997","unstructured":"Yang Y, Pedersen J: A Comparative Study on Feature Selection in Text Categorization. 412\u2013420 1997, 67\u201388."},{"issue":"2-3","key":"2591_CR30","doi-asserted-by":"publisher","first-page":"195","DOI":"10.1016\/j.ijmedinf.2006.05.002","volume":"76","author":"P Ruch","year":"2007","unstructured":"Ruch P, Boyer C, Chichester C, Tbahriti I, Geissbuhler A, Fabry P, Gobeill J, Pillet V, Rebholz-Schuhmann D, Lovis C, Veuthey A: Using argumentation to extract key sentences from biomedical abstracts. Int J Med Inform 2007, 76(2\u20133):195\u2013200.","journal-title":"Int J Med Inform"},{"key":"2591_CR31","volume-title":"BioCreative Notebook Papers, CNB","author":"F Couto","year":"2004","unstructured":"Couto F, Silva M, Coutinho P: FIGO: Findings GO Terms in UnStructured Text. BioCreative Notebook Papers, CNB 2004. [http:\/\/www.pdg.cnb.uam.es\/BioLink\/workshop_BioCreative_04\/handout\/]"},{"issue":"6","key":"2591_CR32","doi-asserted-by":"publisher","first-page":"658","DOI":"10.1093\/bioinformatics\/bti783","volume":"22","author":"P Ruch","year":"2006","unstructured":"Ruch P: Automatic assignment of biomedical categories: toward a generic approach. Bioinformatics 2006, 22(6):658\u201364.","journal-title":"Bioinformatics"},{"key":"2591_CR33","volume-title":"BMC Bioinformatics","author":"E Camon","year":"2005","unstructured":"Camon E, Barrell D, Dimmer E, Lee V, Magrane M, Maslen J, Binn D, Apweiler R: An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics 2005., 6(1):"},{"key":"2591_CR34","first-page":"612","volume-title":"TREC-2003","author":"G Bhalotia","year":"2004","unstructured":"Bhalotia G, Nakov P, Schwartz A, Hearst M: BioText Team Report for the TREC 2003 Genomics Track. TREC-2003 2004, 612\u2013621."},{"key":"2591_CR35","volume-title":"Machine Learning","author":"T Mitchell","year":"1997","unstructured":"Mitchell T: Machine Learning. McGraw Hill; 1997."},{"key":"2591_CR36","first-page":"225","volume-title":"TREC-2003","author":"R Jelier","year":"2004","unstructured":"Jelier R, Schuemie M, van der Eijk C, Weeber M, van Mulligen E, Schijvenaars B, Mons B, Kors J: Searching for GeneRIFs: Concept-Based Query Expansion and Bayes Classification. TREC-2003 2004, 225\u2013233."},{"key":"2591_CR37","volume-title":"ECIR (to appear)","author":"P Ruch","year":"2005","unstructured":"Ruch P, Perret L, Savoy J: Features Combination for Extracting Gene Functions from MEDLINE. ECIR (to appear) 2005."},{"key":"2591_CR38","first-page":"441","volume-title":"TREC-2003","author":"M Kayaalp","year":"2004","unstructured":"Kayaalp M, Aronson A, Humphrey S, Ide N, Tanabe L, Smith L, Demner D, Loane R, Mork J, Bodenreider O: Methods for Accurate Retrieval of MEDLINE Citations in Functional Genomics. TREC-2003 2004, 441\u2013450."},{"key":"2591_CR39","first-page":"88","volume-title":"Pac Symp Biocomput","author":"E Stoica","year":"2006","unstructured":"Stoica E, Hearst M: Predicting gene functions from text using a cross-species approach. Pac Symp Biocomput 2006, 88\u201399."},{"key":"2591_CR40","doi-asserted-by":"publisher","first-page":"3232","DOI":"10.1093\/bioinformatics\/btm495","volume":"23","author":"C Crangle","year":"2007","unstructured":"Crangle C, Cherry JM, Hong EL, Zbyslaw A: Mining experimental evidence of molecular function claims from the literature. Bioinformatics 2007, 23: 3232\u20133240.","journal-title":"Bioinformatics"},{"key":"2591_CR41","volume-title":"BMC Bioinformatics","author":"A Mottaz","year":"2008","unstructured":"Mottaz A, Yip YL, Ruch P, Veuthey AL: Mapping proteins to disease terminologies: from UniProt to MeSH. BMC Bioinformatics 2008. (to appear)"},{"key":"2591_CR42","doi-asserted-by":"publisher","first-page":"373","DOI":"10.1186\/1471-2105-7-373","volume":"7","author":"J Natarajan","year":"2006","unstructured":"Natarajan J, Berrar D, Dubitzky W, Hack C, Zhang Y, DeSesa C, van Brocklyn J, Bremer E: Text mining of full-text journal articles combined with gene expression analysis reveals a relationship between sphingosine-1-phosphate and invasiveness of a glioblastoma cell line. BMC Bioinformatics 2006, 7():373.","journal-title":"BMC Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-9-S3-S9.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,8,31]],"date-time":"2021-08-31T21:29:35Z","timestamp":1630445375000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-9-S3-S9"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2008,4]]},"references-count":42,"journal-issue":{"issue":"S3","published-print":{"date-parts":[[2008,4]]}},"alternative-id":["2591"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-9-s3-s9","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2008,4]]},"assertion":[{"value":"11 April 2008","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S9"}}