{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,29]],"date-time":"2025-10-29T12:56:12Z","timestamp":1761742572660},"reference-count":28,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>Within the emerging field of text mining and statistical natural language processing (NLP) applied to biomedical articles, a broad variety of techniques have been developed during the past years. Nevertheless, there is still a great need of comparative assessment of the performance of the proposed methods and the development of common evaluation criteria. This issue was addressed by the Critical Assessment of Text Mining Methods in Molecular Biology (BioCreative) contest. The aim of this contest was to assess the performance of text mining systems applied to biomedical texts including tools which recognize named entities such as genes and proteins, and tools which automatically extract protein annotations.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>The \"sentence sliding window\" approach proposed here was found to efficiently extract text fragments from full text articles containing annotations on proteins, providing the highest number of correctly predicted annotations. Moreover, the number of correct extractions of individual entities (i.e. 
proteins and GO terms) involved in the relationships used for the annotations was significantly higher than the correct extractions of the complete annotations (protein-function relations).<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>We explored the use of averaging sentence sliding windows for information extraction, especially in a context where conventional training data is unavailable. The combination of our approach with more refined statistical estimators and machine learning techniques might be a way to improve annotation extraction for future biomedical text mining applications.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-6-s1-s19","type":"journal-article","created":{"date-parts":[[2005,5,24]],"date-time":"2005-05-24T18:13:44Z","timestamp":1116958424000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":15,"title":["A sentence sliding window approach to extract protein annotations from biomedical articles"],"prefix":"10.1186","volume":"6","author":[{"given":"Martin","family":"Krallinger","sequence":"first","affiliation":[]},{"given":"Maria","family":"Padron","sequence":"additional","affiliation":[]},{"given":"Alfonso","family":"Valencia","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2005,5,24]]},"reference":[{"key":"654_CR1","doi-asserted-by":"publisher","first-page":"28","DOI":"10.1093\/nar\/gkg033","volume":"31","author":"D Wheeler","year":"2003","unstructured":"Wheeler D, Church D, Federhen S, Lash A, Madden T, Pontius J, Schuler G, Schriml L, Sequeira E, Tatusova T, Wagner L: Database resources of the National Center for Biotechnology. Nucleic Acids Res 2003, 31: 28\u201333. 
10.1093\/nar\/gkg033","journal-title":"Nucleic Acids Res"},{"key":"654_CR2","doi-asserted-by":"publisher","first-page":"98","DOI":"10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","volume":"41","author":"D Devos","year":"2000","unstructured":"Devos D, Valencia A: Practical limits of function prediction. Proteins 2000, 41: 98\u2013107. 10.1002\/1097-0134(20001001)41:1<98::AID-PROT120>3.0.CO;2-S","journal-title":"Proteins"},{"key":"654_CR3","doi-asserted-by":"publisher","first-page":"429","DOI":"10.1016\/S0168-9525(01)02348-4","volume":"17","author":"D Devos","year":"2001","unstructured":"Devos D, Valencia A: Intrinsic errors in genome annotation. Trends Genet 2001, 17: 429\u2013431. 10.1016\/S0168-9525(01)02348-4","journal-title":"Trends Genet"},{"key":"654_CR4","doi-asserted-by":"publisher","first-page":"600","DOI":"10.1093\/bioinformatics\/14.7.600","volume":"14","author":"M Andrade","year":"1998","unstructured":"Andrade M, Valencia A: Automatic extraction of keywords from scientific text: application to the knowledge domain of protein families. Bioinformatics 1998, 14: 600\u2013607. 10.1093\/bioinformatics\/14.7.600","journal-title":"Bioinformatics"},{"key":"654_CR5","doi-asserted-by":"publisher","first-page":"785","DOI":"10.1101\/gr.86902","volume":"12","author":"H Xie","year":"2002","unstructured":"Xie H, Wasserman A, Levine Z, Novik A, Grebinskiy V, Shoshan A: Large-scale protein annotation through gene ontology. Genome Res 2002, 12: 785\u2013794. 10.1101\/gr.86902","journal-title":"Genome Res"},{"key":"654_CR6","doi-asserted-by":"publisher","first-page":"203","DOI":"10.1101\/gr.199701","volume":"12","author":"S Raychaudhuri","year":"2002","unstructured":"Raychaudhuri S, Chang J, Sutphin P, Altman R: Associating genes with gene ontology codes using a maximum entropy analysis of biomedical literature. Genome Res 2002, 12: 203\u2013214. 
10.1101\/gr.199701","journal-title":"Genome Res"},{"key":"654_CR7","first-page":"106","volume":"11","author":"J Oliveros","year":"2000","unstructured":"Oliveros J, Blaschke C, Herrero J, Dopazo J, Valencia A: Expression profiles and biological function. Genome Inform Ser Workshop Genome Inform 2000, 11: 106\u2013117.","journal-title":"Genome Inform Ser Workshop Genome Inform"},{"key":"654_CR8","doi-asserted-by":"publisher","first-page":"396","DOI":"10.1093\/bioinformatics\/btg002","volume":"19","author":"S Raychaudhuri","year":"2003","unstructured":"Raychaudhuri S, Altman R: A literature-based method for assessing the functional coherence of a gene group. Bioinformatics 2003, 19: 396\u2013401. 10.1093\/bioinformatics\/btg002","journal-title":"Bioinformatics"},{"key":"654_CR9","doi-asserted-by":"publisher","first-page":"21","DOI":"10.1038\/88213","volume":"28","author":"T Jenssen","year":"2001","unstructured":"Jenssen T, Laegreid A, Komorowski J, Hovig E: A literature network of human genes for high-throughput analysis of gene expression. Nat Genet 2001, 28: 21\u201328. 10.1038\/88213","journal-title":"Nat Genet"},{"key":"654_CR10","doi-asserted-by":"publisher","first-page":"RESEARCH0055","DOI":"10.1186\/gb-2002-3-10-research0055","volume":"3","author":"D Chaussabel","year":"2002","unstructured":"Chaussabel D, Sher A: Mining microarray expression data by literature profiling. Genome Biol 2002, 3: RESEARCH0055. 10.1186\/gb-2002-3-10-research0055","journal-title":"Genome Biol"},{"key":"654_CR11","first-page":"60","volume-title":"Proc Int Conf Intell Syst Mol Biol","author":"C Blaschke","year":"1999","unstructured":"Blaschke C, Andrade AM, Ouzounis C, Valencia A: Automatic extraction of biological information from scientific text: protein-protein interactions. 
Proc Int Conf Intell Syst Mol Biol 1999, 60\u201367."},{"key":"654_CR12","first-page":"374","volume-title":"Pac Symp Biocomput","author":"J Chang","year":"2001","unstructured":"Chang J, Raychaudhuri S, Altman R: Including biological literature improves homology search. Pac Symp Biocomput 2001, 374\u2013383."},{"key":"654_CR13","doi-asserted-by":"publisher","first-page":"125","DOI":"10.1093\/bioinformatics\/16.2.125","volume":"16","author":"R MacCallum","year":"2000","unstructured":"MacCallum R, Kelley L, Sternberg M: SAWTED: structure assignment with text description-enhanced detection of remote homologues with automated SWISS-PROT annotation comparisons. Bioinformatics 2000, 16: 125\u2013129. 10.1093\/bioinformatics\/16.2.125","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"654_CR14","doi-asserted-by":"publisher","first-page":"S16","DOI":"10.1186\/1471-2105-6-S1-S16","volume":"6","author":"C Blaschke","year":"2005","unstructured":"Blaschke C, Andres Leon E, Valencia A: Evaluation of BioCreative assessment of task 2. BMC Bioinformatics 2005, 6(Suppl 1):S16. 10.1186\/1471-2105-6-S1-S16","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"654_CR15","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-6-S1-S2","volume":"6","author":"A Yeh","year":"2005","unstructured":"Yeh A, Hirschmann L, Morgan A, Colosimo M: BioCreAtIvE task 1A: gene mention finding evaluation. BMC bioinformatics 2005, 6(Suppl 1):S2. 10.1186\/1471-2105-6-S1-S2","journal-title":"BMC bioinformatics"},{"issue":"Suppl 1","key":"654_CR16","doi-asserted-by":"publisher","first-page":"S11","DOI":"10.1186\/1471-2105-6-S1-S11","volume":"6","author":"L Hirschmann","year":"2005","unstructured":"Hirschmann L, Colosimo M, Morgan A, Yeh A: Overview of BioCreAtIvE task 1B: Normalized Gene Lists. BMC bioinformatics 2005, 6(Suppl 1):S11. 
10.1186\/1471-2105-6-S1-S11","journal-title":"BMC bioinformatics"},{"issue":"Suppl 1","key":"654_CR17","doi-asserted-by":"publisher","first-page":"S17","DOI":"10.1186\/1471-2105-6-S1-S17","volume":"6","author":"E Camon","year":"2005","unstructured":"Camon E, Barrell D, Dimmer E, Lee V, Magrane M, Maslen J, Binns D, Apweiler R: Evaluation of GO annotation retrieval for BioCreative, Task 2: Lessons to be learned and comparison with existing annotation techniques in GOA. BMC bioinformatics 2005, 6(Suppl 1):S17. 10.1186\/1471-2105-6-S1-S17","journal-title":"BMC bioinformatics"},{"key":"654_CR18","doi-asserted-by":"publisher","first-page":"262","DOI":"10.1093\/nar\/gkh021","volume":"32","author":"E Camon","year":"2004","unstructured":"Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 2004, 32: 262\u2013266. 10.1093\/nar\/gkh021","journal-title":"Nucleic Acids Res"},{"key":"654_CR19","doi-asserted-by":"publisher","first-page":"365","DOI":"10.1093\/nar\/gkg095","volume":"31","author":"B Boeckmann","year":"2003","unstructured":"Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A, Gasteiger E, Martin M, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M: The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 2003, 31: 365\u2013370. 10.1093\/nar\/gkg095","journal-title":"Nucleic Acids Res"},{"key":"654_CR20","doi-asserted-by":"publisher","first-page":"331","DOI":"10.1093\/bioinformatics\/btg1046","volume":"19","author":"A Yeh","year":"2003","unstructured":"Yeh A, Hirschman L, Morgan A: Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup. Bioinformatics 2003, 19: 331\u2013339. 
10.1093\/bioinformatics\/btg1046","journal-title":"Bioinformatics"},{"key":"654_CR21","first-page":"504","volume-title":"Proc AMIA Symp","author":"A McCray","year":"2002","unstructured":"McCray A, Browne A, Bodenreider O: The lexical properties of the gene ontology. Proc AMIA Symp 2002, 504\u2013508."},{"key":"654_CR22","doi-asserted-by":"publisher","first-page":"D41","DOI":"10.1093\/nar\/gkh092","volume":"32","author":"H Mewes","year":"2004","unstructured":"Mewes H, Amid C, Arnold R, Frishman D, Guldener U, Mannhaupt G, Munsterkotter M, Pagel P, Strack N, Stumpflen V, Warfsmann J, Ruepp A: MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res 2004, 32: D41-D44. [http:\/\/mips.gsf.de\/] 10.1093\/nar\/gkh092","journal-title":"Nucleic Acids Res"},{"key":"654_CR23","doi-asserted-by":"publisher","first-page":"130","DOI":"10.1108\/eb046814","volume":"14","author":"M Porter","year":"1980","unstructured":"Porter M: An algorithm for suffix stripping. Program 1980, 14: 130\u2013137.","journal-title":"Program"},{"key":"654_CR24","first-page":"635","volume-title":"SODA","author":"M Datar","year":"2002","unstructured":"Datar M, Gionis A, Indyk P, Motwani R: Maintaining stream statistics over sliding windows. SODA 2002, 635\u2013644."},{"key":"654_CR25","doi-asserted-by":"publisher","first-page":"1333","DOI":"10.1111\/j.1432-1033.1993.tb17885.x","volume":"213","author":"L Sipos","year":"1993","unstructured":"Sipos L, vonHeijne G: Predicting the topology of eukaryotic membrane proteins. Eur J Biochem 1993, 213: 1333\u20131340. 10.1111\/j.1432-1033.1993.tb17885.x","journal-title":"Eur J Biochem"},{"key":"654_CR26","volume-title":"Foundations of Statistical Natural Language Processing","author":"C Manning","year":"1999","unstructured":"Manning C, Schuetze H: Foundations of Statistical Natural Language Processing. 
Cambridge, MA: MIT Press; 1999."},{"key":"654_CR27","first-page":"80","volume":"95","author":"G Marquet","year":"2003","unstructured":"Marquet G, Burgun A, Moussouni F, Guerin E, LeDuff F, Loreal O: BioMeKe: an ontology-based biomedical knowledge extraction system devoted to transcriptome analysis. Stud Health Technol Inform 2003, 95: 80\u201385.","journal-title":"Stud Health Technol Inform"},{"key":"654_CR28","doi-asserted-by":"publisher","first-page":"1417","DOI":"10.1093\/bioinformatics\/btg160","volume":"19","author":"J Chiang","year":"2003","unstructured":"Chiang J, Yu H: MeKE: discovering the functions of gene products from biomedical literature via sentence alignment. Bioinformatics 2003, 19: 1417\u20131422. 10.1093\/bioinformatics\/btg160","journal-title":"Bioinformatics"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S1-S19.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T10:09:43Z","timestamp":1630490983000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S1-S19"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":28,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["654"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s1-s19","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5]]},"assertion":[{"value":"24 May 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S19"}}