{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,1,2]],"date-time":"2024-01-02T15:01:41Z","timestamp":1704207701088},"reference-count":45,"publisher":"Springer Science and Business Media LLC","issue":"S1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2005,5]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>In the context of the BioCreative competition, where training data were very sparse, we investigated two complementary tasks: 1) given a Swiss-Prot triplet, containing a protein, a GO (Gene Ontology) term and a relevant article, extraction of a short passage that justifies the GO category assignment; 2) given a Swiss-Prot pair, containing a protein and a relevant article, automatic assignment of a set of categories.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Methods<\/jats:title>\n            <jats:p>The sentence is the basic retrieval unit. Our classifier computes a distance between each sentence and the GO category provided with the Swiss-Prot entry. The Text Categorizer computes a distance between each GO term and the text of the article. Evaluations are reported both based on annotator judgements as established by the competition and based on mean average precision measures computed using a curated sample of Swiss-Prot.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>Our system achieved the best recall and precision combination both for passage retrieval and text categorization as evaluated by official evaluators. 
However, text categorization results were far below those in other data-poor text categorization experiments. The top proposed term is relevant in less than 20% of cases, while for categorization with other biomedical controlled vocabularies, such as the Medical Subject Headings, we achieved more than 90% precision. We also observe that the scoring methods used in our experiments, based on the retrieval status value of our engines, exhibit effective confidence estimation capabilities.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>From a comparative perspective, the combination of retrieval and natural language processing methods we designed achieved very competitive performance. Largely data-independent, our systems were no less effective than data-intensive approaches. These results suggest that the overall strategy could benefit a large class of information extraction tasks, especially when training data are missing. However, from a user perspective, results were disappointing. 
Further investigations are needed to design applicable end-user text mining tools for biologists.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-6-s1-s23","type":"journal-article","created":{"date-parts":[[2005,5,24]],"date-time":"2005-05-24T18:13:44Z","timestamp":1116958424000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":20,"title":["Data-poor categorization and passage retrieval for Gene Ontology Annotation in Swiss-Prot"],"prefix":"10.1186","volume":"6","author":[{"given":"Fr\u00e9d\u00e9ric","family":"Ehrler","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antoine","family":"Geissb\u00fchler","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Antonio","family":"Jimeno","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Patrick","family":"Ruch","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2005,5,24]]},"reference":[{"issue":"12","key":"658_CR1","doi-asserted-by":"publisher","first-page":"1553","DOI":"10.1093\/bioinformatics\/18.12.1553","volume":"18","author":"L Hirschman","year":"2002","unstructured":"Hirschman L, Park J, Tsujii J, Wong L, Wu C: Accomplishments and challenges in literature data mining for biology. Bioinformatics 2002, 18(12):1553\u20131561. 10.1093\/bioinformatics\/18.12.1553","journal-title":"Bioinformatics"},{"key":"658_CR2","volume-title":"MUC-7 Named-Entity task Definition. MUC","author":"N Chinchor","year":"1997","unstructured":"Chinchor N: MUC-7 Named-Entity task Definition. MUC. 1997."},{"key":"658_CR3","volume-title":"TREC-8 Report","author":"D Hull","year":"2000","unstructured":"Hull D: Xerox TREC-8 Question Answering Track Report. 
TREC-8 Report 2000."},{"key":"658_CR4","first-page":"165","volume-title":"Information Retrieval","author":"P Kantor","year":"2000","unstructured":"Kantor P, Voorhees E: The TREC-5 Confusion Track: Comparing Retrieval Methods for Scanned Text. Information Retrieval 2000, 165\u201376. 10.1023\/A:1009902609570"},{"key":"658_CR5","volume-title":"COLING","author":"P Ruch","year":"2002","unstructured":"Ruch P: Using contextual spelling correction to improve retrieval effectiveness in degraded text collections. COLING 2002."},{"key":"658_CR6","volume-title":"SDAIR Proceedings","author":"E Mittendorf","year":"1996","unstructured":"Mittendorf E, Schauble P: Measuring the Effects of Data Corruption on Information Retrieval. SDAIR Proceedings 1996."},{"key":"658_CR7","volume-title":"Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access","author":"Y Yang","year":"1996","unstructured":"Yang Y: Sampling strategies and learning efficiency in text categorization. Proceedings of the AAAI Spring Symposium on Machine Learning in Information Access 1996."},{"key":"658_CR8","volume-title":"AAAI-98 Workshop on Learning for Text Categorization","author":"A McCallum","year":"1998","unstructured":"McCallum A, Nigam K: A comparison of event models for Naive Bayes text classification. AAAI-98 Workshop on Learning for Text Categorization 1998."},{"key":"658_CR9","volume-title":"Advances in Kernel Methods \u2013 Support Vector Learning","author":"T Joachims","year":"1999","unstructured":"Joachims T: Making Large-Scale SVM Learning Practical. Advances in Kernel Methods \u2013 Support Vector Learning 1999."},{"issue":"2\/3","key":"658_CR10","doi-asserted-by":"publisher","first-page":"135","DOI":"10.1023\/A:1007649029923","volume":"39","author":"R Schapire","year":"2000","unstructured":"Schapire R, Singer Y: BoosTexter: A Boosting-based System for Text Categorization. Machine Learning 2000, 39(2\/3):135\u2013168. 
10.1023\/A:1007649029923","journal-title":"Machine Learning"},{"issue":"3","key":"658_CR11","doi-asserted-by":"publisher","first-page":"233","DOI":"10.1145\/183422.183423","volume":"12","author":"C Apt\u00e9","year":"1994","unstructured":"Apt\u00e9 C, Damerau F, Weiss S: Automated learning of decision rules for text categorization. ACM Transactions on Information Systems (TOIS) 1994, 12(3):233\u2013251. 10.1145\/183422.183423","journal-title":"ACM Transactions on Information Systems (TOIS)"},{"key":"658_CR12","volume-title":"Proceedings of the Second Annual Conference on Innovative Applications of Intelligence","author":"P Hayes","year":"1990","unstructured":"Hayes P, Weinstein S: A System for Content-Based Indexing of a Database of News Stories. Proceedings of the Second Annual Conference on Innovative Applications of Intelligence 1990."},{"key":"658_CR13","first-page":"289","volume-title":"Combining classifiers in text categorization","author":"L Larkey","year":"1996","unstructured":"Larkey L, Croft W: Combining classifiers in text categorization. SIGIR, ACM Press, New York, US; 1996:289\u2013297."},{"key":"658_CR14","first-page":"447","volume-title":"COLING","author":"Y Yang","year":"1992","unstructured":"Yang Y, Chute C: A linear least squares fit mapping method for information retrieval from natural language texts. COLING 1992, 447\u2013453."},{"key":"658_CR15","first-page":"101","volume-title":"ECIR","author":"Y Rasolofo","year":"2003","unstructured":"Rasolofo Y, Savoy J: Term Proximity Scoring for Keyword-based Retrieval Systems. ECIR 2003, 101\u2013116."},{"key":"658_CR16","first-page":"167","volume-title":"TREC-5 NIST Special Publication 500-238","author":"D Hull","year":"1997","unstructured":"Hull D, Grefenstette G, Schulze B, Gaussier E, Schutze H, Pedersen J: XEROX TREC-5 site report: Routing, Filtering, NLP, and Spanish tracks. 
TREC-5 NIST Special Publication 500\u2013238 1997, 167\u2013180."},{"key":"658_CR17","first-page":"164","volume-title":"Text REtrieval Conference","author":"T Strzalkowski","year":"1998","unstructured":"Strzalkowski T, Stein G, Wise GB, Carballo JP, Tapanainen P, Jarvinen T, Voutilainen A, Karlgren J: Natural Language Information Retrieval: TREC-7 Report. Text REtrieval Conference 1998, 164\u2013173."},{"issue":"4","key":"658_CR18","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1016\/S0306-4573(01)00045-0","volume":"38","author":"C Tan","year":"2002","unstructured":"Tan C, Wang Y, Lee C: The Use of BiGrams to Enhance Text Categorization. Information Processing and Management 2002, 38(4):529\u2013546. 10.1016\/S0306-4573(01)00045-0","journal-title":"Information Processing and Management"},{"key":"658_CR19","first-page":"213","volume":"LNCS 2291","author":"M Kongovi","year":"2002","unstructured":"Kongovi M, Guzman J, Dasigi V: Text Categorization: An Experiment Using Phrases. ECIR 2002, LNCS 2291: 213\u2013220.","journal-title":"ECIR"},{"issue":"4","key":"658_CR20","doi-asserted-by":"publisher","first-page":"352","DOI":"10.1002\/(SICI)1097-4571(2000)51:4<352::AID-ASI5>3.0.CO;2-8","volume":"51","author":"K Tolle","year":"2000","unstructured":"Tolle K, Chen H: Comparing noun phrasing techniques for use with medical digital library tools. Journal of the American Society for Information Science 2000, 51(4):352\u2013370. Publisher Full Text 10.1002\/(SICI)1097-4571(2000)51:4<352::AID-ASI5>3.0.CO;2-8","journal-title":"Journal of the American Society for Information Science"},{"key":"658_CR21","first-page":"200","volume-title":"RIAO","author":"M Mitra","year":"1997","unstructured":"Mitra M, Buckley C, Singhal A, Cardie C: An analysis of Statistical and Syntactic Phrases. 
RIAO 1997, 200\u2013214."},{"key":"658_CR22","volume-title":"Encyclopedia of Library and Information Sciece","author":"A Arampatzis","year":"2000","unstructured":"Arampatzis A, van der Weide T, van Nommel P, Koster C: Linguistically Motivated Information Retrieval. Encyclopedia of Library and Information Sciece 2000., 69:"},{"key":"658_CR23","volume-title":"Language and Speech","author":"W Stolz","year":"1965","unstructured":"Stolz W: A probabilistic procedure for grouping words into phrases. Language and Speech 1965., 8:"},{"key":"658_CR24","first-page":"111","volume-title":"CoNLL","author":"P Ruch","year":"2000","unstructured":"Ruch P, Baud R, Bouillon P, Robert G: Minimal Commitment and Full Lexical Disambiguation: Balancing Rules and Hidden Markov Models. CoNLL 2000, 111\u2013116."},{"issue":"1","key":"658_CR25","doi-asserted-by":"publisher","first-page":"45","DOI":"10.1093\/nar\/28.1.45","volume":"28","author":"A Bairoch","year":"2000","unstructured":"Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 2000, 28(1):45\u20138. 10.1093\/nar\/28.1.45","journal-title":"Nucleic Acids Res"},{"key":"658_CR26","doi-asserted-by":"publisher","first-page":"25","DOI":"10.1038\/75556","volume":"25","author":"TGO Consortium","year":"2000","unstructured":"Consortium TGO: Gene Ontology: tool for the unification of biology. Nature Genetics 2000, 25: 25\u201329. 10.1038\/75556","journal-title":"Nature Genetics"},{"key":"658_CR27","volume-title":"TREC-12","author":"P Ruch","year":"2004","unstructured":"Ruch P, Chichester C, Cohen G, Coray G, Ehrler F, Ghorbel H, M\u00fcller H, Pallotta V: Report on the TREC 2003 Experiment: Genomic Track. TREC-12 2004. [http:\/\/trec.nist.gov\/pubs\/trec12\/t12_proceedings.html]"},{"key":"658_CR28","volume-title":"Proceedings of the ANLP","author":"J Reynar","year":"1997","unstructured":"Reynar J, Ratnaparkhi A: Entropy Approach to Identifying Sentence Boundaries. 
Proceedings of the ANLP 1997."},{"issue":"1\u20132","key":"658_CR29","doi-asserted-by":"publisher","first-page":"169","DOI":"10.1016\/S0933-3657(03)00052-6","volume":"29","author":"P Ruch","year":"2003","unstructured":"Ruch P, Baud R, Geissb\u00fchler A: Using Lexical Disambiguation and Named-Entity Recognition to Improve Spelling Correction in the Electronic Patient Record. Art Intell Med 2003, 29(1\u20132):169\u2013184. 10.1016\/S0933-3657(03)00052-6","journal-title":"Art Intell Med"},{"key":"658_CR30","doi-asserted-by":"publisher","first-page":"168","DOI":"10.1145\/321796.321811","volume":"1","author":"R Wagner","year":"1974","unstructured":"Wagner R, Fisher M: The string-to-string correction problem. Journal of the Association of Computing Machinery 1974, 1: 168\u2013173.","journal-title":"Journal of the Association of Computing Machinery"},{"key":"658_CR31","first-page":"73","volume-title":"IIWeb","author":"W Cohen","year":"2003","unstructured":"Cohen W, Fienberg PRS: A Comparison of String Distance Metrics for Name-Matching Tasks. IIWeb 2003, 73\u201378."},{"key":"658_CR32","volume-title":"BioCreative Notebook Papers, CNB","author":"M Krallinger","year":"2004","unstructured":"Krallinger M, Padron M: Prediction of GO annotation by Combining Entity Specific Sentence Sliding Windows Profiles. BioCreative Notebook Papers, CNB 2004. [http:\/\/www.pdg.cnb.uam.es\/BioLink\/workshop_BioCreative_04\/handout\/]"},{"key":"658_CR33","volume-title":"Introduction to Modern Information Retrieval","author":"G Salton","year":"1983","unstructured":"Salton G, McGill M: Introduction to Modern Information Retrieval. McGraw Hill Book; 1983."},{"issue":"4","key":"658_CR34","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1145\/582415.582416","volume":"20","author":"G Amati","year":"2002","unstructured":"Amati G, van Rijsbergen C: Probabilistic Models of Information Retrieval based on Measuring the Divergence from Randomness. 
ACM Transactions on Information Systems (TOIS) 2002, 20(4):357\u2013389. 10.1145\/582415.582416","journal-title":"ACM Transactions on Information Systems (TOIS)"},{"key":"658_CR35","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1145\/243199.243206","volume-title":"ACM-SIGIR","author":"A Singhal","year":"1996","unstructured":"Singhal A, Buckley C, Mitra M: Pivoted document length normalization. ACM-SIGIR 1996, 21\u201329."},{"key":"658_CR36","doi-asserted-by":"publisher","first-page":"38","DOI":"10.1007\/s007990050035","volume":"2","author":"I Grabtree","year":"1998","unstructured":"Grabtree I, Soltysiak S: Identifying and Tracking Changing Interests. International Journal of Digital Libraries 1998, 2: 38\u201353. 10.1007\/s007990050035","journal-title":"International Journal of Digital Libraries"},{"key":"658_CR37","first-page":"487","volume-title":"Proceedings of ICML","author":"R Klinkenberg","year":"2000","unstructured":"Klinkenberg R, Joachims T: Detecting Concept Drift with Support Vector Machines. Proceedings of ICML 2000, 487\u2013494."},{"key":"658_CR38","volume-title":"COLING","author":"Y Park","year":"2002","unstructured":"Park Y, Byrd R, Boguraev B: Automatic Glossary Extraction: Beyond Terminology Identification. COLING 2002."},{"key":"658_CR39","volume-title":"BioCreative Notebook Papers, CNB","author":"K Verspoor","year":"2004","unstructured":"Verspoor K, Cohn J, Joslyn C, Mniszewski S: Protein Annotation as Term categorization in the Gene Ontology. BioCreative Notebook Papers, CNB 2004. [http:\/\/www.pdg.cnb.uam.es\/BioLink\/workshop_BioCreative_04\/handout\/]"},{"key":"658_CR40","volume-title":"BioCreative Notebook Papers, CNB","author":"F Couto","year":"2004","unstructured":"Couto F, Silva M, Coutinho P: FIGO: Findings GO Terms in UnStructured Text. BioCreative Notebook Papers, CNB 2004. 
[http:\/\/www.pdg.cnb.uam.es\/BioLink\/workshop_BioCreative_04\/handout\/]"},{"key":"658_CR41","volume-title":"BioCreative Notebook Papers, CNB","author":"D Hanish","year":"2004","unstructured":"Hanish D, Fundel K, Mevissen H, Zimmer R, Fluck J: ProMiner: Organism-specific protein name detecion using approximate string matching. BioCreative Notebook Papers, CNB 2004. [http:\/\/www.pdg.cnb.uam.es\/BioLink\/workshop_BioCreative_04\/handout\/]"},{"key":"658_CR42","volume-title":"BioCreative Notebook Papers, CNB","author":"J Crim","year":"2004","unstructured":"Crim J, McDonald R, Pereira F: Automatically Annotating Documents with Normalized Gene Lists. BioCreative Notebook Papers, CNB 2004. [http:\/\/www.pdg.cnb.uam.es\/BioLink\/workshop_BioCreative_04\/handout\/]"},{"key":"658_CR43","volume-title":"TREC-12","author":"W Hersh","year":"2004","unstructured":"Hersh W, Bhupatiraju R: TREC GENOMICS Track Overview. TREC-12 2004. [http:\/\/trec.nist.gov\/pubs\/trec12\/t12_proceedings.html]"},{"key":"658_CR44","volume-title":"COLING Workshop on Natural Language Processing in Biomedicine and its Application (JNLPBA)","author":"Y Mizuta","year":"2004","unstructured":"Mizuta Y, Collier N: Zone Identification in Biology Articles as a Basis for Information Extraction. COLING Workshop on Natural Language Processing in Biomedicine and its Application (JNLPBA) 2004."},{"key":"658_CR45","volume-title":"COLING Workshop on Natural Language Processing in Biomedicine and its Application (JNLPBA)","author":"I Tbahriti","year":"2004","unstructured":"Tbahriti I, Chichester C, Lisacek F, Ruch P: Using Argumentation to Retrieve Articles with Similar Citations from MEDLINE. 
COLING Workshop on Natural Language Processing in Biomedicine and its Application (JNLPBA) 2004."}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-6-S1-S23.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T01:41:37Z","timestamp":1630460497000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-6-S1-S23"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,5]]},"references-count":45,"journal-issue":{"issue":"S1","published-print":{"date-parts":[[2005,5]]}},"alternative-id":["658"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-6-s1-s23","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2005,5]]},"assertion":[{"value":"24 May 2005","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"S23"}}