{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T22:25:45Z","timestamp":1759962345319},"reference-count":38,"publisher":"Oxford University Press (OUP)","issue":"6","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2006,3,15]]},"abstract":"<jats:title>Abstract<\/jats:title>\n               <jats:p>Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent.<\/jats:p>\n               <jats:p>Methods: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). 
Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units.<\/jats:p>\n               <jats:p>Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to &lt;20% for GO, establishing a new baseline for categorizers based on retrieval methods.<\/jats:p>\n               <jats:p>Contact: \u00a0Patrick.Ruch@sim.hcuge.ch<\/jats:p>","DOI":"10.1093\/bioinformatics\/bti783","type":"journal-article","created":{"date-parts":[[2005,11,16]],"date-time":"2005-11-16T03:08:21Z","timestamp":1132110501000},"page":"658-664","source":"Crossref","is-referenced-by-count":76,"title":["Automatic assignment of biomedical categories: toward a generic approach"],"prefix":"10.1093","volume":"22","author":[{"given":"Patrick","family":"Ruch","sequence":"first","affiliation":[{"name":"University Hospitals of Geneva, Medical Informatics Service \u00a0 CH-1201, Geneva"}]}],"member":"286","published-online":{"date-parts":[[2005,11,15]]},"reference":[{"key":"2023012408504954800_b1","first-page":"1","article-title":"Exchangeability and related topics","volume-title":"\u00c9cole d'\u00e9t\u00e9 de probabilit\u00e9s de Saint-Flour, XIII\u20141983, Volume 1117 of Lecture Notes in Mathematics","author":"Aldous","year":"1985"},{"key":"2023012408504954800_b2","first-page":"142","article-title":"Automatic text summarization based on word-clusters and ranking algorithms","author":"Amini","year":"2005"},{"key":"2023012408504954800_b3","article-title":"Linguistically motivated information retrieval","volume-title":"Encyclopedia of Library and Information Science","author":"Arampatzis","year":"2000"},{"key":"2023012408504954800_b4","first-page":"36","article-title":"Fusion of 
knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents","author":"Aronson","year":"2005"},{"key":"2023012408504954800_b5","doi-asserted-by":"crossref","first-page":"562","DOI":"10.1101\/gr.461403","article-title":"The Gene Ontology Annotation (GOA) project: implementation of GO in Swiss-Prot, TrEMBL and InterPro","volume":"13","author":"Camon","year":"2003","journal-title":"Genome Res."},{"key":"2023012408504954800_b6","doi-asserted-by":"crossref","first-page":"19","DOI":"10.1016\/0020-0271(71)90024-6","article-title":"A definition of relevance for information retrieval","volume":"7","author":"Cooper","year":"1971","journal-title":"Inf. Storage Retr."},{"key":"2023012408504954800_b7","article-title":"FIGO: findings GO terms in unStructured text","volume-title":"BioCreative Notebook Papers, CNB 2004","author":"Couto","year":"2004"},{"key":"2023012408504954800_b8","first-page":"451","article-title":"Finding gene functions using litminer","author":"de Bruijn","year":"2003"},{"key":"2023012408504954800_b9","doi-asserted-by":"crossref","first-page":"103","DOI":"10.1023\/A:1007413511361","article-title":"On the optimality of the simple bayesian classifier under zero-one loss","volume":"29","author":"Domingos","year":"1997","journal-title":"Mach. Learn."},{"key":"2023012408504954800_b10","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-6-S1-S23","article-title":"Data-poor Categorization and Passage Retrieval for Gene Ontology Annotation in Swiss-Prot","volume":"6","author":"Ehrler","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012408504954800_b11","first-page":"176","article-title":"Indexing consistency in medline","volume":"71","author":"Funk","year":"1983","journal-title":"Bull Med. Libr. 
Assoc."},{"key":"2023012408504954800_b12","doi-asserted-by":"crossref","first-page":"328","DOI":"10.1162\/coli.2003.29.2.328","article-title":"Recent advances in computational terminology","volume":"29","author":"Gaizauskas","year":"2003","journal-title":"Comput. Linguist."},{"key":"2023012408504954800_b13","doi-asserted-by":"crossref","first-page":"21","DOI":"10.1145\/1067268.1067273","article-title":"Report on the TREC 2004 Genomics track","volume":"39","author":"Hersh","year":"2005","journal-title":"SIGIR Forum"},{"key":"2023012408504954800_b14","first-page":"192","article-title":"OHSUMED: an interactive retrieval evaluation and new large test collection for research","author":"Hersh","year":"1994"},{"key":"2023012408504954800_b15","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-6-S1-S1","article-title":"Overview of BioCreAtIvE: critical assessment of information extraction for biology","volume":"6","author":"Hirschman","year":"2005","journal-title":"BMC Bioinformatics"},{"key":"2023012408504954800_b16","first-page":"1","article-title":"Making large-scale SVM learning practical","volume-title":"Advances in Kernel Methods\u2014Support Vector Learning","author":"Joachims","year":"1999"},{"key":"2023012408504954800_b17","first-page":"319","article-title":"Automatic MeSH term assignment and quality assessment","author":"Kim","year":"2001","journal-title":"Proc. 
AMIA Symp."},{"key":"2023012408504954800_b18","first-page":"289","article-title":"Combining classifiers in text categorization","author":"Larkey","year":"1996"},{"key":"2023012408504954800_b19","first-page":"246","article-title":"Evaluating and optimizing autonomous text classification systems","author":"Lewis","year":"1995"},{"key":"2023012408504954800_b20","first-page":"298","article-title":"Training algorithms for linear text classifiers","author":"Lewis","year":"1996"},{"key":"2023012408504954800_b21","first-page":"23","article-title":"GLIMPSE: a tool to search through entire file systems","author":"Manber","year":"1994"},{"key":"2023012408504954800_b22","doi-asserted-by":"crossref","DOI":"10.3115\/1072228.1072370","article-title":"Automatic glossary extraction: beyond terminology identification","author":"Park","year":"2002"},{"key":"2023012408504954800_b23","article-title":"Extraction and disambiguation of acronym\u2013meaning pairs in medline","author":"Pustejovsky","year":"2001"},{"key":"2023012408504954800_b24","first-page":"101","article-title":"Term proximity scoring for keyword-based retrieval systems","author":"Rasolofo","year":"2003"},{"key":"2023012408504954800_b25","article-title":"Information retrieval and spelling errors: improving effectiveness by lexical disambiguation","volume-title":"ACM-SAC Information Access and Retrieval Track","author":"Ruch","year":"2002"},{"key":"2023012408504954800_b26","doi-asserted-by":"crossref","first-page":"75","DOI":"10.1016\/S1386-5056(02)00057-6","article-title":"Evaluating and reducing the effect of data corruption when applying bag of words approaches to medical records","volume":"67","author":"Ruch","year":"2002","journal-title":"Int. J. Med. 
Inf."},{"key":"2023012408504954800_b27","first-page":"111","article-title":"Minimal commitment and full lexical disambiguation: balancing rules and hidden Markov models","author":"Ruch","year":"2000"},{"key":"2023012408504954800_b28","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-31865-1_9","article-title":"Features combination for extracting gene functions from medline","author":"Ruch","year":"2005"},{"key":"2023012408504954800_b29","doi-asserted-by":"crossref","DOI":"10.1186\/1471-2105-4-20","article-title":"Information extraction from full text scientific articles: where are the keywords?","volume":"4","author":"Shah","year":"2003","journal-title":"BMC Bioinformatics"},{"key":"2023012408504954800_b30","first-page":"35","article-title":"Modern information retrieval: a brief overview","volume":"24","author":"Singhal","year":"2001","journal-title":"IEEE Data Eng. Bull."},{"key":"2023012408504954800_b31","first-page":"21","article-title":"Pivoted document length normalization","author":"Singhal","year":"1996"},{"key":"2023012408504954800_b32","doi-asserted-by":"crossref","DOI":"10.1177\/002383096500800404","article-title":"A probabilistic procedure for grouping words into phrases","volume":"8","author":"Stolz","year":"1965","journal-title":"Lang. Speech"},{"key":"2023012408504954800_b33","doi-asserted-by":"crossref","first-page":"209","DOI":"10.1016\/0010-4825(95)00055-0","article-title":"An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts","volume":"26","author":"Wilbur","year":"1996","journal-title":"Comput. Biol. 
Med."},{"key":"2023012408504954800_b34","first-page":"358","article-title":"An evaluation of statistical approaches to medline indexing","author":"Yang","year":"1996"},{"key":"2023012408504954800_b35","first-page":"88","article-title":"Sampling strategies and learning efficiency in text categorization","author":"Yang","year":"1996"},{"key":"2023012408504954800_b36","first-page":"67","article-title":"An evaluation of statistical approaches to text categorization","volume":"1","author":"Yang","year":"1999","journal-title":"J. Inf. Ret."},{"key":"2023012408504954800_b37","first-page":"447","article-title":"A linear least squares fit mapping method for information retrieval from natural language texts","author":"Yang","year":"1992"},{"key":"2023012408504954800_b38","first-page":"307","article-title":"How reliable are large-scale information retrieval experiments?","author":"Zobel","year":"1998"}],"container-title":["Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/6\/658\/48839321\/bioinformatics_22_6_658.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article-pdf\/22\/6\/658\/48839321\/bioinformatics_22_6_658.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T09:21:32Z","timestamp":1674552092000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/bioinformatics\/article\/22\/6\/658\/294293"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2005,11,15]]},"references-count":38,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2006,3,15]]}},"URL":"https:\/\/doi.org\/10.1093\/bioinformatics\/bti783","relation":{},"ISSN":["1367-4811","1367-4803"],"issn-type":[{"value":"1367-4811","type":"electronic"},{"value":"1367-4803","type":"print"}],
"subject":[],"published-other":{"date-parts":[[2006,3,15]]},"published":{"date-parts":[[2005,11,15]]}}}