{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,10,15]],"date-time":"2023-10-15T00:52:04Z","timestamp":1697331124459},"reference-count":31,"publisher":"Springer Science and Business Media LLC","issue":"1","content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["BMC Bioinformatics"],"published-print":{"date-parts":[[2006,12]]},"abstract":"<jats:title>Abstract<\/jats:title>\n          <jats:sec>\n            <jats:title>Background<\/jats:title>\n            <jats:p>The rapid proliferation of biomedical text makes it increasingly difficult for researchers to identify, synthesize, and utilize developed knowledge in their fields of interest. Automated information extraction procedures can assist in the acquisition and management of this knowledge. Previous efforts in biomedical text mining have focused primarily upon named entity recognition of well-defined molecular objects such as genes, but less work has been performed to identify disease-related objects and concepts. Furthermore, promise has been tempered by an inability to efficiently scale approaches in ways that minimize manual efforts and still perform with high accuracy. Here, we have applied a machine-learning approach previously successful for identifying molecular entities to a disease concept to determine if the underlying probabilistic model effectively generalizes to unrelated concepts with minimal manual intervention for model retraining.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Results<\/jats:title>\n            <jats:p>We developed a named entity recognizer (MTag), an entity tagger for recognizing clinical descriptions of malignancy presented in text. The application uses the machine-learning technique Conditional Random Fields with additional domain-specific features. 
MTag was tested with 1,010 training and 432 evaluation documents pertaining to cancer genomics. Overall, our experiments resulted in 0.85 precision, 0.83 recall, and 0.84 F-measure on the evaluation set. Compared with a baseline system using string matching of text with a neoplasm term list, MTag performed with a much higher recall rate (92.1% vs. 42.1% recall) and demonstrated the ability to learn new patterns. Application of MTag to all MEDLINE abstracts yielded the identification of 580,002 unique and 9,153,340 overall mentions of malignancy. Significantly, addition of an extensive lexicon of malignancy mentions as a feature set for extraction had minimal impact in performance.<\/jats:p>\n          <\/jats:sec>\n          <jats:sec>\n            <jats:title>Conclusion<\/jats:title>\n            <jats:p>Together, these results suggest that the identification of disparate biomedical entity classes in free text may be achievable with high accuracy and only moderate additional effort for each new application domain.<\/jats:p>\n          <\/jats:sec>","DOI":"10.1186\/1471-2105-7-492","type":"journal-article","created":{"date-parts":[[2006,11,8]],"date-time":"2006-11-08T07:14:42Z","timestamp":1162970082000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":22,"title":["Automated recognition of malignancy mentions in biomedical literature"],"prefix":"10.1186","volume":"7","author":[{"given":"Yang","family":"Jin","sequence":"first","affiliation":[]},{"given":"Ryan T","family":"McDonald","sequence":"additional","affiliation":[]},{"given":"Kevin","family":"Lerman","sequence":"additional","affiliation":[]},{"given":"Mark A","family":"Mandel","sequence":"additional","affiliation":[]},{"given":"Steven","family":"Carroll","sequence":"additional","affiliation":[]},{"given":"Mark Y","family":"Liberman","sequence":"additional","affiliation":[]},{"given":"Fernando 
C","family":"Pereira","sequence":"additional","affiliation":[]},{"given":"Raymond S","family":"Winters","sequence":"additional","affiliation":[]},{"given":"Peter S","family":"White","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2006,11,7]]},"reference":[{"issue":"6","key":"1231_CR1","doi-asserted-by":"publisher","first-page":"423","DOI":"10.1016\/j.jbi.2004.08.008","volume":"37","author":"N Collier","year":"2004","unstructured":"Collier N, Takeuchi K: Comparison of character-level and part of speech features for name recognition in biomedical texts. J Biomed Inform 2004, 37(6):423\u2013435. 10.1016\/j.jbi.2004.08.008","journal-title":"J Biomed Inform"},{"issue":"Suppl 1","key":"1231_CR2","doi-asserted-by":"publisher","first-page":"S5","DOI":"10.1186\/1471-2105-6-S1-S5","volume":"6","author":"J Finkel","year":"2005","unstructured":"Finkel J, Dingare S, Manning CD, Nissim M, Alex B, Grover C: Exploring the boundaries: gene and protein identification in biomedical text. BMC Bioinformatics 2005, 6 Suppl 1: S5. 10.1186\/1471-2105-6-S1-S5","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR3","doi-asserted-by":"publisher","first-page":"S9","DOI":"10.1186\/1471-2105-6-S1-S9","volume":"6","author":"J Hakenberg","year":"2005","unstructured":"Hakenberg J, Bickel S, Plake C, Brefeld U, Zahn H, Faulstich L, Leser U, Scheffer T: Systematic feature evaluation for gene name recognition. BMC Bioinformatics 2005, 6 Suppl 1: S9. 10.1186\/1471-2105-6-S1-S9","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR4","doi-asserted-by":"publisher","first-page":"S4","DOI":"10.1186\/1471-2105-6-S1-S4","volume":"6","author":"S Kinoshita","year":"2005","unstructured":"Kinoshita S, Cohen KB, Ogren PV, Hunter L: BioCreAtIvE Task1A: entity identification with a stochastic tagger. BMC Bioinformatics 2005, 6 Suppl 1: S4. 
10.1186\/1471-2105-6-S1-S4","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR5","doi-asserted-by":"publisher","first-page":"S6","DOI":"10.1186\/1471-2105-6-S1-S6","volume":"6","author":"R McDonald","year":"2005","unstructured":"McDonald R, Pereira F: Identifying gene and protein mentions in text using conditional random fields. BMC Bioinformatics 2005, 6 Suppl 1: S6. 10.1186\/1471-2105-6-S1-S6","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR6","doi-asserted-by":"publisher","first-page":"S8","DOI":"10.1186\/1471-2105-6-S1-S8","volume":"6","author":"T Mitsumori","year":"2005","unstructured":"Mitsumori T, Fation S, Murata M, Doi K, Doi H: Gene\/protein name recognition based on support vector machine using dictionary as features. BMC Bioinformatics 2005, 6 Suppl 1: S8. 10.1186\/1471-2105-6-S1-S8","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR7","doi-asserted-by":"publisher","first-page":"S10","DOI":"10.1186\/1471-2105-6-S1-S10","volume":"6","author":"J Tamames","year":"2005","unstructured":"Tamames J: Text Detective: a rule-based system for gene annotation in biomedical texts. BMC Bioinformatics 2005, 6 Suppl 1: S10. 10.1186\/1471-2105-6-S1-S10","journal-title":"BMC Bioinformatics"},{"issue":"8","key":"1231_CR8","doi-asserted-by":"publisher","first-page":"1124","DOI":"10.1093\/bioinformatics\/18.8.1124","volume":"18","author":"L Tanabe","year":"2002","unstructured":"Tanabe L, Wilbur WJ: Tagging gene and protein names in biomedical text. Bioinformatics 2002, 18(8):1124\u20131132. 10.1093\/bioinformatics\/18.8.1124","journal-title":"Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR9","doi-asserted-by":"publisher","first-page":"S3","DOI":"10.1186\/1471-2105-6-S1-S3","volume":"6","author":"L Tanabe","year":"2005","unstructured":"Tanabe L, Xie N, Thom LH, Matten W, Wilbur WJ: GENETAG: a tagged corpus for gene\/protein named entity recognition. BMC Bioinformatics 2005, 6 Suppl 1: S3. 
10.1186\/1471-2105-6-S1-S3","journal-title":"BMC Bioinformatics"},{"issue":"16","key":"1231_CR10","doi-asserted-by":"publisher","first-page":"2046","DOI":"10.1093\/bioinformatics\/btg279","volume":"19","author":"JM Temkin","year":"2003","unstructured":"Temkin JM, Gilder MR: Extraction of protein interaction information from unstructured text using a context-free grammar. Bioinformatics 2003, 19(16):2046\u20132053. 10.1093\/bioinformatics\/btg279","journal-title":"Bioinformatics"},{"issue":"6","key":"1231_CR11","doi-asserted-by":"publisher","first-page":"498","DOI":"10.1016\/j.jbi.2004.08.007","volume":"37","author":"M Torii","year":"2004","unstructured":"Torii M, Kamboj S, Vijay-Shanker K: Using name-internal and contextual features to classify biological terms. J Biomed Inform 2004, 37(6):498\u2013511. 10.1016\/j.jbi.2004.08.007","journal-title":"J Biomed Inform"},{"issue":"Suppl 1","key":"1231_CR12","doi-asserted-by":"publisher","first-page":"S2","DOI":"10.1186\/1471-2105-6-S1-S2","volume":"6","author":"A Yeh","year":"2005","unstructured":"Yeh A, Morgan A, Colosimo M, Hirschman L: BioCreAtIvE Task 1A: gene mention finding evaluation. BMC Bioinformatics 2005, 6 Suppl 1: S2. 10.1186\/1471-2105-6-S1-S2","journal-title":"BMC Bioinformatics"},{"issue":"Suppl 1","key":"1231_CR13","doi-asserted-by":"publisher","first-page":"S7","DOI":"10.1186\/1471-2105-6-S1-S7","volume":"6","author":"G Zhou","year":"2005","unstructured":"Zhou G, Shen D, Zhang J, Su J, Tan S: Recognition of protein\/gene names from text using an ensemble of classifiers. BMC Bioinformatics 2005, 6 Suppl 1: S7. 10.1186\/1471-2105-6-S1-S7","journal-title":"BMC Bioinformatics"},{"issue":"17","key":"1231_CR14","doi-asserted-by":"publisher","first-page":"3249","DOI":"10.1093\/bioinformatics\/bth350","volume":"20","author":"RT McDonald","year":"2004","unstructured":"McDonald RT, Winters RS, Mandel M, Jin Y, White PS, Pereira F: An entity tagger for recognizing acquired genomic variations in cancer literature. 
Bioinformatics 2004, 20(17):3249\u20133251. 10.1093\/bioinformatics\/bth350","journal-title":"Bioinformatics"},{"issue":"Pt 2","key":"1231_CR15","first-page":"758","volume":"11","author":"L Chen","year":"2004","unstructured":"Chen L, Friedman C: Extracting phenotypic information from the literature via natural language processing. Medinfo 2004, 11(Pt 2):758\u2013762.","journal-title":"Medinfo"},{"key":"1231_CR16","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1017\/S1351324900000061","volume":"1","author":"C Friedman","year":"1995","unstructured":"Friedman C, Hripcsak G, DuMouchel W, Johnson SB, Clayton PD: Natural language processing in an operational clinical information system. Natural Language Engineering 1995, 1: 1\u201328.","journal-title":"Natural Language Engineering"},{"issue":"1-3","key":"1231_CR17","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1016\/S1386-5056(02)00053-9","volume":"67","author":"U Hahn","year":"2002","unstructured":"Hahn U, Romacker M, Schulz S: MEDSYNDIKATE--a natural language system for the extraction of medical information from findings reports. Int J Med Inform 2002, 67(1\u20133):63\u201374. 10.1016\/S1386-5056(02)00053-9","journal-title":"Int J Med Inform"},{"key":"1231_CR18","volume-title":"Hierarchical Hidden Markov Models for information extraction: Acapulco, Mexico.","author":"M Skounakis","year":"2003","unstructured":"Skounakis M, Craven M, Ray S: Hierarchical Hidden Markov Models for information extraction: Acapulco, Mexico. ; 2003."},{"issue":"5","key":"1231_CR19","doi-asserted-by":"publisher","first-page":"535","DOI":"10.1038\/sj.ejhg.5201585","volume":"14","author":"MA van Driel","year":"2006","unstructured":"van Driel MA, Bruggeman J, Vriend G, Brunner HG, Leunissen JA: A text-mining analysis of the human phenome. Eur J Hum Genet 2006, 14(5):535\u2013542. 
10.1038\/sj.ejhg.5201585","journal-title":"Eur J Hum Genet"},{"key":"1231_CR20","first-page":"282","volume-title":"Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data","author":"J Lafferty","year":"2001","unstructured":"Lafferty J, McCallum A, Pereira F: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. 2001, 282\u2013289."},{"key":"1231_CR21","first-page":"403","volume-title":"Efficiently Inducing Features of Conditional Random Fields","author":"A McCallum","year":"2003","unstructured":"McCallum A: Efficiently Inducing Features of Conditional Random Fields. Edited by: Meek C, Kj\u00carulff U. Morgan Kaufmann; 2003:403\u2013410."},{"key":"1231_CR22","doi-asserted-by":"publisher","first-page":"88","DOI":"10.1186\/1471-2407-4-88","volume":"4","author":"JJ Berman","year":"2004","unstructured":"Berman JJ: Tumor taxonomy for the developmental lineage classification of neoplasms. BMC Cancer 2004, 4: 88. 10.1186\/1471-2407-4-88","journal-title":"BMC Cancer"},{"key":"1231_CR23","doi-asserted-by":"crossref","unstructured":"The Gene Ontology (GO) project in 2006 Nucleic Acids Res 2006, 34(Database issue):D322\u20136. 10.1093\/nar\/gkj021","DOI":"10.1093\/nar\/gkj021"},{"issue":"1","key":"1231_CR24","doi-asserted-by":"publisher","first-page":"D267","DOI":"10.1093\/nar\/gkh061","volume":"32","author":"O Bodenreider","year":"2004","unstructured":"Bodenreider O: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004, 32(1):D267\u201370. 10.1093\/nar\/gkh061","journal-title":"Nucleic Acids Res"},{"issue":"9","key":"1231_CR25","first-page":"273","volume":"63","author":"KK Kakazu","year":"2004","unstructured":"Kakazu KK, Cheung LW, Lynne W: The Cancer Biomedical Informatics Grid (caBIG): pioneering an expansive network of information and tools for collaborative cancer research. 
Hawaii Med J 2004, 63(9):273\u2013275.","journal-title":"Hawaii Med J"},{"key":"1231_CR26","unstructured":"Semantic type definition for malignancy[http:\/\/bioie.ldc.upenn.edu\/mamandel\/annotators\/onco\/definitions.html]"},{"key":"1231_CR27","volume-title":"Proc of BioLink 2004","author":"S Kulick","year":"2004","unstructured":"Kulick S, Bies A, Liberman M, Mandel M, McDonald R, Palmer M, Schein A, Ungar L, Winters S, White P: Integrated annotation for biomedical information extraction. Proc of BioLink 2004 2004."},{"key":"1231_CR28","volume-title":"Proc ISMB","author":"S Kulick","year":"2003","unstructured":"Kulick S, Liberman M, Palmer M, Schein A: Shallow semantic annotation of biomedical corpora for information extraction. Proc ISMB 2003."},{"key":"1231_CR29","unstructured":"Penn BioIE corpus release v0.9[http:\/\/bioie.ldc.upenn.edu]"},{"key":"1231_CR30","unstructured":"McCallum A: MALLET: A Machine Learning for Language Toolkit.[http:\/\/mallet.cs.umass.edu\/]"},{"issue":"9","key":"1231_CR31","doi-asserted-by":"publisher","first-page":"1117","DOI":"10.1097\/01.pas.0000131558.32412.40","volume":"28","author":"E Bruder","year":"2004","unstructured":"Bruder E, Passera O, Harms D, Leuschner I, Ladanyi M, Argani P, Eble JN, Struckmann K, Schraml P, Moch H: Morphologic and molecular characterization of renal cell carcinoma in children and young adults. 
Am J Surg Pathol 2004, 28(9):1117\u20131132.","journal-title":"Am J Surg Pathol"}],"container-title":["BMC Bioinformatics"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1186\/1471-2105-7-492.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,1]],"date-time":"2021-09-01T11:04:30Z","timestamp":1630494270000},"score":1,"resource":{"primary":{"URL":"https:\/\/bmcbioinformatics.biomedcentral.com\/articles\/10.1186\/1471-2105-7-492"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2006,11,7]]},"references-count":31,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2006,12]]}},"alternative-id":["1231"],"URL":"https:\/\/doi.org\/10.1186\/1471-2105-7-492","relation":{},"ISSN":["1471-2105"],"issn-type":[{"value":"1471-2105","type":"electronic"}],"subject":[],"published":{"date-parts":[[2006,11,7]]},"assertion":[{"value":"24 July 2006","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 November 2006","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"7 November 2006","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"492"}}