{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,15]],"date-time":"2026-05-15T06:28:56Z","timestamp":1778826536361,"version":"3.51.4"},"reference-count":19,"publisher":"Oxford University Press (OUP)","issue":"5","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2015,9,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Introduction Automatically identifying specific phenotypes in free-text clinical notes is critically important for the reuse of clinical data. In this study, the authors combine expert-guided feature (text) selection with one-class classification for text processing.<\/jats:p><jats:p>Objectives To compare the performance of one-class classification to traditional binary classification; to evaluate the utility of feature selection based on expert-selected salient text (snippets); and to determine the robustness of these models with respect to irrelevant surrounding text.<\/jats:p><jats:p>Methods The authors trained one-class support vector machines (1C-SVMs) and two-class SVMs (2C-SVMs) to identify notes discussing breast cancer. Manually annotated visit summary notes (88 positive and 88 negative for breast cancer) were used to compare the performance of models trained on whole notes labeled as positive or negative to models trained on expert-selected text sections (snippets) relevant to breast cancer status. Model performance was evaluated using a 70:30 split for 20 iterations and on a realistic dataset of 10\u2009000 records with a breast cancer prevalence of 1.4%.<\/jats:p><jats:p>Results When tested on a balanced experimental dataset, 1C-SVMs trained on snippets had comparable results to 2C-SVMs trained on whole notes (F\u2009=\u20090.92 for both approaches). When evaluated on a realistic imbalanced dataset, 1C-SVMs had a considerably superior performance (F\u2009=\u20090.61 vs. 
F\u2009=\u20090.17 for the best performing model) attributable mainly to improved precision (p\u2009=\u2009.88 vs. p\u2009=\u2009.09 for the best performing model).<\/jats:p><jats:p>Conclusions 1C-SVMs trained on expert-selected relevant text sections perform better than 2C-SVMs classifiers trained on either snippets or whole notes when applied to realistically imbalanced data with low prevalence of the positive class.<\/jats:p>","DOI":"10.1093\/jamia\/ocv010","type":"journal-article","created":{"date-parts":[[2015,6,11]],"date-time":"2015-06-11T11:04:18Z","timestamp":1434020658000},"page":"962-966","source":"Crossref","is-referenced-by-count":11,"title":["Expert guided natural language processing using one-class classification"],"prefix":"10.1093","volume":"22","author":[{"given":"Erel","family":"Joffe","sequence":"first","affiliation":[{"name":"School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Texas"},{"name":"Department of Hematology and Bone Marrow Transplantation, Tel Aviv Medical Center, Tel Aviv Israel"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Emily J","family":"Pettigrew","sequence":"additional","affiliation":[{"name":"Department of Computer Science, Rice University, Houston, Texas"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jorge R","family":"Herskovic","sequence":"additional","affiliation":[{"name":"School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Texas"},{"name":"The University of Texas, M.D. 
Anderson Cancer Center, Houston, Texas"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Charles F","family":"Bearden","sequence":"additional","affiliation":[{"name":"School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Texas"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Elmer V","family":"Bernstam","sequence":"additional","affiliation":[{"name":"School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Texas"},{"name":"Department of Internal Medicine, Medical School, The University of Texas Health Science Center at Houston, Houston, Texas"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"286","published-online":{"date-parts":[[2015,6,10]]},"reference":[{"key":"2020110613034459200_ocv010-B1","doi-asserted-by":"crossref","first-page":"345","DOI":"10.1017\/S026988891300043X","article-title":"One-class classification: taxonomy of study and review of techniques","volume":"29","author":"Khan","year":"2014","journal-title":"Knowl Eng Rev."},{"key":"2020110613034459200_ocv010-B2","doi-asserted-by":"crossref","first-page":"690","DOI":"10.1016\/j.eswa.2007.10.042","article-title":"Imbalanced text classification: a term weighting approach","volume":"36","author":"Liu","year":"2009","journal-title":"Expert Syst Appl."},{"key":"2020110613034459200_ocv010-B3","doi-asserted-by":"crossref","first-page":"80","DOI":"10.1145\/1007730.1007741","article-title":"Feature selection for text categorization on imbalanced data","volume":"6","author":"Zheng","year":"2004","journal-title":"ACM SIGKDD Explor Newsl."},{"key":"2020110613034459200_ocv010-B4","doi-asserted-by":"crossref","first-page":"32","DOI":"10.4304\/jcp.1.7.32-40","article-title":"Parameter optimization of kernel-based one-class classifier on imbalance learning","volume":"1","author":"Zhuang","year":"2006","journal-title":"J 
Comput."},{"key":"2020110613034459200_ocv010-B5","doi-asserted-by":"crossref","first-page":"60","DOI":"10.1145\/1007730.1007739","article-title":"Extreme re-balancing for SVMs","volume":"6","author":"Raskutti","year":"2004","journal-title":"ACM SIGKDD Explor Newsl."},{"key":"2020110613034459200_ocv010-B6","doi-asserted-by":"crossref","first-page":"1466","DOI":"10.1016\/j.neucom.2006.05.013","article-title":"One-class document classification via Neural Networks","volume":"70","author":"Manevitz","year":"2007","journal-title":"Neurocomputing."},{"key":"2020110613034459200_ocv010-B7","doi-asserted-by":"crossref","first-page":"273","DOI":"10.1007\/s10462-008-9082-5","article-title":"An evaluation of dimension reduction techniques for one-class classification","volume":"27","author":"Villalba","year":"2008","journal-title":"Artif Intell Rev."},{"key":"2020110613034459200_ocv010-B8","doi-asserted-by":"crossref","first-page":"1027","DOI":"10.1016\/j.patrec.2012.01.019","article-title":"On feature selection with principal component analysis for one-class SVM","volume":"33","author":"Lian","year":"2012","journal-title":"Pattern Recognit Lett."},{"key":"2020110613034459200_ocv010-B9"},{"key":"2020110613034459200_ocv010-B10","first-page":"260","article-title":"Using \u201cAnnotator Rationales\u201d to improve machine learning for text categorization","volume":"260","author":"Zaidan","year":"2007","journal-title":"Comput Linguist."},{"key":"2020110613034459200_ocv010-B11","first-page":"1603","article-title":"Leveraging rich annotations to improve learning of medical concepts from clinical free text","volume":"2011","author":"Yu","year":"2011","journal-title":"AMIA Annu Symp Proc."},{"key":"2020110613034459200_ocv010-B12","doi-asserted-by":"crossref","first-page":"130","DOI":"10.1108\/eb046814","article-title":"An algorithm for suffix 
stripping","volume":"14","author":"Porter","year":"1980","journal-title":"Program."},{"key":"2020110613034459200_ocv010-B13","doi-asserted-by":"crossref","first-page":"1443","DOI":"10.1162\/089976601750264965","article-title":"Estimating the support of a high-dimensional distribution","volume":"13","author":"Sch\u00f6lkopf","year":"2001","journal-title":"Neural Comput."},{"key":"2020110613034459200_ocv010-B14","first-page":"67","article-title":"Feature selection, perceptron learning, and a usability case study for text categorization","volume":"31","author":"Ng","year":"1997","journal-title":"SIGIR Forum (ACM Spec Interes Gr Inf Retrieval)."},{"key":"2020110613034459200_ocv010-B15","doi-asserted-by":"crossref","DOI":"10.1007\/978-3-540-89378-3_32","article-title":"Discriminating against new classes: one-class versus multi-class classification","volume-title":"AI 2008: Advances in Artificial Intelligence","author":"Hempstalk","year":"2008"},{"key":"2020110613034459200_ocv010-B16","doi-asserted-by":"crossref","first-page":"950","DOI":"10.1016\/j.jbi.2008.12.013","article-title":"Building a semantically annotated corpus of clinical texts","volume":"42","author":"Roberts","year":"2009","journal-title":"J Biomed Inform."},{"key":"2020110613034459200_ocv010-B17","doi-asserted-by":"crossref","first-page":"356","DOI":"10.1186\/1471-2105-7-356","article-title":"New directions in biomedical text annotation: definitions, guidelines and corpus construction","volume":"7","author":"Wilbur","year":"2006","journal-title":"BMC Bioinformatics."},{"key":"2020110613034459200_ocv010-B18","doi-asserted-by":"crossref","first-page":"327","DOI":"10.1093\/bib\/bbs084","article-title":"A survey on annotation tools for the biomedical literature","volume":"15","author":"Neves","year":"2014","journal-title":"Brief Bioinform."},{"key":"2020110613034459200_ocv010-B19","first-page":"988","article-title":"Collaborative knowledge acquisition for the design of context aware alert 
systems","volume":"19","author":"Joffe","year":"2012","journal-title":"JAMIA."}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/22\/5\/962\/34146371\/ocv010.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/22\/5\/962\/34146371\/ocv010.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,5,12]],"date-time":"2022-05-12T15:38:04Z","timestamp":1652369884000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/22\/5\/962\/929463"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2015,6,10]]},"references-count":19,"journal-issue":{"issue":"5","published-online":{"date-parts":[[2015,6,10]]},"published-print":{"date-parts":[[2015,9,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocv010","relation":{},"ISSN":["1527-974X","1067-5027"],"issn-type":[{"value":"1527-974X","type":"electronic"},{"value":"1067-5027","type":"print"}],"subject":[],"published-other":{"date-parts":[[2015,9]]},"published":{"date-parts":[[2015,6,10]]}}}