{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,3]],"date-time":"2026-05-03T23:48:58Z","timestamp":1777852138995,"version":"3.51.4"},"reference-count":53,"publisher":"SAGE Publications","issue":"1","license":[{"start":{"date-parts":[[2021,1,1]],"date-time":"2021-01-01T00:00:00Z","timestamp":1609459200000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Health Informatics J"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:p>A natural language processing (NLP) application requires sophisticated lexical resources to support its processing goals. Different solutions, such as dictionary lookup and MetaMap, have been proposed in the healthcare informatics literature to identify disease terms with more than one word (multi-gram disease named entities). Although a lot of work has been done in the identification of protein- and gene-named entities in the biomedical field, not much research has been done on the recognition and resolution of terminologies in the clinical trial subject eligibility analysis. In this study, we develop a specialized lexicon for improving NLP and text mining analysis in the breast cancer domain, and evaluate it by comparing it with the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). We use a hybrid methodology, which combines the knowledge of domain experts, terms from multiple online dictionaries, and the mining of text from sample clinical trials. Use of our methodology introduces 4243 unique lexicon items, which increase bigram entity match by 38.6% and trigram entity match by 41%. Our lexicon, which adds a significant number of new terms, is very useful for matching patients to clinical trials automatically based on eligibility matching. Beyond clinical trial matching, the specialized lexicon developed in this study could serve as a foundation for future healthcare text mining applications.<\/jats:p>","DOI":"10.1177\/1460458221989392","type":"journal-article","created":{"date-parts":[[2021,2,10]],"date-time":"2021-02-10T11:07:09Z","timestamp":1612955229000},"update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":6,"title":["Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis"],"prefix":"10.1177","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-1784-7411","authenticated-orcid":false,"given":"Euisung","family":"Jung","sequence":"first","affiliation":[{"name":"Information Operations and Technology Management, John B. and Lillian E. Neff College of Business and Innovation, The University of Toledo, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Hemant","family":"Jain","sequence":"additional","affiliation":[{"name":"Gary W. Rollins College of Business, The University of Tennessee at Chattanooga, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Atish P","family":"Sinha","sequence":"additional","affiliation":[{"name":"Lubar School of Business, University of Wisconsin-Milwaukee, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Carmelo","family":"Gaudioso","sequence":"additional","affiliation":[{"name":"Roswell Park Cancer Institute, USA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"179","published-online":{"date-parts":[[2021,2,3]]},"reference":[{"key":"bibr1-1460458221989392","unstructured":"Frank G. Current challenges in clinical trial patient recruitment and enrollment. SoCRA Source 2004; 2:30\u201338."},{"key":"bibr2-1460458221989392","doi-asserted-by":"publisher","DOI":"10.3109\/00952999209026069"},{"key":"bibr3-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1136\/jamia.1999.0060466"},{"key":"bibr4-1460458221989392","first-page":"850","volume-title":"2010 international symposium on information technology","author":"Khan A"},{"key":"bibr5-1460458221989392","doi-asserted-by":"publisher","DOI":"10.4018\/978-1-59904-373-9.ch005"},{"key":"bibr6-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmedinf.2014.06.009"},{"key":"bibr7-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-9-S3-S3"},{"key":"bibr8-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2011.06.001"},{"key":"bibr9-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1951.tb01366.x"},{"key":"bibr10-1460458221989392","volume-title":"Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition 2009","author":"Jurafsky D","year":"2009"},{"key":"bibr11-1460458221989392","first-page":"161","volume-title":"Proceedings of the third annual symposium on document analysis and information retrieval","author":"Cavnar W","year":"1994"},{"key":"bibr12-1460458221989392","volume-title":"Seventh Message Understanding Conference (MUC-7)","author":"Krupka G","year":"1998"},{"key":"bibr13-1460458221989392","volume-title":"Proceedings of the 14th European conference on artificial intelligence (ECAI 2000)","author":"Paliouras G","year":"2000"},{"key":"bibr14-1460458221989392","doi-asserted-by":"publisher","DOI":"10.3115\/974557.974586"},{"key":"bibr15-1460458221989392","volume-title":"Proceedings of the sixth workshop on very large Corpora","author":"Borthwick A","year":"1998"},{"key":"bibr16-1460458221989392","first-page":"188","volume-title":"Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4 (CONLL \u201803)","volume":"2003","author":"McCallum A"},{"key":"bibr17-1460458221989392","doi-asserted-by":"publisher","DOI":"10.3115\/1567594.1567618"},{"key":"bibr18-1460458221989392","first-page":"5905","volume-title":"2009 annual international conference of the IEEE engineering in medicine and biology society","author":"Apostolova E"},{"key":"bibr19-1460458221989392","volume-title":"Proceedings of the eighteenth conference on computational natural language learning","author":"Passos A"},{"key":"bibr20-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1001\/jama.291.22.2720"},{"issue":"192","key":"bibr21-1460458221989392","first-page":"51531","volume":"61","author":"NIH (National Institutes of Health)","year":"1996","journal-title":"Fed Reg"},{"key":"bibr22-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1093\/jnci\/87.23.1747"},{"key":"bibr23-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/S0197-2456(96)00236-X"},{"key":"bibr24-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/S0895-4356(99)00141-9"},{"key":"bibr25-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1200\/JCO.2003.02.105"},{"key":"bibr26-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.cct.2015.07.007"},{"key":"bibr27-1460458221989392","first-page":"326","volume-title":"2010 international conference on P2P, parallel, grid, cloud and internet computing","author":"Andronikou V"},{"key":"bibr28-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2009.12.004"},{"key":"bibr29-1460458221989392","volume-title":"2012 IEEE international conference on bioinformatics and biomedicine","author":"Milian K"},{"key":"bibr30-1460458221989392","first-page":"816","author":"Patel C","year":"2007","journal-title":"Matching patient records to clinical trials using ontologies. The semantic web"},{"key":"bibr31-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.ijmedinf.2011.02.003"},{"key":"bibr32-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocw009"},{"key":"bibr33-1460458221989392","doi-asserted-by":"publisher","DOI":"10.2196\/jmir.9312"},{"key":"bibr34-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2010.09.007"},{"key":"bibr35-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2012.07.006"},{"key":"bibr36-1460458221989392","first-page":"26","volume":"2010","author":"Luo Z","year":"2010","journal-title":"Summit Transl Bioinform"},{"key":"bibr37-1460458221989392","doi-asserted-by":"publisher","DOI":"10.3414\/ME12-01-0092"},{"key":"bibr38-1460458221989392","first-page":"136","volume":"2009","author":"Wilcox A","year":"2009","journal-title":"Summit Transl Bioinform"},{"key":"bibr39-1460458221989392","doi-asserted-by":"publisher","DOI":"10.2196\/13331"},{"key":"bibr40-1460458221989392","first-page":"1","volume":"3","author":"Zeng J","year":"2019","journal-title":"JCO Clin Cancer Inform"},{"key":"bibr41-1460458221989392","first-page":"587","volume":"2010","author":"Parai GK","year":"2010","journal-title":"AMIA Annu Symp Proc"},{"key":"bibr42-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1186\/1471-2105-12-397"},{"key":"bibr43-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1371\/journal.pone.0175277"},{"key":"bibr44-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1142\/S0219720010004513"},{"key":"bibr45-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1016\/j.artmed.2011.04.011"},{"key":"bibr46-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1093\/jamia\/ocx152"},{"key":"bibr47-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1109\/JBHI.2013.2257815"},{"key":"bibr48-1460458221989392","unstructured":"Medical Library Association. Recommended websites for cancer information, https:\/\/www.mlanet.org\/p\/cm\/ld\/fid=909 (accessed 20 January 2020)."},{"key":"bibr49-1460458221989392","unstructured":"SNOMED International. SNOMED CT, https:\/\/www.snomed.org\/ (accessed 20 January 2020)."},{"key":"bibr50-1460458221989392","unstructured":"National Library of Medicine. SNOMED CT, https:\/\/www.nlm.nih.gov\/healthit\/snomedct\/index.html (accessed 20 January 2020)."},{"key":"bibr51-1460458221989392","doi-asserted-by":"crossref","unstructured":"Bhattacharyya SB. SNOMED CT basics. Introduction to SNOMED CT. Springer Singapore, 2015, pp.25\u201360.","DOI":"10.1007\/978-981-287-895-3_4"},{"key":"bibr52-1460458221989392","doi-asserted-by":"publisher","DOI":"10.4137\/BII.S11664"},{"key":"bibr53-1460458221989392","doi-asserted-by":"publisher","DOI":"10.1136\/jamia.2009.001024"}],"container-title":["Health Informatics Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1460458221989392","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1460458221989392","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1460458221989392","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,4,29]],"date-time":"2026-04-29T22:31:22Z","timestamp":1777501882000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1460458221989392"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,1]]},"references-count":53,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["10.1177\/1460458221989392"],"URL":"https:\/\/doi.org\/10.1177\/1460458221989392","relation":{},"ISSN":["1460-4582","1741-2811"],"issn-type":[{"value":"1460-4582","type":"print"},{"value":"1741-2811","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,1]]},"article-number":"1460458221989392"}}