{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,25]],"date-time":"2025-10-25T12:29:36Z","timestamp":1761395376602,"version":"3.37.3"},"reference-count":28,"publisher":"Oxford University Press (OUP)","issue":"3","license":[{"start":{"date-parts":[[2019,1,21]],"date-time":"2019-01-21T00:00:00Z","timestamp":1548028800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/academic.oup.com\/journals\/pages\/open_access\/funder_policies\/chorus\/standard_publication_model"}],"funder":[{"name":"Intramural Research Program"},{"DOI":"10.13039\/100000092","name":"National Library of Medicine","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000092","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000002","name":"National Institutes of Health","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100000002","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2019,3,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:sec><jats:title>Objective<\/jats:title><jats:p>Automated understanding of consumer health inquiries might be hindered by misspellings. To detect and correct various types of spelling errors in consumer health questions, we developed a distributable spell-checking tool, CSpell, that handles nonword errors, real-word errors, word boundary infractions, punctuation errors, and combinations of the above.<\/jats:p><\/jats:sec><jats:sec><jats:title>Methods<\/jats:title><jats:p>We developed a novel approach of using dual embedding within Word2vec for context-dependent corrections. This technique was used in combination with dictionary-based corrections in a 2-stage ranking system. We also developed various splitters and handlers to correct word boundary infractions. 
All correction approaches are integrated to handle errors in consumer health questions.<\/jats:p><\/jats:sec><jats:sec><jats:title>Results<\/jats:title><jats:p>Our approach achieves an F1 score of 80.93% and 69.17% for spelling error detection and correction, respectively.<\/jats:p><\/jats:sec><jats:sec><jats:title>Discussion<\/jats:title><jats:p>The dual-embedding model shows a significant improvement (9.13%) in F1 score compared with the general practice of using cosine similarity with word vectors in Word2vec for context ranking. Our 2-stage ranking system shows a 4.94% improvement in F1 score compared with the best 1-stage ranking system.<\/jats:p><\/jats:sec><jats:sec><jats:title>Conclusion<\/jats:title><jats:p>CSpell improves over the state of the art and provides near real-time automatic misspelling detection and correction in consumer health questions. The software and the CSpell test set are available at https:\/\/umlslex.nlm.nih.gov\/cSpell.<\/jats:p><\/jats:sec>","DOI":"10.1093\/jamia\/ocy171","type":"journal-article","created":{"date-parts":[[2018,11,26]],"date-time":"2018-11-26T20:24:09Z","timestamp":1543263849000},"page":"211-218","source":"Crossref","is-referenced-by-count":18,"title":["Spell checker for consumer language (CSpell)"],"prefix":"10.1093","volume":"26","author":[{"given":"Chris J","family":"Lu","sequence":"first","affiliation":[{"name":"Lister Hill National Center for Biomedical Communications National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA"}]},{"given":"Alan R","family":"Aronson","sequence":"additional","affiliation":[{"name":"Lister Hill National Center for Biomedical Communications National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA"}]},{"given":"Sonya E","family":"Shooshan","sequence":"additional","affiliation":[{"name":"Lister Hill National Center for Biomedical Communications National Library of Medicine, National Institutes of Health, Bethesda, Maryland, 
USA"}]},{"given":"Dina","family":"Demner-Fushman","sequence":"additional","affiliation":[{"name":"Lister Hill National Center for Biomedical Communications National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA"}]}],"member":"286","published-online":{"date-parts":[[2019,1,21]]},"reference":[{"issue":"4","key":"2020110613061256300_ocy171-B1","doi-asserted-by":"crossref","first-page":"377","DOI":"10.1145\/146370.146380","article-title":"Techniques for automatically correcting words in texts","volume":"24","author":"Kukich","year":"1992","journal-title":"ACM Comput Surv"},{"key":"2020110613061256300_ocy171-B2","first-page":"727","article-title":"An Ensemble method for spelling correction in consumer health questions","author":"Kilicoglu","year":"2015","journal-title":"AMIA Annu Symp Proc"},{"key":"2020110613061256300_ocy171-B3","doi-asserted-by":"crossref","first-page":"210","DOI":"10.1145\/1882992.1883023","article-title":"Contextualizing consumer health information searching: an analysis of questions in a social Q&A community","volume-title":"Proceedings of the 1st ACM International Health Informatics Symposium","author":"Zhang","year":"2010"},{"issue":"3","key":"2020110613061256300_ocy171-B4","doi-asserted-by":"crossref","first-page":"171","DOI":"10.1145\/363958.363994","article-title":"A technique for computer detection and correction of spelling errors","volume":"7","author":"Damerau","year":"1964","journal-title":"Commun ACM"},{"key":"2020110613061256300_ocy171-B5","first-page":"707","article-title":"Binary codes capable of correcting deletions, insertions and reversals","volume":"10","author":"Levenshtein","year":"1966","journal-title":"Soviet Phys Doklady"},{"issue":"3","key":"2020110613061256300_ocy171-B6","doi-asserted-by":"crossref","first-page":"179","DOI":"10.1197\/jamia.M1474","article-title":"A frequency-based technique to improve the spelling suggestion rank in medical 
queries","volume":"11","author":"Crowell","year":"2004","journal-title":"J Am Med Inform Assoc"},{"key":"2020110613061256300_ocy171-B7","first-page":"751","article-title":"Identification of misspelled words without a comprehensive dictionary using prevalence analysis","author":"Turchin","year":"2007","journal-title":"AMIA Annu Symp Proc"},{"issue":"02","key":"2020110613061256300_ocy171-B8","doi-asserted-by":"crossref","first-page":"173","DOI":"10.1017\/S1351324908004804","article-title":"Ordering the suggestions of a spellchecker without using context","volume":"15","author":"Mitton","year":"2009","journal-title":"Nat Lang Eng"},{"issue":"2","key":"2020110613061256300_ocy171-B9","doi-asserted-by":"crossref","first-page":"93","DOI":"10.1007\/BF01889984","article-title":"Probability scoring for spelling correction","volume":"1","author":"Church","year":"1991","journal-title":"Stat Comput"},{"issue":"5","key":"2020110613061256300_ocy171-B10","doi-asserted-by":"crossref","first-page":"543","DOI":"10.1007\/s10791-006-9002-8","article-title":"Spelling correction in the PubMed search engine","volume":"9","author":"Wilbur","year":"2006","journal-title":"Inf Retr Boston"},{"key":"2020110613061256300_ocy171-B11","first-page":"219","article-title":"Chapter 14: natural language corpus data","volume-title":"Beautiful Data","author":"Norvig","year":"2009"},{"key":"2020110613061256300_ocy171-B12","first-page":"105","article-title":"On using context for automatic correction of non-word misspellings in student essays","volume-title":"Proceedings of the Seventh Workshop on Building Educational Applications Using NLP","author":"Flor","year":"2012"},{"key":"2020110613061256300_ocy171-B13","doi-asserted-by":"crossref","first-page":"188","DOI":"10.1016\/j.jbi.2015.04.008","article-title":"Automated misspelling detection and correction in clinical free-text records","volume":"55","author":"Lai","year":"2015","journal-title":"J Biomed 
Inform"},{"issue":"3","key":"2020110613061256300_ocy171-B14","first-page":"61","article-title":"Four types of context for automatic spelling correction","volume":"53","author":"Flor","year":"2012","journal-title":"TAL"},{"key":"2020110613061256300_ocy171-B15","doi-asserted-by":"crossref","first-page":"143","DOI":"10.18653\/v1\/W17-2317","article-title":"Unsupervised context sensitive spelling correction of clinical free-text with word and character N-Gram embeddings","volume-title":"Proceedings of the BioNLP 2017 Workshop","author":"Fivez","year":"2017"},{"key":"2020110613061256300_ocy171-B16","first-page":"170","article-title":"Effective search space reduction for spell correction using character neural embeddings","volume":"2","author":"Pande","year":"2017","journal-title":"Proc 15th Conf Eur Chapter Assoc Comput Linguist"},{"issue":"1","key":"2020110613061256300_ocy171-B17","doi-asserted-by":"crossref","first-page":"87","DOI":"10.1017\/S1351324904003560","article-title":"Correcting real-word spelling errors by restoring lexical cohesion","volume":"11","author":"Hirst","year":"2005","journal-title":"Nat Lang Eng"},{"issue":"5","key":"2020110613061256300_ocy171-B18","doi-asserted-by":"crossref","first-page":"517","DOI":"10.1016\/0306-4573(91)90066-U","article-title":"Context based spelling correction","volume":"27","author":"Mays","year":"1991","journal-title":"Inform Process Manag"},{"key":"2020110613061256300_ocy171-B19","first-page":"605","article-title":"Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model","volume-title":"Proceedings, 9th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2008)","author":"Wilcox-O\u2019Hearn","year":"17\u201323, 2008"},{"key":"2020110613061256300_ocy171-B20","first-page":"211","article-title":"A simple real-word error detection and correction using local word bigram and trigram","volume-title":"Proceedings of the 25th Conference on 
Computational Linguistics and Speech Processing (ROCLING 2013)","author":"Samanta","year":"2013"},{"key":"2020110613061256300_ocy171-B21","article-title":"Detection is the central problem in real-word spelling correction","volume-title":"arXiv","author":"Wilcox-O\u2019Hearn","year":"2014"},{"key":"2020110613061256300_ocy171-B22","article-title":"Efficient estimation of word representations in vector space","volume-title":"arXiv","author":"Mikolov","year":"2013"},{"key":"2020110613061256300_ocy171-B23","first-page":"3111","article-title":"Distributed representations of words and phrases and their compositionality","volume-title":"NIPS\u201913: Proceedings of the 26th International Conference on Neural Information Processing Systems \u2013 Volume 2","author":"Mikolov","year":"2013"},{"issue":"2","key":"2020110613061256300_ocy171-B24","first-page":"184","article-title":"UMLS knowledge for biomedical language processing","volume":"81","author":"McCray","year":"1993","journal-title":"Bull Med Libr Assoc"},{"key":"2020110613061256300_ocy171-B25","doi-asserted-by":"crossref","first-page":"77","DOI":"10.5220\/0006142000770087","article-title":"Generating a distilled N-Gram Set: Effective Lexical Multiword Building in the SPECIALIST Lexicon","volume-title":"Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2017), Volume 5","author":"Lu","year":"2017"},{"key":"2020110613061256300_ocy171-B26","article-title":"Using Element Words to Generate (Multi) Words for the SPECIALIST Lexicon","volume":"1499","author":"Lu","year":"2014","journal-title":"AMIA Annu Symp Proc"},{"key":"2020110613061256300_ocy171-B27","article-title":"Improving spelling correction with consumer health terminology","volume":"2053","author":"Lu","year":"2018","journal-title":"AMIA Annu Symp Proc"},{"key":"2020110613061256300_ocy171-B28","article-title":"Generating the MEDLINE N-Gram 
set","volume":"1569","author":"Lu","year":"2015","journal-title":"AMIA Annu Symp Proc"}],"container-title":["Journal of the American Medical Informatics Association"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/3\/211\/34151375\/ocy171.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"http:\/\/academic.oup.com\/jamia\/article-pdf\/26\/3\/211\/34151375\/ocy171.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,9,6]],"date-time":"2022-09-06T22:46:34Z","timestamp":1662504394000},"score":1,"resource":{"primary":{"URL":"https:\/\/academic.oup.com\/jamia\/article\/26\/3\/211\/5298352"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2019,1,21]]},"references-count":28,"journal-issue":{"issue":"3","published-online":{"date-parts":[[2019,1,21]]},"published-print":{"date-parts":[[2019,3,1]]}},"URL":"https:\/\/doi.org\/10.1093\/jamia\/ocy171","relation":{},"ISSN":["1067-5027","1527-974X"],"issn-type":[{"type":"print","value":"1067-5027"},{"type":"electronic","value":"1527-974X"}],"subject":[],"published-other":{"date-parts":[[2019,3]]},"published":{"date-parts":[[2019,1,21]]}}}