{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,8]],"date-time":"2025-10-08T22:24:55Z","timestamp":1759962295511,"version":"3.38.0"},"reference-count":50,"publisher":"SAGE Publications","issue":"3","license":[{"start":{"date-parts":[[2016,7,26]],"date-time":"2016-07-26T00:00:00Z","timestamp":1469491200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/journals.sagepub.com\/page\/policies\/text-and-data-mining-license"}],"content-domain":{"domain":["journals.sagepub.com"],"crossmark-restriction":true},"short-container-title":["Health Informatics J"],"published-print":{"date-parts":[[2016,9]]},"abstract":"<jats:p> This article examines methods for automated question classification applied to cancer-related questions that people have asked on the web. This work is part of a broader effort to provide automated question answering for health education. We created a new corpus of consumer-health questions related to cancer and a new taxonomy for those questions. We then compared the effectiveness of different statistical methods for developing classifiers, including weighted classification and resampling. Basic methods for building classifiers were limited by the high variability in the natural distribution of questions and typical refinement approaches of feature selection and merging categories achieved only small improvements to classifier accuracy. Best performance was achieved using weighted classification and resampling methods, the latter yielding an accuracy of F1\u2009=\u20090.963. Thus, it would appear that statistical classifiers can be trained on natural data, but only if natural distributions of classes are smoothed. Such classifiers would be useful for automated question answering, for enriching web-based content, or assisting clinical professionals to answer questions. <\/jats:p>","DOI":"10.1177\/1460458215571643","type":"journal-article","created":{"date-parts":[[2015,3,11]],"date-time":"2015-03-11T01:31:57Z","timestamp":1426037517000},"page":"523-535","update-policy":"https:\/\/doi.org\/10.1177\/sage-journals-update-policy","source":"Crossref","is-referenced-by-count":10,"title":["Toward automated classification of consumers\u2019 cancer-related questions with a new taxonomy of expected answer types"],"prefix":"10.1177","volume":"22","author":[{"given":"Susan","family":"McRoy","sequence":"first","affiliation":[{"name":"University of Wisconsin\u2013Milwaukee, USA"}]},{"given":"Sean","family":"Jones","sequence":"additional","affiliation":[{"name":"University of Wisconsin\u2013Milwaukee, USA"}]},{"given":"Adam","family":"Kurmally","sequence":"additional","affiliation":[{"name":"University of Wisconsin\u2013Milwaukee, USA"}]}],"member":"179","published-online":{"date-parts":[[2016,7,26]]},"reference":[{"first-page":"556","volume-title":"The 19th international conference on computational linguistics (COLING \u201902)","author":"Li X","key":"bibr1-1460458215571643"},{"volume-title":"The 9th text retrieval conference (TREC-9) (NIST special publication 500-249)","author":"Ittycheriah A","key":"bibr2-1460458215571643"},{"key":"bibr3-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2007.33.1.63"},{"first-page":"99","volume-title":"The 2nd international conference on human language technology research","author":"Zheng Z","key":"bibr4-1460458215571643"},{"key":"bibr5-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1017\/S1351324905003955"},{"volume-title":"Question answering using document tagging and question classification","year":"2005","author":"Dubien S","key":"bibr6-1460458215571643"},{"key":"bibr7-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1098\/rsta.2012.0513"},{"key":"bibr8-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1016\/j.jbi.2013.08.011"},{"key":"bibr9-1460458215571643","doi-asserted-by":"publisher","DOI":"10.3414\/ME12-02-0005"},{"first-page":"1","volume-title":"The 1st international conference on human language technology research","author":"Hovy E","key":"bibr10-1460458215571643"},{"key":"bibr11-1460458215571643","volume-title":"Speech and language processing","author":"Jurafsky D","year":"2009","edition":"2"},{"key":"bibr12-1460458215571643","volume-title":"Pattern classification","author":"Duda RO","year":"2001","edition":"2"},{"first-page":"497","volume-title":"The 22nd international conference on computational linguistics","author":"Liu Y","key":"bibr13-1460458215571643"},{"key":"bibr14-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1609\/aimag.v31i3.2303"},{"key":"bibr15-1460458215571643","unstructured":"Horowitz B. IBM, WellPoint developing health care applications for Watson. eWeek, http:\/\/www.eweek.com\/c\/a\/Health-Care-IT\/IBM-WellPoint-Developing-Health-Care-Applications-for-Watson-417553\/ (2011, accessed 2 December 2014)."},{"key":"bibr16-1460458215571643","unstructured":"IBM. USAA and IBM join forces to serve military members, http:\/\/www-03.ibm.com\/press\/us\/en\/pressrelease\/44431.wss (2014, accessed 2 December 2014)."},{"key":"bibr17-1460458215571643","doi-asserted-by":"publisher","DOI":"10.12968\/bjcn.2008.13.4.29024"},{"key":"bibr18-1460458215571643","unstructured":"Medela. Ask The LC, http:\/\/www.medelabreastfeedingus.com\/ask-the-lc (2014, accessed 23 September 2014)."},{"key":"bibr19-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1016\/j.pec.2013.04.016"},{"key":"bibr20-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1177\/1557988314539502"},{"key":"bibr21-1460458215571643","doi-asserted-by":"publisher","DOI":"10.17294\/2330-0698.1039"},{"key":"bibr22-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-79721-0_66"},{"key":"bibr23-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1016\/S0377-2217(00)00244-7"},{"key":"bibr24-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1111\/j.0824-7935.2004.t01-1-00228.x"},{"key":"bibr25-1460458215571643","unstructured":"About.com. All experts, http:\/\/www.allexperts.com\/ (2011, accessed 12 September 2011)."},{"key":"bibr26-1460458215571643","unstructured":"American Society of Clinical Oncology. ASCO expert chats, http:\/\/connection.asco.org\/ (2011, accessed 12 September 2011)."},{"key":"bibr27-1460458215571643","unstructured":"The Cleveland Clinic. Cleveland clinic live chats, http:\/\/my.clevelandclinic.org\/multimedia\/transcripts\/default.aspx (2011, accessed 12 September 2011)."},{"key":"bibr28-1460458215571643","unstructured":"MedHelp. Med help forums, http:\/\/www.medhelp.org\/ (2011, accessed 12 September 2011)."},{"key":"bibr29-1460458215571643","unstructured":"University of Cincinnati. Net Wellness questions, http:\/\/www.netwellness.org (2011, accessed 12 September 2011)."},{"key":"bibr30-1460458215571643","unstructured":"Your Cancer Questions, https:\/\/web.archive.org\/web\/20100401190243\/http:\/\/yourcancerquestions.com\/ (2011, accessed 12 September 2011)."},{"key":"bibr31-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1037\/h0031619"},{"key":"bibr32-1460458215571643","unstructured":"Krippendorff K. Computing Krippendorff\u2019s alpha-reliability, http:\/\/repository.upenn.edu\/asc_papers\/43\/ (2011, accessed 1 December 2014)."},{"key":"bibr33-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"bibr34-1460458215571643","unstructured":"The Apache Software Foundation. Lucene. http:\/\/lucene.apache.org\/ (2014, accessed 1 December 2014)."},{"first-page":"394","volume-title":"The 16th Nordic conference on computational linguistics (NODALIDA 2007)","author":"Sundblad H","key":"bibr35-1460458215571643"},{"first-page":"26","volume-title":"The 26th international ACM\/SIGIR conference on research and development in information retrieval (SIGIR \u201903)","author":"Zhang D","key":"bibr36-1460458215571643"},{"key":"bibr37-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1109\/TSMCB.2012.2187280"},{"key":"bibr38-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1142\/S0218001407005703"},{"first-page":"338","volume-title":"The 11th conference on uncertainty in artificial intelligence","author":"John G","key":"bibr39-1460458215571643"},{"volume-title":"The AAAI-98 workshop on learning for text categorization","author":"McCallum A","key":"bibr40-1460458215571643"},{"key":"bibr41-1460458215571643","doi-asserted-by":"publisher","DOI":"10.1016\/B978-0-08-050058-4.50007-3"},{"journal-title":"Sequential minimal optimization: a fast algorithm for training support vector machines","author":"Platt JC","key":"bibr42-1460458215571643"},{"key":"bibr43-1460458215571643","unstructured":"EL-Manzalawy Y, Honavar V. WLSVM: integrating LibSVM into Weka environment, http:\/\/perun.pmf.uns.ac.rs\/radovanovic\/dmsem\/cd\/install\/LIBSVM\/WLSVM\/wlsvm.htm (2005, accessed 1 December 2014)."},{"journal-title":"Support vector machines for text categorization based on latent semantic indexing","year":"2003","author":"Huang Y","key":"bibr44-1460458215571643"},{"key":"bibr45-1460458215571643","unstructured":"Kuenning G. International Ispell, http:\/\/fmg-www.cs.ucla.edu\/geoff\/ispell.html (1996, accessed 1 December 2014)."},{"key":"bibr46-1460458215571643","unstructured":"Atkinson K. SCOWL (And Friends), http:\/\/wordlist.sourceforge.net\/ (2014, accessed 1 December 2014)."},{"key":"bibr47-1460458215571643","first-page":"859","volume":"2005","author":"Zeng QT","year":"2005","journal-title":"AMIA Annu Symp Proc"},{"first-page":"552","volume-title":"The 6th International Joint Conference on Natural Language Processing","author":"Juan X","key":"bibr48-1460458215571643"},{"key":"bibr49-1460458215571643","unstructured":"Stemler SE. A comparison of consensus, consistency, and measurement approaches to estimating inter-rater reliability. Practical Assess Res Eval 2004; 9(4), http:\/\/pareonline.net\/getvn.asp?v=9&n=4 (accessed 1 December 2014)."},{"first-page":"27","volume-title":"The 20th national conference on artificial intelligence (AAAI-05): the workshop on question answering in restricted domains","author":"Yu H","key":"bibr50-1460458215571643"}],"container-title":["Health Informatics Journal"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1460458215571643","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/full-xml\/10.1177\/1460458215571643","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/journals.sagepub.com\/doi\/pdf\/10.1177\/1460458215571643","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,3,1]],"date-time":"2025-03-01T18:05:26Z","timestamp":1740852326000},"score":1,"resource":{"primary":{"URL":"https:\/\/journals.sagepub.com\/doi\/10.1177\/1460458215571643"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2016,7,26]]},"references-count":50,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2016,9]]}},"alternative-id":["10.1177\/1460458215571643"],"URL":"https:\/\/doi.org\/10.1177\/1460458215571643","relation":{},"ISSN":["1460-4582","1741-2811"],"issn-type":[{"type":"print","value":"1460-4582"},{"type":"electronic","value":"1741-2811"}],"subject":[],"published":{"date-parts":[[2016,7,26]]}}}