{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T02:18:42Z","timestamp":1692670722007},"reference-count":28,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2009,1,1]],"date-time":"2009-01-01T00:00:00Z","timestamp":1230768000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2009,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Evaluating interactive question answering (QA) systems with real users can be challenging because traditional evaluation measures based on the relevance of items returned are difficult to employ since relevance judgments can be unstable in multi-user evaluations. The work reported in this paper evaluates, in distinguishing among a set of interactive QA systems, the effectiveness of three questionnaires: a Cognitive Workload Questionnaire (NASA TLX), and Task and System Questionnaires customized to a specific interactive QA application. These Questionnaires were evaluated with four systems, seven analysts, and eight scenarios during a 2-week workshop. Overall, results demonstrate that all three Questionnaires are effective at distinguishing among systems, with the Task Questionnaire being the most sensitive. Results also provide initial support for the validity and reliability of the Questionnaires.<\/jats:p>","DOI":"10.1017\/s1351324908004932","type":"journal-article","created":{"date-parts":[[2008,11,6]],"date-time":"2008-11-06T03:58:27Z","timestamp":1225943907000},"page":"119-141","source":"Crossref","is-referenced-by-count":7,"title":["Questionnaires for eliciting evaluation data from users of interactive question answering systems"],"prefix":"10.1017","volume":"15","author":[{"given":"D.","family":"KELLY","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"P. B.","family":"KANTOR","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"E. L.","family":"MORSE","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"J.","family":"SCHOLTZ","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Y.","family":"SUN","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2009,1,1]]},"reference":[{"key":"S1351324908004932_ref27","doi-asserted-by":"crossref","unstructured":"Walker M. A. , Litman D. J. , Kamm C. A. , and Abella A. 1997. PARADISE: A framework for evaluating spoken dialogue agents. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL '97), Madrid, Spain, pp. 271\u2013280.","DOI":"10.3115\/976909.979652"},{"key":"S1351324908004932_ref26","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20560"},{"key":"S1351324908004932_ref25","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Voorhees","year":"2005"},{"key":"S1351324908004932_ref19","first-page":"3","volume-title":"New Directions in Question Answering","author":"Maybury","year":"2004"},{"key":"S1351324908004932_ref18","first-page":"477","volume-title":"Advances in Open Domain Question Answering","author":"Maiorano","year":"2006"},{"key":"S1351324908004932_ref17","first-page":"1","article-title":"A technique for the measurement of attitudes","volume":"140","author":"Likert","year":"1932","journal-title":"Archives of Psychology"},{"key":"S1351324908004932_ref16","doi-asserted-by":"crossref","unstructured":"Liddy E. D. , Diekema A. R. , and Yilmazel O. 2004. Context-based question-answering evaluation. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '04), Sheffield, UK, pp. 508\u2013509.","DOI":"10.1145\/1008992.1009094"},{"key":"S1351324908004932_ref15","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20575"},{"key":"S1351324908004932_ref10","doi-asserted-by":"crossref","unstructured":"Harabagiu S. , Hickl A. , Lehmann J. , and Moldovan D. 2005. Experiments with interactive question-answering. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), Ann Arbor, MI, pp. 205\u2013214.","DOI":"10.3115\/1219840.1219866"},{"key":"S1351324908004932_ref8","first-page":"141","volume-title":"New Directions in Question Answering","author":"Diekema","year":"2004"},{"key":"S1351324908004932_ref7","unstructured":"den Os E. , and Bloothooft G. 1998. Evaluating various spoken language dialogue systems with a single questionnaire: analysis of the ELSNET Olympics. In Proceedings of the First International Conference on Language Resources and Evaluation (LREC '98), Granada, Spain, pp. 51\u201354."},{"key":"S1351324908004932_ref6","volume-title":"TREC2006, Proceedings of the Fifteenth Text Retrieval Conference","author":"Dang","year":"2007"},{"key":"S1351324908004932_ref5","doi-asserted-by":"crossref","unstructured":"Cowley P. , Nowell L. , and Scholtz J. 2005. Glass box: an instrumented infrastructure for supporting human interaction with information. In Proceedings of the 38th Hawaii International Conference on System Sciences, Waikoloa, Hawaii.","DOI":"10.1109\/HICSS.2005.286"},{"key":"S1351324908004932_ref2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.10286"},{"key":"S1351324908004932_ref1","article-title":"The IIR evaluation model: a framework for evaluation of interactive information retrieval systems","volume":"8","author":"Borlund","year":"2003","journal-title":"Information Research"},{"key":"S1351324908004932_ref3","doi-asserted-by":"crossref","unstructured":"Chin J. P. , Diehl V. A. , and Norman K. L. 1988. Development of an instrument measuring user satisfaction of the human\u2013computer interface. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI '88), Washington, DC, pp. 213\u2013218.","DOI":"10.1145\/57167.57203"},{"key":"S1351324908004932_ref4","doi-asserted-by":"crossref","unstructured":"Cowley P. , Haack J. , Littlefield R. , and Hampson E. 2006. Glass box: capturing, archiving and retrieving workstation activities. In Proceedings of the second ACM Workshop on Continuous Archival and Retrieval of Personal Experiences (CARPE '05), Santa Barbara, CA, pp. 13\u201318.","DOI":"10.1145\/1178657.1178662"},{"key":"S1351324908004932_ref13","doi-asserted-by":"publisher","DOI":"10.1016\/S0306-4573(00)00052-2"},{"key":"S1351324908004932_ref20","unstructured":"Small S. , Strzalkowski T. , Janack T. , Liu T. , Ryan S. , Salkin R. , Shimizu N. , Kantor P. , Kelly D. , Rittman R. , Wacholder N. , and Yamrom B. 2004. HITIQA: scenario-based question answering. In Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004, Boston, MA, pp. 52\u201359."},{"key":"S1351324908004932_ref14","doi-asserted-by":"crossref","unstructured":"Kelly D. , Kantor P. , Morse E. L. , Scholtz J. , and Sun Y. 2006. User-centered evaluation of interactive question answering systems. In Proceedings of the Workshop on Interactive Question Answering at the Human Language Technology Conference (HLT-NAACL '06), New York, NY.","DOI":"10.3115\/1641579.1641586"},{"key":"S1351324908004932_ref21","volume-title":"Advances in Open Domain Question Answering","author":"Strzalkowski","year":"2006"},{"key":"S1351324908004932_ref23","doi-asserted-by":"publisher","DOI":"10.1016\/0306-4573(92)90005-K"},{"key":"S1351324908004932_ref11","doi-asserted-by":"publisher","DOI":"10.1016\/S0166-4115(08)62386-9"},{"key":"S1351324908004932_ref9","first-page":"123","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Dumais","year":"2005"},{"key":"S1351324908004932_ref24","first-page":"233","volume-title":"TREC: Experiment and Evaluation in Information Retrieval","author":"Voorhees","year":"2005"},{"key":"S1351324908004932_ref22","doi-asserted-by":"publisher","DOI":"10.1002\/asi.20324"},{"key":"S1351324908004932_ref28","volume-title":"The Lunar Sciences Natural Language Information System: Final Report, BBN Report 2378","author":"Woods","year":"1972"},{"key":"S1351324908004932_ref12","first-page":"431","volume-title":"Advances in Open Domain Question Answering","author":"Hersh","year":"2006"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324908004932","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2019,4,9]],"date-time":"2019-04-09T16:38:52Z","timestamp":1554827932000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324908004932\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,1]]},"references-count":28,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2009,1]]}},"alternative-id":["S1351324908004932"],"URL":"https:\/\/doi.org\/10.1017\/s1351324908004932","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,1]]}}}