{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,9,13]],"date-time":"2023-09-13T20:15:40Z","timestamp":1694636140050},"reference-count":37,"publisher":"Cambridge University Press (CUP)","issue":"1","license":[{"start":{"date-parts":[[2009,4,22]],"date-time":"2009-04-22T00:00:00Z","timestamp":1240358400000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2010,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>We investigate the use of different machine learning methods in combination with feature selection techniques to explore human multimodal dialogue strategies and the use of those strategies for automated dialogue systems. We learn policies from data collected in a Wizard-of-Oz study where different human \u2018wizards\u2019 decide whether to ask a clarification request in a multimodal manner or else to use speech alone. We first describe the data collection, the coding scheme and annotated corpus, and the validation of the multimodal annotations. We then show that there is a uniform multimodal dialogue strategy across wizards, which is based on multiple features in the dialogue context. These are generic features, available at runtime, which can be implemented in dialogue systems. Our prediction models (for human wizard behaviour) achieve a weighted f-score of 88.6 per cent (which is a 25.6 per cent improvement over the majority baseline). We interpret and discuss the learned strategy. We conclude that human wizard behaviour is not optimal for automatic dialogue systems, and argue for the use of automatic optimization methods, such as Reinforcement Learning. 
Throughout the investigation we also discuss the issues arising from using small initial Wizard-of-Oz data sets, and we show that feature engineering is an essential step when learning dialogue strategies from such limited data.<\/jats:p>","DOI":"10.1017\/s1351324909005099","type":"journal-article","created":{"date-parts":[[2009,4,22]],"date-time":"2009-04-22T09:11:31Z","timestamp":1240391491000},"page":"3-23","source":"Crossref","is-referenced-by-count":9,"title":["Learning human multimodal dialogue strategies"],"prefix":"10.1017","volume":"16","author":[{"given":"V.","family":"RIESER","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"O.","family":"LEMON","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2009,4,22]]},"reference":[{"key":"S1351324909005099_ref22","volume-title":"Current and New Directions in Discourse and Dialogue","author":"Purver","year":"2003"},{"key":"S1351324909005099_ref2","doi-asserted-by":"publisher","DOI":"10.3758\/BF03195511"},{"key":"S1351324909005099_ref32","doi-asserted-by":"crossref","unstructured":"Schlangen D. , and Fernandez R. 2007. Speaking through a noisy channel: experiments on inducing clarification behaviour in human\u2013human dialogue. In Interspeech.","DOI":"10.21437\/Interspeech.2007-397"},{"key":"S1351324909005099_ref4","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9780511620539"},{"key":"S1351324909005099_ref11","volume-title":"Proceedings of the 11th UAI-95","author":"John","year":"1995"},{"key":"S1351324909005099_ref23","volume-title":"C4.5: Programs for Machine Learning","author":"Quinlan","year":"1993"},{"key":"S1351324909005099_ref5","doi-asserted-by":"crossref","unstructured":"Cohen W. W. 1995. Fast effective rule induction. 
In Proceedings of the 12th ICML-95.","DOI":"10.1016\/B978-1-55860-377-6.50023-2"},{"key":"S1351324909005099_ref17","doi-asserted-by":"crossref","unstructured":"Lemon O. , Georgila K. , and Henderson J. 2006. Evaluating effectiveness and portability of reinforcement learned dialogue strategies with real users: the TALK TownInfo evaluation. In IEEE\/ACL Spoken Language Technology.","DOI":"10.1109\/SLT.2006.326774"},{"key":"S1351324909005099_ref8","unstructured":"Fayyad U. , and Irani K. 1993. Multi-interval discretization of continuous valued attributes for classification learning. In Proc. IJCAI-93."},{"key":"S1351324909005099_ref35","doi-asserted-by":"publisher","DOI":"10.1207\/s15516709cog2805_8"},{"key":"S1351324909005099_ref3","first-page":"13","article-title":"The reliability of a dialogue structure coding scheme","volume":"1","author":"Carletta","year":"1997","journal-title":"Computational Linguistics"},{"key":"S1351324909005099_ref7","doi-asserted-by":"crossref","unstructured":"Daelemans W. , Hoste V. , De Meulder, F. , and Naudts B. 2003. Combined optimization of feature selection and algorithm parameter interaction in machine learning of language. In Proceedings of the 14th ECML-03.","DOI":"10.1007\/978-3-540-39857-8_10"},{"key":"S1351324909005099_ref1","first-page":"249","article-title":"Assessing agreement on classification tasks: the kappa statistic","volume":"2","author":"Carletta","year":"1996","journal-title":"Computational Linguistics"},{"key":"S1351324909005099_ref29","doi-asserted-by":"crossref","unstructured":"Rieser V. , and Moore J. 2005. Implications for generating clarification requests in task-oriented dialogues. In Proceedings of the 43rd ACL.","DOI":"10.3115\/1219840.1219870"},{"key":"S1351324909005099_ref27","unstructured":"Rieser V. , and Lemon O. 2008. Learning effective multimodal dialogue strategies from Wizard-of-Oz data: bootstrapping and evaluation. 
In Proceedings of ACL."},{"key":"S1351324909005099_ref14","doi-asserted-by":"crossref","unstructured":"Kruijff-Korbayov\u00e1 I. , Rieser V. , Gerstenberger C. , Schehl J. , and Becker T. 2006b. The Sammie multimodal dialogue corpus meets the Nite XML Toolkit. In Proceedings of the Fifth Workshop on multi-dimensional Markup in Natural Language Processing.","DOI":"10.3115\/1621034.1621047"},{"key":"S1351324909005099_ref16","unstructured":"Le Z. 2003. Maximum Entropy Modeling Toolkit for Python and C++. homepages.inf.ed.ac.uk\/s0450736\/maxent_toolkit.html."},{"key":"S1351324909005099_ref26","doi-asserted-by":"crossref","unstructured":"Rieser V. , and Lemon O. 2006. Utilising machine learning to explore human multimodal clarification strategies. In Proceedings of the 44th Annual Meeting of the Association for Computational Linguistics, COLING\/ACL.","DOI":"10.3115\/1273073.1273158"},{"key":"S1351324909005099_ref10","doi-asserted-by":"publisher","DOI":"10.1162\/coli.2008.07-028-R2-05-82"},{"key":"S1351324909005099_ref31","unstructured":"Salmen A. 2002. Multimodale Men\u00fcausgabe im Fahrzeug (Multimodal Menu-based Interaction in the Vehicle). Ph.D. thesis, University of Regensburg."},{"key":"S1351324909005099_ref13","unstructured":"Kruijff-Korbayov\u00e1 I. , Blaylock N. , Gerstenberger C. , Rieser V. , Becker T. , Kaisser M. , Poller P. , and Schehl J. 2005. An experiment setup for collecting data for adaptive output planning in a multimodal dialogue system. In 10th European Workshop on NLG."},{"key":"S1351324909005099_ref9","unstructured":"Hall M. 2000. Correlation-based feature selection for discrete and numeric class machine learning. In Proc. 17th Int Conf. on Machine Learning."},{"key":"S1351324909005099_ref12","unstructured":"Kruijff-Korbayov\u00e1 I. , Becker T. , Blaylock N. , Gerstenberger C. , Kaisser M. , Poller P. , Rieser V. , and Schehl J. 2006a. The SAMMIE corpus of multimodal dialogues with an MP3 player. 
In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC)."},{"key":"S1351324909005099_ref19","unstructured":"Mattes S. 2003. The lane-change-task as a tool for driver distraction evaluation. In Proc. of IGfA."},{"key":"S1351324909005099_ref36","doi-asserted-by":"crossref","unstructured":"Winterboer A. , Hu J. , Moore J. D. , and Nass C. 2007. The influence of user tailoring and cognitive load on user performance in spoken dialogue systems. In Proc. ICSLP.","DOI":"10.21437\/Interspeech.2007-67"},{"key":"S1351324909005099_ref34","doi-asserted-by":"crossref","unstructured":"Stuttle M. N. , Williams J. D. , and Young S. 2004. A framework for dialogue data collection with a simulated ASR Channel. In ICSLP.","DOI":"10.21437\/Interspeech.2004-128"},{"key":"S1351324909005099_ref15","doi-asserted-by":"crossref","unstructured":"Langley P. , and Sage S. 1994. Induction of selective Bayesian classifiers. In Proceedings of the 10th UAI-94.","DOI":"10.1016\/B978-1-55860-332-5.50055-9"},{"key":"S1351324909005099_ref18","unstructured":"Lemon O. , Georgila K. , Henderson J. , Gabsdil M. , Meza-Ruiz I. , and Young S. 2005. Deliverable D4.1: integration of learning and adaptivity with the ISU approach. Technical report, TALK Project, www.talk-project.org."},{"key":"S1351324909005099_ref37","volume-title":"Data Mining: Practical Machine Learning Tools and Techniques (2nd Edition)","author":"Witten","year":"2005"},{"key":"S1351324909005099_ref20","volume-title":"Advances in Computers","author":"Oviatt","year":"2002"},{"key":"S1351324909005099_ref6","doi-asserted-by":"publisher","DOI":"10.1162\/089120105774321109"},{"key":"S1351324909005099_ref24","unstructured":"Rieser V. 2008. Bootstrapping Reinforcement Learning-based Dialogue Strategies from Wizard-of-Oz data. Ph.D. 
thesis, Saarbruecken Dissertations in Computational Linguistics and Language Technology, Vol 28."},{"key":"S1351324909005099_ref33","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2004.11.005"},{"key":"S1351324909005099_ref21","doi-asserted-by":"crossref","unstructured":"Oviatt S. , Coulston R. , and Lunsford R. 2004. When do we interact Multimodally? Cognitive load and multimodal communication patterns. In Proceedings of the 6th ICMI-04.","DOI":"10.1145\/1027933.1027957"},{"key":"S1351324909005099_ref30","unstructured":"Rodriguez K. , and Schlangen D. 2004. Form, intonation and function of clarification requests in German task-oriented spoken dialogues. In Proceedings of the Eighth Workshop on Formal Semantics and Dialogue."},{"key":"S1351324909005099_ref25","unstructured":"Rieser V. , Kruijff-Korbayov\u00e1 I. , and Lemon O. 2005. A corpus collection and annotation framework for learning multimodal clarification strategies. In Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue."},{"key":"S1351324909005099_ref28","doi-asserted-by":"crossref","unstructured":"Rieser V. , and Lemon O. 2009. Natural language generation as planning under uncertainty for spoken dialogue system. 
In Proceedings of EACL.","DOI":"10.3115\/1609067.1609143"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324909005099","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,10,4]],"date-time":"2021-10-04T10:56:34Z","timestamp":1633344994000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324909005099\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2009,4,22]]},"references-count":37,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2010,1]]}},"alternative-id":["S1351324909005099"],"URL":"https:\/\/doi.org\/10.1017\/s1351324909005099","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2009,4,22]]}}}