{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,18]],"date-time":"2026-06-18T02:38:30Z","timestamp":1781750310189,"version":"3.54.5"},"reference-count":48,"publisher":"Maximum Academic Press","issue":"1","license":[{"start":{"date-parts":[[2012,11,28]],"date-time":"2012-11-28T00:00:00Z","timestamp":1354060800000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["The Knowledge Engineering Review"],"published-print":{"date-parts":[[2013,3]]},"abstract":"<jats:title>Abstract<\/jats:title>\n                  <jats:p>User simulation is an important research area in the field of spoken dialogue systems (SDSs) because collecting and annotating real human\u2013machine interactions is often expensive and time-consuming. However, such data are generally required for designing, training and assessing dialogue systems. User simulations are especially needed when using machine learning methods for optimizing dialogue management strategies such as Reinforcement Learning, where the amount of data necessary for training is larger than existing corpora. The quality of the user simulation is therefore of crucial importance because it dramatically influences the results in terms of SDS performance analysis and the learnt strategy. Assessment of the quality of simulated dialogues and user simulation methods is an open issue and, although assessment metrics are required, there is no commonly adopted metric. 
In this paper, we give a survey of User Simulations Metrics in the literature, propose some extensions and discuss these metrics in terms of a list of desired features.<\/jats:p>","DOI":"10.1017\/s0269888912000343","type":"journal-article","created":{"date-parts":[[2012,11,28]],"date-time":"2012-11-28T06:36:13Z","timestamp":1354084573000},"page":"59-73","source":"Crossref","is-referenced-by-count":47,"title":["A survey on metrics for the evaluation of user simulations"],"prefix":"10.48130","volume":"28","author":[{"given":"Olivier","family":"Pietquin","sequence":"first","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Helen","family":"Hastie","sequence":"additional","affiliation":[],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"27968","published-online":{"date-parts":[[2012,11,28]]},"reference":[{"key":"S0269888912000343_ref18","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2009.03.002"},{"key":"S0269888912000343_ref11","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888909990166"},{"key":"S0269888912000343_ref45","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2006.06.008"},{"key":"S0269888912000343_ref21","doi-asserted-by":"publisher","DOI":"10.1109\/89.817450"},{"key":"S0269888912000343_ref41","volume-title":"Reinforcement Learning: An Introduction","author":"Sutton","year":"1998"},{"key":"S0269888912000343_ref34","doi-asserted-by":"crossref","unstructured":"Schatzmann J. , Georgila K. , Young S. 2005a. Quantitative evaluation of user simulation techniques for spoken dialogue systems. In Proceedings of SIGdial'05. Lisbon, Portugal.","DOI":"10.21437\/Interspeech.2006-160"},{"key":"S0269888912000343_ref39","unstructured":"Scheffler K. , Young S. 2001. Corpus-based dialogue simulation for automatic strategy learning and evaluation. In Proceedings of NAACL Workshop on Adaptation in Dialogue Systems. 
Pittsburgh, PA, USA."},{"key":"S0269888912000343_ref48","doi-asserted-by":"publisher","DOI":"10.1023\/A:1011175525451"},{"key":"S0269888912000343_ref30","doi-asserted-by":"crossref","unstructured":"Rieser V. , Lemon O. 2006. Simulations for learning dialogue strategies. In Proceedings of Interspeech 2006, Pittsburgh, USA.","DOI":"10.21437\/Interspeech.2006-489"},{"key":"S0269888912000343_ref32","unstructured":"Rieser V. 2008. Bootstrapping Reinforcement Learning-based Dialogue Strategies from Wizard-of-Oz data. PhD thesis, Saarland University, Department of Computational Linguistics."},{"key":"S0269888912000343_ref2","doi-asserted-by":"crossref","unstructured":"Ai H. , Litman D. 2009. Setting up user action probabilities in user simulations for dialog system development. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), Singapore.","DOI":"10.3115\/1690219.1690271"},{"key":"S0269888912000343_ref36","doi-asserted-by":"crossref","unstructured":"Schatzmann J. , Thomson B. , Weilhammer K. , Ye H. , Young S. 2007a. Agenda-based user simulation for bootstrapping a POMDP dialogue system. In Proceedings of ICASSP'07. Honolulu, USA.","DOI":"10.3115\/1614108.1614146"},{"key":"S0269888912000343_ref40","unstructured":"Singh S. , Kearns M. , Litman D. , Walker M. 1999. Reinforcement learning for spoken dialogue systems. In Proceedings of the NIPS'99. Vancouver, Canada."},{"key":"S0269888912000343_ref6","first-page":"171","article-title":"On the composition of elementary errors. Second paper: statistical applications","volume":"11","author":"Cramer","year":"1928","journal-title":"Skandinavisk Aktuarietidskrift"},{"key":"S0269888912000343_ref20","unstructured":"Levin E. , Pieraccini R. , Eckert W. 1997. Learning dialogue strategies within the Markov decision process framework. In Proceedings of ASRU'97. 
Santa Barbara, USA."},{"key":"S0269888912000343_ref14","doi-asserted-by":"crossref","unstructured":"Janarthanam S. , Lemon O. 2009a. A data-driven method for adaptive referring expression generation in automated dialogue systems: maximising expected utility. In Proceedings of PRE-COGSCI 09. Boston, USA.","DOI":"10.1007\/978-3-642-15573-4_4"},{"key":"S0269888912000343_ref19","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177729694"},{"key":"S0269888912000343_ref33","doi-asserted-by":"crossref","unstructured":"Russell S. 1998. Learning agents for uncertain environments (extended abstract). In COLT\u2019 98: Proceedings of the 11th Annual Conference on Computational Learning Theory. ACM, 101\u2013103. Madison, USA.","DOI":"10.1145\/279943.279964"},{"key":"S0269888912000343_ref5","doi-asserted-by":"crossref","unstructured":"Chandramohan S. , Geist M. , Lef\u00e8vre F. , Pietquin O. 2011. User Simulation in Dialogue Systems using Inverse Reinforcement Learning. In Proceedings of Interspeech 2011, Florence, Italy.","DOI":"10.21437\/Interspeech.2011-302"},{"key":"S0269888912000343_ref15","doi-asserted-by":"crossref","unstructured":"Janarthanam S. , Lemon O. 2009b. A two-tier user simulation model for reinforcement learning of adaptive referring expression generation policies. In Proceedings of SIGDIAL. London, UK.","DOI":"10.3115\/1708376.1708392"},{"key":"S0269888912000343_ref29","doi-asserted-by":"crossref","unstructured":"Pietquin O. 2006. Consistent goal-directed user model for realistic man\u2013machine task-oriented spoken dialogue simulation. In Proceedings of ICME'06. Toronto, Canada.","DOI":"10.1109\/ICME.2006.262563"},{"key":"S0269888912000343_ref27","unstructured":"Pietquin O. , Rossignol S. , Ianotto M. 2009. Training Bayesian networks for realistic man\u2013machine spoken dialogue simulation. 
In Proceedings of the 1st International Workshop on Spoken Dialogue Systems Technology, Irsee, Germany, 4."},{"key":"S0269888912000343_ref28","unstructured":"Pietquin O. 2004. A Framework for Unsupervised Learning of Dialogue Strategies. PhD thesis, Facult\u00e9 Polytechnique de Mons (FPMs), Belgium."},{"key":"S0269888912000343_ref43","doi-asserted-by":"crossref","unstructured":"Walker M. , Hindle D. , Fromer J. , Fabbrizio G. D. , Mestel C. 1997a. Evaluating competing agent strategies for a voice email agent. In Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech'97), Rhodes, Greece.","DOI":"10.21437\/Eurospeech.1997-585"},{"key":"S0269888912000343_ref8","unstructured":"Cuayahuitl H. 2009. Hierarchical Reinforcement Learning for Spoken Dialogue Systems. PhD thesis, University of Edinburgh, UK."},{"key":"S0269888912000343_ref25","doi-asserted-by":"crossref","unstructured":"Papineni K. , Roukos S. , Ward T. , Zhu W. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 311\u2013318.","DOI":"10.3115\/1073083.1073135"},{"key":"S0269888912000343_ref23","unstructured":"Ng A. Y. , Russell S. 2000. Algorithms for inverse reinforcement learning. In Proceedings of 17th International Conference on Machine Learning. Morgan Kaufmann, 663\u2013670."},{"key":"S0269888912000343_ref24","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2008.03.010"},{"key":"S0269888912000343_ref46","unstructured":"Williams J. , Poupart P. , Young S. 2005. Partially Observable Markov Decision Processes with Continuous Observations for Dialogue Management. In Proceedings of the SigDial Workshop (SigDial'06). Sydney, Australia."},{"key":"S0269888912000343_ref1","unstructured":"Ai H. , Litman D. 2008. Assessing dialog system user simulation evaluation measures using human judges. 
In Proceedings of the 46th Meeting of the Association for Computational Linguistics, Columbus, OH, USA, 622\u2013629."},{"key":"S0269888912000343_ref17","doi-asserted-by":"crossref","unstructured":"Janarthanam S. , Lemon O. 2009d. A Wizard-of-Oz environment to study referring expression generation in a situated spoken dialogue task. In Proceedings of ENLG, 2009. Athens, Greece.","DOI":"10.3115\/1610195.1610209"},{"key":"S0269888912000343_ref3","doi-asserted-by":"publisher","DOI":"10.1214\/aoms\/1177704477"},{"key":"S0269888912000343_ref7","doi-asserted-by":"crossref","unstructured":"Cuayahuitl H. , Renals S. , Lemon O. , Shimodaira H. 2005. Human\u2013computer dialogue simulation using hidden Markov models. In Proceedings of ASRU, 290\u2013295. Cancun, Mexico","DOI":"10.1109\/ASRU.2005.1566485"},{"key":"S0269888912000343_ref4","first-page":"249","article-title":"Assessing agreement on classification tasks: the kappa statistic","volume":"22","author":"Carletta","year":"1996","journal-title":"Computational Linguistics"},{"key":"S0269888912000343_ref9","doi-asserted-by":"crossref","unstructured":"Doddington G. 2002. Automatic evaluation of machine translation quality using n-gram cooccurrence statistics. In Proceedings of the Human Language Technology Conference (HLT), San Diego, CA, USA, 128\u2013132.","DOI":"10.3115\/1289189.1289273"},{"key":"S0269888912000343_ref10","unstructured":"Eckert W. , Levin E. , Pieraccini R. 1997. User modeling for spoken dialogue system evaluation. In Proceedings of ASRU'97. Santa Barbara, USA."},{"key":"S0269888912000343_ref12","doi-asserted-by":"crossref","unstructured":"Georgila K. , Henderson J. , Lemon O. 2005. Learning user simulations for information state update dialogue systems. In Proceedings of Interspeech 2005. Lisboa, Portugal.","DOI":"10.21437\/Interspeech.2005-401"},{"key":"S0269888912000343_ref13","doi-asserted-by":"crossref","unstructured":"Georgila K. , Henderson J. , Lemon O. 2006. 
User simulation for spoken dialogue systems: learning and evaluation. In Proceedings of Interspeech'06. Pittsburgh, USA.","DOI":"10.21437\/Interspeech.2006-160"},{"key":"S0269888912000343_ref16","doi-asserted-by":"crossref","unstructured":"Janarthanam S. , Lemon O. 2009c. Learning adaptive referring expression generation policies for spoken dialogue systems using reinforcement learning. In Proceedings of SEMDIAL. Stockholm, Sweden.","DOI":"10.1007\/978-3-642-15573-4_4"},{"key":"S0269888912000343_ref22","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(02)00126-7"},{"key":"S0269888912000343_ref26","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.855836"},{"key":"S0269888912000343_ref31","unstructured":"Rieser V. , Lemon O. 2008. Learning effective multimodal dialogue strategies from Wizard-of-Oz data: bootstrapping and evaluation. In Proceedings of ACL, 2008. Columbus, Ohio."},{"key":"S0269888912000343_ref35","doi-asserted-by":"crossref","unstructured":"Schatzmann J. , Stuttle M. N. , Weilhammer K. , Young S. 2005b. Effects of the user model on simulation-based learning of dialogue strategies. In Proceedings of ASRU'05. Cancun, Mexico.","DOI":"10.1109\/ASRU.2005.1566539"},{"key":"S0269888912000343_ref38","doi-asserted-by":"publisher","DOI":"10.1017\/S0269888906000944"},{"key":"S0269888912000343_ref37","unstructured":"Schatzmann J. , Thomson B. , Young S. 2007b. Statistical user simulation with a hidden agenda. In Proceedings of SigDial'07. Anvers, Belgium."},{"key":"S0269888912000343_ref42","volume-title":"Information Retrieval","author":"van Rijsbergen","year":"1979"},{"key":"S0269888912000343_ref44","doi-asserted-by":"crossref","unstructured":"Walker M. , Litman D. , Kamm C. , Abella A. 1997b. Paradise: a framework for evaluating spoken dialogue agents. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 271\u2013280. 
Madrid, Spain.","DOI":"10.3115\/976909.979652"},{"key":"S0269888912000343_ref47","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2008.05.007"}],"container-title":["The Knowledge Engineering Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S0269888912000343","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2026,1,5]],"date-time":"2026-01-05T14:41:56Z","timestamp":1767624116000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S0269888912000343\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,11,28]]},"references-count":48,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2013,3]]}},"alternative-id":["S0269888912000343"],"URL":"https:\/\/doi.org\/10.1017\/s0269888912000343","relation":{},"ISSN":["0269-8889","1469-8005"],"issn-type":[{"value":"0269-8889","type":"print"},{"value":"1469-8005","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,11,28]]}}}