{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,13]],"date-time":"2026-05-13T21:07:19Z","timestamp":1778706439939,"version":"3.51.4"},"reference-count":202,"publisher":"Springer Science and Business Media LLC","issue":"1","license":[{"start":{"date-parts":[[2020,6,25]],"date-time":"2020-06-25T00:00:00Z","timestamp":1593043200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2020,6,25]],"date-time":"2020-06-25T00:00:00Z","timestamp":1593043200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001942","name":"CHIST-ERA","doi-asserted-by":"publisher","id":[{"id":"10.13039\/501100001942","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001711","name":"Schweizerischer Nationalfonds zur F\u00f6rderung der Wissenschaftlichen Forschung","doi-asserted-by":"publisher","award":["20CH21_174237"],"award-info":[{"award-number":["20CH21_174237"]}],"id":[{"id":"10.13039\/501100001711","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"crossref","award":["PCIN-2017-118\/AEI"],"award-info":[{"award-number":["PCIN-2017-118\/AEI"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"publisher","award":["PCIN-2017-085\/AEI"],"award-info":[{"award-number":["PCIN-2017-085\/AEI"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100001665","name":"Agence Nationale de la 
Recherche","doi-asserted-by":"publisher","award":["ANR-17-CHR2-0001-03"],"award-info":[{"award-number":["ANR-17-CHR2-0001-03"]}],"id":[{"id":"10.13039\/501100001665","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Artif Intell Rev"],"published-print":{"date-parts":[[2021,1]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation, in and of itself, is a crucial part during the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods which allow a reduction in involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). 
We cover each class by introducing the main technologies developed for the dialogue systems and then present the evaluation methods regarding that class.<\/jats:p>","DOI":"10.1007\/s10462-020-09866-x","type":"journal-article","created":{"date-parts":[[2020,6,25]],"date-time":"2020-06-25T20:03:36Z","timestamp":1593115416000},"page":"755-810","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":186,"title":["Survey on evaluation methods for dialogue systems"],"prefix":"10.1007","volume":"54","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8405-1344","authenticated-orcid":false,"given":"Jan","family":"Deriu","sequence":"first","affiliation":[]},{"given":"Alvaro","family":"Rodrigo","sequence":"additional","affiliation":[]},{"given":"Arantxa","family":"Otegi","sequence":"additional","affiliation":[]},{"given":"Guillermo","family":"Echegoyen","sequence":"additional","affiliation":[]},{"given":"Sophie","family":"Rosset","sequence":"additional","affiliation":[]},{"given":"Eneko","family":"Agirre","sequence":"additional","affiliation":[]},{"given":"Mark","family":"Cieliebak","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2020,6,25]]},"reference":[{"key":"9866_CR1","unstructured":"Adiwardana D, Luong MT, So DR, Hall J, Fiedel N, Thoppilan R, Yang Z, Kulshreshtha A, Nemade G, Lu Y, et\u00a0al. (2020) Towards a human-like open-domain chatbot. arXiv preprint arXiv:200109977"},{"key":"9866_CR2","unstructured":"Ameixa D, Coheur L (2013) From subtitles to human interactions: introducing the SubTle Corpus. In: Technical report 2013"},{"key":"9866_CR3","volume-title":"How to do things with words","author":"JL Austin","year":"1962","unstructured":"Austin JL (1962) How to do things with words. Oxford University Press, Oxford, William James"},{"key":"9866_CR4","unstructured":"Banchs RE (2012) Movie-DiC: a Movie Dialogue Corpus for Research and Development. 
In: Proceedings of the 50th annual meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, pp 203\u2013207"},{"key":"9866_CR5","unstructured":"Banchs RE, Li H (2012) IRIS: a chat-oriented dialogue system based on the vector space model. In: Proceedings of the ACL 2012 demonstrations, Jeju Island, Korea, pp 37\u201342"},{"key":"9866_CR6","unstructured":"Bernardi R, Kirschner M (2010) From artificial questions to real user interaction logs: Real challenges for Interactive Question Answering systems. In: Proceedings of workshop on web logs and question answering (WLQA\u201910), Valletta, Malta"},{"key":"9866_CR7","doi-asserted-by":"crossref","unstructured":"Black AW, Eskenazi M (2009) The Spoken Dialogue Challenge. In: Proceedings of the SIGDIAL 2009 conference: the 10th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, Stroudsburg, PA, USA, SIGDIAL \u201909, pp 337\u2013340","DOI":"10.3115\/1708376.1708426"},{"key":"9866_CR8","unstructured":"Black AW, Burger S, Conkie A, Hastie H, Keizer S, Lemon O, Merigaud N, Parent G, Schubiner G, Thomson B, Williams JD, Yu K, Young S, Eskenazi M (2011) Spoken Dialog Challenge 2010: comparison of live and control test results. In: Proceedings of the SIGDIAL 2011 conference: The 12th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, Portland, Oregon, pp 2\u20137"},{"key":"9866_CR9","unstructured":"Bordes A, Boureau YL, Weston J (2017) Learning end-to-end goal-oriented dialog. In: International conference on learning representations (ICLR) 2017, Toulon, France"},{"key":"9866_CR10","doi-asserted-by":"crossref","unstructured":"Bowman SR, Vilnis L, Vinyals O, Dai A, Jozefowicz R, Bengio S (2016) Generating sentences from a continuous space. 
In: Proceedings of The 20th SIGNLL conference on computational natural language learning, Association for Computational Linguistics, Berlin, Germany, pp 10\u201321","DOI":"10.18653\/v1\/K16-1002"},{"key":"9866_CR11","doi-asserted-by":"crossref","unstructured":"Bruni E, Fernandez R (2017) Adversarial evaluation for open-domain dialogue generation. In: Proceedings of the SIGDIAL 2017 conference: The 18th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, pp 284\u2013288","DOI":"10.18653\/v1\/W17-5534"},{"key":"9866_CR12","volume-title":"conference on empirical methods in natural language processing (EMNLP)","author":"Budzianowski P, Wen TH, Tseng BH, Casanueva I, Stefan U, Osman R, Ga\u0161i\u0107 M (2018) MultiWOZ: A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In: Proceedings of the","year":"2018","unstructured":"Budzianowski P, Wen TH, Tseng BH, Casanueva I, Stefan U, Osman R, Ga\u0161i\u0107 M (2018) MultiWOZ: A large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In: Proceedings of the (2018) conference on empirical methods in natural language processing (EMNLP). Belgium, Brussels"},{"key":"9866_CR13","doi-asserted-by":"publisher","unstructured":"Byrne B, Krishnamoorthi K, Sankar C, Neelakantan A, Goodrich B, Duckworth D, Yavuz S, Dubey A, Kim K, Cedilnik A (2019) Taskmaster-1: Toward a realistic and diverse dialog dataset. 
In: Inui K, Jiang J, Ng V, Wan X (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3\u20137, 2019, Association for Computational Linguistics, pp 4515\u20134524, https:\/\/doi.org\/10.18653\/v1\/D19-1459","DOI":"10.18653\/v1\/D19-1459"},{"key":"9866_CR14","unstructured":"Campos JA, Otegi A, Soroa A, Deriu J, Cieliebak M, Agirre E (2019) Conversational QA for FAQs. In: 3rd Conversational AI: \u201cToday\u2019s Practice and Tomorrow\u2019s Potential\u201d workshop at NeurIPS 2019"},{"issue":"2","key":"9866_CR15","first-page":"249","volume":"22","author":"J Carletta","year":"1996","unstructured":"Carletta J (1996) Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics 22(2):249\u2013254","journal-title":"Computational Linguistics"},{"key":"9866_CR16","unstructured":"Charras F, Dubuisson\u00a0Duplessis G, Letard V, Ligozat AL, Rosset S (2016) Comparing system-response retrieval models for open-domain and casual conversational agent. In: Workshop on Chatbots and Conversational Agent Technologies (WOCHAT)"},{"key":"9866_CR17","doi-asserted-by":"crossref","unstructured":"Chen H, Liu X, Yin D, Tang J (2017) A Survey on dialogue systems: recent advances and new frontiers. Special interest group on knowledge discovery and data mining (SIGKDD) Explor Newsl 19(2):25\u201335","DOI":"10.1145\/3166054.3166058"},{"key":"9866_CR18","volume-title":"Lifelong Machine Learning","author":"Z Chen","year":"2016","unstructured":"Chen Z, Liu B, Brachman R, Stone P, Rossi F (2016) Lifelong Machine Learning, 1st edn. 
Morgan & Claypool Publishers, San Rafael","edition":"1"},{"key":"9866_CR19","volume-title":"conference on empirical methods in natural language processing (EMNLP)","author":"Choi E, He H, Iyyer M, Yatskar M, Yih Wt, Choi Y, Liang P, Zettlemoyer L (2018) QuAC: Question answering in context. In: Proceedings of the","year":"2018","unstructured":"Choi E, He H, Iyyer M, Yatskar M, Yih Wt, Choi Y, Liang P, Zettlemoyer L (2018) QuAC: Question answering in context. In: Proceedings of the (2018) conference on empirical methods in natural language processing (EMNLP). France, Paris"},{"key":"9866_CR20","doi-asserted-by":"crossref","unstructured":"Chotimongkol A, Rudnicky AI (2001) N-best speech hypotheses reordering using linear regression. In: Dalsgaard P, Lindberg B, Benner H, Tan Z (eds) EUROSPEECH 2001 Scandinavia, 7th European conference on speech communication and technology, 2nd INTERSPEECH Event, Aalborg, Denmark, September 3\u20137, 2001, ISCA, pp 1829\u20131832, http:\/\/www.isca-speech.org\/archive\/eurospeech_2001\/e01_1829.html","DOI":"10.21437\/Eurospeech.2001-432"},{"issue":"1","key":"9866_CR21","doi-asserted-by":"publisher","first-page":"5","DOI":"10.1609\/aimag.v37i1.2636","volume":"37","author":"P Clark","year":"2016","unstructured":"Clark P, Etzioni O (2016) My computer is an honor student but how intelligent is it? standardized tests as a measure of ai. AI Mag 37(1):5\u201312. https:\/\/doi.org\/10.1609\/aimag.v37i1.2636","journal-title":"AI Mag"},{"issue":"4","key":"9866_CR22","doi-asserted-by":"publisher","first-page":"515","DOI":"10.1017\/S0140525X00000030","volume":"4","author":"KM Colby","year":"1981","unstructured":"Colby KM (1981) Modeling a paranoid mind. Behav Brain Sci 4(4):515\u2013534","journal-title":"Behav Brain Sci"},{"key":"9866_CR23","unstructured":"Cole R (1999) Tools for research and education in speech science. 
In: Proceedings of the international conference of phonetic sciences, San Francisco, USA, pp 1277\u20131280"},{"key":"9866_CR24","doi-asserted-by":"publisher","unstructured":"Collins E, Rozanov N, Zhang B (2019) LIDA: lightweight interactive dialogue annotator. In: Pad\u00f3 S, Huang R (eds) Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3\u20137, 2019\u2014system demonstrations, Association for Computational Linguistics, pp 121\u2013126, https:\/\/doi.org\/10.18653\/v1\/D19-3021","DOI":"10.18653\/v1\/D19-3021"},{"key":"9866_CR25","unstructured":"Danescu C, Lee L (2011) Chameleons in imagined conversations: a new approach to understanding coordination of linguistic style in dialogs. In: Proceedings of the 2nd workshop on cognitive modeling and computational linguistics, Association for Computational Linguistics, pp 76\u201387"},{"key":"9866_CR26","unstructured":"Dethlefs N, Hastie H, Cuay\u00e1huitl H, Lemon O (2013) Conditional random fields for responsive surface realisation using global features. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Sofia, Bulgaria, pp 1254\u20131263"},{"key":"9866_CR27","unstructured":"DeVault D, Leuski A, Sagae K (2011) Toward learning and evaluation of dialogue policies with text examples. In: Proceedings of the SIGDIAL 2011 conference: the 12th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, Stroudsburg, PA, USA, pp 39\u201348"},{"key":"9866_CR28","doi-asserted-by":"publisher","unstructured":"Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. 
In: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, pp 4171\u20134186, https:\/\/doi.org\/10.18653\/v1\/N19-1423, https:\/\/www.aclweb.org\/anthology\/N19-1423","DOI":"10.18653\/v1\/N19-1423"},{"issue":"3","key":"9866_CR29","doi-asserted-by":"publisher","first-page":"529","DOI":"10.1007\/s10115-017-1100-y","volume":"55","author":"D Diefenbach","year":"2018","unstructured":"Diefenbach D, Lopez V, Singh K, Maret P (2018) Core techniques of question answering systems over knowledge bases: a survey. Knowl Inf Syst 55(3):529\u2013569","journal-title":"Knowl Inf Syst"},{"key":"9866_CR30","unstructured":"Do P, Nguyen H, Tran C, Nguyen M, Nguyen M (2017) Legal question answering using ranking SVM and deep convolutional neural network. arXiv preprint arXiv:abs\/1703.05320"},{"key":"9866_CR31","unstructured":"Dubuisson\u00a0DG, Letard V, Ligozat AL, Rosset S (2016) Purely corpus-based automatic conversation authoring. In: Proceedings of the tenth international conference on language resources and evaluation, European Language Resources Association (ELRA), Paris, France, LREC 2016, http:\/\/www.lrec-conf.org\/proceedings\/lrec2016\/pdf\/396_Paper.pdf"},{"key":"9866_CR32","unstructured":"Dubuisson\u00a0DG, Charras F, Letard V, Ligozat AL, Rosset S (2017) Utterance retrieval based on recurrent surface text patterns. In: European conference on information retrieval, Aberdeen, Scotland UK, ECIR 2017, https:\/\/hal.archives-ouvertes.fr\/hal-01436052\/document"},{"key":"9866_CR33","doi-asserted-by":"crossref","unstructured":"Du\u0161ek O, Jurcicek F (2016) Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. 
In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, ACL 2016, pp 45\u201351","DOI":"10.18653\/v1\/P16-2008"},{"key":"9866_CR34","doi-asserted-by":"publisher","first-page":"123","DOI":"10.1016\/j.csl.2019.06.009","volume":"59","author":"O Du\u0161ek","year":"2020","unstructured":"Du\u0161ek O, Novikova J, Rieser V (2020) Evaluating the state-of-the-art of end-to-end natural language generation: the E2E NLG challenge. Comput Speech Lang 59:123\u2013156. https:\/\/doi.org\/10.1016\/j.csl.2019.06.009","journal-title":"Comput Speech Lang"},{"key":"9866_CR35","doi-asserted-by":"crossref","unstructured":"Engel Y, Mannor S, Meir R (2005) Reinforcement learning with gaussian processes. In: Proceedings of the 22nd international conference on machine learning, ACM, Bonn, Germany, ICML \u201905, pp 201\u2013208","DOI":"10.1145\/1102351.1102377"},{"key":"9866_CR36","unstructured":"Engelbrecht KP, M\u00f6ller S, Schleicher R, Wechsung I (2008) Analysis of paradise models for individual users of a spoken dialog system. In: Electronic speech signal processing, proceedings of the 19th conference, Frankfurt am Main, Germany, ESSV 2008, pp 86\u201393, https:\/\/d-nb.info\/990359174\/04"},{"key":"9866_CR37","doi-asserted-by":"crossref","unstructured":"Engelbrecht KP, G\u00f6dde F, Hartard F, Ketabdar H, M\u00f6ller S (2009a) Modeling user satisfaction with Hidden Markov Model. In: Proceedings of the SIGDIAL 2009 conference: the 10th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, London, UK, SIGDIAL \u201909, pp 170\u2013177, http:\/\/dl.acm.org\/citation.cfm?id=1708376.1708402","DOI":"10.3115\/1708376.1708402"},{"key":"9866_CR38","doi-asserted-by":"crossref","unstructured":"Engelbrecht KP, Quade M, M\u00f6ller S (2009b) Analysis of a new simulation approach to dialog system evaluation. 
Speech Commun 51(12):1234\u20131252, http:\/\/dx.doi.org\/10.1016\/j.specom.2009.06.007","DOI":"10.1016\/j.specom.2009.06.007"},{"key":"9866_CR39","doi-asserted-by":"publisher","unstructured":"Eric M, Krishnan L, Charette F, Manning CD (2017) Key-value retrieval networks for task-oriented dialogue. In: Proceedings of the SIGDIAL 2017 conference: the 18th annual meeting of the special interest group on discourse and dialogue, Saarbr\u00fccken, Germany, SIGDIAL\u201917, pp 37\u201349, https:\/\/doi.org\/10.18653\/v1\/W17-5506, http:\/\/aclweb.org\/anthology\/W17-5506","DOI":"10.18653\/v1\/W17-5506"},{"key":"9866_CR40","doi-asserted-by":"publisher","unstructured":"Evanini K, Hunter P, Liscombe J, Suendermann D, Dayanidhi K, Pieraccini R (2008) Caller experience: a method for evaluating dialog systems and its automatic prediction. In: 2008 IEEE spoken language technology workshop, Goa, India, pp 129\u2013132, https:\/\/doi.org\/10.1109\/SLT.2008.4777857","DOI":"10.1109\/SLT.2008.4777857"},{"key":"9866_CR41","unstructured":"Fader A, Zettlemoyer L, Etzioni O (2013) Paraphrase-driven learning for open question answering. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Sofia, Bulgaria, pp 1608\u20131618, https:\/\/www.aclweb.org\/anthology\/P13-1158"},{"issue":"5","key":"9866_CR42","doi-asserted-by":"publisher","first-page":"378","DOI":"10.1037\/h0031619","volume":"76","author":"JL Fleiss","year":"1971","unstructured":"Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378\u2013382. https:\/\/doi.org\/10.1037\/h0031619","journal-title":"Psychol Bull"},{"key":"9866_CR43","unstructured":"Furlanello T, Lipton ZC, Tschannen M, Itti L, Anandkumar A (2018) Born-again neural networks. 
In: Dy JG, Krause A (eds) Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsm\u00e4ssan, Stockholm, Sweden, July 10\u201315, 2018, PMLR, Proceedings of machine learning research, vol\u00a080, pp 1602\u20131611, http:\/\/proceedings.mlr.press\/v80\/furlanello18a.html"},{"key":"9866_CR44","doi-asserted-by":"crossref","unstructured":"Galley M, Brockett C, Sordoni A, Ji Y, Auli M, Quirk C, Mitchell M, Gao J, Dolan B (2015) deltaBLEU: a discriminative metric for generation tasks with intrinsically diverse targets. In: Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing (Volume 2: Short Papers), Association for Computational Linguistics, ACL 2015, pp 445\u2013450, http:\/\/www.aclweb.org\/anthology\/P15-2073","DOI":"10.3115\/v1\/P15-2073"},{"key":"9866_CR45","doi-asserted-by":"publisher","unstructured":"Gandhe S, Traum D (2016) A Semi-automated Evaluation Metric for Dialogue Model Coherence, Springer International Publishing, Cham, pp 217\u2013225. https:\/\/doi.org\/10.1007\/978-3-319-21834-2_19","DOI":"10.1007\/978-3-319-21834-2_19"},{"key":"9866_CR46","first-page":"2013","volume-title":"conference: the 14th annual meeting of the special interest group on discourse and dialogue","author":"Gandhe S, Traum DR (2013) Surface text based dialogue models for virtual humans. In: Proceedings of the SIGDIAL","year":"2013","unstructured":"Gandhe S, Traum DR (2013) Surface text based dialogue models for virtual humans. In: Proceedings of the SIGDIAL (2013) conference: the 14th annual meeting of the special interest group on discourse and dialogue. Metz, France, SIGDIAL, p 2013"},{"key":"9866_CR47","unstructured":"Gandhe S, Whitman N, Traum D, Artstein R (2009) An integrated authoring tool for tactical questioning dialogue systems. 
In: 6th IJCAI Workshop on knowledge and reasoning in practical dialogue systems, Pasadena Conference Center, California, USA., pp 10\u201318"},{"key":"9866_CR48","unstructured":"Gasic M, Breslin C, Henderson M, Kim D, Szummer M, Thomson B, Tsiakoulis P, Young S (2013) POMDP-based dialogue manager adaptation to extended domains. In: Proceedings of the SIGDIAL 2013 conference: the 14th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, Metz, France, SIGDIAL 2013, pp 214\u2013222, http:\/\/www.aclweb.org\/anthology\/W13-4035"},{"key":"9866_CR49","doi-asserted-by":"crossref","unstructured":"Gasic M, Kim D, Tsiakoulis P, Breslin C, Henderson M, Szummer M, Thomson B, Young SJ (2014) Incremental on-line adaptation of POMDP-based dialogue managers to extended domains. In: 15th annual conference of the international speech communication association, Singapore, INTERSPEECH 2014, pp 140\u2013144, http:\/\/www.isca-speech.org\/archive\/interspeech_2014\/i14_0140.html","DOI":"10.21437\/Interspeech.2014-40"},{"key":"9866_CR50","doi-asserted-by":"publisher","unstructured":"Ga\u0161i\u0107 M, Jur\u010d\u00ed\u010dek F, Thomson B, Yu K, Young S (2011) On-line policy optimisation of spoken dialogue systems via live interaction with human subjects. In: 2011 IEEE workshop on automatic speech recognition understanding, pp 312\u2013317, https:\/\/doi.org\/10.1109\/ASRU.2011.6163950","DOI":"10.1109\/ASRU.2011.6163950"},{"key":"9866_CR51","first-page":"5110","volume":"2018","author":"M Ghazvininejad","year":"2018","unstructured":"Ghazvininejad M, Brockett C, Chang MW, Dolan B, Gao J, Yih Wt, Galley M (2018) A knowledge-grounded neural conversation model. 
Thirty-second AAAI conference on artificial intelligence, New Orleans, Louisiana, USA, AAAI 2018:5110\u20135117","journal-title":"Thirty-second AAAI conference on artificial intelligence, New Orleans, Louisiana, USA, AAAI"},{"key":"9866_CR52","doi-asserted-by":"publisher","unstructured":"Godfrey JJ, Holliman EC, McDaniel J (1992) SWITCHBOARD: telephone speech corpus for research and development. In: [Proceedings] ICASSP-92: 1992 IEEE international conference on acoustics, speech, and signal processing, San Francisco, CA, USA, vol\u00a01, pp 517\u2013520, https:\/\/doi.org\/10.1109\/ICASSP.1992.225858","DOI":"10.1109\/ICASSP.1992.225858"},{"key":"9866_CR53","unstructured":"Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, NIPS 27, Curran Associates, Inc., pp 2672\u20132680, http:\/\/papers.nips.cc\/paper\/5423-generative-adversarial-nets.pdf"},{"key":"9866_CR54","doi-asserted-by":"crossref","unstructured":"Gunasekara C, Kummerfeld JK, Polymenakos L, Lasecki WS (2019) DSTC7 Task 1: Noetic end-to-end response selection. In: 7th edition of the dialog system technology challenges at AAAI 2019, http:\/\/workshop.colips.org\/dstc7\/papers\/dstc7_task1_final_report.pdf","DOI":"10.18653\/v1\/W19-4107"},{"key":"9866_CR55","doi-asserted-by":"crossref","unstructured":"Guo D, Tur G, Yih Wt, Zweig G (2014) Joint semantic utterance classification and slot filling with recursive neural networks. 
In: 2014 IEEE spoken language technology workshop (SLT), South Lake Tahoe, California, USA, IEEE 2014, pp 554\u2013559, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2014\/12\/SLT2014-daniel.pdf","DOI":"10.1109\/SLT.2014.7078634"},{"key":"9866_CR56","unstructured":"Guo F, Metallinou A, Khatri C, Raju A, Venkatesh A, Ram A (2018) Topic-based evaluation for conversational bots. arXiv preprint arXiv:180103622"},{"key":"9866_CR57","doi-asserted-by":"crossref","unstructured":"Gupta P, Mehri S, Zhao T, Pavel A, Eskenazi M, Bigham JP (2019) Investigating evaluation of open-domain dialogue systems with human generated multiple references.\u00a0In: 20th annual meeting of the special interest group on discourse and dialogue","DOI":"10.18653\/v1\/W19-5944"},{"key":"9866_CR58","first-page":"1569","volume":"16","author":"S Hahn","year":"2010","unstructured":"Hahn S, Dinarelli M, Raymond C, Lef\u00e8vre F, Lehen P, De Mori R, Moschitti A, Ney H, Riccardi G (2010) Comparing stochastic approaches to spoken language understanding in multiple languages. IEEE Trans Audio Speech Lang Process 16:1569\u20131583","journal-title":"IEEE Trans Audio Speech Lang Process"},{"key":"9866_CR59","doi-asserted-by":"crossref","unstructured":"Hancock B, Bordes A, Mazare PE, Weston J (2019) Learning from dialogue after deployment: feed yourself, Chatbot! In: Proceedings of the 57th annual meeting of the Association for Computational Linguistics, Florence, Italy, ACL 2019, pp 3667\u20133684, https:\/\/www.aclweb.org\/anthology\/P19-1358","DOI":"10.18653\/v1\/P19-1358"},{"key":"9866_CR60","unstructured":"Hara S (2010) Estimation method of user satisfaction using N-gram-based dialog history model for spoken dialog system. 
In: Proceedings of the seventh international conference on language resources and evaluation, Valletta, Malta, LREC\u201910, pp 78\u201383, http:\/\/www.lrec-conf.org\/proceedings\/lrec2010\/pdf\/579_Paper.pdf"},{"key":"9866_CR61","doi-asserted-by":"crossref","unstructured":"Henderson M, Thomson B, Williams J (2013a) Dialog state tracking challenge 2 & 3. Technical report","DOI":"10.1109\/SLT.2014.7078595"},{"key":"9866_CR62","unstructured":"Henderson M, Thomson B, Young S (2013b) Deep neural network approach for the dialog state tracking challenge. In: Proceedings of the SIGDIAL 2013 Conference: The 14th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Metz, France, pp 467\u2013471, http:\/\/www.aclweb.org\/anthology\/W13-4073"},{"key":"9866_CR63","doi-asserted-by":"crossref","unstructured":"Henderson M, Thomson B, Williams J (2014) The Second Dialog State Tracking Challenge. In: Proceedings of the SIGDIAL 2014 Conference: The 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Philadelphia, PA, USA, pp 263\u2013272, https:\/\/www.microsoft.com\/en-us\/research\/publication\/the-second-dialog-state-tracking-challenge\/","DOI":"10.3115\/v1\/W14-4337"},{"key":"9866_CR64","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1007\/978-3-642-16202-2_5","volume-title":"Second international workshop on spoken dialogue systems technology: spoken dialogue systems for ambient environments","author":"R Higashinaka","year":"2010","unstructured":"Higashinaka R, Minami Y, Dohsaka K (2010) Meguro T (2010) Issues in predicting user satisfaction transitions in dialogues: individual differences, evaluation criteria, and prediction models. In: Lee GG, Mariani J, Minker W, Nakamura S (eds) Second international workshop on spoken dialogue systems technology: spoken dialogue systems for ambient environments. 
Springer, Berlin Heidelberg, Gotemba, Shizuoka, Japan, WSDS, pp 48\u201360"},{"key":"9866_CR65","doi-asserted-by":"crossref","unstructured":"Hirschman L, Dahl DA, McKay DP, Norton LM, Linebarger MC (1990) Beyond class A: a proposal for automatic evaluation of discourse. In: Proceedings of the speech and natural language workshop, Hidden Valley, Pennsylvania, USA, HLT, pp 109\u2013113","DOI":"10.21236\/ADA458704"},{"key":"9866_CR66","doi-asserted-by":"publisher","first-page":"1735","DOI":"10.1162\/neco.1997.9.8.1735","volume":"9","author":"S Hochreiter","year":"1997","unstructured":"Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Computation 9:1735\u20131780","journal-title":"Neural Computation"},{"key":"9866_CR67","unstructured":"Hu Z, Yang Z, Liang X, Salakhutdinov R, Xing EP (2017) Toward controlled generation of text. In: Proceedings of the 34th international conference on machine learning, international convention centre, Sydney, Australia, ICML, pp 1587\u20131596, http:\/\/proceedings.mlr.press\/v70\/hu17e.html"},{"key":"9866_CR68","unstructured":"Huang HY, Choi E, tau Yih W (2019) FlowQA: grasping flow in history for conversational machine comprehension. In: International conference on learning representations, https:\/\/openreview.net\/forum?id=ByftGnR9KX"},{"key":"9866_CR69","doi-asserted-by":"publisher","unstructured":"Iyyer M, Yih Wt, Chang MW (2017a) Search-based neural structured learning for sequential question answering. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, ACL, pp 1821\u20131831, https:\/\/doi.org\/10.18653\/v1\/P17-1167, http:\/\/www.aclweb.org\/anthology\/P17-1167","DOI":"10.18653\/v1\/P17-1167"},{"key":"9866_CR70","doi-asserted-by":"publisher","unstructured":"Iyyer M, Yih Wt, Chang MW (2017b) Search-based neural structured learning for sequential question answering. 
In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, pp 1821\u20131831, https:\/\/doi.org\/10.18653\/v1\/P17-1167, https:\/\/www.aclweb.org\/anthology\/P17-1167","DOI":"10.18653\/v1\/P17-1167"},{"key":"9866_CR71","doi-asserted-by":"publisher","unstructured":"Joshi M, Choi E, Weld D, Zettlemoyer L (2017) TriviaQA: a large scale distantly supervised challenge dataset for reading comprehension. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, pp 1601\u20131611, https:\/\/doi.org\/10.18653\/v1\/P17-1147, https:\/\/www.aclweb.org\/anthology\/P17-1147","DOI":"10.18653\/v1\/P17-1147"},{"key":"9866_CR72","unstructured":"Ju Y, Zhao F, Chen S, Zheng B, Yang X, Liu Y (2019) Technical report on conversational question answering"},{"key":"9866_CR73","volume-title":"Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition","author":"D Jurafsky","year":"2017","unstructured":"Jurafsky D, Martin JH (2017) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 3rd edn.\u00a0Prentice Hall PTR, USA","edition":"3"},{"key":"9866_CR74","first-page":"3061","volume-title":"12th annual conference of the international speech communication association","author":"F Jurc\u00edcek","year":"2011","unstructured":"Jurc\u00edcek F, Keizer S, Gasic M, Mairesse F, Thomson B, Yu K, Young SJ (2011) Real user evaluation of spoken dialogue systems using amazon mechanical turk. 12th annual conference of the international speech communication association. Florence, Italy, INTERSPEECH, pp 3061\u20133064"},{"key":"9866_CR75","unstructured":"Kannan A, Vinyals O (2016) Adversarial evaluation of dialogue models. 
In: Workshop on adversarial training at neural information processing systems 2016"},{"issue":"1","key":"9866_CR76","doi-asserted-by":"publisher","first-page":"119","DOI":"10.1017\/S1351324908004932","volume":"15","author":"D Kelly","year":"2009","unstructured":"Kelly D, Kantor PB, Morse EL, Scholtz J, Sun Y (2009) Questionnaires for eliciting evaluation data from users of interactive question answering systems. Nat Lang Eng 15(1):119\u2013141","journal-title":"Nat Lang Eng"},{"key":"9866_CR77","doi-asserted-by":"crossref","unstructured":"Kenny PG, Parsons TD, Rizzo AA (2009) Human computer interaction in virtual standardized patient systems. In: Proceedings of the 13th international conference on human-computer interaction. Part IV: interacting in various application domains, Springer-Verlag, Berlin, Heidelberg, pp 514\u2013523, http:\/\/dx.doi.org\/10.1007\/978-3-642-02583-9_56","DOI":"10.1007\/978-3-642-02583-9_56"},{"key":"9866_CR78","doi-asserted-by":"publisher","unstructured":"Kim S, D\u2019Haro LF, Banchs RE, Williams JD, Henderson M, Yoshino K (2016) The fifth dialog state tracking challenge. In: 2016 IEEE Spoken Language Technology Workshop (SLT), pp 511\u2013517, https:\/\/doi.org\/10.1109\/SLT.2016.7846311","DOI":"10.1109\/SLT.2016.7846311"},{"key":"9866_CR79","doi-asserted-by":"publisher","first-page":"317","DOI":"10.1162\/tacl_a_00023","volume":"6","author":"T Ko\u010disk\u00fd","year":"2018","unstructured":"Ko\u010disk\u00fd T, Schwarz J, Blunsom P, Dyer C, Hermann KM, Melis G, Grefenstette E (2018) The narrativeQA reading comprehension challenge. Trans Assoc Computational Ling 6:317\u2013328. 
https:\/\/doi.org\/10.1162\/tacl_a_00023","journal-title":"Trans Assoc Computational Ling"},{"issue":"24","key":"9866_CR80","doi-asserted-by":"publisher","first-page":"5412","DOI":"10.1016\/j.ins.2011.07.047","volume":"181","author":"O Kolomiyets","year":"2011","unstructured":"Kolomiyets O, Moens MF (2011) A Survey on Question Answering Technology from an Information Retrieval Perspective. Inf Sci 181(24):5412\u20135434. https:\/\/doi.org\/10.1016\/j.ins.2011.07.047","journal-title":"Inf Sci"},{"key":"9866_CR81","doi-asserted-by":"crossref","unstructured":"Konstantinova N, Orasan C (2013) Interactive Question Answering. In: Emerging applications of natural language processing: concepts and new research, pp 149\u2013169","DOI":"10.4018\/978-1-4666-2169-5.ch007"},{"key":"9866_CR82","doi-asserted-by":"crossref","unstructured":"Kreyssig F, Casanueva I, Budzianowski P, Gasic M (2018) Neural user simulation for corpus-based policy optimisation for spoken dialogue systems. arXiv preprint arXiv:1805.06966","DOI":"10.18653\/v1\/W18-5007"},{"key":"9866_CR83","doi-asserted-by":"crossref","unstructured":"Lai G, Xie Q, Liu H, Yang Y, Hovy E (2017) RACE: large-scale ReAding comprehension dataset from examinations. In: Proceedings EMNLP 2017\u2014conference on empirical methods in natural language processing, pp 785\u2013794, arXiv:1704.04683","DOI":"10.18653\/v1\/D17-1082"},{"issue":"4","key":"9866_CR84","doi-asserted-by":"publisher","first-page":"339","DOI":"10.1016\/S0167-6393(99)00067-9","volume":"31","author":"L Lamel","year":"2000","unstructured":"Lamel L, Rosset S, Gauvain JL, Bennacef S, Garnier-Rizet M, Prouts B (2000) The limsi arise system. 
Speech Commun 31(4):339\u2013353","journal-title":"Speech Commun"},{"key":"9866_CR85","unstructured":"Lan Z, Chen M, Goodman S, Gimpel K, Sharma P, Soricut R (2019) Albert: a lite bert for self-supervised learning of language representations.\u00a0arXiv preprint arXiv:1909.11942"},{"key":"9866_CR86","doi-asserted-by":"publisher","unstructured":"Larson S, Mahendran A, Peper JJ, Clarke C, Lee A, Hill P, Kummerfeld JK, Leach K, Laurenzano MA, Tang L, Mars J (2019) An evaluation dataset for intent classification and out-of-scope prediction. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, pp 1311\u20131316, https:\/\/doi.org\/10.18653\/v1\/D19-1131, https:\/\/www.aclweb.org\/anthology\/D19-1131","DOI":"10.18653\/v1\/D19-1131"},{"key":"9866_CR87","doi-asserted-by":"crossref","unstructured":"Lavie A, Denkowski MJ (2009) The meteor metric for automatic evaluation of machine translation. Mach Transl 23(2-3):105\u2013115, http:\/\/dx.doi.org\/10.1007\/s10590-009-9059-4","DOI":"10.1007\/s10590-009-9059-4"},{"issue":"5","key":"9866_CR88","doi-asserted-by":"publisher","first-page":"466","DOI":"10.1016\/j.specom.2009.01.008","volume":"51","author":"C Lee","year":"2009","unstructured":"Lee C, Jung S, Kim S, Lee GG (2009) Example-based dialog modeling for practical multi-domain dialog system. Speech Commun 51(5):466\u2013484","journal-title":"Speech Commun"},{"key":"9866_CR89","unstructured":"Lee S, Schulz H, Atkinson A, Gao J, Suleman K, El Asri L, Adada M, Huang M, Sharma S, Tay W, Li X (2019) Multi-domain task-completion dialog challenge. 
In: Dialog system technology challenges 8"},{"key":"9866_CR90","doi-asserted-by":"publisher","first-page":"9","DOI":"10.1017\/S0266078400006854","volume":"28","author":"GN Leech","year":"1993","unstructured":"Leech GN (1993) 100 million words of english: the british national corpus (BNC). English Today 28:9\u201315. https:\/\/doi.org\/10.1017\/S0266078400006854","journal-title":"English Today"},{"key":"9866_CR91","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-4614-4803-7","volume-title":"Data-driven methods for adaptive spoken dialogue systems: computational learning for conversational interfaces","author":"O Lemon","year":"2012","unstructured":"Lemon O, Pietquin O (2012) Data-driven methods for adaptive spoken dialogue systems: computational learning for conversational interfaces. Springer, Berlin"},{"issue":"8","key":"9866_CR92","first-page":"707","volume":"10","author":"VI Levenshtein","year":"1966","unstructured":"Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions and reversals. Soviet Phys Doklady 10(8):707\u2013710","journal-title":"Soviet Phys Doklady"},{"key":"9866_CR93","doi-asserted-by":"publisher","unstructured":"Levin E, Pieraccini R, Eckert W (1998) Using Markov decision process for learning dialogue strategies. In: Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing, Seattle, WA, USA, ICASSP, vol\u00a01, pp 201\u2013204, https:\/\/doi.org\/10.1109\/ICASSP.1998.674402","DOI":"10.1109\/ICASSP.1998.674402"},{"key":"9866_CR94","doi-asserted-by":"crossref","unstructured":"Li H, Min MR, Ge Y, Kadav A (2017a) A context-aware attention network for interactive question answering. 
In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD \u201917, pp 927\u2013935, http:\/\/doi.acm.org\/10.1145\/3097983.3098115","DOI":"10.1145\/3097983.3098115"},{"key":"9866_CR95","doi-asserted-by":"crossref","unstructured":"Li J, Galley M, Brockett C, Gao J, Dolan B (2016a) A diversity-promoting objective function for neural conversation models. In: Proceedings of the 2016 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, Association for Computational Linguistics, San Diego, California, pp 110\u2013119, http:\/\/www.aclweb.org\/anthology\/N16-1014","DOI":"10.18653\/v1\/N16-1014"},{"key":"9866_CR96","doi-asserted-by":"publisher","unstructured":"Li J, Monroe W, Ritter A, Jurafsky D, Galley M, Gao J (2016b) Deep reinforcement learning for dialogue generation. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Association for Computational Linguistics, Austin, Texas, EMNLP \u201916, pp 1192\u20131202, https:\/\/doi.org\/10.18653\/v1\/D16-1127, http:\/\/www.aclweb.org\/anthology\/D16-1127","DOI":"10.18653\/v1\/D16-1127"},{"key":"9866_CR97","unstructured":"Li X, Chen YN, Li L, Gao J, Celikyilmaz A (2017b) End-to-end task-completion neural dialogue systems. In: Proceedings of the eighth international joint conference on natural language processing (Volume 1: Long Papers), Asian Federation of Natural Language Processing, Taipei, Taiwan, IJCNLP, pp 733\u2013743, http:\/\/aclweb.org\/anthology\/I17-1074"},{"key":"9866_CR98","unstructured":"Li Y, Su H, Shen X, Li W, Cao Z, Niu S (2017c) DailyDialog: A manually labelled multi-turn dialogue dataset. 
In: Proceedings of the eighth international joint conference on natural language processing (Volume 1: Long Papers), Asian Federation of Natural Language Processing, Taipei, Taiwan, pp 986\u2013995, https:\/\/www.aclweb.org\/anthology\/I17-1099"},{"key":"9866_CR99","unstructured":"Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Marie-Francine\u00a0Moens SS (ed) Text summarization branches out: proceedings of the ACL-04 workshop, Association for Computational Linguistics, Barcelona, Spain, pp 74\u201381, http:\/\/www.aclweb.org\/anthology\/W04-1013"},{"key":"9866_CR100","doi-asserted-by":"crossref","unstructured":"Liu B, T\u00fcr G, Hakkani-T\u00fcr D, Shah P, Heck L (2018) Dialogue learning with human teaching and feedback in end-to-end trainable task-oriented dialogue systems. In: Proceedings of the 2018 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, USA, NAACL-HLT \u201918, pp 2060\u20132069, http:\/\/aclweb.org\/anthology\/N18-1187","DOI":"10.18653\/v1\/N18-1187"},{"key":"9866_CR101","doi-asserted-by":"publisher","unstructured":"Liu CW, Lowe R, Serban I, Noseworthy M, Charlin L, Pineau J (2016) How NOT To evaluate your dialogue system: an empirical study of unsupervised evaluation metrics for dialogue response generation. 
In: Proceedings of the 2016 conference on empirical methods in natural language processing, Association for Computational Linguistics, Austin, Texas, pp 2122\u20132132, https:\/\/doi.org\/10.18653\/v1\/D16-1230, http:\/\/www.aclweb.org\/anthology\/D16-1230","DOI":"10.18653\/v1\/D16-1230"},{"key":"9866_CR102","unstructured":"Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) Roberta: a robustly optimized bert pretraining approach.\u00a0arXiv preprint arXiv:1907.11692"},{"key":"9866_CR103","doi-asserted-by":"crossref","unstructured":"Lowe R, Serban IV, Noseworthy M, Charlin L, Pineau J (2016) On the evaluation of dialogue systems with next utterance classification. In: Proceedings of the SIGDIAL 2016 conference: the 17th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, Los Angeles, CA, USA, pp 264\u2013269, http:\/\/www.aclweb.org\/anthology\/W16-3634","DOI":"10.18653\/v1\/W16-3634"},{"key":"9866_CR104","doi-asserted-by":"publisher","unstructured":"Lowe R, Noseworthy M, Serban IV, Angelard-Gontier N, Bengio Y, Pineau J (2017a) Towards an automatic turing test: learning to evaluate dialogue responses. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, ACL \u201917, pp 1116\u20131126, https:\/\/doi.org\/10.18653\/v1\/P17-1103, http:\/\/www.aclweb.org\/anthology\/P17-1103","DOI":"10.18653\/v1\/P17-1103"},{"issue":"1","key":"9866_CR105","doi-asserted-by":"publisher","first-page":"31","DOI":"10.5087\/dad.2017.102","volume":"8","author":"R Lowe","year":"2017","unstructured":"Lowe R, Pow N, Serban IV, Charlin L, Liu CW, Pineau J (2017b) Training end-to-end dialogue systems with the ubuntu dialogue corpus. 
Dialogue Discourse 8(1):31\u201365","journal-title":"Dialogue Discourse"},{"key":"9866_CR106","doi-asserted-by":"crossref","unstructured":"Lowe RJ, Pow N, Serban I, Pineau J (2015) The Ubuntu dialogue corpus: a large dataset for research in unstructured multi-turn dialogue systems. In: Proceedings of the SIGDIAL 2015 conference: the 16th annual meeting of the special interest group on discourse and dialogue, Association for Computational Linguistics, Prague, Czech Republic, pp 285\u2013294, http:\/\/aclweb.org\/anthology\/W15-4640","DOI":"10.18653\/v1\/W15-4640"},{"issue":"2","key":"9866_CR107","doi-asserted-by":"publisher","first-page":"190","DOI":"10.1111\/j.1540-4781.2011.01232_1.x","volume":"96","author":"X Lu","year":"2012","unstructured":"Lu X (2012) The relationship of lexical richness to the quality of ESL learners\u2019 oral narratives. Modern Lang J 96(2):190\u2013208. https:\/\/doi.org\/10.1111\/j.1540-4781.2011.01232_1.x","journal-title":"Modern Lang J"},{"key":"9866_CR108","unstructured":"Mairesse F, Ga\u0161i\u0107 M, Jur\u010d\u00ed\u010dek F, Keizer S, Thomson B, Yu K, Young S (2010) Phrase-based statistical language generation using graphical models and active learning. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, Uppsala, Sweden, ACL \u201910, pp 1552\u20131561, https:\/\/www.aclweb.org\/anthology\/P10-1157"},{"key":"9866_CR109","doi-asserted-by":"crossref","unstructured":"Mazza R, Ambrosini L, Catenazzi N, Vanini S, Tuggener D, Tavarnesi G (2018) Behavioural simulator for professional training based on natural language interaction. 
In: 10th international conference on education and new learning technologies, Palma, Mallorca, Spain, EDULEARN18, pp 3204\u20133214, http:\/\/repository.supsi.ch\/9776\/1\/edulearn18-paper-lifelike.pdf","DOI":"10.21125\/edulearn.2018.0845"},{"issue":"3","key":"9866_CR110","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1016\/j.specom.2004.11.006","volume":"45","author":"M McTear","year":"2005","unstructured":"McTear M, O\u2019Neill I, Hanna P, Liu X (2005) Handling errors and determining confirmation strategies\u2013an object-based approach. Speech Commun 45(3):249\u2013269","journal-title":"Speech Commun"},{"key":"9866_CR111","doi-asserted-by":"crossref","unstructured":"Mei H, Bansal M, Walter MR (2016) What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. In: Proceedings of the 2016 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, San Diego, California, NAACL-HLT, pp 720\u2013730, https:\/\/www.aclweb.org\/anthology\/N16-1086","DOI":"10.18653\/v1\/N16-1086"},{"issue":"3","key":"9866_CR112","doi-asserted-by":"publisher","first-page":"530","DOI":"10.1109\/TASLP.2014.2383614","volume":"23","author":"G Mesnil","year":"2015","unstructured":"Mesnil G, Dauphin Y, Yao K, Bengio Y, Deng L, Hakkani-Tur D, He X, Heck L, Tur G, Yu D, Zweig G (2015) Using recurrent neural networks for slot filling in spoken language understanding. IEEE\/ACM Trans Audio, Speech Lang Process 23(3):530\u2013539","journal-title":"IEEE\/ACM Trans Audio, Speech Lang Process"},{"key":"9866_CR113","unstructured":"Metallinou A, Bohus D, Williams J (2013) Discriminative state tracking for spoken dialog systems. 
In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Sofia, Bulgaria, pp 466\u2013475, http:\/\/www.aclweb.org\/anthology\/P13-1046"},{"key":"9866_CR114","doi-asserted-by":"crossref","unstructured":"Miller A, Feng W, Batra D, Bordes A, Fisch A, Lu J, Parikh D, Weston J (2017) ParlAI: a dialog research software platform. In: Proceedings of the 2017 conference on empirical methods in natural language processing: system demonstrations, EMNLP \u201917, pp 79\u201384, https:\/\/www.aclweb.org\/anthology\/D17-2014","DOI":"10.18653\/v1\/D17-2014"},{"issue":"3","key":"9866_CR115","doi-asserted-by":"publisher","first-page":"345","DOI":"10.1016\/j.jksuci.2014.10.007","volume":"28","author":"A Mishra","year":"2016","unstructured":"Mishra A, Jain SK (2016) A survey on question answering systems with classification. J King Saud Univ Comput Inf Sci 28(3):345\u2013361. https:\/\/doi.org\/10.1016\/j.jksuci.2014.10.007","journal-title":"J King Saud Univ Comput Inf Sci"},{"key":"9866_CR116","unstructured":"M\u00f6ller S, Krebber J, Raake A, Smeele P, Rajman M, Melichar M, Pallotta V, Tsakou G, Kladis B, Vovos A, Hoonhout J, Schuchardt D, Fakotakis N, Ganchev T, Potamitis I (2004) INSPIRE: evaluation of a smart-home system for infotainment management and device control. In: Proceedings of the fourth international conference on language resources and evaluation (LREC\u201904), European Language Resources Association (ELRA), Lisbon, Portugal, http:\/\/www.lrec-conf.org\/proceedings\/lrec2004\/pdf\/12.pdf"},{"key":"9866_CR117","doi-asserted-by":"crossref","unstructured":"M\u00f6ller S, Englert R, Engelbrecht K, Hafner V, Jameson A, Oulasvirta A, Raake A, Reithinger N (2006) MeMo: towards automatic usability evaluation of spoken dialogue services by user error simulations. 
In: Ninth international conference on spoken language processing, INTERSPEECH\u2014ICSLP 2006, pp 1786\u20131789, https:\/\/www.isca-speech.org\/archive\/interspeech_2006\/i06_1131.html","DOI":"10.21437\/Interspeech.2006-494"},{"key":"9866_CR118","doi-asserted-by":"publisher","unstructured":"Mrk\u0161i\u0107 N, \u00d3\u00a0S\u00e9aghdha D, Wen TH, Thomson B, Young S (2017) Neural belief tracker: data-driven dialogue state tracking. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, ACL \u201917, pp 1777\u20131788, https:\/\/doi.org\/10.18653\/v1\/P17-1163, http:\/\/aclweb.org\/anthology\/P17-1163","DOI":"10.18653\/v1\/P17-1163"},{"key":"9866_CR119","doi-asserted-by":"crossref","unstructured":"Novikova J, Du\u0161ek O, Rieser V (2017) The E2E dataset: new challenges for end-to-end generation. In: Proceedings of the 18th annual meeting of the special interest group on discourse and dialogue, Saarbr\u00fccken, Germany, SIGDIAL \u201917, pp 201\u2013206, https:\/\/www.aclweb.org\/anthology\/W17-5525, arXiv:1706.09254","DOI":"10.18653\/v1\/W17-5525"},{"key":"9866_CR120","unstructured":"Paek T (2006) Reinforcement learning for spoken dialogue systems: comparing strengths and weaknesses for practical deployment. In: Proceedings of dialog-on-dialog workshop, interspeech, Pittsburgh, PA, USA, http:\/\/www.ling.helsinki.fi\/~kjokinen\/ICSLP06-DoD\/Programme\/PaekTim.pdf"},{"key":"9866_CR121","doi-asserted-by":"crossref","unstructured":"Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. 
In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, ACL \u201902, pp 311\u2013318, http:\/\/www.aclweb.org\/anthology\/P02-1040","DOI":"10.3115\/1073083.1073135"},{"issue":"2","key":"9866_CR122","doi-asserted-by":"publisher","first-page":"177","DOI":"10.1007\/s10579-012-9177-0","volume":"46","author":"A Pe\u00f1as","year":"2012","unstructured":"Pe\u00f1as A, Magnini B, Forner P, Sutcliffe R, Rodrigo \u00c1, Giampiccolo D (2012) Question answering at the cross-language evaluation forum 2003\u20132010. Lang Resour Evaluat 46(2):177\u2013217. https:\/\/doi.org\/10.1007\/s10579-012-9177-0","journal-title":"Lang Resour Evaluat"},{"key":"9866_CR123","unstructured":"Perez J, Boureau YL, Bordes A (2017) Dialog system and technology challenge 6 overview of track 1 - end-to-end goal-oriented dialog learning. Technical report"},{"key":"9866_CR124","doi-asserted-by":"publisher","unstructured":"Peskov D, Clarke N, Krone J, Fodor B, Zhang Y, Youssef A, Diab M (2019) Multi-domain goal-oriented dialogues (MultiDoGO): strategies toward curating and annotating large scale dialogue data. In: Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, pp 4526\u20134536, https:\/\/doi.org\/10.18653\/v1\/D19-1460, https:\/\/www.aclweb.org\/anthology\/D19-1460","DOI":"10.18653\/v1\/D19-1460"},{"issue":"1","key":"9866_CR125","doi-asserted-by":"publisher","first-page":"59","DOI":"10.1017\/S0269888912000343","volume":"28","author":"O Pietquin","year":"2013","unstructured":"Pietquin O, Hastie H (2013) A survey on metrics for the evaluation of user simulations. Knowl Eng Rev 28(1):59\u201373. 
https:\/\/doi.org\/10.1017\/S0269888912000343","journal-title":"Knowl Eng Rev"},{"key":"9866_CR126","unstructured":"Powers DMW (2012) The Problem with Kappa. In: Proceedings of the 13th conference of the European chapter of the Association for Computational Linguistics, Avignon, France, EACL \u201913, pp 345\u2013355, http:\/\/www.aclweb.org\/anthology\/E12-1035"},{"key":"9866_CR127","doi-asserted-by":"publisher","unstructured":"Qu C, Yang L, Croft WB, Trippas JR, Zhang Y, Qiu M (2018) Analyzing and characterizing user intent in information-seeking conversations. In: The 41st international ACM SIGIR conference on research & development in information retrieval, Ann Arbor, MI, USA, SIGIR 2018, pp 989\u2013992, https:\/\/doi.org\/10.1145\/3209978.3210124","DOI":"10.1145\/3209978.3210124"},{"key":"9866_CR128","doi-asserted-by":"publisher","unstructured":"Qu C, Yang L, Qiu M, Zhang Y, Chen C, Croft WB, Iyyer M (2019) Attentive history selection for conversational question answering. In: Proceedings of the 28th ACM international conference on information and knowledge management, Association for Computing Machinery, New York, NY, USA, CIKM \u201919, pp 1391\u20131400, https:\/\/doi.org\/10.1145\/3357384.3357905","DOI":"10.1145\/3357384.3357905"},{"key":"9866_CR129","unstructured":"Qu Y, Green N (2002) A constraint-based approach for cooperative information-seeking dialogue. In: Proceedings of the international natural language generation conference, Harriman, New York, USA, INLG, pp 136\u2013143"},{"key":"9866_CR130","doi-asserted-by":"publisher","unstructured":"Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) SQuAD: 100,000+ questions for machine comprehension of text. 
In: Proceedings of the 2016 conference on empirical methods in natural language processing, Association for Computational Linguistics, Austin, Texas, pp 2383\u20132392, https:\/\/doi.org\/10.18653\/v1\/D16-1264, https:\/\/www.aclweb.org\/anthology\/D16-1264","DOI":"10.18653\/v1\/D16-1264"},{"key":"9866_CR131","doi-asserted-by":"publisher","unstructured":"Rajpurkar P, Jia R, Liang P (2018) Know what you don\u2019t know: unanswerable questions for SQuAD. In: Proceedings of the 56th annual meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Melbourne, Australia, pp 784\u2013789, https:\/\/doi.org\/10.18653\/v1\/P18-2124, https:\/\/www.aclweb.org\/anthology\/P18-2124","DOI":"10.18653\/v1\/P18-2124"},{"key":"9866_CR132","doi-asserted-by":"crossref","unstructured":"Rambow O, Bangalore S, Walker M (2001) Natural language generation in dialog systems. In: Proceedings of the first international conference on Human language technology (HLT) research, San Diego, USA, pp 67\u201373","DOI":"10.3115\/1072133.1072207"},{"key":"9866_CR133","doi-asserted-by":"crossref","unstructured":"Rastogi A, Zang X, Sunkara S, Gupta R, Khaitan P (2019) Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. arXiv preprint arXiv:1909.05855","DOI":"10.1609\/aaai.v34i05.6394"},{"key":"9866_CR134","doi-asserted-by":"publisher","first-page":"249","DOI":"10.1162\/tacl_a_00266","volume":"7","author":"S Reddy","year":"2018","unstructured":"Reddy S, Chen D, Manning CD (2018) CoQA: a conversational question answering challenge. Trans Assoc Comput Linguist 7:249\u2013266","journal-title":"Trans Assoc Comput Linguist"},{"key":"9866_CR135","unstructured":"Richardson M, Burges CJ, Renshaw E (2013) MCTest: a challenge dataset for the open-domain machine comprehension of text. 
In: Proceedings of the 2013 conference on empirical methods in natural language processing, Association for Computational Linguistics, Seattle, Washington, USA, pp 193\u2013203, https:\/\/www.aclweb.org\/anthology\/D13-1020"},{"issue":"1","key":"9866_CR136","doi-asserted-by":"publisher","first-page":"55","DOI":"10.1017\/S1351324908004907","volume":"15","author":"V Rieser","year":"2009","unstructured":"Rieser V, Lemon O (2009) Does this list contain what you were searching for? Learning adaptive dialogue strategies for interactive question answering. Nat Lang Eng 15(1):55\u201372. https:\/\/doi.org\/10.1017\/S1351324908004907","journal-title":"Nat Lang Eng"},{"key":"9866_CR137","unstructured":"Ritter A, Cherry C, Dolan B (2010) Unsupervised modeling of twitter conversations. In: Human language technologies: the 2010 annual conference of the North American Chapter of the Association for Computational Linguistics, Stroudsburg, PA, USA, HLT \u201910, pp 172\u2013180, http:\/\/dl.acm.org\/citation.cfm?id=1857999.1858019"},{"key":"9866_CR138","unstructured":"Ritter A, Cherry C, Dolan WB (2011) Data-driven response generation in social media. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, Scotland, UK., EMNLP \u201911, pp 583\u2013593, http:\/\/dl.acm.org\/citation.cfm?id=2145432.2145500"},{"issue":"4","key":"9866_CR139","doi-asserted-by":"publisher","first-page":"564","DOI":"10.1016\/J.IPM.2018.03.002","volume":"54","author":"A Rodrigo","year":"2018","unstructured":"Rodrigo A, Pe\u00f1as A, Miyao Y, Kando N (2018) Do systems pass university entrance exams? Inf Process Manag 54(4):564\u2013575. https:\/\/doi.org\/10.1016\/J.IPM.2018.03.002","journal-title":"Inf Process Manag"},{"key":"9866_CR140","doi-asserted-by":"crossref","unstructured":"Rogers A, Kovaleva O, Downey M, Rumshisky A (2020a) Getting closer to AI complete question answering: a set of prerequisite real tasks. 
In\u00a0Proceedings of the AAAI conference on artificial intelligence","DOI":"10.1609\/aaai.v34i05.6398"},{"key":"9866_CR141","doi-asserted-by":"crossref","unstructured":"Rogers A, Kovaleva O, Rumshisky A (2020b) A primer in BERTology: What we know about how BERT works arXiv:2002.12327","DOI":"10.1162\/tacl_a_00349"},{"key":"9866_CR142","doi-asserted-by":"crossref","unstructured":"Saha A, Pahuja V, Khapra MM, Sankaranarayanan K, Chandar S (2018) Complex sequential question answering: towards learning to converse over linked question answer pairs with a knowledge graph. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the thirty-second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, AAAI Press, pp 705\u2013713, https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/17181","DOI":"10.1609\/aaai.v32i1.11332"},{"key":"9866_CR143","doi-asserted-by":"crossref","unstructured":"Sai AB, Gupta MD, Khapra MM, Srinivasan M (2019) Re-evaluating adem: a deeper look at scoring dialogue responses. In: Proceedings of the thirty-third AAAI conference on artificial intelligence, Honolulu, Hawaii, USA, AAAI\u201919, vol\u00a033, pp 6220\u20136227, https:\/\/aaai.org\/ojs\/index.php\/AAAI\/article\/view\/4581","DOI":"10.1609\/aaai.v33i01.33016220"},{"key":"9866_CR144","doi-asserted-by":"publisher","unstructured":"Sarrouti M, Ouatik El Alaoui S (2017) A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering. J Biomed Inf 68(C):96\u2013103. 
https:\/\/doi.org\/10.1016\/j.jbi.2017.03.001","DOI":"10.1016\/j.jbi.2017.03.001"},{"issue":"2","key":"9866_CR145","doi-asserted-by":"publisher","first-page":"97","DOI":"10.1017\/S0269888906000944","volume":"21","author":"J Schatzmann","year":"2006","unstructured":"Schatzmann J, Weilhammer K, Stuttle M, Young S (2006) A survey of statistical user simulation techniques for reinforcement-learning of dialogue management strategies. Knowl Eng Rev 21(2):97\u2013126","journal-title":"Knowl Eng Rev"},{"key":"9866_CR146","doi-asserted-by":"crossref","unstructured":"Schatzmann J, Thomson B, Weilhammer K, Ye H, Young S (2007) Agenda-based user simulation for bootstrapping a POMDP dialogue system. In: Human language technologies 2007: the conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers, Rochester, New York, NAACL-Short \u201907, pp 149\u2013152, http:\/\/dl.acm.org\/citation.cfm?id=1614108.1614146","DOI":"10.3115\/1614108.1614146"},{"key":"9866_CR147","doi-asserted-by":"crossref","unstructured":"Schatzmann J, Stuttle MN, Weilhammer K, Young S (2005) Effects of the user model on simulation-based learning of dialogue strategies. In: IEEE workshop on automatic speech recognition and understanding, San Juan, Puerto Rico, ASRU, pp 220\u2013225, https:\/\/ieeexplore.ieee.org\/document\/1566539","DOI":"10.1109\/ASRU.2005.1566539"},{"key":"9866_CR148","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.specom.2015.06.003","volume":"74","author":"A Schmitt","year":"2015","unstructured":"Schmitt A, Ultes S (2015) Interaction quality: assessing the quality of ongoing spoken dialog interaction by experts\u2013and how it relates to user satisfaction. Speech Commun 74:12\u201336","journal-title":"Speech Commun"},{"key":"9866_CR149","unstructured":"Schmitt A, Ultes S, Minker W (2012) A parameterized and annotated spoken dialog corpus of the CMU let\u2019s go bus information system. 
In: Calzolari N (Conference Chair), Choukri K, Declerck T, Do\u011fan MU, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S (eds) Proceedings of the eight international conference on language resources and evaluation (LREC\u201912), European Language Resources Association (ELRA), Istanbul, Turkey"},{"key":"9866_CR151","unstructured":"Schrading JN (2015) Analyzing domestic abuse using natural language processing on social media data. Master\u2019s thesis, Rochester Institute of Technology, http:\/\/scholarworks.rit.edu\/theses"},{"key":"9866_CR152","doi-asserted-by":"publisher","DOI":"10.1017\/CBO9781139173438","volume-title":"Speech acts: an essay in the philosophy of language","author":"JR Searle","year":"1969","unstructured":"Searle JR (1969) Speech acts: an essay in the philosophy of language. Cambridge University Press, Cambridge"},{"key":"9866_CR153","first-page":"59","volume-title":"Syntax and semantics 3: speech acts","author":"JR Searle","year":"1975","unstructured":"Searle JR (1975) Indirect speech acts. In: Cole P, Morgan J (eds) Syntax and semantics 3: speech acts. Academic Press, New York, pp 59\u201382"},{"key":"9866_CR154","doi-asserted-by":"crossref","unstructured":"Semeniuta S, Severyn A, Barth E (2017) A hybrid convolutional variational autoencoder for text generation. In: Proceedings of the 2017 conference on empirical methods in natural language processing, Copenhagen, Denmark, EMNLP, pp 627\u2013637, https:\/\/www.aclweb.org\/anthology\/D17-1066","DOI":"10.18653\/v1\/D17-1066"},{"key":"9866_CR155","doi-asserted-by":"crossref","unstructured":"Serban IV, Sordoni A, Bengio Y, Courville A, Pineau J (2016) Building end-to-end dialogue systems using generative hierarchical neural network models. 
In: Proceedings of the thirtieth AAAI conference on artificial intelligence, AAAI Press, Phoenix, Arizona, USA, AAAI\u201916, pp 3776\u20133783, http:\/\/dl.acm.org\/citation.cfm?id=3016387.3016435","DOI":"10.1609\/aaai.v30i1.9883"},{"key":"9866_CR156","doi-asserted-by":"crossref","unstructured":"Serban IV, Klinger T, Tesauro G, Talamadupula K, Zhou B, Bengio Y, Courville AC (2017a) Multiresolution recurrent neural networks: an application to dialogue response generation. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, San Francisco, California, USA, AAAI \u201917, pp 3288\u20133294, http:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI17\/paper\/view\/14571","DOI":"10.1609\/aaai.v31i1.10984"},{"key":"9866_CR157","unstructured":"Serban IV, Sankar C, Germain M, Zhang S, Lin Z, Subramanian S, Kim T, Pieper M, Chandar S, Ke NR, et\u00a0al. (2017b) A deep reinforcement learning chatbot. arXiv preprint arXiv:1709.02349"},{"key":"9866_CR158","doi-asserted-by":"crossref","unstructured":"Serban IV, Sordoni A, Lowe R, Charlin L, Pineau J, Courville A, Bengio Y (2017c) A hierarchical latent variable encoder-decoder model for generating dialogues. In: Proceedings of the thirty-first aaai conference on artificial intelligence, San Francisco, California USA, AAAI\u201917, pp 3295\u20133301, https:\/\/dl.acm.org\/doi\/10.5555\/3298023.3298047","DOI":"10.1609\/aaai.v31i1.10983"},{"issue":"9","key":"9866_CR159","doi-asserted-by":"publisher","first-page":"1","DOI":"10.5087\/dad.2018.101","volume":"1","author":"IV Serban","year":"2018","unstructured":"Serban IV, Lowe R, Henderson P, Charlin L, Pineau J (2018) A survey of available corpora for building data-driven dialogue systems: the journal version. Dialogue Discourse 1(9):1\u201349","journal-title":"Dialogue Discourse"},{"key":"9866_CR160","doi-asserted-by":"crossref","unstructured":"Shang L, Lu Z, Li H (2015) Neural responding machine for short-text conversation. 
In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (Volume 1: Long Papers), Beijing, China, ACL - IJCNLP \u201915, pp 1577\u20131586, http:\/\/www.aclweb.org\/anthology\/P15-1152","DOI":"10.3115\/v1\/P15-1152"},{"key":"9866_CR161","unstructured":"Singh SP, Kearns MJ, Litman DJ, Walker MA (2000) Reinforcement learning for spoken dialogue systems. In: Solla SA, Leen TK, M\u00fcller K (eds) Advances in neural information processing systems 12, MIT Press, pp 956\u2013962, http:\/\/papers.nips.cc\/paper\/1775-reinforcement-learning-for-spoken-dialogue-systems.pdf"},{"key":"9866_CR162","doi-asserted-by":"publisher","unstructured":"Sordoni A, Galley M, Auli M, Brockett C, Ji Y, Mitchell M, Nie JY, Gao J, Dolan B (2015) A neural network approach to context-sensitive generation of conversational responses. In: Proceedings of the 2015 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Beijing, China, ACL\u2014IJCNLP \u201915, pp 196\u2013205, https:\/\/doi.org\/10.3115\/v1\/N15-1020, http:\/\/www.aclweb.org\/anthology\/N15-1020","DOI":"10.3115\/v1\/N15-1020"},{"key":"9866_CR163","doi-asserted-by":"crossref","unstructured":"Stent A, Prasad R, Walker M (2004) Trainable sentence planning for complex information presentation in spoken dialog systems. In: Proceedings of the 42nd annual meeting of the Association for Computational Linguistics, Barcelona, Spain, ACL \u201904, pp 79\u201386, https:\/\/www.aclweb.org\/anthology\/P04-1011","DOI":"10.3115\/1218955.1218966"},{"key":"9866_CR164","doi-asserted-by":"publisher","unstructured":"Sugiyama H, Meguro T, Higashinaka R (2019) Automatic evaluation of chat-oriented dialogue systems using large-scale multi-references, Springer International Publishing, Cham, pp 15\u201325. 
https:\/\/doi.org\/10.1007\/978-3-319-92108-2_2","DOI":"10.1007\/978-3-319-92108-2_2"},{"key":"9866_CR165","unstructured":"Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of the 27th international conference on neural information processing systems\u2014Volume 2, MIT Press, Cambridge, MA, USA, NIPS\u201914, pp 3104\u20133112, http:\/\/dl.acm.org\/citation.cfm?id=2969033.2969173"},{"key":"9866_CR166","doi-asserted-by":"publisher","unstructured":"Talmor A, Berant J (2018) The web as a knowledge-base for answering complex questions. In: Proceedings of the 2018 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, Volume 1 (Long Papers), Association for Computational Linguistics, New Orleans, Louisiana, pp 641\u2013651, https:\/\/doi.org\/10.18653\/v1\/N18-1059, https:\/\/www.aclweb.org\/anthology\/N18-1059","DOI":"10.18653\/v1\/N18-1059"},{"key":"9866_CR167","doi-asserted-by":"crossref","unstructured":"Tao C, Mou L, Zhao D, Yan R (2018) Ruber: an unsupervised method for automatic evaluation of open-domain dialog systems. https:\/\/www.aaai.org\/ocs\/index.php\/AAAI\/AAAI18\/paper\/view\/16179\/15752","DOI":"10.1609\/aaai.v32i1.11321"},{"key":"9866_CR168","doi-asserted-by":"crossref","unstructured":"Tiedemann J (2009) News from OPUS-A collection of multilingual parallel corpora with tools and interfaces. In:\u00a0Recent advances in natural language processing, vol 5, pp 237\u2013248","DOI":"10.1075\/cilt.309.19tie"},{"key":"9866_CR169","unstructured":"Tiedemann J (2012) Parallel Data, Tools and Interfaces in OPUS. 
In: Calzolari N (Conference Chair), Choukri K, Declerck T, Do\u011fan MU, Maegaard B, Mariani J, Moreno A, Odijk J, Piperidis S (eds) Proceedings of the eighth international conference on language resources and evaluation (LREC\u201912), European Language Resources Association (ELRA)"},{"key":"9866_CR170","doi-asserted-by":"publisher","unstructured":"Traum DR (1999) Speech acts for dialogue agents, Springer Netherlands, Dordrecht, pp 169\u2013201. https:\/\/doi.org\/10.1007\/978-94-015-9204-8_8","DOI":"10.1007\/978-94-015-9204-8_8"},{"key":"9866_CR171","doi-asserted-by":"publisher","unstructured":"Trischler A, Wang T, Yuan X, Harris J, Sordoni A, Bachman P, Suleman K (2017) NewsQA: a machine comprehension dataset. In: Proceedings of the 2nd workshop on representation learning for NLP, Association for Computational Linguistics, Vancouver, Canada, pp 191\u2013200, https:\/\/doi.org\/10.18653\/v1\/W17-2623, https:\/\/www.aclweb.org\/anthology\/W17-2623","DOI":"10.18653\/v1\/W17-2623"},{"key":"9866_CR172","doi-asserted-by":"publisher","DOI":"10.1002\/9781119992691","volume-title":"Spoken language understanding: systems for extracting semantic information from speech","author":"G Tur","year":"2011","unstructured":"Tur G, De Mori R (2011) Spoken language understanding: systems for extracting semantic information from speech. Wiley, Hoboken"},{"key":"9866_CR173","doi-asserted-by":"publisher","DOI":"10.1002\/9781119992691","volume-title":"Spoken language understanding: systems for extracting semantic information from speech","author":"G Tur","year":"2011","unstructured":"Tur G, De Mori R (2011) Spoken language understanding: systems for extracting semantic information from speech. Wiley, Hoboken"},{"key":"9866_CR174","doi-asserted-by":"publisher","unstructured":"Turing AM (1950) Computing machinery and intelligence. Mind LIX(236):433\u2013460. 
https:\/\/doi.org\/10.1093\/mind\/LIX.236.433","DOI":"10.1093\/mind\/LIX.236.433"},{"key":"9866_CR175","unstructured":"Ultes S, Schmitt A, Minker W (2013) On quality ratings for spoken dialogue systems\u2013experts vs. users. In: Proceedings of the 2013 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, Atlanta, Georgia, USA, NAACL\u2014HLT\u201913, pp 569\u2013578, https:\/\/www.aclweb.org\/anthology\/N13-1064"},{"key":"9866_CR176","doi-asserted-by":"crossref","unstructured":"Ultes S, Rojas\u00a0Barahona LM, Su PH, Vandyke D, Kim D, Casanueva I, Budzianowski P, Mrk\u0161i\u0107 N, Wen TH, Gasic M, Young S (2017) PyDial: a multi-domain statistical dialogue system toolkit. In: Proceedings of ACL 2017, System Demonstrations, Vancouver, Canada, pp 73\u201378","DOI":"10.18653\/v1\/P17-4013"},{"key":"9866_CR150","doi-asserted-by":"crossref","unstructured":"van Schooten B, Rosset S, Galibert O, Max A, op den Akker R, Illouz G (2007) Handling speech input in the Ritel QA dialogue system. In: 8th annual conference of the international speech communication Association, Antwerp, Belgium, INTERSPEECH 2007, pp 126\u2013129, https:\/\/www.isca-speech.org\/archive\/interspeech_2007\/i07_0126.html","DOI":"10.21437\/Interspeech.2007-55"},{"key":"9866_CR177","unstructured":"Vinyals O, Le Q (2015) A neural conversational model. arXiv preprint arXiv:1506.05869"},{"key":"9866_CR178","doi-asserted-by":"publisher","unstructured":"Voorhees EM (2006) Evaluating question answering system performance, Springer Netherlands, Dordrecht, pp 409\u2013430. https:\/\/doi.org\/10.1007\/978-1-4020-4746-6_13","DOI":"10.1007\/978-1-4020-4746-6_13"},{"key":"9866_CR179","doi-asserted-by":"publisher","unstructured":"Walker MA, Litman DJ, Kamm CA, Abella A (1997) PARADISE: a framework for evaluating spoken dialogue agents. 
In: Proceedings of the Eighth Conference on European chapter of the association for computational linguistics, Madrid, Spain, EACL \u201997, pp 271\u2013280, https:\/\/doi.org\/10.3115\/979617.979652","DOI":"10.3115\/979617.979652"},{"issue":"3\u20134","key":"9866_CR180","doi-asserted-by":"publisher","first-page":"363","DOI":"10.1017\/S1351324900002503","volume":"6","author":"MA Walker","year":"2000","unstructured":"Walker MA, Kamm CA, Litman DJ (2000) Towards developing general models of usability with PARADISE. Nat Lang Eng 6(3\u20134):363\u2013377. https:\/\/doi.org\/10.1017\/S1351324900002503","journal-title":"Nat Lang Eng"},{"key":"9866_CR181","doi-asserted-by":"publisher","unstructured":"Wang A, Singh A, Michael J, Hill F, Levy O, Bowman S (2018) GLUE: A multi-task benchmark and analysis platform for natural language understanding. In: Proceedings of the 2018 EMNLP workshop BlackboxNLP: analyzing and interpreting neural networks for NLP, Association for Computational Linguistics, Brussels, Belgium, pp 353\u2013355, https:\/\/doi.org\/10.18653\/v1\/W18-5446, https:\/\/www.aclweb.org\/anthology\/W18-5446","DOI":"10.18653\/v1\/W18-5446"},{"key":"9866_CR182","doi-asserted-by":"publisher","unstructured":"Wang Z, Wen TH, Su PH, Stylianou Y (2015) Learning domain-independent dialogue policies via ontology parameterisation. In: Proceedings of the SIGDIAL 2015 conference: the 16th annual meeting of the special interest group on discourse and dialogue, Prague, Czech Republic, SIGDIAL \u201915, pp 412\u2013416, https:\/\/doi.org\/10.18653\/v1\/W15-4654, http:\/\/www.aclweb.org\/anthology\/W15-4654","DOI":"10.18653\/v1\/W15-4654"},{"issue":"1","key":"9866_CR183","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1145\/365153.365168","volume":"9","author":"J Weizenbaum","year":"1966","unstructured":"Weizenbaum J (1966) ELIZA\u2013a computer program for the study of natural language communication between man and machine. Commun ACM 9(1):36\u201345. 
https:\/\/doi.org\/10.1145\/365153.365168","journal-title":"Commun ACM"},{"key":"9866_CR184","doi-asserted-by":"crossref","unstructured":"Wen TH, Ga\u0161i\u0107 M, Mrk\u0161i\u0107 N, Su PH, Vandyke D, Young S (2015) Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, Portugal, EMNLP \u201915","DOI":"10.18653\/v1\/D15-1199"},{"key":"9866_CR185","doi-asserted-by":"crossref","unstructured":"Wen TH, Ga\u0161i\u0107 M, Mrk\u0161i\u0107 N, Rojas-Barahona LM, Su PH, Vandyke D, Young S (2016) Multi-domain neural network language generation for spoken dialogue systems. In: Proceedings of the 2016 conference of the North American Chapter of the Association for Computational Linguistics: human language technologies, San Diego, California, NAACL-HLT \u201916, pp 120\u2013129","DOI":"10.18653\/v1\/N16-1015"},{"key":"9866_CR186","doi-asserted-by":"crossref","unstructured":"Wen TH, Vandyke D, Mrk\u0161i\u0107 N, Gasic M, Rojas\u00a0Barahona LM, Su PH, Ultes S, Young S (2017) A network-based end-to-end trainable task-oriented dialogue system. In: Proceedings of the 15th conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers, Valencia, Spain, EACL \u201917, pp 438\u2013449, http:\/\/aclweb.org\/anthology\/E17-1042","DOI":"10.18653\/v1\/E17-1042"},{"key":"9866_CR187","unstructured":"Williams J, Raux A, Ramachandran D, Black A (2013) The dialog state tracking challenge. In: Proceedings of the SIGDIAL 2013 conference, Association for Computational Linguistics, Metz, France, pp 404\u2013413"},{"key":"9866_CR188","doi-asserted-by":"crossref","unstructured":"Williams J, Raux A, Henderson M (2016) The dialog state tracking challenge series: a review. 
Dialogue & Discourse https:\/\/www.microsoft.com\/en-us\/research\/publication\/the-dialog-state-tracking-challenge-series-a-review\/","DOI":"10.5087\/dad.2016.301"},{"key":"9866_CR189","doi-asserted-by":"crossref","unstructured":"Xing C, Wu W, Wu Y, Liu J, Huang Y, Zhou M, Ma W (2017) Topic aware neural response generation. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, San Francisco, California, USA, AAAI \u201917, pp 3351\u20133357, http:\/\/aaai.org\/ocs\/index.php\/AAAI\/AAAI17\/paper\/view\/14563","DOI":"10.1609\/aaai.v31i1.10981"},{"key":"9866_CR190","doi-asserted-by":"publisher","unstructured":"Yang Y, Yih Wt, Meek C (2015) WikiQA: a challenge dataset for open-domain question answering. In: Proceedings of the 2015 Conference on empirical methods in natural language processing, Association for Computational Linguistics, Lisbon, Portugal, pp 2013\u20132018, https:\/\/doi.org\/10.18653\/v1\/D15-1237, https:\/\/www.aclweb.org\/anthology\/D15-1237","DOI":"10.18653\/v1\/D15-1237"},{"key":"9866_CR191","unstructured":"Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov RR, Le QV (2019) Xlnet: generalized autoregressive pretraining for language understanding. In: Advances in neural information processing systems, pp 5754\u20135764"},{"key":"9866_CR192","doi-asserted-by":"publisher","unstructured":"Yao K, Peng B, Zhang Y, Yu D, Zweig G, Shi Y (2014) Spoken language understanding using long short-term memory neural networks. In: Spoken language technology workshop (SLT), IEEE, South Lake Tahoe, NV, USA, IEEE 2014, pp 189\u2013194, https:\/\/doi.org\/10.1109\/SLT.2014.7078572, https:\/\/ieeexplore.ieee.org\/document\/7078572","DOI":"10.1109\/SLT.2014.7078572"},{"key":"9866_CR193","doi-asserted-by":"publisher","unstructured":"Yeh YT, Chen YN (2019) FlowDelta: modeling flow information gain in reasoning for conversational machine comprehension. 
In: Proceedings of the 2nd workshop on machine reading for question answering, Association for Computational Linguistics, Hong Kong, China, pp 86\u201390, https:\/\/doi.org\/10.18653\/v1\/D19-5812, https:\/\/www.aclweb.org\/anthology\/D19-5812","DOI":"10.18653\/v1\/D19-5812"},{"key":"9866_CR194","unstructured":"Young S (2007) CUED standard dialogue acts. Report, Cambridge University, Engineering Department http:\/\/mi.eng.cam.ac.uk\/research\/dialogue\/LocalDocs\/dastd.pdf"},{"key":"9866_CR195","doi-asserted-by":"crossref","unstructured":"Young S, Schatzmann J, Weilhammer K, Ye H (2007) The hidden information state approach to dialog management. In: IEEE International conference on acoustics, speech and signal processing, Honolulu, HI, USA, ICASSP \u201907, vol\u00a04, pp 149\u2013152, http:\/\/svr-ftp.eng.cam.ac.uk\/~sjy\/papers\/yswy07.pdf","DOI":"10.1109\/ICASSP.2007.367185"},{"issue":"2","key":"9866_CR196","doi-asserted-by":"publisher","first-page":"150","DOI":"10.1016\/j.csl.2009.04.001","volume":"24","author":"S Young","year":"2010","unstructured":"Young S, Ga\u0161i\u0107 M, Keizer S, Mairesse F, Schatzmann J, Thomson B, Yu K (2010) The hidden information state model: a practical framework for POMDP-based spoken dialogue management. Comput Speech Lang 24(2):150\u2013174. https:\/\/doi.org\/10.1016\/j.csl.2009.04.001","journal-title":"Comput Speech Lang"},{"issue":"5","key":"9866_CR197","doi-asserted-by":"publisher","first-page":"1160","DOI":"10.1109\/JPROC.2012.2225812","volume":"101","author":"S Young","year":"2013","unstructured":"Young S, Ga\u0161i\u0107 M, Thomson B, Williams JD (2013) POMDP-based statistical spoken dialog systems: a review. Proc IEEE 101(5):1160\u20131179. https:\/\/doi.org\/10.1109\/JPROC.2012.2225812","journal-title":"Proc IEEE"},{"key":"9866_CR198","unstructured":"Zhang X, Wang H (2016) A joint model of intent determination and slot filling for spoken language understanding. 
In: Proceedings of the twenty-fifth international joint conference on artificial intelligence, New York, New York, USA, IJCAI\u201916, pp 2993\u20132999, https:\/\/www.ijcai.org\/Proceedings\/16\/Papers\/425.pdf"},{"key":"9866_CR199","doi-asserted-by":"publisher","unstructured":"Zhao T, Eskenazi M (2016) Towards end-to-end learning for dialog state tracking and management using deep reinforcement learning. In: Proceedings of the SIGDIAL 2016 conference: the 17th annual meeting of the special interest group on discourse and dialogue, Los Angeles, CA, USA, SIGDIAL\u201916, pp 1\u201310, https:\/\/doi.org\/10.18653\/v1\/W16-3601, http:\/\/www.aclweb.org\/anthology\/W16-3601","DOI":"10.18653\/v1\/W16-3601"},{"key":"9866_CR200","doi-asserted-by":"publisher","unstructured":"Zhao T, Zhao R, Eskenazi M (2017) Learning discourse-level diversity for neural dialog models using conditional variational autoencoders. In: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Vancouver, Canada, pp 654\u2013664, https:\/\/doi.org\/10.18653\/v1\/P17-1061, https:\/\/www.aclweb.org\/anthology\/P17-1061","DOI":"10.18653\/v1\/P17-1061"},{"key":"9866_CR201","doi-asserted-by":"crossref","unstructured":"Zhao WX, Jiang J, Weng J, He J, Lim EP, Yan H, Li X (2011) Comparing Twitter and traditional media using topic models. In: Proceedings of the 33rd European conference on advances in information retrieval, Springer-Verlag, Berlin, Heidelberg, ECIR\u201911, pp 338\u2013349, http:\/\/dl.acm.org\/citation.cfm?id=1996889.1996934","DOI":"10.1007\/978-3-642-20161-5_34"},{"key":"9866_CR202","unstructured":"Zhou L, Gao J, Li D, Shum HY (2018) The Design and implementation of XiaoIce, an empathetic social chatbot. 
arXiv preprint arXiv:1812.08989"}],"container-title":["Artificial Intelligence Review"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-020-09866-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10462-020-09866-x\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10462-020-09866-x.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,30]],"date-time":"2022-10-30T10:59:49Z","timestamp":1667127589000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10462-020-09866-x"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,6,25]]},"references-count":202,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2021,1]]}},"alternative-id":["9866"],"URL":"https:\/\/doi.org\/10.1007\/s10462-020-09866-x","relation":{},"ISSN":["0269-2821","1573-7462"],"issn-type":[{"value":"0269-2821","type":"print"},{"value":"1573-7462","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,6,25]]},"assertion":[{"value":"25 June 2020","order":1,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Compliance with ethical standards"}},{"value":"There are no conflicts of interest to disclose.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}