{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,10,5]],"date-time":"2025-10-05T04:21:15Z","timestamp":1759638075468,"version":"3.30.2"},"reference-count":43,"publisher":"Cambridge University Press (CUP)","issue":"6","license":[{"start":{"date-parts":[[2024,1,9]],"date-time":"2024-01-09T00:00:00Z","timestamp":1704758400000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2024,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Automatic dialog systems have become a mainstream part of online customer service. Many such systems are built, maintained, and improved by customer service specialists, rather than dialog systems engineers and computer programmers. As conversations between people and machines become commonplace, it is critical to understand what is working, what is not, and what actions can be taken to reduce the frequency of inappropriate system responses. These analyses and recommendations need to be presented in terms that directly reflect the user experience rather than the internal dialog processing.<\/jats:p><jats:p>This paper introduces and explains the use of Actionable Conversational Quality Indicators (ACQIs), which are used both to recognize parts of dialogs that can be improved and to recommend how to improve them. This combines benefits of previous approaches, some of which have focused on producing dialog quality scoring while others have sought to categorize the types of errors the dialog system is making. We demonstrate the effectiveness of using ACQIs on LivePerson internal dialog systems used in commercial customer service applications and on the publicly available LEGOv2 conversational dataset. 
We report on the annotation and analysis of conversational datasets showing which ACQIs are important to fix in various situations.<\/jats:p><jats:p>The annotated datasets are then used to build a predictive model which uses a turn-based vector embedding of the message texts and achieves a 79% weighted average f1-measure at the task of finding the correct ACQI for a given conversation. We predict that if such a model worked perfectly, the range of potential improvement actions a bot-builder must consider at each turn could be reduced by an average of 81%.<\/jats:p>","DOI":"10.1017\/s1351324923000372","type":"journal-article","created":{"date-parts":[[2024,1,9]],"date-time":"2024-01-09T08:12:41Z","timestamp":1704787961000},"page":"1229-1254","update-policy":"https:\/\/doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":2,"title":["Actionable conversational quality indicators for improving task-oriented dialog systems"],"prefix":"10.1017","volume":"30","author":[{"given":"Michael","family":"Higgins","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4241-0820","authenticated-orcid":false,"given":"Dominic","family":"Widdows","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Beth 
Ann","family":"Hockey","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Akshay","family":"Hazare","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Kristen","family":"Howell","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Gwen","family":"Christian","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Sujit","family":"Mathi","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Chris","family":"Brew","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Andrew","family":"Maurer","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"George","family":"Bonev","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Matthew","family":"Dunn","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Joseph","family":"Bradley","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"56","published-online":{"date-parts":[[2024,1,9]]},"reference":[{"key":"S1351324923000372_ref24","doi-asserted-by":"publisher","DOI":"10.3115\/1075527.1075533"},{"key":"S1351324923000372_ref40","doi-asserted-by":"crossref","unstructured":"Walker, M.A. , Litman, D.J. , Kamm, C.A. and Abella, A. (1997). PARADISE: A framework for evaluating spoken dialogue agents. In 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain. Association for Computational Linguistics, pp. 271\u2013280.","DOI":"10.3115\/976909.979652"},{"key":"S1351324923000372_ref9","unstructured":"Forrester Research (2020). 
The total economic impact of IBM Watson assistant. Technical report, Forrester, commissioned by IBM."},{"key":"S1351324923000372_ref42","unstructured":"Witt, S. (2011). A global experience metric for dialog management in spoken dialog systems. In Proceedings of SemDial, pp. 158\u2013166."},{"key":"S1351324923000372_ref30","doi-asserted-by":"publisher","DOI":"10.1207\/S15327957PSPR0504_2"},{"key":"S1351324923000372_ref36","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-92108-2_16"},{"key":"S1351324923000372_ref43","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1369"},{"key":"S1351324923000372_ref26","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W17-5520"},{"key":"S1351324923000372_ref3","unstructured":"Bodigutla, P.K. , Polymenakos, L. and Matsoukas, S. (2019). Multi-domain conversation quality evaluation via user satisfaction estimation. In 33rd Conference on Neural Information Processing Systems, NeurIPS 2019, Vancouver."},{"key":"S1351324923000372_ref18","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1012"},{"key":"S1351324923000372_ref21","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1075"},{"key":"S1351324923000372_ref17","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1071"},{"key":"S1351324923000372_ref4","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.findings-emnlp.347"},{"key":"S1351324923000372_ref20","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P18-1138"},{"key":"S1351324923000372_ref32","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2015.06.003"},{"key":"S1351324923000372_ref11","doi-asserted-by":"crossref","unstructured":"Hirschman, L. , Dahl, D.A. , McKay, D.P. , Norton, L.M. and Linebarger, M.C. (1990). Beyond Class A: A proposal for automatic evaluation of discourse. In Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, Pennsylvania, June 24-27, 1990.","DOI":"10.21236\/ADA458704"},{"key":"S1351324923000372_ref7","unstructured":"Devlin, J. , Chang, M.-W. 
, Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, Minnesota. Association for Computational Linguistics, pp. 4171\u20134186."},{"key":"S1351324923000372_ref12","doi-asserted-by":"publisher","DOI":"10.21437\/Eurospeech.1993-323"},{"key":"S1351324923000372_ref1","unstructured":"Adiwardana, D. , Luong, M.-T. , So, D.R. , Hall, J. , Fiedel, N. , Thoppilan, R. , Yang, Z. , Kulshreshtha, A. , Nemade, G. , Lu, Y. , et al. (2020). Towards a human-like open-domain chatbot. arXiv preprint arXiv:2001.09977."},{"key":"S1351324923000372_ref13","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067828"},{"key":"S1351324923000372_ref22","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D18-1255"},{"key":"S1351324923000372_ref33","unstructured":"Schmitt, A. , Ultes, S. and Minker, W. (2012). A parameterized and annotated spoken dialog corpus of the CMU Let\u2019s Go bus information system. In Language Resources and Evaluation Conference (LREC), pp. 3369\u20133373."},{"key":"S1351324923000372_ref10","doi-asserted-by":"publisher","DOI":"10.1177\/1938965520902012"},{"key":"S1351324923000372_ref5","unstructured":"Danieli, M. and Gerbino, E. (1995). Metrics for evaluating dialogue strategies in a spoken language system. In Proceedings of the 1995 AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation, pp. 34\u201339."},{"key":"S1351324923000372_ref31","unstructured":"Schmitt, A. , Schatz, B. and Minker, W. (2011). Modeling and predicting quality in spoken human-computer interaction. In Proceedings of the SIGDIAL 2011 Conference, pp. 
173\u2013184."},{"key":"S1351324923000372_ref15","doi-asserted-by":"publisher","DOI":"10.1186\/1758-2946-6-10"},{"key":"S1351324923000372_ref37","unstructured":"Szyma\u0144ski, P. and Kajdanowicz, T. (2017). A network perspective on stratification of multi-label data. In First International Workshop on Learning with Imbalanced Domains: Theory and Applications. PMLR, pp. 22\u201335."},{"key":"S1351324923000372_ref19","unstructured":"Ling, Y. , Yao, B. , Kohli, G.S. , Pham, T. and Guo, C. (2020). IQ-net: A DNN model for estimating interaction-level dialogue quality with conversational agents. In Converse@KDD."},{"key":"S1351324923000372_ref34","doi-asserted-by":"publisher","DOI":"10.3115\/1075527.1075538"},{"key":"S1351324923000372_ref8","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/2020.sigdial-1.29"},{"key":"S1351324923000372_ref35","doi-asserted-by":"crossref","unstructured":"Stoyanchev, S. , Maiti, S. and Bangalore, S. (2017). Predicting interaction quality in customer service dialogs. 
In IWSDS.","DOI":"10.1007\/978-3-319-92108-2_16"},{"key":"S1351324923000372_ref38","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W19-5902"},{"key":"S1351324923000372_ref16","doi-asserted-by":"publisher","DOI":"10.2307\/2529310"},{"key":"S1351324923000372_ref14","doi-asserted-by":"publisher","DOI":"10.1145\/3196709.3196735"},{"key":"S1351324923000372_ref27","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2006-17"},{"key":"S1351324923000372_ref39","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-19291-8_4"},{"key":"S1351324923000372_ref28","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-399"},{"key":"S1351324923000372_ref25","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P19-1372"},{"key":"S1351324923000372_ref23","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-04208-9_3"},{"key":"S1351324923000372_ref2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2014.6854195"},{"key":"S1351324923000372_ref6","first-page":"1","article-title":"Survey on evaluation methods for dialogue systems","volume":"54","author":"Deriu","year":"2020","journal-title":"Artificial Intelligence Review"},{"key":"S1351324923000372_ref29","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/D19-1410"},{"key":"S1351324923000372_ref41","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6453"}],"container-title":["Natural Language 
Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324923000372","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,12,12]],"date-time":"2024-12-12T11:12:29Z","timestamp":1734001949000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324923000372\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,1,9]]},"references-count":43,"journal-issue":{"issue":"6","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["S1351324923000372"],"URL":"https:\/\/doi.org\/10.1017\/s1351324923000372","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"type":"print","value":"1351-3249"},{"type":"electronic","value":"1469-8110"}],"subject":[],"published":{"date-parts":[[2024,1,9]]},"assertion":[{"value":"\u00a9 The Author(s), 2024. Published by Cambridge University Press","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http:\/\/creativecommons.org\/licenses\/by\/4.0\/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.","name":"license","label":"License","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}},{"value":"This content has been made available to all.","name":"free","label":"Free to read"}]}}