{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T09:13:32Z","timestamp":1775207612974,"version":"3.50.1"},"reference-count":75,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,11,8]],"date-time":"2023-11-08T00:00:00Z","timestamp":1699401600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-nd\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Inf. Syst."],"published-print":{"date-parts":[[2024,3,31]]},"abstract":"<jats:p>User satisfaction depicts the effectiveness of a system from the user\u2019s perspective. Understanding and predicting user satisfaction is vital for the design of user-oriented evaluation methods for <jats:bold>conversational recommender systems (CRSs)<\/jats:bold>. Current approaches rely on turn-level satisfaction ratings to predict a user\u2019s overall satisfaction with CRS. These methods assume that all users perceive satisfaction similarly, failing to capture the broader dialogue aspects that influence overall user satisfaction.<\/jats:p><jats:p>We investigate the effect of several dialogue aspects on user satisfaction when interacting with a CRS. To this end, we annotate dialogues based on six aspects\u00a0(i.e., <jats:italic>relevance<\/jats:italic>, <jats:italic>interestingness<\/jats:italic>, <jats:italic>understanding<\/jats:italic>, <jats:italic>task-completion<\/jats:italic>, <jats:italic>interest-arousal<\/jats:italic>, and <jats:italic>efficiency<\/jats:italic>) at the turn and dialogue levels. We find that the concept of satisfaction varies per user. At the turn level, a system\u2019s ability to make relevant recommendations is a significant factor in satisfaction. We adopt these aspects as features for predicting response quality and user satisfaction. 
We achieve an F1-score of 0.80 in classifying dissatisfactory dialogues, and a Pearson\u2019s <jats:italic>r<\/jats:italic> of 0.73 for turn-level response quality estimation, demonstrating the effectiveness of the proposed dialogue aspects in predicting user satisfaction and being able to identify dialogues where the system is failing.<\/jats:p><jats:p>With this article, we release our annotated data.<jats:xref ref-type=\"fn\"><jats:sup>1<\/jats:sup><\/jats:xref><\/jats:p><jats:p\/>","DOI":"10.1145\/3624989","type":"journal-article","created":{"date-parts":[[2023,9,21]],"date-time":"2023-09-21T11:27:27Z","timestamp":1695295647000},"page":"1-37","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":16,"title":["Understanding and Predicting User Satisfaction with Conversational Recommender Systems"],"prefix":"10.1145","volume":"42","author":[{"given":"Clemencia","family":"Siro","sequence":"first","affiliation":[{"name":"University of Amsterdam, The Netherlands"}]},{"given":"Mohammad","family":"Aliannejadi","sequence":"additional","affiliation":[{"name":"University of Amsterdam, The Netherlands"}]},{"given":"Maarten","family":"De Rijke","sequence":"additional","affiliation":[{"name":"University of Amsterdam, The Netherlands"}]}],"member":"320","published-online":{"date-parts":[[2023,11,8]]},"reference":[{"key":"e_1_3_5_2_2","doi-asserted-by":"publisher","DOI":"10.1002\/asi.21300"},{"key":"e_1_3_5_3_2","doi-asserted-by":"crossref","first-page":"773","DOI":"10.1145\/1277741.1277902","volume-title":"Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Al-Maskari Azzah","year":"2007","unstructured":"Azzah Al-Maskari, Mark Sanderson, and Paul Clough. 2007. The relationship between IR effectiveness measures and user satisfaction. 
In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, 773\u2013774. DOI:10.1145\/1277741.1277902"},{"key":"e_1_3_5_4_2","doi-asserted-by":"crossref","first-page":"153","DOI":"10.1007\/978-3-642-20161-5_16","volume-title":"Advances in Information Retrieval","author":"Alonso Omar","year":"2011","unstructured":"Omar Alonso and Ricardo Baeza-Yates. 2011. Design and implementation of relevance assessments using crowdsourcing. In Advances in Information Retrieval. Paul Clough, Colum Foley, Cathal Gurrin, Gareth J. F. Jones, Wessel Kraaij, Hyowon Lee, and Vanessa Murdock (Eds.). Springer, Berlin, 153\u2013164."},{"key":"e_1_3_5_5_2","doi-asserted-by":"publisher","DOI":"10.4230\/DagRep.9.11.34"},{"key":"e_1_3_5_6_2","doi-asserted-by":"crossref","unstructured":"Krisztian Balog and ChengXiang Zhai. 2023. User simulation for evaluating information access systems. CoRR abs\/2306.08550 (2023). 10.48550\/arXiv.2306.08550 arXiv:2306.08550.","DOI":"10.1145\/3583780.3615296"},{"key":"e_1_3_5_7_2","unstructured":"Praveen Kumar Bodigutla Lazaros Polymenakos and Spyros Matsoukas. 2019. Multi-domain conversation quality evaluation via user satisfaction estimation. CoRR abs\/1911.08567 (2019). arXiv:1911.08567 http:\/\/arxiv.org\/abs\/1911.08567"},{"key":"e_1_3_5_8_2","doi-asserted-by":"crossref","first-page":"3897","DOI":"10.18653\/v1\/2020.findings-emnlp.347","volume-title":"Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020","author":"Bodigutla Praveen Kumar","year":"2020","unstructured":"Praveen Kumar Bodigutla, Aditya Tiwari, Spyros Matsoukas, Josep Valls-Vargas, and Lazaros Polymenakos. 2020. Joint turn and dialogue level user satisfaction estimation on multi-domain conversations. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020. 
Association for Computational Linguistics, Online, 3897\u20133909. DOI:10.18653\/v1\/2020.findings-emnlp.347"},{"issue":"1","key":"e_1_3_5_9_2","doi-asserted-by":"crossref","first-page":"5","DOI":"10.1023\/A:1010933404324","article-title":"Random forests","volume":"45","author":"Breiman Leo","year":"2001","unstructured":"Leo Breiman. 2001. Random forests. Machine Learning 45, 1 (2001), 5\u201332.","journal-title":"Machine Learning"},{"key":"e_1_3_5_10_2","volume-title":"Classification and Regression Trees","author":"Breiman Leo","year":"1984","unstructured":"Leo Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. 1984. Classification and Regression Trees. Wadsworth."},{"key":"e_1_3_5_11_2","series-title":"Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization","first-page":"33","author":"Cai Wanling","year":"2020","unstructured":"Wanling Cai and Li Chen. 2020. Predicting user intents and satisfaction with dialogue-based conversational recommendations. In Proceedings of the 28th ACM Conference on User Modeling, Adaptation and Personalization (Genoa, Italy) (UMAP\u201920). Association for Computing Machinery, New York, NY, 33\u201342. DOI:10.1145\/3340631.3394856"},{"key":"e_1_3_5_12_2","volume-title":"Statistical Inference (2nd ed.)","author":"Casella Georges","year":"2002","unstructured":"Georges Casella. 2002. Statistical Inference (2nd ed.). Duxbury\/Thomson Learning."},{"issue":"1","key":"e_1_3_5_13_2","doi-asserted-by":"crossref","first-page":"125","DOI":"10.1007\/s11257-011-9108-6","article-title":"Critiquing-based recommenders: Survey and emerging trends","volume":"22","author":"Chen Li","year":"2012","unstructured":"Li Chen and Pearl Pu. 2012. Critiquing-based recommenders: Survey and emerging trends. 
User Modeling and User-Adapted Interaction 22, 1 (2012), 125\u2013150.","journal-title":"User Modeling and User-Adapted Interaction"},{"key":"e_1_3_5_14_2","first-page":"173","volume-title":"Aslib Proceedings","volume":"19","author":"Cleverdon Cyril W.","year":"1967","unstructured":"Cyril W. Cleverdon. 1967. The cranfield tests on index language devices. In Aslib Proceedings, Vol. 19. 173\u2013192."},{"key":"e_1_3_5_15_2","first-page":"3","volume-title":"Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Cleverdon Cyril W.","year":"1991","unstructured":"Cyril W. Cleverdon. 1991. The significance of the cranfield tests on index languages. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, 3\u201312. DOI:10.1145\/122860.122861"},{"key":"e_1_3_5_16_2","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0271(71)90024-6"},{"key":"e_1_3_5_17_2","doi-asserted-by":"crossref","first-page":"376","DOI":"10.3115\/v1\/W14-3348","volume-title":"Proceedings of the 9th Workshop on Statistical Machine Translation","author":"Denkowski Michael","year":"2014","unstructured":"Michael Denkowski and Alon Lavie. 2014. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the 9th Workshop on Statistical Machine Translation. Association for Computational Linguistics, 376\u2013380. DOI:10.3115\/v1\/W14-3348"},{"key":"e_1_3_5_18_2","doi-asserted-by":"crossref","first-page":"755","DOI":"10.1007\/s10462-020-09866-x","article-title":"Survey on evaluation methods for dialogue systems","volume":"54","author":"Deriu Jan","year":"2020","unstructured":"Jan Deriu, \u00c1lvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, and Mark Cieliebak. 2020. Survey on evaluation methods for dialogue systems. 
Artificial Intelligence Review 54, 1 (2020), 755\u2013810.","journal-title":"Artificial Intelligence Review"},{"key":"e_1_3_5_19_2","unstructured":"Harris Drucker Christopher J. C. Burges Linda Kaufman Alexander J. Smola and Vladimir Vapnik. 1996. Support Vector Regression Machines. In Advances in Neural Information Processing Systems 9 NIPS Denver CO USA December 2-5 1996 Michael Mozer Michael I. Jordan and Thomas Petsche (Eds.). MIT Press 155\u2013161. http:\/\/papers.nips.cc\/paper\/1238-support-vector-regression-machines"},{"key":"e_1_3_5_20_2","first-page":"3806","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"Dziri Nouha","year":"2019","unstructured":"Nouha Dziri, Ehsan Kamalloo, Kory Mathewson, and Osmar Zaiane. 2019. Evaluating coherence in dialogue systems using entailment. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 3806\u20133812. DOI:10.18653\/v1\/N19-1381"},{"key":"e_1_3_5_21_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.specom.2008.04.002"},{"key":"e_1_3_5_22_2","first-page":"39","volume-title":"Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR\u201923)","author":"Faggioli Guglielmo","year":"2023","unstructured":"Guglielmo Faggioli, Laura Dietz, Charles L. A. Clarke, Gianluca Demartini, Matthias Hagen, Claudia Hauff, Noriko Kando, Evangelos Kanoulas, Martin Potthast, Benno Stein, and Henning Wachsmuth. 2023. Perspectives on large language models for relevance judgment. In Proceedings of the 2023 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR\u201923). 
Association for Computing Machinery, New York, NY, 39\u201350."},{"key":"e_1_3_5_23_2","doi-asserted-by":"crossref","first-page":"236","DOI":"10.18653\/v1\/2020.sigdial-1.29","volume-title":"Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue","author":"Finch Sarah E.","year":"2020","unstructured":"Sarah E. Finch and Jinho D. Choi. 2020. Towards unified dialogue system evaluation: A comprehensive analysis of current evaluation protocols. In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 236\u2013245. Retrieved from https:\/\/aclanthology.org\/2020.sigdial-1.29"},{"key":"e_1_3_5_24_2","first-page":"1189","article-title":"Greedy function approximation: A gradient boosting machine","author":"Friedman Jerome H.","year":"2001","unstructured":"Jerome H. Friedman. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics 29, 5 (2001), 1189\u20131232.","journal-title":"Annals of Statistics"},{"key":"e_1_3_5_25_2","doi-asserted-by":"publisher","DOI":"10.1016\/j.aiopen.2021.06.002"},{"key":"e_1_3_5_26_2","first-page":"1","article-title":"CIRS: Bursting filter bubbles by counterfactual interactive recommender system","author":"Gao Chongming","year":"2022","unstructured":"Chongming Gao, Shiqi Wang, Shijun Li, Jiawei Chen, Xiangnan He, Wenqiang Lei, Biao Li, Yuan Zhang, and Peng Jiang. 2022. CIRS: Bursting filter bubbles by counterfactual interactive recommender system. 
ACM Transactions on Information Systems 42, 1 (2022), 1\u201327.","journal-title":"ACM Transactions on Information Systems"},{"key":"e_1_3_5_27_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000074"},{"key":"e_1_3_5_28_2","doi-asserted-by":"crossref","first-page":"82","DOI":"10.18653\/v1\/W19-2310","volume-title":"Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation","author":"Ghazarian Sarik","year":"2019","unstructured":"Sarik Ghazarian, Johnny Wei, Aram Galstyan, and Nanyun Peng. 2019. Better automatic evaluation of open-domain dialogue systems with contextualized embeddings. In Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation. Association for Computational Linguistics, 82\u201389. DOI:10.18653\/v1\/W19-2310"},{"key":"e_1_3_5_29_2","doi-asserted-by":"crossref","first-page":"1891","DOI":"10.21437\/Interspeech.2019-3079","volume-title":"Proceedings of the Interspeech 2019","author":"Gopalakrishnan Karthik","year":"2019","unstructured":"Karthik Gopalakrishnan, Behnam Hedayatnia, Qinlang Chen, Anna Gottardi, Sanjeev Kwatra, Anu Venkatesh, Raefer Gabriel, and Dilek Hakkani-T\u00fcr. 2019. Topical-chat: Towards knowledge-grounded open-domain conversations. In Proceedings of the Interspeech 2019. 1891\u20131895. DOI:10.21437\/Interspeech.2019-3079"},{"key":"e_1_3_5_30_2","series-title":"Proceedings of the 27th ACM International Conference on Information and Knowledge Management","first-page":"1183","author":"Hashemi Seyyed Hadi","year":"2018","unstructured":"Seyyed Hadi Hashemi, Kyle Williams, Ahmed El Kholy, Imed Zitouni, and Paul A. Crook. 2018. Measuring user satisfaction on smart speaker intelligent assistants using intent sensitive query embeddings. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM\u201918). Association for Computing Machinery, New York, NY, 1183\u20131192. 
DOI:10.1145\/3269206.3271802"},{"key":"e_1_3_5_31_2","first-page":"275","volume-title":"Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201912)","author":"Hassan Ahmed","year":"2012","unstructured":"Ahmed Hassan. 2012. A semi-supervised approach to modeling web search satisfaction. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR\u201912). Association for Computing Machinery, New York, NY, 275\u2013284. DOI:10.1145\/2348283.2348323"},{"key":"e_1_3_5_32_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-0-387-21606-5"},{"key":"e_1_3_5_33_2","doi-asserted-by":"crossref","first-page":"9230","DOI":"10.18653\/v1\/2020.emnlp-main.742","volume-title":"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)","author":"Huang Lishan","year":"2020","unstructured":"Lishan Huang, Zheng Ye, Jinghui Qin, Liang Lin, and Xiaodan Liang. 2020. GRADE: Automatic graph-enhanced coherence metric for evaluating open-domain dialogue systems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 9230\u20139240. DOI:10.18653\/v1\/2020.emnlp-main.742"},{"key":"e_1_3_5_34_2","doi-asserted-by":"publisher","DOI":"10.1145\/3453154"},{"key":"e_1_3_5_35_2","doi-asserted-by":"publisher","DOI":"10.1145\/582415.582418"},{"key":"e_1_3_5_36_2","series-title":"Proceedings of the 8th ACM International Conference on Web Search and Data Mining","first-page":"57","author":"Jiang Jiepu","year":"2015","unstructured":"Jiepu Jiang, Ahmed Hassan Awadallah, Xiaolin Shi, and Ryen W. White. 2015. Understanding and predicting graded search satisfaction. In Proceedings of the 8th ACM International Conference on Web Search and Data Mining (Shanghai, China) (WSDM\u201915). Association for Computing Machinery, New York, NY, 57\u201366. 
DOI:10.1145\/2684822.2685319"},{"key":"e_1_3_5_37_2","doi-asserted-by":"crossref","first-page":"258","DOI":"10.1145\/3576840.3578319","volume-title":"Proceedings of the 2023 ACM SIGIR Conference on Human Information Interaction & Retrieval (CHIIR)","author":"Jiang Shaojie","year":"2023","unstructured":"Shaojie Jiang, Svitlana Vakulenko, and Maarten de Rijke. 2023. Weakly supervised turn-level engagingness evaluator for dialogues. In Proceedings of the 2023 ACM SIGIR Conference on Human Information Interaction & Retrieval (CHIIR). Association for Computing Machinery, New York, NY, 258\u2013268. DOI:10.1145\/3576840.3578319"},{"key":"e_1_3_5_38_2","series-title":"Proceedings of the 9th International Conference on Human-Agent Interaction","first-page":"93","author":"Jin Yucheng","year":"2021","unstructured":"Yucheng Jin, Li Chen, Wanling Cai, and Pearl Pu. 2021. Key qualities of conversational recommender systems: From users\u2019 perspective. In Proceedings of the 9th International Conference on Human-Agent Interaction (Virtual Event, Japan) (HAI\u201921). Association for Computing Machinery, New York, NY, 93\u2013102. DOI:10.1145\/3472307.3484164"},{"key":"e_1_3_5_39_2","series-title":"Proceedings of the 11th ACM Conference on Recommender Systems","first-page":"229","author":"Kang Jie","year":"2017","unstructured":"Jie Kang, Kyle Condiff, Shuo Chang, Joseph A. Konstan, Loren Terveen, and F. Maxwell Harper. 2017. Understanding how people use natural language to ask for recommendations. In Proceedings of the 11th ACM Conference on Recommender Systems (Como, Italy) (RecSys\u201917). Association for Computing Machinery, New York, NY, 229\u2013237. 
DOI:10.1145\/3109859.3109873"},{"key":"e_1_3_5_40_2","doi-asserted-by":"publisher","DOI":"10.1561\/1500000012"},{"key":"e_1_3_5_41_2","series-title":"Proceedings of the 7th ACM International Conference on Web Search and Data Mining","first-page":"193","author":"Kim Youngho","year":"2014","unstructured":"Youngho Kim, Ahmed Hassan, Ryen W. White, and Imed Zitouni. 2014. Modeling dwell time to predict click-level satisfaction. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining (New York, New York) (WSDM\u201914). Association for Computing Machinery, New York, NY, 193\u2013202. DOI:10.1145\/2556195.2556220"},{"key":"e_1_3_5_42_2","series-title":"Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval","first-page":"45","author":"Kiseleva Julia","year":"2016","unstructured":"Julia Kiseleva, Kyle Williams, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, and Tasos Anastasakos. 2016. Predicting user satisfaction with intelligent assistants. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (Pisa, Italy) (SIGIR\u201916). Association for Computing Machinery, New York, NY, 45\u201354. DOI:10.1145\/2911451.2911521"},{"key":"e_1_3_5_43_2","series-title":"Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval, CHIIR","first-page":"121","author":"Kiseleva Julia","year":"2016","unstructured":"Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, and Tasos Anastasakos. 2016. Understanding user satisfaction with intelligent assistants. In Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval, CHIIR (Carrboro, North Carolina, USA) (CHIIR\u201916). Association for Computing Machinery, New York, NY, 121\u2013130. 
DOI:10.1145\/2854946.2854961"},{"key":"e_1_3_5_44_2","unstructured":"Margaret Li Jason Weston and Stephen Roller. 2019. ACUTE-EVAL: Improved dialogue evaluation with optimized questions and multi-turn comparisons. CoRR abs\/1909.03087 (2019). arXiv:1909.03087 http:\/\/arxiv.org\/abs\/1909.03087"},{"key":"e_1_3_5_45_2","first-page":"9748","volume-title":"Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada","author":"Li Raymond","year":"2018","unstructured":"Raymond Li, Samira Ebrahimi Kahou, Hannes Schulz, Vincent Michalski, Laurent Charlin, and Chris Pal. 2018. Towards deep conversational recommendations. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montr\u00e9al, Canada. 9748\u20139758. Retrieved from https:\/\/proceedings.neurips.cc\/paper\/2018\/hash\/800de15c79c8d840f4e78d3af937d4d4-Abstract.html"},{"key":"e_1_3_5_46_2","first-page":"74","volume-title":"Proceedings of the Text Summarization Branches Out","author":"Lin Chin-Yew","year":"2004","unstructured":"Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out. Association for Computational Linguistics, Barcelona, Spain, 74\u201381. Retrieved from https:\/\/aclanthology.org\/W04-1013"},{"key":"e_1_3_5_47_2","first-page":"2122","volume-title":"Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing","author":"Liu Chia-Wei","year":"2016","unstructured":"Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT To evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. 
In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2122\u20132132. DOI:10.18653\/v1\/D16-1230"},{"key":"e_1_3_5_48_2","first-page":"1141","volume-title":"Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR","author":"Liu Jiqun","year":"2020","unstructured":"Jiqun Liu and Fangyuan Han. 2020. Investigating reference dependence effects on user search interaction and satisfaction: A behavioral economics perspective. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR. Association for Computing Machinery, 1141\u20131150."},{"key":"e_1_3_5_49_2","series-title":"Proceedings of the 2018 World Wide Web Conference","first-page":"1533","author":"Liu Mengyang","year":"2018","unstructured":"Mengyang Liu, Yiqun Liu, Jiaxin Mao, Cheng Luo, Min Zhang, and Shaoping Ma. 2018. \u201cSatisfaction with failure\u201d or \u201cunsatisfied success\u201d: Investigating the relationship between search success and user satisfaction. In Proceedings of the 2018 World Wide Web Conference (Lyon, France) (WWW\u201918). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1533\u20131542. DOI:10.1145\/3178876.3186065"},{"key":"e_1_3_5_50_2","first-page":"1523","volume-title":"Proceedings of the SIGIR 2021: 44th international ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Lu Hongyu","year":"2021","unstructured":"Hongyu Lu, Weizhi Ma, Min Zhang, Maarten de Rijke, Yiqun Liu, and Shaoping Ma. 2021. Standing in your shoes: External assessments for personalized recommender systems. In Proceedings of the SIGIR 2021: 44th international ACM SIGIR Conference on Research and Development in Information Retrieval. 
ACM, 1523\u20131533."},{"key":"e_1_3_5_51_2","doi-asserted-by":"crossref","first-page":"225","DOI":"10.18653\/v1\/2020.sigdial-1.28","volume-title":"Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue","author":"Mehri Shikib","year":"2020","unstructured":"Shikib Mehri and Maxine Eskenazi. 2020. Unsupervised evaluation of interactive dialog with DialoGPT. In Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Association for Computational Linguistics, 1st virtual meeting, 225\u2013235. Retrieved from https:\/\/aclanthology.org\/2020.sigdial-1.28"},{"key":"e_1_3_5_52_2","doi-asserted-by":"crossref","first-page":"681","DOI":"10.18653\/v1\/2020.acl-main.64","volume-title":"Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics","author":"Mehri Shikib","year":"2020","unstructured":"Shikib Mehri and Maxine Eskenazi. 2020. USR: An unsupervised and reference free evaluation metric for dialog generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 681\u2013707. DOI:10.18653\/v1\/2020.acl-main.64"},{"key":"e_1_3_5_53_2","series-title":"Proceedings of the 40th Annual Meeting on Association for Computational Linguistics","first-page":"311","author":"Papineni Kishore","year":"2002","unstructured":"Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (Philadelphia, Pennsylvania) (ACL\u201902). Association for Computational Linguistics, 311\u2013318. DOI:10.3115\/1073083.1073135"},{"key":"e_1_3_5_54_2","first-page":"25","volume-title":"Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, CHIIR","author":"Qu Chen","year":"2019","unstructured":"Chen Qu, Liu Yang, W. 
Bruce Croft, Yongfeng Zhang, Johanne R. Trippas, and Minghui Qiu. 2019. User intent prediction in information-seeking conversations. In Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, CHIIR. Association for Computing Machinery, 25\u201333. DOI:10.1145\/3295750.3298924"},{"key":"e_1_3_5_55_2","doi-asserted-by":"crossref","first-page":"353","DOI":"10.18653\/v1\/W19-5941","volume-title":"Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue","author":"Radlinski Filip","year":"2019","unstructured":"Filip Radlinski, Krisztian Balog, Bill Byrne, and Karthik Krishnamoorthi. 2019. Coached conversational preference elicitation: A case study in understanding movie preferences. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. Association for Computational Linguistics, 353\u2013360. DOI:10.18653\/v1\/W19-5941"},{"key":"e_1_3_5_56_2","doi-asserted-by":"publisher","DOI":"10.1609\/aaai.v34i05.6394"},{"key":"e_1_3_5_57_2","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/978-0-387-85820-3","volume-title":"Recommender Systems Handbook","author":"Ricci Francesco","year":"2011","unstructured":"Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Recommender Systems Handbook. Springer, 1\u201335."},{"key":"e_1_3_5_58_2","volume-title":"Information Science: Integration in Perspectives. Proceedings of the Second Conference on Conceptions of Library and Information Science, Copenhagen (Denmark)","author":"Saracevic Tefko","year":"1996","unstructured":"Tefko Saracevic. 1996. Relevance reconsidered. In Information Science: Integration in Perspectives. 
Proceedings of the Second Conference on Conceptions of Library and Information Science, Copenhagen (Denmark)."},{"key":"e_1_3_5_59_2","first-page":"3369","volume-title":"Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912)","author":"Schmitt Alexander","year":"2012","unstructured":"Alexander Schmitt, Stefan Ultes, and Wolfgang Minker. 2012. A parameterized and annotated spoken dialog corpus of the CMU Let\u2019s go bus information system. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC\u201912). European Language Resources Association (ELRA), Istanbul, Turkey, 3369\u20133373. Retrieved from http:\/\/www.lrec-conf.org\/proceedings\/lrec2012\/pdf\/333_Paper.pdf"},{"key":"e_1_3_5_60_2","first-page":"1702","volume-title":"Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)","author":"See Abigail","year":"2019","unstructured":"Abigail See, Stephen Roller, Douwe Kiela, and Jason Weston. 2019. What makes a good conversation? How controllable attributes affect human judgments. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 1702\u20131723. DOI:10.18653\/v1\/N19-1170"},{"key":"e_1_3_5_61_2","series-title":"Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval","first-page":"2018","author":"Siro Clemencia","year":"2022","unstructured":"Clemencia Siro, Mohammad Aliannejadi, and Maarten de Rijke. 2022. Understanding user satisfaction with task-oriented dialogue systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (Madrid, Spain) (SIGIR\u201922). 
Association for Computing Machinery, New York, NY, 2018\u20132023. DOI:10.1145\/3477495.3531798"},{"key":"e_1_3_5_62_2","first-page":"1570","volume-title":"Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies","author":"Sun Kai","year":"2021","unstructured":"Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, and Claire Cardie. 2021. Adding chit-chat to enhance task-oriented dialogues. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 1570\u20131583. DOI:10.18653\/v1\/2021.naacl-main.124"},{"key":"e_1_3_5_63_2","doi-asserted-by":"crossref","first-page":"2499","DOI":"10.1145\/3404835.3463241","volume-title":"Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Sun Weiwei","year":"2021","unstructured":"Weiwei Sun, Shuo Zhang, Krisztian Balog, Zhaochun Ren, Pengjie Ren, Zhumin Chen, and Maarten de Rijke. 2021. Simulating user satisfaction for the evaluation of task-oriented dialogue systems. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, New York, NY, 2499\u20132506. DOI:10.1145\/3404835.3463241"},{"key":"e_1_3_5_64_2","doi-asserted-by":"crossref","first-page":"225","DOI":"10.1145\/383952.383992","volume-title":"Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval","author":"Turpin Andrew H.","year":"2001","unstructured":"Andrew H. Turpin and William Hersh. 2001. Why batch and user evaluations do not give the same results. 
In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, USA). Association for Computing Machinery, 225\u2013231. DOI:10.1145\/383952.383992"},{"key":"e_1_3_5_65_2","unstructured":"Anu Venkatesh Chandra Khatri Ashwin Ram Fenfei Guo Raefer Gabriel Ashish Nagar Rohit Prasad Ming Cheng Behnam Hedayatnia Angeliki Metallinou Rahul Goel Shaohua Yang and Anirudh Raju. 2018. On evaluating and comparing open domain dialog systems. arXiv:1801.03625. Retrieved from https:\/\/arxiv.org\/abs\/1801.03625"},{"key":"e_1_3_5_66_2","series-title":"Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics","first-page":"271","author":"Walker Marilyn A.","year":"1997","unstructured":"Marilyn A. Walker, Diane J. Litman, Candace A. Kamm, and Alicia Abella. 1997. PARADISE: A framework for evaluating spoken dialogue agents. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics (Madrid, Spain) (ACL\u201998\/EACL\u201998). Association for Computational Linguistics, 271\u2013280. DOI:10.3115\/976909.979652"},{"key":"e_1_3_5_67_2","first-page":"1929","volume-title":"KDD\u201922: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining","author":"Wang Xiaolei","year":"2022","unstructured":"Xiaolei Wang, Kun Zhou, Ji-Rong Wen, and Wayne Xin Zhao. 2022. Towards unified conversational recommender systems via knowledge-enhanced prompt learning. In KDD\u201922: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. Aidong Zhang and Huzefa Rangwala (Eds.). ACM, 1929\u20131937. 
DOI:10.1145\/3534678.3539382"},{"key":"e_1_3_5_68_2","doi-asserted-by":"crossref","first-page":"185","DOI":"10.21437\/Interspeech.2004-114","volume-title":"Proceedings of the Interspeech 2004","author":"Williams Jason D.","year":"2004","unstructured":"Jason D. Williams and Steve Young. 2004. Characterizing task-oriented dialog using a simulated ASR channel. In Proceedings of the Interspeech 2004. 185\u2013188. DOI:10.21437\/Interspeech.2004-114"},{"key":"e_1_3_5_69_2","first-page":"217","volume-title":"Reading and Understanding Multivariate Statistics","author":"Wright Raymond E.","year":"1995","unstructured":"Raymond E. Wright. 1995. Logistic regression. In Reading and Understanding Multivariate Statistics. L. G. Grimm and P. R. Yarnold (Eds.). American Psychological Association, 217\u2013244."},{"key":"e_1_3_5_70_2","first-page":"5676","volume-title":"Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)","author":"Zhang Chen","year":"2021","unstructured":"Chen Zhang, Yiming Chen, Luis Fernando D\u2019Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee, and Haizhou Li. 2021. DynaEval: Unifying turn and dialogue level evaluation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Association for Computational Linguistics, 5676\u20135689. DOI:10.18653\/v1\/2021.acl-long.441"},{"key":"e_1_3_5_71_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASLP.2021.3074012"},{"key":"e_1_3_5_72_2","doi-asserted-by":"crossref","first-page":"1512","DOI":"10.1145\/3394486.3403202","volume-title":"Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","author":"Zhang Shuo","year":"2020","unstructured":"Shuo Zhang and Krisztian Balog. 2020. 
Evaluating conversational recommender systems via user simulation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, New York, NY, 1512\u20131520. DOI:10.1145\/3394486.3403202"},{"key":"e_1_3_5_73_2","doi-asserted-by":"crossref","first-page":"2204","DOI":"10.18653\/v1\/P18-1205","volume-title":"Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)","author":"Zhang Saizheng","year":"2018","unstructured":"Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 2204\u20132213. DOI:10.18653\/v1\/P18-1205"},{"key":"e_1_3_5_74_2","first-page":"185","volume-title":"Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL 2021\u2014System Demonstrations, Online, August 1-6, 2021","author":"Zhou Kun","year":"2021","unstructured":"Kun Zhou, Xiaolei Wang, Yuanhang Zhou, Chenzhan Shang, Yuan Cheng, Wayne Xin Zhao, Yaliang Li, and Ji-Rong Wen. 2021. CRSLab: An open-source toolkit for building conversational recommender system. In Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL 2021\u2014System Demonstrations, Online, August 1-6, 2021, Heng Ji, Jong C. Park, and Rui Xia (Eds.). Association for Computational Linguistics, 185\u2013193. 
DOI:10.18653\/v1\/2021.acl-demo.22"},{"key":"e_1_3_5_75_2","first-page":"1006","volume-title":"KDD\u201920: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020","author":"Zhou Kun","year":"2020","unstructured":"Kun Zhou, Wayne Xin Zhao, Shuqing Bian, Yuanhang Zhou, Ji-Rong Wen, and Jingsong Yu. 2020. Improving conversational recommender systems via knowledge graph based semantic fusion. In KDD\u201920: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, August 23-27, 2020, Rajesh Gupta, Yan Liu, Jiliang Tang, and B. Aditya Prakash (Eds.). ACM, 1006\u20131014. DOI:10.1145\/3394486.3403143"},{"key":"e_1_3_5_76_2","first-page":"4128","volume-title":"Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020","author":"Zhou Kun","year":"2020","unstructured":"Kun Zhou, Yuanhang Zhou, Wayne Xin Zhao, Xiaoke Wang, and Ji-Rong Wen. 2020. Towards topic-guided conversational recommender system. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020, Donia Scott, N\u00faria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, 4128\u20134139. 
DOI:10.18653\/v1\/2020.coling-main.365"}],"container-title":["ACM Transactions on Information Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3624989","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3624989","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T16:35:46Z","timestamp":1750178146000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3624989"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,11,8]]},"references-count":75,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2024,3,31]]}},"alternative-id":["10.1145\/3624989"],"URL":"https:\/\/doi.org\/10.1145\/3624989","relation":{},"ISSN":["1046-8188","1558-2868"],"issn-type":[{"value":"1046-8188","type":"print"},{"value":"1558-2868","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,11,8]]},"assertion":[{"value":"2023-01-20","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-09-04","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-11-08","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}