{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,31]],"date-time":"2024-07-31T00:27:01Z","timestamp":1722385621949},"reference-count":37,"publisher":"Springer Science and Business Media LLC","issue":"4","license":[{"start":{"date-parts":[[2024,6,4]],"date-time":"2024-06-04T00:00:00Z","timestamp":1717459200000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,6,4]],"date-time":"2024-06-04T00:00:00Z","timestamp":1717459200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"Vagelis Hristidis"}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Data Min Knowl Disc"],"published-print":{"date-parts":[[2024,7]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Existing work on task-oriented dialog systems generally assumes that the interaction of users with the system is restricted to the information stored in a closed data schema. However, in practice users may ask \u2018out-of-schema\u2019 questions, that is, questions that the system cannot answer, because the information does not exist in the schema. Failure to answer these questions may lead the users to drop out of the chat before reaching the success state (e.g. reserving a restaurant). A key challenge is that the number of these questions may be too high for a domain expert to answer them all. We formulate the problem of out-of-schema question detection and selection that identifies the most critical out-of-schema questions to answer, in order to maximize the expected success rate of the system. We propose a two-stage pipeline to solve the problem. In the first stage, we propose a novel in-context learning (ICL) approach to detect out-of-schema questions. In the second stage, we propose two algorithms for out-of-schema question selection (OQS): a naive approach that chooses a question based on its frequency in the dropped-out conversations, and a probabilistic approach that represents each conversation as a Markov chain and a question is picked based on its overall benefit. We propose and publish two new datasets for the problem, as existing datasets do not contain out-of-schema questions or user drop-outs. Our quantitative and simulation-based experimental analyses on these datasets measure how our methods can effectively identify out-of-schema questions and positively impact the success rate of the system.<\/jats:p>","DOI":"10.1007\/s10618-024-01039-6","type":"journal-article","created":{"date-parts":[[2024,6,4]],"date-time":"2024-06-04T06:01:59Z","timestamp":1717480919000},"page":"2466-2494","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Modeling the impact of out-of-schema questions in task-oriented dialog systems"],"prefix":"10.1007","volume":"38","author":[{"given":"Jannat Ara","family":"Meem","sequence":"first","affiliation":[]},{"given":"Muhammad Shihab","family":"Rashid","sequence":"additional","affiliation":[]},{"given":"Vagelis","family":"Hristidis","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,6,4]]},"reference":[{"issue":"15","key":"1039_CR1","doi-asserted-by":"publisher","first-page":"17356","DOI":"10.1007\/s10489-022-03295-9","volume":"52","author":"WA Abro","year":"2022","unstructured":"Abro WA, Qi G, Aamir M, Ali Z (2022) Joint intent detection and slot filling using weighted finite state transducer and bert. Appl Intell 52(15):17356\u201317370","journal-title":"Appl Intell"},{"key":"1039_CR2","doi-asserted-by":"crossref","unstructured":"Bang Y, Cahyawijaya S, Lee N, Dai W, Su D, Wilie B, Lovenia H, Ji Z, Yu T, Chung W, et al (2023) A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity","DOI":"10.18653\/v1\/2023.ijcnlp-main.45"},{"key":"1039_CR3","unstructured":"Bert-large-uncased-wwm-finetuned-boolq. https:\/\/huggingface.co\/lewtun\/bert-large-uncaseda-wwm-finetuned-boolq"},{"issue":"1\u20137","key":"1039_CR4","doi-asserted-by":"publisher","first-page":"107","DOI":"10.1016\/S0169-7552(98)00110-X","volume":"30","author":"S Brin","year":"1998","unstructured":"Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1\u20137):107\u2013117","journal-title":"Comput Netw ISDN Syst"},{"key":"1039_CR5","first-page":"1877","volume":"33","author":"T Brown","year":"2020","unstructured":"Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877\u20131901","journal-title":"Adv Neural Inf Process Syst"},{"key":"1039_CR6","doi-asserted-by":"crossref","unstructured":"Budzianowski P, Wen T-H, Tseng B-H, Casanueva I, Ultes S, Ramadan O, Ga\u0161i\u0107 M (2018) Multiwoz\u2013a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. arXiv preprint arXiv:1810.00278","DOI":"10.18653\/v1\/D18-1547"},{"key":"1039_CR7","doi-asserted-by":"crossref","unstructured":"Chen L, Lv B, Wang C, Zhu S, Tan B, Yu K (2020) Schema-guided multi-domain dialogue state tracking with graph attention neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 7521\u20137528","DOI":"10.1609\/aaai.v34i05.6250"},{"key":"1039_CR8","unstructured":"Chung HW, Hou L, Longpre S, Zoph B, Tay Y, Fedus W, Li Y, Wang X, Dehghani M, Brahma S, et al (2022) Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416"},{"key":"1039_CR9","unstructured":"Clark C, Lee K, Chang M-W, Kwiatkowski T, Collins M, Toutanova K (2019) Boolq: Exploring the surprising difficulty of natural yes\/no questions. arXiv preprint arXiv:1905.10044"},{"key":"1039_CR10","doi-asserted-by":"crossref","unstructured":"Coucke A, Saade A, Ball A, Bluche T, Caulier A, Leroy D, Doumouro C, Gisselbrecht T, Caltagirone F, Lavril T, et al (2018) Snips voice platform: an embedded spoken language understanding system for private-by-design voice interfaces. arXiv preprint arXiv:1805.10190","DOI":"10.1109\/EMC2-NIPS53020.2019.00021"},{"key":"1039_CR11","doi-asserted-by":"crossref","unstructured":"Deng Y, Zhang W, Lam W, Cheng H, Meng H (2022) User satisfaction estimation with sequential dialogue act modeling in goal-oriented conversational systems. In: Proceedings of the ACM web conference 2022, pp. 2998\u20133008","DOI":"10.1145\/3485447.3512020"},{"key":"1039_CR12","unstructured":"Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805"},{"key":"1039_CR13","doi-asserted-by":"publisher","first-page":"103335","DOI":"10.1016\/j.jretconser.2023.103335","volume":"73","author":"AG Fernando","year":"2023","unstructured":"Fernando AG, Aw EC-X (2023) What do consumers want? a methodological framework to identify determinant product attributes from consumers\u2019 online questions. J Retail Consum Serv 73:103335","journal-title":"J Retail Consum Serv"},{"key":"1039_CR14","doi-asserted-by":"crossref","unstructured":"Hackl V, M\u00fcller AE, Granitzer M, Sailer M (2023) Is gpt-4 a reliable rater? evaluating consistency in gpt-4 text ratings. arXiv preprint arXiv:2308.02575","DOI":"10.3389\/feduc.2023.1272229"},{"key":"1039_CR15","doi-asserted-by":"crossref","unstructured":"Hu Y, Lee C-H, Xie T, Yu T, Smith NA, Ostendorf M (2022) In-context learning for few-shot dialogue state tracking. arXiv preprint arXiv:2203.08568","DOI":"10.18653\/v1\/2022.findings-emnlp.193"},{"issue":"7","key":"1039_CR16","doi-asserted-by":"publisher","first-page":"1358","DOI":"10.1002\/asi.21071","volume":"60","author":"BJ Jansen","year":"2009","unstructured":"Jansen BJ, Booth DL, Spink A (2009) Patterns of query reformulation during web searching. J Am Soc Inform Sci Technol 60(7):1358\u20131371","journal-title":"J Am Soc Inform Sci Technol"},{"key":"1039_CR17","unstructured":"Jurafsky D, Martin JH (2009) Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition"},{"key":"1039_CR18","doi-asserted-by":"crossref","unstructured":"Kim S, Eric M, Gopalakrishnan K, Hedayatnia B, Liu Y, Hakkani-Tur D (2020) Beyond domain apis: task-oriented conversational modeling with unstructured knowledge access. arXiv preprint arXiv:2006.03533","DOI":"10.18653\/v1\/2020.sigdial-1.35"},{"key":"1039_CR19","doi-asserted-by":"crossref","unstructured":"Kim Y, Hassan A, White RW, Zitouni I (2014) Modeling dwell time to predict click-level satisfaction. In: Proceedings of the 7th ACM International conference on web search and data mining, pp. 193\u2013202","DOI":"10.1145\/2556195.2556220"},{"key":"1039_CR20","first-page":"22199","volume":"35","author":"T Kojima","year":"2022","unstructured":"Kojima T, Gu SS, Reid M, Matsuo Y, Iwasawa Y (2022) Large language models are zero-shot reasoners. Adv Neural Inf Process Syst 35:22199\u201322213","journal-title":"Adv Neural Inf Process Syst"},{"key":"1039_CR21","unstructured":"Larson S, Leach K (2022) A survey of intent classification and slot-filling datasets for task-oriented dialog. arXiv preprint arXiv:2207.13211"},{"key":"1039_CR22","doi-asserted-by":"crossref","unstructured":"Li C-H, Yeh S-F, Chang T-J, Tsai M-H, Chen K, Chang Y-J (2020) A conversation analysis of non-progress and coping strategies with a banking task-oriented chatbot. In: Proceedings of the 2020 CHI conference on human factors in computing systems, pp. 1\u201312","DOI":"10.1145\/3313831.3376209"},{"key":"1039_CR23","unstructured":"Liu X, Eshghi A, Swietojanski P, Rieser V (2019) Benchmarking natural language understanding services for building conversational agents. arXiv preprint arXiv:1903.05566"},{"key":"1039_CR24","doi-asserted-by":"crossref","unstructured":"Maqbool MH, Xu L, Siddique A, Montazeri N, Hristidis V, Foroosh H (2022) Zero-label anaphora resolution for off-script user queries in goal-oriented dialog systems. In: 2022 IEEE 16th international conference on semantic computing (ICSC). IEEE, pp. 217\u2013224","DOI":"10.1109\/ICSC52841.2022.00043"},{"key":"1039_CR25","unstructured":"OpenAI, R (2023) Gpt-4 technical report. arXiv:2303.08774"},{"key":"1039_CR26","doi-asserted-by":"crossref","unstructured":"Pan Y, Ma M, Pflugfelder B, Groh G (2022) User satisfaction modeling with domain adaptation in task-oriented dialogue systems. In: Proceedings of the 23rd Annual meeting of the special interest group on discourse and dialogue, pp. 630\u2013636","DOI":"10.18653\/v1\/2022.sigdial-1.59"},{"key":"1039_CR27","doi-asserted-by":"crossref","unstructured":"Ponnusamy P, Ghias AR, Guo C, Sarikaya R (2020) Feedback-based self-learning in large-scale conversational ai agents. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp. 13180\u201313187","DOI":"10.1609\/aaai.v34i08.7022"},{"key":"1039_CR28","doi-asserted-by":"crossref","unstructured":"Rastogi A, Zang X, Sunkara S, Gupta R, Khaitan P (2020) Towards scalable multi-domain conversational agents: The schema-guided dialogue dataset. In: Proceedings of the AAAI conference on artificial intelligence, vol. 34, pp. 8689\u20138696","DOI":"10.1609\/aaai.v34i05.6394"},{"key":"1039_CR29","unstructured":"Roberta-base-boolq. https:\/\/huggingface.co\/shahrukhx01\/roberta-base-boolq"},{"key":"1039_CR30","doi-asserted-by":"crossref","unstructured":"Siro C, Aliannejadi M, Rijke M (2022) Understanding user satisfaction with task-oriented dialogue systems. In: Proceedings of the 45th International ACM SIGIR conference on research and development in information retrieval, pp. 2018\u20132023","DOI":"10.1145\/3477495.3531798"},{"key":"1039_CR31","unstructured":"t5-base-finetuned-boolq. https:\/\/huggingface.co\/mrm8488\/t5-base-finetuned-boolq"},{"key":"1039_CR32","unstructured":"Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, Batra S, Bhargava P, Bhosale S, et al (2023) Llama 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288"},{"key":"1039_CR33","doi-asserted-by":"publisher","DOI":"10.1055\/2015956468","author":"J Wang","year":"2015","unstructured":"Wang J, Huang JZ, Wu D et al (2015) Recommending high utility queries via query-reformulation graph. Math Probl Eng. https:\/\/doi.org\/10.1055\/2015956468","journal-title":"Math Probl Eng"},{"key":"1039_CR34","doi-asserted-by":"crossref","unstructured":"Wang J, Li J, Zhao H (2023) Self-prompted chain-of-thought on large language models for open-domain multi-hop reasoning. arXiv preprint arXiv:2310.13552","DOI":"10.18653\/v1\/2023.findings-emnlp.179"},{"key":"1039_CR35","first-page":"24824","volume":"35","author":"J Wei","year":"2022","unstructured":"Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le QV, Zhou D (2022) Chain-of-thought prompting elicits reasoning in large language models. Adv Neural Inf Process Syst 35:24824\u201324837","journal-title":"Adv Neural Inf Process Syst"},{"key":"1039_CR36","doi-asserted-by":"crossref","unstructured":"Zhao R, Li X, Joty S, Qin C, Bing L (2023) Verify-and-edit: a knowledge-enhanced chain-of-thought framework. arXiv preprint arXiv:2305.03268","DOI":"10.18653\/v1\/2023.acl-long.320"},{"key":"1039_CR37","doi-asserted-by":"crossref","unstructured":"Zhu X, Guo J, Cheng X, Lan Y (2012) More than relevance: high utility query recommendation by mining users\u2019 search behaviors. In: Proceedings of the 21st ACM international conference on information and knowledge management, pp. 1814\u20131818","DOI":"10.1145\/2396761.2398523"}],"container-title":["Data Mining and Knowledge Discovery"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-024-01039-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10618-024-01039-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10618-024-01039-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,30]],"date-time":"2024-07-30T10:04:28Z","timestamp":1722333868000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10618-024-01039-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,6,4]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,7]]}},"alternative-id":["1039"],"URL":"https:\/\/doi.org\/10.1007\/s10618-024-01039-6","relation":{},"ISSN":["1384-5810","1573-756X"],"issn-type":[{"type":"print","value":"1384-5810"},{"type":"electronic","value":"1573-756X"}],"subject":[],"published":{"date-parts":[[2024,6,4]]},"assertion":[{"value":"6 December 2023","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 May 2024","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"4 June 2024","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"This project does not involve any live subjects (human or animal), and no personal or restricted data were collected from any source or subject. Although the LLMs may sometimes generate fake information i.e. hallucinate, our experiments do not involve LLMs in creating any harmful content and, thus raise no ethical concern. As per our knowledge, there are no ethical implications for this project.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Human and animal rights"}}]}}