{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,4,3]],"date-time":"2026-04-03T15:20:24Z","timestamp":1775229624156,"version":"3.50.1"},"reference-count":56,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2012,5,1]],"date-time":"2012-05-01T00:00:00Z","timestamp":1335830400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["208835"],"award-info":[{"award-number":["208835"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Speech Lang. Process."],"published-print":{"date-parts":[[2012,5]]},"abstract":"<jats:p>Even as progress in speech technologies and task and dialog modeling has allowed the development of advanced spoken dialog systems, the low-level interaction behavior of those systems often remains rigid and inefficient. Based on an analysis of human-human and human-computer turn-taking in naturally occurring task-oriented dialogs, we define a set of features that can be automatically extracted and show that they can be used to inform efficient end-of-turn detection. We then frame turn-taking as decision making under uncertainty and describe the Finite-State Turn-Taking Machine (FSTTM), a decision-theoretic model that combines data-driven machine learning methods and a cost structure derived from Conversation Analysis to control the turn-taking behavior of dialog systems. 
Evaluation results on CMU Let's Go, a publicly deployed bus information system, confirm that the FSTTM significantly improves the responsiveness of the system compared to a standard threshold-based approach, as well as previous data-driven methods.<\/jats:p>","DOI":"10.1145\/2168748.2168749","type":"journal-article","created":{"date-parts":[[2012,5,15]],"date-time":"2012-05-15T13:33:09Z","timestamp":1337088789000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":21,"title":["Optimizing the turn-taking behavior of task-oriented spoken dialog systems"],"prefix":"10.1145","volume":"9","author":[{"given":"Antoine","family":"Raux","sequence":"first","affiliation":[{"name":"Honda Research Institute USA, Mountain View, CA"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Maxine","family":"Eskenazi","sequence":"additional","affiliation":[{"name":"Carnegie Mellon University, Pittsburgh, PA"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2012,5,16]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the International Conference on Computational Linguistics (COLING).","author":"Atterer M."},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1515\/semi.1982.39.1-2.93"},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the SIGDIAL Conference, Special Interest Group on Discourse and Dialogue.","author":"Black A."},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of the SIGDIAL Conference, Special Interest Group on Discourse and Dialogue.","author":"Bohus D.","year":"2011"},{"key":"e_1_2_1_5_1","doi-asserted-by":"crossref","unstructured":"Bohus D. and Rudnicky A. 2002. 
Integrating multiple knowledge sources for utterance-level confidence annotation in the CMU Communicator spoken dialog system. Tech. rep. CS-190 Carnegie Mellon University Pittsburgh PA.","DOI":"10.21236\/ADA461099"},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the Conference on Speech Communication and Technology (EUROSPEECH).","author":"Bohus D."},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the SIGDIAL Conference, Special Interest Group on Discourse and Dialogue.","author":"Bohus D."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2008.10.001"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1969.tb01181.x"},{"key":"e_1_2_1_10_1","unstructured":"Bull M. 1997. The timing and coordination of turn-taking. Ph.D. thesis University of Edinburgh."},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP). 1175--1178","author":"Bull M."},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.5555\/972684.972686"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1145\/383259.383315"},{"key":"e_1_2_1_14_1","volume-title":"Talking Data: Transcription and Coding Methods for Language Research","author":"Chafe W. 
L.","year":"1992"},{"key":"e_1_2_1_15_1","volume-title":"Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN).","author":"Chao C."},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1016\/0378-2166(95)00036-4"},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the Conference on Speech Communication and Technology (EUROSPEECH).","author":"Clarkson P."},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1353\/lan.2006.0130"},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the 10th SIGDIAL Meeting on Discourse and Dialogue.","author":"DeVault D."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1037\/h0033031"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of Interspeech.","author":"Edlund J."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","author":"Ferrer L."},{"key":"e_1_2_1_23_1","doi-asserted-by":"crossref","unstructured":"Ford C. E. and Thompson S. A. 1996. Interaction and Grammar. Cambridge University Press 134--184.","DOI":"10.1017\/CBO9780511620874.003"},{"key":"e_1_2_1_24_1","doi-asserted-by":"crossref","unstructured":"Furo H. 2001. Turn-Taking in English and Japanese. Projectability in Grammar Intonation and Semantics. 
Routledge.","DOI":"10.1515\/jjl-2002-0108"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2010.10.003"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems.","author":"Huang L."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","author":"Huggins-Daines D."},{"key":"e_1_2_1_28_1","unstructured":"Jaffe J. and Feldstein S. 1970. Rhythms of Dialogue. Academic Press."},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1177\/002383099804100404"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1007\/11839354_7"},{"key":"e_1_2_1_31_1","volume-title":"Proceedings of the Meeting of the Association for Computational Linguistics (ACL).","author":"Laskowski K.","year":"2010"},{"key":"e_1_2_1_32_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).","author":"Laskowski K."},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10458-009-9092-y"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/1514095.1514109"},{"key":"e_1_2_1_35_1","unstructured":"Orestr\u00f6m B. 1983. Turn-Taking in English Conversation. 
CWK Gleerup Lund."},{"key":"e_1_2_1_36_1","volume-title":"Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence.","author":"Paek T."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics (HLT\/NAACL).","author":"Porzel R."},{"key":"e_1_2_1_38_1","volume-title":"Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).","author":"Raux A."},{"key":"e_1_2_1_39_1","doi-asserted-by":"crossref","unstructured":"Raux A. 2008. Flexible turn-taking for spoken dialog systems. Ph.D. thesis Language Technologies Institute Carnegie Mellon University.","DOI":"10.3115\/1620754.1620846"},{"key":"e_1_2_1_40_1","volume-title":"Proceedings of the 9th International Conference on Spoken Language Processing (Interspeech).","author":"Raux A."},{"key":"e_1_2_1_41_1","volume-title":"Proceedings of the 8th SIGDIAL Meeting on Discourse and Dialogue.","author":"Raux A."},{"key":"e_1_2_1_42_1","volume-title":"Proceedings of the Human Language Technologies. 
Conference of the North American Chapter of the Association of Computational Linguistics (HLT\/NAACL).","author":"Raux A."},{"key":"e_1_2_1_43_1","volume-title":"Proceedings of the Conference on Speech Communication and Technology (EUROSPEECH).","author":"Raux A."},{"key":"e_1_2_1_44_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (Interspeech).","author":"Raux A."},{"key":"e_1_2_1_45_1","doi-asserted-by":"publisher","DOI":"10.1353\/lan.1974.0010"},{"key":"e_1_2_1_46_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (ICSLP).","author":"Sato R."},{"key":"e_1_2_1_47_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0047404500001019"},{"key":"e_1_2_1_48_1","volume-title":"Proceedings of the SIGDIAL Meeting on Discourse and Dialogue.","author":"Schlangen D."},{"key":"e_1_2_1_49_1","unstructured":"Sjolander K. 2004. The snack sound toolkit. http:\/\/www.speech.kth.se\/snack\/."},{"key":"e_1_2_1_50_1","volume-title":"Proceedings of the Speech Prosody Conference.","author":"Takeuchi M."},{"key":"e_1_2_1_51_1","doi-asserted-by":"crossref","unstructured":"Thorisson K. R. 2002. Multimodality in Language and Speech Systems. Kluwer Academic Publishers 173--207.","DOI":"10.1007\/978-94-017-2367-1_8"},{"key":"e_1_2_1_52_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (Interspeech).","author":"Ward N."},{"key":"e_1_2_1_53_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (Interspeech).","author":"Ward N."},{"key":"e_1_2_1_54_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075812.1075857"},{"key":"e_1_2_1_55_1","volume-title":"Proceedings of the International Conference on Spoken Language Processing (Interspeech). 
3389--3392","author":"Wesseling W."},{"key":"e_1_2_1_56_1","volume-title":"Proceedings of International Conference on Human-Computer Interaction (HCII-5).","author":"White M."}],"container-title":["ACM Transactions on Speech and Language Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2168748.2168749","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2168748.2168749","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T09:54:44Z","timestamp":1750240484000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2168748.2168749"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,5]]},"references-count":56,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2012,5]]}},"alternative-id":["10.1145\/2168748.2168749"],"URL":"https:\/\/doi.org\/10.1145\/2168748.2168749","relation":{},"ISSN":["1550-4875","1550-4883"],"issn-type":[{"value":"1550-4875","type":"print"},{"value":"1550-4883","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,5]]},"assertion":[{"value":"2011-09-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-02-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2012-05-16","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}