{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,1,21]],"date-time":"2026-01-21T12:23:07Z","timestamp":1768998187667,"version":"3.49.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"EICS","license":[{"start":{"date-parts":[[2021,5,27]],"date-time":"2021-05-27T00:00:00Z","timestamp":1622073600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Proc. ACM Hum.-Comput. Interact."],"published-print":{"date-parts":[[2021,5,27]]},"abstract":"<jats:p>Conversational agents are widely used in many situations, especially for speech tutoring. However, their contents and functions are often pre-defined and not customizable for people without technical backgrounds, thus significantly limiting their flexibility and usability. Besides, conventional agents often cannot provide feedback in the middle of training sessions because they lack technical approaches to evaluate users' speech dynamically. We propose JustSpeak: automated and interactive speech tutoring agents with various configurable feedback mechanisms, using any speech recordings with its transcription text as the template for speech training. In JustSpeak, we developed an automated procedure to generate customized tutoring agents from user-inputted templates. Moreover, we created a set of methods to dynamically synchronize speech recognizers' behavior with the agent's tutoring progress, making it possible to detect various speech mistakes dynamically such as being stuck, mispronunciation, and rhythm deviations. Furthermore, we identified the design primitives in JustSpeak to create different novel feedback mechanisms, such as adaptive playback, follow-on training, and passive adaptation. They can be combined to create customized tutoring agents, which we demonstrate with an example for language learning. We believe JustSpeak can create more personalized speech learning opportunities by enabling tutoring agents that are customizable, always available, and easy-to-use.<\/jats:p>","DOI":"10.1145\/3459744","type":"journal-article","created":{"date-parts":[[2021,5,30]],"date-time":"2021-05-30T01:12:52Z","timestamp":1622337172000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":3,"title":["JustSpeak: Automated, User-Configurable, Interactive Agents for Speech Tutoring"],"prefix":"10.1145","volume":"5","author":[{"given":"Xinlei","family":"Zhang","sequence":"first","affiliation":[{"name":"The University of Tokyo \/ Rekimoto Lab, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Takashi","family":"Miyaki","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Bunkyo-ku, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Jun","family":"Rekimoto","sequence":"additional","affiliation":[{"name":"The University of Tokyo, Tokyo, Japan"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2021,5,29]]},"reference":[{"key":"e_1_2_2_1_1","unstructured":"[n.d.]. openFrameworks. https:\/\/openframeworks.cc\/. [n.d.]. openFrameworks. https:\/\/openframeworks.cc\/."},{"key":"e_1_2_2_2_1","doi-asserted-by":"crossref","unstructured":"Jeesoo Bang Sechun Kang and Gary Geunbae Lee. 2013. An automatic feedback system for English speaking integrating pronunciation and prosody assessments. In Speech and Language Technology in Education. Jeesoo Bang Sechun Kang and Gary Geunbae Lee. 2013. An automatic feedback system for English speaking integrating pronunciation and prosody assessments. In Speech and Language Technology in Education.","DOI":"10.21437\/SLaTE.2013-14"},{"key":"e_1_2_2_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/3220134.3220135"},{"key":"e_1_2_2_4_1","volume-title":"Second European Conference on Speech Communication and Technology.","author":"Bernstein Jared","year":"1991","unstructured":"Jared Bernstein and Dimitry Rtischev . 1991 . A voice interactive language instruction system . In Second European Conference on Speech Communication and Technology. Jared Bernstein and Dimitry Rtischev. 1991. A voice interactive language instruction system. In Second European Conference on Speech Communication and Technology."},{"key":"e_1_2_2_6_1","doi-asserted-by":"publisher","DOI":"10.5555\/2615731.2617415"},{"key":"e_1_2_2_7_1","volume-title":"Fifth European Conference on Speech Communication and Technology.","author":"Ehsani Farzad","year":"1997","unstructured":"Farzad Ehsani , Jared Bernstein , Amir Najmi , and Ognjen Todic . 1997 . Subarashii: Japanese interactive spoken language education . In Fifth European Conference on Speech Communication and Technology. Farzad Ehsani, Jared Bernstein, Amir Najmi, and Ognjen Todic. 1997. Subarashii: Japanese interactive spoken language education. In Fifth European Conference on Speech Communication and Technology."},{"key":"e_1_2_2_8_1","unstructured":"Farzad Ehsani and Eva Knodt. 1998. Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm. (1998). Farzad Ehsani and Eva Knodt. 1998. Speech technology in computer-aided language learning: Strengths and limitations of a new CALL paradigm. (1998)."},{"key":"e_1_2_2_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/1570433.1570494"},{"key":"e_1_2_2_10_1","volume-title":"Proceedings of the STiLL Workshop. Citeseer.","author":"Eskenazi Maxine","year":"1998","unstructured":"Maxine Eskenazi and Scott Hansma . 1998 . The Fluency pronunciation trainer . In Proceedings of the STiLL Workshop. Citeseer. Maxine Eskenazi and Scott Hansma. 1998. The Fluency pronunciation trainer. In Proceedings of the STiLL Workshop. Citeseer."},{"key":"e_1_2_2_11_1","volume-title":"Proceedings INSTiL2000","volume":"1","author":"Eskenazi Maxine","year":"2000","unstructured":"Maxine Eskenazi , Yan Ke , Jordi Albornoz , and Katharina Probst . 2000 . The fluency pronunciation trainer: Update and user issues . In Proceedings INSTiL2000 , Vol. 1 . Maxine Eskenazi, Yan Ke, Jordi Albornoz, and Katharina Probst. 2000. The fluency pronunciation trainer: Update and user issues. In Proceedings INSTiL2000, Vol. 1."},{"key":"e_1_2_2_12_1","doi-asserted-by":"publisher","DOI":"10.1007\/s12193-011-0077-1"},{"key":"e_1_2_2_13_1","volume-title":"Proceedings of SIGDIAL. 132--135","author":"Hjalmarsson Anna","year":"2007","unstructured":"Anna Hjalmarsson , Preben Wik , and Jenny Brusk . 2007 . Dealing with DEAL: a dialogue system for conversation training . In Proceedings of SIGDIAL. 132--135 . Anna Hjalmarsson, Preben Wik, and Jenny Brusk. 2007. Dealing with DEAL: a dialogue system for conversation training. In Proceedings of SIGDIAL. 132--135."},{"key":"e_1_2_2_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/2493432.2493502"},{"key":"e_1_2_2_15_1","doi-asserted-by":"publisher","DOI":"10.1109\/TAFFC.2013.2297104"},{"key":"e_1_2_2_16_1","doi-asserted-by":"publisher","DOI":"10.1145\/2774225.2774843"},{"key":"e_1_2_2_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/355588.365140"},{"key":"e_1_2_2_18_1","doi-asserted-by":"publisher","DOI":"10.1007\/11550617_28"},{"key":"e_1_2_2_19_1","volume-title":"Yoo Rhee Oh, Yun-Kyung Lee, Byung Ok Kang, et al.","author":"Kwon Oh-Woog","year":"2015","unstructured":"Oh-Woog Kwon , Kiyoung Lee , Yoon-Hyung Roh , Jin-Xia Huang , Sung-Kwon Choi , Young-Kil Kim , Hyung Bae Jeon , Yoo Rhee Oh, Yun-Kyung Lee, Byung Ok Kang, et al. 2015 . GenieTutor: A Computer-Assisted Second-Language Learning System Based on Spoken Language Understanding. In Natural language dialog systems and intelligent assistants. Springer , 257--262. Oh-Woog Kwon, Kiyoung Lee, Yoon-Hyung Roh, Jin-Xia Huang, Sung-Kwon Choi, Young-Kil Kim, Hyung Bae Jeon, Yoo Rhee Oh, Yun-Kyung Lee, Byung Ok Kang, et al. 2015. GenieTutor: A Computer-Assisted Second-Language Learning System Based on Spoken Language Understanding. In Natural language dialog systems and intelligent assistants. Springer, 257--262."},{"key":"e_1_2_2_20_1","doi-asserted-by":"publisher","DOI":"10.1017\/S0272263116000085"},{"key":"e_1_2_2_21_1","volume-title":"Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. Asia-Pacific Signal and Information Processing Association","author":"Lee Akinobu","year":"2009","unstructured":"Akinobu Lee and Tatsuya Kawahara . 2009 . Recent development of open-source speech recognition engine julius . In Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. Asia-Pacific Signal and Information Processing Association , 2009 Annual ?, 131--137. Akinobu Lee and Tatsuya Kawahara. 2009. Recent development of open-source speech recognition engine julius. In Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. Asia-Pacific Signal and Information Processing Association, 2009 Annual ?, 131--137."},{"key":"e_1_2_2_22_1","doi-asserted-by":"publisher","DOI":"10.1587\/transinf.E97.D.1830"},{"key":"e_1_2_2_23_1","volume-title":"Proc. APSIPA ASC","author":"Lee Sungjin","year":"2010","unstructured":"Sungjin Lee , Hyungjong Noh , Jonghoon Lee , Kyusong Lee , and G Lee . 2010 . POSTECH approaches for dialog-based english conversation tutoring . Proc. APSIPA ASC (2010), 794--803. Sungjin Lee, Hyungjong Noh, Jonghoon Lee, Kyusong Lee, and G Lee. 2010. POSTECH approaches for dialog-based english conversation tutoring. Proc. APSIPA ASC (2010), 794--803."},{"key":"#cr-split#-e_1_2_2_24_1.1","doi-asserted-by":"crossref","unstructured":"John M. Levis. 2018. Rhythm and Intelligibility .Cambridge University Press 127--149. https:\/\/doi.org\/10.1017\/9781108241564.009 10.1017\/9781108241564.009","DOI":"10.1017\/9781108241564.009"},{"key":"#cr-split#-e_1_2_2_24_1.2","doi-asserted-by":"crossref","unstructured":"John M. Levis. 2018. Rhythm and Intelligibility .Cambridge University Press 127--149. https:\/\/doi.org\/10.1017\/9781108241564.009","DOI":"10.1017\/9781108241564.009"},{"key":"e_1_2_2_25_1","doi-asserted-by":"publisher","DOI":"10.1145\/2494603.2480325"},{"key":"e_1_2_2_26_1","doi-asserted-by":"publisher","DOI":"10.1080\/09588220500173344"},{"key":"e_1_2_2_27_1","unstructured":"Jack Mostow et al. 2001. Evaluating tutors that listen: An overview of Project LISTEN. (2001). Jack Mostow et al. 2001. Evaluating tutors that listen: An overview of Project LISTEN. (2001)."},{"key":"e_1_2_2_28_1","volume-title":"Advances in Intelligent Tutoring Systems","author":"Nkambou Roger","unstructured":"Roger Nkambou , Jacqueline Bourdeau , and Val\u00e9ry Psych\u00e9 . 2010. Building intelligent tutoring systems: An overview . In Advances in Intelligent Tutoring Systems . Springer , 361--375. Roger Nkambou, Jacqueline Bourdeau, and Val\u00e9ry Psych\u00e9. 2010. Building intelligent tutoring systems: An overview. In Advances in Intelligent Tutoring Systems. Springer, 361--375."},{"key":"e_1_2_2_29_1","doi-asserted-by":"publisher","DOI":"10.1162\/105474602317343668"},{"key":"e_1_2_2_30_1","volume-title":"InSTIL\/ICALL Symposium","author":"Raux Antoine","year":"2004","unstructured":"Antoine Raux and Maxine Eskenazi . 2004 . Using task-oriented spoken dialogue systems for language learning: potential, practical applications and challenges . In InSTIL\/ICALL Symposium 2004. Antoine Raux and Maxine Eskenazi. 2004. Using task-oriented spoken dialogue systems for language learning: potential, practical applications and challenges. In InSTIL\/ICALL Symposium 2004."},{"key":"e_1_2_2_31_1","volume-title":"VILTS: A tale of two technologies. CALICO journal","author":"Rypa Marikka Elizabeth","year":"1999","unstructured":"Marikka Elizabeth Rypa and Patti Price . 1999 . VILTS: A tale of two technologies. CALICO journal (1999), 385--404. Marikka Elizabeth Rypa and Patti Price. 1999. VILTS: A tale of two technologies. CALICO journal (1999), 385--404."},{"key":"e_1_2_2_32_1","first-page":"97","article-title":"The Time Domain Factors Affecting EFL Learners' Listening Comprehension: a Study on Japanese EFL Learners","volume":"27","author":"Kousuke SUGAI","year":"2016","unstructured":"Kousuke SUGAI , Shigeru YAMANE, and Kazuo KANZAKI. 2016 . The Time Domain Factors Affecting EFL Learners' Listening Comprehension: a Study on Japanese EFL Learners . ARELE: Annual Review of English Language Education in Japan , Vol. 27 (2016), 97 -- 108 . Kousuke SUGAI, Shigeru YAMANE, and Kazuo KANZAKI. 2016. The Time Domain Factors Affecting EFL Learners' Listening Comprehension: a Study on Japanese EFL Learners. ARELE: Annual Review of English Language Education in Japan, Vol. 27 (2016), 97--108.","journal-title":"ARELE: Annual Review of English Language Education in Japan"},{"key":"e_1_2_2_33_1","doi-asserted-by":"publisher","DOI":"10.1145\/3090092"},{"key":"e_1_2_2_34_1","doi-asserted-by":"publisher","DOI":"10.1145\/2702123.2702584"},{"key":"e_1_2_2_35_1","doi-asserted-by":"publisher","DOI":"10.1145\/3267851.3267882"},{"key":"e_1_2_2_36_1","doi-asserted-by":"publisher","DOI":"10.1080\/00461520.2011.611369"},{"key":"e_1_2_2_37_1","doi-asserted-by":"publisher","DOI":"10.1145\/2494603.2480335"},{"key":"e_1_2_2_38_1","volume-title":"A Voice Dialog Editor","author":"Wakabayashi Keitaro","year":"2015","unstructured":"Keitaro Wakabayashi , Daisuke Yamamoto , and Naohisa Takahashi . 2016. A Voice Dialog Editor Based on Finite State Transducer Using Composite State for Tablet Devices. In Computer and Information Science 2015 . Springer , 125--139. Keitaro Wakabayashi, Daisuke Yamamoto, and Naohisa Takahashi. 2016. A Voice Dialog Editor Based on Finite State Transducer Using Composite State for Tablet Devices. In Computer and Information Science 2015. Springer, 125--139."},{"key":"e_1_2_2_39_1","doi-asserted-by":"publisher","DOI":"10.1080\/0958822950080403"},{"key":"e_1_2_2_40_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0167-6393(99)00044-8"},{"key":"e_1_2_2_41_1","volume-title":"Workshop on Speech and Language Technology in Education.","author":"Yoshimura Yuki","year":"2007","unstructured":"Yuki Yoshimura and Brian MacWhinney . 2007 . The effect of oral repetition on L2 speech fluency: An experimental tool and language tutor . In Workshop on Speech and Language Technology in Education. Yuki Yoshimura and Brian MacWhinney. 2007. The effect of oral repetition on L2 speech fluency: An experimental tool and language tutor. In Workshop on Speech and Language Technology in Education."},{"key":"e_1_2_2_42_1","doi-asserted-by":"publisher","DOI":"10.1145\/3313831.3376322"}],"container-title":["Proceedings of the ACM on Human-Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459744","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3459744","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:48:50Z","timestamp":1750193330000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3459744"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,5,27]]},"references-count":42,"journal-issue":{"issue":"EICS","published-print":{"date-parts":[[2021,5,27]]}},"alternative-id":["10.1145\/3459744"],"URL":"https:\/\/doi.org\/10.1145\/3459744","relation":{},"ISSN":["2573-0142"],"issn-type":[{"value":"2573-0142","type":"electronic"}],"subject":[],"published":{"date-parts":[[2021,5,27]]},"assertion":[{"value":"2021-05-29","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}