{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2023,7,26]],"date-time":"2023-07-26T08:46:51Z","timestamp":1690361211659},"reference-count":48,"publisher":"IGI Global","issue":"2","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2011,4,1]]},"abstract":"This paper presents a novel framework for the easy creation of interactive, platform-independent voice services with an animated 3D talking-head interface on mobile phones. The framework supports automated multimodal interaction using speech and 3D graphics. The difficulty of synchronizing the audio stream with the animation is examined, and alternatives for distributed network control of the animation and application logic are discussed. The ability of modern mobile devices to handle such applications is documented, and it is shown that the power-consumption trade-off between rendering on the mobile phone and streaming from the server favors the phone. The presented tools will support developers and researchers in future research and usability studies of mobile talking-head applications (Figure 1). Such applications may be used, for example, in entertainment, commerce, health care, or education.","DOI":"10.4018\/jmhci.2011040104","type":"journal-article","created":{"date-parts":[[2011,10,19]],"date-time":"2011-10-19T16:41:21Z","timestamp":1319042481000},"page":"50-64","source":"Crossref","is-referenced-by-count":7,"title":["3D Talking-Head Interface to Voice-Interactive Services on Mobile Phones"],"prefix":"10.4018","volume":"3","author":[{"given":"Jiri","family":"Danihelka","sequence":"first","affiliation":[{"name":"Czech Technical University in Prague, Czech Republic"}]},{"given":"Roman","family":"Hak","sequence":"additional","affiliation":[{"name":"Czech Technical University in Prague, Czech Republic"}]},{"given":"Lukas","family":"Kencl","sequence":"additional","affiliation":[{"name":"Czech Technical University in Prague, Czech Republic"}]},{"given":"Jiri","family":"Zara","sequence":"additional","affiliation":[{"name":"Czech Technical University in Prague, Czech Republic"}]}],"member":"2432","reference":[{"key":"jmhci.2011040104-0","unstructured":"acbPocketSoft (n. d.). acbTaskMan for PocketPC. Retrieved from http:\/\/www.acbpocketsoft.com"},{"key":"jmhci.2011040104-1","doi-asserted-by":"publisher","DOI":"10.1109\/TMC.2006.71"},{"key":"jmhci.2011040104-2","doi-asserted-by":"crossref","unstructured":"Agarwal, S. K., Chakraborty, D., Kumar, A., Nanavati, A. A., & Rajput, N. (2007). HSTP: Hyperspeech transfer protocol. In Proceedings of the Eighteenth Conference on Hypertext and Hypermedia (pp. 67-76). New York, NY: ACM Press.","DOI":"10.1145\/1286240.1286262"},{"key":"jmhci.2011040104-3","doi-asserted-by":"crossref","unstructured":"Agarwal, S. K., Kumar, A., Nanavati, A., & Rajput, N. (2008). The world wide telecom web browser. In Proceedings of the 17th International Conference on World Wide Web (pp. 1121-1122). New York, NY: ACM Press.","DOI":"10.1145\/1367497.1367686"},{"key":"jmhci.2011040104-4","unstructured":"Albrecht, I., Haber, J., & Seidel, H. (2002). Speech synchronization for physics-based facial animation. In Proceedings of the 10th International Conference on Computer Graphics, Visualization, and Computer Vision (pp. 9-16)."},{"key":"jmhci.2011040104-5","unstructured":"Alexa, M., Berner, U., Hellenschmidt, M., & Rieger, T. (2001). An animation system for user interface agents. In Proceedings of the 6th International European Conference on Computer Graphics, Visualization, and Computer Vision."},{"key":"jmhci.2011040104-6","doi-asserted-by":"crossref","unstructured":"Balci, K. (2005). Xface: Open source toolkit for creating 3D faces of an embodied conversational agent. In Proceedings of the International Conference on Smart Graphics (pp. 263-266).","DOI":"10.1007\/11536482_25"},{"key":"jmhci.2011040104-7","doi-asserted-by":"crossref","unstructured":"Balci, K., Not, E., Zancanaro, M., & Pianesi, F. (2007). Xface: Open source project and SMIL-agent scripting language for creating and animating embodied conversational agents. In Proceedings of the 15th International Conference on Multimedia (pp. 1013-1016). New York, NY: ACM Press.","DOI":"10.1145\/1291233.1291453"},{"key":"jmhci.2011040104-8","unstructured":"Black, A., & Lenzo, K. (2001). Flite: A small fast run-time synthesis engine. In Proceedings of the 4th ISCA Tutorial and Research Workshop on Speech Synthesis (pp. 20-24)."},{"key":"jmhci.2011040104-9","doi-asserted-by":"publisher","DOI":"10.1109\/MCG.2008.109"},{"key":"jmhci.2011040104-10","doi-asserted-by":"crossref","unstructured":"Cassell, J., Vilhjalmsson, H. H., & Bickmore, T. (2001). BEAT: The behavior expression animation toolkit. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (pp. 477-486). New York, NY: ACM Press.","DOI":"10.1145\/383259.383315"},{"key":"jmhci.2011040104-11","doi-asserted-by":"crossref","unstructured":"Choi, S.-M., Kim, Y.-G., Lee, D.-S., Lee, S.-O., & Park, G.-T. (2004). Non-photorealistic 3-D facial animation on the PDA based on facial expression recognition. In A. Butz, A. Kruger, & P. Olivier (Eds.), Proceedings of the 4th International Symposium on Smart Graphics (LNCS 3031, pp. 11-20).","DOI":"10.1007\/978-3-540-24678-7_2"},{"key":"jmhci.2011040104-12","unstructured":"Danihelka, J., Kencl, L., & Zara, J. (2010). Reduction of animated models for embedded devices. In Proceedings of the 18th International European Conference on Computer Graphics, Visualization, and Computer Vision (pp. 89-95)."},{"key":"jmhci.2011040104-13","doi-asserted-by":"publisher","DOI":"10.1023\/B:VLSI.0000015095.19623.73"},{"key":"jmhci.2011040104-14","unstructured":"Devevey, P., Lorenzon, N., & Tambary, C. (2005). Measuring wireless energy consumption on PDAs and on laptops (Tech. Rep. No. 2005). Genoa, Italy: University of Genoa."},{"key":"jmhci.2011040104-15","doi-asserted-by":"publisher","DOI":"10.1080\/088395199117423"},{"key":"jmhci.2011040104-16","doi-asserted-by":"publisher","DOI":"10.1016\/0167-8396(94)90032-9"},{"key":"jmhci.2011040104-17","unstructured":"Huggins-Daines, D., Kumar, M., Chan, A., Black, A., Ravishankar, M., & Rudnicky, A. (2006). PocketSphinx: A free, real-time continuous speech recognition system for hand-held devices. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (p. 1). Washington, DC: IEEE Computer Society."},{"key":"jmhci.2011040104-18","unstructured":"Kadous, M., & Sammut, C. (2002). Mobile conversational characters. Virtual conversational characters: Applications, methods, and research challenge. Paper presented at the Joint HF\/OZCHI Workshop on Human Factors and Human-Computer Interaction, Melbourne, Australia."},{"key":"jmhci.2011040104-19","first-page":"1","article-title":"A multimodal 3D healthcare communication system.","author":"C.Keskin","year":"2007","journal-title":"Proceedings of the Conference on 3DTV"},{"key":"jmhci.2011040104-20","unstructured":"Kishonti Informatics. (2003). GL benchmark. Retrieved from http:\/\/glbenchmark.com"},{"key":"jmhci.2011040104-21","unstructured":"Koomey, J. G., Berard, S., Sanchez, M., & Wong, H. (2009). Assessing trends in the electrical efficiency of computation over time. IEEE Annals of the History of Computing."},{"key":"jmhci.2011040104-22","unstructured":"Khronos Group. (n. d.). OpenGL ES - the standard for embedded accelerated 3D graphics. Retrieved from http:\/\/www.khronos.org\/opengles\/"},{"key":"jmhci.2011040104-23","unstructured":"Kshirsagar, S., Magnenat-Thalmann, N., Guye-Vuilleme, A., Thalmann, D., Kamyab, K., & Mamdani, E. (2002). Avatar markup language. In Proceedings of the Workshop on Virtual Environments, Aire-la-Ville, Switzerland (pp. 169-177)."},{"key":"jmhci.2011040104-24","doi-asserted-by":"crossref","unstructured":"Kumar, A., Rajput, N., Chakraborty, D., Agarwal, S. K., & Nanavati, A. A. (2007). WWTW: The world wide telecom web. In Proceedings of the Workshop on Networked Systems for Developing Regions (pp. 1-6). New York, NY: ACM Press.","DOI":"10.1145\/1326571.1326582"},{"key":"jmhci.2011040104-25","doi-asserted-by":"crossref","unstructured":"Kunc, L., & Kleindienst, J. (2007). ECAF: Authoring language for embodied conversational agents. In V. Matousek & P. Mautner (Eds.), Proceedings of the 10th International Conference on Text, Speech, and Dialogue (LNCS 4629, pp. 206-213).","DOI":"10.1007\/978-3-540-74628-7_28"},{"key":"jmhci.2011040104-26","doi-asserted-by":"crossref","unstructured":"Kunc, L., Slavik, P., & Kleindienst, J. (2008). Talking head as life blog. In P. Sojka, A. Horak, I. Kopecek, & K. Pala (Eds.), Proceedings of the 11th International Conference on Text, Speech, and Dialogue (LNCS 5246, pp. 365-372).","DOI":"10.1007\/978-3-540-87391-4_47"},{"key":"jmhci.2011040104-27","first-page":"44","article-title":"A simple, fast, and effective polygon reduction algorithm.","volume":"11","author":"S.Melax","year":"1998","journal-title":"Game Developer"},{"key":"jmhci.2011040104-28","doi-asserted-by":"crossref","unstructured":"Mochocki, B., Lahiri, K., & Cadambi, S. (2006). Power analysis of mobile 3D graphics. In Proceedings of the Conference on Design, Automation and Test in Europe, Leuven, Belgium (pp. 502-507).","DOI":"10.1109\/DATE.2006.243859"},{"key":"jmhci.2011040104-29","doi-asserted-by":"crossref","unstructured":"Nass, C., Moon, Y., Fogg, B. J., Reeves, B., & Dryer, C. (1995). Can computer personalities be human personalities? In Proceedings of the Conference Companion on Human Factors in Computing Systems (pp. 228-229). New York, NY: ACM Press.","DOI":"10.1145\/223355.223538"},{"key":"jmhci.2011040104-30","doi-asserted-by":"crossref","unstructured":"Ortiz, A., del Puy Carretero, M., Oyarzun, D., Yanguas, J., Buiza, C., Gonzalez, M., et al. (2007). Elderly users in ambient intelligence: Does an avatar improve the interaction? In C. Stephanidis & M. Pieper (Eds.), Proceedings of the 9th Conference on User Interfaces for All (LNCS 4397, pp. 99-114).","DOI":"10.1007\/978-3-540-71025-7_8"},{"key":"jmhci.2011040104-31","doi-asserted-by":"crossref","unstructured":"Pandzic, I. S. (2002). Facial animation framework for the web and mobile platforms. In Proceedings of the Seventh International Conference on 3D Web Technology (pp. 27-34). New York, NY: ACM Press.","DOI":"10.1145\/504502.504507"},{"key":"jmhci.2011040104-32","unstructured":"Pandzic, I. S., Ahlberg, J., Wzorek, M., Rudol, P., & Mosmondor, M. (2003). Faces everywhere: Towards ubiquitous production and delivery of face animation. Paper presented at the 2nd International Conference on Mobile and Ubiquitous Multimedia, Norrkoping, Sweden."},{"key":"jmhci.2011040104-33","doi-asserted-by":"publisher","DOI":"10.1002\/0470854626.part2"},{"key":"jmhci.2011040104-34","doi-asserted-by":"publisher","DOI":"10.1109\/MCOM.2010.5394036"},{"key":"jmhci.2011040104-35","doi-asserted-by":"crossref","unstructured":"Poller, P., & Muller, J. (2002). Distributed audio-visual speech synchronization. In Proceedings of the Seventh International Conference on Spoken Language Processing (pp. 205-208).","DOI":"10.21437\/ICSLP.2002-121"},{"key":"jmhci.2011040104-36","doi-asserted-by":"publisher","DOI":"10.1145\/1121112.1121113"},{"key":"jmhci.2011040104-37","unstructured":"Qt Software. (2008). Qt cross-platform application framework. Retrieved from http:\/\/qt.nokia.com\/products"},{"key":"jmhci.2011040104-38","doi-asserted-by":"crossref","unstructured":"Ramakrishnan, I. V., Stent, A., & Yang, G. (2004). Hearsay: Enabling audio browsing on hypertext content. In Proceedings of the 13th International Conference on World Wide Web (pp. 80-89). New York, NY: ACM Press.","DOI":"10.1145\/988672.988684"},{"key":"jmhci.2011040104-39","doi-asserted-by":"publisher","DOI":"10.1111\/1467-8659.1540191"},{"key":"jmhci.2011040104-40","doi-asserted-by":"crossref","unstructured":"Shrestha, S. (2007). Mobile web browsing: Usability study. In Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems and the 1st International Symposium on Computer Human Interaction in Mobile Technology (pp. 187-194). New York, NY: ACM Press.","DOI":"10.1145\/1378063.1378094"},{"issue":"1","key":"jmhci.2011040104-41","first-page":"17","article-title":"Observations on power-efficiency trends in mobile communication devices.","author":"O.Silven","year":"2007","journal-title":"EURASIP Journal on Embedded Systems"},{"key":"jmhci.2011040104-42","unstructured":"Singular Inversion. (2010). FaceGen. Retrieved from http:\/\/www.facegen.com"},{"key":"jmhci.2011040104-43","doi-asserted-by":"crossref","unstructured":"Sun, Z., Stent, A., & Ramakrishnan, I. V. (2006). Dialog generation for voice browsing. In Proceedings of the International Cross-Disciplinary Workshop on Web Accessibility (pp. 49-56). New York, NY: ACM Press.","DOI":"10.1145\/1133219.1133228"},{"key":"jmhci.2011040104-44","doi-asserted-by":"crossref","unstructured":"Wagner, D., Billinghurst, M., & Schmalstieg, D. (2006). How real should virtual characters be? In Proceedings of the ACM SIGCHI International Conference on Advances in Computer Entertainment Technology (p. 57). New York, NY: ACM Press.","DOI":"10.1145\/1178823.1178891"},{"key":"jmhci.2011040104-45","doi-asserted-by":"crossref","unstructured":"Wang, A., Emmi, M., & Faloutsos, P. (2007). Assembling an expressive facial animation system. In Proceedings of the ACM SIGGRAPH Symposium on Video Games (pp. 21-26). New York, NY: ACM Press.","DOI":"10.1145\/1274940.1274947"},{"key":"jmhci.2011040104-46","unstructured":"Waters, K., & Levergood, T. (1993). DECface: An automatic lip-synchronization algorithm for synthetic faces (Tech. Rep. No. 93\/4). Cambridge, MA: Cambridge Research Laboratory."},{"key":"jmhci.2011040104-47","doi-asserted-by":"crossref","unstructured":"Yin, M., & Zhai, S. (2006). The benefits of augmenting telephone voice menu navigation with visual browsing and search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 319-328). New York, NY: ACM Press.","DOI":"10.1145\/1124772.1124821"}],"container-title":["International Journal of Mobile Human Computer Interaction"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=53216","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,6,10]],"date-time":"2023-06-10T15:59:10Z","timestamp":1686412750000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jmhci.2011040104"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2011,4,1]]},"references-count":48,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,4]]}},"URL":"http:\/\/dx.doi.org\/10.4018\/jmhci.2011040104","relation":{},"ISSN":["1942-390X","1942-3918"],"issn-type":[{"value":"1942-390X","type":"print"},{"value":"1942-3918","type":"electronic"}],"subject":["Human-Computer Interaction"],"published":{"date-parts":[[2011,4,1]]}}}