{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,6,10]],"date-time":"2026-06-10T14:43:36Z","timestamp":1781102616216,"version":"3.54.1"},"reference-count":37,"publisher":"IGI Global Scientific Publishing","issue":"4","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2012,10,1]]},"abstract":"<p>The pervasiveness of mobile handheld devices and advancement in real-time continuous speech recognition technology has opened up a wide range of research opportunities in human-computer interaction for those devices. On the one hand, there has been an increasing amount of research on developing user-friendly speech recognition solutions and applications for mobile handheld devices. On the other hand, there are many distinct challenges in mobile speech recognition. Aiming to gain a good understanding of this emerging yet challenging area and provide a research map, this paper presents a state-of-the-art overview of this field. We will discuss three main architectures of mobile speech recognition systems, analyze their strengths and weaknesses, introduce some major research issues in the field, and highlight a number of major applications of speech recognition on handheld devices. The authors will also shed some light into important future research issues as a road map for researchers and practitioners.<\/p>","DOI":"10.4018\/jhcr.2012100103","type":"journal-article","created":{"date-parts":[[2012,12,19]],"date-time":"2012-12-19T10:10:44Z","timestamp":1355911844000},"page":"40-55","source":"Crossref","is-referenced-by-count":1,"title":["Hype or Ready for Prime Time?"],"prefix":"10.4018","volume":"3","author":[{"given":"Dongsong","family":"Zhang","sequence":"first","affiliation":[{"name":"Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Hsien-Ming","family":"Chou","sequence":"additional","affiliation":[{"name":"Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]},{"given":"Lina","family":"Zhou","sequence":"additional","affiliation":[{"name":"Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA"}],"role":[{"vocabulary":"crossref","role":"author"}]}],"member":"2432","reference":[{"key":"jhcr.2012100103-0","doi-asserted-by":"publisher","DOI":"10.1007\/s10772-009-9025-9"},{"key":"jhcr.2012100103-1","doi-asserted-by":"crossref","unstructured":"Cohen, J. (2008, March 31-April 4). Embedded speech recognition applications in mobile phones: status, trends, and challenges. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV (pp. 5352-5355).","DOI":"10.1109\/ICASSP.2008.4518869"},{"key":"jhcr.2012100103-2","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84800-143-5_17"},{"key":"jhcr.2012100103-3","doi-asserted-by":"crossref","unstructured":"Durling, S., & Lumsden, J. (2008, November 24-26). Speech recognition use in healthcare applications. In Proceedings of the Sixth International Conference on Advances in Mobile Computing and Multimedia, Linz, Austria (pp. 473-478).","DOI":"10.1145\/1497185.1497286"},{"key":"jhcr.2012100103-4","doi-asserted-by":"crossref","unstructured":"Fabbrizio, G. D., Okken, T., & Wilpon, J. G. (2009, November 2-4). A speech mashup framework for multimodal mobile services. In Proceedings of the 11th International Conference on Multimodal Interfaces, Cambridge, MA (pp. 71-78).","DOI":"10.1145\/1647314.1647329"},{"key":"jhcr.2012100103-5","doi-asserted-by":"publisher","DOI":"10.1207\/s15327590ijhc1903_1"},{"key":"jhcr.2012100103-6","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2008.4637616"},{"issue":"2","key":"jhcr.2012100103-7","doi-asserted-by":"crossref","first-page":"135","DOI":"10.3233\/TAD-2008-20208","article-title":"Speech technologies for blind and low vision persons.","volume":"20","author":"D.Freitas","year":"2008","journal-title":"Technology and Disability"},{"key":"jhcr.2012100103-8","unstructured":"Goletsis, Y., Anagnostakis, A., Pavlopoulos, S., & Fotiadis, D. (2004). Voice mediated platform for structured entry of medical data. In Proceedings of the IASTED International Conference on Biomedical Engineering, Innsbruck, Austria (pp.16-18)."},{"key":"jhcr.2012100103-9","unstructured":"Huerta, J. (2000). Speech recognition in mobile environments (Unpublished doctoral dissertation). Carnegie Mellon University, Pittsburgh, PA."},{"key":"jhcr.2012100103-10","doi-asserted-by":"crossref","unstructured":"Jiang, Y., Wang, X., Tian, F., Ao, X., Dai, G., & Wang, H. (2008, January 13-16). Multimodal Chinese text entry with speech and keypad on mobile devices. In Proceedings of the International Conference on Intelligent User Interfaces, Canary Islands, Spain (pp. 341-344).","DOI":"10.1145\/1378773.1378825"},{"key":"jhcr.2012100103-11","unstructured":"Kane, S., Bigham, J., & Wobbrock, J. (2008). Making mobile touch screens accessible to blind people using multi-touch interaction techniques. In Proceedings of the ACM Assets Conference, Halifax, NS, Canada (pp. 13-15)."},{"key":"jhcr.2012100103-12","doi-asserted-by":"crossref","unstructured":"Kart, F., Miao, G., Moser, L., & Melliar-Smith, P. (2007). A distributed e-healthcare system based on the service oriented architecture. In Proceedings of the IEEE International Conference on Services Computing, Salt Lake City, UT (pp. 652-659).","DOI":"10.1109\/SCC.2007.2"},{"key":"jhcr.2012100103-13","first-page":"95","article-title":"The use of speed-up techniques for a speech recognizer system.","author":"A.Kocsor","year":"2008","journal-title":"International Journal of Speech Technology"},{"key":"jhcr.2012100103-14","doi-asserted-by":"publisher","DOI":"10.1080\/02533839.2009.9671481"},{"key":"jhcr.2012100103-15","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2009.5373813"},{"key":"jhcr.2012100103-16","doi-asserted-by":"crossref","unstructured":"L\u00e9vy, C., Linares, G., & Bonastre, J. (2009). Compact acoustic models for embedded speech recognition. Journal on Audio, Speech, and Music Processing, 1-12.","DOI":"10.1155\/2009\/806186"},{"key":"jhcr.2012100103-17","doi-asserted-by":"crossref","unstructured":"Li, X., Deng, L., Ju, Y., & Acero, A. (2008, September 22-26). Automatic children\u2019s reading tutor on hand-held devices. In Proceedings of Interspeech, Brisbane, Australia (pp. 1733-1736).","DOI":"10.21437\/Interspeech.2008-467"},{"key":"jhcr.2012100103-18","author":"M.Lingdell","year":"2008","journal-title":"Embedded, speaker independent speech recognition of connected digits on Java ME enabled cell phones"},{"key":"jhcr.2012100103-19","doi-asserted-by":"crossref","unstructured":"Mart\u00ed, R., & Delgado, J. (2004, March 29-31). Security specification and implementation for mobile e-health services. In Proceedings of the IEEE International Conference on e-Technology, e-Commerce and e-Service, Taipei, Taiwan (pp. 241-248).","DOI":"10.1109\/EEE.2004.1287316"},{"key":"jhcr.2012100103-20","doi-asserted-by":"crossref","unstructured":"Matthews, T., Carter, S., Pai, C., Fong, J., & Mankoff, J. (2006, September 17-21). Scribe4Me: Evaluating a mobile sound transcription tool for the deaf. In Proceedings of the International Conference on Ubiquitous Computing, Orange County, CA (pp. 17-21).","DOI":"10.1007\/11853565_10"},{"key":"jhcr.2012100103-21","doi-asserted-by":"crossref","unstructured":"Melto, A., Turunen, M., Kainulainen, A., Hakulinen, J., Heimonen, T., & Antila, V. (2008, September 2-5). Evaluation of predictive text and speech inputs in a multimodal mobile route guidance application. In Proceedings of the 10th International Conference on Human-Computer Interaction with Mobile Devices and Services, Amsterdam, The Netherlands (pp. 355-358).","DOI":"10.1145\/1409240.1409287"},{"issue":"3","key":"jhcr.2012100103-22","doi-asserted-by":"crossref","first-page":"225","DOI":"10.3233\/TAD-2008-20305","article-title":"An interfacing system that enables speech generating device users to independently access and use a mobile phone.","volume":"20","author":"T.Nguyen","year":"2008","journal-title":"Technology and Disability"},{"key":"jhcr.2012100103-23","doi-asserted-by":"crossref","unstructured":"Padmanabhan, M., Eide, E., Ramabhadran, B., Ramaswamy, G., & Bahl, L. (1998, May 12-15). Speech recognition performance on a voicemail transcription task. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, WA (pp. 913-916).","DOI":"10.1109\/ICASSP.1998.675414"},{"key":"jhcr.2012100103-24","doi-asserted-by":"publisher","DOI":"10.1007\/s11257-006-9021-6"},{"key":"jhcr.2012100103-25","first-page":"375","article-title":"Energy aware speech recognition for mobile devices","author":"Z.Tan","year":"2008","journal-title":"Automatic speech recognition on mobile devices and over communication networks"},{"key":"jhcr.2012100103-26","doi-asserted-by":"publisher","DOI":"10.1007\/978-1-84800-143-5_1"},{"key":"jhcr.2012100103-27","doi-asserted-by":"crossref","unstructured":"Vertanen, K., & Kristensson, P. (2008, April 5-10). On the benefits of confidence visualization in speech recognition. In Proceeding of the Twenty-Sixth Annual SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy (pp. 1497-1500).","DOI":"10.1145\/1357054.1357288"},{"key":"jhcr.2012100103-28","doi-asserted-by":"crossref","unstructured":"Vertanen, K., & Kristensson, P. (2009a, February 8-11). Parakeet: A continuous speech recognition system for mobile touch-screen devices. In Proceedings of the 14th International Conference on Intelligent User Interfaces, Sanibel Island, FL (pp. 237-246).","DOI":"10.1145\/1502650.1502685"},{"key":"jhcr.2012100103-29","doi-asserted-by":"crossref","unstructured":"Vertanen, K., & Kristensson, P. (2009b, September 6-10). Recognition and correction of voice web search queries. In Proceedings of the 10th Annual Conference of the International Speech Communication, Brighton, UK (pp. 1863-1866).","DOI":"10.21437\/Interspeech.2009-541"},{"key":"jhcr.2012100103-30","doi-asserted-by":"publisher","DOI":"10.1109\/TCE.2008.4560173"},{"key":"jhcr.2012100103-31","unstructured":"Zaykovskiy, D. (2006, June 25-29). Survey of the speech recognition techniques for mobile devices. In Proceedings of the 13th International Conference on Speech and Computer, Petersburg, FL (pp. 88-93)."},{"key":"jhcr.2012100103-32","doi-asserted-by":"crossref","unstructured":"Zaykovskiy, D., & Schmitt, A. (2008, July 21-22). Java vs. Symbian: A comparison of software-based DSR implementations on mobile phones. In Proceedings of the 4th IET International Conference on Intelligent Environments, Seattle, WA (pp. 1-6).","DOI":"10.1049\/cp:20081110"},{"key":"jhcr.2012100103-33","doi-asserted-by":"crossref","unstructured":"Zhang, R., North, S., & Koutsofios, E. (2010, September 7-10). A comparison of speech and GUI input for navigation in complex visualizations on mobile devices. In Proceedings of the International Conference on Human Computer Interaction with Mobile Devices and Services, Lisboa, Portugal (pp. 357-360).","DOI":"10.1145\/1851600.1851665"},{"key":"jhcr.2012100103-34","unstructured":"Zhou, B., Gao, Y., Sorensen, J., D\u00e9chelotte, D., & Picheny, M. (2003, November 30-December 3). A hand-held speech-to-speech translation system. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, Virgin Islands, USA (pp. 664-669)."},{"key":"jhcr.2012100103-35","doi-asserted-by":"publisher","DOI":"10.1109\/TSA.2005.851874"},{"key":"jhcr.2012100103-36","doi-asserted-by":"publisher","DOI":"10.2753\/MIS0742-1222220409"}],"container-title":["International Journal of Handheld Computing Research"],"original-title":[],"language":"ng","link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=73805","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,4,23]],"date-time":"2025-04-23T19:28:42Z","timestamp":1745436522000},"score":1,"resource":{"primary":{"URL":"https:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/jhcr.2012100103"}},"subtitle":["Speech Recognition on Mobile Handheld Devices (MASR)"],"short-title":[],"issued":{"date-parts":[[2012,10,1]]},"references-count":37,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2012,10]]}},"URL":"https:\/\/doi.org\/10.4018\/jhcr.2012100103","relation":{},"ISSN":["1947-9158","1947-9166"],"issn-type":[{"value":"1947-9158","type":"print"},{"value":"1947-9166","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,10,1]]}}}