{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:23:18Z","timestamp":1750220598012,"version":"3.41.0"},"reference-count":39,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2020,11,10]],"date-time":"2020-11-10T00:00:00Z","timestamp":1604966400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100007601","name":"Horizon 2020","doi-asserted-by":"publisher","award":["833057"],"award-info":[{"award-number":["833057"]}],"id":[{"id":"10.13039\/501100007601","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100011033","name":"Agencia Estatal de Investigaci\u00f3n","doi-asserted-by":"crossref","award":["TIN2016-75344-R"],"award-info":[{"award-number":["TIN2016-75344-R"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100003741","name":"Instituci\u00f3 Catalana de Recerca i Estudis Avan\u00e7ats","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100003741","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Ministerio de Ciencia, Innovaci\u00f3n y Universidades","award":["BES-2017-080605"],"award-info":[{"award-number":["BES-2017-080605"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,12,31]]},"abstract":"<jats:p>Automatic Speech Recognition (ASR) has experienced a dramatic evolution since pioneer development of Bell Lab\u2019s single-digit recognizer more than 50 years ago. Current ASR systems have taken advantage of the tremendous improvements in AI during the past decade by incorporating Deep Neural Networks into the system and pushing their accuracy to levels comparable to that of humans. 
This article describes and characterizes a representative ASR system with state-of-the-art accuracy and proposes a hardware platform capable of decoding speech in real-time with a power dissipation close to 1 Watt. The software is based on the so-called hybrid approach with a vocabulary of 200K words and RNN-based language model re-scoring, whereas the hardware consists of a commercially available low-power processor along with two accelerators used for the most compute-intensive tasks. The article shows that high performance can be obtained with very low power, enabling the deployment of these systems in extremely power-constrained environments such as mobile and IoT devices.<\/jats:p>","DOI":"10.1145\/3425604","type":"journal-article","created":{"date-parts":[[2020,11,10]],"date-time":"2020-11-10T23:16:11Z","timestamp":1605050171000},"page":"1-19","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Design and Evaluation of an Ultra Low-power Human-quality Speech Recognition System"],"prefix":"10.1145","volume":"17","author":[{"given":"Dennis","family":"Pinto","sequence":"first","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya, Barcelona, Spain"}]},{"given":"Jose-Mar\u00eda","family":"Arnau","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya, Barcelona, Spain"}]},{"given":"Antonio","family":"Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Universitat Polit\u00e8cnica de Catalunya, Barcelona, Spain"}]}],"member":"320","published-online":{"date-parts":[[2020,11,10]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the International Conference on Machine Learning. 173--182","author":"Amodei Dario","year":"2016","unstructured":"Dario Amodei , Sundaram Ananthanarayanan , Rishita Anubhai , Jingliang Bai , Eric Battenberg , Carl Case , Jared Casper , Bryan Catanzaro , Qiang Cheng , Guoliang Chen , et\u00a0al. 2016 . 
Deep Speech 2: End-to-end speech recognition in English and Mandarin . In Proceedings of the International Conference on Machine Learning. 173--182 . Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et\u00a0al. 2016. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In Proceedings of the International Conference on Machine Learning. 173--182."},{"key":"e_1_2_1_2_1","volume-title":"Bourlard and Nelson Morgan","author":"Herve","year":"2012","unstructured":"Herve A. Bourlard and Nelson Morgan . 2012 . Connectionist Speech Recognition: A Hybrid Approach. Vol. 247 . Springer Science & Business Media. Herve A. Bourlard and Nelson Morgan. 2012. Connectionist Speech Recognition: A Hybrid Approach. Vol. 247. Springer Science & Business Media."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.1109\/SASP.2011.5941078"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2010.2064307"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/89.466659"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of the International Conference on Machine Learning. 1764--1772","author":"Graves Alex","year":"2014","unstructured":"Alex Graves and Navdeep Jaitly . 2014 . Towards end-to-end speech recognition with recurrent neural networks . In Proceedings of the International Conference on Machine Learning. 1764--1772 . Alex Graves and Navdeep Jaitly. 2014. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the International Conference on Machine Learning. 1764--1772."},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2013.6707742"},{"key":"e_1_2_1_9_1","volume-title":"IEEE Sig. Proc. Mag. 
29","author":"Hinton Geoffrey","year":"2012","unstructured":"Geoffrey Hinton , Li Deng , Dong Yu , George Dahl , Abdel-rahman Mohamed, Navdeep Jaitly , Andrew Senior , Vincent Vanhoucke , Patrick Nguyen , Brian Kingsbury , et\u00a0al. 2012 . Deep neural networks for acoustic modeling in speech recognition . IEEE Sig. Proc. Mag. 29 (2012). Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, et\u00a0al. 2012. Deep neural networks for acoustic modeling in speech recognition. IEEE Sig. Proc. Mag. 29 (2012)."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2006.889790"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2005-340"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2011.6163922"},{"key":"e_1_2_1_13_1","volume-title":"Jasper: An end-to-end convolutional neural acoustic model. arXiv preprint arXiv:1904.03288","author":"Li Jason","year":"2019","unstructured":"Jason Li , Vitaly Lavrukhin , Boris Ginsburg , Ryan Leary , Oleksii Kuchaiev , Jonathan M. Cohen , Huyen Nguyen , and Ravi Teja Gadde . 2019 . Jasper: An end-to-end convolutional neural acoustic model. arXiv preprint arXiv:1904.03288 (2019). Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, and Ravi Teja Gadde. 2019. Jasper: An end-to-end convolutional neural acoustic model. arXiv preprint arXiv:1904.03288 (2019)."},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2006-103"},{"volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201914)","author":"Liu Xunying","key":"e_1_2_1_15_1","unstructured":"Xunying Liu , Yongqiang Wang , Xie Chen , Mark J. F. Gales , and Philip C. Woodland . 2014. Efficient lattice rescoring using recurrent neural network language models . 
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201914) . IEEE, 4908--4912. Xunying Liu, Yongqiang Wang, Xie Chen, Mark J. F. Gales, and Philip C. Woodland. 2014. Efficient lattice rescoring using recurrent neural network language models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201914). IEEE, 4908--4912."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the 6th European Conference on Speech Communication and Technology.","author":"Ljolje Andrej","year":"1999","unstructured":"Andrej Ljolje , Fernando Pereira , and Michael Riley . 1999 . Efficient general lattice generation and rescoring . In Proceedings of the 6th European Conference on Speech Communication and Technology. Andrej Ljolje, Fernando Pereira, and Michael Riley. 1999. Efficient general lattice generation and rescoring. In Proceedings of the 6th European Conference on Speech Communication and Technology."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1109\/ASRU.2015.7404790"},{"key":"e_1_2_1_18_1","unstructured":"Micron Technology Inc. 2016. TN-53-01: LPDDR4 System Power Calculator. Retrieved from https:\/\/www.micron.com\/support\/tools-and-utilities\/power-calc."},{"key":"e_1_2_1_19_1","volume-title":"Proceedings of the International Conference on Field-programmable Technology. IEEE, 341--344","author":"Miura Kazuo","year":"2008","unstructured":"Kazuo Miura , Hiroki Noguchi , Hiroshi Kawaguchi , and Masahiko Yoshimoto . 2008 . A low memory bandwidth Gaussian Mixture Model (GMM) processor for 20,000-word real-time speech recognition FPGA system . In Proceedings of the International Conference on Field-programmable Technology. IEEE, 341--344 . 
Kazuo Miura, Hiroki Noguchi, Hiroshi Kawaguchi, and Masahiko Yoshimoto. 2008. A low memory bandwidth Gaussian Mixture Model (GMM) processor for 20,000-word real-time speech recognition FPGA system. In Proceedings of the International Conference on Field-programmable Technology. IEEE, 341--344."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.2001.0184"},{"volume-title":"Speech recognition with weighted finite-state transducers","author":"Mohri Mehryar","key":"e_1_2_1_21_1","unstructured":"Mehryar Mohri , Fernando Pereira , and Michael Riley . 2008. Speech recognition with weighted finite-state transducers . In Springer Handbook of Speech Processing. Springer , 559--584. Mehryar Mohri, Fernando Pereira, and Michael Riley. 2008. Speech recognition with weighted finite-state transducers. In Springer Handbook of Speech Processing. Springer, 559--584."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.1996.0022"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201915)","author":"Panayotov V.","year":"2015","unstructured":"V. Panayotov , G. Chen , D. Povey , and S. Khudanpur . 2015. Librispeech: An ASR corpus based on public domain audio books . In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201915) . 5206--5210. DOI:https:\/\/doi.org\/10.1109\/ICASSP.2015.7178964. V. Panayotov, G. Chen, D. Povey, and S. Khudanpur. 2015. Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP\u201915). 5206--5210. 
DOI:https:\/\/doi.org\/10.1109\/ICASSP.2015.7178964."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2015-647"},{"key":"e_1_2_1_25_1","doi-asserted-by":"publisher","DOI":"10.1109\/VETECS.2011.5956528"},{"key":"e_1_2_1_26_1","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society.","author":"Povey Daniel","year":"2011","unstructured":"Daniel Povey , Arnab Ghoshal , Gilles Boulianne , Lukas Burget , Ondrej Glembek , Nagendra Goel , Mirko Hannemann , Petr Motlicek , Yanmin Qian , Petr Schwarz , Jan Silovsky , Georg Stemmer , and Karel Vesely . 2011 . The Kaldi speech recognition toolkit . In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, and Karel Vesely. 2011. The Kaldi speech recognition toolkit. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.18626"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.5555\/108235.108253"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2010-532"},{"volume-title":"ACM SIGARCH Computer Architecture News","author":"Smith James E.","key":"e_1_2_1_31_1","unstructured":"James E. Smith . 1982. Decoupled access\/execute computer architectures . In ACM SIGARCH Computer Architecture News , Vol. 10 . IEEE Computer Society Press , 112--119. James E. Smith. 1982. Decoupled access\/execute computer architectures. In ACM SIGARCH Computer Architecture News, Vol. 10. 
IEEE Computer Society Press, 112--119."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.11"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1109\/29.21701"},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1977.1162929"},{"key":"e_1_2_1_35_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2018.8461974"},{"key":"e_1_2_1_36_1","volume-title":"A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition. arXiv preprint arXiv:1810.11352","author":"Yang Xuerui","year":"2018","unstructured":"Xuerui Yang , Jiwei Li , and Xi Zhou . 2018. A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition. arXiv preprint arXiv:1810.11352 ( 2018 ). Xuerui Yang, Jiwei Li, and Xi Zhou. 2018. A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition. arXiv preprint arXiv:1810.11352 (2018)."},{"key":"e_1_2_1_37_1","doi-asserted-by":"publisher","DOI":"10.5555\/3195638.3195696"},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2017.15"},{"key":"e_1_2_1_39_1","volume-title":"Article arXiv:1812.06864 (Dec","author":"Zeghidour Neil","year":"2018","unstructured":"Neil Zeghidour , Qiantong Xu , Vitaliy Liptchinsky , Nicolas Usunier , Gabriel Synnaeve , and Ronan Collobert . 2018. Fully convolutional speech recognition. arXiv e-prints , Article arXiv:1812.06864 (Dec 2018 ). Neil Zeghidour, Qiantong Xu, Vitaliy Liptchinsky, Nicolas Usunier, Gabriel Synnaeve, and Ronan Collobert. 2018. Fully convolutional speech recognition. 
arXiv e-prints, Article arXiv:1812.06864 (Dec 2018)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1007\/BF02943243"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3425604","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3425604","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T21:31:55Z","timestamp":1750195915000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3425604"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,11,10]]},"references-count":39,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2020,12,31]]}},"alternative-id":["10.1145\/3425604"],"URL":"https:\/\/doi.org\/10.1145\/3425604","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2020,11,10]]},"assertion":[{"value":"2020-05-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-09-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-11-10","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}