{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:39:10Z","timestamp":1740123550488,"version":"3.37.3"},"reference-count":81,"publisher":"Springer Science and Business Media LLC","issue":"17","license":[{"start":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T00:00:00Z","timestamp":1721001600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T00:00:00Z","timestamp":1721001600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"name":"CoCoUnit ERC Advanced Grant of the EU\u2019s Horizon 2020","award":["833057"],"award-info":[{"award-number":["833057"]}]},{"name":"Spanish MICINN Ministry","award":["BES-2017-080605"],"award-info":[{"award-number":["BES-2017-080605"]}]},{"DOI":"10.13039\/501100011033","name":"Spanish State Research Agency","doi-asserted-by":"crossref","award":["PID2020-113172RB-I00"],"award-info":[{"award-number":["PID2020-113172RB-I00"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Catalan Agency for University and Research","award":["2021SGR00383"],"award-info":[{"award-number":["2021SGR00383"]}]},{"name":"ICREA Academia"},{"DOI":"10.13039\/501100014374","name":"Universitat Polit\u00e8cnica de Catalunya","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100014374","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["J Supercomput"],"published-print":{"date-parts":[[2024,11]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>With mobile and embedded devices getting more integrated in our daily lives, the focus is increasingly shifting toward human-friendly interfaces, making automatic speech recognition (ASR) a central player as the 
ideal means of interaction with machines. ASR is essential for many cognitive computing applications, such as speech-based assistants, dictation systems and real-time language translation. Consequently, interest in speech technology has grown in the last few years, with more systems being proposed and higher accuracy levels being achieved, even surpassing human accuracy. However, highly accurate ASR systems are computationally expensive, requiring on the order of billions of arithmetic operations to decode each second of audio, which conflicts with a growing interest in deploying ASR on edge devices. On these devices, efficient hardware acceleration is key for achieving acceptable performance. In this paper, we propose a technique to improve the energy efficiency and performance of ASR systems, focusing on low-power hardware for edge devices. We focus on optimizing the DNN-based acoustic model evaluation, as we have observed it to be the main bottleneck in popular ASR systems, by leveraging run-time information from the beam search. 
By doing so, we reduce energy and execution time of the acoustic model evaluation by 25.6\u00a0% and 25.9\u00a0%, respectively, with negligible accuracy loss.<\/jats:p>","DOI":"10.1007\/s11227-024-06351-y","type":"journal-article","created":{"date-parts":[[2024,7,15]],"date-time":"2024-07-15T15:01:58Z","timestamp":1721055718000},"page":"24908-24937","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["Exploiting beam search confidence for energy-efficient speech recognition"],"prefix":"10.1007","volume":"80","author":[{"given":"Dennis","family":"Pinto","sequence":"first","affiliation":[]},{"given":"Jos\u00e9-Mar\u00eda","family":"Arnau","sequence":"additional","affiliation":[]},{"given":"Marc","family":"Riera","sequence":"additional","affiliation":[]},{"given":"Josep-Lloren\u00e7","family":"Cruz","sequence":"additional","affiliation":[]},{"given":"Antonio","family":"Gonz\u00e1lez","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2024,7,15]]},"reference":[{"key":"6351_CR1","doi-asserted-by":"publisher","first-page":"131858","DOI":"10.1109\/ACCESS.2021.3112535","volume":"9","author":"S Alharbi","year":"2021","unstructured":"Alharbi S, Alrazgan M, Alrashed A et al (2021) Automatic speech recognition: systematic literature review. IEEE Access 9:131858\u2013131876","journal-title":"IEEE Access"},{"key":"6351_CR2","unstructured":"Amazon (2014) Alexa. https:\/\/en.wikipedia.org\/wiki\/Amazon_Alexa, [Online; accessed 22-Mar-2024]"},{"key":"6351_CR3","unstructured":"Amodei D, Ananthanarayanan S, Anubhai R, et\u00a0al (2016) Deep speech 2: End-to-end speech recognition in english and mandarin. In: International Conference on Machine Learning, pp 173\u2013182"},{"key":"6351_CR4","unstructured":"Apple (2011) Siri. 
https:\/\/en.wikipedia.org\/wiki\/Siri, [Online; accessed 22-Mar-2024]"},{"key":"6351_CR5","first-page":"12449","volume":"33","author":"A Baevski","year":"2020","unstructured":"Baevski A, Zhou Y, Mohamed A et al (2020) wav2vec 2.0: a framework for self-supervised learning of speech representations. Adv Neural Inf Process Syst 33:12449\u201312460","journal-title":"Adv Neural Inf Process Syst"},{"key":"6351_CR6","unstructured":"Baevski A, Hsu WN, Xu Q, et\u00a0al (2022) Data2vec: a general framework for self-supervised learning in speech, vision and language. In: International Conference on Machine Learning, PMLR, pp 1298\u20131312"},{"key":"6351_CR7","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1016\/j.neunet.2021.03.004","volume":"140","author":"Z Bai","year":"2021","unstructured":"Bai Z, Zhang XL (2021) Speaker recognition based on deep learning: An overview. Neural Netw 140:65\u201399","journal-title":"Neural Netw"},{"key":"6351_CR8","doi-asserted-by":"crossref","unstructured":"Chan W, Jaitly N, Le Q et al (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. 2016 IEEE International Conference on Acoustics. Speech and Signal Processing (ICASSP), IEEE, pp 4960\u20134964","DOI":"10.1109\/ICASSP.2016.7472621"},{"issue":"1","key":"6351_CR9","doi-asserted-by":"publisher","first-page":"269","DOI":"10.1145\/2654822.2541967","volume":"42","author":"T Chen","year":"2014","unstructured":"Chen T, Du Z, Sun N et al (2014) Diannao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. 
ACM SIGARCH Comput Archit News 42(1):269\u2013284","journal-title":"ACM SIGARCH Comput Archit News"},{"issue":"1","key":"6351_CR10","doi-asserted-by":"publisher","first-page":"127","DOI":"10.1109\/JSSC.2016.2616357","volume":"52","author":"YH Chen","year":"2016","unstructured":"Chen YH, Krishna T, Emer JS et al (2016) Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE J Sol State Circuits 52(1):127\u2013138","journal-title":"IEEE J Sol State Circuits"},{"issue":"2","key":"6351_CR11","doi-asserted-by":"publisher","first-page":"292","DOI":"10.1109\/JETCAS.2019.2910232","volume":"9","author":"YH Chen","year":"2019","unstructured":"Chen YH, Yang TJ, Emer J et al (2019) Eyeriss v2: a flexible accelerator for emerging deep neural networks on mobile devices. IEEE J Emer Select Topics Circuits Syst 9(2):292\u2013308","journal-title":"IEEE J Emer Select Topics Circuits Syst"},{"key":"6351_CR12","unstructured":"Chorowski JK, Bahdanau D, Serdyuk D, et\u00a0al (2015) Attention-based models for speech recognition. Advances in neural information processing systems 28"},{"key":"6351_CR13","doi-asserted-by":"crossref","unstructured":"Chun A, Chang JX, Fang Z, et\u00a0al (2011) Isis: an accelerator for sphinx speech recognition. In: 2011 IEEE 9th Symposium on Application Specific Processors (SASP), IEEE, pp 58\u201361","DOI":"10.1109\/SASP.2011.5941078"},{"issue":"6","key":"6351_CR14","first-page":"1","volume":"1","author":"N Dave","year":"2013","unstructured":"Dave N (2013) Feature extraction methods lpc, plp and mfcc in speech recognition. Int J Adv Res Eng Technol 1(6):1\u20134","journal-title":"Int J Adv Res Eng Technol"},{"issue":"4","key":"6351_CR15","doi-asserted-by":"publisher","first-page":"788","DOI":"10.1109\/TASL.2010.2064307","volume":"19","author":"N Dehak","year":"2010","unstructured":"Dehak N, Kenny PJ, Dehak R et al (2010) Front-end factor analysis for speaker verification. 
IEEE Trans Audio Speech Lang Process 19(4):788\u2013798","journal-title":"IEEE Trans Audio Speech Lang Process"},{"key":"6351_CR16","doi-asserted-by":"crossref","unstructured":"Du Z, Fasthuber R, Chen T, et\u00a0al (2015) Shidiannao: shifting vision processing closer to the sensor. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp 92\u2013104","DOI":"10.1145\/2749469.2750389"},{"key":"6351_CR17","unstructured":"Google (2023) Gemini. https:\/\/en.wikipedia.org\/wiki\/Gemini_(chatbot), [Online; accessed 09-Apr-2024]"},{"key":"6351_CR18","doi-asserted-by":"crossref","unstructured":"Gulati A, Qin J, Chiu CC, et\u00a0al (2020) Conformer: convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100","DOI":"10.21437\/Interspeech.2020-3015"},{"key":"6351_CR19","doi-asserted-by":"crossref","unstructured":"Gupta U, Reagen B, Pentecost L, et\u00a0al (2019) Masr: a modular accelerator for sparse rnns. In: 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE, pp 1\u201314","DOI":"10.1109\/PACT.2019.00009"},{"key":"6351_CR20","doi-asserted-by":"crossref","unstructured":"Gutierrez-Garcia JO, L\u00f3pez-Neri E (2015) Cognitive computing: a brief survey and open research challenges. In: 3rd International Conference on Applied Computing and Information Technology\/2nd International Conference on Computational Science and Intelligence, IEEE, pp 328\u2013333","DOI":"10.1109\/ACIT-CSI.2015.64"},{"key":"6351_CR21","doi-asserted-by":"crossref","unstructured":"Han S, Kang J, Mao H, et\u00a0al (2017) Ese: efficient speech recognition engine with sparse lstm on fpga. 
In: Proceedings of the 2017 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays, pp 75\u201384","DOI":"10.1145\/3020078.3021745"},{"key":"6351_CR22","doi-asserted-by":"crossref","unstructured":"Hegde K, Yu J, Agrawal R, et\u00a0al (2018) Ucnn: exploiting computational reuse in deep neural networks via weight repetition. In: 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), IEEE, pp 674\u2013687","DOI":"10.1109\/ISCA.2018.00062"},{"key":"6351_CR23","unstructured":"Hub HF (2024) Open asr leaderboard. https:\/\/huggingface.co\/spaces\/hf-audio\/open_asr_leaderboard, [Online; accessed 06-06-2024]"},{"issue":"3","key":"6351_CR24","doi-asserted-by":"publisher","first-page":"2663","DOI":"10.1007\/s40747-021-00637-x","volume":"8","author":"W Jia","year":"2022","unstructured":"Jia W, Sun M, Lian J et al (2022) Feature dimensionality reduction: a review. Complex Intell Syst 8(3):2663\u20132693","journal-title":"Complex Intell Syst"},{"key":"6351_CR25","doi-asserted-by":"crossref","unstructured":"Jiao X, Akhlaghi V, Jiang Y, et\u00a0al (2018) Energy-efficient neural networks using approximate computation reuse. In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, pp 1223\u20131228","DOI":"10.23919\/DATE.2018.8342202"},{"key":"6351_CR26","doi-asserted-by":"crossref","unstructured":"Jouppi NP, Young C, Patil N, et\u00a0al (2017) In-datacenter performance analysis of a tensor processing unit. In: Proceedings of the 44th Annual International Symposium on Computer Architecture, p 1-12","DOI":"10.1145\/3079856.3080246"},{"key":"6351_CR27","doi-asserted-by":"crossref","unstructured":"Karafi\u00e1t M, Burget L, Mat\u011bjka P, et\u00a0al (2011) ivector-based discriminative adaptation for automatic speech recognition. 
In: 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, IEEE, pp 152\u2013157","DOI":"10.1109\/ASRU.2011.6163922"},{"key":"6351_CR28","unstructured":"Kelly JE (2015) Computing, cognition and the future of knowing. IBM Research :12"},{"key":"6351_CR29","doi-asserted-by":"publisher","DOI":"10.7312\/kell16856","volume-title":"Smart machines: IBM\u2019s Watson and the era of cognitive computing","author":"JE Kelly","year":"2013","unstructured":"Kelly JE, Hamm S (2013) Smart machines: IBM\u2019s Watson and the era of cognitive computing. Columbia University Press, New York"},{"key":"6351_CR30","doi-asserted-by":"crossref","unstructured":"Li J, Lavrukhin V, Ginsburg B, et\u00a0al (2019) Jasper: an end-to-end convolutional neural acoustic model. arXiv preprint arXiv:1904.03288","DOI":"10.21437\/Interspeech.2019-1819"},{"key":"6351_CR31","doi-asserted-by":"crossref","unstructured":"Li J, et\u00a0al (2022) Recent advances in end-to-end automatic speech recognition. APSIPA Transactions on Signal and Information Processing 11(1)","DOI":"10.1561\/116.00000050"},{"key":"6351_CR32","doi-asserted-by":"publisher","first-page":"12","DOI":"10.1016\/j.specom.2022.12.002","volume":"147","author":"Q Li","year":"2023","unstructured":"Li Q, Zhang C, Woodland PC (2023) Combining hybrid dnn-hmm asr systems with attention-based models using lattice rescoring. Speech Commun 147:12\u201321","journal-title":"Speech Commun"},{"key":"6351_CR33","doi-asserted-by":"crossref","unstructured":"Li S, Ahn JH, Strong RD, et\u00a0al (2009) Mcpat: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In: Proceedings of the 42nd Annual IEEE\/ACM International Symposium on Microarchitecture, pp 469\u2013480","DOI":"10.1145\/1669112.1669172"},{"key":"6351_CR34","doi-asserted-by":"crossref","unstructured":"Lin EC, Yu K, Rutenbar RA, et\u00a0al (2006) Moving speech recognition from software to silicon: the in silico vox project. 
In: Ninth International Conference on Spoken Language Processing","DOI":"10.21437\/Interspeech.2006-103"},{"key":"6351_CR35","doi-asserted-by":"publisher","first-page":"52227","DOI":"10.1109\/ACCESS.2018.2870273","volume":"6","author":"B Liu","year":"2018","unstructured":"Liu B, Qin H, Gong Y et al (2018) Eera-asr: an energy-efficient reconfigurable architecture for automatic speech recognition with hybrid dnn and approximate computing. IEEE Access 6:52227\u201352237","journal-title":"IEEE Access"},{"key":"6351_CR36","unstructured":"Logan B (2000) Mel frequency cepstral coefficients for music modeling. In: Ismir, pp 1\u201311"},{"issue":"1","key":"6351_CR37","doi-asserted-by":"publisher","first-page":"73","DOI":"10.1109\/TC.2016.2574353","volume":"66","author":"T Luo","year":"2016","unstructured":"Luo T, Liu S, Li L et al (2016) Dadiannao: a neural network supercomputer. IEEE Trans Comput 66(1):73\u201388","journal-title":"IEEE Trans Comput"},{"key":"6351_CR38","doi-asserted-by":"publisher","first-page":"9411","DOI":"10.1007\/s11042-020-10073-7","volume":"80","author":"M Malik","year":"2021","unstructured":"Malik M, Malik MK, Mehmood K et al (2021) Automatic speech recognition: a survey. Multimed Tools Appl 80:9411\u20139457","journal-title":"Multimed Tools Appl"},{"key":"6351_CR39","doi-asserted-by":"publisher","first-page":"795","DOI":"10.1162\/tacl_a_00346","volume":"8","author":"C Meister","year":"2020","unstructured":"Meister C, Vieira T, Cotterell R (2020) Best-first beam search. Trans Assoc Comput Linguist 8:795\u2013809","journal-title":"Trans Assoc Comput Linguist"},{"key":"6351_CR40","unstructured":"Micron\u00a0Technology I (2016) Tn-53-01: Lpddr4 system power calculator. https:\/\/www.micron.com\/support\/tools-and-utilities\/power-calc"},{"key":"6351_CR41","unstructured":"Microsoft (2014) Cortana. 
https:\/\/www.microsoft.com\/en-us\/cortana, [Online; accessed 22-Mar-2024]"},{"key":"6351_CR42","doi-asserted-by":"crossref","unstructured":"Miura K, Noguchi H, Kawaguchi H, et\u00a0al (2008) A low memory bandwidth gaussian mixture model (gmm) processor for 20,000-word real-time speech recognition fpga system. In: 2008 International Conference on Field-Programmable Technology, IEEE, pp 341\u2013344","DOI":"10.1109\/FPT.2008.4762413"},{"issue":"8","key":"6351_CR43","doi-asserted-by":"publisher","first-page":"62","DOI":"10.1145\/1978542.1978559","volume":"54","author":"DS Modha","year":"2011","unstructured":"Modha DS, Ananthanarayanan R, Esser SK et al (2011) Cognitive computing. Commun ACM 54(8):62\u201371","journal-title":"Commun ACM"},{"issue":"6","key":"6351_CR44","doi-asserted-by":"publisher","first-page":"1179","DOI":"10.1109\/JSTSP.2022.3207050","volume":"16","author":"A Mohamed","year":"2022","unstructured":"Mohamed A, Hy Lee, Borgholt L et al (2022) Self-supervised speech representation learning: a review. IEEE J Sel Topics Signal Process 16(6):1179\u20131210","journal-title":"IEEE J Sel Topics Signal Process"},{"issue":"1","key":"6351_CR45","doi-asserted-by":"publisher","first-page":"69","DOI":"10.1006\/csla.2001.0184","volume":"16","author":"M Mohri","year":"2002","unstructured":"Mohri M, Pereira F, Riley M (2002) Weighted finite-state transducers in speech recognition. Comput Speech Lang 16(1):69\u201388","journal-title":"Comput Speech Lang"},{"key":"6351_CR46","doi-asserted-by":"crossref","unstructured":"Muralimanohar N, Balasubramonian R, Jouppi NP (2009) Cacti 6.0: a tool to model large caches. HP laboratories 27:28","DOI":"10.1109\/MM.2008.2"},{"key":"6351_CR47","doi-asserted-by":"crossref","unstructured":"Ning L, Shen X (2019) Deep reuse: streamline cnn inference on the fly via coarse-grained computation reuse. 
In: Proceedings of the ACM International Conference on Supercomputing, pp 438\u2013448","DOI":"10.1145\/3330345.3330384"},{"key":"6351_CR48","doi-asserted-by":"crossref","unstructured":"Panayotov V, Chen G, Povey D et al (2015) Librispeech: an asr corpus based on public domain audio books. 2015 IEEE International Conference on Acoustics. Speech and Signal Processing (ICASSP), IEEE, pp 5206\u20135210","DOI":"10.1109\/ICASSP.2015.7178964"},{"key":"6351_CR49","doi-asserted-by":"crossref","unstructured":"Park DS, Chan W, Zhang Y, et\u00a0al (2019) Specaugment: A simple data augmentation method for automatic speech recognition. arXiv preprint arXiv:1904.08779","DOI":"10.21437\/Interspeech.2019-2680"},{"key":"6351_CR50","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2021.101317","volume":"72","author":"TJ Park","year":"2022","unstructured":"Park TJ, Kanda N, Dimitriadis D et al (2022) A review of speaker diarization: recent advances with deep learning. Comput Speech Lang 72:101317","journal-title":"Comput Speech Lang"},{"key":"6351_CR51","doi-asserted-by":"crossref","unstructured":"Peddinti V, Povey D, Khudanpur S (2015) A time delay neural network architecture for efficient modeling of long temporal contexts. In: Sixteenth Annual Conference of the International Speech Communication Association","DOI":"10.21437\/Interspeech.2015-647"},{"issue":"4","key":"6351_CR52","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1145\/3425604","volume":"17","author":"D Pinto","year":"2020","unstructured":"Pinto D, Arnau JM, Gonz\u00e1lez A (2020) Design and evaluation of an ultra low-power human-quality speech recognition system. ACM Trans Architec Code Optim (TACO) 17(4):1\u201319","journal-title":"ACM Trans Architec Code Optim (TACO)"},{"key":"6351_CR53","unstructured":"Povey D, Ghoshal A, Boulianne G, et\u00a0al (2011) The kaldi speech recognition toolkit. 
In: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, IEEE Signal Processing Society"},{"key":"6351_CR54","doi-asserted-by":"crossref","unstructured":"Povey D, Peddinti V, Galvez D, et\u00a0al (2016) Purely sequence-trained neural networks for asr based on lattice-free mmi. In: Interspeech, pp 2751\u20132755","DOI":"10.21437\/Interspeech.2016-595"},{"key":"6351_CR55","doi-asserted-by":"crossref","unstructured":"Povey D, Hadian H, Ghahremani P et al (2018) A time-restricted self-attention layer for asr. 2018 IEEE International Conference on Acoustics. IEEE, Speech and Signal Processing (ICASSP), pp 5874\u20135878","DOI":"10.1109\/ICASSP.2018.8462497"},{"key":"6351_CR56","doi-asserted-by":"crossref","unstructured":"Prabhavalkar R, Hori T, Sainath TN, et\u00a0al (2023) End-to-end speech recognition: a survey. IEEE\/ACM Transactions on Audio, Speech, and Language Processing","DOI":"10.1109\/TASLP.2023.3328283"},{"key":"6351_CR57","unstructured":"Price M (2016) Energy-scalable speech recognition circuits. PhD thesis, Massachusetts Institute of Technology"},{"key":"6351_CR58","doi-asserted-by":"publisher","first-page":"257","DOI":"10.1109\/5.18626","volume":"77","author":"LR Rabiner","year":"1989","unstructured":"Rabiner LR (1989) A tutorial on hidden markov models and selected applications in speech recognition. Proc IEEE 77(2):257\u2013286","journal-title":"Proc IEEE"},{"key":"6351_CR59","unstructured":"Radford A,\u00a0Kim JW,\u00a0Xu T et al (2023)\u00a0Robust speech recognition via large-scale weak supervision. In:\u00a0International Conference on Machine Learning,\u00a0PMLR, pp\u00a028492\u201328518"},{"key":"6351_CR60","doi-asserted-by":"crossref","unstructured":"Riera M, Arnau JM, Gonz\u00e1lez A (2018) Computation reuse in dnns by exploiting input similarity. 
In: 2018 ACM\/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), IEEE, pp 57\u201368","DOI":"10.1109\/ISCA.2018.00016"},{"issue":"5","key":"6351_CR61","doi-asserted-by":"publisher","first-page":"36","DOI":"10.1109\/MM.2019.2929742","volume":"39","author":"M Riera","year":"2019","unstructured":"Riera M, Arnau JM, Gonz\u00e1lez A (2019) Cgpa: coarse-grained pruning of activations for energy-efficient rnn inference. IEEE Micro 39(5):36\u201345","journal-title":"IEEE Micro"},{"key":"6351_CR62","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2022.102604","volume":"129","author":"M Riera","year":"2022","unstructured":"Riera M, Arnau JM, Gonz\u00e1lez A (2022) Crew: computation reuse and efficient weight storage for hardware-accelerated mlps and rnns. J Syst Architect 129:102604","journal-title":"J Syst Architect"},{"key":"6351_CR63","doi-asserted-by":"publisher","DOI":"10.1016\/j.sysarc.2021.102336","volume":"122","author":"M Riera","year":"2022","unstructured":"Riera M, Arnau JM, Gonz\u00e1lez A (2022) Dnn pruning with principal component analysis and connection importance estimation. J Syst Architec (JSA) 122:102336","journal-title":"J Syst Architec (JSA)"},{"key":"6351_CR64","doi-asserted-by":"crossref","unstructured":"Rouvier M, Favre B (2014) Speaker adaptation of dnn-based asr with i-vectors: Does it actually adapt models to speakers? In: Fifteenth Annual Conference of the International Speech Communication Association","DOI":"10.21437\/Interspeech.2014-503"},{"key":"6351_CR65","volume-title":"The cognitive computer: on language, learning, and artificial intelligence","author":"RC Schank","year":"1984","unstructured":"Schank RC (1984) The cognitive computer: on language, learning, and artificial intelligence. 
Addison-Wesley Longman Publishing Co., Inc., Boston"},{"key":"6351_CR66","doi-asserted-by":"crossref","unstructured":"Silfa F, Dot G, Arnau JM, et\u00a0al (2018) E-pur: an energy-efficient processing unit for recurrent neural networks. In: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp 1\u201312","DOI":"10.1145\/3243176.3243184"},{"key":"6351_CR67","unstructured":"Synnaeve G, Xu Q, Kahn J, et\u00a0al (2019) End-to-end asr: from supervised to semi-supervised learning with modern architectures. arXiv preprint arXiv:1911.08460"},{"key":"6351_CR68","doi-asserted-by":"crossref","unstructured":"Tabani H, Arnau JM, Tubella J, et\u00a0al (2017) An ultra low-power hardware accelerator for acoustic scoring in speech recognition. In: 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE, pp 41\u201352","DOI":"10.1109\/PACT.2017.11"},{"key":"6351_CR69","unstructured":"Waibel A, Hanazawa T, Hinton G et al (1995) Phoneme recognition using time-delay neural networks. Theory, Architectures and Applications, Backpropagation, pp 35\u201361"},{"key":"6351_CR70","doi-asserted-by":"crossref","unstructured":"Wang Y, Mohamed A, Le D et al (2020) Transformer-based acoustic modeling for hybrid speech recognition. ICASSP 2020\u20132020 IEEE International Conference on Acoustics. IEEE, Speech and Signal Processing (ICASSP), pp 6874\u20136878","DOI":"10.1109\/ICASSP40776.2020.9054345"},{"key":"6351_CR71","unstructured":"Xiong W, Droppo J, Huang X, et\u00a0al (2016) Achieving human parity in conversational speech recognition. arXiv preprint arXiv:1610.05256"},{"key":"6351_CR72","doi-asserted-by":"crossref","unstructured":"Xu L, Gu Y, Kolehmainen J et al (2022) Rescorebert: Discriminative speech recognition rescoring with bert. ICASSP 2022\u20132022 IEEE International Conference on Acoustics. 
IEEE, Speech and Signal Processing (ICASSP), pp 6117\u20136121","DOI":"10.1109\/ICASSP43922.2022.9747118"},{"key":"6351_CR73","doi-asserted-by":"crossref","unstructured":"Yazdani R, Segura A, Arnau JM, et\u00a0al (2016) An ultra low-power hardware accelerator for automatic speech recognition. In: 2016 49th Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO), IEEE, pp 1\u201312","DOI":"10.1109\/MICRO.2016.7783750"},{"key":"6351_CR74","doi-asserted-by":"crossref","unstructured":"Yazdani R, Arnau JM, Gonz\u00e1lez A (2017a) Unfold: A memory-efficient speech recognizer using on-the-fly wfst composition. In: Proceedings of the 50th Annual IEEE\/ACM International Symposium on Microarchitecture, pp 69\u201381","DOI":"10.1145\/3123939.3124542"},{"issue":"1","key":"6351_CR75","doi-asserted-by":"publisher","first-page":"22","DOI":"10.1109\/MM.2017.15","volume":"37","author":"R Yazdani","year":"2017","unstructured":"Yazdani R, Segura A, Arnau JM et al (2017) Low-power automatic speech recognition through a mobile gpu and a viterbi accelerator. IEEE Micro 37(1):22\u201329","journal-title":"IEEE Micro"},{"issue":"12","key":"6351_CR76","doi-asserted-by":"publisher","first-page":"1817","DOI":"10.1109\/TC.2019.2937075","volume":"68","author":"R Yazdani","year":"2019","unstructured":"Yazdani R, Arnau JM, Gonz\u00e1lez A (2019) A low-power, high-performance speech recognition accelerator. IEEE Trans Comput 68(12):1817\u20131831","journal-title":"IEEE Trans Comput"},{"issue":"8","key":"6351_CR77","first-page":"1197","volume":"69","author":"R Yazdani","year":"2020","unstructured":"Yazdani R, Arnau JM, Gonzalez A (2020) Laws: locality-aware scheme for automatic speech recognition. IEEE Trans Comput 69(8):1197\u20131208","journal-title":"IEEE Trans Comput"},{"key":"6351_CR78","unstructured":"Zhang Y, Qin J, Park DS, et\u00a0al (2020) Pushing the limits of semi-supervised learning for automatic speech recognition. 
arXiv preprint arXiv:2010.10504"},{"key":"6351_CR79","unstructured":"Zhang Y, Han W, Qin J, et\u00a0al (2023) Google usm: scaling automatic speech recognition beyond 100 languages. arXiv preprint arXiv:2303.01037"},{"key":"6351_CR80","doi-asserted-by":"crossref","unstructured":"Zhou T, Zhao Y, Wu J (2021) Resnext and res2net structures for speaker verification. In: 2021 IEEE Spoken Language Technology Workshop (SLT), IEEE, pp 301\u2013307","DOI":"10.1109\/SLT48900.2021.9383531"},{"issue":"3","key":"6351_CR81","doi-asserted-by":"publisher","first-page":"8","DOI":"10.1109\/MSP.2023.3240008","volume":"40","author":"K Zmolikova","year":"2023","unstructured":"Zmolikova K, Delcroix M, Ochiai T et al (2023) Neural target speech extraction: an overview. IEEE Signal Process Mag 40(3):8\u201329","journal-title":"IEEE Signal Process Mag"}],"container-title":["The Journal of Supercomputing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06351-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s11227-024-06351-y\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s11227-024-06351-y.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T15:29:47Z","timestamp":1725550187000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s11227-024-06351-y"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,15]]},"references-count":81,"journal-issue":{"issue":"17","published-print":{"date-parts":[[2024,11]]}},"alternative-id":["6351"],"URL":"https:\/\/doi.org\/10.1007\/s11227-024-06351-y","relation":{},"ISSN":["0920-8542","1573-0484"],"issn-type":[{"typ
e":"print","value":"0920-8542"},{"type":"electronic","value":"1573-0484"}],"subject":[],"published":{"date-parts":[[2024,7,15]]},"assertion":[{"value":"5 July 2024","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"15 July 2024","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare no competing interests.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}