{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,3,31]],"date-time":"2026-03-31T13:57:16Z","timestamp":1774965436372,"version":"3.50.1"},"reference-count":46,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2023,1,24]],"date-time":"2023-01-24T00:00:00Z","timestamp":1674518400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"name":"CoCoUnit ERC Advanced"},{"name":"EU\u2019s Horizon 2020","award":["833057"],"award-info":[{"award-number":["833057"]}]},{"DOI":"10.13039\/501100011033","name":"Spanish State Research Agency","doi-asserted-by":"crossref","award":["PID2020-113172RB-I00"],"award-info":[{"award-number":["PID2020-113172RB-I00"]}],"id":[{"id":"10.13039\/501100011033","id-type":"DOI","asserted-by":"crossref"}]},{"name":"ICREA Academia"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2023,3,31]]},"abstract":"<jats:p>The effectiveness of Recurrent Neural Networks (RNNs) for tasks such as Automatic Speech Recognition has fostered interest in RNN inference acceleration. Due to the recurrent nature and data dependencies of RNN computations, prior work has designed customized architectures specifically tailored to the computation pattern of RNNs, achieving high computational efficiency for certain chosen model sizes. However, given that the dimensionality of RNNs varies considerably across tasks, it is crucial to generalize this efficiency to diverse configurations.<\/jats:p>\n          <jats:p>In this work, we identify adaptiveness as a key feature that is missing from today\u2019s RNN accelerators. 
In particular, we first show the problem of low resource utilization and low adaptiveness of state-of-the-art RNN implementations on GPU, FPGA, and ASIC architectures. To solve these issues, we propose an intelligent tile-based dispatching mechanism that increases the adaptiveness of RNN computation in order to handle the data dependencies efficiently. To this end, we propose Sharp, a hardware accelerator that pipelines RNN computation using an effective scheduling scheme to hide most of the dependent serialization. Furthermore, Sharp employs a dynamically reconfigurable architecture to adapt to the model\u2019s characteristics.<\/jats:p>\n          <jats:p>Sharp achieves 2\u00d7, 2.8\u00d7, and 82\u00d7 speedups on average, across different RNN models and resource budgets, compared to the state-of-the-art ASIC, FPGA, and GPU implementations, respectively. Furthermore, we provide a significant energy reduction with respect to previous solutions, due to the low power dissipation of Sharp (321 GFLOPS\/Watt).<\/jats:p>","DOI":"10.1145\/3552513","type":"journal-article","created":{"date-parts":[[2022,8,12]],"date-time":"2022-08-12T11:30:32Z","timestamp":1660303832000},"page":"1-23","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks"],"prefix":"10.1145","volume":"22","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7949-6453","authenticated-orcid":false,"given":"Reza Yazdani","family":"Aminabadi","sequence":"first","affiliation":[{"name":"Microsoft, Canada"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5508-0728","authenticated-orcid":false,"given":"Olatunji","family":"Ruwase","sequence":"additional","affiliation":[{"name":"Microsoft, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8165-166X","authenticated-orcid":false,"given":"Minjia","family":"Zhang","sequence":"additional","affiliation":[{"name":"Microsoft, 
USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-0478-8854","authenticated-orcid":false,"given":"Yuxiong","family":"He","sequence":"additional","affiliation":[{"name":"Microsoft, USA"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0336-9191","authenticated-orcid":false,"given":"Jose-Maria","family":"Arnau","sequence":"additional","affiliation":[{"name":"Universitat Politecnica de Catalunya, Spain"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0009-0996","authenticated-orcid":false,"given":"Antonio","family":"Gonz\u00e1lez","sequence":"additional","affiliation":[{"name":"Universitat Politecnica de Catalunya, Spain"}]}],"member":"320","published-online":{"date-parts":[[2023,1,24]]},"reference":[{"key":"e_1_3_1_2_2","article-title":"Optimizing performance of recurrent neural networks on GPUs","volume":"1604","author":"Appleyard Jeremy","year":"2016","unstructured":"Jeremy Appleyard, Tom\u00e1s Kocisk\u00fd, and Phil Blunsom. 2016. Optimizing performance of recurrent neural networks on GPUs. CoRR abs\/1604.01946 (2016). arXiv:1604.01946. http:\/\/arxiv.org\/abs\/1604.01946.","journal-title":"CoRR"},{"key":"e_1_3_1_3_2","article-title":"Towards non-saturating recurrent units for modelling long-term dependencies","volume":"1902","author":"Chandar Sarath","year":"2019","unstructured":"Sarath Chandar, Chinnadhurai Sankar, Eugene Vorontsov, Samira Ebrahimi Kahou, and Yoshua Bengio. 2019. Towards non-saturating recurrent units for modelling long-term dependencies. CoRR abs\/1902.06704 (2019). arXiv:1902.06704. http:\/\/arxiv.org\/abs\/1902.06704.","journal-title":"CoRR"},{"key":"e_1_3_1_4_2","article-title":"Recurrent neural networks hardware implementation on FPGA","volume":"1511","author":"Chang Andre Xian Ming","year":"2015","unstructured":"Andre Xian Ming Chang, Berin Martini, and Eugenio Culurciello. 2015. Recurrent neural networks hardware implementation on FPGA. CoRR abs\/1511.05552 (2015). arXiv:1511.05552. 
http:\/\/arxiv.org\/abs\/1511.05552.","journal-title":"CoRR"},{"key":"e_1_3_1_5_2","article-title":"cuDNN: Efficient primitives for deep learning","volume":"1410","author":"Chetlur Sharan","year":"2014","unstructured":"Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. CoRR abs\/1410.0759 (2014). arXiv:1410.0759. http:\/\/arxiv.org\/abs\/1410.0759.","journal-title":"CoRR"},{"key":"e_1_3_1_6_2","article-title":"Learning phrase representations using RNN Encoder-Decoder for Statistical Machine Translation","volume":"1406","author":"Cho Kyunghyun","year":"2014","unstructured":"Kyunghyun Cho, Bart van Merrienboer, \u00c7aglar G\u00fcl\u00e7ehre, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN Encoder-Decoder for Statistical Machine Translation. CoRR abs\/1406.1078 (2014). arXiv:1406.1078. http:\/\/arxiv.org\/abs\/1406.1078.","journal-title":"CoRR"},{"key":"e_1_3_1_7_2","article-title":"Empirical evaluation of gated recurrent neural networks on sequence modeling","volume":"1412","author":"Chung Junyoung","year":"2014","unstructured":"Junyoung Chung, \u00c7aglar G\u00fcl\u00e7ehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. CoRR abs\/1412.3555 (2014). arXiv:1412.3555. http:\/\/arxiv.org\/abs\/1412.3555.","journal-title":"CoRR"},{"key":"e_1_3_1_8_2","first-page":"2024","volume-title":"Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916)","author":"Diamos Greg","year":"2016","unstructured":"Greg Diamos, Shubho Sengupta, Bryan Catanzaro, Mike Chrzanowski, Adam Coates, Erich Elsen, Jesse Engel, Awni Y. Hannun, and Sanjeev Satheesh. 2016. Persistent RNNs: Stashing recurrent weights on-chip. In Proceedings of the 33rd International Conference on Machine Learning (ICML\u201916). 
2024\u20132033."},{"key":"e_1_3_1_9_2","article-title":"Long-term recurrent convolutional networks for visual recognition and description","volume":"1411","author":"Donahue Jeff","year":"2014","unstructured":"Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2014. Long-term recurrent convolutional networks for visual recognition and description. CoRR abs\/1411.4389 (2014). arXiv:1411.4389. http:\/\/arxiv.org\/abs\/1411.4389.","journal-title":"CoRR"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00012"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2017.7858394"},{"key":"e_1_3_1_12_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2019.00009"},{"key":"e_1_3_1_13_2","article-title":"ESE: Efficient speech recognition engine with compressed LSTM on FPGA","volume":"1612","author":"Han Song","year":"2016","unstructured":"Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William J. Dally. 2016. ESE: Efficient speech recognition engine with compressed LSTM on FPGA. CoRR abs\/1612.00694 (2016). arXiv:1612.00694. http:\/\/arxiv.org\/abs\/1612.00694.","journal-title":"CoRR"},{"key":"e_1_3_1_14_2","unstructured":"Song Han Huizi Mao and William J. Dally. 2016. Deep compression: Compressing deep neural networks with pruning trained quantization and Huffman coding. 
arXiv:cs.CV\/1510.00149."},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.2307\/2346830"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1997.9.8.1735"},{"key":"e_1_3_1_17_2","first-page":"33","volume-title":"Proceedings of the 7th International Conference on Performance, Safety and Robustness in Complex Systems and Applications","author":"Hoffmann Javier","year":"2017","unstructured":"Javier Hoffmann, Osvaldo Navarro Guzm\u00e1n, Florian K\u00e4stner, Benedikt Jan\u00dfen, and Michael H\u00fcbner. 2017. A survey on CNN and RNN implementations. In Proceedings of the 7th International Conference on Performance, Safety and Robustness in Complex Systems and Applications. CYBER, 33\u201339."},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1145\/3302424.3303949"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080246"},{"key":"e_1_3_1_20_2","unstructured":"Jaeyoung Kim Mostafa El-Khamy and Jungwon Lee. 2017. Residual LSTM: Design of a deep recurrent architecture for distant speech recognition. (2017). arXiv:cs.LG\/1701.03360."},{"key":"e_1_3_1_21_2","article-title":"The emergence of number and syntax units in LSTM language models","volume":"1903","author":"Lakretz Yair","year":"2019","unstructured":"Yair Lakretz, Germ\u00e1n Kruszewski, Theo Desbordes, Dieuwke Hupkes, Stanislas Dehaene, and Marco Baroni. 2019. The emergence of number and syntax units in LSTM language models. CoRR abs\/1903.07435 (2019). arXiv:1903.07435. http:\/\/arxiv.org\/abs\/1903.07435.","journal-title":"CoRR"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2019.8683660"},{"key":"e_1_3_1_23_2","volume-title":"IEEE\/ACM ICCAD","author":"Li S.","year":"2011","unstructured":"S. Li, K. Chen, J. H. Ahn, J. B. Brockman, and N. P. Jouppi. 2011. CACTI-P: Architecture-level modeling for SRAM-based structures with advanced leakage reduction techniques. 
In IEEE\/ACM ICCAD."},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2015.50"},{"key":"e_1_3_1_25_2","article-title":"EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding","volume":"1507","author":"Miao Yajie","year":"2015","unstructured":"Yajie Miao, Mohammad Gowayyed, and Florian Metze. 2015. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. CoRR abs\/1507.08240 (2015). arXiv:1507.08240. http:\/\/arxiv.org\/abs\/1507.08240.","journal-title":"CoRR"},{"key":"e_1_3_1_26_2","doi-asserted-by":"crossref","unstructured":"Tomas Mikolov Martin Karafi\u00e1t Lukas Burget Jan Cernock\u00fd and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association . 1045\u20131048.","DOI":"10.21437\/Interspeech.2010-343"},{"key":"e_1_3_1_27_2","unstructured":"S. Narang and G. Diamo. 2017. Baidu DeepBench. (2017). https:\/\/github.com\/baidu-research\/DeepBench."},{"key":"e_1_3_1_28_2","doi-asserted-by":"crossref","unstructured":"Joe Yue-Hei Ng Matthew Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals Rajat Monga and George Toderici. 2015. Beyond Short Snippets: Deep Networks for Video Classification. (2015). arXiv:cs.CV\/1503.08909.","DOI":"10.1109\/CVPR.2015.7299101"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.21437\/interspeech.2019-2680"},{"key":"e_1_3_1_30_2","article-title":"Recent advances in recurrent neural networks","volume":"1801","author":"Salehinejad Hojjat","year":"2018","unstructured":"Hojjat Salehinejad, Julianne Baarbe, Sharan Sankar, Joseph Barfett, Errol Colak, and Shahrokh Valaee. 2018. Recent advances in recurrent neural networks. CoRR abs\/1801.01078 (2018). arXiv:1801.01078. 
http:\/\/arxiv.org\/abs\/1801.01078.","journal-title":"CoRR"},{"key":"e_1_3_1_31_2","article-title":"Bidirectional attention flow for machine comprehension","volume":"1611","author":"Seo Min Joon","year":"2016","unstructured":"Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bidirectional attention flow for machine comprehension. CoRR abs\/1611.01603 (2016). arXiv:1611.01603. http:\/\/arxiv.org\/abs\/1611.01603.","journal-title":"CoRR"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3243176.3243184"},{"key":"e_1_3_1_33_2","article-title":"Sequence to sequence learning with neural networks","volume":"1409","author":"Sutskever Ilya","year":"2014","unstructured":"Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR abs\/1409.3215 (2014). arXiv:1409.3215. http:\/\/arxiv.org\/abs\/1409.3215.","journal-title":"CoRR"},{"key":"e_1_3_1_34_2","unstructured":"Synopsys. 2010. https:\/\/www.synopsys.com\/."},{"key":"e_1_3_1_35_2","unstructured":"Synopsys. 2021. Synopsys DesignWare Library. https:\/\/www.synopsys.com\/dw\/buildingblock.php."},{"key":"e_1_3_1_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/PACT.2017.11"},{"key":"e_1_3_1_37_2","volume-title":"Calculating Memory System Power for DDR3, Micron Technology","year":"2007","unstructured":"TN-41-01. 2007. Calculating Memory System Power for DDR3, Micron Technology. Technical Report."},{"key":"e_1_3_1_38_2","article-title":"Sequence to sequence\u2014video to text","volume":"1505","author":"Venugopalan Subhashini","year":"2015","unstructured":"Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond J. Mooney, Trevor Darrell, and Kate Saenko. 2015. Sequence to sequence\u2014video to text. CoRR abs\/1505.00487 (2015). arXiv:1505.00487. 
http:\/\/arxiv.org\/abs\/1505.00487.","journal-title":"CoRR"},{"key":"e_1_3_1_39_2","article-title":"Show and tell: A neural image caption generator","volume":"1411","author":"Vinyals Oriol","year":"2014","unstructured":"Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and tell: A neural image caption generator. CoRR abs\/1411.4555 (2014). arXiv:1411.4555. http:\/\/arxiv.org\/abs\/1411.4555.","journal-title":"CoRR"},{"key":"e_1_3_1_40_2","volume-title":"Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing","author":"Wang Shuohang","year":"2018","unstructured":"Shuohang Wang and Jing Jiang. 2018. An LSTM model for cloze-style machine comprehension. In Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing."},{"key":"e_1_3_1_41_2","article-title":"C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs","volume":"1803","author":"Wang Shuo","year":"2018","unstructured":"Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Yanzhi Wang, Qinru Qiu, and Yun Liang. 2018. C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs. CoRR abs\/1803.06305 (2018). arXiv:1803.06305. http:\/\/arxiv.org\/abs\/1803.06305.","journal-title":"CoRR"},{"key":"e_1_3_1_42_2","doi-asserted-by":"publisher","DOI":"10.5555\/2789272.2789289"},{"key":"e_1_3_1_43_2","article-title":"Google\u2019s neural machine translation system: Bridging the gap between human and machine translation","volume":"1609","author":"Wu Yonghui","year":"2016","unstructured":"Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. 
Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google\u2019s neural machine translation system: Bridging the gap between human and machine translation. CoRR abs\/1609.08144 (2016). arXiv:1609.08144. http:\/\/arxiv.org\/abs\/1609.08144.","journal-title":"CoRR"},{"key":"e_1_3_1_44_2","unstructured":"Scott Yokim. 2018. Tensor Ops Made Easier in cuDNN. https:\/\/devblogs.nvidia.com\/tensor-ops-made-easier-in-cudnn\/."},{"key":"e_1_3_1_45_2","article-title":"Recurrent neural network regularization","volume":"1409","author":"Zaremba Wojciech","year":"2014","unstructured":"Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. CoRR abs\/1409.2329 (2014). arXiv:1409.2329.","journal-title":"CoRR"},{"key":"e_1_3_1_46_2","first-page":"951","volume-title":"2018 USENIX Annual Technical Conference","author":"Zhang Minjia","year":"2018","unstructured":"Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, and Yuxiong He. 2018. DeepCPU: Serving RNN-based deep learning models 10\u00d7 faster. In 2018 USENIX Annual Technical Conference. 951\u2013965."},{"key":"e_1_3_1_47_2","article-title":"Learning to ask unanswerable questions for machine reading comprehension","volume":"1906","author":"Zhu Haichao","year":"2019","unstructured":"Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, and Ting Liu. 2019. Learning to ask unanswerable questions for machine reading comprehension. 
CoRR abs\/1906.06045 (2019).","journal-title":"CoRR"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3552513","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3552513","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T17:45:12Z","timestamp":1750268712000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3552513"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2023,1,24]]},"references-count":46,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2023,3,31]]}},"alternative-id":["10.1145\/3552513"],"URL":"https:\/\/doi.org\/10.1145\/3552513","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2023,1,24]]},"assertion":[{"value":"2021-10-11","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-07-18","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2023-01-24","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}