{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2026,5,2]],"date-time":"2026-05-02T15:12:48Z","timestamp":1777734768444,"version":"3.51.4"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2022,5,28]],"date-time":"2022-05-28T00:00:00Z","timestamp":1653696000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2022,5,31]]},"abstract":"<jats:p>\n            Large Vocabulary Continuous Speech Recognition systems require Viterbi searching through a large state space to find the most probable sequence of phonemes that led to a given sound sample. This needs storing and updating of a large\n            <jats:bold>Active State List (ASL)<\/jats:bold>\n            in the\n            <jats:bold>on-chip memory (OCM)<\/jats:bold>\n            at regular intervals (called frames), which poses a major performance bottleneck for speech decoding. Most works use hash tables for OCM storage while\n            <jats:italic>beam-width pruning<\/jats:italic>\n            to restrict the ASL size. To achieve a decent accuracy and performance, a large OCM, numerous acoustic probability computations, and DRAM accesses are incurred.\n          <\/jats:p>\n          <jats:p>\n            We propose to use a binary search tree for ASL storage and a max heap data structure to track the worst cost state and efficiently replace it when a better state is found. With this approach, the ASL size can be reduced from over 32K to 512 with minimal impact on recognition accuracy for a 7,000-word vocabulary model. 
This, combined with a caching technique for acoustic scores, reduced the DRAM data accessed by 31\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and the acoustic probability computations by 26\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            .\n          <\/jats:p>\n          <jats:p>The approach has also been implemented in hardware on a Xilinx Zynq FPGA at 200 MHz using the Vivado SDS compiler. We study the tradeoffs among the amount of OCM used, word error rate, and decoding speed to show the effectiveness of the approach. The resulting implementation is capable of running faster than real time with 91% fewer block-RAMs.<\/jats:p>","DOI":"10.1145\/3510028","type":"journal-article","created":{"date-parts":[[2022,1,26]],"date-time":"2022-01-26T18:01:53Z","timestamp":1643220113000},"page":"1-18","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":4,"title":["Reduced Memory Viterbi Decoding for Hardware-accelerated Speech Recognition"],"prefix":"10.1145","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0001-8529-7920","authenticated-orcid":false,"given":"Pani Prithvi","family":"Raj","sequence":"first","affiliation":[{"name":"Indian Institute of Technology Madras, Madras, India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0570-4993","authenticated-orcid":false,"given":"Pakala Akhil","family":"Reddy","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Madras, Terraces on Brompton, Houston, TX, 
India"}],"role":[{"role":"author","vocabulary":"crossref"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-9258-7317","authenticated-orcid":false,"given":"Nitin","family":"Chandrachoodan","sequence":"additional","affiliation":[{"name":"Indian Institute of Technology Madras, IIT Madras, India"}],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"320","published-online":{"date-parts":[[2022,5,28]]},"reference":[{"key":"e_1_3_2_2_2","article-title":"Frustratingly easy noise-aware training of acoustic models","author":"Raj Desh","year":"2021","unstructured":"Desh Raj, Jesus Villalba, Daniel Povey, and Sanjeev Khudanpur. 2021. Frustratingly easy noise-aware training of acoustic models. arXiv:2011.02090. Retrieved from https:\/\/arxiv.org\/abs\/2011.02090.","journal-title":"arXiv:2011.02090"},{"key":"e_1_3_2_3_2","doi-asserted-by":"publisher","DOI":"10.1145\/3123939.3124542"},{"key":"e_1_3_2_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2017.2752838"},{"key":"e_1_3_2_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TC.2019.2937075"},{"key":"e_1_3_2_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3425604"},{"key":"e_1_3_2_7_2","doi-asserted-by":"publisher","DOI":"10.1109\/JSSC.2014.2367818"},{"key":"e_1_3_2_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001163"},{"key":"e_1_3_2_9_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2016-287"},{"key":"e_1_3_2_10_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-981-15-1275-9_32"},{"key":"e_1_3_2_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"e_1_3_2_12_2","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2013-48"},{"key":"e_1_3_2_13_2","doi-asserted-by":"publisher","DOI":"10.1006\/csla.2001.0184"},{"key":"e_1_3_2_14_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-540-49127-9_28"},{"key":"e_1_3_2_15_2","volume-title":"Proceedings of the 7th European Conference on Speech Communication and Technology","author":"Willett 
Daniel","year":"2001","unstructured":"Daniel Willett et\u00a0al. 2001. Time and memory efficient viterbi decoding for LVCSR using a precompiled search network. In Proceedings of the 7th European Conference on Speech Communication and Technology. https:\/\/www.isca-speech.org\/archive_v0\/archive_papers\/eurospeech_2001\/e01_0847.pdf."},{"key":"e_1_3_2_16_2","doi-asserted-by":"publisher","DOI":"10.1109\/TIT.1967.1054010"},{"key":"e_1_3_2_17_2","volume-title":"Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding","author":"Povey Daniel","year":"2011","unstructured":"Daniel Povey et\u00a0al. 2011. The kaldi speech recognition toolkit. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. https:\/\/www.danielpovey.com\/files\/2011_asru_kaldi.pdf."},{"key":"e_1_3_2_18_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11042-021-10800-8"},{"key":"e_1_3_2_19_2","volume-title":"Introduction to Algorithms (2nd ed.)","author":"Cormen Thomas H.","year":"2001","unstructured":"Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2001. Introduction to Algorithms (2nd ed.). The MIT Press. https:\/\/doc.lagout.org\/science\/0_Computer%20Science\/2_Algorithms\/Introduction%20to%20Algorithms%2C%202nd%20Edition.pdf."},{"key":"e_1_3_2_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00071"},{"key":"e_1_3_2_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2012.6288972"},{"key":"e_1_3_2_22_2","unstructured":"Xilinx Inc.2019. UG902 Vivado Design Suite User Guide\u2014High-Level Synthesis. https:\/\/docs.xilinx.com\/v\/u\/en-US\/ug902-vivado-high-level-synthesis."},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ELMAR.2016.7731785"},{"key":"e_1_3_2_24_2","unstructured":"Xilinx Inc.2019. UG1027 SDSoC Environment User Guide. 
https:\/\/www.xilinx.com\/support\/documents\/sw_manuals\/xilinx2019_1\/ug1027-sdsoc-user-guide.pdf."},{"key":"e_1_3_2_25_2","unstructured":"Xilinx Inc.2019. ZCU102 Evaluation Board User Guide (UG1182). Retrieved on July 29 2021 from https:\/\/www.xilinx.com\/support\/documentation\/boards_and_kits\/zcu102\/ug1182-zcu102-eval-bd.pdf."},{"key":"e_1_3_2_26_2","unstructured":"2006. Librivox\u2014 Solomon Mines Audio Book. Retrieved on October 7 2021 from https:\/\/librivox.org\/king-solomons-mines-by-haggard\/."},{"key":"e_1_3_2_27_2","unstructured":"Micron Technology Inc.2016. DDR4 SDRAM SODIMM Features (MTA4ATF51264HZ\u20132G6E1). Retrieved on July 29 2021 from https:\/\/media-www.micron.com\/-\/media\/client\/global\/documents\/products\/data-sheet\/modules\/sodimm\/ddr4\/atf4c512x64hz.pdf?rev=e4f0743341814159bc75d9f2511f4dfd."},{"key":"e_1_3_2_28_2","unstructured":"Micron Technology Inc. DDR4 Power Calculator. Retrieved on June 28 2021 from https:\/\/media-www.micron.com\/-\/media\/client\/global\/documents\/products\/power-calculator\/ddr4_power_calc.xlsm?la=en&rev=5e97be39078d4a1b8619cb85c96bbe63."},{"key":"e_1_3_2_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/CICC.2012.6330678"},{"key":"e_1_3_2_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2010.5495099"},{"key":"e_1_3_2_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ASPDAC.2013.6509561"},{"key":"e_1_3_2_32_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2010.5495538"},{"key":"e_1_3_2_33_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11265-011-0587-9"},{"key":"e_1_3_2_34_2","volume-title":"Energy-Scalable Speech Recognition Circuits","author":"Price Michael","year":"2016","unstructured":"Michael Price. 2016. Energy-Scalable Speech Recognition Circuits. Ph.D. Dissertation. Massachusetts Institute of Technology. 
https:\/\/dspace.mit.edu\/handle\/1721.1\/106090."},{"key":"e_1_3_2_35_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCSI.2010.2041501"},{"key":"e_1_3_2_36_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.2006.1659988"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510028","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3510028","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:12:24Z","timestamp":1750191144000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3510028"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,5,28]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2022,5,31]]}},"alternative-id":["10.1145\/3510028"],"URL":"https:\/\/doi.org\/10.1145\/3510028","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"value":"1539-9087","type":"print"},{"value":"1558-3465","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,5,28]]},"assertion":[{"value":"2021-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-05-28","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}