{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T04:28:16Z","timestamp":1750307296444,"version":"3.41.0"},"reference-count":38,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2011,5,1]],"date-time":"2011-05-01T00:00:00Z","timestamp":1304208000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/100000001","name":"National Science Foundation","doi-asserted-by":"publisher","award":["CNS DDDAS-SMRP: 0540407"],"award-info":[{"award-number":["CNS DDDAS-SMRP: 0540407"]}],"id":[{"id":"10.13039\/100000001","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/100000144","name":"Division of Computer and Network Systems","doi-asserted-by":"publisher","award":["CNS DDDAS-SMRP: 0540407"],"award-info":[{"award-number":["CNS DDDAS-SMRP: 0540407"]}],"id":[{"id":"10.13039\/100000144","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Reconfigurable Technol. Syst."],"published-print":{"date-parts":[[2011,5]]},"abstract":"<jats:p>Domain-specific optimizations on matrix computations exploiting specific arithmetic and matrix representation formats have achieved significant performance\/area gains in Field-Programmable Gate Array (FPGA) hardware designs. In this article, we explore the application of data-driven optimizations to reduce both storage and computation requirements to the problem of signal recognition from a known dictionary. By starting with a high-level mathematical representation of a signal recognition problem, we perform optimizations across the layers of the system, exploiting mathematical structure to improve implementation efficiency. Specifically, we use Walsh wavelet packets in conjunction with a BestBasis algorithm to distinguish between spoken digits. The resulting transform matrices are quite sparse, and exhibit a rich algebraic structure that contains significant overlap across rows. As a consequence, dot-product computations of the transform matrix and signal vectors exhibit significant computation reuse, or repeated identical computations. We present an algorithm for identifying this computation reuse and scheduling of the row computations. We exploit this reuse to derive FPGA hardware implementations that reduce the amount of computation for an individual matrix by as much as 6.35\u00d7 and an average of 2\u00d7 for a single dot-product unit. The implementation that exploits reuse achieves a 2\u00d7 computation reduction compared to three concurrently-executing simpler accumulator units with the same aggregate design area and outperforms software implementations on high-end desktop personal computers.<\/jats:p>","DOI":"10.1145\/1968502.1968508","type":"journal-article","created":{"date-parts":[[2011,6,6]],"date-time":"2011-06-06T11:51:38Z","timestamp":1307361098000},"page":"1-26","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Domain-Specific Optimization of Signal Recognition Targeting FPGAs"],"prefix":"10.1145","volume":"4","author":[{"given":"Melina","family":"Demertzi","sequence":"first","affiliation":[{"name":"University of Southern California"}]},{"given":"Pedro C.","family":"Diniz","sequence":"additional","affiliation":[{"name":"INESC-ID, Lisboa"}]},{"given":"Mary W.","family":"Hall","sequence":"additional","affiliation":[{"name":"University of Utah"}]},{"given":"Anna C.","family":"Gilbert","sequence":"additional","affiliation":[{"name":"University of Michigan"}]},{"given":"Yi","family":"Wang","sequence":"additional","affiliation":[{"name":"University of Michigan"}]}],"member":"320","published-online":{"date-parts":[[2011,5]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Compilers: Principles, Techniques and Toos","author":"Aho A.","year":"2006","unstructured":"Aho , A. , Lam , M. , Sethi , R. , and Ullman , J . 2006 . Compilers: Principles, Techniques and Toos 2 nd Ed. Addison-Wesley , Reading, MA . Aho, A., Lam, M., Sethi, R., and Ullman, J. 2006. Compilers: Principles, Techniques and Toos 2nd Ed. Addison-Wesley, Reading, MA.","edition":"2"},{"volume-title":"2000. Templates for the Soution of Algebraic Eigenvalue Problems: A Practical Guide","author":"Bai Z.","key":"e_1_2_1_2_1","unstructured":"Bai , Z. , Demmel , J. , Dongarra , J. , Ruhe , A. , and van der Vorst , H. , Eds. 2000. Templates for the Soution of Algebraic Eigenvalue Problems: A Practical Guide . Society for Industrial and Applied Mathematics , Philadelphia, PA . Bai, Z., Demmel, J., Dongarra, J., Ruhe, A., and van der Vorst, H., Eds. 2000. Templates for the Soution of Algebraic Eigenvalue Problems: A Practical Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA."},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1145\/1117201.1117204"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.5555\/1058426.1058883"},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1145\/1462586.1462593"},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201906)","author":"Callanan O.","key":"e_1_2_1_6_1","unstructured":"Callanan , O. , Gregg , D. , Nisbet , A. , and Peardon , M . 2006. High performance scientific computing using FPGAs with IEEE floating point and logarithmic arithmetic for lattice QCD . In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201906) . 1--6. Callanan, O., Gregg, D., Nisbet, A., and Peardon, M. 2006. High performance scientific computing using FPGAs with IEEE floating point and logarithmic arithmetic for lattice QCD. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201906). 1--6."},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1007\/s003659910004"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/18.119732"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1145\/785411.785415"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/FCCM.2007.34"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046203"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.2307\/2374796"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1073\/pnas.0437847100"},{"key":"e_1_2_1_14_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046204"},{"key":"e_1_2_1_15_1","unstructured":"Ellis D. 2003. Recoded digits archive at Columbia University. Ellis D. 2003. Recoded digits archive at Columbia University."},{"volume-title":"Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201905)","author":"Fahmy S.","key":"e_1_2_1_16_1","unstructured":"Fahmy , S. , Cheung , P. , and Luk , W . 2005. Novel FPGA-based implementation of median and weighted median filters for image processing . In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201905) . 142--147. Fahmy, S., Cheung, P., and Luk, W. 2005. Novel FPGA-based implementation of median and weighted median filters for image processing. In Proceedings of the International Conference on Field Programmable Logic and Applications (FPL\u201905). 142--147."},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/301618.301661"},{"key":"e_1_2_1_18_1","doi-asserted-by":"publisher","DOI":"10.1109\/78.258082"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.1109\/RECONFIG.2005.10"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1145\/951710.951739"},{"volume-title":"Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM\u201902)","author":"Melnikoff S.","key":"e_1_2_1_21_1","unstructured":"Melnikoff , S. , Quigley , S. , and Russell , M . 2002. Implementing a simple continuous speech recognition system on an FPGA . In Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM\u201902) . Melnikoff, S., Quigley, S., and Russell, M. 2002. Implementing a simple continuous speech recognition system on an FPGA. In Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM\u201902)."},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/1391469.1391572"},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL\u201903)","volume":"2778","author":"Ortigosa E.","unstructured":"Ortigosa , E. , Ortigosa , P. , Canas , A. , Ros , E. , Agis , R. , and Ortega , J . 2003. FPGA implementation of multi-layer perceptrons for speech recognition . In Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL\u201903) . Lecture Notes in Computer Science , vol. 2778 . Springer. Ortigosa, E., Ortigosa, P., Canas, A., Ros, E., Agis, R., and Ortega, J. 2003. FPGA implementation of multi-layer perceptrons for speech recognition. In Proceedings of the 13th International Conference on Field Programmable Logic and Applications (FPL\u201903). Lecture Notes in Computer Science, vol. 2778. Springer."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1145\/378239.378564"},{"volume-title":"Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers. 40--44","author":"Pati Y. C.","key":"e_1_2_1_25_1","unstructured":"Pati , Y. C. , Rezaiifar , R. , and Krishnaprasad , P. S . 1993. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition . In Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers. 40--44 . Pati, Y. C., Rezaiifar, R., and Krishnaprasad, P. S. 1993. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers. 40--44."},{"key":"e_1_2_1_26_1","unstructured":"Proakis J. G. and Manolakis D. K. 1996. Digital Signal Processing: Principles Algorithms and Applications 3rd Ed. Prentice-Hall NJ. Proakis J. G. and Manolakis D. K. 1996. Digital Signal Processing: Principles Algorithms and Applications 3rd Ed. Prentice-Hall NJ."},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1109\/JPROC.2004.840306"},{"key":"e_1_2_1_28_1","doi-asserted-by":"crossref","unstructured":"Saito N. and Coifman R. 1994. Local discriminant bases. In SPIE Math. Imag. Wavelet Appl. Signal Image Process. (SPIE). vol. 2303. Saito N. and Coifman R. 1994. Local discriminant bases. In SPIE Math. Imag. Wavelet Appl. Signal Image Process. (SPIE) . vol. 2303.","DOI":"10.1117\/12.188763"},{"volume-title":"Proceedings of Supercomputing (SC\u201992)","author":"Temam O.","key":"e_1_2_1_29_1","unstructured":"Temam , O. and Jalby , W . 1992. Characterizing the behavior of sparse algorithms on caches . In Proceedings of Supercomputing (SC\u201992) . IEEE Computer Society Press, 578--587. Temam, O. and Jalby, W. 1992. Characterizing the behavior of sparse algorithms on caches. In Proceedings of Supercomputing (SC\u201992). IEEE Computer Society Press, 578--587."},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.1006\/acha.1996.0009"},{"volume-title":"Greed is good: Algorithmic results for sparse approximation. Tech. rep. 2003--04","author":"Tropp J. A.","key":"e_1_2_1_31_1","unstructured":"Tropp , J. A. 2003. Greed is good: Algorithmic results for sparse approximation. Tech. rep. 2003--04 , Texas Institute for Computational and Applied Mathematics . Tropp, J. A. 2003. Greed is good: Algorithmic results for sparse approximation. Tech. rep. 2003--04, Texas Institute for Computational and Applied Mathematics."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.1145\/968280.968305"},{"key":"e_1_2_1_33_1","doi-asserted-by":"publisher","DOI":"10.1007\/11557654_91"},{"key":"e_1_2_1_34_1","doi-asserted-by":"crossref","unstructured":"Woods R. McAllister J. and Lightbody G. 2008. FPGA-Based Implementation of Signal Processing Systems. John Wiley &amp; Sons. Woods R. McAllister J. and Lightbody G. 2008. FPGA-Based Implementation of Signal Processing Systems . John Wiley &amp; Sons.","DOI":"10.1002\/9780470713785"},{"key":"e_1_2_1_35_1","unstructured":"Xilinx Corp. 2007. Virtex-II Pro&lt;sup&gt;TM&lt;\/sup&gt; platform FPGAs: Complete data sheet. Xilinx Corp. 2007. Virtex-II Pro&lt;sup&gt;TM&lt;\/sup&gt; platform FPGAs: Complete data sheet."},{"key":"e_1_2_1_36_1","unstructured":"Xilinx Corp. 2008. Virtex-3&lt;sup&gt;TM&lt;\/sup&gt; platform FPGAs: Complete data sheet. Xilinx Corp. 2008. Virtex-3&lt;sup&gt;TM&lt;\/sup&gt; platform FPGAs: Complete data sheet."},{"key":"e_1_2_1_37_1","unstructured":"Xilinx Corp. 2009. Virtex-5&lt;sup&gt;TM&lt;\/sup&gt; platform FPGAs: Complete data sheet. Xilinx Corp. 2009. Virtex-5&lt;sup&gt;TM&lt;\/sup&gt; platform FPGAs: Complete data sheet."},{"key":"e_1_2_1_38_1","doi-asserted-by":"publisher","DOI":"10.1145\/1046192.1046202"}],"container-title":["ACM Transactions on Reconfigurable Technology and Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1968502.1968508","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/1968502.1968508","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T10:59:48Z","timestamp":1750244388000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/1968502.1968508"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2011,5]]},"references-count":38,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2011,5]]}},"alternative-id":["10.1145\/1968502.1968508"],"URL":"https:\/\/doi.org\/10.1145\/1968502.1968508","relation":{},"ISSN":["1936-7406","1936-7414"],"issn-type":[{"type":"print","value":"1936-7406"},{"type":"electronic","value":"1936-7414"}],"subject":[],"published":{"date-parts":[[2011,5]]},"assertion":[{"value":"2009-02-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2009-12-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-05-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}