{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T11:05:31Z","timestamp":1740135931618,"version":"3.37.3"},"reference-count":3,"publisher":"Wiley","license":[{"start":{"date-parts":[[2014,2,24]],"date-time":"2014-02-24T00:00:00Z","timestamp":1393200000000},"content-version":"unspecified","delay-in-days":0,"URL":"http:\/\/creativecommons.org\/licenses\/by\/3.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["VLSI Design"],"published-print":{"date-parts":[[2014,2,24]]},"abstract":"<jats:p>Many applications ranging from machine learning, image processing, and machine vision to optimization utilize matrix multiplication as a fundamental block. Matrix operations play an important role in determining the performance of such applications. This paper proposes a novel efficient, highly scalable hardware accelerator that is of equivalent performance to a 2\u2009GHz quad core PC but can be used in low-power applications targeting embedded systems requiring high performance computation. Power, performance, and resource consumption are demonstrated on a fully-functional prototype. The proposed hardware accelerator is 36\u00d7 more energy efficient per unit of computation compared to state-of-the-art Xeon processor of equal vintage and is 14\u00d7 more efficient as a stand-alone platform with equivalent performance. An important comparison between simulated system estimates and real system performance is carried out.<\/jats:p>","DOI":"10.1155\/2014\/712085","type":"journal-article","created":{"date-parts":[[2014,2,24]],"date-time":"2014-02-24T21:03:14Z","timestamp":1393275794000},"page":"1-11","source":"Crossref","is-referenced-by-count":3,"title":["A Low-Power Scalable Stream Compute Accelerator for General Matrix Multiply (GEMM)"],"prefix":"10.1155","volume":"2014","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-7589-477X","authenticated-orcid":true,"given":"Antony","family":"Savich","sequence":"first","affiliation":[{"name":"School of Engineering, University of Guelph, Guelph, ON, Canada N1G 2W1"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4832-0911","authenticated-orcid":true,"given":"Shawki","family":"Areibi","sequence":"additional","affiliation":[{"name":"School of Engineering, University of Guelph, Guelph, ON, Canada N1G 2W1"}]}],"member":"311","reference":[{"key":"3","doi-asserted-by":"publisher","DOI":"10.1049\/ip-cds:20040838"},{"key":"7","doi-asserted-by":"publisher","DOI":"10.5120\/3084-4222"},{"key":"10","doi-asserted-by":"publisher","DOI":"10.1007\/s10766-010-0131-8"}],"container-title":["VLSI Design"],"original-title":[],"language":"en","link":[{"URL":"http:\/\/downloads.hindawi.com\/archive\/2014\/712085.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/archive\/2014\/712085.xml","content-type":"application\/xml","content-version":"vor","intended-application":"text-mining"},{"URL":"http:\/\/downloads.hindawi.com\/archive\/2014\/712085.pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2020,12,9]],"date-time":"2020-12-09T19:41:50Z","timestamp":1607542910000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.hindawi.com\/journals\/vlsi\/2014\/712085\/"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,2,24]]},"references-count":3,"alternative-id":["712085","712085"],"URL":"https:\/\/doi.org\/10.1155\/2014\/712085","relation":{},"ISSN":["1065-514X","1563-5171"],"issn-type":[{"type":"print","value":"1065-514X"},{"type":"electronic","value":"1563-5171"}],"subject":[],"published":{"date-parts":[[2014,2,24]]}}}