{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,12,4]],"date-time":"2025-12-04T18:39:53Z","timestamp":1764873593050,"version":"3.41.0"},"reference-count":22,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2018,9,4]],"date-time":"2018-09-04T00:00:00Z","timestamp":1536019200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"funder":[{"DOI":"10.13039\/501100001602","name":"Science Foundation Ireland","doi-asserted-by":"crossref","award":["12\/IA\/1381"],"award-info":[{"award-number":["12\/IA\/1381"]}],"id":[{"id":"10.13039\/501100001602","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2018,9,30]]},"abstract":"<jats:p>Convolutional neural networks (CNNs) are one of the most successful machine-learning techniques for image, voice, and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators have been proposed for CNNs that typically contain large numbers of multiply-accumulate (MAC) units, the multipliers of which are large in integrated circuit (IC) gate count and power consumption. \u201cWeight-sharing\u201d accelerators have been proposed where the full range of weight values in a trained CNN are compressed and put into bins, and the bin index is used to access the weight-shared value. We reduce power and area of the CNN by implementing parallel accumulate shared MAC (PASM) in a weight-shared CNN. PASM re-architects the MAC to instead count the frequency of each weight and place it in a bin. The accumulated value is computed in a subsequent multiply phase, significantly reducing gate count and power consumption of the CNN. In this article, we implement PASM in a weight-shared CNN convolution hardware accelerator and analyze its effectiveness. Experiments show that for a clock speed 1GHz implemented on a 45nm ASIC process our approach results in fewer gates, smaller logic, and reduced power with only a slight increase in latency. We also show that the same weight-shared-with-PASM CNN accelerator can be implemented in resource-constrained FPGAs, where the FPGA has limited numbers of digital signal processor (DSP) units to accelerate the MAC operations.<\/jats:p>","DOI":"10.1145\/3233300","type":"journal-article","created":{"date-parts":[[2018,9,4]],"date-time":"2018-09-04T12:37:30Z","timestamp":1536064650000},"page":"1-24","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":29,"title":["Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing"],"prefix":"10.1145","volume":"15","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8688-9407","authenticated-orcid":false,"given":"James","family":"Garland","sequence":"first","affiliation":[{"name":"Trinity College Dublin and Trinity College Dublin, Ireland"}]},{"given":"David","family":"Gregg","sequence":"additional","affiliation":[{"name":"Trinity College Dublin and Trinity College Dublin, Ireland"}]}],"member":"320","published-online":{"date-parts":[[2018,9,4]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1145\/2541940.2541967"},{"key":"e_1_2_1_2_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.40"},{"volume-title":"4th International Conference on Learning Representations (ICLR'16)","year":"2016","author":"Dettmers Tim","key":"e_1_2_1_3_1"},{"volume-title":"Proceedings of 2010 IEEE International Symposium on Circuits and Systems. 257--260","author":"Farabet C.","key":"e_1_2_1_4_1"},{"volume-title":"WP486 (v1.0.1)","year":"2016","author":"Fu Yao","key":"e_1_2_1_5_1"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1145\/1250790.1250800"},{"volume-title":"Constraining Designs for Synthesis and Timing Analysis: A Practical Guide to Synopsys Design Constraints (SDC)","author":"Gangadharan Sridhar","key":"e_1_2_1_7_1","doi-asserted-by":"crossref","DOI":"10.1007\/978-1-4614-3269-2"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1109\/LCA.2017.2656880"},{"volume-title":"Proceedings of the 32nd International Conference on Machine Learning. 1737--1746","year":"2015","author":"Gupta Suyog","key":"e_1_2_1_9_1"},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2016.30"},{"volume-title":"International Conference on Learning Representations (ICLR'16)","author":"Han Song","key":"e_1_2_1_11_1"},{"volume-title":"Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR\u201916)","author":"He K.","key":"e_1_2_1_12_1"},{"key":"e_1_2_1_13_1","doi-asserted-by":"publisher","DOI":"10.1109\/MSP.2012.2205597"},{"volume-title":"Proceedings of the 25th International Conference on Neural Information Processing Systems\u2014Volume 1 (NIPS\u201912)","author":"Krizhevsky Alex","key":"e_1_2_1_14_1"},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.1162\/neco.1989.1.4.541"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726791"},{"key":"e_1_2_1_17_1","doi-asserted-by":"publisher","DOI":"10.1145\/3020078.3021736"},{"volume-title":"Proceedings of the 2015 2nd International Conference on Electronics and Communication Systems (ICECS\u201915)","author":"Sabeetha S.","key":"e_1_2_1_18_1"},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.21437\/Interspeech.2014-274"},{"volume-title":"Very deep convolutional networks for large-scale image recognition. CoRR abs\/1409.1556","year":"2014","author":"Simonyan Karen","key":"e_1_2_1_20_1"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"e_1_2_1_22_1","doi-asserted-by":"publisher","DOI":"10.1145\/2684746.2689060"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3233300","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3233300","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T02:13:12Z","timestamp":1750212792000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3233300"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2018,9,4]]},"references-count":22,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2018,9,30]]}},"alternative-id":["10.1145\/3233300"],"URL":"https:\/\/doi.org\/10.1145\/3233300","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"type":"print","value":"1544-3566"},{"type":"electronic","value":"1544-3973"}],"subject":[],"published":{"date-parts":[[2018,9,4]]},"assertion":[{"value":"2018-01-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-06-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2018-09-04","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}