{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T04:21:32Z","timestamp":1750220492666,"version":"3.41.0"},"reference-count":63,"publisher":"Association for Computing Machinery (ACM)","issue":"5","license":[{"start":{"date-parts":[[2022,6,6]],"date-time":"2022-06-06T00:00:00Z","timestamp":1654473600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2022,9,30]]},"abstract":"<jats:p>\n            Mobile and edge devices become common platforms for inferring\n            <jats:bold>convolutional neural networks (CNNs)<\/jats:bold>\n            due to superior privacy and service quality. To reduce the computational costs of\n            <jats:bold>convolution (CONV)<\/jats:bold>\n            , recent CNN models adopt\n            <jats:bold>depth-wise CONV (DW-CONV)<\/jats:bold>\n            and\n            <jats:bold>Squeeze-and-Excitation (SE)<\/jats:bold>\n            . However, existing area-efficient CNN accelerators are sub-optimal for these latest CNN models because they were mainly optimized for compute-intensive standard CONV layers with abundant data reuse that can be pipelined with activation and normalization operations. In contrast, DW-CONV and SE are memory-intensive with limited data reuse. The latter also strongly depends on the nearby CONV layers, making an effective pipelining a daunting task. 
Therefore, although DW-CONV and SE account for only 10% of total operations, they become memory-bandwidth bound, spending more than 60% of the processing time in systolic-array-based accelerators.\n          <\/jats:p>\n          <jats:p>\n            We propose a CNN acceleration architecture called MVP, which efficiently processes both compute- and memory-intensive operations with a small area overhead on top of the baseline systolic-array-based architecture. We suggest a specialized vector unit tailored for processing DW-CONV, including multipliers, adder trees, and multi-banked buffers to meet the high memory bandwidth requirement. We augment the unified buffer with tiny processing elements to smoothly pipeline SE with the subsequent CONV, enabling concurrent processing of DW-CONV with standard CONV, thereby achieving the maximum utilization of arithmetic units. Our evaluation shows that MVP improves performance by 2.6\n            <jats:inline-formula content-type=\"math\/tex\">\n              <jats:tex-math notation=\"LaTeX\" version=\"MathJax\">\\( \\times \\)<\/jats:tex-math>\n            <\/jats:inline-formula>\n            and reduces energy by 47% on average for EfficientNet-B0\/B4\/B7, MnasNet, and MobileNet-V1\/V2 with only a 9% area overhead compared to the baseline.\n          <\/jats:p>","DOI":"10.1145\/3497745","type":"journal-article","created":{"date-parts":[[2022,2,24]],"date-time":"2022-02-24T17:13:41Z","timestamp":1645722821000},"page":"1-25","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units"],"prefix":"10.1145","volume":"27","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-5177-0916","authenticated-orcid":false,"given":"Sunjung","family":"Lee","sequence":"first","affiliation":[{"name":"Department of Intelligence and Information, Seoul National University, Seoul, South 
Korea"}]},{"given":"Jaewan","family":"Choi","sequence":"additional","affiliation":[{"name":"Department of Intelligence and Information, Seoul National University, Seoul, South Korea"}]},{"given":"Wonkyung","family":"Jung","sequence":"additional","affiliation":[{"name":"Department of Intelligence and Information, Seoul National University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3227-2436","authenticated-orcid":false,"given":"Byeongho","family":"Kim","sequence":"additional","affiliation":[{"name":"Department of Intelligence and Information, Seoul National University, Seoul, South Korea"}]},{"given":"Jaehyun","family":"Park","sequence":"additional","affiliation":[{"name":"Department of Intelligence and Information, Seoul National University, Seoul, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5147-0972","authenticated-orcid":false,"given":"Hweesoo","family":"Kim","sequence":"additional","affiliation":[{"name":"Samsung Electronics Co., Ltd, Hwaseong, Gyeonggi, South Korea"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-1733-1394","authenticated-orcid":false,"given":"Jung Ho","family":"Ahn","sequence":"additional","affiliation":[{"name":"Department of Intelligence and Information &amp; Inter-University Semiconductor Research Center, Seoul National University, Seoul, South Korea"}]}],"member":"320","published-online":{"date-parts":[[2022,6,6]]},"reference":[{"key":"e_1_3_2_2_2"},{"key":"e_1_3_2_3_2"},{"key":"e_1_3_2_4_2","unstructured":"Apple. 2017. On-device Deep Neural Network for Face Detection. https:\/\/machinelearning.apple.com\/research\/face-detection."},{"key":"e_1_3_2_5_2"},{"key":"e_1_3_2_6_2"},{"key":"e_1_3_2_7_2"},{"key":"e_1_3_2_8_2"},{"key":"e_1_3_2_9_2"},{"key":"e_1_3_2_10_2","unstructured":"Sharan Chetlur Cliff Woolley Philippe Vandermersch Jonathan Cohen John Tran Bryan Catanzaro and Evan Shelhamer. 2014. cuDNN: Efficient primitives for deep learning. (2014). 
arXiv:1410.0759 https:\/\/arxiv.org\/abs\/1410.0759."},{"key":"e_1_3_2_11_2"},{"key":"e_1_3_2_12_2"},{"key":"e_1_3_2_13_2"},{"key":"e_1_3_2_14_2"},{"key":"e_1_3_2_15_2"},{"key":"e_1_3_2_16_2","unstructured":"Gartner. 2020. Gartner Highlights 10 Uses for AI-Powered Smartphones. https:\/\/www.gartner.com\/en\/newsroom\/press-releases\/2018-03-20-gartner-highlights-10-uses-for-ai-powered-smartphones."},{"key":"e_1_3_2_17_2","unstructured":"Google. 2018. Edge TPU. https:\/\/cloud.google.com\/edge-tpu."},{"key":"e_1_3_2_18_2","unstructured":"Google. 2019. Pixel 4 is here to help. https:\/\/blog.google\/products\/pixel\/pixel-4\/."},{"key":"e_1_3_2_19_2"},{"key":"e_1_3_2_20_2","volume-title":"Proceedings of the Conference on Machine Learning and Systems (MLSys)","author":"Gupta Suyog","year":"2020","unstructured":"Suyog Gupta and Berkin Akin. 2020. Accelerator-aware neural network design using AutoML. In Proceedings of the Conference on Machine Learning and Systems (MLSys). https:\/\/arxiv.org\/abs\/2003.02838."},{"key":"e_1_3_2_21_2"},{"key":"e_1_3_2_22_2"},{"key":"e_1_3_2_23_2","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-319-46493-0_38"},{"key":"e_1_3_2_24_2","unstructured":"Geoffrey Hinton Oriol Vinyals and Jeff Dean. 2015. Distilling the knowledge in a neural network. (2015). arXiv:1503.02531 http:\/\/arxiv.org\/abs\/1503.02531."},{"key":"e_1_3_2_25_2","unstructured":"Andrew G. Howard Menglong Zhu Bo Chen Dmitry Kalenichenko Weijun Wang Tobias Weyand Marco Andreetto and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. (2017). arXiv:1704.04861 http:\/\/arxiv.org\/abs\/1704.04861."},{"key":"e_1_3_2_26_2"},{"key":"e_1_3_2_27_2"},{"key":"e_1_3_2_28_2","volume-title":"Proceedings of the International Conference on Machine Learning (ICML)","author":"Ioffe Sergey","year":"2015","unstructured":"Sergey Ioffe and Christian Szegedy. 2015. 
Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning (ICML). http:\/\/proceedings.mlr.press\/v37\/ioffe15.html."},{"key":"e_1_3_2_29_2"},{"key":"e_1_3_2_30_2"},{"key":"e_1_3_2_31_2"},{"key":"e_1_3_2_32_2"},{"key":"e_1_3_2_33_2"},{"key":"e_1_3_2_34_2"},{"key":"e_1_3_2_35_2"},{"key":"e_1_3_2_36_2"},{"key":"e_1_3_2_37_2"},{"key":"e_1_3_2_38_2"},{"volume-title":"IEEE Hot Chips Symposium (HCS)","year":"2018","key":"e_1_3_2_39_2","unstructured":"NVIDIA. 2018. The NVIDIA deep learning accelerator. In IEEE Hot Chips Symposium (HCS)."},{"key":"e_1_3_2_40_2"},{"key":"e_1_3_2_41_2","unstructured":"Hieu Pham Zihang Dai Qizhe Xie Minh-Thang Luong and Quoc V. Le. 2020. Meta pseudo labels. (2020). arXiv:2003.10580 https:\/\/arxiv.org\/abs\/2003.10580."},{"key":"e_1_3_2_42_2"},{"key":"e_1_3_2_43_2","unstructured":"Jonathan Ross and Andrew Everett Phelps. 2015. Computing Convolutions Using a Neural Network Processor. US Patent App. 62\/164 902."},{"key":"e_1_3_2_44_2","unstructured":"Jonathan Ross and Gregory Michael Thorson. 2015. Rotating Data for Neural Network Computations. US Patent App. 62\/164 908."},{"key":"e_1_3_2_45_2"},{"key":"e_1_3_2_46_2","volume-title":"Proceedings of the International Conference on Learning Representations (ICLR)","author":"Simonyan Karen","year":"2015","unstructured":"Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR). http:\/\/arxiv.org\/abs\/1409.1556."},{"key":"e_1_3_2_47_2"},{"key":"e_1_3_2_48_2","unstructured":"Statista. 2020. Forecast number of mobile users worldwide from 2020 to 2024. 
https:\/\/www.statista.com\/statistics\/218984\/number-of-global-mobile-users-since-2010\/."},{"key":"e_1_3_2_49_2"},{"key":"e_1_3_2_50_2"},{"key":"e_1_3_2_51_2"},{"key":"e_1_3_2_52_2"},{"key":"e_1_3_2_53_2","unstructured":"Mingxing Tan and Quoc V. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. (2019). arXiv:1905.11946 http:\/\/arxiv.org\/abs\/1905.11946."},{"key":"e_1_3_2_54_2"},{"key":"e_1_3_2_55_2","unstructured":"Hugo Touvron Andrea Vedaldi Matthijs Douze and Herv\u00e9 J\u00e9gou. 2020. Fixing the train-test resolution discrepancy: FixEfficientNet. (2020). arXiv:2003.08237 https:\/\/arxiv.org\/abs\/2003.08237."},{"key":"e_1_3_2_56_2"},{"key":"e_1_3_2_57_2"},{"key":"e_1_3_2_58_2"},{"key":"e_1_3_2_59_2"},{"key":"e_1_3_2_60_2"},{"key":"e_1_3_2_61_2"},{"key":"e_1_3_2_62_2"},{"key":"e_1_3_2_63_2"},{"key":"e_1_3_2_64_2"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3497745","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3497745","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,17]],"date-time":"2025-06-17T20:49:25Z","timestamp":1750193365000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3497745"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,6,6]]},"references-count":63,"journal-issue":{"issue":"5","published-print":{"date-parts":[[2022,9,30]]}},"alternative-id":["10.1145\/3497745"],"URL":"https:\/\/doi.org\/10.1145\/3497745","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2022,6,6]]},"assertion":[{"value":"2021-06-01","order":0,"name":"received","label":
"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2021-11-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2022-06-06","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}