{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:04:39Z","timestamp":1750309479756,"version":"3.41.0"},"reference-count":45,"publisher":"Association for Computing Machinery (ACM)","issue":"4","license":[{"start":{"date-parts":[[2024,7,9]],"date-time":"2024-07-09T00:00:00Z","timestamp":1720483200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"NSF","award":["CCF-2112665"],"award-info":[{"award-number":["CCF-2112665"]}]},{"DOI":"10.13039\/100000185","name":"Defense Advanced Research Projects Agency","doi-asserted-by":"crossref","award":["FA8650-20-2-7009"],"award-info":[{"award-number":["FA8650-20-2-7009"]}],"id":[{"id":"10.13039\/100000185","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2024,7,31]]},"abstract":"<jats:p>Parameterizable machine learning\u00a0(ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration\u00a0(DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network\u00a0(DNN) and non-DNN ML algorithms. It adopts a unified approach that combines power, performance, and area\u00a0(PPA) analysis with frontend performance simulation, thereby achieving a realistic estimation of both backend PPA and system metrics such as runtime and energy. In addition, our framework includes a fully automated DSE technique, which optimizes backend and system metrics through an automated search of architectural and backend parameters. 
Experimental studies show that our approach consistently predicts backend PPA and system metrics with an average 7% or less prediction error for the ASIC implementation of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.<\/jats:p>","DOI":"10.1145\/3664652","type":"journal-article","created":{"date-parts":[[2024,5,11]],"date-time":"2024-05-11T11:12:29Z","timestamp":1715425949000},"page":"1-33","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":2,"title":["An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators"],"prefix":"10.1145","volume":"29","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-8548-1039","authenticated-orcid":false,"given":"Hadi","family":"Esmaeilzadeh","sequence":"first","affiliation":[{"name":"University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5514-8027","authenticated-orcid":false,"given":"Soroush","family":"Ghodrati","sequence":"additional","affiliation":[{"name":"University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-4490-5018","authenticated-orcid":false,"given":"Andrew","family":"Kahng","sequence":"additional","affiliation":[{"name":"CSE and ECE, University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-2698-7950","authenticated-orcid":false,"given":"Joon Kyung","family":"Kim","sequence":"additional","affiliation":[{"name":"University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-0955-585X","authenticated-orcid":false,"given":"Sean","family":"Kinzer","sequence":"additional","affiliation":[{"name":"University of California San Diego, La Jolla, United 
States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-8077-1328","authenticated-orcid":false,"given":"Sayak","family":"Kundu","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering, University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-2887-9761","authenticated-orcid":false,"given":"Rohan","family":"Mahapatra","sequence":"additional","affiliation":[{"name":"University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-9358-6255","authenticated-orcid":false,"given":"Susmita Dey","family":"Manasi","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-5353-2364","authenticated-orcid":false,"given":"Sachin","family":"Sapatnekar","sequence":"additional","affiliation":[{"name":"Electrical and Computer Engineering, Univ of Minnesota, Minneapolis, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6669-9702","authenticated-orcid":false,"given":"Zhiang","family":"Wang","sequence":"additional","affiliation":[{"name":"ECE, University of California San Diego, La Jolla, United States"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6981-2299","authenticated-orcid":false,"given":"Ziqing","family":"Zeng","sequence":"additional","affiliation":[{"name":"University of Minnesota, Minneapolis, United States"}]}],"member":"320","published-online":{"date-parts":[[2024,7,9]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"A. Agnesina K. Chang and S. K. Lim. 2020. VLSI placement parameter optimization using deep reinforcement learning. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201920). 1\u20139.","key":"e_1_3_4_2_2","DOI":"10.1145\/3400302.3415690"},{"doi-asserted-by":"crossref","unstructured":"C. Bai Q. Sun J. Zhai Y. Ma B. Yu and M. D. F. Wong. 2021. 
BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201921).","key":"e_1_3_4_3_2","DOI":"10.1109\/ICCAD51958.2021.9643455"},{"unstructured":"S. Banerjee S. Burns P. Cocchini A. Davare S. Jain D. Kirkpatrick et\u00a0al. 2020. A highly configurable hardware\/software stack for DNN inference acceleration. Retrieved from https:\/\/arXiv:2111.15024","key":"e_1_3_4_4_2"},{"unstructured":"J. Bergstra and Y. Bengio. 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13 10 (2012) 281\u2013305.","key":"e_1_3_4_5_2"},{"unstructured":"T. Chen T. Moreau Z. Jiang L. Zheng E. Yan M. Cowan et\u00a0al. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the USENIX Conference on Operating Systems Design and Implementation (OSDI\u201918). 578\u2013594.","key":"e_1_3_4_6_2"},{"doi-asserted-by":"crossref","unstructured":"Y.-H. Chen T. Krishna J. S. Emer and V. Sze. 2017. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE JSSC 52 1 (2017) 127\u2013138.","key":"e_1_3_4_7_2","DOI":"10.1109\/JSSC.2016.2616357"},{"doi-asserted-by":"crossref","unstructured":"C. K. Cheng C. Holtz A. B. Kahng B. Lin and U. Mallappa. 2023. DAGSizer: A directed graph convolutional network approach to discrete gate sizing of VLSI graphs. ACM TODAES 28 4 (2023) 1\u201331.","key":"e_1_3_4_8_2","DOI":"10.1145\/3577019"},{"doi-asserted-by":"crossref","unstructured":"S. Dai Y. Zhou H. Zhang E. Ustun E. F. Y. Young and Z. Zhang. 2018. Fast and accurate estimation of quality of results in high-level synthesis with machine learning. In Proceedings of the Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM\u201918). 129\u2013132.","key":"e_1_3_4_9_2","DOI":"10.1109\/FCCM.2018.00029"},{"doi-asserted-by":"crossref","unstructured":"H. Esmaeilzadeh S. Ghodrati J. Gu S. 
Guo A. B. Kahng J. K. Kim et\u00a0al. 2021. VeriGOOD-ML: An open-source flow for automated ML hardware synthesis. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201921). 1\u20138.","key":"e_1_3_4_10_2","DOI":"10.1109\/ICCAD51958.2021.9643449"},{"doi-asserted-by":"crossref","unstructured":"H. Esmaeilzadeh S. Ghodrati A. B. Kahng J. K. Kim S. Kinzer S. Kundu et\u00a0al. 2022. Physically accurate learning-based performance prediction of hardware-accelerated ML algorithms. In Proceedings of the ACM\/IEEE Workshop on Machine Learning for CAD (MLCAD\u201922).","key":"e_1_3_4_11_2","DOI":"10.1145\/3551901.3556489"},{"unstructured":"M. Fey and J. E. Lenssen. 2019. Fast graph representation learning with PyTorch geometric. In Proceedings of the International Conference on Learning Representations (ICLR\u201919).","key":"e_1_3_4_12_2"},{"doi-asserted-by":"crossref","unstructured":"H. Genc S. Kim A. Amid A. H.-Ali V. Iyer P. Prakash et\u00a0al. 2021. Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration. In Proceedings of the Design Automation Conference (DAC\u201921). 769\u2013774.","key":"e_1_3_4_13_2","DOI":"10.1109\/DAC18074.2021.9586216"},{"doi-asserted-by":"publisher","unstructured":"T. Head MechCoder G. Louppe I. Shcherbatyi A. Fabisch et\u00a0al. 2018. scikit-optimize\/scikit-optimize: v0.5.2 Zenodo. 10.5281\/zenodo.1207017","key":"e_1_3_4_14_2","DOI":"10.5281\/zenodo.1207017"},{"unstructured":"N. P. Jouppi C. Young N. Patil D. Patterson G. Agrawal R. Bajwa et\u00a0al. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the Annual International Symposium on Computer Architecture (ISCA\u201917). 1\u201312.","key":"e_1_3_4_15_2"},{"doi-asserted-by":"crossref","unstructured":"A. B. Kahng B. Lin and S. Nath. 2015. ORION3.0: A comprehensive NoC router estimation tool. IEEE Embed. Syst. Lett. 
7 2 (2015) 41\u201345.","key":"e_1_3_4_16_2","DOI":"10.1109\/LES.2015.2402197"},{"unstructured":"D. P. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR\u201914).","key":"e_1_3_4_17_2"},{"doi-asserted-by":"publisher","unstructured":"M. J. van der Laan E. C Polley and A. E. Hubbard. 2007. Super learner. Stat. Appl. Genet. Mol. Biol. 6 1 (2007). 10.2202\/1544-6115.1309","key":"e_1_3_4_18_2","DOI":"10.2202\/1544-6115.1309"},{"doi-asserted-by":"crossref","unstructured":"F. Last and U. Schlichtmann. 2021. Feeding hungry models less: Deep transfer learning for embedded memory PPA models. In Proceedings of the ACM\/IEEE Workshop on Machine Learning for CAD (MLCAD\u201921). 1\u20136.","key":"e_1_3_4_19_2","DOI":"10.1109\/MLCAD52597.2021.9531299"},{"doi-asserted-by":"crossref","unstructured":"W. Lee Y. Kim J. H. Ryoo D. Sunwoo A. Gerstlauer and L. K. John. 2015. PowerTrain: A learning-based calibration of McPAT power models. Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED\u201915). 189\u2013194.","key":"e_1_3_4_20_2","DOI":"10.1109\/ISLPED.2015.7273512"},{"doi-asserted-by":"crossref","unstructured":"F. Li Y. Wang C. Liu H. Li and X. Li. 2022. NoCeption: A fast PPA prediction framework for network-on-chips using graph neural network. In Proceedings of the Conference on Design Automation and Test in Europe (DATE\u201922).","key":"e_1_3_4_21_2","DOI":"10.23919\/DATE54114.2022.9774525"},{"doi-asserted-by":"crossref","unstructured":"S. Li J. H. Ahn R. D. Strong J. B. Brockman D. M. Tullsen and N. P. Jouppi. 2009. McPAT: An integrated power area and timing modeling framework for multicore and manycore architectures. In Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201909).","key":"e_1_3_4_22_2","DOI":"10.1145\/1669112.1669172"},{"doi-asserted-by":"crossref","unstructured":"S. D. Manasi F. S. 
Snigdha and S. S. Sapatnekar. 2020. NeuPart: Using analytical models to drive energy-efficient partitioning of CNN computations on cloud-connected mobile clients. IEEE TVLSI 28 8 (2020) 1844\u20131857.","key":"e_1_3_4_23_2","DOI":"10.1109\/TVLSI.2020.2995135"},{"unstructured":"R. Liaw E. Liang R. Nishihara P. Moritz J. E. Gonzalez and I. Stoica. 2018. Tune: A research platform for distributed model selection and training. Retrieved from https:\/\/arxiv.org\/abs\/1807.05118","key":"e_1_3_4_24_2"},{"doi-asserted-by":"crossref","unstructured":"Z. Lin J. Zhao S. Sinha and W. Zhang. 2020. HL-Pow: A learning-based power modeling framework for high-level synthesis. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC\u201920). 574\u2013580.","key":"e_1_3_4_25_2","DOI":"10.1109\/ASP-DAC47756.2020.9045442"},{"unstructured":"Y.-C. Lu W.-T. Chan V. Khandelwal and S. K. Lim. 2022. Driving early physical synthesis exploration through end-of-flow total power prediction. In Proceedings of the ACM\/IEEE Workshop on Machine Learning for CAD (MLCAD\u201922).","key":"e_1_3_4_26_2"},{"doi-asserted-by":"crossref","unstructured":"D. Mahajan J. Park E. Amaro H. Sharma A. Yazdanbakhsh J. K. Kim et\u00a0al. 2016. TABLA: A unified template-based framework for accelerating statistical machine learning. In Proceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA\u201916) 2016 14\u201326.","key":"e_1_3_4_27_2","DOI":"10.1109\/HPCA.2016.7446050"},{"doi-asserted-by":"crossref","unstructured":"S. D. Manasi F. S. Snigdha and S. S. Sapatnekar. 2018. NeuPart: Using analytical models to drive energy-efficient partitioning of CNN computations on cloud-connected mobile clients. IEEE TVLSI 28 8 (2018) 1844\u20131857.","key":"e_1_3_4_28_2","DOI":"10.1109\/TVLSI.2020.2995135"},{"doi-asserted-by":"crossref","unstructured":"T. Moreau T. Chen L. Vega J. Roesch E. Yan L. Zheng et\u00a0al. 2019. 
A hardware\u2013software blueprint for flexible deep learning specialization. IEEE Micro 39 5 (2019) 8\u201316.","key":"e_1_3_4_29_2","DOI":"10.1109\/MM.2019.2928962"},{"doi-asserted-by":"crossref","unstructured":"S. D. Manasi and S. S. Sapatnekar. 2021. DeepOpt: Optimized scheduling of CNN workloads for ASIC-based systolic deep learning accelerators. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC\u201921). 235\u2013241.","key":"e_1_3_4_30_2","DOI":"10.1145\/3394885.3431539"},{"doi-asserted-by":"crossref","unstructured":"H. Niederreiter. 1988. Low-discrepancy and low-dispersion sequences. J. Number Theory 30 1 (1988) 51\u201370.","key":"e_1_3_4_31_2","DOI":"10.1016\/0022-314X(88)90025-X"},{"doi-asserted-by":"crossref","unstructured":"Y. Ozaki Y. Tanigaki S. Watanabe and M. Onishi. 2020. Multiobjective tree-structured parzen estimator for computationally expensive optimization problems. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO\u201920).","key":"e_1_3_4_32_2","DOI":"10.1145\/3377930.3389817"},{"unstructured":"A. Paszke S. Gross F. Massa A. Lerer J. Bradbury G. Chanan et\u00a0al. 2019. PyTorch: An imperative style high-performance deep learning library. In Proceedings of the Neural Information Processing Systems.","key":"e_1_3_4_33_2"},{"doi-asserted-by":"crossref","unstructured":"P. Sengupta A. Tyagi Y. Chen and J. Hu. 2022. How good is your verilog RTL code? A quick answer from machine learning. In Proceedings of the International Conference on Computer-Aided Design (ICCAD\u201922).","key":"e_1_3_4_34_2","DOI":"10.1145\/3508352.3549375"},{"doi-asserted-by":"crossref","unstructured":"Y. S. Shao B. Reagen G. -Y. Wei and D. Brooks. 2014. Aladdin: A Pre-RTL power-performance accelerator simulator enabling large design space exploration of customized architectures. In Proceedings of the Annual International Symposium on Computer Architecture (ISCA\u201914). 
97\u2013108.","key":"e_1_3_4_35_2","DOI":"10.1145\/2678373.2665689"},{"unstructured":"E. Tabanelli G. Tagliavini and L. Benini. 2021. DNN is not all you need: Parallelizing non-neural ML algorithms on ultra-low-power IoT processors. Retrieved from https:\/\/arxiv.org\/abs\/2107.09448","key":"e_1_3_4_36_2"},{"doi-asserted-by":"crossref","unstructured":"S. Takamaeda-Yamazaki. 2015. Pyverilog: A python-based hardware design processing toolkit for verilog HDL. In Applied Reconfigurable Computing. Springer 451\u2013460.","key":"e_1_3_4_37_2","DOI":"10.1007\/978-3-319-16214-0_42"},{"unstructured":"H.-S. Wang X. Zhu L.-S. Peh and S. Malik. 2002. Orion: A power-performance simulator for interconnection networks. In Proceedings of the Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO\u201902). 294\u2013395.","key":"e_1_3_4_38_2"},{"unstructured":"S. Williams and M. Baxter. 2002. Icarus verilog: Open-source verilog more than a year later. Linux J. (July 2002). Retrieved from https:\/\/www.linuxjournal.com\/article\/6001","key":"e_1_3_4_39_2"},{"doi-asserted-by":"crossref","unstructured":"P. Xu X. Zhang C. Hao Y. Zhao Y. Zhang Y. Wang et\u00a0al. 2020. AutoDNNchip: An automated DNN chip predictor and builder for both FPGAs and ASICs. In Proceedings of the Conference on Field Programmable Gate Arrays (FPGA\u201920). 40\u201350.","key":"e_1_3_4_40_2","DOI":"10.1145\/3373087.3375306"},{"doi-asserted-by":"crossref","unstructured":"Z. Zeng and S. S. Sapatnekar. 2023. Energy-efficient hardware acceleration of shallow machine learning applications. In Proceedings of the Conference on Design Automation and Test in Europe (DATE\u201923).","key":"e_1_3_4_41_2","DOI":"10.23919\/DATE56975.2023.10137232"},{"unstructured":"VeriGood-ML. Retrieved from https:\/\/github.com\/VeriGOOD-ML\/public","key":"e_1_3_4_42_2"},{"unstructured":"AutoML: Automatic Machine Learning. 
Retrieved from https:\/\/docs.h2o.ai\/h2o\/latest-stable\/h2o-docs\/automl.html","key":"e_1_3_4_43_2"},{"unstructured":"VTA Hardware Design Stack. Retrieved from https:\/\/github.com\/pasqoc\/incubator-tvm-vta","key":"e_1_3_4_44_2"},{"unstructured":"GitHub repository: \u201cVeriGOOD-ML: Verilog Generator Optimized for Designs for Machine Learning\u201d. Retrieved from https:\/\/github.com\/VeriGOOD-ML\/public","key":"e_1_3_4_45_2"},{"unstructured":"NanGate45 PDK. Retrieved from https:\/\/eda.ncsu.edu\/freepdk\/freepdk45\/","key":"e_1_3_4_46_2"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664652","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664652","content-type":"application\/pdf","content-version":"vor","intended-application":"syndication"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3664652","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:17:29Z","timestamp":1750295849000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3664652"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2024,7,9]]},"references-count":45,"journal-issue":{"issue":"4","published-print":{"date-parts":[[2024,7,31]]}},"alternative-id":["10.1145\/3664652"],"URL":"https:\/\/doi.org\/10.1145\/3664652","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2024,7,9]]},"assertion":[{"value":"2023-08-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2024-03-13","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-07-09","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}