{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:51Z","timestamp":1750309551374,"version":"3.41.0"},"reference-count":35,"publisher":"Association for Computing Machinery (ACM)","issue":"3","license":[{"start":{"date-parts":[[2025,5,8]],"date-time":"2025-05-08T00:00:00Z","timestamp":1746662400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Des. Autom. Electron. Syst."],"published-print":{"date-parts":[[2025,5,31]]},"abstract":"<jats:p>As Deep Neural Network (DNN) models became more complex, the escalating computational demands on hardware made DNN accelerators a critical research topic. The rapid growth of DNN models required DNN accelerators to keep pace with these computational demands. However, the cost of hardware design was significant, and hardware and software were tightly coupled in the design of DNN accelerators. Much research on HW\/SW co-design was evident, highlighting the importance of having a comprehensive framework to help find the optimal hardware and software design during the design phase. The cost models used in most of the current research relied on data reuse and mathematical estimation to calculate costs, an approach that was fast but inaccurate. In this article, we propose a framework for HW\/SW co-design and introduce a hybrid cost model based on Gem5 that provides fast and precise performance evaluation. The framework uses a memory-centric approach to accurately model off-chip memory behavior. In addition, we discuss how to find the best design in a large co-design space and integrate a design point through a traffic generator and a cost model. 
Finally, we demonstrate that our framework can accurately assist DNN accelerator developers in exploring the optimal hardware and software co-design quickly and efficiently.<\/jats:p>","DOI":"10.1145\/3729227","type":"journal-article","created":{"date-parts":[[2025,4,18]],"date-time":"2025-04-18T11:22:41Z","timestamp":1744975361000},"page":"1-29","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Design Space Exploration for Scalable DNN Accelerators Using a Memory-Centric Analytical Model for HW\/SW Co-Design"],"prefix":"10.1145","volume":"30","author":[{"ORCID":"https:\/\/orcid.org\/0009-0009-5711-8046","authenticated-orcid":false,"given":"Wei-Chun","family":"Huang","sequence":"first","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0001-2380-3119","authenticated-orcid":false,"given":"Chih-Wei","family":"Tang","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0003-4145-1927","authenticated-orcid":false,"given":"Kuei-Chung","family":"Chang","sequence":"additional","affiliation":[{"name":"International Graduate School of Artificial Intelligence, National Yunlin University of Science and Technology, Douliou, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-6925-893X","authenticated-orcid":false,"given":"Tien-Fu","family":"Chen","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0008-3151-967X","authenticated-orcid":false,"given":"Hsiang-Cheng","family":"Hsieh","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, 
Taiwan"}]},{"ORCID":"https:\/\/orcid.org\/0009-0006-8969-4599","authenticated-orcid":false,"given":"Ming-Hsuan","family":"Tsai","sequence":"additional","affiliation":[{"name":"National Yang Ming Chiao Tung University, Hsinchu, Taiwan"}]}],"member":"320","published-online":{"date-parts":[[2025,5,8]]},"reference":[{"key":"e_1_3_1_2_2","unstructured":"C. Lattner M. Amini U. Bondhugula A. Cohen A. Davis J. Pienaar R. Riddle T. Shpeisman N. Vasilache and O. Zinenko. 2023. MLIR: A compiler infrastructure for the end of Moore's law. arXiv preprint arXiv:2002.11054."},{"key":"e_1_3_1_3_2","doi-asserted-by":"crossref","unstructured":"Z. Jia O. Padon J. Thomas T. Warszawski M. Zaharia and A. Aiken. 2019. TASO: Optimizing deep learning computation with automatic generation of graph substitutions. In SOSP 2019 - Proceedings of the 27th ACM Symposium on Operating Systems Principles. 47--62.","DOI":"10.1145\/3341301.3359630"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00050"},{"volume-title":"Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA'22)","key":"e_1_3_1_5_2","unstructured":"S. Zheng, R. Chen, A. Wei, Y. Jin, Q. Han, L. Lu, B. Wu, X. Li, S. Yan, and Y. Liang. 2022. AMOS: Enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction. In Proceedings of the 49th Annual International Symposium on Computer Architecture (ISCA'22). Association for Computing Machinery, New York, NY, USA, 874--887."},{"key":"e_1_3_1_6_2","unstructured":"T. Chen T. Moreau Z. Jiang L. Zheng E. Yan H. Shen M. Cowan L. Wang Y. Hu L. Ceze C. Guestrin and A. Krishnamurthy 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI'18). USENIX Association Carlsbad CA USA 578--594."},{"key":"e_1_3_1_7_2","unstructured":"\u2018NVIDIA Deep Learning Accelerator\u2019. 
Retrieved July 21 2023 from http:\/\/nvdla.org\/"},{"volume-title":"Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15)","key":"e_1_3_1_8_2","unstructured":"Z. Du, R. Fasthuber, T. Chen, P. Ienne, L. Li, T. Luo, X. Feng, Y. Chen, and O. Temam. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In Proceedings of the 42nd Annual International Symposium on Computer Architecture (ISCA'15). 92--104."},{"key":"e_1_3_1_9_2","doi-asserted-by":"publisher","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"e_1_3_1_10_2","doi-asserted-by":"crossref","unstructured":"H. Kwon P. Chatarasi M. Pellauer A. Parashar V. Sarkar and T. Krishna. 2019. Understanding reuse performance and hardware cost of DNN dataflow: A data-centric approach. In Proceedings of the 52nd Annual IEEE\/ACM International Symposium on Microarchitecture (MICRO'52). Association for Computing Machinery New York NY USA 754--768.","DOI":"10.1145\/3352460.3358252"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","DOI":"10.1109\/MM.2020.2985963"},{"volume-title":"Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA'21)","key":"e_1_3_1_12_2","unstructured":"L. Lu, N. Guan, Y. Wang, L. Jia, Z. Luo, J. Yin, J. Cong, and Y. Liang. 2021. TENET: A framework for modeling tensor dataflow based on relation-centric notation. In Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA'21). 720--733."},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA52012.2021.00086"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA56546.2023.10071095"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE54114.2022.9774568"},{"key":"e_1_3_1_16_2","doi-asserted-by":"crossref","unstructured":"T. Chen Z. Du N. Sun J. Wang C. Wu Y. Chen and O. Temam. 2014. \u2018DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning\u2019. 
ACM SIGARCH Computer Architecture News 42 1 (2014) 269--284.","DOI":"10.1145\/2654822.2541967"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1145\/3007787.3001177"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISCA.2018.00012"},{"key":"e_1_3_1_19_2","doi-asserted-by":"publisher","DOI":"10.1145\/3173162.3173176"},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1109\/FPL.2016.7577308"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1109\/HPCA.2017.55"},{"key":"e_1_3_1_22_2","doi-asserted-by":"crossref","unstructured":"M. Gao X. Yang J. Pu M. Horowitz and C. Kozyrakis. 2019. TANGRAM: Optimized coarse-grained dataflow for scalable NN accelerators. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'19). 807--820.","DOI":"10.1145\/3297858.3304014"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1109\/ISPASS.2019.00040"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP40776.2020.9053977"},{"key":"e_1_3_1_25_2","doi-asserted-by":"crossref","unstructured":"X. Yang M. Gao Q. Liu J. Setter J. Pu A. Nayak S. Bell K. Cao H. Ha P. Raina C. Kozyrakis and M. Horowitz. 2020. Interstellar: Using Halide's scheduling language to analyze DNN accelerators. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'20). 369--383.","DOI":"10.1145\/3373376.3378514"},{"key":"e_1_3_1_26_2","unstructured":"T. Chen T. Moreau Z. Jiang L. Zheng E. Yan H. Shen M. Cowan L. Wang Y. Hu L. Ceze C. Guestrin and A. Krishnamurthy. 2018. Learning to optimize tensor programs. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NeurIPS'18). 
3393--3404."},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1145\/3358198"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","unstructured":"P. Chatarasi H. Kwon A. Parashar M. Pellauer T. Krishna and V. Sarkar. 2021. Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators. ACM Transactions on Architecture and Code Optimization (TACO) 19 1 (2021) 1--26. DOI:10.1145\/3485137","DOI":"10.1145\/3485137"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","unstructured":"S.-C. Kao and T. Krishna. 2020. GAMMA: automating the HW mapping of DNN models on accelerators via genetic algorithm. In Proceedings of the 39th International Conference on Computer-Aided Design (ICCAD'20). Article 44 1--9. DOI:10.1145\/3400302.3415639","DOI":"10.1145\/3400302.3415639"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1109\/MICRO50266.2020.00058"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1145\/3373376.3378508"},{"key":"e_1_3_1_32_2","doi-asserted-by":"publisher","DOI":"10.1145\/3445814.3446762"},{"key":"e_1_3_1_33_2","doi-asserted-by":"publisher","DOI":"10.1109\/4235.996017"},{"key":"e_1_3_1_34_2","doi-asserted-by":"publisher","DOI":"10.1109\/CVPR.2018.00474"},{"key":"e_1_3_1_35_2","unstructured":"K. He X. Zhang S. Ren and J. Sun. 2023. \u2018Deep residual learning for image recognition\u2019. Retrieved August 03 2023 from http:\/\/image-net.org\/challenges\/LSVRC\/2015\/"},{"key":"e_1_3_1_36_2","article-title":"Very deep convolutional networks for large-scale image recognition","author":"Simonyan K.","year":"2023","unstructured":"K. Simonyan and A. Zisserman. 2023. \u2018Very deep convolutional networks for large-scale image recognition\u2019. 
In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015.","journal-title":"Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015"}],"container-title":["ACM Transactions on Design Automation of Electronic Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729227","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3729227","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:48Z","timestamp":1750295928000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3729227"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,5,8]]},"references-count":35,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2025,5,31]]}},"alternative-id":["10.1145\/3729227"],"URL":"https:\/\/doi.org\/10.1145\/3729227","relation":{},"ISSN":["1084-4309","1557-7309"],"issn-type":[{"type":"print","value":"1084-4309"},{"type":"electronic","value":"1557-7309"}],"subject":[],"published":{"date-parts":[[2025,5,8]]},"assertion":[{"value":"2024-08-13","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-25","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-05-08","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}