{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T05:05:34Z","timestamp":1750309534123,"version":"3.41.0"},"reference-count":30,"publisher":"Association for Computing Machinery (ACM)","issue":"2","license":[{"start":{"date-parts":[[2025,3,6]],"date-time":"2025-03-06T00:00:00Z","timestamp":1741219200000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by-sa\/4.0\/"}],"funder":[{"name":"German Federal Ministry of Education and Research","award":["16ES0876 (GENIAL!)"],"award-info":[{"award-number":["16ES0876 (GENIAL!)"]}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Embed. Comput. Syst."],"published-print":{"date-parts":[[2025,3,31]]},"abstract":"<jats:p>Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the intended AI workload. To facilitate this, we present an automated generation approach for fast performance models to accurately estimate the latency of a DNN mapped onto systematically modeled and concisely described accelerator architectures.<\/jats:p>\n          <jats:p>Using our accelerator architecture description method, we modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array. Together with DNN mappings for those modeled architectures, we perform a combined DNN\/hardware dependency graph analysis, which enables us, in the best case, to evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions achieving a significant speedup. We outperform regression and analytical models in terms of mean absolute percentage error (MAPE) compared with simulation results, while being several magnitudes faster than an RTL simulation.<\/jats:p>","DOI":"10.1145\/3715122","type":"journal-article","created":{"date-parts":[[2025,1,25]],"date-time":"2025-01-25T11:43:47Z","timestamp":1737805427000},"page":"1-32","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":0,"title":["Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators"],"prefix":"10.1145","volume":"24","author":[{"ORCID":"https:\/\/orcid.org\/0000-0002-2701-5881","authenticated-orcid":false,"given":"Konstantin","family":"L\u00fcbeck","sequence":"first","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0001-5702-5768","authenticated-orcid":false,"given":"Alexander Louis-Ferdinand","family":"Jung","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-2632-7078","authenticated-orcid":false,"given":"Felix","family":"Wedlich","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-3471-7560","authenticated-orcid":false,"given":"Mika Markus","family":"M\u00fcller","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-3587-0415","authenticated-orcid":false,"given":"Federico Nicol\u00e1s","family":"Peccia","sequence":"additional","affiliation":[{"name":"FZI, Karlsruhe, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-0885-5424","authenticated-orcid":false,"given":"Felix","family":"Th\u00f6mmes","sequence":"additional","affiliation":[{"name":"FZI, Karlsruhe, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0003-7193-5620","authenticated-orcid":false,"given":"Jannik","family":"Steinmetz","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0009-0007-9428-9085","authenticated-orcid":false,"given":"Valentin","family":"Biermaier","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1795-0948","authenticated-orcid":false,"given":"Adrian","family":"Frischknecht","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-6642-3976","authenticated-orcid":false,"given":"Paul","family":"Palomero Bernardo","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]},{"ORCID":"https:\/\/orcid.org\/0000-0002-1615-507X","authenticated-orcid":false,"given":"Oliver","family":"Bringmann","sequence":"additional","affiliation":[{"name":"Embedded Systems, University of T\u00fcbingen, T\u00fcbingen, Germany"}]}],"member":"320","published-online":{"date-parts":[[2025,3,6]]},"reference":[{"key":"e_1_3_1_2_2","doi-asserted-by":"publisher","unstructured":"Alon Amid David Biancolin Abraham Gonzalez Daniel Grubb Sagar Karandikar Harrison Liew Albert Magyar Howard Mao Albert Ou Nathan Pemberton Paul Rigge Colin Schmidt John Wright Jerry Zhao Yakun Sophia Shao Krste Asanovi\u0107 and Borivoje Nikoli\u0107. 2020. Chipyard: Integrated design simulation and implementation framework for custom SoCs. IEEE Micro 40 4 (2020) 10\u201321. DOI:10.1109\/MM.2020.2996616","DOI":"10.1109\/MM.2020.2996616"},{"key":"e_1_3_1_3_2","doi-asserted-by":"publisher","DOI":"10.1109\/isvlsi.2016.111"},{"key":"e_1_3_1_4_2","doi-asserted-by":"publisher","DOI":"10.1145\/2228360.2228584"},{"key":"e_1_3_1_5_2","doi-asserted-by":"publisher","DOI":"10.1109\/TCAD.2020.3012320"},{"key":"e_1_3_1_6_2","doi-asserted-by":"publisher","DOI":"10.1145\/3457388.3458666"},{"key":"e_1_3_1_7_2","volume-title":"Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition","author":"Chellapilla Kumar","year":"2006","unstructured":"Kumar Chellapilla, Sidd Puri, and Patrice Simard. 2006. High performance convolutional neural networks for document processing. In Proceedings of the 10th International Workshop on Frontiers in Handwriting Recognition. Suvisoft."},{"key":"e_1_3_1_8_2","doi-asserted-by":"publisher","DOI":"10.1145\/2654822.2541967"},{"key":"e_1_3_1_9_2","unstructured":"Tianqi Chen Thierry Moreau Ziheng Jiang Lianmin Zheng Eddie Yan Haichen Shen Meghan Cowan Leyuan Wang Yuwei Hu Luis Ceze Carlos Guestrin and Arvind Krishnamurthy. 2018. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI\u201918). USENIX Association Carlsbad CA 578\u2013594. Retrieved from https:\/\/www.usenix.org\/conference\/osdi18\/presentation\/chen"},{"key":"e_1_3_1_10_2","doi-asserted-by":"publisher","DOI":"10.1109\/jssc.2016.2616357"},{"key":"e_1_3_1_11_2","doi-asserted-by":"publisher","unstructured":"Seungwoo Choi Seokjun Seo Beomjun Shin Hyeongmin Byun Martin Kersner Beomsu Kim Dongyoung Kim and Sungjoo Ha. 2019. Temporal convolution for real-time keyword spotting on mobile devices. In Interspeech 2019. 3372\u20133376. DOI:10.21437\/Interspeech.2019-1363","DOI":"10.21437\/Interspeech.2019-1363"},{"key":"e_1_3_1_12_2","volume-title":"Unified Modeling Language (UML) Version 2.5.1","author":"Cook Steve","year":"2017","unstructured":"Steve Cook, Conrad Bock, Pete Rivett, Tom Rutt, Ed Seidewitz, Bran Selic, and Doug Tolbert. 2017. Unified Modeling Language (UML) Version 2.5.1. Standard. Object Management Group (OMG). Retrieved from https:\/\/www.omg.org\/spec\/UML\/2.5.1"},{"key":"e_1_3_1_13_2","doi-asserted-by":"publisher","DOI":"10.23919\/DATE54114.2022.9774593"},{"key":"e_1_3_1_14_2","doi-asserted-by":"publisher","unstructured":"Hasan Genc Seah Kim Alon Amid Ameer Haj-Ali Vighnesh Iyer Pranav Prakash Jerry Zhao Daniel Grubb Harrison Liew Howard Mao Albert Ou Colin Schmidt Samuel Steffl John Wright Ion Stoica Jonathan Ragan-Kelley Krste Asanovic Borivoje Nikolic and Yakun Sophia Shao. 2021. Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration. In 2021 58th ACM\/IEEE Design Automation Conference (DAC). 769\u2013774. DOI:10.1109\/DAC18074.2021.9586216","DOI":"10.1109\/DAC18074.2021.9586216"},{"key":"e_1_3_1_15_2","doi-asserted-by":"publisher","DOI":"10.1109\/DSD57027.2022.00056"},{"key":"e_1_3_1_16_2","doi-asserted-by":"publisher","DOI":"10.1145\/3065386"},{"key":"e_1_3_1_17_2","doi-asserted-by":"publisher","DOI":"10.1007\/s11241-006-9205-5"},{"key":"e_1_3_1_18_2","doi-asserted-by":"publisher","DOI":"10.1109\/CASES55004.2022.00020"},{"key":"e_1_3_1_19_2","unstructured":"Mika Markus Mueller Alexander Richard Manfred Borst Konstantin Luebeck Alexander Louis-Ferdinand Jung and Oliver Bringmann. 2024. Using the abstract computer architecture description language to model AI hardware accelerators. In MBMV 2024; 27. Workshop. 19\u201330."},{"key":"e_1_3_1_20_2","doi-asserted-by":"publisher","DOI":"10.1093\/comjnl\/7.4.308"},{"key":"e_1_3_1_21_2","doi-asserted-by":"publisher","DOI":"10.1145\/1862876.1862877"},{"key":"e_1_3_1_22_2","doi-asserted-by":"publisher","DOI":"10.1109\/ispass.2019.00042"},{"key":"e_1_3_1_23_2","doi-asserted-by":"publisher","DOI":"10.1145\/3079856.3080256"},{"key":"e_1_3_1_24_2","doi-asserted-by":"publisher","DOI":"10.1109\/isca.2014.6853196"},{"key":"e_1_3_1_25_2","first-page":"6105","volume-title":"Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research)","volume":"97","author":"Tan Mingxing","year":"2019","unstructured":"Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (Proceedings of Machine Learning Research), Kamalika Chaudhuri and Ruslan Salakhutdinov (Eds.), Vol. 97. PMLR, 6105\u20136114."},{"key":"e_1_3_1_26_2","unstructured":"The Python Software Foundation. 2022. Python 3: Typing\u2014Support for Type Hints. (2022). Retrieved from https:\/\/docs.python.org\/3\/library\/typing.html"},{"key":"e_1_3_1_27_2","doi-asserted-by":"publisher","DOI":"10.1109\/jiot.2020.2981684"},{"key":"e_1_3_1_28_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2022.3146413"},{"key":"e_1_3_1_29_2","doi-asserted-by":"publisher","DOI":"10.1109\/access.2020.3047259"},{"key":"e_1_3_1_30_2","doi-asserted-by":"publisher","DOI":"10.1145\/1498765.1498785"},{"key":"e_1_3_1_31_2","doi-asserted-by":"publisher","DOI":"10.1109\/ICCAD45719.2019.8942149"}],"container-title":["ACM Transactions on Embedded Computing Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715122","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3715122","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,19]],"date-time":"2025-06-19T01:18:18Z","timestamp":1750295898000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3715122"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2025,3,6]]},"references-count":30,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2025,3,31]]}},"alternative-id":["10.1145\/3715122"],"URL":"https:\/\/doi.org\/10.1145\/3715122","relation":{},"ISSN":["1539-9087","1558-3465"],"issn-type":[{"type":"print","value":"1539-9087"},{"type":"electronic","value":"1558-3465"}],"subject":[],"published":{"date-parts":[[2025,3,6]]},"assertion":[{"value":"2024-02-22","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2024-06-05","order":2,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2025-03-06","order":3,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}