{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T07:44:37Z","timestamp":1740123877751,"version":"3.37.3"},"reference-count":16,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T00:00:00Z","timestamp":1648080000000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,3,24]],"date-time":"2022-03-24T00:00:00Z","timestamp":1648080000000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100007210","name":"RWTH Aachen University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100007210","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Int J Parallel Prog"],"published-print":{"date-parts":[[2022,4]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>In recent years, the growing popularity of Convolutional Neural Networks (CNNs) has driven the development of specialized hardware, so-called Deep Learning Accelerators (DLAs). The large market for DLAs and the large number of papers published on DLA design show that there is currently no one-size-fits-all solution. Depending on the given optimization goals, such as power consumption or performance, there may be several optimal solutions for each scenario. A commonly used method for finding these solutions as early as possible in the design cycle is the use of analytical models, which try to describe a design by simple yet insightful and sufficiently accurate formulas. The main contribution of this work is the generic Analytical Model for AI accelerators (AMAIX) for the estimation of CNN execution time on DLAs. It is based on the popular Roofline model. 
To show the validity of our approach, AMAIX was applied to the Nvidia Deep Learning Accelerator (NVDLA) as a case study using the AlexNet and LeNet CNNs as workloads. The resulting performance predictions were verified against an RTL emulation of the NVDLA using a Synopsys ZeBu Server-based hybrid prototype. By refining the model following a divide-and-conquer paradigm, AMAIX predicted the inference time of AlexNet and LeNet on the NVDLA with an accuracy of 98%. Furthermore, this work shows how to use the obtained results for root-cause analysis and as a starting point for design space exploration.<\/jats:p>","DOI":"10.1007\/s10766-022-00728-3","type":"journal-article","created":{"date-parts":[[2022,3,25]],"date-time":"2022-03-25T01:03:15Z","timestamp":1648170195000},"page":"295-318","update-policy":"https:\/\/doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":0,"title":["AMAIX In-Depth: A Generic Analytical Model for Deep Learning Accelerators"],"prefix":"10.1007","volume":"50","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3434-2271","authenticated-orcid":false,"given":"Niko","family":"Zurstra\u00dfen","sequence":"first","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Lukas","family":"J\u00fcnger","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Tim","family":"Kogel","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Holger","family":"Keding","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]},{"given":"Rainer","family":"Leupers","sequence":"additional","affiliation":[],"role":[{"role":"author","vocabulary":"crossref"}]}],"member":"297","published-online":{"date-parts":[[2022,3,24]]},"reference":[{"key":"728_CR1","unstructured":"Bratt, I.: Arm\u2019s first-generation machine learning processor. 
In: IEEE Hot Chips 30 Symposium (2018)"},{"key":"728_CR2","unstructured":"Jouppi, N.P., Young, C., Patil, N.: In-datacenter performance of a tensor processing unit. In: 44th International Symposium on Computer Architecture (ISCA) (2017)"},{"key":"728_CR3","unstructured":"Venkataramanan, G.: Compute and redundancy solution for the full self-driving computer. In: IEEE Hot Chips 31 Symposium (2019)"},{"key":"728_CR4","doi-asserted-by":"crossref","unstructured":"Alwani, M., Chen, H., Ferdman, M., Milder, P.: Fused-layer CNN accelerators. In: 49th IEEE\/ACM International Symposium on Microarchitecture (MICRO) (2016)","DOI":"10.1109\/MICRO.2016.7783725"},{"key":"728_CR5","doi-asserted-by":"crossref","unstructured":"Chen, Y., Emer, J.S., Sze, V.: Eyeriss v2: a flexible and high-performance accelerator for emerging deep neural networks. CoRR (2018)","DOI":"10.1109\/JETCAS.2019.2910232"},{"key":"728_CR6","doi-asserted-by":"crossref","unstructured":"Zhang, C., Li, P., Sun, G.: Optimizing FPGA-based accelerator design for deep convolutional neural networks. In: Proceedings of the 2015 ACM\/SIGDA International Symposium on Field-Programmable Gate Arrays (2015)","DOI":"10.1145\/2684746.2689060"},{"key":"728_CR7","doi-asserted-by":"crossref","unstructured":"Reagen, B., Whatmough, P., Adolf, R., Rama, S., Lee, H., Lee, S.K., Hern\u00e1ndez-Lobato, J.M., Wei, G.Y., Brooks, D.: Minerva: enabling low-power, highly-accurate deep neural network accelerators. In: 2016 ACM\/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA) (2016)","DOI":"10.1109\/ISCA.2016.32"},{"key":"728_CR8","doi-asserted-by":"publisher","DOI":"10.1155\/2021\/6630552","author":"J Misko","year":"2021","unstructured":"Misko, J., Jadhav, S.S., Kim, Y.: Extensible embedded processor for convolutional neural networks. Sci. Program. (2021). https:\/\/doi.org\/10.1155\/2021\/6630552","journal-title":"Sci. 
Program."},{"issue":"4","key":"728_CR9","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1145\/1498765.1498785","volume":"52","author":"S Williams","year":"2009","unstructured":"Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65\u201376 (2009)","journal-title":"Commun. ACM"},{"key":"728_CR10","unstructured":"NVDLA Github Repository. https:\/\/github.com\/nvdla. Accessed: 27.07.2019"},{"key":"728_CR11","doi-asserted-by":"crossref","unstructured":"J\u00fcnger, L., Zurstrassen, N., Kogel, T., Keding, H., Leupers, R.: Amaix: a generic analytical model for deep learning accelerators. In: SAMOS International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 36\u201351. Springer (2020)","DOI":"10.1007\/978-3-030-60939-9_3"},{"key":"728_CR12","doi-asserted-by":"crossref","unstructured":"Hill, M., Janapa Reddi, V.: Gables: a roofline model for mobile SoCs. In: IEEE International Symposium on High Performance Computer Architecture (HPCA) (2019)","DOI":"10.1109\/HPCA.2019.00047"},{"key":"728_CR13","unstructured":"Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS\u201912 Proceedings of the 25th International Conference on Neural Information Processing Systems (2012)"},{"key":"728_CR14","doi-asserted-by":"crossref","unstructured":"LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, vol. 86 (1998)","DOI":"10.1109\/5.726791"},{"key":"728_CR15","first-page":"1127","volume-title":"Synopsys Virtual Prototyping for Software Development and Early Architecture Analysis","author":"T Kogel","year":"2017","unstructured":"Kogel, T.: Synopsys Virtual Prototyping for Software Development and Early Architecture Analysis, pp. 1127\u20131159. 
Springer, Netherlands (2017)"},{"key":"728_CR16","unstructured":"LeNet Prototxt. https:\/\/github.com\/BVLC\/caffe\/blob\/master\/examples\/mnist\/lenet.prototxt. Accessed: 27.08.2021"}],"container-title":["International Journal of Parallel Programming"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10766-022-00728-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10766-022-00728-3\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10766-022-00728-3.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,4,29]],"date-time":"2022-04-29T22:04:54Z","timestamp":1651269894000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10766-022-00728-3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,3,24]]},"references-count":16,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,4]]}},"alternative-id":["728"],"URL":"https:\/\/doi.org\/10.1007\/s10766-022-00728-3","relation":{},"ISSN":["0885-7458","1573-7640"],"issn-type":[{"type":"print","value":"0885-7458"},{"type":"electronic","value":"1573-7640"}],"subject":[],"published":{"date-parts":[[2022,3,24]]},"assertion":[{"value":"18 March 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"1 March 2022","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"24 March 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}]}}